Martin Peters

Data, python and visualisation enthusiast. Keen cyclist. Husband and father of three children.

Home

PostGIS Basics

[ postgis  postgres  ireland  aws  ]

Published Feb 02, 2020

In my previous blog posts (geospatial analysis in Ireland and AWS CDK Setup) I introduced some geospatial analysis concepts and how to set up a spatial aware database in the AWS cloud. Here I’ll load and use some spatial data and introduce some of the functionality within Postgis. This will lead up to be being able to answer questions like ‘How many people live within 5 minutes of a Luas stop?’. (Note: Luas is a light rail network in Dublin, Ireland).

Logging into the Aurora database (provisioned previously) and running the following will add the postgis and pgrouting (I’ll use this in a later post to describe how we can route between locations) extensions:

SHOW rds.extensions;
CREATE EXTENSION postgis;
CREATE EXTENSION pgrouting;
SELECT PostGIS_full_version();

Postgis

Postgis is a spatial database extender for the PostgreSQL object-relational database. It adds support for geographic objects allowing location queries to be run in SQL.

On macOS it is convenient to install some CLI utilities to interact with postgres and postgis.

brew install postgresql
brew install postgis
brew install osm2pgrouting
brew install osmfilter
brew install gdal

Let’s source and load some spatial data into the database (I’ll dedicate a future post to a more complete set of spatial datasets). Again I’ll use some Irish datasets.

Airports

This is a simple point dataset of the main airports on the island of Ireland. This was generated by hand using Wikipedia (RoI) and Wikipedia (NI) and Google Maps to find the coordinates.

ogr2ogr -f "PostgreSQL" PG:"$DB_DETAILS" "airports.geojson" -nln public.airports

Let’s select some data to see the fields stored:

SELECT
  ogc_fid, name, iata_code, marker_size, marker_color, 
  marker_symbol, ST_AsGeoJSON(wkb_geometry) AS geojson
FROM public.airports
WHERE 
  ogc_fid = 3;
Name Value
ogc_fid 3
name Dublin Airport
iata_code DUB
marker_size Medium
marker_color #FF0000
marker_symbol airport
geojson {"type":"Point","coordinates":[-6.24418258666992,53.4269236916967]}

Let’s use some postgis functions to understand the spatial data stored:

--   ST_GeometryType(geometry) returns the type of the geometry
--   ST_NDims(geometry) returns the number of dimensions of the geometry
--   ST_SRID(geometry) returns the spatial reference identifier number of the geometry
--   ST_X(geometry) returns the X ordinate
--   ST_Y(geometry) returns the Y ordinate
SELECT
  name,
  ST_GeometryType(wkb_geometry),
  ST_NDims(wkb_geometry),
  ST_SRID(wkb_geometry),
  ST_X(wkb_geometry),
  ST_Y(wkb_geometry),
  ST_AsText(wkb_geometry)
FROM public.airports
WHERE ogc_fid = 3;
Name Value
name Dublin Airport
st_geometrytype ST_Point
st_ndims 2
st_srid 4326
st_x -6.244182586669922
st_y 53.426923691696736
st_astext POINT(-6.24418258666992 53.4269236916967)

4326 is the most common Spatial Reference System Identifier (SRID) for geographic coordinates. It is called the World Geodetic System and is used by the Global Positioning System (GPS). More on projections can be found here.

To create geometries in Postgis there are a few options. Using maps.ie I grabbed an example location: Newbridge, County Kildare (latitude: 53.180124; longitude: -6.7982137).

-- Using ST_GeomFromText with the SRID parameter
SELECT ST_GeomFromText('POINT(-6.7982137 53.180124)', 4326);

-- Using ST_GeomFromText without the SRID parameter
SELECT ST_SetSRID(ST_GeomFromText('POINT(-6.7982137 53.180124)'), 4326);

-- Using a ST_Make* function
SELECT ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326);

-- Using PostgreSQL casting syntax and ISO WKT
SELECT ST_SetSRID('POINT(-6.7982137 53.180124)'::geometry, 4326);

-- Using PostgreSQL casting syntax and extended WKT
SELECT 'SRID=4326;POINT(-6.7982137 53.180124)'::geometry;

Comparing geometry points in Postgis is carried out using the ST_Equals function:

-- https://postgis.net/workshops/postgis-intro/spatial_relationships.html
-- ST_Equals returns TRUE if two geometries of the same type have identical x,y coordinate values, 
-- i.e. if the second shape is equal (identical) to the first shape.
SELECT 
  name
FROM public.airports
WHERE
  ST_Equals(
    wkb_geometry,
    ST_SetSRID(ST_MakePoint(-8.493798524458427, 51.84173815), 4326)
  );

There’s a number of ways to determine the distance between two points in Postgis: ST_Distance, ST_Distance_Sphere and ST_Distance_Spheroid. The method you select depends on your speed and accuracy requirements. More detailed information can be found here. The example below determines the closest airport to Newbridge, Co. Kildare.

-- ST_Distance - Returns the minimum 2D Cartesian (planar) distance between two geometries
-- 
-- ST_Distance_Sphere — Returns minimum distance in meters between two lon/lat geometries. 
-- Uses a spherical earth and radius of 6370986 meters. Faster than ST_Distance_Spheroid, 
-- but less accurate.
--
-- ST_Distance_Spheroid — Returns the minimum distance between two lon/lat geometries given a particular spheroid. 
SELECT
  name,
  ROUND((ST_Distance(
      wkb_geometry::geography, 
      ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326)::geography)/1000.0)::NUMERIC, 2) AS nearest_airport_km,
  ROUND((ST_Distance_Sphere(
      wkb_geometry, 
      ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326))/1000.0)::NUMERIC, 2) AS nearest_airport_km_v2,
  ROUND((ST_Distance_Spheroid(
      wkb_geometry,
      ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326),
      'SPHEROID["WGS 84",6378137,298.257223563]')/1000.0)::NUMERIC, 2) AS nearest_airport_km_v3
FROM airports
ORDER BY
  ST_Distance(
    wkb_geometry::geography,
    ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326)::geography
  ) ASC;

Settlements

The Settlements dataset is a polygon set describing where the urban centres are located in the Republic of Ireland. I downloaded Settlements_OSI_Generalised_100M.geojson file from the data.gov.ie site.

export DB_DETAILS="host=localhost user=admin_user dbname=slsdb password=<db password> port=5432"
ogr2ogr -f "PostgreSQL" PG:"$DB_DETAILS" "Settlements_OSI_Generalised_100M.geojson" -nln public.settlements
SELECT
  ogc_fid, settlement, settl_name, guid,
  ST_AsGeoJSON(wkb_geometry) AS geojson
FROM public.settlements
WHERE
  settl_name = 'Sligo';
Name Value
ogc_fid 5
settlement 31004
settl_name Sligo
guid 899C172F-D7B7-4EE1-A17B-C773AE325147
geojson {"type":"Polygon","coordinates":[[[-8.46,54.28],...,[-8.46,54.28]]]}

The following functions are useful to describe the geometries stored in your database and how you’ve various options to export the data.

--   ST_Area(geometry) returns the area of the polygons
--   ST_NRings(geometry) returns the number of rings (usually 1, more of there are holes)
--   ST_ExteriorRing(geometry) returns the outer ring as a linestring
--   ST_InteriorRingN(geometry,n) returns a specified interior ring as a linestring
--   ST_Perimeter(geometry) returns the length of all the rings
--   ST_NumGeometries(geometry) returns the number of parts in the collection
--   ST_GeometryN(geometry,n) returns the specified part
--   ST_Area(geometry) returns the total area of all polygonal parts
--   ST_Length(geometry) returns the total length of all linear parts
--   ST_AsEWKT(geometry) returns Well-known text (WKT)
--   ST_AsEWKB(geometry) returns Well-known binary (WKB)
--   ST_AsGML(geometry) returns Geographic Mark-up Language (GML)
--   ST_AsKML(geometry) returns Keyhole Mark-up Language (KML)
--   ST_AsGeoJSON(geometry) returns GeoJSON
--   ST_AsSVG(geometry) returns Scalable Vector Graphics (SVG)
SELECT
  settlement,
  settl_name,
  ST_GeometryType(wkb_geometry),
  ST_Area(wkb_geometry),
  ST_NRings(wkb_geometry),
  ST_ExteriorRing(wkb_geometry),
  ST_InteriorRingN(wkb_geometry, 1),
  ST_Perimeter(wkb_geometry),
  ST_NumGeometries(wkb_geometry),
  ST_GeometryN(wkb_geometry,1),
  ST_Area(wkb_geometry),
  ST_Length(wkb_geometry),
  ST_AsText(wkb_geometry),
  ST_AsEWKT(wkb_geometry),
  ST_AsEWKB(wkb_geometry),
  ST_AsGML(wkb_geometry),
  ST_AsKML(wkb_geometry),
  ST_AsGeoJSON(wkb_geometry),
  ST_AsSVG(wkb_geometry)
FROM settlements
WHERE
  settl_name IN ('Gorey', 'Galway city and suburbs');
Name
settlement 14011 35005
settl_name Gorey Galway city and suburbs
st_geometrytype ST_Polygon ST_MultiPolygon
st_area 0.0007077848937365223 0.00729630548942928
st_nrings 1 7
st_exteriorrings 01020....3F5A62F2319C05DF082259C574A40
st_interiorring
st_perimeter 0.1588223355751195 0.8207814038117153
st_numgeometries 1 7
st_geometryn 0103000020E610.. 0103000020E610...
st_area 0.0007077 0.0072963
st_length 0 0
st_astext POLYGON((-6.28 52.68,...)) MULTIPOLYGON(((-9.05 53.25,...)))
st_asewkt SRID=4326;POLYGON((-6.28 52.68,...)) SRID=4326;MULTIPOLYGON(((-9.05 53.25,...))))
st_asewkb E'\\x0103000020E610000... E'\\x0106000020E610000...
st_asgml <gml:Polygon srsName="EPSG:4326"><gml:outerBoundaryIs><gml:LinearRing><gml:coordinates>-6.28,52.68... <gml:MultiPolygon srsName="EPSG:4326"><gml:polygonMember><gml:Polygon><gml:outerBoundaryIs><gml:LinearRing><gml:coordinates>-9.05,53.25...
st_askml <Polygon><outerBoundaryIs><LinearRing><coordinates>-6.28,52.68 <MultiGeometry><Polygon><outerBoundaryIs><LinearRing><coordinates>-9.05,53.25
st_asgeojson {"type":"Polygon","coordinates":[[[-6.28,52.68],...]]]} {"type":"MultiPolygon","coordinates":[[[[-9.05,53.25]]]]}
st_assvg M -6.28 -52.68 L ...Z M -9.05 -53.25 L ...Z

Some additional spatial relationships functions include ST_Intersects, ST_Disjoint, and ST_Crosses. These test whether the interiors of the geometries intersect.

-- ST_Intersects(geometry A, geometry B) returns t (TRUE) if the two shapes have any space in common, i.e., 
-- if their boundaries or interiors intersect.
SELECT settl_name
FROM settlements
WHERE
  ST_Intersects(wkb_geometry, ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326));
-- Returns Newbridge settlement

-- The opposite of ST_Intersects is ST_Disjoint(geometry A , geometry B). If two geometries are disjoint, they do not 
-- intersect, and vice-versa. In fact, it is often more efficient to test “not intersects” than to test “disjoint” 
-- because the intersects tests can be spatially indexed, while the disjoint test cannot.
-- For multipoint/polygon, multipoint/linestring, linestring/linestring, linestring/polygon, and 
-- linestring/multipolygon comparisons, ST_Crosses(geometry A, geometry B) returns t (TRUE) if the intersection 
-- results in a geometry whose dimension is one less than the maximum dimension of the two source geometries and 
-- the intersection set is interior to both source geometries.
SELECT count(settl_name)
FROM settlements
WHERE
  ST_Disjoint(wkb_geometry, ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326));
-- Returns 872, all settlements except Newbridge

ST_Contains and ST_Within are functions for finding which polygons a point falls within. Notice the order of the parameters in both functions, they are switched. Both functions return Newbridge as the result.

-- ST_Within(geometry A , geometry B) returns TRUE if the first geometry is completely within the second geometry. 
-- ST_Within tests for the exact opposite result of ST_Contains.
-- ST_Contains(geometry A, geometry B) returns TRUE if the second geometry is completely contained by the first geometry.
SELECT settl_name
FROM settlements
WHERE
  ST_Contains(wkb_geometry, ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326));

SELECT settl_name
FROM settlements
WHERE
  ST_Within(ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326), wkb_geometry);

Counties

This is another polygon dataset of the traditional 32 county boundaries. I added the following properties stroke, stroke-opacity, stroke-width, fill, and fill-opacity. These parameters are used on https://gist.github.com to render the geojson, the gist can be found here.

wget https://www.townlands.ie/static/downloads/counties.geojson.zip
unzip counties.geojson.zip
# added properties to each record
# "stroke": "#000000", "stroke-opacity": 0.5, "stroke-width": 1, "fill": "#FF0000", "fill-opacity": 0.5
# store geojson as a gist on GitHub - https://gist.github.com/martinbpeters/34e258dadca967393291b7a128857350
cat counties.geojson | gist create "Ireland Counties GeoJson" --filename counties.geojson --public
ogr2ogr -f "PostgreSQL" PG:"$DB_DETAILS" "counties.geojson" -nln public.counties
SELECT
  ogc_fid, osm_id, name_tag, name_ga, name_en, alt_name,
  alt_name_g, logainm_re, osm_user, osm_timest, attributio,
  t_ie_url, area, latitude, longitude, epoch_tstm,
  stroke, stroke_opacity, stroke_width, fill, fill_opacity,
  ST_AsGeoJSON(wkb_geometry) AS geojson
FROM counties
WHERE
  name_tag = 'Tipperary';
Name Value
ogc_fid 29
name_tag Tipperary
name_en County Tipperary
area 4305196662.62
latitude 52.6433140996745
longitude -7.95796422303277
stroke #000000
stroke_opacity 0.5
stroke_width 1
fill #0000FF
fill_opacity 0.5
geojson {"type":"Polygon","coordinates":[[[-8.4802917,52.7572409],...]]}

To determine the settlements in county Dublin a simple spatial join will answer the question using the ST_Contains function. It returns the areas: Naul, Ballyboghil, Donabate, Kinsealy-Drinan, Glencullen, Saggart, Rathcoole, Newcastle, Oldtown, Rivermeade, Garristown, Balrothery, Lusk, Brittas, and Kinsaley

SELECT
  counties.name_tag AS county_name,
  settlements.settl_name AS settlement_name
FROM settlements
JOIN counties
  ON ST_Contains(counties.wkb_geometry, settlements.wkb_geometry)
WHERE
  counties.name_tag = 'Dublin';

Small Area Boundaries

Small Areas are divisions of the Republic of Ireland used for analysis of census data. I downloaded the Small_Areas_OSI_Generalised_100M.geojson from the data.gov.ie site. Each Small Area has between 60 and 90 households. The normalised size lends itself to more robust analysis of the data.

cat Small_Areas_OSI_Generalised_100M.geojson | gist create "Ireland Small Areas Generalised at 100M" --filename Small_Areas_OSI_Generalised_100M.geojson --public
ogr2ogr -f "PostgreSQL" PG:"$DB_DETAILS" "Small_Areas_OSI_Generalised_100M.geojson" -nln public.small_areas
SELECT
  ogc_fid,
  guid,
  countyname,
  csoed,
  osied,
  edname,
  sa_pub2011,
  small_area,
  geogid,
  shape__area,
  shape__length,
  ST_AsGeoJSON(wkb_geometry) AS geojson
FROM small_areas
WHERE
  county = 'LS'
  AND edname = 'Abbeyleix'
  AND small_area = '107001003';
Name Value
ogc_fid 972
guid 4c07d11e-115c-851d-e053-ca3ca8c0ca7f
countyname Laois
csoed 08001
osied 107001
edname Abbeyleix
sa_pub2011 107001003
small_area 107001003
geogid A107001003
shape__area 0.000750942565133452
shape__length 0.183375835528692
geojson {"type":"Polygon","coordinates":[[[-7.31661607172085,52.9106086713051],...]]}

Luas Stops

This is a point dataset of the Luas Red and Green lines (stops and depots) in Dublin City. The data from data.tii.ie was extended to include all stops and depots plus the following properties: Line (Red or Green), Zone (Green 1, Red 4, Central, etc.), Start (Start date), Stage (Planning label), Open (Yes/No, two stations remain closed on the Green Line: Racecourse and Brennanstown), PnR (Park and Ride available Yes/No), DublinBikes (Dublin Bikes available Yes/No), marker-size, marker-color, marker-symbol. The latter three properties are used within visualisation applications.

wget http://data.tii.ie/Datasets/Luas/StopLocations/luas-stops.geojson 
# some manual edits were required, creating the luas-stops-2020.geojson file
ogr2ogr -f "PostgreSQL" PG:"$DB_DETAILS" "luas-stops-2020.geojson" -nln public.luas_stops
SELECT
  ogc_fid, id, name, type, line, zone, to_date(start, 'DD/MM/YYYY') AS start_date,
  stage, open, pnr, dublinbikes, ST_AsGeoJSON(wkb_geometry) AS geojson
FROM public.luas_stops
WHERE 
  id = 1;
Name Value
ogc_fid 1
id 1
name Tallaght
type Stop
line Red
zone Red 4
start_date 2004-09-26
stage A
open Yes
pnr Yes
dublinbikes No
geojson {"type":"Point","coordinates":[-6.37458,53.287494]}

I’ll dedicate a future post to analyse the population that live close to a luas stop.

Irish Population from 1841 to 2016 by County

An interesting dataset from data.gov.ie where the population of each county is supplied for every census since 1841. This data was used in my last blog post to describe the movement of people towards Dublin since the 1840s. This data is provided in px format, a common format for statisticians. I converted it to csv to load it into Postgres using pxaxis and pandas.

pip install pyaxis
wget http://www.cso.ie/px/pxeirestat/Database/Eirestat/Profile%202%20-%20Population%20Distribution%20and%20Movements/E2001.px

It required me to write a simple python file, convert_e2001.py, which reads the px file and dumps a csv file using pandas.

from pyaxis import pyaxis
FILE_PATH="E2001.px"
parsed_px = pyaxis.parse(FILE_PATH, encoding='ISO-8859-2')
df = parsed_px['DATA']
export_csv = df.to_csv ('E2001.csv', index=None, header=True)

Before loading the data, I created a table to store the data:

CREATE TABLE public.e2001
(
  County VARCHAR(64),
  Sex VARCHAR(24),
  CensusYear INTEGER,
  DATA INTEGER
);

Then running the python code and loading into the database using psql.

python convert_e2001.py
psql -d slsdb -h localhost -p 5432 --user admin_user -c "\copy public.e2001 from 'E2001.csv' null as '' CSV HEADER"
SELECT *
FROM public.e2001
WHERE
  county = 'Limerick'
  and sex = 'Both sexes';
county sex censusyear data
Limerick Both sexes 1841 330029
... ... ... ...
Limerick Both sexes 1946 142559
... ... ... ...
Limerick Both sexes 2016 194899

Small Area Census 2016

This is a dataset published after the 2016 census. The data is at a granularity of small area, in other words each row or record represents one small area which has many metrics e.g. Total - Males, people with No religion, French speakers, etc. I will use this dataset to analyse the number of people living close to the Luas in Dublin in a future post.

To load the data into the database I used a utility called pgfutter. This is a really useful command line application to load csv data into Postgres. It creates the DDL during the import, especially handy for csv files with hundreds of columns. Only downside is that the metric data is imported with type text.

wget https://github.com/lukasmartinelli/pgfutter/releases/download/v1.2/pgfutter_darwin_amd64
mv pgfutter_darwin_amd64 pgfutter

Loading the data with pgfutter:

export DB_PASSWORD="<db password>"
./pgfutter --db "slsdb" --port "5432" \
  --user "admin_user" --pw "$DB_PASSWORD" \ 
  --schema public --table sa_2016 csv SAPS2016_SA2017.csv
-- There's over 800 columns in the dataset so just selecting a tiny subset. 
SELECT
  guid,
  geogid,
  geogdesc,
  t1_1age0m,
  t1_2t,
  t15_3_t
FROM sa_2016
WHERE 
  guid = '4c07d11e-11d3-851d-e053-ca3ca8c0ca7f';
Name Value
guid 4c07d11e-11d3-851d-e053-ca3ca8c0ca7f
geogid SA2017_017001001
geogdesc Small Area
t1_1age0m 4
... ..
t1_2t 395
... ..
t15_3_t 128

Recap

Hopefully this post has introduced some spatial functions available within Postgis and how you can use the data loaded to answer some spatial questions. In future posts I’ll use these and more datasets to explore trends and ways of visualising spatial data.

References

  1. Python CDK - VPC + Postgres

Archive

ireland postgres postgis aws random qgis property geospatial