Data, python and visualisation enthusiast. Keen cyclist. Husband and father of three children.
postgis
postgres
ireland
aws
]
Published Feb 02, 2020
In my previous blog posts (geospatial analysis in Ireland and AWS CDK Setup) I introduced some geospatial analysis concepts and how to set up a spatial aware database in the AWS cloud. Here I’ll load and use some spatial data and introduce some of the functionality within Postgis. This will lead up to be being able to answer questions like ‘How many people live within 5 minutes of a Luas stop?’. (Note: Luas is a light rail network in Dublin, Ireland).
Logging into the Aurora database (provisioned previously) and running the following will add the postgis and pgrouting (I’ll use this in a later post to describe how we can route between locations) extensions:
SHOW rds.extensions;
CREATE EXTENSION postgis;
CREATE EXTENSION pgrouting;
SELECT PostGIS_full_version();
Postgis is a spatial database extender for the PostgreSQL object-relational database. It adds support for geographic objects allowing location queries to be run in SQL.
On macOS it is convenient to install some CLI utilities to interact with postgres and postgis.
brew install postgresql
brew install postgis
brew install osm2pgrouting
brew install osmfilter
brew install gdal
Let’s source and load some spatial data into the database (I’ll dedicate a future post to a more complete set of spatial datasets). Again I’ll use some Irish datasets.
This is a simple point dataset of the main airports on the island of Ireland. This was generated by hand using Wikipedia (RoI) and Wikipedia (NI) and Google Maps to find the coordinates.
ogr2ogr -f "PostgreSQL" PG:"$DB_DETAILS" "airports.geojson" -nln public.airports
Let’s select some data to see the fields stored:
SELECT
ogc_fid, name, iata_code, marker_size, marker_color,
marker_symbol, ST_AsGeoJSON(wkb_geometry) AS geojson
FROM public.airports
WHERE
ogc_fid = 3;
Name | Value |
---|---|
ogc_fid | 3 |
name | Dublin Airport |
iata_code | DUB |
marker_size | Medium |
marker_color | #FF0000 |
marker_symbol | airport |
geojson | {"type":"Point","coordinates":[-6.24418258666992,53.4269236916967]} |
Let’s use some postgis functions to understand the spatial data stored:
-- ST_GeometryType(geometry) returns the type of the geometry
-- ST_NDims(geometry) returns the number of dimensions of the geometry
-- ST_SRID(geometry) returns the spatial reference identifier number of the geometry
-- ST_X(geometry) returns the X ordinate
-- ST_Y(geometry) returns the Y ordinate
SELECT
name,
ST_GeometryType(wkb_geometry),
ST_NDims(wkb_geometry),
ST_SRID(wkb_geometry),
ST_X(wkb_geometry),
ST_Y(wkb_geometry),
ST_AsText(wkb_geometry)
FROM public.airports
WHERE ogc_fid = 3;
Name | Value |
---|---|
name | Dublin Airport |
st_geometrytype | ST_Point |
st_ndims | 2 |
st_srid | 4326 |
st_x | -6.244182586669922 |
st_y | 53.426923691696736 |
st_astext | POINT(-6.24418258666992 53.4269236916967) |
4326 is the most common Spatial Reference System Identifier (SRID) for geographic coordinates. It is called the World Geodetic System and is used by the Global Positioning System (GPS). More on projections can be found here.
To create geometries in Postgis there are a few options. Using maps.ie I grabbed an example location: Newbridge, County Kildare (latitude: 53.180124; longitude: -6.7982137).
-- Using ST_GeomFromText with the SRID parameter
SELECT ST_GeomFromText('POINT(-6.7982137 53.180124)', 4326);
-- Using ST_GeomFromText without the SRID parameter
SELECT ST_SetSRID(ST_GeomFromText('POINT(-6.7982137 53.180124)'), 4326);
-- Using a ST_Make* function
SELECT ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326);
-- Using PostgreSQL casting syntax and ISO WKT
SELECT ST_SetSRID('POINT(-6.7982137 53.180124)'::geometry, 4326);
-- Using PostgreSQL casting syntax and extended WKT
SELECT 'SRID=4326;POINT(-6.7982137 53.180124)'::geometry;
Comparing geometry points in Postgis is carried out using the ST_Equals
function:
-- https://postgis.net/workshops/postgis-intro/spatial_relationships.html
-- ST_Equals returns TRUE if two geometries of the same type have identical x,y coordinate values,
-- i.e. if the second shape is equal (identical) to the first shape.
SELECT
name
FROM public.airports
WHERE
ST_Equals(
wkb_geometry,
ST_SetSRID(ST_MakePoint(-8.493798524458427, 51.84173815), 4326)
);
There’s a number of ways to determine the distance between two points in Postgis: ST_Distance
, ST_Distance_Sphere
and ST_Distance_Spheroid
. The method you select depends on your speed and accuracy requirements. More detailed
information can be found here. The example below
determines the closest airport to Newbridge, Co. Kildare.
-- ST_Distance - Returns the minimum 2D Cartesian (planar) distance between two geometries
--
-- ST_Distance_Sphere — Returns minimum distance in meters between two lon/lat geometries.
-- Uses a spherical earth and radius of 6370986 meters. Faster than ST_Distance_Spheroid,
-- but less accurate.
--
-- ST_Distance_Spheroid — Returns the minimum distance between two lon/lat geometries given a particular spheroid.
SELECT
name,
ROUND((ST_Distance(
wkb_geometry::geography,
ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326)::geography)/1000.0)::NUMERIC, 2) AS nearest_airport_km,
ROUND((ST_Distance_Sphere(
wkb_geometry,
ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326))/1000.0)::NUMERIC, 2) AS nearest_airport_km_v2,
ROUND((ST_Distance_Spheroid(
wkb_geometry,
ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326),
'SPHEROID["WGS 84",6378137,298.257223563]')/1000.0)::NUMERIC, 2) AS nearest_airport_km_v3
FROM airports
ORDER BY
ST_Distance(
wkb_geometry::geography,
ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326)::geography
) ASC;
The Settlements
dataset is a polygon set describing where the urban centres are located in the Republic of Ireland. I downloaded
Settlements_OSI_Generalised_100M.geojson
file from the data.gov.ie
site.
export DB_DETAILS="host=localhost user=admin_user dbname=slsdb password=<db password> port=5432"
ogr2ogr -f "PostgreSQL" PG:"$DB_DETAILS" "Settlements_OSI_Generalised_100M.geojson" -nln public.settlements
SELECT
ogc_fid, settlement, settl_name, guid,
ST_AsGeoJSON(wkb_geometry) AS geojson
FROM public.settlements
WHERE
settl_name = 'Sligo';
Name | Value |
---|---|
ogc_fid | 5 |
settlement | 31004 |
settl_name | Sligo |
guid | 899C172F-D7B7-4EE1-A17B-C773AE325147 |
geojson | {"type":"Polygon","coordinates":[[[-8.46,54.28],...,[-8.46,54.28]]]} |
The following functions are useful to describe the geometries stored in your database and how you’ve various options to export the data.
-- ST_Area(geometry) returns the area of the polygons
-- ST_NRings(geometry) returns the number of rings (usually 1, more of there are holes)
-- ST_ExteriorRing(geometry) returns the outer ring as a linestring
-- ST_InteriorRingN(geometry,n) returns a specified interior ring as a linestring
-- ST_Perimeter(geometry) returns the length of all the rings
-- ST_NumGeometries(geometry) returns the number of parts in the collection
-- ST_GeometryN(geometry,n) returns the specified part
-- ST_Area(geometry) returns the total area of all polygonal parts
-- ST_Length(geometry) returns the total length of all linear parts
-- ST_AsEWKT(geometry) returns Well-known text (WKT)
-- ST_AsEWKB(geometry) returns Well-known binary (WKB)
-- ST_AsGML(geometry) returns Geographic Mark-up Language (GML)
-- ST_AsKML(geometry) returns Keyhole Mark-up Language (KML)
-- ST_AsGeoJSON(geometry) returns GeoJSON
-- ST_AsSVG(geometry) returns Scalable Vector Graphics (SVG)
SELECT
settlement,
settl_name,
ST_GeometryType(wkb_geometry),
ST_Area(wkb_geometry),
ST_NRings(wkb_geometry),
ST_ExteriorRing(wkb_geometry),
ST_InteriorRingN(wkb_geometry, 1),
ST_Perimeter(wkb_geometry),
ST_NumGeometries(wkb_geometry),
ST_GeometryN(wkb_geometry,1),
ST_Area(wkb_geometry),
ST_Length(wkb_geometry),
ST_AsText(wkb_geometry),
ST_AsEWKT(wkb_geometry),
ST_AsEWKB(wkb_geometry),
ST_AsGML(wkb_geometry),
ST_AsKML(wkb_geometry),
ST_AsGeoJSON(wkb_geometry),
ST_AsSVG(wkb_geometry)
FROM settlements
WHERE
settl_name IN ('Gorey', 'Galway city and suburbs');
Name | ||
---|---|---|
settlement | 14011 | 35005 |
settl_name | Gorey | Galway city and suburbs |
st_geometrytype | ST_Polygon | ST_MultiPolygon |
st_area | 0.0007077848937365223 | 0.00729630548942928 |
st_nrings | 1 | 7 |
st_exteriorrings | 01020....3F5A62F2319C05DF082259C574A40 | |
st_interiorring | ||
st_perimeter | 0.1588223355751195 | 0.8207814038117153 |
st_numgeometries | 1 | 7 |
st_geometryn | 0103000020E610.. | 0103000020E610... |
st_area | 0.0007077 | 0.0072963 |
st_length | 0 | 0 |
st_astext | POLYGON((-6.28 52.68,...)) | MULTIPOLYGON(((-9.05 53.25,...))) |
st_asewkt | SRID=4326;POLYGON((-6.28 52.68,...)) | SRID=4326;MULTIPOLYGON(((-9.05 53.25,...)))) |
st_asewkb | E'\\x0103000020E610000... | E'\\x0106000020E610000... |
st_asgml | <gml:Polygon srsName="EPSG:4326"><gml:outerBoundaryIs><gml:LinearRing><gml:coordinates>-6.28,52.68... | <gml:MultiPolygon srsName="EPSG:4326"><gml:polygonMember><gml:Polygon><gml:outerBoundaryIs><gml:LinearRing><gml:coordinates>-9.05,53.25... |
st_askml | <Polygon><outerBoundaryIs><LinearRing><coordinates>-6.28,52.68 | <MultiGeometry><Polygon><outerBoundaryIs><LinearRing><coordinates>-9.05,53.25 |
st_asgeojson | {"type":"Polygon","coordinates":[[[-6.28,52.68],...]]]} | {"type":"MultiPolygon","coordinates":[[[[-9.05,53.25]]]]} |
st_assvg | M -6.28 -52.68 L ...Z | M -9.05 -53.25 L ...Z |
Some additional spatial relationships
functions include ST_Intersects
, ST_Disjoint
, and ST_Crosses
. These test whether the interiors of the geometries
intersect.
-- ST_Intersects(geometry A, geometry B) returns t (TRUE) if the two shapes have any space in common, i.e.,
-- if their boundaries or interiors intersect.
SELECT settl_name
FROM settlements
WHERE
ST_Intersects(wkb_geometry, ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326));
-- Returns Newbridge settlement
-- The opposite of ST_Intersects is ST_Disjoint(geometry A , geometry B). If two geometries are disjoint, they do not
-- intersect, and vice-versa. In fact, it is often more efficient to test “not intersects” than to test “disjoint”
-- because the intersects tests can be spatially indexed, while the disjoint test cannot.
-- For multipoint/polygon, multipoint/linestring, linestring/linestring, linestring/polygon, and
-- linestring/multipolygon comparisons, ST_Crosses(geometry A, geometry B) returns t (TRUE) if the intersection
-- results in a geometry whose dimension is one less than the maximum dimension of the two source geometries and
-- the intersection set is interior to both source geometries.
SELECT count(settl_name)
FROM settlements
WHERE
ST_Disjoint(wkb_geometry, ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326));
-- Returns 872, all settlements except Newbridge
ST_Contains
and ST_Within
are functions for finding which polygons a point falls within. Notice the order
of the parameters in both functions, they are switched. Both functions return Newbridge as the result.
-- ST_Within(geometry A , geometry B) returns TRUE if the first geometry is completely within the second geometry.
-- ST_Within tests for the exact opposite result of ST_Contains.
-- ST_Contains(geometry A, geometry B) returns TRUE if the second geometry is completely contained by the first geometry.
SELECT settl_name
FROM settlements
WHERE
ST_Contains(wkb_geometry, ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326));
SELECT settl_name
FROM settlements
WHERE
ST_Within(ST_SetSRID(ST_MakePoint(-6.7982137, 53.180124), 4326), wkb_geometry);
This is another polygon dataset of the traditional 32 county boundaries.
I added the following properties stroke
, stroke-opacity
, stroke-width
, fill
, and fill-opacity
. These
parameters are used on https://gist.github.com to render the geojson, the gist can be
found here.
wget https://www.townlands.ie/static/downloads/counties.geojson.zip
unzip counties.geojson.zip
# added properties to each record
# "stroke": "#000000", "stroke-opacity": 0.5, "stroke-width": 1, "fill": "#FF0000", "fill-opacity": 0.5
# store geojson as a gist on GitHub - https://gist.github.com/martinbpeters/34e258dadca967393291b7a128857350
cat counties.geojson | gist create "Ireland Counties GeoJson" --filename counties.geojson --public
ogr2ogr -f "PostgreSQL" PG:"$DB_DETAILS" "counties.geojson" -nln public.counties
SELECT
ogc_fid, osm_id, name_tag, name_ga, name_en, alt_name,
alt_name_g, logainm_re, osm_user, osm_timest, attributio,
t_ie_url, area, latitude, longitude, epoch_tstm,
stroke, stroke_opacity, stroke_width, fill, fill_opacity,
ST_AsGeoJSON(wkb_geometry) AS geojson
FROM counties
WHERE
name_tag = 'Tipperary';
Name | Value |
---|---|
ogc_fid | 29 |
name_tag | Tipperary |
name_en | County Tipperary |
area | 4305196662.62 |
latitude | 52.6433140996745 |
longitude | -7.95796422303277 |
stroke | #000000 |
stroke_opacity | 0.5 |
stroke_width | 1 |
fill | #0000FF |
fill_opacity | 0.5 |
geojson | {"type":"Polygon","coordinates":[[[-8.4802917,52.7572409],...]]} |
To determine the settlements in county Dublin a simple
spatial join will answer the question using the
ST_Contains
function. It returns the areas: Naul, Ballyboghil, Donabate, Kinsealy-Drinan, Glencullen, Saggart,
Rathcoole, Newcastle, Oldtown, Rivermeade, Garristown, Balrothery, Lusk, Brittas, and Kinsaley
SELECT
counties.name_tag AS county_name,
settlements.settl_name AS settlement_name
FROM settlements
JOIN counties
ON ST_Contains(counties.wkb_geometry, settlements.wkb_geometry)
WHERE
counties.name_tag = 'Dublin';
Small Areas are divisions of the Republic of Ireland used for analysis of census data. I downloaded the
Small_Areas_OSI_Generalised_100M.geojson
from the data.gov.ie
site. Each Small Area has between 60 and 90
households. The normalised size lends itself to more robust analysis of the data.
cat Small_Areas_OSI_Generalised_100M.geojson | gist create "Ireland Small Areas Generalised at 100M" --filename Small_Areas_OSI_Generalised_100M.geojson --public
ogr2ogr -f "PostgreSQL" PG:"$DB_DETAILS" "Small_Areas_OSI_Generalised_100M.geojson" -nln public.small_areas
SELECT
ogc_fid,
guid,
countyname,
csoed,
osied,
edname,
sa_pub2011,
small_area,
geogid,
shape__area,
shape__length,
ST_AsGeoJSON(wkb_geometry) AS geojson
FROM small_areas
WHERE
county = 'LS'
AND edname = 'Abbeyleix'
AND small_area = '107001003';
Name | Value |
---|---|
ogc_fid | 972 |
guid | 4c07d11e-115c-851d-e053-ca3ca8c0ca7f |
countyname | Laois |
csoed | 08001 |
osied | 107001 |
edname | Abbeyleix |
sa_pub2011 | 107001003 |
small_area | 107001003 |
geogid | A107001003 |
shape__area | 0.000750942565133452 |
shape__length | 0.183375835528692 |
geojson | {"type":"Polygon","coordinates":[[[-7.31661607172085,52.9106086713051],...]]} |
This is a point dataset of the Luas Red and Green lines (stops and depots) in Dublin City. The data from data.tii.ie
was extended to include all stops and depots plus the following properties: Line (Red or Green), Zone (Green 1,
Red 4, Central, etc.), Start (Start date), Stage (Planning label), Open (Yes/No, two stations remain closed on the
Green Line: Racecourse and Brennanstown), PnR (Park and Ride available Yes/No), DublinBikes (Dublin Bikes available
Yes/No), marker-size, marker-color, marker-symbol. The latter three properties are used within visualisation
applications.
wget http://data.tii.ie/Datasets/Luas/StopLocations/luas-stops.geojson
# some manual edits were required, creating the luas-stops-2020.geojson file
ogr2ogr -f "PostgreSQL" PG:"$DB_DETAILS" "luas-stops-2020.geojson" -nln public.luas_stops
SELECT
ogc_fid, id, name, type, line, zone, to_date(start, 'DD/MM/YYYY') AS start_date,
stage, open, pnr, dublinbikes, ST_AsGeoJSON(wkb_geometry) AS geojson
FROM public.luas_stops
WHERE
id = 1;
Name | Value |
---|---|
ogc_fid | 1 |
id | 1 |
name | Tallaght |
type | Stop |
line | Red |
zone | Red 4 |
start_date | 2004-09-26 |
stage | A |
open | Yes |
pnr | Yes |
dublinbikes | No |
geojson | {"type":"Point","coordinates":[-6.37458,53.287494]} |
I’ll dedicate a future post to analyse the population that live close to a luas stop.
An interesting dataset from data.gov.ie
where the population of each county is supplied for every census since 1841.
This data was used in my last blog post to describe the
movement of people towards Dublin since the 1840s. This data is provided in px format, a common format for
statisticians. I converted it to csv to load it into Postgres using pxaxis and pandas.
pip install pyaxis
wget http://www.cso.ie/px/pxeirestat/Database/Eirestat/Profile%202%20-%20Population%20Distribution%20and%20Movements/E2001.px
It required me to write a simple python file, convert_e2001.py, which reads the px file and dumps a csv file using pandas.
from pyaxis import pyaxis
FILE_PATH="E2001.px"
parsed_px = pyaxis.parse(FILE_PATH, encoding='ISO-8859-2')
df = parsed_px['DATA']
export_csv = df.to_csv ('E2001.csv', index=None, header=True)
Before loading the data, I created a table to store the data:
CREATE TABLE public.e2001
(
County VARCHAR(64),
Sex VARCHAR(24),
CensusYear INTEGER,
DATA INTEGER
);
Then running the python code and loading into the database using psql.
python convert_e2001.py
psql -d slsdb -h localhost -p 5432 --user admin_user -c "\copy public.e2001 from 'E2001.csv' null as '' CSV HEADER"
SELECT *
FROM public.e2001
WHERE
county = 'Limerick'
and sex = 'Both sexes';
county | sex | censusyear | data |
---|---|---|---|
Limerick | Both sexes | 1841 | 330029 |
... | ... | ... | ... |
Limerick | Both sexes | 1946 | 142559 |
... | ... | ... | ... |
Limerick | Both sexes | 2016 | 194899 |
This is a dataset published after the 2016 census. The data is at a granularity of small area, in other words each row
or record represents one small area which has many metrics e.g. Total - Males
, people with No religion
,
French speakers
, etc. I will use this dataset to analyse the number of people living close to the Luas in Dublin in
a future post.
To load the data into the database I used a utility called pgfutter. This is a really useful command line application
to load csv data into Postgres. It creates the DDL during the import, especially handy for csv files with hundreds of
columns. Only downside is that the metric data is imported with type text
.
wget https://github.com/lukasmartinelli/pgfutter/releases/download/v1.2/pgfutter_darwin_amd64
mv pgfutter_darwin_amd64 pgfutter
Loading the data with pgfutter:
export DB_PASSWORD="<db password>"
./pgfutter --db "slsdb" --port "5432" \
--user "admin_user" --pw "$DB_PASSWORD" \
--schema public --table sa_2016 csv SAPS2016_SA2017.csv
-- There's over 800 columns in the dataset so just selecting a tiny subset.
SELECT
guid,
geogid,
geogdesc,
t1_1age0m,
t1_2t,
t15_3_t
FROM sa_2016
WHERE
guid = '4c07d11e-11d3-851d-e053-ca3ca8c0ca7f';
Name | Value |
---|---|
guid | 4c07d11e-11d3-851d-e053-ca3ca8c0ca7f |
geogid | SA2017_017001001 |
geogdesc | Small Area |
t1_1age0m | 4 |
... | .. |
t1_2t | 395 |
... | .. |
t15_3_t | 128 |
Hopefully this post has introduced some spatial functions available within Postgis and how you can use the data loaded to answer some spatial questions. In future posts I’ll use these and more datasets to explore trends and ways of visualising spatial data.
ireland
postgres
postgis
aws
random
qgis
property
geospatial