In my previous blog posts (geospatial analysis in Ireland
and AWS CDK Setup) I introduced some geospatial
analysis concepts and how to set up a spatial aware database in the AWS cloud. Here I’ll load and use some spatial data
and introduce some of the functionality within Postgis. This will lead up to be being able to answer questions like
‘How many people live within 5 minutes of a Luas stop?’. (Note: Luas is a light rail network
in Dublin, Ireland).
Logging into the Aurora database (provisioned
previously) and running the following will add the postgis and pgrouting (I’ll use this in a later post to describe
how we can route between locations) extensions:
Let’s use some postgis functions to understand the spatial data stored:
-- ST_GeometryType(geometry) returns the type of the geometry-- ST_NDims(geometry) returns the number of dimensions of the geometry-- ST_SRID(geometry) returns the spatial reference identifier number of the geometry-- ST_X(geometry) returns the X ordinate-- ST_Y(geometry) returns the Y ordinateSELECTname,ST_GeometryType(wkb_geometry),ST_NDims(wkb_geometry),ST_SRID(wkb_geometry),ST_X(wkb_geometry),ST_Y(wkb_geometry),ST_AsText(wkb_geometry)FROMpublic.airportsWHEREogc_fid=3;
4326 is the most common Spatial Reference System Identifier (SRID) for
geographic coordinates. It is called the
World Geodetic System and is used by the Global Positioning
System (GPS). More on projections can be found here.
To create geometries in Postgis there are a few options.
Using maps.ie I grabbed an example location: Newbridge, County Kildare
(latitude: 53.180124; longitude: -6.7982137).
-- Using ST_GeomFromText with the SRID parameterSELECTST_GeomFromText('POINT(-6.7982137 53.180124)',4326);-- Using ST_GeomFromText without the SRID parameterSELECTST_SetSRID(ST_GeomFromText('POINT(-6.7982137 53.180124)'),4326);-- Using a ST_Make* functionSELECTST_SetSRID(ST_MakePoint(-6.7982137,53.180124),4326);-- Using PostgreSQL casting syntax and ISO WKTSELECTST_SetSRID('POINT(-6.7982137 53.180124)'::geometry,4326);-- Using PostgreSQL casting syntax and extended WKTSELECT'SRID=4326;POINT(-6.7982137 53.180124)'::geometry;
Comparing geometry points in Postgis is carried out using the ST_Equals function:
-- https://postgis.net/workshops/postgis-intro/spatial_relationships.html-- ST_Equals returns TRUE if two geometries of the same type have identical x,y coordinate values, -- i.e. if the second shape is equal (identical) to the first shape.SELECTnameFROMpublic.airportsWHEREST_Equals(wkb_geometry,ST_SetSRID(ST_MakePoint(-8.493798524458427,51.84173815),4326));
There’s a number of ways to determine the distance between two points in Postgis: ST_Distance, ST_Distance_Sphere
and ST_Distance_Spheroid. The method you select depends on your speed and accuracy requirements. More detailed
information can be found here. The example below
determines the closest airport to Newbridge, Co. Kildare.
-- ST_Distance - Returns the minimum 2D Cartesian (planar) distance between two geometries-- -- ST_Distance_Sphere — Returns minimum distance in meters between two lon/lat geometries. -- Uses a spherical earth and radius of 6370986 meters. Faster than ST_Distance_Spheroid, -- but less accurate.---- ST_Distance_Spheroid — Returns the minimum distance between two lon/lat geometries given a particular spheroid. SELECTname,ROUND((ST_Distance(wkb_geometry::geography,ST_SetSRID(ST_MakePoint(-6.7982137,53.180124),4326)::geography)/1000.0)::NUMERIC,2)ASnearest_airport_km,ROUND((ST_Distance_Sphere(wkb_geometry,ST_SetSRID(ST_MakePoint(-6.7982137,53.180124),4326))/1000.0)::NUMERIC,2)ASnearest_airport_km_v2,ROUND((ST_Distance_Spheroid(wkb_geometry,ST_SetSRID(ST_MakePoint(-6.7982137,53.180124),4326),'SPHEROID["WGS 84",6378137,298.257223563]')/1000.0)::NUMERIC,2)ASnearest_airport_km_v3FROMairportsORDERBYST_Distance(wkb_geometry::geography,ST_SetSRID(ST_MakePoint(-6.7982137,53.180124),4326)::geography)ASC;
dataset is a polygon set describing where the urban centres are located in the Republic of Ireland. I downloaded
Settlements_OSI_Generalised_100M.geojson file from the data.gov.ie site.
The following functions are useful to describe the geometries stored in your database and how you’ve various options
to export the data.
-- ST_Area(geometry) returns the area of the polygons-- ST_NRings(geometry) returns the number of rings (usually 1, more of there are holes)-- ST_ExteriorRing(geometry) returns the outer ring as a linestring-- ST_InteriorRingN(geometry,n) returns a specified interior ring as a linestring-- ST_Perimeter(geometry) returns the length of all the rings-- ST_NumGeometries(geometry) returns the number of parts in the collection-- ST_GeometryN(geometry,n) returns the specified part-- ST_Area(geometry) returns the total area of all polygonal parts-- ST_Length(geometry) returns the total length of all linear parts-- ST_AsEWKT(geometry) returns Well-known text (WKT)-- ST_AsEWKB(geometry) returns Well-known binary (WKB)-- ST_AsGML(geometry) returns Geographic Mark-up Language (GML)-- ST_AsKML(geometry) returns Keyhole Mark-up Language (KML)-- ST_AsGeoJSON(geometry) returns GeoJSON-- ST_AsSVG(geometry) returns Scalable Vector Graphics (SVG)SELECTsettlement,settl_name,ST_GeometryType(wkb_geometry),ST_Area(wkb_geometry),ST_NRings(wkb_geometry),ST_ExteriorRing(wkb_geometry),ST_InteriorRingN(wkb_geometry,1),ST_Perimeter(wkb_geometry),ST_NumGeometries(wkb_geometry),ST_GeometryN(wkb_geometry,1),ST_Area(wkb_geometry),ST_Length(wkb_geometry),ST_AsText(wkb_geometry),ST_AsEWKT(wkb_geometry),ST_AsEWKB(wkb_geometry),ST_AsGML(wkb_geometry),ST_AsKML(wkb_geometry),ST_AsGeoJSON(wkb_geometry),ST_AsSVG(wkb_geometry)FROMsettlementsWHEREsettl_nameIN('Gorey','Galway city and suburbs');
Some additional spatial relationships
functions include ST_Intersects, ST_Disjoint, and ST_Crosses. These test whether the interiors of the geometries
-- ST_Intersects(geometry A, geometry B) returns t (TRUE) if the two shapes have any space in common, i.e., -- if their boundaries or interiors intersect.SELECTsettl_nameFROMsettlementsWHEREST_Intersects(wkb_geometry,ST_SetSRID(ST_MakePoint(-6.7982137,53.180124),4326));-- Returns Newbridge settlement-- The opposite of ST_Intersects is ST_Disjoint(geometry A , geometry B). If two geometries are disjoint, they do not -- intersect, and vice-versa. In fact, it is often more efficient to test “not intersects” than to test “disjoint” -- because the intersects tests can be spatially indexed, while the disjoint test cannot.-- For multipoint/polygon, multipoint/linestring, linestring/linestring, linestring/polygon, and -- linestring/multipolygon comparisons, ST_Crosses(geometry A, geometry B) returns t (TRUE) if the intersection -- results in a geometry whose dimension is one less than the maximum dimension of the two source geometries and -- the intersection set is interior to both source geometries.SELECTcount(settl_name)FROMsettlementsWHEREST_Disjoint(wkb_geometry,ST_SetSRID(ST_MakePoint(-6.7982137,53.180124),4326));-- Returns 872, all settlements except Newbridge
ST_Contains and ST_Within are functions for finding which polygons a point falls within. Notice the order
of the parameters in both functions, they are switched. Both functions return Newbridge as the result.
-- ST_Within(geometry A , geometry B) returns TRUE if the first geometry is completely within the second geometry. -- ST_Within tests for the exact opposite result of ST_Contains.-- ST_Contains(geometry A, geometry B) returns TRUE if the second geometry is completely contained by the first geometry.SELECTsettl_nameFROMsettlementsWHEREST_Contains(wkb_geometry,ST_SetSRID(ST_MakePoint(-6.7982137,53.180124),4326));SELECTsettl_nameFROMsettlementsWHEREST_Within(ST_SetSRID(ST_MakePoint(-6.7982137,53.180124),4326),wkb_geometry);
This is another polygon dataset of the traditional 32 county boundaries.
I added the following properties stroke, stroke-opacity, stroke-width, fill, and fill-opacity. These
parameters are used on https://gist.github.com to render the geojson, the gist can be
# added properties to each record# "stroke": "#000000", "stroke-opacity": 0.5, "stroke-width": 1, "fill": "#FF0000", "fill-opacity": 0.5# store geojson as a gist on GitHub - https://gist.github.com/martinbpeters/34e258dadca967393291b7a128857350cat counties.geojson | gist create "Ireland Counties GeoJson"--filename counties.geojson --public
ogr2ogr -f"PostgreSQL" PG:"$DB_DETAILS""counties.geojson"-nln public.counties
To determine the settlements in county Dublin a simple
spatial join will answer the question using the
ST_Contains function. It returns the areas: Naul, Ballyboghil, Donabate, Kinsealy-Drinan, Glencullen, Saggart,
Rathcoole, Newcastle, Oldtown, Rivermeade, Garristown, Balrothery, Lusk, Brittas, and Kinsaley
Small Areas are divisions of the Republic of Ireland used for analysis of census data. I downloaded the
Small_Areas_OSI_Generalised_100M.geojson from the data.gov.ie site. Each Small Area has between 60 and 90
households. The normalised size lends itself to more robust analysis of the data.
cat Small_Areas_OSI_Generalised_100M.geojson | gist create "Ireland Small Areas Generalised at 100M"--filename Small_Areas_OSI_Generalised_100M.geojson --public
ogr2ogr -f"PostgreSQL" PG:"$DB_DETAILS""Small_Areas_OSI_Generalised_100M.geojson"-nln public.small_areas
This is a point dataset of the Luas Red and Green lines (stops and depots) in Dublin City. The data from data.tii.ie
was extended to include all stops and depots plus the following properties: Line (Red or Green), Zone (Green 1,
Red 4, Central, etc.), Start (Start date), Stage (Planning label), Open (Yes/No, two stations remain closed on the
Green Line: Racecourse and Brennanstown), PnR (Park and Ride available Yes/No), DublinBikes (Dublin Bikes available
Yes/No), marker-size, marker-color, marker-symbol. The latter three properties are used within visualisation
# some manual edits were required, creating the luas-stops-2020.geojson file
ogr2ogr -f"PostgreSQL" PG:"$DB_DETAILS""luas-stops-2020.geojson"-nln public.luas_stops
An interesting dataset from data.gov.ie where the population of each county is supplied for every census since 1841.
This data was used in my last blog post to describe the
movement of people towards Dublin since the 1840s. This data is provided in px format, a common format for
statisticians. I converted it to csv to load it into Postgres using pxaxis and pandas.
This is a dataset published after the 2016 census. The data is at a granularity of small area, in other words each row
or record represents one small area which has many metrics e.g. Total - Males, people with No religion,
French speakers, etc. I will use this dataset to analyse the number of people living close to the Luas in Dublin in
a future post.
To load the data into the database I used a utility called pgfutter. This is a really useful command line application
to load csv data into Postgres. It creates the DDL during the import, especially handy for csv files with hundreds of
columns. Only downside is that the metric data is imported with type text.
-- There's over 800 columns in the dataset so just selecting a tiny subset. SELECTguid,geogid,geogdesc,t1_1age0m,t1_2t,t15_3_tFROMsa_2016WHEREguid='4c07d11e-11d3-851d-e053-ca3ca8c0ca7f';
Hopefully this post has introduced some spatial functions available within Postgis and how you can use the data loaded
to answer some spatial questions. In future posts I’ll use these and more datasets to explore trends and ways of
visualising spatial data.