It’s been a few weeks since my last post. Previously I focused on the
and the walking distance from the Luas.
Here I’ll analyse a property transactions dataset for County Wicklow in
Ireland. I’ll introduce Docker and how it can be used locally to store geo
data, analyse it with Python Pandas/Matplotlib
using spatial functions, and visualise with QGIS.
My wife and I owned a property in County Wicklow, Ireland from 2005 to 2019.
In 2014, we became accidental landlords when be relocated to County Wexford.
We’d followed the property market closely before selling in late 2019. We had
purchased the property in 2005 close to the peak of the Celtic Tiger.
Since 2010, each property sale in the Republic of Ireland has been published
on the Property Price Register. This
dataset includes address and price information. In Wicklow at the time of
writing had 12,698 residential transactions listed. I downloaded all records
and to view these on a map or carry out spatial analysis I needed to geocode
them. Geocoding is a process to convert addresses to locations (lat/lon).
There are a number of freemium and commercial tools available for this
and HERE to name just a few. I’ve used Google’s
geocoder as there’s a $200 free monthly credit which allowed me to geocode the
12k records free of charge.
Docker allows you to build, run and share applications
with containers. Installation details can be found on the
Docker Documentation page. An example image available
on Docker hub is a Postgres/Postgis Image.
Using this you can have a database running within a container on your local machine.
After installing Docker. Create a folder (e.g. Wicklow_PPR) with a Dockerfile that
has the following content:
version:'3.2'services:db:image:kartoza/postgisenvironment:-POSTGRES_USER=admin-POSTGRES_PASS=superSecretPassword!-POSTGRES_DBNAME=db-POSTGRES_MULTIPLE_EXTENSIONS=plpgsql,postgis,hstore,postgis_topology,pgrouting,cube-ALLOW_IP_RANGE=0.0.0.0/0ports:-25432:5432command:sh -c "echo \"host all all 0.0.0.0/0 md5\" >> /etc/postgresql/10/main/pg_hba.conf && /start-postgis.sh"
Then is as easy as running a docker-compose command which will create a
container with a postgis-enabled postgres instance. Once spun up you can
connect to it using your preferred SQL client or using the cli:
docker-compose up -d --build db
#Access the database using the cli
psql --host localhost --port 25432 --username admin --dbname ddb
Python with Pandas & Matplotlib
Pandas and Matplotlib
are two popular Python packages for carrying out data analysis and visualising
results. You can import various file types into Pandas including csv, xls,
json etc. creating a so called dataframe. You can manipulate the dataframe
like you would a sheet in Excel or table in SQL - selecting, filtering,
joining, pivoting, filling missing data, aggregating etc.
QGIS is an open source Geographic Information System
available available to download
and install on all major operating systems. There’s extensive
documentation on using it and on
QGIS can be used amongst other things to display points and polygons on top of base maps.
The base map I’ve used here comes from OpenStreetmap.
Details of using OSM Tile Servers can
be found here.
The standard osm basemap includes roads and rail networks, and water features.
Dataset & Analysis
I ended up with two tables in the Postgis-enabled Postgres database.
There’s a one-to-one relationship between the the address field in
the PPR table and the input_address field of the geocode table.
Using just the PPR dataset we can see the price trends over time.
I’ve use a Box plot to
illustrate the trend (See code below). The median price (green line) has
steadily improved from 2013 onwards. Unfortunately, I don’t have data from the
Celtic Tiger period (1995-2007) to compare. The quantity of transactions
has also increased as more confidence in the market has returned and as more
accidental landlords are now able to exit more easily. The PPR dataset
also includes dimensions for Vat Exclusive and Not full market value
(reasons not listed). New property transactions are exempt from VAT at
The size of the boxes were adjusted to reflect the number of transactions
in the that year (Sample size in Boxplots).
Not full market value
Taking a closer look at the geocoding results we can use the
‘location_type’ field. The table below shows that over 65% of
geocoding processes resulted in an exact location been returned.
Some notes on Google’s Geocoder, I provided a bounding box:
and a region (“ie”) to ensure Irish results were preferred
(More details can be found here).
I also filtered out any record that did not return with Ireland as the
value of country. More details of the code to geocode addresses
will be the focus of another post.
Here are the transactions plotted on a map coloured by ‘location_type’ field.
The APPROXIMATE results are often in the countryside where addresses are often
The analysis gets more interesting when we combine both the PPR and Geocoding
results together. The price field is a continuous variable ranging from
€11,000 to €1,850,000. To display this on a map it is convenient to convert to
a dimension of price bins. Here we can easily see the most expensive
properties are location around the Bray/Greystones area.
The bin sizes were automatically defined with QGIS using a graduated scheme of 10 evenly sized buckets:
There is some obvious clusters of property transactions within County Wicklow
based around the population centres such as Bray, Greystones, Wicklow Town,
Arklow, Baltinglass, Blessington etc. Using a
k-means algorithm following
an example on stackexchange
I clustered the data into 15 groups (arbitrarily set number) and I gave each
group a label (largest town name). This allowed me to repeat the time series
analysis but one level deeper.
Though only a couple of months into 2020, the median price trend is down from
previous years and similar to 2017 levels. See the Greystones Box plot above.
Note, this analysis does not take into account details of each property such
as its condition, number of bedrooms/bathrooms, square footage, garden size,
etc. Thus the trends in price should be taken with a pinch of salt.
I hope this post illustrated some of the things you can do with free resources
such as Postgres/PostGIS, Pandas & Matplotlib and QGIS.
Code to generate the Box plots
requirements.txt file for python packages.
importpsycopg2importsys,osimportnumpyasnpimportpandasaspdimportpandas.io.sqlaspsqlimportmatplotlib.pyplotaspltfromdotenvimportload_dotenv,find_dotenvimportmatplotlibasmpl# .env file contains variables PGHOST, PGDATABASE, PGUSER, PGPORT and PGPASSWORD
load_dotenv(find_dotenv())PGHOST=os.getenv('PGHOST')PGDATABASE=os.getenv('PGDATABASE')PGUSER=os.getenv('PGUSER')PGPORT=os.getenv('PGPORT')PGPASSWORD=os.getenv('PGPASSWORD')conn_string="host="+PGHOST+" port="+"5432"+" dbname="+PGDATABASE+" user="+PGUSER \
date_part('YEAR', date_of_sale)::character varying as year
address5 = 'Wicklow'
and date_of_sale < '2020-01-01'