GeoPandas and Map Shapefiles

Python and Pandas have some powerful packages to make use of map data. Map data can be imported for example as a shapefile and visualised along with data using GeoPandas.

These shapefiles are not images based on pixels, but use vectors to effectively redraw the postcode boundaries in whatever environment it is used. They provide much more definition than is needed to display on a screen on a web page, so there are tools in Python, namely topojson, which resample the vectors to suit a webpage, greatly simplifying the file and making it smaller and quicker to render.

Once again, the map/shapefile data for Postcodes in Australia is readily available from the ABS website, with a suitable Creative Commons Licence.

To reduce time and resources needed for this workflow, we have created a preprocessed shapefile named pc_sf_raw.shz, which is already a simplified version of the map file and can be loaded with the code below.

pc_sf_raw = geopandas.read_file('pc_sf_raw.shz')
pc_sf_raw.head(5)
POA_CODE21 POA_NAME21 AUS_CODE21 AUS_NAME21 AREASQKM21 LOCI_URI21 SHAPE_Leng SHAPE_Area geometry
0 0800 0800 AUS Australia 3.1731 http://linked.data.gov.au/dataset/asgsed3/POA/... 0.081893 0.000264 POLYGON ((130.85017 -12.45301, 130.85192 -12.4...
1 0810 0810 AUS Australia 24.4283 http://linked.data.gov.au/dataset/asgsed3/POA/... 0.241859 0.002031 POLYGON ((130.87154 -12.39532, 130.84584 -12.4...
2 0812 0812 AUS Australia 35.8899 http://linked.data.gov.au/dataset/asgsed3/POA/... 0.278788 0.002983 POLYGON ((130.90841 -12.40479, 130.87154 -12.3...
3 0820 0820 AUS Australia 39.0642 http://linked.data.gov.au/dataset/asgsed3/POA/... 0.409134 0.003248 POLYGON ((130.85017 -12.45301, 130.83331 -12.4...
4 0822 0822 AUS Australia 150775.8030 http://linked.data.gov.au/dataset/asgsed3/POA/... 90.601831 12.564238 MULTIPOLYGON (((136.75021 -12.23023, 136.7697 ...

See also the list of points to draw the postcode boundary/polygon for Bentley, Perth (6102).

pc_sf_raw[pc_sf_raw['POA_CODE21'] == '6102']['geometry'].get_coordinates()
x y
2216 115.899067 -32.013453
2216 115.886680 -32.012399
2216 115.886629 -31.998110
2216 115.888878 -31.991374
2216 115.902449 -32.003307
2216 115.915000 -31.993887
2216 115.932771 -32.002420
2216 115.921517 -32.012744
2216 115.899067 -32.013453

Full citation for source of modified map/shapefile:

Australian Bureau of Statistics (2021) 'Non ABS Structures: Postal Areas - 2021 [https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/access-and-downloads/digital-boundary-files]' [Shapefile], Digital boundary files: Australian Statistical Geography Standard (ASGS) Edition 3, accessed 27th February 2024.

How was this file created?

The file was created using the code below, there is no need to execute the code in this workflow, it can take a while and depending on the prequantize/epsilon values chosen (which here determine the level of simplification for each postcode’s boundaries), can cause ‘out of memory’ type errors.

import topojson as tp
pc_sf_url = 'https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/access-and-downloads/digital-boundary-files/POA_2021_AUST_GDA2020_SHP.zip'
pc_sf_in = geopandas.read_file(pc_sf_url).to_crs(epsg=4283)
pc_sf_topo = tp.Topology(pc_sf_in.dropna(), prequantize=False, topology=True)
pc_sf_raw = pc_sf_topo.toposimplify(epsilon=0.006).to_gdf()
pc_sf_raw.to_file('pc_sf_raw.shz')

Demonstration 1 - GeoPandas Visualisation

Here we create a visualisation demonstration including a map of Australia with postcodes shown in colours reflecting the Index of Education and Occupation, and hovering over each postcode will display a label detailing the combined tax/seifa data from earlier in abbreviated form.

Though explaining the code for the visualisation is beyond the scope of this workflow, a powerful visualisation has been created with relatively little code. We can also see the effects of not ‘cleaning’ the data earlier. We can see areas missing data, where tax data was summarised into ‘Other’ categories, or there was a change in postcodes between the two data sources. The merge commands from earlier couldn’t find a match between the two datasets for these Postcodes and thus there is no corresponding data.

# Create a dataframe from earlier # Tax data combined workflow
tax_seifa = (tax2022_raw.query('~State.isin(["Unknown","Overseas"])')
            .assign(TaxableIncome_dollarspr = lambda x: round(x.TaxableIncome_dollars/x.Returns/1000,0))
            .assign(PrivateHealth_percentpp = lambda x: round(x.PrivateHealth_returns/x.Returns*100,0))
            .merge(seifa2021_raw, how="inner", on="Postcode"))

# Combine map shapefile and tax data
pc_sf = pc_sf_raw.merge(tax_seifa, left_on='POA_CODE21', right_on='Postcode', how="inner")

# Add a label to data which combines all of the tax data into a single abbreviated field
pc_sf["Details"] = "PCode:" + pc_sf["POA_CODE21"] + " Income:$" + pc_sf["TaxableIncome_dollarspr"].astype('str') + "K PrivHlth:" + pc_sf["PrivateHealth_percentpp"].astype('str') + "% IEO:" + pc_sf["ieo_percentile"].astype('str')

# Demo Visualisation 1
pc_sf.explore("ieo_percentile",
              cmap = "YlOrRd_r",
              scheme='quantiles',
              tiles = "cartodb voyager",
              tooltip = ["Details"],
              style_kwds = dict(color='black',weight=0.25),
              highlight_kwds = dict(color="black",weight=0.75,fillOpacity=0.75),
              legend_kwds = dict(caption="SEIFA Index of Education/Occupation percentile"))
Make this Notebook Trusted to load map: File -> Trust Notebook