
Introduction to R - Data Literacy

Version 1.8 - July 2024

COPYRIGHT © Curtin University 2024

Workshop 1 - Part 5 - Map visualisations

Leaflet and Map Shapefiles

R has several powerful libraries for working with map data. Map data can be imported as, for example, a shapefile and visualised alongside other data using the interactive mapping library Leaflet.

These shapefiles are not pixel-based images. They use vectors to redraw the postcode boundaries in whatever environment they are used. They contain far more detail than is needed to display a map on a web page, so R provides tools that simplify the vectors to suit a web page, making the file much smaller and quicker to render.
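As a rough illustration of what simplification does, the sketch below builds a made-up circular polygon and reduces its vertex count. It uses sf::st_simplify() rather than the ms_simplify() function used to prepare the workshop file, and the tolerance value is an arbitrary choice for the example.

```r
library(sf)

# Made-up shape: a circle approximated by many short line segments
circle <- st_buffer(st_sfc(st_point(c(0, 0))), dist = 1, nQuadSegs = 100)

# Drop vertices that deviate from the outline by less than the tolerance
# (0.05 is an arbitrary value chosen for this illustration)
circle_simple <- st_simplify(circle, dTolerance = 0.05)

# Count the vertices before and after simplification
n_before <- nrow(st_coordinates(circle))
n_after  <- nrow(st_coordinates(circle_simple))
c(before = n_before, after = n_after)
```

The simplified shape still looks like a circle at screen resolution, but needs far fewer points to describe it. The same trade-off applies to the postcode boundaries.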

Once again, the map/shapefile data for Postcodes in Australia is readily available from the ABS website, with a suitable Creative Commons Licence.

To reduce the time and resources needed for this workflow, we have created an R object file named pc_sf_raw.RData, which is already a simplified version of the map file and can be loaded with the code below. Full citation for the source of the modified map/shapefile: Australian Bureau of Statistics (2021) ‘Non ABS Structures: Postal Areas - 2021 [https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/access-and-downloads/digital-boundary-files]’ [Shapefile], Digital boundary files: Australian Statistical Geography Standard (ASGS) Edition 3, accessed 27th February 2024.

load(file = "pc_sf_raw.RData")
head(pc_sf_raw)
## Simple feature collection with 6 features and 10 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 129.3556 ymin: -14.89182 xmax: 136.982 ymax: -10.90649
## Geodetic CRS:  GDA2020
## # A tibble: 6 × 11
##   POA_CODE21 AUS_CODE21 POA_NAME21 AUS_NAME21 AREASQKM21 LOCI_URI21   SHAPE_Leng
##   <chr>      <chr>      <chr>      <chr>           <dbl> <chr>             <dbl>
## 1 0800       AUS        0800       Australia        3.17 http://link…     0.0819
## 2 0810       AUS        0810       Australia       24.4  http://link…     0.242 
## 3 0812       AUS        0812       Australia       35.9  http://link…     0.279 
## 4 0820       AUS        0820       Australia       39.1  http://link…     0.409 
## 5 0822       AUS        0822       Australia   150776.   http://link…    90.6   
## 6 0828       AUS        0828       Australia       28.7  http://link…     0.246 
## # ℹ 4 more variables: SHAPE_Area <dbl>, geometry <GEOMETRY [°]>, long <dbl>,
## #   lat <dbl>

The file was created using the code below. There is no need to execute this code in this workflow, but if you do run it, how long does it take to execute on your machine?

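# Packages: sf (read_sf, st_*), rmapshaper (ms_simplify), dplyr (%>%)
library(sf)
library(rmapshaper)
library(dplyr)
# Download and unzip the ABS Postal Areas shapefile
pc_sf_url <- 'https://www.abs.gov.au/statistics/standards/australian-statistical-geography-standard-asgs-edition-3/jul2021-jun2026/access-and-downloads/digital-boundary-files/POA_2021_AUST_GDA2020_SHP.zip'
download.file(pc_sf_url, 'POA_2021_AUST_GDA2020_SHP.zip', mode = 'wb')
unzip("POA_2021_AUST_GDA2020_SHP.zip")
# Read the shapefile and simplify the boundaries
pc_sf_raw <- sf::read_sf("POA_2021_AUST_GDA2020.shp") %>% 
  ms_simplify()
# Add the centroid of each postcode as long/lat columns (computed once)
pc_centroids <- st_coordinates(st_centroid(pc_sf_raw$geometry))
pc_sf_raw$long <- pc_centroids[, "X"]
pc_sf_raw$lat <- pc_centroids[, "Y"]
save(pc_sf_raw, file = "pc_sf_raw.RData")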
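One way to answer the timing question is to wrap the steps in system.time(). The sketch below times a centroid calculation on the small North Carolina example dataset that ships with the sf package, as a stand-in for the much larger postcode file. The object names are made up for the example.

```r
library(sf)

# Small example dataset bundled with sf (stand-in for the postcode shapefile)
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# system.time() runs the expression and reports how long it took
timing <- system.time({
  nc_centroids <- st_centroid(st_geometry(nc))
  nc_coords <- st_coordinates(nc_centroids)
})

# "elapsed" is the wall-clock time in seconds
timing["elapsed"]
```

Wrapping the full download, read and simplify steps in the same way would give a time that depends heavily on your network speed and machine.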

Leaflet Visualisation 1

Here we create a demonstration visualisation: a map of Australia with each postcode coloured according to its SEIFA Index of Education and Occupation (IEO) percentile. Hovering over a postcode displays a label summarising, in abbreviated form, the combined tax/SEIFA data from earlier.

Though explaining the code for the visualisation is beyond the scope of this workflow, a powerful visualisation has been created with relatively little code. We can also see the effects of not ‘cleaning’ the data earlier: some areas are missing data, either because tax data was summarised into ‘Other’ categories or because the older SEIFA data was missing for new Postcodes. The join commands from earlier couldn’t find a match between the two datasets for these Postcodes and thus there is no corresponding data.
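For readers curious about how values become colours, the sketch below shows leaflet's colorQuantile() on made-up percentile values. It returns a function that maps each value to the colour of the quantile bin it falls into. The input vector is invented for the example.

```r
library(leaflet)

# Made-up percentile values standing in for the IEO percentile column
ieo_example <- 1:100

# Nine quantile bins on the yellow-orange-red palette, reversed so that
# low percentiles get the darkest colours
example_palette <- colorQuantile("YlOrRd", domain = ieo_example, n = 9, reverse = TRUE)

# The palette is a function: pass values in, get hex colour strings back
example_colours <- example_palette(c(5, 50, 95))
example_colours

# Missing values fall back to the na.color argument (grey by default)
example_palette(NA)
```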

# Recreate the combined tax/SEIFA dataframe from the earlier workflow
tax_seifa <- tax2020_raw %>% 
  filter( State !="Unknown" & State!="Overseas" ) %>% 
  mutate(TaxableIncome_dollarspr = TaxableIncome_dollars/Returns) %>% 
  mutate(PrivateHealth_percentpp = round(PrivateHealth_returns/Returns*100,0)) %>% 
  inner_join( x= ., y = seifa2016_raw, by = "Postcode")

# Data cleaning - add leading zero to three digit postcodes from tax data
tax_seifa$Postcode <- sprintf("%04d",as.numeric(tax_seifa$Postcode))

# Combine map shapefile and tax data into a new R object
pc_sf <- pc_sf_raw %>% 
  inner_join(x=.,y=tax_seifa,by = c('POA_CODE21'='Postcode'))

# Add a label to data which combines all of the tax data into a single abbreviated field
pc_sf$data_label <- paste0("PCode:",pc_sf$POA_CODE21," Income:$",round(pc_sf$TaxableIncome_dollarspr/1000,0),"K PrivHlth:",pc_sf$PrivateHealth_percentpp,"% IEO:",pc_sf$ieo_percentile)

# Create a colour palette based on the Index of Education and Occupation (IEO) percentile
pc_v1_palette <- colorQuantile("YlOrRd", pc_sf$ieo_percentile, n = 9, reverse = TRUE)

# Leaflet Visualisation 1 
pc_v1 <- leaflet(pc_sf) %>% 
  addPolygons(
    color = "black", weight = 0.5, smoothFactor = 0.2, fillOpacity = 0.5,
    fillColor = ~pc_v1_palette(ieo_percentile),
    label = ~data_label,
    highlightOptions = highlightOptions(color = "white", weight = 1, bringToFront = TRUE)
  ) %>% 
  addProviderTiles(providers$CartoDB.Voyager) %>% 
  addLegend(
    pal = pc_v1_palette, values = ~ieo_percentile,
    title = "SEIFA<br> Index of<br>Education/<br>Occupation<br>percentile",
    position = "bottomleft"
  )

pc_v1
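The gaps in the map come from postcodes that have no partner row in the tax/SEIFA data. A minimal sketch of how such gaps could be listed, using made-up data frames in place of pc_sf_raw and tax_seifa, is shown below. It also repeats the leading-zero fix, since unpadded postcodes would otherwise fail to match.

```r
library(dplyr)

# Made-up tax rows: postcodes read as numbers lose their leading zero
tax_example <- tibble(Postcode = c(800, 810, 6000), Returns = c(120, 340, 560))

# Pad to four characters so that 800 becomes "0800"
tax_example$Postcode <- sprintf("%04d", as.integer(tax_example$Postcode))

# Made-up map rows: one postcode ("6001") has no tax data
map_example <- tibble(POA_CODE21 = c("0800", "0810", "6000", "6001"))

# anti_join() keeps the map rows that have NO match in the tax data,
# which are the rows an inner_join() would silently drop
unmatched <- anti_join(map_example, tax_example, by = c("POA_CODE21" = "Postcode"))
unmatched
```

Running the same anti_join() on the real objects would list the postcodes that appear blank on the map.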
