Version 1.8 - July 2024
COPYRIGHT © Curtin University 2024
The ABS produces The Socio-Economic Indexes for Areas (SEIFA) based on the Census of Population and Housing. These indexes provide a measure of relative socio-economic advantage and disadvantage across different areas of Australia. It includes this dataset, which includes from Table 5 an Index of Education and Occupation, which reflects the education and occupational level per postcode. The dataset is also published under a suitable Creative Commons Licence.
Looking at the spreadsheet, the data we are interested in are the Postcode and Percentile/Ranking within Australia columns of the Table 5 sheet. The data starts on row 7. We can import this data direct using the readxl library. With this library we import all columns then discard all but 1 and 7 (Postcode and Percentile/Rank).
The import warns about some cell values however these columns are discarded so not an issue for us.
seifa2016_url <- 'https://www.ausstats.abs.gov.au/ausstats/subscriber.nsf/0/DC124D1DAC3D9FDDCA25825D000F9267/$File/2033055001%20-%20poa%20indexes.xls'
download.file(seifa2016_url, 'poa_indexes.xls', mode = 'wb')
seifa2016_raw <- read_excel('poa_indexes.xls', sheet='Table 5', range=cell_rows(7:2636), col_names=FALSE, na = ' ', .name_repair = 'minimal')[,c(1,7)]
names(seifa2016_raw) <- c('Postcode','ieo_percentile')
seifa2016_raw$Postcode <- as.numeric(seifa2016_raw$Postcode)
head(seifa2016_raw)
## # A tibble: 6 × 2
## Postcode ieo_percentile
## <dbl> <dbl>
## 1 2000 94
## 2 2007 93
## 3 2008 99
## 4 2009 96
## 5 2010 97
## 6 2011 98
Combining datasets offers great opportunities for data analysis and insights.
Our datasets both have postcode columns, so can be combined to compare education levels with income and private health status data.
R and the Tidyverse offer simple merge functions which can combine two datasets, based on common column(s) in both datasets.
We will use the inner_join() function. We only have one column in common; if there are more than one columns in common, they would simply need adding to the by parameter as a vector using the combine c() function.
Finally a note about cleaning - there may have been changes in the postcode allocations between the SEIFA data of 2016 and the Taxation data from 2019/20, we will leave this discrepancy for now and see the impact later in this workflow. (Hint: there are at least 10 changes!)
# Change variable type for Postcode column as excel import chooses it to be 'character' type
tax2020_raw$Postcode <- as.numeric(tax2020_raw$Postcode)
tax2020_raw %>%
filter( State !="Unknown" & State!="Overseas" ) %>%
mutate(TaxableIncome_dollarspr = TaxableIncome_dollars/Returns) %>%
mutate(PrivateHealth_percentpp = round(PrivateHealth_returns/Returns*100,0)) %>%
inner_join( x= ., y = seifa2016_raw, by = "Postcode") %>%
select(State, Postcode, ieo_percentile, TaxableIncome_dollarspr, PrivateHealth_percentpp ) %>%
filter(Postcode == '6102')
## # A tibble: 1 × 5
## State Postcode ieo_percentile TaxableIncome_dollarspr PrivateHealth_percentpp
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 WA 6102 65 48324. 48
For those with a background in SQL, R interfaces with many different databases and queries can be executed as if using the native SQL interface.
In the example below, the library RSQLite is used to
tax_seifa_db <- dbConnect(RSQLite::SQLite(), "")
head(tax2020_raw)
## # A tibble: 6 × 5
## State Postcode Returns TaxableIncome_dollars PrivateHealth_returns
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 ACT 2600 5945 710218557 4822
## 2 ACT 2601 3159 214621509 1670
## 3 ACT 2602 22009 1747368144 14747
## 4 ACT 2603 7165 906618402 5617
## 5 ACT 2604 8617 821838536 6118
## 6 ACT 2605 7973 748963372 6126
head(seifa2016_raw)
## # A tibble: 6 × 2
## Postcode ieo_percentile
## <dbl> <dbl>
## 1 2000 94
## 2 2007 93
## 3 2008 99
## 4 2009 96
## 5 2010 97
## 6 2011 98
dbWriteTable(tax_seifa_db,"seifa2016_tbl",seifa2016_raw)
dbWriteTable(tax_seifa_db,"tax2020_tbl",tax2020_raw)
dbListTables(tax_seifa_db)
## [1] "seifa2016_tbl" "tax2020_tbl"
dbGetQuery(tax_seifa_db, 'SELECT seifa2016_tbl.ieo_percentile, tax2020_tbl.* FROM seifa2016_tbl JOIN tax2020_tbl ON seifa2016_tbl."Postcode" = tax2020_tbl."Postcode" WHERE seifa2016_tbl."Postcode" = 6102')
## ieo_percentile State Postcode Returns TaxableIncome_dollars
## 1 65 WA 6102 7899 381711165
## PrivateHealth_returns
## 1 3814
dbDisconnect(tax_seifa_db)
Part 3 - Data analysis and visualisation
Version 1.8 - July 2024