In the previous visualisations the user can’t change the core information which is shown; in effect, the visualisations are ‘static’.
The most useful visualisations don’t limit the user to a static analysis. They incorporate interactivity, where the user can change the visualisation by varying relevant parameters.
The user is not dictated to; instead they are guided and can choose their own analysis journey. Incorporating intuitive interactive elements can reveal even more insights.
There is one key issue which always needs to be addressed with data analysis and interactive visualisations: how do we share them with those who will use them?
Essentially, the humble web browser is the one application we can expect to find on almost any device capable of displaying the visualisations.
Assuming we have a method of sharing a webpage (e.g. GitHub Pages or a static website instance), let’s explore some visualisations which function completely within a web browser without the need for a separate data server.
Bokeh is a powerful package for creating static and interactive visualisations. Whilst a little more complicated to set up than Matplotlib from earlier, Bokeh produces interactive visualisations which work in a web browser alone and are light on resource use. Interactivity is achieved using JavaScript callbacks: JavaScript ‘aware’ Bokeh objects and small amounts of JavaScript are included to update a visualisation when an interactive element, such as a slider or a button, is changed.
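As a minimal sketch of the callback pattern described above, and separate from the tax and SEIFA data, a slider can drive a small piece of JavaScript which rewrites a ColumnDataSource in the browser. The toy data, the `power` slider and the variable names are assumptions made purely for illustration.

```python
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure

# Hypothetical toy data: the browser recomputes y = x ** power
xs = [i / 10 for i in range(11)]
source = ColumnDataSource(data=dict(x=xs, y=list(xs)))

plot = figure(width=400, height=300, title="CustomJS sketch")
plot.line("x", "y", source=source)

slider = Slider(start=1, end=5, value=1, step=1, title="power")

# The JavaScript body runs in the browser each time the slider value changes.
# cb_obj is the model which triggered the callback (here, the slider).
callback = CustomJS(args=dict(source=source), code="""
    const power = cb_obj.value;
    const x = source.data.x;
    const y = x.map((v) => Math.pow(v, power));
    source.data = { x, y };
""")
slider.js_on_change("value", callback)

demo = column(slider, plot)
# In a notebook: from bokeh.io import show; show(demo)
```

No Python runs when the slider moves; everything the callback needs is passed in through `args` and shipped to the browser with the plot.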
Let’s create a visualisation which explores correlations between the two data sources from earlier. We will plot SEIFA percentile/rank versus Private Health participation for each Postcode, for a user-chosen range of Taxable Income.
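This is achieved by giving each State its own data source and filter, then using a JavaScript callback on a range slider to update those filters.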
Can you draw any interesting conclusions from the visualisations?
import numpy as np  # needed for np.interp below
from bokeh.plotting import figure, show
from bokeh.layouts import layout
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, CDSView, BooleanFilter, RangeSlider, CustomJS, Range1d
from bokeh.transform import factor_cmap
from bokeh.palettes import Paired
output_notebook()
# Create a dataframe from the earlier tax data combined workflow
# Map the number of Returns for each Postcode into the range 0.5 to 4, to limit the range of circle sizes in the plot
tax_seifa = (tax2022_raw.query('~State.isin(["Unknown","Overseas"])')
    .assign(TaxableIncome_dollarspr = lambda x: round(x.TaxableIncome_dollars/x.Returns/1000,0))
    .assign(PrivateHealth_percentpp = lambda x: round(x.PrivateHealth_returns/x.Returns*100,0))
    # use x (the frame being built) for min/max, since tax_seifa does not exist yet at this point
    .assign(Returns_scaled = lambda x: np.interp(x.Returns,[x.Returns.min(),x.Returns.max()],[0.5,4]))
    .merge(seifa2021_raw, how="inner", on="Postcode"))
# The legend in Bokeh @ver:3.6.3 behaves incorrectly when using a single circle plot command for all States along with JS Callbacks
# Solution is to use separate circle plot commands maintained in arrays.
sources = []
tifilters = []
states = tax_seifa['State'].unique()
for state in states:
    state_df = tax_seifa[tax_seifa['State'] == state]
    sources.append(ColumnDataSource(state_df))
    # initialise BooleanFilter to all True, one entry per row of this State's source
    tifilters.append(BooleanFilter([True]*len(state_df)))
range_sliderti = RangeSlider(
    title='Mean Taxable Income per return ($K)',
    start=tax_seifa['TaxableIncome_dollarspr'].min(),
    end=tax_seifa['TaxableIncome_dollarspr'].max(),
    step=1,
    value=(tax_seifa['TaxableIncome_dollarspr'].min(),tax_seifa['TaxableIncome_dollarspr'].max())
)
callback = CustomJS(args=dict(tifilters=tifilters, sources=sources), code="""
    // cb_obj is the RangeSlider; its value holds the [lower, upper] bounds
    const start = cb_obj.value[0];
    const end = cb_obj.value[1];
    for (let sourceno = 0; sourceno < sources.length; sourceno++) {
        const income = sources[sourceno].data['TaxableIncome_dollarspr'];
        const bools = [];
        for (let i = 0; i < income.length; i++) {
            bools.push(income[i] >= start && income[i] <= end);
        }
        // replace this State's filter mask, then signal the change while sourceno is still a valid index
        tifilters[sourceno].booleans = bools;
        sources[sourceno].change.emit();
    }
""")
range_sliderti.js_on_change('value', callback)
TOOLTIPS = [
    ("ieo", "@ieo_percentile"),
    ("ph", "@PrivateHealth_percentpp"),
    ("rt", "@Returns"),
    ("ti", "@TaxableIncome_dollarspr"),
    ("pc", "@Postcode")
]
p = figure(title='Demonstration 2 - Bokeh Visualisation with interactive slider',
           x_axis_label='ieo_percentile',
           y_axis_label='PrivateHealth_percentpp',
           tooltips=TOOLTIPS,
           lod_threshold=None,
           match_aspect=False,
           width=1000)
for idx, state in enumerate(states):
    p.circle(x='ieo_percentile',
             y='PrivateHealth_percentpp',
             radius='Returns_scaled',
             fill_color=factor_cmap('State', palette=Paired[8], factors=states),
             fill_alpha=0.5,
             source=sources[idx],
             legend_label=state,
             view=CDSView(filter=tifilters[idx]))
p.legend.location = "top_left"
p.legend.click_policy="hide"
p.x_range = Range1d(-10,110)
p.y_range = Range1d(-10,110)
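# use a new name so the imported layout() function is not overwritten when the cell is re-run
page_layout = layout(
    [
        [range_sliderti],
        [p],
    ],
)
show(page_layout)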
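Returning to the question of sharing: because the callback runs entirely in the browser, a Bokeh object can be rendered to a single HTML document and hosted as a static page. The sketch below is one possible approach using `file_html` from `bokeh.embed` with the CDN resources; the toy figure and the file name in the closing comment are placeholders, not part of the workflow above.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Placeholder figure; in practice this would be the slider-and-plot layout shown earlier
fig = figure(width=300, height=200, title="Standalone export sketch")
fig.scatter([1, 2, 3], [4, 5, 6], size=10)

# Render a complete HTML document as a string.
# With CDN resources the page loads BokehJS from the Bokeh CDN when opened.
html = file_html(fig, CDN, "Standalone export sketch")

# To publish, write the string to a file and upload it to the static host, e.g.
# pathlib.Path("bokeh_demo.html").write_text(html, encoding="utf-8")
```

The resulting file needs no Python or data server at viewing time, although a page built with CDN resources does need internet access to fetch BokehJS.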