So Python is…?

Briefly, Python is

  • a programming language.
  • a very useful tool for working with data, thanks to its many open-source extension packages.
  • named after the Monty Python’s Flying Circus comedy series.

Python is used by executing statements written in the Python programming language (code), commonly in a Jupyter Notebook.

Whilst Python is an ‘interpreted language’, many of the extension packages are ‘compiled’, which makes them seriously fast and powerful for processing data.
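
To make the interpreted/compiled difference concrete, here is a minimal sketch that computes the same sum of squares twice: once with a plain Python loop and once with NumPy, whose array arithmetic runs in compiled code. The variable names and the size n are arbitrary choices for illustration, and the timings will vary from machine to machine.

```python
import time
import numpy as np

n = 100_000  # arbitrary size chosen for this illustration

# Pure Python: the interpreter handles every element one at a time.
start = time.perf_counter()
loop_total = sum(i * i for i in range(n))
loop_seconds = time.perf_counter() - start

# NumPy: the same arithmetic runs inside compiled code.
values = np.arange(n, dtype=np.int64)
start = time.perf_counter()
numpy_total = int((values * values).sum())
numpy_seconds = time.perf_counter() - start

# Both approaches should agree on the answer.
print(loop_total == numpy_total)
print(f"loop: {loop_seconds:.4f} s, numpy: {numpy_seconds:.4f} s")
```

The two totals match; on most machines the NumPy version is noticeably quicker, though the gap depends on the size of the data.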

Preflight

The next steps in this workflow need the following libraries/packages to extend the capabilities of Python. The code can be copied and pasted! Python may return a number of messages related to loading these packages. These can be helpful when developing workflows and can be suppressed when no longer required.

from math import sqrt, sin, pi
import numpy as np
from statistics import mean
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
import sqlite3
import geopandas
import geojson
import json
plt.rcParams["axes.formatter.limits"] = (-99, 99) # avoid scientific notation on matplotlib plots
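
Many of the messages shown while loading packages are Python warnings. Below is a minimal sketch of suppressing them with the standard-library warnings module, assuming the messages are issued through that module (text that a package prints directly will not be affected). The function noisy_step is hypothetical and only stands in for a package that warns while loading.

```python
import warnings

def noisy_step():
    """Hypothetical stand-in for a package that warns while loading."""
    warnings.warn("this feature is deprecated", DeprecationWarning)
    return "loaded"

# Record warnings so we can count what would normally be shown.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result_shown = noisy_step()
shown_count = len(caught)

# Same call, but with warnings ignored inside this block only.
with warnings.catch_warnings(record=True) as caught_quiet:
    warnings.simplefilter("ignore")
    result_quiet = noisy_step()
quiet_count = len(caught_quiet)

print(shown_count, quiet_count)
```

In a notebook, placing warnings.filterwarnings("ignore") in the first cell has a similar effect for the whole session. It is usually best to add it only once the workflow is working.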

Let’s begin

Simple mathematics is possible with Python in a Jupyter Notebook, much like a calculator.

Let’s start with the following code. Execute the code by using the keyboard shortcut Ctrl+Enter (Command+Enter on a Mac).

The output response from Python will appear below the executed code.

1 + 1 
2

Hopefully the answer 2 appeared below the code. The code can be typed over and re-executed. (Try changing the code above, perhaps to 2 + 3, and re-execute to observe the changed output.)

All sorts of maths is possible, including a variety of functions similar to those available on a calculator.

Try the following

( 5 * 6 ) + 12
42

Try also

6**2 + 6
42

And finally, also try

sqrt(49) * mean(range(1,12)) * sin(pi/2)
42.0

Note: trigonometric functions such as sin() take angles in radians. range(1,12) returns a number series starting at 1 and finishing at 11; the end value 12 is not included.
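
The two points in the note can be checked with a short sketch. It uses radians() from the math module, which is not part of the preflight imports, to convert degrees before calling sin(); the variable names are just for illustration.

```python
from math import sin, radians
from statistics import mean

# sin() expects radians, so 90 degrees must be converted first.
angle_deg = 90
sin_of_90 = sin(radians(angle_deg))  # equivalent to sin(pi / 2)

# range(1, 12) includes the start value but stops before the end value.
series = list(range(1, 12))

print(sin_of_90)
print(series)
print(mean(series))
```

Wrapping range() in list() is only needed to see the numbers; functions such as mean() accept the range directly.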

Variables

Variables are at the heart of coding. They are named placeholders for data, and let short code statements manipulate that data as often as required.

To see the value of a variable, simply execute the variable name, or use print().

For example, let’s assign the integer value 42 to the variable integer1 and then display it.

integer1 = 42
integer1
42
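
As a small sketch of why variables are useful, the code below reuses integer1 in a calculation and then reassigns it. The name doubled is made up for this illustration. By default a notebook displays only the last bare expression in a cell, so print() is used here to show several values.

```python
integer1 = 42

# A variable can be used anywhere its value could be.
doubled = integer1 * 2

# print() shows each value explicitly, wherever it appears in the cell.
print(integer1)
print(doubled)

# Assigning again replaces the old value.
integer1 = integer1 + 1
print(integer1)
```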

Try the following one at a time to explore some common variable data types and structures.

  • Strings
string1 = "forty two"
string1
'forty two'
  • Lists
list_string1 = ["apples","oranges","lemons"]
list_string1
['apples', 'oranges', 'lemons']
  • Dictionaries
dict_string1 = {"fruit1":"apples","fruit2":"oranges","fruit3":"lemons"} 
dict_string1
{'fruit1': 'apples', 'fruit2': 'oranges', 'fruit3': 'lemons'}
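
Once these variables exist, their contents can be inspected and picked apart. The sketch below uses only built-in Python; names such as first_fruit and second_fruit are made up for illustration.

```python
string1 = "forty two"
list_string1 = ["apples", "oranges", "lemons"]
dict_string1 = {"fruit1": "apples", "fruit2": "oranges", "fruit3": "lemons"}

# type() reports what kind of object a variable holds.
type_names = [type(v).__name__ for v in (string1, list_string1, dict_string1)]
print(type_names)

# Lists are indexed by position, counting from 0.
first_fruit = list_string1[0]

# Dictionaries are looked up by key.
second_fruit = dict_string1["fruit2"]

# len() works on all three: characters, items, and key/value pairs.
lengths = (len(string1), len(list_string1), len(dict_string1))
print(first_fruit, second_fruit, lengths)
```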

One of the packages we loaded earlier was Pandas, which allows for easy manipulation of the data frame type/structure in Python.

The data frame type/structure is very commonly used in data analysis.

Below is an example of creating a data frame manually, assigning it to a variable and displaying it.

data_frame1 = pd.DataFrame({"fruit":["apples","oranges","lemons"],
                            "quantity":[7,14,21]})
data_frame1
     fruit  quantity
0   apples         7
1  oranges        14
2   lemons        21
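
A few common first steps with a data frame are sketched below: checking its size, selecting a column and filtering rows. The block repeats the creation of data_frame1 so that it runs on its own; the other variable names are illustrative only.

```python
import pandas as pd

data_frame1 = pd.DataFrame({"fruit": ["apples", "oranges", "lemons"],
                            "quantity": [7, 14, 21]})

# shape is (rows, columns).
n_rows, n_cols = data_frame1.shape

# Selecting one column by name gives a pandas Series.
quantities = data_frame1["quantity"]
total_quantity = int(quantities.sum())

# Boolean filtering keeps only the rows where the condition holds.
large = data_frame1[data_frame1["quantity"] > 10]
large_fruit = large["fruit"].tolist()

print(n_rows, n_cols, total_quantity, large_fruit)
```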

Sample Data in Python

Data in tabular format, or tables, is a very common starting point when working with data.

Python and the Pandas package don’t include sample datasets, but it is possible to use the well-known R sample datasets. One way is to load the statsmodels package, as we did earlier in the preflight.

One such dataset is called mtcars, which has various features for 32 now-ancient cars from a 1974 survey for a US car magazine.

Calling head(5) on the data displays a table of all columns for the first five rows.

mtcars = sm.datasets.get_rdataset('mtcars')
pd.DataFrame(mtcars.data).head(5)
                    mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
rownames
Mazda RX4          21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
Mazda RX4 Wag      21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
Datsun 710         22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
Hornet 4 Drive     21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
Hornet Sportabout  18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2
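
To show what head() and a couple of related operations do without needing to download anything, the sketch below re-types only the mpg and cyl values from the five rows displayed above into a small data frame. The name sample and the choice of columns are assumptions made for this illustration; the full mtcars data has 32 rows and 11 columns.

```python
import pandas as pd

# mpg and cyl values copied from the five rows displayed above.
sample = pd.DataFrame(
    {"mpg": [21.0, 21.0, 22.8, 21.4, 18.7],
     "cyl": [6, 6, 4, 6, 8]},
    index=["Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
           "Hornet 4 Drive", "Hornet Sportabout"],
)
sample.index.name = "rownames"

# head(n) returns the first n rows; with no argument it returns 5.
first_two = sample.head(2)

# A column can be summarised directly.
mean_mpg = sample["mpg"].mean()

# Rows can be picked by their label with .loc.
datsun_mpg = sample.loc["Datsun 710", "mpg"]

print(first_two)
print(round(mean_mpg, 2), datsun_mpg)
```

The same calls should work unchanged on mtcars.data once it has been loaded.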