An Introduction to ML

Tom Read Cutting

Notes and Repository

What is ML?

ML (MetaLanguage) is a general-purpose functional programming language with popular derivatives including F# and OCaml.


Wrong Presentation!

An Introduction to Machine Learning and Neural Networks

The logo of Electric Square Ltd.

Tom Read Cutting

Workshop Goals

  1. Introduce Machine Learning
  2. Explain the theory behind simple Neural Networks
  3. Complete a fun neural network challenge
  4. Introduce a wide array of resources for further learning


You will need:

We will also download these libraries later:

Why Python?

In short, it is the language most widely used for machine learning.

It is an incredibly powerful interpreted language with many useful machine learning and visualisation libraries that can interact well with each other.

Furthermore, Python is widely used in other applications, so machine learning can be combined with them directly (e.g. Houdini + scikit-learn).

What is Machine Learning?

Machine Learning (ML) is a form of computation where a program is designed to solve specific tasks using models and inference, as opposed to relying on explicit hard-coded instructions.

What types of machine learning are there?

There are three broad categories of machine learning:

  1. Supervised Learning
  2. Unsupervised Learning
  3. Reinforcement Learning

Supervised Learning

When a machine learning algorithm attempts to infer a function from labeled training data.

Examples Include

Unsupervised Learning

When a machine learning algorithm attempts to infer some kind of underlying structure in unlabelled data.

Examples Include

Reinforcement Learning

When a machine learning algorithm seeks to take actions in order to maximize some kind of reward.

Examples Include


We have...

  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning

Neural Networks: An Example

Neural Networks: A Trained Example

The goal of a neural network (in this case!) is to model some function $f(\mathbf{x})$.

Here $\mathbf{x}$ is a multi-dimensional vector, e.g. $784$ values representing the pixels of a $28 \times 28$ image.

The output could be the digit that the image represents.
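As a small illustration of the input vector, the sketch below flattens a 28 by 28 grid of pixel values into a single list of 784 numbers. The all-zero image and the name `flatten_image` are placeholders invented for this example, not part of the workshop code.

```python
def flatten_image(image):
    """Flatten a 2D grid of pixel values (a list of rows) into one flat list."""
    return [pixel for row in image for pixel in row]


# A placeholder 28 x 28 "image" of zeros standing in for a real handwritten digit.
blank_image = [[0.0] * 28 for _ in range(28)]
x = flatten_image(blank_image)
print(len(x))  # 784
```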

Neural Networks: A Trained Example

So in this example, we want our neural network to take an image of the number $5$ and turn it into the number $5$.

A 28 by 28 pixel handwritten image of the number five $\rightarrow$ $5$

Neural Networks: An Explanation

A simple network, the multilayer perceptron:

Diagram of a simple arbitrary multilayer perceptron neural network

What do we have?

  1. We have some layers: $1$ input layer, $1$ output layer and zero or more hidden layers.
  2. Each layer has a number of neurons.
  3. The first layer has 784, the last layer has 5.
  4. Every neuron in each layer is connected to every neuron in the layer before and after.
  5. Each connection has a weight.

How does the data flow through the network?

  1. The input neurons just output the value of the pixel that they correspond to.
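  2. For all other neurons, we have the following equation: $$ a^L_j=\sigma(b^L_j + \sum_{k=1}^{n_{L-1}} w^L_{jk} a^{L-1}_k) $$ Here $a^L_j$ is the output of neuron $j$ in layer $L$, $b^L_j$ is its bias, $w^L_{jk}$ is the weight of the connection from neuron $k$ in layer $L-1$, and $n_{L-1}$ is the number of neurons in layer $L-1$. We will explore the meaning of this with a drawing now...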
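The equation above can be sketched in a few lines of plain Python. This is only an illustration: it assumes the logistic sigmoid as $\sigma$, and the function names and the tiny 3-input, 2-neuron layer are made up for the example.

```python
import math


def sigmoid(z):
    """Logistic activation: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(weights, biases, prev_activations):
    """Compute one layer's activations from the previous layer's outputs.

    weights[j][k] is the weight from neuron k in the previous layer to
    neuron j in this layer; biases[j] is the bias of neuron j.
    """
    activations = []
    for w_row, b in zip(weights, biases):
        # Weighted sum of the previous layer's outputs, plus the bias.
        z = b + sum(w * a for w, a in zip(w_row, prev_activations))
        activations.append(sigmoid(z))
    return activations


# Hypothetical tiny layer: 3 inputs feeding 2 neurons.
example_weights = [[0.5, -0.5, 0.0], [1.0, 1.0, 1.0]]
example_biases = [0.0, -3.0]
print(layer_forward(example_weights, example_biases, [1.0, 1.0, 1.0]))
```

Both weighted sums in this example come out at zero, so both neurons output $\sigma(0) = 0.5$. A full network would call `layer_forward` once per layer, feeding each layer's output into the next.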

The activation function

$\sigma$ is the activation function, processing all the inputs into a neuron. We will use the sigmoid function:

$$ \sigma (x) = \frac{1}{1 + e^{-x}} $$

The activation function

The output of the sigmoid function looks like this:

A graph of the sigmoid (logistic) curve, showing how it forms an 'S' shape that tends towards 1 at inputs > 6 and towards 0 at inputs < -6.
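The shape of the curve can be checked numerically. The sketch below is one possible implementation; the branch on the sign of `x` is an addition made here to avoid overflow for large negative inputs and is not something the slides prescribe.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written so that math.exp never receives a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # x is negative here, so e lies in (0, 1)
    return e / (1.0 + e)


for value in (-6, 0, 6):
    print(value, round(sigmoid(value), 4))
```

The three printed values are roughly 0.0025, 0.5 and 0.9975, matching the flat tails and the midpoint of the 'S' shape.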

Neural Networks: An Explanation


How to train a network: a single neuron

Diagram of a neural network with only a single neuron, taking a single weighted input value and bias to produce an output.

Some contrived example data

A diagram of the data from the table plotted onto a graph. The points form a downward slope.

Some contrived example data

The same diagram of the graph as before, but with a regression line running through the data. The line has the equation: y = -0.68x + 2.75.

The maths of a single neuron

We can simplify the maths down to (1 layer, 1 neuron, no activation function $\sigma$):

$$ a(x)=b + w x $$

This closely matches our desired output function:

$$ y = c + m x $$

Finding $b$ and $w$

We need to train the neural network so that it finds $b$ and $w$ to best fit the data.

$$ a(x)=b + w x $$

We can do this with a cost function.

$$ C=\sum_{i=0}^{N-1} (a(x_i) - y_i)^2 $$

Where $(x_i, y_i)$ represents each sample from our training data.
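A minimal sketch of this cost function in Python follows. The original data table is not reproduced in these notes, so the samples here are hypothetical points placed exactly on the line $y = -0.68x + 2.75$; the function names are also invented for the example.

```python
def neuron(x, w, b):
    """Single neuron with no activation function: a(x) = b + w * x."""
    return b + w * x


def cost(w, b, samples):
    """Sum of squared differences between the neuron's output and each target y."""
    return sum((neuron(x, w, b) - y) ** 2 for x, y in samples)


# Hypothetical training data lying exactly on y = -0.68x + 2.75.
samples = [(float(x), -0.68 * x + 2.75) for x in range(5)]

print(cost(1.0, 1.0, samples))     # a poor guess gives a large cost
print(cost(-0.68, 2.75, samples))  # the true line gives a cost of zero
```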

Reducing the cost

Now that we have a cost, we can calculate a new weight and bias that reduce it:

$$ w'=w-\mu\frac{\partial C}{\partial w} $$

$$ b'=b-\mu\frac{\partial C}{\partial b} $$

Here, $\mu$ is what we call the learning rate.

Working through the maths...

$$ \frac{\partial C}{\partial w} = \sum_{i=0}^{N-1} 2 x_i ((b + wx_i) - y_i) $$

$$ \frac{\partial C}{\partial b} = \sum_{i=0}^{N-1} 2 ((b + wx_i) - y_i) $$
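One way to gain confidence in these derivatives is to compare them with a finite-difference estimate. The sketch below does that on hypothetical samples (the original table is not available here); the helper names and the step size `eps` are assumptions made for the example.

```python
def cost(w, b, samples):
    """C = sum over samples of ((b + w * x) - y) ** 2."""
    return sum(((b + w * x) - y) ** 2 for x, y in samples)


def cost_gradients(w, b, samples):
    """Analytic partial derivatives of C with respect to w and b."""
    dC_dw = sum(2 * x * ((b + w * x) - y) for x, y in samples)
    dC_db = sum(2 * ((b + w * x) - y) for x, y in samples)
    return dC_dw, dC_db


def numerical_gradients(w, b, samples, eps=1e-6):
    """Central-difference estimates of the same derivatives."""
    dC_dw = (cost(w + eps, b, samples) - cost(w - eps, b, samples)) / (2 * eps)
    dC_db = (cost(w, b + eps, samples) - cost(w, b - eps, samples)) / (2 * eps)
    return dC_dw, dC_db


# Hypothetical training data lying exactly on y = -0.68x + 2.75.
samples = [(float(x), -0.68 * x + 2.75) for x in range(5)]
print(cost_gradients(1.0, 1.0, samples))
print(numerical_gradients(1.0, 1.0, samples))
```

The two printed pairs should agree to several decimal places, which is a useful sanity check before relying on hand-derived gradients.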

Example with $\mu=0.01$

$m$ = -0.68, $c$ = 2.75

  1. $w$ = 1.00, $b$ = 1.00
  2. $w$ = -1.46, $b$ = 0.95
  3. $w$ = 0.30, $b$ = 1.50
  4. $w$ = -1.08, $b$ = 1.52
  5. $w$ = -0.10, $b$ = 1.86
  6. $w$ = -0.77, $b$ = 2.16
  7. $w$ = -0.68, $b$ = 2.66
  8. $w$ = -0.68, $b$ = 2.73
  9. $w$ = -0.68, $b$ = 2.75

Example with $\mu=0.001$

$m$ = -0.68, $c$ = 2.75

  1. $w$ = 1.00, $b$ = 1.00
  2. $w$ = 0.75, $b$ = 1.00
  3. $w$ = 0.55, $b$ = 1.00
  4. $w$ = 0.24, $b$ = 1.01
  5. $w$ = -0.05, $b$ = 1.06
  6. $w$ = -0.40, $b$ = 1.28
  7. $w$ = -0.56, $b$ = 1.97
  8. $w$ = -0.64, $b$ = 2.50
  9. $w$ = -0.68, $b$ = 2.74
  10. $w$ = -0.68, $b$ = 2.75

Example with $\mu=0.1$

$m$ = -0.68, $c$ = 2.75

  1. $w$ = 1.00, $b$ = 1.00
  2. $w$ = -23.61, $b$ = 0.54
  3. $w$ = 373.85, $b$ = 59.07
  4. $w$ = -6166, $b$ = -937.5
  5. $w$ = 1.015e+05, $b$ = 1.549e+04
  6. $w$ = -4.535e+08, $b$ = -6.919e+07
  7. $w$ = -6.652e+20, $b$ = -1.015e+20
  8. $w$ = -3.599e+30, $b$ = -5.49e+29
  9. $w$ = -5.279e+42, $b$ = -8.054e+41
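The qualitative behaviour of the three learning rates can be reproduced with a short gradient descent loop. This is a sketch under stated assumptions: the samples are hypothetical points on $y = -0.68x + 2.75$ rather than the original table, so the intermediate values will not match the lists above, and the step counts are chosen arbitrarily.

```python
def train(samples, learning_rate, steps, w=1.0, b=1.0):
    """Repeatedly apply w' = w - mu * dC/dw and b' = b - mu * dC/db."""
    for _ in range(steps):
        # Compute both gradients from the current w and b before updating either.
        dC_dw = sum(2 * x * ((b + w * x) - y) for x, y in samples)
        dC_db = sum(2 * ((b + w * x) - y) for x, y in samples)
        w -= learning_rate * dC_dw
        b -= learning_rate * dC_db
    return w, b


# Hypothetical training data lying exactly on y = -0.68x + 2.75.
samples = [(float(x), -0.68 * x + 2.75) for x in range(5)]

for mu, steps in [(0.01, 2000), (0.001, 20000), (0.1, 20)]:
    w, b = train(samples, mu, steps)
    print(f"mu={mu}: w={w:.4g}, b={b:.4g} after {steps} steps")
```

On this data the two smaller learning rates settle near $w = -0.68$, $b = 2.75$ (the smallest one needing many more steps), while $\mu = 0.1$ overshoots further on every step and the values grow without bound.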

In conclusion

We have worked through the maths of a single-neuron network.

  1. Representing it with a function $a(x)=b + w x$.
  2. Defining a cost over the training data: $C=\sum_{i=0}^{N-1} (a(x_i) - y_i)^2$.
  3. Calculating the derivatives of the cost w.r.t. $w$ and $b$: $\frac{\partial C}{\partial w}$ and $\frac{\partial C}{\partial b}$.
  4. Obtaining updated values $w'=w-\mu\frac{\partial C}{\partial w}$ and $b'=b-\mu\frac{\partial C}{\partial b}$.
  5. Running the above multiple times to train our network.
  6. Choosing a good $\mu$!

Any Questions?

What glaring flaw do you potentially see with our values for $w$ and $b$?

What about the model overall?

Applying maths to a "full" network

Given $a^L_j=\sigma(b^L_j + \sum_{k=1}^{n_{L-1}} w^L_{jk} a^{L-1}_k)$.

We can use the same cost function and similarly calculate its derivative w.r.t. all weights and biases.

This makes extensive use of the chain rule to backpropagate the cost through the network.

Links at the end of the presentation show the full maths behind this.
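As a small taste of the chain rule at work, the sketch below differentiates the cost of a single sigmoid neuron on one training sample and checks the result against a finite-difference estimate. It is not the full backpropagation algorithm; the function names and the sample values are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(w, b, x, y):
    """Squared error of one sigmoid neuron on a single sample (x, y)."""
    return (sigmoid(b + w * x) - y) ** 2


def cost_gradients(w, b, x, y):
    """Chain rule: dC/dw = dC/da * da/dz * dz/dw, and likewise for b."""
    a = sigmoid(b + w * x)
    dC_da = 2 * (a - y)        # derivative of (a - y)^2 with respect to a
    da_dz = a * (1 - a)        # derivative of the sigmoid, written in terms of its output
    dC_dw = dC_da * da_dz * x  # dz/dw = x
    dC_db = dC_da * da_dz      # dz/db = 1
    return dC_dw, dC_db


# Hypothetical parameter and sample values.
w, b, x, y = 0.5, -0.2, 1.5, 1.0
eps = 1e-6
numeric_dw = (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)
numeric_db = (cost(w, b + eps, x, y) - cost(w, b - eps, x, y)) / (2 * eps)
print(cost_gradients(w, b, x, y))
print((numeric_dw, numeric_db))
```

In a multi-layer network the same pattern repeats: each layer multiplies the gradient arriving from the layer after it by its own local derivatives and passes the result backwards.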

Before the practical...

Any Questions?

Applying the Principle: the MNIST dataset

Rows and Columns of different digits demonstrating the handwriting found in the MNIST database

Demonstration Time!

  1. Set up the project folder and install Python dependencies
  2. Download and load the dataset
  3. Train and evaluate a neural network on the dataset

Sorting project and dependencies

Create a folder and add a text file called "".

Then, ensuring you have Python installed, run:

    pip install scikit-learn
    pip install numpy

Downloading the dataset

The dataset can be found at

However, we will be using MNIST-for-Numpy to download and then load the data into a "".

Training and evaluating a neural network
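The workshop code for this step is not included in these notes, so what follows is only a rough sketch of what training and evaluating a network with scikit-learn might look like. To stay self-contained it swaps in scikit-learn's bundled 8 by 8 digits dataset (`load_digits`) in place of MNIST, and the layer size, iteration limit and function name `train_and_evaluate` are arbitrary choices made for the example.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def train_and_evaluate(hidden_layer_sizes=(32,), random_state=0):
    """Train a small multilayer perceptron and return its accuracy on held-out data."""
    # Stand-in data: scikit-learn's bundled 8x8 digits, NOT the 28x28 MNIST set.
    digits = load_digits()
    x = digits.data / 16.0  # pixel values range from 0 to 16; scale them to [0, 1]
    x_train, x_test, y_train, y_test = train_test_split(
        x, digits.target, test_size=0.25, random_state=random_state
    )
    network = MLPClassifier(
        hidden_layer_sizes=hidden_layer_sizes,
        activation="logistic",  # the sigmoid activation used in the theory section
        max_iter=500,
        random_state=random_state,
    )
    network.fit(x_train, y_train)
    # score() returns the fraction of test images classified correctly.
    return network.score(x_test, y_test)


if __name__ == "__main__":
    print(f"Test accuracy: {train_and_evaluate():.3f}")
```

Swapping the stand-in data for the real MNIST arrays should only require changing how `x` and the labels are loaded and adjusting the scaling to the MNIST pixel range; the `hidden_layer_sizes` argument is one obvious place to start experimenting.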

Your Turn!

By the end of the session, who can achieve the best accuracy on the testing data?

The next step...

Go back to the playground and cover ReLU and the like.

Useful Resources

Special Thanks


Please give your feedback!