ML (MetaLanguage) is a general-purpose functional programming language with popular derivatives including F# and OCaml.
Wrong Presentation!
You will need:
We will also download these libraries later:
Basically, it is what everyone uses.
It is an incredibly powerful interpreted language with many useful machine learning and visualisation libraries that can interact well with each other.
Furthermore, Python is incredibly widely used in other applications, allowing Python + ML to enhance those massively (e.g. Houdini + scikit-learn).
Machine Learning (ML) is a form of computation where a program is designed to solve specific tasks using models and inference, as opposed to relying on explicit hard-coded instructions.
There are three broad categories of machine learning:
Supervised learning: when a machine learning algorithm attempts to infer a function from labelled training data.
Unsupervised learning: when a machine learning algorithm attempts to infer some kind of underlying structure in unlabelled data.
Reinforcement learning: when a machine learning algorithm seeks to take actions in order to maximize some kind of reward.
We have...
The goal of a neural network (in this case!) is to model some function $f(\mathbf{x})$.
Where $\mathbf{x}$ is a multi-dimensional vector, e.g. $784$ values representing the pixels of a $28 \times 28$ image.
The output could be a number the image represents.
So in this example, we want our neural network to take an image of the number $5$ and turn it into the number $5$.
[image of a handwritten $5$] $\rightarrow$ $5$
A simple network, the multilayer perceptron:
$\sigma$ is the activation function, processing all the inputs into a neuron. We will use the sigmoid function:
$$ \sigma (x) = \frac{1}{1 + e^{-x}} $$
The output of the sigmoid function looks like this:
DRAWING TIME!!!! :D
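To check your drawing, here is a minimal sketch of the sigmoid in Python (assuming NumPy is installed):

```python
import numpy as np

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))    # 0.5, the midpoint of the curve
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```

Note the S-shape: large negative inputs are squashed towards $0$, large positive inputs towards $1$.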
$x$ | $y$ |
---|---|
-2.0 | 4.4 |
3.7 | 0.5 |
4.7 | -0.3 |
1.0 | 1.4 |
3.4 | 0.4 |
1.9 | 1.4 |
0.5 | 2.0 |
3.7 | 0.5 |
-3.7 | 5.6 |
-1.4 | 3.5 |
We can simplify the maths down to (1 layer, 1 neuron, no activation function $\sigma$): $$ a(x)=b + w x $$ This closely matches our desired output function: $$ y = c + m x $$
We need to train the neural network so that it finds $b$ and $w$ to best fit the data.
$$ a(x)=b + w x $$
We can do this with a cost function.
$$ C=\sum_{i=0}^{N-1} (a(x_i) - y_i)^2 $$
Where $(x_i, y_i)$ represents each sample from our training data.
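As a small sketch, the cost can be computed directly for any candidate $w$ and $b$, using the $(x_i, y_i)$ samples from the table above:

```python
import numpy as np

# (x_i, y_i) samples from the table above
x = np.array([-2.0, 3.7, 4.7, 1.0, 3.4, 1.9, 0.5, 3.7, -3.7, -1.4])
y = np.array([4.4, 0.5, -0.3, 1.4, 0.4, 1.4, 2.0, 0.5, 5.6, 3.5])

def cost(w, b):
    # C = sum_i (a(x_i) - y_i)^2, with a(x) = b + w x
    return np.sum(((b + w * x) - y) ** 2)

print(cost(0.0, 0.0))  # a poor fit gives a large cost
```

A better-fitting line gives a smaller cost, which is what training exploits.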
Now that we have a cost, we can calculate a new weight and bias which minimize the cost:
$$ w'=w-\mu\frac{\partial C}{\partial w} $$
$$ b'=b-\mu\frac{\partial C}{\partial b} $$
Here, $\mu$ is what we call the learning rate.
$$ \frac{\partial C}{\partial w} = \sum_{i=0}^{N-1} 2 x_i ((b + wx_i) - y_i) $$
$$ \frac{\partial C}{\partial b} = \sum_{i=0}^{N-1} 2 ((b + wx_i) - y_i) $$
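Putting the update rules together, here is a minimal gradient-descent sketch on the table's data (the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

# (x_i, y_i) samples from the table above
x = np.array([-2.0, 3.7, 4.7, 1.0, 3.4, 1.9, 0.5, 3.7, -3.7, -1.4])
y = np.array([4.4, 0.5, -0.3, 1.4, 0.4, 1.4, 2.0, 0.5, 5.6, 3.5])

w, b = 0.0, 0.0  # initial weight and bias
mu = 0.005       # learning rate
for _ in range(10000):
    residual = (b + w * x) - y        # a(x_i) - y_i
    dC_dw = np.sum(2 * x * residual)  # dC/dw from above
    dC_db = np.sum(2 * residual)      # dC/db from above
    w -= mu * dC_dw
    b -= mu * dC_db

print(round(w, 2), round(b, 2))  # converges towards the least-squares fit
```

Try changing $\mu$: too small and training is slow, too large and the updates overshoot and diverge.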
$m$ = -0.68, $c$ = 2.75
We have worked through the maths of a single-neuron network.
What glaring flaw do you potentially see with our values for $w$ and $b$?
What about the model overall?
Given $a^L_j=\sigma(b^L_j + \sum_{k=1}^{n_{L-1}} w^L_{jk} a^{L-1}_k)$.
We can use the same cost function and similarly calculate its derivative w.r.t. all weights and biases.
This makes extensive use of the chain rule to backpropagate the cost through the network.
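As an illustrative sketch of the forward pass only (the layer sizes are hypothetical and the weights random, so the outputs are meaningless until trained), the formula above in vectorised form:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical layer sizes: 784 inputs -> 16 hidden -> 10 outputs
sizes = [784, 16, 10]
rng = np.random.default_rng(0)
weights = [rng.standard_normal((n, m)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(n) for n in sizes[1:]]

def forward(a):
    # a^L = sigma(b^L + W^L a^{L-1}), applied layer by layer
    for W, b in zip(weights, biases):
        a = sigmoid(b + W @ a)
    return a

output = forward(rng.random(784))
print(output.shape)  # one sigmoid activation per output neuron
```

Backpropagation then pushes $\partial C / \partial w$ and $\partial C / \partial b$ backwards through these same layers via the chain rule.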
Links at the end of the presentation show the full maths behind this.
Create a folder and add a text file called "process_digits.py".
Then, ensuring you have Python installed, run:
pip install scikit-learn
pip install numpy
The dataset can be found at http://yann.lecun.com/exdb/mnist/.
However, we will be using MNIST-for-Numpy to download the data and then load it in our "process_digits.py" script.
By the end of the session, who can come up with the best accuracy on the testing data?
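As a warm-up sketch of the full pipeline, here is a multilayer perceptron in scikit-learn. It uses scikit-learn's small built-in 8×8 digits set rather than MNIST (so it runs without a download), and the hidden layer size and iteration cap are arbitrary choices:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data: 1797 8x8 digit images bundled with scikit-learn
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Multilayer perceptron: one hidden layer, sigmoid ("logistic") activation
clf = MLPClassifier(hidden_layer_sizes=(64,), activation="logistic",
                    max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the held-out test data
```

Swapping in the MNIST arrays and tweaking `hidden_layer_sizes` and `activation` is exactly the knob-turning the challenge is about.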
Go back to the playground and cover ReLU and the like.