Models used in the project

This project uses two machine learning models for digit recognition: a feedforward neural network written in NumPy and a convolutional neural network in TensorFlow. Both models were inspired by the cs231n course. They were trained on the same dataset of 800 images (drawn by me), augmented up to 19200; the validation set had 200 images (also drawn by me), and the accuracy of both models was measured by predicting on the MNIST test set.
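The augmentation itself is simple geometric distortion. Roughly, turning one hand-drawn digit into 24 variants could look like the sketch below (the specific shifts and rotation angles here are placeholders, not the exact values used in the project):

```python
import numpy as np
from scipy.ndimage import rotate, shift

def augment_digit(img, n_variants=24, seed=0):
    """Expand one 28x28 grayscale digit into several shifted/rotated copies."""
    rng = np.random.default_rng(seed)
    variants = []
    for _ in range(n_variants):
        dy, dx = rng.integers(-2, 3, size=2)    # small random translation
        angle = rng.uniform(-15, 15)            # small random rotation
        out = shift(img, (dy, dx), mode='constant')
        out = rotate(out, angle, reshape=False, mode='constant')
        variants.append(out)
    return np.stack(variants)

# 800 original images * 24 variants each = 19200 training images
```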

Feedforward neural net

The structure of the model is simple:

FNN architecture

Code explanation

Model's training and accuracy

I tried many different values for the parameters: learning rate, regularization strength, and the number of neurons in the hidden layer. I even tried using two or more hidden layers and more sophisticated ways to adapt the learning rate (RMSprop and Adam), but they added little. At first I also wasn't sure what batch size to use, but since each image was augmented into 24 variants, I decided to use the same value for the batch size. In the end the model had the following parameters:

The process of training this model looked like this:


FNN training

It took several epochs to reach high accuracy, but the model was trained further so that the weights stabilized and the learning rate decayed. It might be worth using a lower learning rate to push the loss down further, but in the end it hardly matters: the training/validation set is recognized easily, high accuracy on MNIST can't be achieved due to the difference in data, and it is difficult to predict how exactly the digits will be drawn on the site.
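For reference, a minimal sketch of the kind of update used here is shown below: mini-batch SGD on a two-layer net with L2 regularization, with the learning rate decayed by 0.95 after each epoch. The hidden-layer size and regularization strength in the sketch are placeholders, not the model's actual parameters.

```python
import numpy as np

def sgd_epoch(W1, b1, W2, b2, X, y, lr, reg=1e-3, batch_size=24):
    """One epoch of mini-batch SGD for a two-layer net: ReLU hidden layer, softmax output."""
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        xb = X[idx[start:start + batch_size]]
        yb = y[idx[start:start + batch_size]]

        # forward pass
        h = np.maximum(0, xb @ W1 + b1)
        scores = h @ W2 + b2
        scores -= scores.max(axis=1, keepdims=True)
        probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

        # backward pass: softmax cross-entropy gradient + L2 regularization
        dscores = probs
        dscores[np.arange(len(yb)), yb] -= 1
        dscores /= len(yb)
        dW2 = h.T @ dscores + reg * W2
        db2 = dscores.sum(axis=0)
        dh = dscores @ W2.T
        dh[h <= 0] = 0
        dW1 = xb.T @ dh + reg * W1
        db1 = dh.sum(axis=0)

        # SGD update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

# toy data and weights just to make the sketch runnable
rng = np.random.default_rng(0)
X, y = rng.random((96, 784)), rng.integers(0, 10, size=96)
W1, b1 = 0.01 * rng.standard_normal((784, 100)), np.zeros(100)
W2, b2 = 0.01 * rng.standard_normal((100, 10)), np.zeros(10)

lr = 0.1                  # initial learning rate
for epoch in range(24):
    sgd_epoch(W1, b1, W2, b2, X, y, lr)
    lr *= 0.95            # per-epoch decay
```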

FNN MNIST confusion matrix

This confusion matrix shows the quality of predictions on MNIST. 9 is often mistaken for 4 and 7 for 1. I think this is more or less reasonable and can be explained by differences in drawing styles. I hope that the accuracy will improve after the model is trained on additional data.
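The matrix itself is computed directly from the predictions; rows are the true digits and columns are what the model predicted:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=10):
    """cm[i, j] counts test images of true digit i predicted as digit j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# e.g. cm[9, 4] is how often a true 9 was classified as a 4
```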

Changes in the model so that it can be trained on additional data

I wanted to use digits drawn by other people as training data for continuous improvement of the model. This required certain changes. The actual file is available on GitHub. The main differences are:

In fact it was quite easy to modify the code so that it could be trained continuously. The main question was what learning rate to use. The initial learning rate was 0.1, but it was decayed 24 times and became 0.1 * (0.95 ^ 24). Moreover, this learning rate was applied not to a single iteration but to a whole epoch, so using it for a single image could change the weights too significantly. After some trial and error I settled on the learning rate 0.1 * (0.95 ^ 24) / 32. With it a single image doesn't change the weights too much, but several similar images can positively influence the accuracy.
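In code the continuous update boils down to something like this (the parameter dictionary and gradients are placeholders for the model's actual weights and backward pass):

```python
base_lr = 0.1 * 0.95 ** 24       # learning rate after 24 epochs of decay, ~0.029
online_lr = base_lr / 32         # ~0.0009: a single image barely moves the weights

def online_update(params, grads, lr=online_lr):
    """One SGD step on a single user-drawn digit."""
    for name in params:
        params[name] -= lr * grads[name]
```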

Convolutional neural net

The structure of the model is more complex than that of the FNN:

CNN architecture

Code explanation

Model's training and accuracy

I tried various values for the parameters, adding or dropping layers and changing the shapes of layers and weights. You can see the final version in the code above.

Here is an example of a bad combination of parameters:


bad CNN training

The process of training the final model looked like this:


CNN training

It took several epochs to reach high accuracy, but after that the accuracy and loss hardly changed. As a result I trained the model again and stopped it at around iteration 100.

CNN MNIST confusion matrix

This confusion matrix shows the quality of predictions on MNIST. It is definitely better than the FNN's.

Changes in the model so that it can be trained on additional data

I wanted to use digits drawn by other people as training data for continuous improvement of the model. This required certain changes. The actual file is available on GitHub. The main differences are:

It took me some time to get the TF model to train continuously, and most of it was spent learning what TensorFlow can do. As for the learning rate, I wasn't sure what value to use. The CNN runs more iterations and its architecture is more complex, but a very low learning rate may not be enough to actually update the model. I decided to use 0.00001, which is 100 times lower than the initial value, and maybe it is too low.
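As a minimal sketch of one such update step, written with tf.keras and GradientTape for brevity (not the repository's exact code; `model` here stands for the trained CNN restored from disk and is assumed to output raw logits for the 10 classes):

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=1e-5)   # 100x lower than the initial rate
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

def update_on_drawn_digit(model, image, label):
    """One gradient step on a single 28x28 digit drawn on the site."""
    x = tf.reshape(tf.cast(image, tf.float32), (1, 28, 28, 1))
    y = tf.constant([label])
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss = loss_fn(y, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return float(loss)
```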