The PyTorch (neural network library) examples include a script that runs the training process on the MNIST digit-recognition data set: https://github.com/pytorch/examples/tree/main/mnist
This builds up a convolutional neural network that takes one of these pictures and processes it down to 10 output neurons, one per digit. The training process uses two sets of labelled data (pictures of digits together with which of the 10 possible digits each one is): a training set and a test set. The training set is used to adjust all of the "weights" inside the neural network, repeatedly moving them in the (very high-dimensional) direction of steepest descent of the error, aiming to get the output neurons to produce the intended label for each input picture. The test set is held out and used as a metric for how well the neural network is doing.
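As a rough illustration of the weight-update idea (this is not the example's exact code, though the example does use a negative log-likelihood loss and the Adadelta optimizer), a single training step looks something like this:

import torch.nn.functional as F

def train_step(model, optimizer, images, labels):
    optimizer.zero_grad()              # clear the gradients from the previous step
    output = model(images)             # forward pass: pictures -> 10 log-probabilities
    loss = F.nll_loss(output, labels)  # how far the outputs are from the intended labels
    loss.backward()                    # gradient of the loss w.r.t. every weight
    optimizer.step()                   # nudge the weights a small step "downhill"
    return loss.item()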
I ran the example's training script, creating mnist_cnn.pt with 99% accuracy on the test data set.
Then I wanted to see if it worked, so I drew images of all 10 digits. The example provides no way to try the trained model on your own images, so I wrote the attached script, try.py.
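For reference, the script is roughly along these lines - a minimal sketch, assuming the Net class and normalisation constants from the example's main.py; the real try.py also takes the --no-cuda/--no-mps flags seen later:

import argparse
import torch
from PIL import Image
from torchvision import transforms
from main import Net  # the CNN definition from the pytorch/examples mnist script

parser = argparse.ArgumentParser()
parser.add_argument("--input", required=True)
args = parser.parse_args()

model = Net()
model.load_state_dict(torch.load("mnist_cnn.pt"))

# Scale the drawing down to a 28x28 greyscale image and normalise it the
# same way the training data was normalised.
transform = transforms.Compose([
    transforms.Grayscale(),
    transforms.Resize((28, 28)),
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])
img = transform(Image.open(args.input)).unsqueeze(0)  # add a batch dimension

output = model(img)             # log-probabilities for the 10 digits
print(torch.exp(output) * 100)  # the same numbers as percentages
print("prediction:", output.argmax(dim=1).item())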
I also wrote a script to display the results as dots attached to the images I drew. Here are the results that I got:
These results are terrible. It isn't able to correctly classify the digits I drew. I thought that maybe I had drawn them with too thick a paintbrush or something, so I looked at the actual MNIST digits and tried to draw ones that looked similar to them.
Again, awful results. Then I realized that the data set may be inverted in this particular setup: the MNIST digits are light strokes on a dark background, the opposite of my drawings. So I tried inverting my images.
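Inverting is a one-liner with Pillow (the file name here is just for illustration):

from PIL import Image, ImageOps

# MNIST digits are white strokes on a black background, so a black-on-white
# drawing needs to be flipped before the network sees it.
img = Image.open("digit.png").convert("L")  # greyscale
ImageOps.invert(img).save("digit-inverted.png")

This gave good results: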
Here are the actual outputs of the NN:
tensor([[ 91.81,   0.00,   8.13,   0.05,   0.00,   0.00,   0.00,   0.00,   0.01,   0.00]], grad_fn=<MulBackward0>)
tensor([[  0.13,  59.88,   1.23,   0.15,   7.22,   0.94,  20.30,   0.08,   9.98,   0.08]], grad_fn=<MulBackward0>)
tensor([[  0.00,   0.02,  99.98,   0.00,   0.00,   0.00,   0.00,   0.00,   0.00,   0.00]], grad_fn=<MulBackward0>)
tensor([[  0.00,   0.00,   0.00, 100.00,   0.00,   0.00,   0.00,   0.00,   0.00,   0.00]], grad_fn=<MulBackward0>)
tensor([[  0.00,   0.00,   0.00,   0.00, 100.00,   0.00,   0.00,   0.00,   0.00,   0.00]], grad_fn=<MulBackward0>)
tensor([[  0.00,   0.00,   0.00,   0.50,   0.00,  99.49,   0.00,   0.02,   0.00,   0.00]], grad_fn=<MulBackward0>)
tensor([[  0.00,   0.00,   0.00,   0.00,   0.00,   0.00, 100.00,   0.00,   0.00,   0.00]], grad_fn=<MulBackward0>)
tensor([[  0.00,   3.67,   0.02,   0.00,   0.00,   0.00,   0.00,  96.31,   0.00,   0.00]], grad_fn=<MulBackward0>)
tensor([[  1.52,   0.00,   0.02,   0.04,   0.00,   0.00,   0.00,   0.00,  98.42,   0.00]], grad_fn=<MulBackward0>)
tensor([[  0.00,   0.00,   0.01,   0.36,   0.29,   0.44,   0.00,   0.10,   0.10,  98.70]], grad_fn=<MulBackward0>)
Next I wanted to experiment with putting in some garbage data and seeing what I get out:
tensor([[  3.99,  15.13,   1.83,  35.38,   1.20,  40.62,   0.12,   0.43,   0.17,   1.14]], grad_fn=<MulBackward0>) #prince
tensor([[  0.00,   0.00,   0.00,  92.05,   0.00,   3.18,   0.00,   4.76,   0.00,   0.01]], grad_fn=<MulBackward0>) #35
tensor([[  1.74,   0.87,   5.31,   2.10,   0.68,   0.24,   1.08,   0.52,  86.35,   1.11]], grad_fn=<MulBackward0>) #noise1
tensor([[  0.30,   0.29,   2.11,  12.38,   2.04,   1.31,   1.65,   0.39,  59.00,  20.53]], grad_fn=<MulBackward0>) #noise2
tensor([[ 10.68,  10.36,   8.21,   7.92,   9.92,  11.62,  10.18,   9.50,  10.65,  10.97]], grad_fn=<MulBackward0>) #noise3
I wanted to double-check my data, so I re-ran the NN on noise1.png, and I got different values each time. I don't understand why that is, although they do lean towards a similar classification every time.
(env-pytorch) [river@river mnist]$ python try.py --input my-ex-3/noise1.png --no-cuda --no-mps
tensor([[  1.70,   0.06,   1.87,   5.79,   2.25,   6.93,   0.38,   1.10,  67.72,  12.21]], grad_fn=<MulBackward0>)
(env-pytorch) [river@river mnist]$ python try.py --input my-ex-3/noise1.png --no-cuda --no-mps
tensor([[  0.58,   0.02,   0.79,   5.73,   0.64,   0.17,   1.34,   0.01,  90.01,   0.73]], grad_fn=<MulBackward0>)
(env-pytorch) [river@river mnist]$ python try.py --input my-ex-3/noise1.png --no-cuda --no-mps
tensor([[  0.58,   0.99,   8.43,   4.11,   8.19,   3.20,   0.56,   4.44,  64.37,   5.12]], grad_fn=<MulBackward0>)
(env-pytorch) [river@river mnist]$ python try.py --input my-ex-3/noise1.png --no-cuda --no-mps
tensor([[  0.16,   0.00,   0.05,   0.33,   1.79,   0.45,   0.06,   0.04,  70.44,  26.67]], grad_fn=<MulBackward0>)
I think I was clouded by skepticism a bit. I didn't believe that this neural network would work on my own data, so when it didn't, it took me a while before it hit me that the failure was my own mistake. I'm pretty impressed by the capability of this neural network, which takes about 3 minutes to train, to correctly classify digits with 99% accuracy.
On the other hand, there is a problem with using neural networks like this: it has just 10 outputs. There isn't really a way for it to express uncertainty or say "I don't know", which is something that I feel could be important in some contexts. I was hoping it would give a 0.1 value for every output neuron for random noise, but instead it confidently misclassifies it. You could probably add an extra "I don't know" neuron and train it to recognize noise, but it wouldn't activate properly for non-noise garbage inputs. The NN has 6 layers; whatever data goes in will be pushed through to the very end, so there is fundamentally no way for it to reject nonsense inputs.
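Part of the reason is that the example's network ends with a log-softmax, so the ten outputs are forced to form a probability distribution: 100% of the mass has to land on some digit no matter what went in. A quick demonstration with arbitrary scores:

import torch
import torch.nn.functional as F

# Whatever the ten raw scores are, softmax squeezes them into values that
# sum to 1 - there is nowhere to put "none of the above".
logits = torch.randn(1, 10)        # stand-in for the raw scores of the final layer
probs = F.softmax(logits, dim=1)
print(probs.sum().item())          # always 1.0
print(probs.argmax(dim=1).item())  # some digit always "wins"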
I've heard about an interesting experiment that was done with this data set. The idea is to give every image a completely random label (so there will be pictures of 3s labelled as a 7, and other pictures of 3s labelled as a 4) and then train on that. This forces the neural network to 'memorize' the input data set instead of learning to 'actually recognize numbers', something that requires a large enough neural network, as per the universal approximation theorem. The 'loss' (the error measure being minimized during training) decreases much more slowly over time than with correctly labelled data, which implies that the NN is somehow working more efficiently on correctly labelled data.
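To reproduce that with this setup, I'd presumably only need to scramble the labels before training. A minimal sketch, assuming torchvision's MNIST dataset as used in the example (the .targets assignment is the key line):

import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),
])
train_data = datasets.MNIST("../data", train=True, download=True, transform=transform)

# Replace every label with a uniformly random digit, destroying any
# correlation between picture and label so the network can only memorize.
train_data.targets = torch.randint(0, 10, (len(train_data),))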
Some things to explore in future:
- Include letters as well. How would you recognize characters as part of a larger image, instead of the image being just a single digit?
- Can we produce pictures of numbers out of nothing by running the neural network backwards?
- Can we produce heatmaps of the parts of the input images that are more or less relevant to, e.g., being an 8?
- Can we 'interpret' what particular groups of neurons do?
- If we train the NN from a different random seed, will its outputs on the random noise images change?
- Can we train a NN to classify whether an output classification vector was produced from the training data or the test data set?