# 7-8-20 Exercises
In the video, First steps in computer vision, Laurence Moroney introduces us to the Fashion MNIST data set and uses it to train a neural network in order to teach a computer “how to see.” One of the first steps towards this goal is splitting the data into two groups: a set of training images and training labels, and a set of test images and test labels. Why is this done?
The data is split into two sets so that overfitting can be detected and the model's accuracy can be verified on data it did not learn from. The model is trained only on the training data. Its accuracy on new data can then be measured with the test data, which it has never seen. The test data stands in for what we would be trying to predict, but it comes with answers so the predictions can be checked.
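As a rough illustration of why the held-out set matters, here is a minimal sketch in plain Python. The toy data and the `MemorizingModel` class are hypothetical and are not part of the course material. The model simply memorizes its training pairs, so it looks perfect on the training set and only the test set reveals that it has learned nothing general.

```python
import random


def make_examples(n, rng):
    """Toy data: the input is an integer, the label is 1 if it is odd, else 0."""
    examples = []
    for _ in range(n):
        x = rng.randrange(1_000_000)
        examples.append((x, x % 2))
    return examples


class MemorizingModel:
    """Hypothetical worst case: stores every training pair, guesses 0 otherwise."""

    def fit(self, examples):
        self.table = dict(examples)

    def predict(self, x):
        return self.table.get(x, 0)


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for x, y in examples if model.predict(x) == y)
    return correct / len(examples)


rng = random.Random(0)
data = make_examples(1000, rng)
train, test = data[:800], data[800:]  # hold out 200 examples the model never sees

model = MemorizingModel()
model.fit(train)

train_accuracy = accuracy(model, train)
test_accuracy = accuracy(model, test)
print(f"train accuracy: {train_accuracy:.2f}")
print(f"test accuracy:  {test_accuracy:.2f}")
```

Training accuracy alone would suggest a perfect model here. The test accuracy, which lands near chance, is the number that reflects how it would do on new data.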
The Fashion MNIST example has increased the number of layers in our neural network from 1 in the past example to 3. The last two are .Dense layers that have activation arguments using the relu and softmax functions. What is the purpose of each of these functions? Also, why are there 10 neurons in the third and last layer in the neural network?
relu is used to remove negative outputs: it returns its input when the input is greater than 0 and returns 0 otherwise. Negative outputs from a neuron can skew results further down the neural network and cause unpredictable and incorrect predictions.
softmax turns the final values into a spread of values between 0 and 1 that sum to 1. These represent the probability that each category is the correct classification. A value of .6 implies a 60% probability that this classification is the correct one. The last layer has 10 neurons because Fashion MNIST has 10 clothing categories, so there is one output per category.
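The two activation functions can be sketched in plain Python as follows. This is an illustration of the math only, not how Keras implements them, and the ten scores are made-up values standing in for the outputs of the last layer.

```python
import math


def relu(x):
    """Return x when it is positive, otherwise 0."""
    return x if x > 0 else 0.0


def softmax(values):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(values)
    # Subtracting the largest score avoids overflow and does not change the result.
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# relu applied to a few hypothetical neuron outputs
hidden = [relu(v) for v in [-2.0, 0.5, 3.0]]
print(hidden)

# softmax applied to ten hypothetical scores, one per category
scores = [1.0, 0.2, -0.5, 3.0, 0.0, 0.7, -1.2, 0.1, 2.0, 0.3]
probabilities = softmax(scores)
predicted_category = probabilities.index(max(probabilities))
print(probabilities)
print("predicted category:", predicted_category)
```

The category with the largest score also gets the largest probability, so softmax changes the scale of the outputs without changing which category wins.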
In the past example we used the optimizer and loss function, while in this one we are using the function adam in the optimizer argument and sparse_categorical_crossentropy for the loss argument. How do the optimizer and loss functions operate to produce model parameters (estimates) within the model.compile() function?
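The loss function measures how far the model's predictions are from the correct labels. The optimizer uses that measurement to adjust the model's parameters step by step, moving toward a minimum of the loss, which is often a local minimum rather than the best possible one. The parameters at that minimum are the most effective version of the current model that training found. The choice of loss function determines what the model is optimized for. For example, this model uses sparse categorical crossentropy as its loss, so the adam optimizer tries to find the parameters that minimize it.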
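The relationship between a loss and an optimizer can be sketched in plain Python. The first part computes sparse categorical crossentropy for one example, which is the negative log of the probability assigned to the true label. The second part uses plain gradient descent on a made-up one-parameter loss. This is a simpler stand-in chosen for illustration and is not the adam algorithm itself. The probabilities, `toy_loss`, and the learning rate are all assumptions.

```python
import math


def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one example: negative log of the probability given to the true label."""
    return -math.log(probabilities[label])


# The true label is category 2 in both cases.
confident_and_right = sparse_categorical_crossentropy(2, [0.05, 0.05, 0.9])
confident_and_wrong = sparse_categorical_crossentropy(2, [0.6, 0.3, 0.1])
print(confident_and_right, confident_and_wrong)


def toy_loss(w):
    """Made-up loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def toy_gradient(w):
    """Derivative of toy_loss with respect to w."""
    return 2.0 * (w - 3.0)


# Plain gradient descent: repeatedly step against the gradient of the loss.
w = 0.0
learning_rate = 0.1
for step in range(100):
    w -= learning_rate * toy_gradient(w)

print("parameter after training:", w)
print("loss after training:", toy_loss(w))
```

A good prediction gives a small loss and a bad one gives a large loss, and the optimizer's job is to move the parameters in whatever direction makes that number smaller.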
Using the MNIST drawings dataset (the dataset of handwritten numbers with corresponding labels), answer the following questions.
What is the shape of the images training set (how many and the dimension of each)?
60,000 images, each a 28 by 28 array, so the shape is (60000, 28, 28).
What is the length of the labels training set?
60,000 entries.
What is the shape of the images test set?
10,000 images, each a 28 by 28 array, so the shape is (10000, 28, 28).
Estimate a probability model and apply it to the test set in order to produce the array of probabilities that a randomly selected image is each of the possible numeric outcomes (look towards the end of the basic image classification exercises for how to do this; you can apply the same method applied to the Fashion MNIST dataset, but now to the handwritten digits MNIST dataset).
This is done in my private repository.
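For reference, the general approach from the basic image classification exercise can be sketched as below. This is not the code in the private repository. It is a minimal sketch that assumes TensorFlow is installed and that the data comes from `tf.keras.datasets.mnist`. The layer sizes, the number of epochs, and the names `train_and_predict` and `most_likely_digit` are illustrative choices.

```python
def most_likely_digit(probabilities):
    """Return the index (0-9) with the highest predicted probability."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


def train_and_predict(epochs=5):
    """Train a small classifier on MNIST and return test-set probabilities."""
    # Imported here so the helper above can be used without TensorFlow installed.
    import tensorflow as tf

    (train_images, train_labels), (test_images, test_labels) = (
        tf.keras.datasets.mnist.load_data()
    )
    # Shapes: (60000, 28, 28) training images, 60000 labels, (10000, 28, 28) test images.
    print(train_images.shape, len(train_labels), test_images.shape)

    # Scale pixel values from 0-255 down to 0-1.
    train_images = train_images / 255.0
    test_images = test_images / 255.0

    # The last layer outputs 10 raw scores (logits), one per digit.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )
    model.fit(train_images, train_labels, epochs=epochs)

    # Wrap the trained model with a softmax layer so it outputs probabilities.
    probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
    predictions = probability_model.predict(test_images)
    return predictions, test_labels
```

Calling `predictions, test_labels = train_and_predict()` downloads the dataset and trains the model. `predictions` then has one row of 10 probabilities per test image, and `most_likely_digit(predictions[0])` can be compared with `test_labels[0]`.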