This is a step-by-step guide to building an image classifier. I mainly used PyTorch to build the model.
1. Importing the libraries: We first import the necessary libraries.
2. Creating the Dataset: I scraped pictures from the internet to build my Marvel dataset. Each folder holds images of one superhero. After manually downloading the images and moving them into their respective folders, we load them into a pandas data structure.
We then convert our non-numerical labels to numerical labels.
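A minimal sketch of these two steps, assuming one sub-folder per superhero under a single root folder. The helper name `build_rows`, the column names, and the list of image extensions are my own illustrative choices, not part of the original code. It uses only the standard library; the resulting list of rows can be passed to `pandas.DataFrame` as-is.

```python
from pathlib import Path

# Assumed set of image file extensions; adjust to match the downloaded files.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_rows(root):
    """Return (rows, label_to_index) for a root folder with one sub-folder per class."""
    root = Path(root)
    # Sorting the folder names gives a stable label -> integer mapping.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    label_to_index = {name: i for i, name in enumerate(class_names)}

    rows = []
    for name in class_names:
        for path in sorted((root / name).iterdir()):
            # Skip anything that is not an image file.
            if path.suffix.lower() in IMAGE_EXTENSIONS:
                rows.append({
                    "path": str(path),
                    "label": name,
                    "encoded_label": label_to_index[name],
                })
    return rows, label_to_index
```

Calling `build_rows("marvel")` (a hypothetical folder name) returns one row per image plus the mapping from superhero name to integer label.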
3. Splitting the dataset into train and test:
The first step in splitting the dataset is to shuffle the indices and then split them.
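Here is a small sketch of that idea in plain Python. The 20% test fraction, the seed, and the function name `split_indices` are assumptions for illustration; the original split ratio is not stated.

```python
import random


def split_indices(n_samples, test_fraction=0.2, seed=42):
    """Shuffle the indices 0..n_samples-1 and split them into (train, test) lists."""
    indices = list(range(n_samples))
    # A local random generator keeps the shuffle reproducible.
    random.Random(seed).shuffle(indices)
    n_test = int(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(100)
print(len(train_idx), len(test_idx))  # 80 20
```

In PyTorch, index lists like these can be handed to `torch.utils.data.SubsetRandomSampler`, so that each `DataLoader` only draws from its own part of the dataset.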
transforms.ToTensor() converts the values in range 0–255 to 0–1. transforms.Normalize() does the following for each channel:
img = (img - mean) / std. Here mean and std are both 0.5, which maps the [0, 1] range to [-1, 1].
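To check the arithmetic, here is a tiny sketch that applies both steps to single pixel values by hand. The helper names `to_unit_range` and `normalize` are made up for this example; they only mimic the formulas above and are not the torchvision implementation.

```python
def to_unit_range(pixel):
    """Mimic the scaling step: map an 8-bit value in [0, 255] to [0, 1]."""
    return pixel / 255.0


def normalize(value, mean=0.5, std=0.5):
    """Apply (value - mean) / std, the per-channel formula described above."""
    return (value - mean) / std


# The darkest and brightest pixels land on the two ends of the new range.
for pixel in (0, 255):
    print(pixel, normalize(to_unit_range(pixel)))  # prints "0 -1.0", then "255 1.0"
```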
Why normalization?
It helps a CNN perform better by reducing skew in the input values: every channel is centered around zero and stays in the same fixed range.
4. The Model:
nn.Conv2d applies a 2D convolution over the input images.
nn.MaxPool2d is a pooling layer. Max pooling helps reduce over-fitting and greatly reduces the computational cost by shrinking the feature maps, which leaves fewer values (and fewer parameters in the later layers) to deal with.
It takes the maximum value in each region and passes only that value on.
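To make "takes the maximum value in the region" concrete, here is a hand-written sketch of max pooling on a small 2D grid. The 2x2 window with stride 2 is an assumption (the window size used in the model is not stated), and `max_pool_2x2` is a hypothetical helper, not a PyTorch function.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list with even height and width."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            # Collect the four values in the current 2x2 window.
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 4, 8],
]
print(max_pool_2x2(feature_map))  # [[4, 5], [6, 8]]
```

The 4x4 input becomes a 2x2 output, so every later layer has a quarter as many values to process.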
nn.Dropout2d randomly zeroes entire channels (feature maps) during training. It deactivates random parts of the network to prevent overfitting.
For any given neuron in the hidden layer, representing a given learned abstract representation, there are two possible cases: either that neuron is relevant, or it isn’t.
If the neuron isn’t relevant, this doesn’t necessarily mean that other possible abstract representations are also less likely as a consequence. If we used an activation function whose image includes negative values (ℝ⁻), then for certain inputs a neuron’s output would contribute negatively to the output of the neural network. This is generally undesirable.
So to prevent this we use ReLU, which outputs zero for any negative input.
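ReLU itself is just max(0, x). A tiny sketch in plain Python (the function name `relu` is mine; in the model you would use the PyTorch version):

```python
def relu(x):
    """Rectified linear unit: pass positive values through, clamp negatives to zero."""
    return max(0.0, x)


print([relu(v) for v in (-2.0, -0.5, 0.0, 1.5)])  # [0.0, 0.0, 0.0, 1.5]
```

Negative inputs come out as zero, so an irrelevant neuron contributes nothing instead of pulling the output down.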
This is what we have created.
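As a rough illustration of how the layers described in this step fit together, here is a minimal sketch. It is not the exact architecture used here: the class name `SmallCNN`, the layer sizes, the 64x64 RGB input, the dropout probability, and the number of classes are all assumptions.

```python
import torch
from torch import nn
import torch.nn.functional as F


class SmallCNN(nn.Module):
    """Illustrative CNN; every size below is an assumption, not the original model."""

    def __init__(self, num_classes=8):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        self.dropout = nn.Dropout2d(p=0.25)
        # Two 2x2 poolings shrink a 64x64 input to 16x16.
        self.fc = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))  # 3x64x64  -> 16x32x32
        x = self.pool(F.relu(self.conv2(x)))  # 16x32x32 -> 32x16x16
        x = self.dropout(x)                   # active only in training mode
        x = torch.flatten(x, 1)               # keep the batch dimension
        return self.fc(x)                     # raw class scores (logits)


model = SmallCNN()
logits = model(torch.randn(4, 3, 64, 64))  # a batch of 4 random "images"
print(logits.shape)  # torch.Size([4, 8])
```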
5. Training the model: The model is trained on the GPU.
train_loss and val_loss store the training and validation loss after every epoch. train_acc and val_acc store the training and validation accuracy after every epoch.
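A minimal sketch of such a training loop is below. To keep it self-contained it trains a deliberately tiny stand-in model on random tensors instead of the Marvel images. The loss function (`nn.CrossEntropyLoss`), the optimizer (Adam), the batch size, the number of epochs, and the helper `run_epoch` are all assumptions, since the original settings are not stated. It falls back to the CPU when no GPU is available.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Random stand-in data: 64 RGB images of 32x32 pixels with 4 classes.
torch.manual_seed(0)
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 4, (64,))
train_loader = DataLoader(TensorDataset(images[:48], labels[:48]), batch_size=16, shuffle=True)
val_loader = DataLoader(TensorDataset(images[48:], labels[48:]), batch_size=16)

# Deliberately tiny stand-in model, not the architecture from this guide.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 4),
).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)


def run_epoch(loader, train):
    """Run one pass over `loader`; return (mean loss, accuracy)."""
    model.train(train)  # enables or disables dropout-style layers
    total_loss, correct, seen = 0.0, 0, 0
    with torch.set_grad_enabled(train):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            logits = model(x)
            loss = criterion(logits, y)
            if train:
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
            total_loss += loss.item() * x.size(0)
            correct += (logits.argmax(dim=1) == y).sum().item()
            seen += x.size(0)
    return total_loss / seen, correct / seen


train_loss, val_loss, train_acc, val_acc = [], [], [], []
for epoch in range(3):
    loss, acc = run_epoch(train_loader, train=True)
    train_loss.append(loss)
    train_acc.append(acc)
    loss, acc = run_epoch(val_loader, train=False)
    val_loss.append(loss)
    val_acc.append(acc)
```

After the loop, each of the four lists holds one value per epoch, which is exactly what is needed for plotting.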
6. Plotting the results: We now plot the accuracy and the loss for both the training and validation sets.
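One way to draw such a graph with matplotlib is sketched below. The helper name `plot_history` is made up for this example, and the numbers in the call are placeholders that only show the expected input shape; they are not results from my model.

```python
import matplotlib.pyplot as plt


def plot_history(train_values, val_values, ylabel):
    """Plot one per-epoch metric for the training and validation sets."""
    epochs = range(1, len(train_values) + 1)
    fig, ax = plt.subplots()
    ax.plot(epochs, train_values, label="train")
    ax.plot(epochs, val_values, label="validation")
    ax.set_xlabel("epoch")
    ax.set_ylabel(ylabel)
    ax.legend()
    return fig


# Placeholder numbers only, to show the call; they are not real training results.
fig = plot_history([1.0, 0.5, 0.25], [1.0, 0.75, 0.5], "loss")
```

In practice you would call it once with train_loss and val_loss, and once with train_acc and val_acc, then call `plt.show()` or `fig.savefig(...)`.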