Deep Learning Experiments on CIFAR-10 Dataset


In this blog I will share my experience of experimenting with the CIFAR-10 dataset using deep learning, and show how several common techniques affect the performance of a neural network.


Motivation

I have been studying deep learning and reinforcement learning for quite some time now, and I have always been curious about how each component influences the performance of a neural network. However, I never got the chance to study this topic systematically. That is why I decided to spend some time (and money) running these experiments and writing this blog.

Neural Network Architecture

In the experiments I use the following network architecture:

The convolutional block contains:

Each residual block contains:

The table below lists the output dimension of each layer:

Layer                    Output Dimension
Input Image              (None, 32, 32, 3)
Convolutional Block      (None, 32, 32, 32)
Residual Block 1         (None, 32, 32, 32)
Residual Block 2         (None, 16, 16, 64)
Residual Block 3         (None, 8, 8, 128)
Residual Block 4         (None, 4, 4, 256)
Global Average Pooling   (None, 1, 1, 256)
Dense Layer              (None, 10)
Softmax                  (None, 10)
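
For concreteness, here is a minimal sketch of how such a network could be put together. The post does not state the framework or the internals of each block, so the TensorFlow/Keras code, the two-convolution residual blocks, and the 1x1 projection shortcuts below are my assumptions; only the output shapes are taken from the table above.

import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, downsample):
    # Assumed layout: two 3x3 convolutions with batch normalization, plus a
    # 1x1 projection shortcut whenever the spatial size or channel count changes.
    stride = 2 if downsample else 1
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    if downsample or x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same")(x)
    return layers.ReLU()(layers.Add()([y, shortcut]))

inputs = layers.Input(shape=(32, 32, 3))             # (None, 32, 32, 3)
x = layers.Conv2D(32, 3, padding="same", use_bias=False)(inputs)
x = layers.BatchNormalization()(x)
x = layers.ReLU()(x)                                 # (None, 32, 32, 32)
x = residual_block(x, 32, downsample=False)          # (None, 32, 32, 32)
x = residual_block(x, 64, downsample=True)           # (None, 16, 16, 64)
x = residual_block(x, 128, downsample=True)          # (None, 8, 8, 128)
x = residual_block(x, 256, downsample=True)          # (None, 4, 4, 256)
x = layers.GlobalAveragePooling2D()(x)               # the (None, 1, 1, 256) map, flattened to (None, 256)
outputs = layers.Dense(10, activation="softmax")(x)  # (None, 10) class probabilities
model = tf.keras.Model(inputs, outputs)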


Experiments

Unless otherwise mentioned, all experiments use the default settings below:

epoch = 120                        # Number of epochs
batch_size = 100                   # Minibatch size

optimizer = "Adam"                 # Available optimizer, choose between ("Momentum" | "Adam")
learning_rate = [1e-3, 1e-4, 1e-5] # Learning rate for each phase
lr_schedule = [60, 90]             # Epochs required to reach the next learning rate phase

normalize_data = False             # Whether input images are normalized
flip_data = False                  # Whether input images are flipped with 50% chance
crop_data = False                  # Whether input images are zero-padded and randomly cropped

network_type = "Res4"              # Network type, choose between ("Res4" | "Conv8" | "None")
dropout_rate = 0.2                 # Dropout rate, value of 0 means no dropout
c_l2 = 0.0                         # L2 regularization, also known as weight decay
batch_norm = True                  # Whether batch normalization is applied
global_average_pool = True         # Whether global average pooling is applied
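
The learning_rate and lr_schedule settings define a piecewise-constant schedule: I read lr_schedule = [60, 90] as switching phases at epochs 60 and 90, i.e. 1e-3 for the first 60 epochs, then 1e-4 until epoch 90, then 1e-5 for the remaining epochs. A minimal sketch of that lookup (the helper name is mine):

def lr_for_epoch(epoch, learning_rate=(1e-3, 1e-4, 1e-5), lr_schedule=(60, 90)):
    # Phases advance at the epoch boundaries listed in lr_schedule.
    for boundary, lr in zip(lr_schedule, learning_rate):
        if epoch < boundary:
            return lr
    return learning_rate[-1]

assert lr_for_epoch(0) == 1e-3 and lr_for_epoch(75) == 1e-4 and lr_for_epoch(110) == 1e-5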

Fig. 1 shows the performance of the network with the default settings. Bold lines represent the test loss (and error), while thin lines represent the training loss (and error). For convenience, the default network is denoted as res4, and later on it will be compared to other variants.

Fig. 1: Performance of the network with default settings.

Network Type

To compare different network structures, I trained the following variants:

From Fig. 2 we can see that

This result implies that

Fig. 2: Comparison of different network types.

Regularizations

To compare different regularization methods, I trained the following variants:

From Fig. 3 we can see that

This result implies that

Fig. 3: Comparison of different regularization methods.

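For reference, the dropout_rate and c_l2 settings from above would typically enter the model roughly as follows; this is a Keras-style sketch under my assumptions (dropout after the activation, L2 applied to the convolution kernels), not the exact code used in the experiments.

from tensorflow.keras import layers, regularizers

dropout_rate = 0.2   # default setting; 0 would disable dropout
c_l2 = 1e-4          # example non-zero value; the default 0.0 disables weight decay

inputs = layers.Input(shape=(32, 32, 3))
# L2 regularization adds c_l2 * ||W||^2 for this layer's kernel to the loss.
x = layers.Conv2D(32, 3, padding="same",
                  kernel_regularizer=regularizers.l2(c_l2))(inputs)
x = layers.ReLU()(x)
# Dropout randomly zeroes 20% of the activations, during training only.
x = layers.Dropout(dropout_rate)(x)
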
Batch Normalization

To see the impact of batch normalization, I trained the following variant:

From Fig. 4 we can see that

This result implies that

Fig. 4: Impact of batch normalization.

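The batch_norm flag toggles the normalization layers inside the convolutional and residual blocks. A minimal sketch of what that switch might look like (my assumption; whether the convolution keeps a bias term when normalization is off is also a choice I made here):

from tensorflow.keras import layers

def conv_bn_relu(x, filters, batch_norm=True):
    # With batch_norm = False the normalization step is simply skipped
    # and the convolution keeps its bias term.
    x = layers.Conv2D(filters, 3, padding="same", use_bias=not batch_norm)(x)
    if batch_norm:
        x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)
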
Global Average Pooling

To see the impact of global average pooling, I trained the following variant:

From Fig. 5 we can see that

This result implies that

Fig. 5: Impact of global average pooling.

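For context, the variant without global average pooling presumably flattens the final 4 x 4 x 256 feature map before the dense layer instead of averaging it, which makes that layer much larger. A sketch of the two classification heads (the flatten-based variant is my assumption):

from tensorflow.keras import layers

# With global average pooling: 256 features -> 10 classes,
# i.e. 256 * 10 + 10 = 2,570 dense-layer parameters.
gap_head = [layers.GlobalAveragePooling2D(),
            layers.Dense(10, activation="softmax")]

# Without it: the 4 x 4 x 256 map is flattened to 4096 features,
# i.e. 4096 * 10 + 10 = 40,970 dense-layer parameters.
flat_head = [layers.Flatten(),
             layers.Dense(10, activation="softmax")]
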
Data Normalization

To see the impact of data normalization, I trained the following variant:

From Fig. 6 we can see that

This result implies that

Fig. 6: Impact of data normalization.

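The normalize_data flag refers to normalizing the input images. A common choice, and my assumption of what it means here, is per-channel standardization using statistics computed on the training set:

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")

# Per-channel mean and standard deviation from the training set only.
mean = x_train.mean(axis=(0, 1, 2), keepdims=True)   # shape (1, 1, 1, 3)
std = x_train.std(axis=(0, 1, 2), keepdims=True)

x_train = (x_train - mean) / std
x_test = (x_test - mean) / std                        # reuse the training statistics
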
Data Augmentation

To see the impact of data augmentation, I trained the following variants:

From Fig. 7 we can see that

This result implies that

Fig. 7: Impact of data augmentation.

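The flip_data and crop_data flags correspond to the two augmentations described in the settings: a random horizontal flip with 50% chance, and zero-padding followed by a random 32 x 32 crop. A sketch of how they might be applied per image (the 4-pixel padding is my assumption, as the post does not state the amount):

import tensorflow as tf

def augment(image):
    # Random horizontal flip with 50% probability.
    image = tf.image.random_flip_left_right(image)
    # Zero-pad by 4 pixels on each side, then take a random 32x32 crop.
    image = tf.image.pad_to_bounding_box(image, 4, 4, 40, 40)
    image = tf.image.random_crop(image, size=[32, 32, 3])
    return image
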
Optimizer

To compare different optimizers, I trained the following variant:

From Fig. 8 we can see that

This result implies that

Fig. 8: Comparison of different optimizers.
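
The "Momentum" option presumably refers to SGD with momentum. A sketch of how the two choices could map to Keras optimizers (the momentum value of 0.9 is my assumption):

import tensorflow as tf

def make_optimizer(name, learning_rate):
    if name == "Adam":
        return tf.keras.optimizers.Adam(learning_rate=learning_rate)
    if name == "Momentum":
        # SGD with momentum; 0.9 is a common default, not stated in the post.
        return tf.keras.optimizers.SGD(learning_rate=learning_rate, momentum=0.9)
    raise ValueError(f"unknown optimizer: {name}")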

Conclusion

In this blog I showed how different techniques can influence the performance of a neural network. However, these experiments were run only on the CIFAR-10 dataset, so some of the results may not hold for other tasks.

In addition, I only used very small networks in my experiments because training deeper neural networks consumes a lot of resources. Therefore, some interesting experiments (e.g. a comparison between plain networks and residual networks with more layers) are not included in this blog.

Resources