by Subhaditya Mukherjee
Here we will talk about VGG networks and how to implement VGG16 and VGG19.
I took a break for a few days because I was Seriously burnt out. But I am back! So here we go. In the meanwhile I decided to fix Flux.jl model zoo because it is an absolute disaster. I hope it will get accepted. I spoke to one of the maintainers - Dhairya Gandhi who helped me out a bit. PR
I also could not figure out the … operator which I got to know from the community. I would encourage everyone to participate there because it truly is an awesome place to learn and share stuff regarding Julia and Scientific Machine Learning in general.
Without further delay, what is VGG?? Simply put, it is a really popular Deep learning architecture. It is widely used for image recognition and does pretty well. There are two major variants of it - VGG16 and VGG19 both of which differ slightly in their architectures.
Do read the paper if you get a chance - Link But an executive summary of sorts and some things I learnt from the paper are as follows.
As usual let us get all our packages
using Flux, Statistics using Flux: onehotbatch, onecold, crossentropy, throttle using Base.Iterators: repeated, partition using Metalhead:trainimgs, CIFAR10 using Images
We will also define a function to make the images into an array of Float32 and the order required by flux for every image dimension.
getarray(X) = Float32.(permutedims(channelview(X), (2, 3, 1)))
Now let us load CIFAR10. Then we make a list of all the images in it. And make them into batches. We also one hot encode the labels and push the batches to the GPU. This is our train set.
Rinse and repeat for test set.
X = trainimgs(CIFAR10) imgs = [getarray(X[i].img) for i in 1:50000]; labels = onehotbatch([X[i].ground_truth.class for i in 1:50000],1:10); train = gpu.([(cat(imgs[i]..., dims = 4), labels[:,i]) for i in partition(1:49000, 100)]); valset = collect(49001:50000) valX = cat(imgs[valset]..., dims = 4) |> gpu valY = labels[:, valset] |> gpu
Since a lot of this model has repeats, we will define blocks that will reduce the redundancy in the model. We first define a block of [Conv -> ReLU -> Batchnorm]. Note that we take into account the input and output channels.
conv_block(in_channels, out_channels) = ( Conv((3,3), in_channels => out_channels, relu, pad = (1,1), stride = (1,1)), BatchNorm(out_channels))
Once we have that, we can define a second block of [Conv -> ReLU -> Batchnorm -> Conv -> ReLU -> Batchnorm -> Max Pool].
double_conv(in_channels, out_channels) = ( conv_block(in_channels, out_channels)..., conv_block(out_channels, out_channels)..., MaxPool((2,2)))
And finally we can define another bigger block of [Conv -> ReLU -> Batchnorm -> Conv -> ReLU -> Batchnorm -> Conv -> ReLU -> MaxPool]
triple_conv(in_channels, out_channels) = ( conv_block(in_channels, out_channels), conv_block(out_channels, out_channels), Conv((3,3), out_channels => out_channels, relu, pad = (1,1), stride = (1,1)), MaxPool((2,2)))
The above blocks save a looot of redundancy while taking into account the input and output, so we can now directly go to the architecture. There is nothing special here except the way the number of channels increase from 3 to 4096. The last few layers are linear which we are using for classification. Of course softmax is also needed. The … is the splat operator. Yes its called that xD All it does is unrolls a structure like our tuple here.
We finally pop these into the GPU.
vgg16(initial_channels, num_classes) = Chain( double_conv(initial_channels, 64)..., double_conv(64,128)..., conv_block(128, 256)..., double_conv(256, 256)..., conv_block(256, 512)..., double_conv(512, 512)..., conv_block(512, 512)..., double_conv(512, 512)..., x -> reshape(x, :, size(x, 4)), Dense(512, 4096, relu), Dropout(0.5), Dense(4096, 4096, relu), Dropout(0.5), Dense(4096, num_classes), softmax ) |> gpu
How about VGG19??
vgg19(initial_channels, num_classes) = Chain( double_conv(initial_channels, 64)..., double_conv(64, 128)..., conv_block(128,256), triple_conv(256,256)..., conv_block(256,512), triple_conv(512,512)..., conv_block(512,512), triple_conv(512,512)..., x -> reshape(x, :, size(x, 4)), Dense(512, 4096, relu), Dropout(0.5), Dense(4096, 4096, relu), Dropout(0.5), Dense(4096, num_classes), softmax) |> gpu
Yes. That’s it. Of course, please dont be fooled by the size. We have hidden away the layers in the functions we defined before. But it is that simple to define these things.
Now let us initialize our model.
m = vgg19(3, 10) # OR m = vgg16(3, 10)
Let us use cross entropy loss. And we will also define an accuracy function which is basically a sum of the number of labels that match from our test set on which we predict using our model.
loss(x, y) = crossentropy(m(x), y) accuracy(x, y) = mean(onecold(m(x)) .== onecold(y))
Now that we have that, we can define the number of epochs(10), the optimizer (Adam) and finally run our models.
evalcb = throttle(() -> @show(accuracy(valX, valY)), 10) opt = ADAM() Flux.train!(loss, params(m), train, opt, cb = evalcb)
Please learn to take breaks. Do not be me. I spent the past few months doing Deep learning everyday and burnt out like a candle. Took me a week or so of doing other things to even be able to write this. Take breaks. Qurantine has really messed with us all. But hey! We will get through it. And come out better, smarter and more in love with people and this world.