Deconstructing Deep Learning + δeviations


Pooling layers

by Subhaditya Mukherjee

Here we will look at the pooling operation and its types.

Okay so what is a pooling layer? Let us take this matrix.

1 2
3 4

If we max pool this whole matrix, the output is 4, the largest value in it. Simple enough? That is pretty much it.
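As a quick sanity check, here is that 2x2 example in Julia, the language used in the rest of this post. It is only a sketch built on the built-in `maximum`; the names `m` and `pooled` are mine.

```julia
# The 2x2 matrix from above
m = [1 2;
     3 4]

# Max pooling a single window just picks the largest element in it
pooled = maximum(m)
println(pooled)   # prints 4
```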

The challenge is to apply it to a whole image while taking into account different pooling sizes, different image depths (a grayscale image is a 2-D array, a color image has a third dimension holding 3 channels), and different strides (how many pixels the window moves between one pool and the next).

Let us get to it.

Max pool

We first check whether the image has a depth (that is, whether it is a 3-D array with a channel dimension rather than a plain 2-D one). Most of the code is the same for both cases apart from a few minor changes (I should trim the redundancy, but for the sake of this example I will leave it as it is).

With depth

Okay. Case 1: there is a depth. We first calculate the size of the output image so we can preallocate an array.

\[\text{output}_{\text{height}} = \frac{\text{input}_{\text{height}} - \text{poolsize}}{\text{stride}}, \quad \text{output}_{\text{width}} = \frac{\text{input}_{\text{width}} - \text{poolsize}}{\text{stride}}\]

Once we have that, we can allocate an array with those dimensions.

function maxpool(img, pool_size=2, stride=2, depth=false)

    if depth
        # img is laid out as (channels, height, width)
        input_d, input_h, input_w = size(img)
        # div floors, so sizes that do not divide evenly do not throw
        output_h = div(input_h - pool_size, stride)
        output_w = div(input_w - pool_size, stride)
        result = zeros(input_d, output_h, output_w)
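To make the size calculation concrete, here is a small sketch of it on its own. `pooled_size` is a hypothetical helper, not part of the post's code. It follows the formula above and floors the quotient with `div`, because converting with `Integer(...)` only works when the division happens to be exact.

```julia
# Hypothetical helper: output length along one dimension,
# following the formula above with the quotient floored.
pooled_size(input_len, pool_size, stride) = div(input_len - pool_size, stride)

println(pooled_size(6, 2, 2))   # (6 - 2) / 2 = 2
println(pooled_size(7, 2, 2))   # (7 - 2) / 2 = 2.5, floored to 2

# Integer(2.0) is fine; Integer(2.5) would throw an InexactError
println(Integer((6 - 2) / 2))
```

Many frameworks count one more window, that is floor((input - poolsize) / stride) + 1; the sketch sticks to the formula as it is written in this post.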

Now for the main loop. We iterate over the height and width of the output, and since we have a depth, we index with :, i, j, where : selects all the channels. We then assign the maximum of the current window to that position.

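        for i in 1:output_h, j in 1:output_w
            # Window i starts at (i-1)*stride+1 and is exactly pool_size wide (Julia is 1-indexed)
            rows, cols = (i-1)*stride+1:(i-1)*stride+pool_size, (j-1)*stride+1:(j-1)*stride+pool_size
            result[:, i, j] .= vec(maximum(img[:, rows, cols], dims=(2, 3)))  # one max per channel
        end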

Without depth

If we do not have a depth, we simply drop the : index; the rest stays the same.

    else
        input_h, input_w = size(img)
        output_h = div(input_h - pool_size, stride)
        output_w = div(input_w - pool_size, stride)
        result = zeros(output_h, output_w)

        for i in 1:output_h, j in 1:output_w
            rows, cols = (i-1)*stride+1:(i-1)*stride+pool_size, (j-1)*stride+1:(j-1)*stride+pool_size
            result[i, j] = maximum(img[rows, cols])
        end
    end

    return result
end
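Since the function is shown in pieces, here is one self-contained version you can paste into a REPL. Treat it as a sketch under my own assumptions rather than the exact code behind this post: `maxpool_sketch` is a hypothetical name, it takes keyword arguments, it detects depth with `ndims` instead of a `depth` flag, it assumes a (channels, height, width) layout, it pools every channel separately, and each window starts at the first pixel and is exactly `pool_size` wide.

```julia
# Sketch only: names and layout are assumptions, see the text above.
function maxpool_sketch(img; pool_size=2, stride=2)
    has_depth = ndims(img) == 3
    input_h, input_w = has_depth ? size(img)[2:3] : size(img)

    # Output size as in the formula from this post, floored
    output_h = div(input_h - pool_size, stride)
    output_w = div(input_w - pool_size, stride)
    result = has_depth ? zeros(size(img, 1), output_h, output_w) :
                         zeros(output_h, output_w)

    for i in 1:output_h, j in 1:output_w
        # Rows and columns covered by the window for output position (i, j)
        rows = ((i - 1) * stride + 1):((i - 1) * stride + pool_size)
        cols = ((j - 1) * stride + 1):((j - 1) * stride + pool_size)
        if has_depth
            # One maximum per channel
            result[:, i, j] .= vec(maximum(img[:, rows, cols], dims=(2, 3)))
        else
            result[i, j] = maximum(img[rows, cols])
        end
    end
    return result
end

# 6x6 test image filled column by column with 1..36
img = reshape(1:36, 6, 6)
pooled = maxpool_sketch(img)
println(pooled)

# Two-channel version: the second channel is the negated image
img3 = zeros(2, 6, 6)
img3[1, :, :] .= img
img3[2, :, :] .= -img
pooled3 = maxpool_sketch(img3)
println(size(pooled3))
```

With the size formula used here, the 6x6 input gives a 2x2 output per channel.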

Now let us take our beloved mandrill and apply max pool to one channel of it, say with pool size 10 and stride 2. (The images are shown at their actual output sizes.)


Pool size 11, stride 1. Note that this image is bigger because our stride is smaller.
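To put rough numbers on that, here is the size formula from earlier applied to both settings. The 512x512 input is an assumption on my part (the post does not state the size of the mandrill channel), so read the results as illustrative.

```julia
# Assumption: one channel of the image is 512 pixels on each side
input_len = 512

# Output length along one side, using the formula from earlier (floored)
size_a = div(input_len - 10, 2)   # pool size 10, stride 2
size_b = div(input_len - 11, 1)   # pool size 11, stride 1

println((size_a, size_b))   # prints (251, 501)
```

Halving the stride roughly doubles each side of the output, which is why the second image comes out larger.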


Average pooling

Okay, how about average pooling? We only need to change the previous code a little. We create a view into the current window, then instead of the maximum we take the sum and divide it by the number of elements in the window (aka the average).

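temp = @view img[:, (i-1)*stride+1:(i-1)*stride+pool_size, (j-1)*stride+1:(j-1)*stride+pool_size]

# Per-channel sum over the window, divided by the number of pixels in it
result[:, i, j] .= vec(sum(temp, dims=(2, 3))) ./ prod(size(temp)[2:3])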

Pool size 10 and stride 1.


See the difference? This is why max pool is more commonly used. Average pool blurs out too much of the information.
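A tiny example shows where the blur comes from. This is a sketch on a made-up 2x2 window with one bright pixel; the names are mine.

```julia
# One 2x2 window with a single bright pixel
window = [0 0;
          0 8]

max_val = maximum(window)                 # keeps the bright pixel as it is
avg_val = sum(window) / length(window)    # spreads it across the window

println((max_val, avg_val))   # prints (8, 2.0)
```

Max pooling passes the strong value through untouched, while averaging dilutes it with its darker neighbours.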
