Deconstructing Deep Learning + δeviations


Pooling layers

Here we will look at the pooling operation and its types.

Okay so what is a pooling layer? Let us take this matrix.

1 2
3 4

If we max pool this matrix with a single 2x2 window, the output is 4, the largest element. Simple enough? That is pretty much it.
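As a quick sanity check, here is that example in Julia, the language used for the rest of this post. This is only a minimal sketch, and the names m and pooled are made up for illustration.

```julia
# The 2x2 matrix from above; the semicolon separates rows.
m = [1 2; 3 4]

# With a single 2x2 window, max pooling reduces to taking the largest element.
pooled = maximum(m)
println(pooled)
```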

The challenge is to apply it to a whole image while taking into account different pooling sizes, different image depths (a 2-D array for grayscale, a 3-D array with a channel dimension for color), as well as different strides (how many rows or columns to move between one pool and the next).

Let us get to it.

Max pool

We first check whether the image has a depth or not (aka does the array have more than 2 dimensions). Note that most of the code is the same for both cases except for a few minor changes (I should reduce the code size and remove the redundancy, but for the sake of this example I will leave it as it is).

With depth

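Okay, case 1: there is a depth, so the image is a 3-D array laid out as (depth, height, width). We first calculate the size of the output image so we can preallocate an array. Along each axis, the output size is the number of positions where a full window fits:

$$output_{height} = \left\lfloor \frac{input_{height} - poolsize}{stride} \right\rfloor + 1 ; \quad output_{width} = \left\lfloor \frac{input_{width} - poolsize}{stride} \right\rfloor + 1$$

Once we have that, we can allocate an array with those dimensions.

function maxpool(img, pool_size=2, stride=2, depth=false)

    if depth
        input_d, input_h, input_w = size(img)
        # div floors the quotient; the +1 counts the window at the first position
        output_h = div(input_h - pool_size, stride) + 1
        output_w = div(input_w - pool_size, stride) + 1

        result = zeros(input_d, output_h, output_w)

Now for the main loop. We iterate over the height and width of the output, and since we have a depth, we index the result with :, i, j, where : takes all the channels. Julia arrays are 1-indexed and ranges include both ends, so window i covers rows (i-1)*stride+1 through (i-1)*stride+pool_size, and likewise for the columns. We take the maximum over the two spatial dimensions only, so each channel keeps its own maximum.

        for i in 1:output_h, j in 1:output_w
            r = (i-1)*stride+1 : (i-1)*stride+pool_size   # rows of the current window
            c = (j-1)*stride+1 : (j-1)*stride+pool_size   # columns of the current window
            result[:, i, j] .= vec(maximum(img[:, r, c], dims=(2, 3)))
        end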

Without depth

If we do not have a depth, then we simply ignore the : and use the rest of it.

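    else
        input_h, input_w = size(img)
        # div floors the quotient; the +1 counts the window at the first position
        output_h = div(input_h - pool_size, stride) + 1
        output_w = div(input_w - pool_size, stride) + 1

        result = zeros(output_h, output_w)

        for i in 1:output_h, j in 1:output_w
            r = (i-1)*stride+1 : (i-1)*stride+pool_size   # rows of the current window
            c = (j-1)*stride+1 : (j-1)*stride+pool_size   # columns of the current window
            result[i, j] = maximum(img[r, c])
        end
    end

    return result
end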
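The fragments above can be assembled into one runnable function. The following is a sketch under a few assumptions rather than the post's exact code: the names maxpool_sketch and outsize are made up, the 3-D layout is assumed to be (depth, height, width), the output size is assumed to count every position where a full window fits, and Julia's multiple dispatch stands in for the depth flag.

```julia
# Hypothetical helper: number of positions where a full window fits along one axis.
outsize(n, pool_size, stride) = div(n - pool_size, stride) + 1

# Max pooling for a 2-D image (height, width).
function maxpool_sketch(img::AbstractMatrix, pool_size=2, stride=2)
    h, w = size(img)
    out_h = outsize(h, pool_size, stride)
    out_w = outsize(w, pool_size, stride)
    result = zeros(eltype(img), out_h, out_w)
    for i in 1:out_h, j in 1:out_w
        r = (i-1)*stride+1 : (i-1)*stride+pool_size   # rows of the current window
        c = (j-1)*stride+1 : (j-1)*stride+pool_size   # columns of the current window
        result[i, j] = maximum(@view img[r, c])
    end
    return result
end

# Max pooling for a 3-D image (depth, height, width): pool each channel on its own.
function maxpool_sketch(img::AbstractArray{<:Any,3}, pool_size=2, stride=2)
    d, h, w = size(img)
    out_h = outsize(h, pool_size, stride)
    out_w = outsize(w, pool_size, stride)
    result = zeros(eltype(img), d, out_h, out_w)
    for k in 1:d
        result[k, :, :] = maxpool_sketch(@view(img[k, :, :]), pool_size, stride)
    end
    return result
end

small = [1 2 3 4; 5 6 7 8; 9 10 11 12; 13 14 15 16]
println(maxpool_sketch(small))   # 2x2 windows, stride 2
```

Calling maxpool_sketch(small, 2, 1) slides the same window one pixel at a time, which gives a 3x3 result instead of a 2x2 one.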

Now let us take our beloved mandrill and apply max pool to one channel of it, say with pool size 10 and stride 2. (The images are shown at their actual output sizes.)


Pool size 11, stride 1. Note that this image is bigger because our stride is smaller.
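To see how the stride drives the output size, here is a small sketch that compares the two settings. It rests on assumptions that are not from the post: window_count is a made-up helper that counts every position where a full window fits (one common convention), and the side length of 512 is picked purely for illustration.

```julia
# Hypothetical helper: number of positions where a full window fits along one axis.
window_count(n, pool_size, stride) = div(n - pool_size, stride) + 1

n = 512  # assumed image side length, for illustration only

for (pool_size, stride) in [(10, 2), (11, 1)]
    side = window_count(n, pool_size, stride)
    println("pool ", pool_size, ", stride ", stride, " -> ", side, " pixels per side")
end
```

With a stride of 2 the window skips every other position, so the output has roughly half as many pixels per side as it does with a stride of 1.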


Average pooling

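Okay, how about average pooling? We only need to change a little of the previous code. We create a view into the current window and then, instead of the maximum, we take the sum over the two spatial dimensions and divide it by the number of pixels in the window (aka the average), so each channel gets its own mean.

r = (i-1)*stride+1 : (i-1)*stride+pool_size   # rows of the current window
c = (j-1)*stride+1 : (j-1)*stride+pool_size   # columns of the current window
temp = @view img[:, r, c]

result[:, i, j] .= vec(sum(temp, dims=(2, 3))) ./ (size(temp, 2) * size(temp, 3))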
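For completeness, here is a self-contained sketch of average pooling with a depth dimension. It is an illustration under assumptions, not the post's exact code: avgpool_sketch is a made-up name, the layout is assumed to be (depth, height, width), and the output size is assumed to count every position where a full window fits.

```julia
# Average pooling over a (depth, height, width) array; each channel is averaged on its own.
function avgpool_sketch(img::AbstractArray{<:Real,3}, pool_size=2, stride=2)
    d, h, w = size(img)
    out_h = div(h - pool_size, stride) + 1
    out_w = div(w - pool_size, stride) + 1
    result = zeros(d, out_h, out_w)
    for i in 1:out_h, j in 1:out_w
        r = (i-1)*stride+1 : (i-1)*stride+pool_size   # rows of the current window
        c = (j-1)*stride+1 : (j-1)*stride+pool_size   # columns of the current window
        window = @view img[:, r, c]
        # Sum over the spatial dimensions only, then divide by the pixels per window.
        result[:, i, j] .= vec(sum(window, dims=(2, 3))) ./ (pool_size^2)
    end
    return result
end

# A tiny 2-channel, 4x4 test array filled with 1.0 through 32.0.
img = reshape(collect(1.0:32.0), 2, 4, 4)
pooled = avgpool_sketch(img)
println(size(pooled))
```

Using a view avoids copying the window on every iteration, which matters more as the pool size grows.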

Pool size 10 and stride 1.


See the difference? This is one reason max pool is more commonly used: average pool tends to blur out fine detail.
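A tiny numeric example makes the blurring concrete. This is a sketch with made-up values: a 2x2 window that is dark except for one bright pixel.

```julia
# A 2x2 window that is dark except for one bright pixel (values are illustrative).
window = [0 0; 0 8]

max_value = maximum(window)               # keeps the bright pixel as it is
avg_value = sum(window) / length(window)  # spreads the bright pixel over the window

println("max pool: ", max_value, ", average pool: ", avg_value)
```

Max pooling passes the bright pixel through unchanged, while average pooling dilutes it with its dark neighbors.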