Deconstructing Deep Learning + δeviations

Drop me an email
| RSS feed link : Click

Format :
Date | Title

TL; DR

Go to index

# Pooling layers

### Reading time : ~7 mins

## Max pool

### With depth

### Without depth

## Average pooling

Related posts:
FP16
AI Superpowers Kai Fu Lee
Digital Minimalism Cal Newport
More Deep Learning, Less Crying - A guide
Super resolution
Federated Learning
Taking Batchnorm For Granted
A murder mystery and Adversarial attack
Thank you and a rain check
Pruning

by Subhaditya Mukherjee

Here we will look at the pooling operation and its types.

Okay so what is a pooling layer? Let us take this matrix.

```
1 2
3 4
```

If we take max pool the output should be 4. Simple enough? That is pretty much it.

The challenge is to apply it to a whole image while taking into account different pooling sizes, different image depths (2 for gray 3 for color) as well as different strides (how many columns to move while doing a pool).

Let us get to it.

We first check if the image has a depth or not (aka does it have more than 2 channels). Note that most of the code remains the same for both except a few minor changes (I have to reduce the code size and remove redundancy but for the sake of this example I willl leave it as it is).

Okay. Case 1. Where there is a depth. We first calculate the size of the output image so we can pre allocate an array.

\[output_{height} = \frac{input_{height} - poolsize}{stride} ; output_{width} = \frac{input_{width} - poolsize}{stride}\]Once we have that, we can allocate an array with those dimensions.

```
function maxpool(img, pool_size=2,stride = 2,depth=false)
if depth == true
input_d , input_h, input_w = size(img)
output_h = Integer((input_h - pool_size)/stride )
output_w = Integer((input_w - pool_size)/stride )
result = zeros(input_d,output_h, output_w)
```

Now for the main loop. We iterate over the width and height of the output, and since we have a depth, we can use the : index (which takes all the channels) , i, j. Then we can allocate this to the maximum of the current window we are considering.

```
for i in collect(1:output_h),j in collect(1:output_w)
result[:, i,j] .= maximum(img[:, i*stride:i*stride+pool_size, j*stride:j*stride+pool_size])
end
```

If we do not have a depth, then we simply ignore the : and use the rest of it.

```
else
input_h, input_w = size(img)
output_h = Integer((input_h - pool_size)/stride )
output_w = Integer((input_w - pool_size)/stride )
result = zeros(output_h, output_w)
for i in collect(1:output_h),j in collect(1:output_w)
result[i,j] = maximum(img[i*stride:i*stride+pool_size, j*stride:j*stride+pool_size])
end
end
return result
end
```

Now if we take our beloved mandrill and apply max pool to one channel of it. Say pool size 10 and stride 2. (These are the actual output sizes)

Pool size 11, stride 1. Note that this image is bigger because our stride is lesser.

Okay how about average pooling? We just need to change a bit from the previous code. We create a view into the current window, then instead of maximum, we take the sum and divide it by the total number of elements in the window. (aka average)

```
temp = @view img[:, i*stride:i*stride+pool_size, j*stride:j*stride+pool_size]
result[:, i,j] .= sum(temp)/prod(size(temp))
```

Pool size 10 and stride 1.

See the difference? This is why max pool is more commonly used. Average pool blurs out too much of the information.