Deconstructing Deep Learning + δeviations
by Subhaditya Mukherjee
Paper notes for the paper
[27] HRNet (WIP)
We pretrain our network, which is augmented by a classification head shown in Figure 11, on ImageNet. The classification head is described as below. First, the four-resolution feature maps are fed into a bottleneck and the output channels are increased from C, 2C, 4C, and 8C to 128, 256, 512, and 1024, respectively. Then, we downsample the high-resolution representation by a 2-strided 3x3 convolution outputting 256 channels and add it to the representation of the second-high-resolution. This process is repeated two times to get 1024 feature channels over the small resolution. Last, we transform the 1024 channels to 2048 channels through a 1x1 convolution, followed by a global average pooling operation. The output 2048-dimensional representation is fed into the classifier.
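To make the channel and resolution bookkeeping in that description easier to follow, here is a rough shape-tracing sketch in plain Python. It does not build the network; it only tracks (channels, height, width) through the head. Several things here are my assumptions rather than details from the passage: the function names `conv_out` and `trace_classification_head` are made up, the four branches are assumed to run at 1/4, 1/8, 1/16 and 1/32 of the input size, the 3x3 stride-2 convolutions are assumed to use padding 1, and `c=32` and `input_size=224` are just illustrative defaults.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_classification_head(c=32, input_size=224):
    """Return a list of (step_name, shape) pairs, shape = (channels, h, w)."""
    # Assumption: branch resolutions are 1/4, 1/8, 1/16, 1/32 of the input.
    sizes = [input_size // s for s in (4, 8, 16, 32)]
    trace = []

    # Branch outputs of the backbone: C, 2C, 4C, 8C channels.
    inputs = [(ch, s, s) for ch, s in zip((c, 2 * c, 4 * c, 8 * c), sizes)]
    trace.append(("input", inputs))

    # Step 1: a bottleneck per branch widens channels to 128/256/512/1024
    # and leaves the spatial size unchanged.
    feats = [(ch, s, s) for ch, s in zip((128, 256, 512, 1024), sizes)]
    trace.append(("bottleneck", feats))

    # Step 2: downsample the running map with a stride-2 3x3 convolution
    # (padding 1 assumed) and add it to the next lower-resolution branch.
    # One initial step plus two repeats gives three steps in total.
    current = feats[0]
    for nxt in feats[1:]:
        down = (nxt[0],
                conv_out(current[1], 3, 2, 1),
                conv_out(current[2], 3, 2, 1))
        if down != nxt:
            raise ValueError(f"cannot add {down} to {nxt}; "
                             "try an input size divisible by 32")
        current = nxt  # element-wise addition keeps the shape
        trace.append(("downsample+add", current))

    # Step 3: a 1x1 convolution maps 1024 channels to 2048.
    current = (2048, current[1], current[2])
    trace.append(("conv1x1", current))

    # Step 4: global average pooling collapses the spatial dimensions,
    # leaving the 2048-dimensional vector fed to the classifier.
    trace.append(("global_avg_pool", (2048, 1, 1)))
    return trace


if __name__ == "__main__":
    for name, shape in trace_classification_head():
        print(name, shape)
```

Under these assumptions, the running map goes through 256, 512 and 1024 channels while the spatial size halves each time, and the head ends in a 2048-dimensional vector. The shape check inside the loop is there because the addition only works when the downsampled map matches the next branch exactly.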