
More Deep Learning, Less Crying - A guide

Reading time : ~15 mins

by Subhaditya Mukherjee

This is a guide to make deep learning less messy, and hopefully give you a way to use fewer tissues next time you code.

Who is this article for? A checklist

If you can answer yes to most of these, read on. Or cry. Your choice, of course.

Oh yay. You made it here. Wipe your eyes for one last time because this will be a ride :)

PS. This might be a long-ish checklist, but trust me, it will save you many tears. A note: this material was compiled from way too many papers and slides, so I do not have a proper citation for every statement here. In the references section you can find a list of all the sources I could track down.

What is covered here?

In this article, I have tried to cover the major parts that frustrate me on a daily basis and their potential solutions.

Sensible defaults

Contrary to popular belief, most of the time we can get pretty great results by using sensible default values, or by sticking to simpler architectures before reaching for a complicated one and messing everything up.


Let us look at some defaults we can fall back on while building a network. Note that these go from easy -> complicated.
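To make the "start simple with defaults" idea concrete, here is a minimal sketch in PyTorch. The specific choices (a small MLP, ReLU, Adam at lr=3e-4) are common community defaults, not values prescribed by this article — treat them as a starting point, not gospel.

```python
import torch
import torch.nn as nn

# A deliberately boring baseline: a small MLP before anything exotic.
# Every choice below is a widely used default, not a tuned value.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),            # ReLU: the default nonlinearity
    nn.Linear(256, 10),
)

# Adam with lr=3e-4 is a popular "it just works" starting point.
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

# Sanity-check the forward pass on a fake batch before training.
x = torch.randn(32, 784)
logits = model(x)
print(logits.shape)  # torch.Size([32, 10])
```

Only once this boring version trains cleanly is it worth swapping in fancier architectures.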

Training choices

What about training? Once you have everything set up, you might be faced with endless options. What do you stick to?
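As one concrete option, here is a minimal training loop with unexciting-but-solid choices: plain cross-entropy and Adam at its usual starting rate. The linear model and random data are stand-ins purely for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model and data; swap in your own.
model = nn.Linear(20, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 20)          # fake inputs
y = torch.randint(0, 3, (64,))   # fake labels

losses = []
for epoch in range(50):
    optimizer.zero_grad()        # 1. clear stale gradients
    loss = loss_fn(model(x), y)  # 2. forward pass + loss
    loss.backward()              # 3. backprop
    optimizer.step()             # 4. weight update
    losses.append(loss.item())
```

The four-step rhythm (zero, forward, backward, step) is the part worth memorizing; everything else is swappable.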

Tricks to use while training

Now this is just beautiful. Do give this paper by Tong He et al. a read. It’s amazing and covers these points in detail. So instead of repeating the content, I have just given a tiny brief.
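One trick from that family that is tiny to implement is label smoothing: instead of hard one-hot targets, spread a little probability mass over all classes so the model is discouraged from becoming overconfident. A NumPy sketch; eps=0.1 is the value commonly seen in the literature, not something fixed by this article.

```python
import numpy as np

def smooth_labels(labels, num_classes, eps=0.1):
    """Replace hard one-hot targets with (1 - eps) on the true class
    and eps / num_classes spread over every class."""
    one_hot = np.eye(num_classes)[labels]
    return one_hot * (1.0 - eps) + eps / num_classes

print(smooth_labels(np.array([1]), 3))
# [[0.03333333 0.93333333 0.03333333]]
```

Each row still sums to 1, so it drops straight into a cross-entropy loss.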


Have too many to choose from? Here are some you can look at in order of importance. (Thank you Josh Tobin).


Some of the most common bugs we might face and how to begin solving them.

Tackling out of memory errors

Sometimes your GPU starts cursing at you. Sometimes it’s your fault. Sometimes you just forgot to clear the cache. This is for the other times.

Your tensors are too big

You have stuffed it with too much data

You are doing the same thing too many times (Duplicated operations)
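For the "too much data in one go" case, one common fix is gradient accumulation: run several micro-batches and step the optimizer once, so the effective batch size stays large while peak memory stays small. A sketch on CPU with stand-in data; the check at the end shows the accumulated gradient matches the full-batch one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# If a batch of 64 blows up memory, run 4 micro-batches of 16 instead.
# Gradients add up across backward() calls, so dividing each loss by
# the number of micro-batches reproduces the big-batch gradient.
model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

x = torch.randn(64, 10)
y = torch.randn(64, 2)

accum_steps = 4
optimizer.zero_grad()
for micro_x, micro_y in zip(x.chunk(accum_steps), y.chunk(accum_steps)):
    loss = loss_fn(model(micro_x), micro_y) / accum_steps
    loss.backward()  # gradients accumulate in .grad
accum_grad = model.weight.grad.clone()

# Sanity check (feasible here on CPU): the accumulated gradient
# equals the gradient of the full batch computed in one go.
optimizer.zero_grad()
loss_fn(model(x), y).backward()
print(torch.allclose(accum_grad, model.weight.grad, atol=1e-4))  # True

optimizer.step()  # one weight update for the whole logical batch
```

For duplicated operations, the usual suspects are forgetting `torch.no_grad()` during evaluation and recomputing tensors inside the loop that could be built once outside it.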

Single batch overfitting

Want a quick way to identify a bunch of errors? Just pass the same data batch again and again, and check for these signs. (Talk about a hack.) Basically, do the opposite if any of these happen.
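The hack itself fits in a few lines. A PyTorch sketch with stand-in model and data: train on one fixed batch and watch the loss. A healthy model/loss/optimizer combo should drive it toward zero; if it plateaus, explodes, or climbs, there is a bug to hunt before touching the full dataset.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 10)           # ONE fixed batch, reused every step
y = torch.randint(0, 2, (8,))

losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# On a healthy setup this should be close to zero.
print(f"final loss: {losses[-1]:.4f}")
```

If the loss refuses to go to ~0 on eight memorizable samples, no amount of data will save you.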

How well do you fit?

No, I am not talking about that snazzy dress you got before the lockdown.


Your model cries over test data (overfitting).


Your model just cries anyway (underfitting).

A good choice for either
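The usual way to tell which kind of crying you are dealing with is to compare training and validation loss. A rough sketch — the function name and thresholds are illustrative assumptions, and sensible values depend entirely on your task:

```python
def diagnose_fit(train_loss, val_loss, gap_tol=0.1, high_loss=1.0):
    """Rough fit diagnosis from final losses. gap_tol and high_loss
    are placeholder thresholds -- tune them to your own loss scale."""
    if train_loss > high_loss:
        # Bad even on data it has seen: cries anyway.
        return "underfitting: add capacity, train longer, reduce regularization"
    if val_loss - train_loss > gap_tol:
        # Fine on train, bad on held-out data: cries over test data.
        return "overfitting: regularize, augment, or get more data"
    return "reasonable fit"

print(diagnose_fit(train_loss=0.2, val_loss=0.9))   # overfitting branch
print(diagnose_fit(train_loss=1.5, val_loss=1.6))   # underfitting branch
```

The point is not the thresholds but the habit: always look at both curves before deciding what to change.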

Okay cool.. now what?

Well, that about covers what I wanted to say here. It is by no means an exhaustive list, but that’s why we have Stack Overflow, right? I sincerely hope this helped you out a bit, and made you feel a bit more confident. Do let me know! You can always reach out in the comments or connect with me from my website.

