Deconstructing Deep Learning + δeviations

Drop me an email
| RSS feed link : Click

Format :
Date | Title

TL; DR

Go to index

# Linear Model

### Reading time : ~4 mins

Related posts:
FP16
AI Superpowers Kai Fu Lee
Digital Minimalism Cal Newport
More Deep Learning, Less Crying - A guide
Super resolution
Federated Learning
Taking Batchnorm For Granted
A murder mystery and Adversarial attack
Thank you and a rain check
Pruning

by Subhaditya Mukherjee

Let us start the fun with a simple model which will be extended to fit complex needs.

The first and easiest model we can think of to start with is the Dense or fully connected Linear layer model. Once we have this working, this can be extended further. So what is this layer? Simply put, W*x.+b where W and b are the initialized weights and biases. For now these are random. Later we will check out Kaiming/glorot initialization instead.

Let us make W and b first. And let us also take a random x, y array so we can perform the task at hand.

```
W = rand(2,5)
b = rand(2)
x,y = rand(5),rand(2)
```

Okay now that we have this. We need to find the gradients aka backprop. In Julia, we use a library called Zygote. Basically this is a differential programming library which will allow us to calculate the gradients on the fly and pretty easily. To do this we use.

```
gs = gradient(() -> loss(x, y), Params([W, b]))
```

Once we have this, we can use it to find the updated weights. We can now apply our linear layer that we defined earlier. We take the previous weight and update it as α.*W̄ where α is the learning rate. An extremely important parameter which we will keep coming back to as we go along.

```
W̄ = gs[W]
W.= α.*W̄
ŷ = W*x.+b
```

Okay now on to finding exactly how good our model did. To do this we define a simple loss function. Here we will use Mean Squared error. It is the simplest metric and all it does is find the square of the mean of the differences between the predicted and original. This is also simple to define.

```
sum((y-ŷ).^2)
```

So now we have a proper loop! Which actually runs and gives us a minimized loss! We run this bit as many times as needed to get results.

```
for a in collect(1:100)
gs = gradient(() -> loss(x, y), Params([W, b]))
W̄ = gs[W]
W.= α.*W̄
ŷ = W*x.+b
@info sum((y-ŷ).^2)
end
```