Deconstructing Deep Learning + δeviations

Drop me an email
| RSS feed link : Click

Format :
Date | Title

TL; DR

Index page

# swish

# Notes

## Tips for activation fns

**[24]** swish
- Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. arXiv preprint arXiv:1710.05941. Paper

- Supposedly does better than selu and relu sometimes (Nope)
- Slower than relu
- f (x) = x · sigmoid(βx)
- found using RL techniques using "core units"
- simple search space inspired by the optimizer search space of Bello et al. (2017) that composes unary and binary functions to construct the activation function
- activation function is constructed by repeatedly composing the the “core unit”, which is defined as b(u1 (x1 ), u2 (x2 )). The core unit takes in two scalar inputs, passes each input independently through an unary function, and combines the two unary outputs with a binary function that outputs a scalar

- Complicated activation functions consistently underperform simpler activation functions, potentially due to an increased difficulty in optimization.
- A common structure shared by the top activation functions is the use of the raw preactiva- tion x as input to the final binary function
- The searches discovered activation functions that utilize periodic functions, such as sin and cos.
- Functions that use division tend to perform poorly because the output explodes when the denominator is near 0