The Unreasonable Popularity of TensorFlow

Here I look at how TensorFlow is gaining momentum over other competing frameworks.

Gradient Noise Injection Is Not So Strange After All

Yesterday, I wrote about a gradient noise injection result at ICLR 2016, and noted that the authors of the paper, despite detailed experimentation, were very wishy-washy in their explanation of why it works. Fortunately, my Twitter friends, particularly Tim Vieira and Shubhendu Trivedi, grounded this much better than the authors themselves! Shubhendu pointed out that Rong Ge (of MSR) and friends tried this in the context of tensor decomposition in 2015 (at some point I should write about the connection between backprop and matrix factorization). Algorithm 1 in that paper is pretty much the update equation of the recent ICLR paper (modulo the actual values of the constants). Ge, R., Huang, F., Jin, C. and Yuan, Y., 2015. Escaping From Saddle Points---Online Stochastic Gradient for Tensor Decomposition. arXiv preprint arXiv:1503.02101. Shubhendu also added that this goes even further back in the literature. And indeed it does. Tim pointed to an optimization paper from 1993 where they call gradient noise …
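To make the comparison concrete, here is a minimal sketch of my reading of that update (the function name `ge_noisy_gd_step` and its parameters are mine, not the paper's): the perturbation is a uniform draw from the unit sphere, whereas the ICLR paper uses zero-mean Gaussian noise with a decaying variance, but the update otherwise has the same shape.

```python
import numpy as np

def ge_noisy_gd_step(w, grad, lr=0.01, rng=np.random):
    """One noisy gradient step in the spirit of Ge et al. (2015), Algorithm 1.

    The perturbation xi is sampled uniformly from the unit sphere; adding it
    to the gradient is what lets the iterate escape saddle points.
    """
    xi = rng.normal(size=w.shape)
    xi /= np.linalg.norm(xi)  # normalizing a Gaussian draw gives a uniform direction
    return w - lr * (grad + xi)
```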

Make your Stochastic Gradient Descent more Stochastic

Results in Deep Learning never cease to surprise me. One ICLR 2016 paper from the Google Brain team suggests a simple one-line code change to improve your parameter estimation across the board: add Gaussian noise to the computed gradients. Typical SGD updates parameters by taking a step in the direction of the gradient (simplified):

$\theta_{t+1} \leftarrow \theta_t - \alpha \, \nabla_\theta L(\theta_t)$

Instead of doing that, the suggestion is to add a small random noise to the update:

$\theta_{t+1} \leftarrow \theta_t - \alpha \, \big( \nabla_\theta L(\theta_t) + \epsilon_t \big), \quad \epsilon_t \sim N(0, \sigma_t^2)$

Further, $\sigma_t^2$ is prescribed to be:

$\sigma_t^2 = \frac{\eta}{(1 + t)^{0.55}}$

and $\eta$ is one of $\{0.01, 0.3, 1.0\}$! Stop. Stare at that for a while. Enjoy this magic. As with such things, the authors give no theoretical justification other than showing it to work on a variety of networks (kudos for that) and a hand-wavy connection to simulated annealing, but examining the expression for $\sigma_t^2$ should tell you that the additive noise is highest at the beginning and has little to no effect during later stages of training. Like other methods for careful initialization, this should be effective in breaking symmetries and getting the training …
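A minimal sketch of what that one-line change could look like in plain NumPy (the function `noisy_sgd_step` and its parameter names are mine; the decay exponent 0.55 and the choices of η are the values reported in the paper):

```python
import numpy as np

def noisy_sgd_step(theta, grad, t, lr=0.01, eta=0.3, gamma=0.55, rng=np.random):
    """One SGD step with annealed Gaussian gradient noise.

    sigma_t^2 = eta / (1 + t)^gamma, so the injected noise is largest
    early in training and fades to almost nothing later on.
    """
    sigma = np.sqrt(eta / (1.0 + t) ** gamma)
    noise = rng.normal(0.0, sigma, size=grad.shape)  # the "one-line change"
    return theta - lr * (grad + noise)
```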

Should you get the new NVIDIA DGX-1 for your startup/lab?

Is a $129,000 computer an investment or a distraction for your lab?

Science, Practice, and Frustratingly Simple Ideas

Yesterday, I wrote (excitedly) about stochastic depth in neural networks. The reactions I saw for that paper ranged from "dang! I should've thought of that" to, umm, shall we say, annoyed? This reaction is not surprising at all. The idea was one of those "Frustratingly Simple" ideas that worked. If you read the paper, there is no new theory or model there. Nor do the authors spend much time on why things work, other than a hand-wavy explanation involving ensembles. Critics might question whether there was any "contribution to science" -- I'm sure some reviewers will. The fact of the matter is nobody knows *exactly* why this works. My guess is this: a lot of the regularization we are seeing probably comes from preventing the layers from co-adapting with each other. Just as dropout discourages adjacent layers from co-adapting with each other, my guess is stochastic depth discourages entire subsets of layers from co-adapting with each other. No doubt, there is an army of …
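To make that intuition concrete, here is a minimal sketch of a stochastic-depth forward pass (plain Python/NumPy; the function and parameter names are mine, and the survival probability and test-time rescaling follow the usual recipe): whole residual branches are skipped at random during training, much as dropout skips individual units.

```python
import numpy as np

def stochastic_depth_forward(x, blocks, p_survive=0.8, training=True, rng=np.random):
    """Forward pass through residual blocks with stochastic depth.

    Each block computes x + f(x); during training the f(x) branch is
    skipped entirely with probability 1 - p_survive, and at test time
    it is kept but scaled by p_survive (analogous to dropout rescaling).
    """
    for f in blocks:
        if training:
            if rng.rand() < p_survive:
                x = x + f(x)
            # else: the whole block is dropped for this minibatch
        else:
            x = x + p_survive * f(x)
    return x
```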

Stochastic Depth Networks will Become the New Normal

A new way to train robust networks.

Chances are your Models are Racist, Sexist, or both

How one consulting experience sent me tumbling down a fairness rabbit hole, and the lessons I learned from it.

Word Embedding as a Learning To Rank Problem

Can we embed words by framing embedding as a ranking problem?

Swivel by Google – a bizarre word embedding paper

A new word embedding paper came out of Google that promises to look at things missed by Word2Vec and GloVe, provide better understanding, and produce better embeddings.

On Reading Papers

One of the most important skills I learned in grad school was to break down any complex paper into bite-sized parts, digest them, and ask critical questions about them. On the face of it, it sounds like an obvious skill, right? But just as knowing how to talk doesn't make you a good public speaker, knowing how to read doesn't automatically prepare you to read (and absorb) papers.

Copyright © 2021. Delip Rao