Tag Archives | Deep Learning
In this post, I will talk about Language Models, when (and when not) to use LSTMs for language modeling, and some state-of-the-art results. While I mostly discuss the “Exploring Limits” paper, I’m adding a few elementary (for some) things here for the sake of completeness. The Exploring Limits paper is not new, but I think it’s a good illustration […]
The Unreasonable Popularity of TensorFlow
In this post, I will look at how TensorFlow has gained momentum over competing projects. Unless you’re living away from all of this on a beach (or under a rock if you wish), you already know TensorFlow is a Computational Graph framework, and you hear it being tossed around in the context of Deep Learning/Neural Networks. I […]
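For anyone to whom “computational graph” is an unfamiliar term, here is a toy sketch of the define-then-run model (written against the TensorFlow 1.x API of the time): building the graph only records operations, and nothing executes until a session runs it.

```python
import tensorflow as tf  # TensorFlow 1.x-era API

# Building the graph: these lines only record operations.
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a * b + 1.0  # still a symbolic node, not the number 7.0

# Executing the graph: a Session evaluates the requested node.
with tf.Session() as sess:
    print(sess.run(c))  # prints 7.0
```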
Make your Stochastic Gradient Descent more Stochastic
Results in Deep Learning never cease to surprise me. One ICLR 2016 paper from the Google Brain team suggests a simple one-line code change that improves your parameter estimation across the board: adding Gaussian noise to the computed gradients. Typical SGD updates parameters by taking a step in the direction of the gradient (simplified): […]
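As a rough illustration (not the paper’s code; the function name and defaults below are my own), here is what the noisy update looks like in NumPy. As I recall, the paper anneals the noise variance over time rather than keeping it fixed; treat the exact schedule and constants as assumptions.

```python
import numpy as np

def noisy_sgd_step(params, grads, lr, t, eta=0.01, gamma=0.55):
    """One SGD step with annealed Gaussian gradient noise.

    The noise variance decays as eta / (1 + t)**gamma, the schedule
    suggested in the paper (eta and gamma values here are assumptions).
    """
    sigma = np.sqrt(eta / (1.0 + t) ** gamma)
    return [p - lr * (g + np.random.normal(0.0, sigma, size=g.shape))
            for p, g in zip(params, grads)]
```

The one-line change is the `np.random.normal` term added to each gradient; drop it and you are back to vanilla SGD.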
Should you get the new NVIDIA DGX-1 for your startup/lab?
NVIDIA announced the DGX-1, their new “GPU supercomputer”. The spec is impressive; performance, even more so (training AlexNet in 2 hours on 1 node). It costs $129K and draws around 3 kW. That’s like keeping an oven going. The best GPU config you can currently rent from AWS (also the cheapest per hour) is g2.8xlarge: so for $129K you […]
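A back-of-the-envelope comparison, for a sense of scale (the AWS and electricity rates below are assumptions for illustration, not quotes):

```python
DGX1_PRICE = 129_000   # USD, from NVIDIA's announcement
AWS_HOURLY = 2.60      # USD/hr for on-demand g2.8xlarge (assumed rate)
POWER_KW = 3.0         # approximate DGX-1 power draw
KWH_PRICE = 0.12       # USD per kWh (assumed utility rate)

aws_hours = DGX1_PRICE / AWS_HOURLY
print(f"$129K buys ~{aws_hours:,.0f} g2.8xlarge hours "
      f"(~{aws_hours / (24 * 365):.1f} years of round-the-clock use)")

power_cost_per_year = POWER_KW * 24 * 365 * KWH_PRICE
print(f"Running the DGX-1 nonstop: ~${power_cost_per_year:,.0f}/yr in electricity")
```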
Science, Practice, and Frustratingly Simple Ideas
Yesterday, I wrote (excitedly) about stochastic depth in neural networks. The reactions I saw to that paper ranged from “dang! I should’ve thought of that” to, umm, shall we say, annoyed. This range of reactions is not surprising at all. The idea was one of those “Frustratingly Simple” ideas that worked. If you read the paper, there […]
Stochastic Depth Networks will Become the New Normal
… in deep learning, that is. Update: this post apparently made a lot of people mad. Check out my next post after this :-) Every day, a half dozen or so new deep learning papers come out on arXiv, but very few catch my eye. Yesterday, I read about “Deep Networks with Stochastic Depth”. I think, like dropout […]
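To make the idea concrete, here is a minimal sketch of the core mechanism (the names and defaults are mine, assuming a residual architecture as in the paper): during training, whole blocks are randomly skipped.

```python
import numpy as np

def residual_block_stochastic(x, f, survival_prob, training=True):
    """Residual block with stochastic depth.

    During training, the block's transformation f is kept with
    probability `survival_prob` and skipped (identity) otherwise.
    At test time f is always applied, scaled by survival_prob,
    mirroring the inference rule described in the paper.
    """
    if training:
        if np.random.rand() < survival_prob:
            return x + f(x)       # block survives this pass
        return x                  # block dropped: pure identity
    return x + survival_prob * f(x)  # expected transformation at test time
```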
Universal Function Approximation using TensorFlow
A multilayer neural network with even a single hidden layer can approximate any continuous function (on a bounded domain) to arbitrary accuracy. This universal function approximation property of multilayer perceptrons was first noted by Cybenko (1989) and Hornik (1991). In this post, I will use TensorFlow to implement a multilayer neural network (also known as a multilayer perceptron) to learn arbitrary Python lambda expressions. (more…)
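The post’s own code sits behind the link, but a minimal version of such a network looks roughly like this (written against the TensorFlow 1.x API of the time; the target lambda, layer size, and hyperparameters are illustrative):

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x-era API

# Target: an arbitrary Python lambda, sampled on [-1, 1]
target = lambda v: np.sin(3 * v) + 0.5 * v

x_train = np.linspace(-1, 1, 256).reshape(-1, 1).astype(np.float32)
y_train = target(x_train)

HIDDEN = 64  # width of the single hidden layer
x = tf.placeholder(tf.float32, [None, 1])
y = tf.placeholder(tf.float32, [None, 1])

W1 = tf.Variable(tf.random_normal([1, HIDDEN], stddev=0.5))
b1 = tf.Variable(tf.zeros([HIDDEN]))
W2 = tf.Variable(tf.random_normal([HIDDEN, 1], stddev=0.5))
b2 = tf.Variable(tf.zeros([1]))

hidden = tf.nn.sigmoid(tf.matmul(x, W1) + b1)  # the one hidden layer
y_hat = tf.matmul(hidden, W2) + b2

loss = tf.reduce_mean(tf.square(y_hat - y))
train_op = tf.train.AdamOptimizer(0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(2000):
        _, cur_loss = sess.run([train_op, loss],
                               feed_dict={x: x_train, y: y_train})
    print(f"final MSE: {cur_loss:.5f}")
```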