TLDR: No! Your Machine Translation Model is not "prophesying", but let's look at the six major issues with neural machine translation (NMT).
This post is an excerpt from the final chapter of our upcoming book on Deep Learning and NLP with PyTorch. The book is still a draft under review, so your comments on this section are appreciated!
Two exciting NLP papers at ICML 2018! The ICML 2018 accepts are out, and I am excited about two papers that I will briefly outline here. I think both papers are phenomenally good and will bring structured prediction in NLP back to modern deep learning architectures.
Different gender error rates in speech products exist mainly because of 1) our lack of better models even when the data is balanced, and 2) the inherent hardness of the problem.
I review a recent systems paper from Google, explain why it is a wake-up call to the industry, and discuss the recipe it provides for nonlinear product thinking.
In this post, I want to highlight two recent complementary results on transfer learning applied to audio -- one related to music, the other related to speech.
We are preparing for the second edition of our PyTorch-based Deep Learning for NLP training. It's a two-day affair, crammed with a lot of learning and hands-on model building, where we get to perform the intricate dance of introducing the topics from the ground up while still making sure folks are not far from the state of the art. Compared to our first attempt in NYC this year, we are adding new content and changing existing content to explain some basic ideas better. One subtopic I am quite excited to add is a discussion of "When to use Deep Learning for NLP and when not to". This post expands on that.
Sometimes it's useful to put people in boxes to understand where they are coming from and the conversations they like to have. Let's talk about my tribe -- the NLP folks.
In this post, I will talk about language models, when (and when not) to use LSTMs for language modeling, and some state-of-the-art results.
In the previous post, we saw how the backprop algorithm itself is a bottleneck in training, and how the Synthetic Gradient approach proposed by DeepMind reduces or avoids network locking during training. While very clever, there is something unsettling about the solution. It seems very contrived, and definitely resource intensive. For example, a simple feed-forward network under the scheme has a Rube-Goldbergesque feel to it (image courtesy: Jaderberg et al., 2016 -- a fully unlocked feed-forward net using DNI). Every time you see a solution that looks unnatural, you want to go back and ask whether we are solving the right problem, or even asking the right question. Naturally, this raises the question: is backprop the right way to train neural networks? To answer this, let's step back a few paces. All machine learning algorithms are solving one kind of optimization problem or another. A majority of those optimization problems (esp. those involving real-world tasks) are non-convex.
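To make the "unlocking" idea concrete, here is a minimal PyTorch sketch of a synthetic-gradient style update. The `SyntheticGradient` module, the layer sizes, and the stand-in "true" gradient below are all illustrative assumptions for this post, not the paper's reference implementation; the only point is that a layer can update from a *predicted* gradient instead of waiting for the rest of the network's backward pass.

```python
import torch
import torch.nn as nn

class SyntheticGradient(nn.Module):
    """Tiny model that predicts dLoss/dh for a layer's output h (illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h):
        return self.net(h)

layer = nn.Linear(32, 64)                 # the layer we want to "unlock"
dni = SyntheticGradient(64)               # its local gradient predictor
opt_layer = torch.optim.SGD(layer.parameters(), lr=0.1)
opt_dni = torch.optim.SGD(dni.parameters(), lr=0.1)

x = torch.randn(8, 32)

# (1) Update the layer immediately with the *predicted* gradient --
#     no waiting for downstream layers to finish their forward/backward pass.
h = layer(x)
with torch.no_grad():
    predicted_grad = dni(h)
opt_layer.zero_grad()
h.backward(predicted_grad)
opt_layer.step()

# (2) Whenever the true gradient w.r.t. h eventually arrives from downstream,
#     train the predictor to match it (a random stand-in here, for brevity).
true_grad = torch.randn_like(h)
opt_dni.zero_grad()
dni_loss = ((dni(h.detach()) - true_grad) ** 2).mean()
dni_loss.backward()
opt_dni.step()
```

In the full DNI setup every layer gets such a predictor, which is exactly why the diagram above looks so Rube-Goldbergesque.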