Guide to LSTMs for beginners.

Ritesh Ranjan
Published in Analytics Vidhya · 4 min read · Mar 19, 2020

In this blog, you will learn what an LSTM is, why we need it, and get an overview of the internal architecture of an LSTM.


Before moving on to LSTMs, you need to know what RNNs are. This will help you understand why LSTMs were needed in the first place.

Brief Overview of RNNs:

Traditional neural networks were not designed to handle sequence data, so we needed an architecture that works where the data forms a sequence. Examples include the price of a share in the stock market, heartbeat readings, or a sentence. In each of these cases, if you want to predict tomorrow's share price, the heartbeat reading for the next minute, or the next word of the sentence, conventional neural networks fall apart. Hence a new architecture was needed, and RNNs were created.

All was good, but RNNs suffered from a major drawback. If the length of the sequence used to predict the next element is small, they work fine, but for longer sequences they fail. This happens because the gradient vanishes while the weights are updated during backpropagation. Therefore we need to modify RNNs so that they also preserve earlier context when it is required. This led to the creation of LSTMs. Below is the architecture of an RNN.
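To see why long sequences are a problem, here is a toy numerical sketch: backpropagating through time multiplies one gradient factor per timestep, and when those factors are below 1 the product shrinks exponentially. The factor value 0.5 is a made-up illustration, not a real trained network's gradient.

```python
def gradient_magnitude(factor, steps):
    """Product of `steps` identical per-timestep gradient factors."""
    grad = 1.0
    for _ in range(steps):
        grad *= factor  # each extra timestep multiplies in another factor
    return grad

print(gradient_magnitude(0.5, 5))   # 0.03125 — short sequence, gradient survives
print(gradient_magnitude(0.5, 50))  # ~8.9e-16 — long sequence, gradient vanishes
```

With 50 timesteps the gradient reaching the earliest weights is effectively zero, so the network cannot learn long-range dependencies.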

The image is taken from Colah’s blog.

RNNs have a loop, which allows them to preserve previous context and use it to predict the next word/timestep in the sequence. Each cell in the RNN chain contains a single neural layer.
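That "single neural layer" can be sketched in a few lines of NumPy: one tanh layer combining the previous hidden state with the current input, applied in a loop over timesteps. The weights here are random placeholders, not a trained network, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, input_size = 4, 3
W_h = rng.normal(size=(hidden_size, hidden_size))  # recurrent weights
W_x = rng.normal(size=(hidden_size, input_size))   # input weights
b = np.zeros(hidden_size)

def rnn_step(h_prev, x_t):
    # The single neural layer: h_t = tanh(W_h·h_prev + W_x·x_t + b)
    return np.tanh(W_h @ h_prev + W_x @ x_t + b)

h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):  # 5 timesteps of dummy input
    h = rnn_step(h, x_t)  # the loop carries context forward
```

The same `rnn_step` is applied at every timestep; the hidden state `h` is the only thing carried from one step to the next, which is exactly what makes long-range context fragile.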

Long Short-Term Memory (LSTM):

LSTMs have a structure similar to that of RNNs, but each cell contains four neural network layers that interact with one another. This helps the network capture long-term dependencies and removes the drawbacks of RNNs. Below is the diagram of an LSTM network.

The image is taken from Colah’s blog.

In this blog, I will not be explaining how LSTMs work; rather, I will explain only the architecture. To know the intuition behind LSTMs, please do read Colah’s blog, as he has explained it beautifully. The following will help you understand the notation used.

The image is taken from Colah’s blog.

In the above diagram, you can see a horizontal line running through the cell from one end to the other. This is the cell state. If you don’t understand all of this yet, I suggest you go through Colah’s blog first.
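For concreteness, the four internal layers and the cell state can be sketched in NumPy as follows. This is only an illustrative single-step sketch with random placeholder weights (following the equations in Colah’s blog), not a trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n, m = 4, 3  # hidden units, input features (arbitrary example sizes)

def layer_params():
    # Each layer sees [h_prev, x_t] concatenated, so weights are (n, n + m)
    return rng.normal(size=(n, n + m)), np.zeros(n)

(W_f, b_f), (W_i, b_i), (W_c, b_c), (W_o, b_o) = (layer_params() for _ in range(4))

def lstm_step(h_prev, c_prev, x_t):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z + b_f)        # 1) forget gate layer
    i = sigmoid(W_i @ z + b_i)        # 2) input gate layer
    c_tilde = np.tanh(W_c @ z + b_c)  # 3) candidate cell-state layer
    o = sigmoid(W_o @ z + b_o)        # 4) output gate layer
    c = f * c_prev + i * c_tilde      # the horizontal line: the cell state
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(n), np.zeros(n)
h, c = lstm_step(h, c, rng.normal(size=m))
```

Note that the cell state `c` is only ever scaled and added to, which is what lets information flow across many timesteps largely unchanged.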

Well, now that you know how all these layers work, let me explain what the LSTM architecture looks like once you define it in Keras. I will now define an LSTM in Keras and walk through the resulting architecture.

from tensorflow.keras.layers import Input, LSTM

timesteps, features = 10, 32  # example input shape
inputs = Input(shape=(timesteps, features))
lstm_layer1 = LSTM(64)(inputs)

This will create an LSTM layer with 64 units. The following diagram will explain it clearly. I will show you the architecture of one of its neural network layers; all four have the same architecture.

This shows how to calculate the number of parameters of an LSTM network.

For bidirectional LSTMs, just multiply this by 2 to get the number of parameters that will be updated during training.
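The parameter count described above can be written as a small helper function. Each of the four layers has a weight matrix of shape (units, units + features) plus a bias vector of length units; the feature size 32 below is just an example value, not something fixed by the LSTM(64) call itself.

```python
def lstm_param_count(units, features):
    # 4 layers, each with weights (units, units + features) and a bias (units,)
    return 4 * (units * (units + features) + units)

print(lstm_param_count(64, 32))      # 24832 for LSTM(64) on 32 input features
print(2 * lstm_param_count(64, 32))  # 49664 for the bidirectional version
```

You can check the first number against `model.summary()` in Keras for a model whose LSTM layer receives 32 features per timestep.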

The value between 0 and 1 that we get from this network is multiplied element-wise with the cell state. This decides how much we should remember from the previous timestep. To get a better understanding of what the other neural network layers do, please go through Colah’s blog.
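This element-wise gating is easy to see with concrete numbers. The cell-state and gate values below are made up purely for illustration; in a real LSTM they would come from the trained forget-gate layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

cell_state = np.array([2.0, -1.0, 0.5])
# Pre-activations chosen so the gate outputs are roughly 1, 0.5, and 0
gate = sigmoid(np.array([10.0, 0.0, -10.0]))
kept = gate * cell_state  # approximately [2.0, -0.5, 0.0]
```

A gate value near 1 keeps that element of the cell state fully, a value near 0 erases it, and anything in between keeps it partially.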

Thank you. If you have any doubts or need a more detailed explanation, feel free to contact me.

If you are new to ML and DL and have wondered what calibration means and why we even use it, then go through this blog.


Machine Learning Enthusiast. Writer in Towards Data Science, Analytics Vidhya, and AI In Plain English. LinkedIn: https://www.linkedin.com/in/riteshranjan11055/