Deep Learning: Perceptron and Multi-Layered Perceptron

Ritesh Ranjan · Published in Analytics Vidhya · Jan 24, 2021

In this article, I will explain what a perceptron and a multi-layered perceptron are, and the maths behind them.

Perceptron.

The above diagram is the building block of the whole of deep learning. Perceptrons bear a strong structural similarity to neurons: a perceptron takes input and gives output in much the same fashion as a neuron does. Hence the name neural network is generally used for the models in deep learning.

Perceptrons are the building blocks of all the architectures in deep learning.

The input given to the perceptron is the dot product of the weights and the input. The activation function takes this value and gives some output: if the output is greater than 0, then the final output (ŷ) is 1, else 0. You can choose any function as an activation function, for example sigmoid, tanh, ReLU, etc.
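A minimal sketch of this computation in NumPy, with made-up input and weight values (the numbers are only for illustration):

    import numpy as np

    def perceptron(x, w):
        # dot product of weights and input
        z = np.dot(w, x)
        # step activation: final output is 1 if z > 0, else 0
        return 1 if z > 0 else 0

    x = np.array([2.0, 3.0])   # input vector (made-up values)
    w = np.array([0.5, -0.2])  # weights (made-up values)
    y_hat = perceptron(x, w)   # 0.5*2 - 0.2*3 = 0.4 > 0, so y_hat = 1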

Multi-Layered Perceptron (MLP):

As the name suggests, in an MLP we have multiple layers of perceptrons. MLPs are feed-forward artificial neural networks. An MLP has at least 3 layers: the first layer is called the input layer, the next ones are called hidden layers, and the last one is called the output layer. The nodes in the input layer don't have an activation; in fact, the nodes in the input layer simply represent the data point. If the data point is represented using a d-dimensional vector, then the input layer will have d nodes. The diagram below will make the point clearer.

Multi-Layered Perceptron.

In the above diagram, we have one input layer, 2 hidden layers, and the final output layer. All layers are fully connected, which means each node is connected to all the nodes of the previous layer. Each layer has a weight matrix that stores all the weights for that layer; these weights are essentially what we obtain once training is over. All these weights get updated during training using back-propagation. Don't worry about back-propagation; we will see it in detail in the next article. Please get comfortable with the notation and subscripts used in the diagram, as this will help you follow the article.

The functions that we use in the nodes are non-linear. By non-linear I mean that the output does not depend linearly on the input given to the function: if the input increases by 10%, the output will not necessarily increase by 10%; it may increase by more or by less. Activation functions such as sigmoid and tanh are examples of non-linear activation functions. In my articles, I will mostly use sigmoid as the activation function. One point worth mentioning is that sigmoid always gives an output between 0 and 1.
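As a small sketch, the sigmoid function and its range can be checked directly (a minimal NumPy example):

    import numpy as np

    def sigmoid(z):
        # squashes any real number into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(-5.0))  # ~0.0067, close to 0
    print(sigmoid(0.0))   # 0.5
    print(sigmoid(5.0))   # ~0.9933, close to 1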

In the above diagram, x1 and x2 represent the two values of the vector representation of the data point; here I have taken a 2-dimensional data point. As mentioned above, MLPs are feed-forward neural networks. Feed-forward means that the connections between nodes don't form cycles and the connections go in one direction, here from left to right.

Now we will see how the input is fed to the network. Below is the diagram for the input given to the first hidden layer.

Input to the first hidden layer.
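To make this concrete, here is a minimal sketch of that computation, assuming a11 denotes the weighted input to the first neuron of the first hidden layer and using made-up weight values (the numbers are not taken from the diagram):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x1, x2 = 0.5, 1.5        # the 2-dimensional data point (made-up values)
    w11, w21 = 0.4, -0.6     # weights into the first hidden neuron (made-up values)

    a11 = w11 * x1 + w21 * x2   # weighted input to the first neuron of the first hidden layer
    out11 = sigmoid(a11)        # its output after the sigmoid activation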

Similarly, the other two neurons will get inputs a12 and a13. Here the subscripts of the input and the output are the same as those of the neurons. Now let us see what input the next hidden layer gets. Here we will consider both neurons of the second hidden layer. It will become clear from the diagram given below.

Diagram showing how input is given to the second hidden layer.
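Continuing the sketch with a hypothetical weight matrix, the second hidden layer takes the outputs of the first hidden layer as its inputs; writing the whole layer as a matrix-vector product keeps it compact:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    out1 = np.array([0.33, 0.48, 0.55])   # outputs of the 3 first-hidden-layer neurons (example values)

    # hypothetical weight matrix of the second hidden layer:
    # 2 neurons, each connected to all 3 outputs of the previous layer
    W2 = np.array([[0.3, -0.1, 0.7],
                   [0.5,  0.2, -0.4]])

    a2 = W2.dot(out1)     # weighted inputs to the 2 neurons of the second hidden layer
    out2 = sigmoid(a2)    # their outputs after the sigmoid activation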

Now we will move to the final layer, the output layer, which gives the final output. We will take the case of binary classification, where the output takes only two values, either 1 or 0: if the output of the final layer is greater than 0.5, we predict 1, else 0. The diagram below will make the point clearer.

Diagram showing output generation from the output layer.
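Here is a minimal sketch of that final step, assuming a single output neuron with sigmoid activation and made-up weights; the output is thresholded at 0.5 to get the predicted label:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    out2 = np.array([0.58, 0.47])   # outputs of the second hidden layer (example values)
    w3 = np.array([0.9, -0.3])      # weights of the single output neuron (made-up values)

    y_prob = sigmoid(np.dot(w3, out2))   # value between 0 and 1
    y_hat = 1 if y_prob > 0.5 else 0     # predicted class: 1 here, since y_prob is about 0.59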

The whole process shown till now is one in which we fed the input to the MLP, or neural network, in the forward direction; hence the terminology feed-forward neural network. Now, since we have the predicted label for a given data point, we can calculate the loss. We may calculate the loss using the mean-squared error or the log-loss (also called cross-entropy) formula. If you want a better understanding of what cross-entropy is, then go through this article.
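As a small illustration (not the training code), both losses can be computed from the predicted probability and the true label like this:

    import numpy as np

    def mse_loss(y_true, y_prob):
        # mean-squared error between the true label and the predicted probability
        return (y_true - y_prob) ** 2

    def log_loss(y_true, y_prob):
        # cross-entropy (log-loss) for binary classification
        return -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

    y_true, y_prob = 1, 0.59
    print(mse_loss(y_true, y_prob))   # ~0.168
    print(log_loss(y_true, y_prob))   # ~0.528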

If you don't understand the loss functions, don't worry. I will cover them when we learn how the weights get updated in an MLP using back-propagation.

I hope that by this point you have a clear picture of how we give input to an MLP. In the next articles, we will learn how an MLP gets trained once we give it the input.

Articles in this sequence (coming soon):

Back-propagation and the maths behind it.

Coding your first perceptron.

Coding your first MLP.

If you are new to ML and DL and have wondered how to prevent models from overfitting and underfitting, then please go through this blog.

