What happens when the input goes into the model? It makes its way through the network architecture to the output layer. This is called the feed-forward process, and it produces the prediction or classification for the problem we’re building the model for. We will begin with a brief overview and then peel back the layers as we proceed.

*Things to know:*

- Vectors
- Matrices
- Matrix Multiplication / Dot Product

*Components of a Feed-forward process:*

- Input Layer
- Weight matrices
- Activation functions
- Bias

*The Process - A brief overview:*

Your data goes into the model in the form of an input vector. This vector is then multiplied by a matrix of weights. The result of this multiplication, one dot product per neuron, is another vector. An activation function is applied to this vector, which squashes its values into a range we can work with. This vector then becomes our input vector for the hidden layer.

If we have more layers, the process continues until we reach the output layer and have our prediction. If we have only one hidden layer, the layer after it is the output layer which gives us our prediction.

*Peeling the layers:*

**Input Layer -** Consider our housing data example and suppose our data looks something like this:

| INSTANCE | Sqft | num_bed | num_bath |
|---|---|---|---|
| house 1 | 1500 | 2 | 2 |
| house 2 | 1700 | 3 | 2 |
| house 3 | 1750 | 3 | 3 |

Our input vector will be the feature vector of a single instance, e.g. house 1: its values for sqft, num_bed and num_bath. Every instance gets its own feature vector, and we pass these through the model. We keep it simple with only 3 features and 3 instances.

**Why do we represent it as vectors?** The data needs to be represented numerically for the model to be able to work with it, so we use vectors to represent our input data. Every instance of data has its own n-dimensional vector, where n is the number of features. In the diagram, x1 will be the feature ‘sqft’, x2 will be ‘num_bed’ and x3 will be ‘num_bath’.
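
As a small illustration of the idea above, here is a sketch in plain Python of how the three rows of the housing table could be stored as feature vectors. The dictionary name `houses` and the feature order are assumptions made for this example, not part of any particular library.

```python
# Each row of the housing table becomes one 3-dimensional feature vector.
# Assumed feature order: [sqft, num_bed, num_bath] -> [x1, x2, x3]
houses = {
    "house 1": [1500, 2, 2],
    "house 2": [1700, 3, 2],
    "house 3": [1750, 3, 3],
}

x = houses["house 1"]   # input vector for the first instance
x1, x2, x3 = x          # sqft, num_bed, num_bath
n = len(x)              # n is the number of features

print(x)  # [1500, 2, 2]
print(n)  # 3
```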

**Weights -** Weights determine how much importance, or weightage, is given to a particular feature. Weights are usually initialized randomly and then adjusted in the back-propagation step. Before the inputs are transferred to the hidden layer, we take the dot product of the input vector and the weight matrix: each input is multiplied by its weight and the products are summed, giving the weighted input values that are then passed through the activation function.
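
To make the dot product concrete, here is a minimal sketch in plain Python. The hidden-layer size of 2, the weight range of -1 to 1, and the helper name `weighted_inputs` are all assumptions chosen for illustration; in practice a library such as NumPy would usually do this multiplication.

```python
import random

random.seed(0)  # fixed seed only so the sketch is repeatable

n_features = 3  # sqft, num_bed, num_bath
n_hidden = 2    # assumed hidden-layer size, chosen for illustration

# W[i][j] is the weight from input feature i to hidden neuron j,
# initialized randomly as described above.
W = [[random.uniform(-1, 1) for _ in range(n_hidden)]
     for _ in range(n_features)]

def weighted_inputs(x, W):
    """Dot product of the input vector x with each column of W."""
    return [sum(x[i] * W[i][j] for i in range(len(x)))
            for j in range(len(W[0]))]

x = [1500, 2, 2]           # feature vector for house 1
z = weighted_inputs(x, W)  # one weighted sum per hidden neuron
print(len(z))              # 2
```

Each entry of `z` is the sum of every feature multiplied by its weight, so the multiplication and the summing happen in the same step.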

**Activation Functions –** Applying the activation function is the step we take after taking the dot product. Remember how sometimes you’re struggling with a concept and you go through tutorial after tutorial but nothing “clicks”? Some are too vague, some don’t answer your question and some are all over the place, leaving you more confused. Then you come to “the one” that is perfect, concise yet comprehensive and, well, it “clicks”. This “clicking” is called “firing” in terms of neurons, and the activation decides whether to fire the neuron or not. If the product of the input and weight looks promising, it “clicks” and the neuron fires and connects to the next one in the next layer.

Activations determine how strongly a neuron’s signal is passed on to the neurons in the next layer. There are many different types of activations we can use, and our choice depends on the problem we’re solving. For example, we use a sigmoid when we use logistic regression to classify between two classes, and a softmax activation when we do multi-class classification. Both of these squash the dot product to values between 0 and 1 (with softmax, the outputs also sum to 1). After applying an activation, the result goes on to the hidden layer.
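
The two activations mentioned above can be sketched in a few lines of plain Python. This is an illustrative version rather than a reference implementation: the branch inside `sigmoid` and the max-subtraction inside `softmax` are common tricks to avoid numeric overflow, and the sample scores are made up.

```python
import math

def sigmoid(z):
    """Squash a single value into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)

def softmax(zs):
    """Turn a list of scores into values between 0 and 1 that sum to 1."""
    m = max(zs)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))               # 0.5
probs = softmax([2.0, 1.0, 0.1])  # made-up scores for three classes
print(probs)
```

The largest score gets the largest share in `probs`, which is why softmax is a natural fit when there are more than two classes.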

**Bias -** Bias is called bias because that’s what it does in a neuron: it tips the decision in favor of or against firing the neuron. Biases are added to the sum of the weighted inputs before applying the activation. Like weights, biases are learned and adjusted during training to produce better results.

While some networks may work fine without bias, it’s generally preferable to add a bias to our model. In many introductory examples you’ll find the bias initialized to 1, although initializing it to 0 is also common.
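
To see how the bias nudges a neuron for or against firing, here is a small sketch of a single neuron with a sigmoid activation. The inputs, weights and bias values are hypothetical numbers picked for illustration, and `neuron` is a helper name made up for this example.

```python
import math

def sigmoid(z):
    """Numerically safe sigmoid: squashes z into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def neuron(x, w, b):
    """Weighted sum of the inputs, plus the bias, then the activation."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return sigmoid(z)

x = [0.5, -1.0, 2.0]  # hypothetical, already-scaled inputs
w = [0.4, 0.3, -0.1]  # hypothetical weights

# Same inputs and weights, three different biases.
for b in (-2.0, 0.0, 2.0):
    print(b, neuron(x, w, b))
```

With the inputs and weights held fixed, a larger bias pushes the output closer to 1 and a smaller one pushes it closer to 0.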

So,

Step 1 – vector representation of inputs.

Step 2 – dot product of input vector and randomly initialized weight matrix.

Step 3 – Sum of weighted inputs, plus the bias.

Step 4 – Apply activation to the sum of weighted inputs.

Step 5 – The result goes on to a neuron in the hidden layer.
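
The steps above can be strung together into one small forward pass. What follows is a sketch under several assumptions: a network with 3 inputs, 2 hidden neurons and 1 output neuron, sigmoid activations throughout, biases starting at 1, and sqft divided by 1000 so the inputs are of similar size. The helper names `init_layer`, `layer_forward` and `feed_forward` are made up for this example.

```python
import math
import random

def sigmoid(z):
    """Numerically safe sigmoid: squashes z into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_layer(n_in, n_out, rng):
    """Random weights, and biases starting at 1 as in many simple examples."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
    biases = [1.0] * n_out
    return weights, biases

def layer_forward(x, weights, biases):
    out = []
    for j in range(len(biases)):
        # Steps 2 and 3: weighted sum of the inputs, plus the bias
        z = sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
        # Step 4: apply the activation
        out.append(sigmoid(z))
    return out

def feed_forward(x, layers):
    a = x
    for weights, biases in layers:
        # Step 5: the result becomes the input of the next layer
        a = layer_forward(a, weights, biases)
    return a

rng = random.Random(42)  # fixed seed only so the sketch is repeatable
layers = [init_layer(3, 2, rng), init_layer(2, 1, rng)]

# Step 1: vector representation of house 1 (sqft scaled by an assumed 1000)
x = [1500 / 1000, 2, 2]
prediction = feed_forward(x, layers)
print(prediction)
```

Since the weights are random and nothing has been trained yet, the printed number only shows the mechanics of the pass and says nothing meaningful about the house.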

It keeps on progressing this way till we reach the output layer, where we will analyze our output and begin the back-propagation step.

**Coming next:** Neural Networks – Back-propagation

*Thank you for your time! All input is welcome, and constructive criticism is deeply appreciated.*