Linear Regression (III) ML - 5

 


Greetings! In the previous blog we discussed the math and intuition behind linear regression. In this blog we will focus on:

  1. Example of Linear Regression.

    Example of Linear Regression : 

Let's take a simple example with c = 0 and perfectly linear data points for simplicity. Our training dataset is:

X : 1, 2, 3, 4
y : 2, 4, 6, 8

     1st Iteration : 

Since this is the first iteration, the model has to initialize the weight itself. Weights are usually initialized to a small value close to 0, so we start with W = 0.1.

The equations are:

`\hat{y} = W \times X`
`W = 0.1`
`m = 4`, i.e. the number of rows (observations, or data points).

| X | y | `\hat{y} = W \times X + c` (where W = 0.1) |
|---|---|---|
| 1 | 2 | `1 \times 0.1 = 0.1` |
| 2 | 4 | `2 \times 0.1 = 0.2` |
| 3 | 6 | `3 \times 0.1 = 0.3` |
| 4 | 8 | `4 \times 0.1 = 0.4` |
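The same forward pass can be written out in a few lines of Python. This is only a minimal sketch of the calculation above (the variable names and the use of plain lists are my own, not code from the blog):

```python
# Training data from the example above
X = [1, 2, 3, 4]
y = [2, 4, 6, 8]

W = 0.1  # initial weight
c = 0    # bias, fixed at 0 in this example

# Forward pass: y_hat = W * x + c for every observation
y_hat = [W * x + c for x in X]
print(y_hat)  # ≈ [0.1, 0.2, 0.3, 0.4]
```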

   
Now we have to calculate the loss, i.e. the error.

`loss = \frac{\sum (y - \hat{y})^{2}}{2m}`

`loss =  \frac{(2 - 0.1)^{2} + (4 - 0.2)^{2} + (6 - 0.3)^{2} + (8 - 0.4)^{2}}{2m}`

`loss =  \frac{(2 - 0.1)^{2} + (4 - 0.2)^{2} + (6 - 0.3)^{2} + (8 - 0.4)^{2}}{2 \times 4}`

`loss =  \frac{(1.9)^{2} + (3.8)^{2} + (5.7)^{2} + (7.6)^{2}}{8}`

`loss =  \frac{(3.61) + (14.44) + (32.49) + (57.76)}{8}`

`loss =  \frac{108.3}{8}`

`loss = 13.5375 \approx 13.54`
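The same loss value can be verified with a short Python sketch (again my own illustration, assuming the data and weight defined above):

```python
# Loss with the 1/(2m) convention used in this blog
X = [1, 2, 3, 4]
y = [2, 4, 6, 8]
W = 0.1

m = len(X)                  # m = 4 observations
y_hat = [W * x for x in X]  # predictions with the current weight

loss = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / (2 * m)
print(loss)  # ≈ 13.5375
```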

The visualization will be:

Fig 1. 1st iteration X and y plot (desmos.com)
         
Fig 2. Weights vs loss (desmos.com)

Now the weight should be updated:

The equation of gradient descent is : 

`W_{new} = W_{old} - \eta\frac{\partial Loss}{\partial W_{old}}`

The gradient `\frac{\partial Loss}{\partial W_{old}}` is negative here, because at W = 0.1 we are to the left of the minimum of the loss curve. A negative gradient makes the update term positive, so the weight increases:

`W_{new} = W_{old} - ( - \eta \left| \frac{\partial Loss}{\partial W_{old}} \right| ) = W_{old} + \eta \left| \frac{\partial Loss}{\partial W_{old}} \right|`

where `\eta`, the learning rate, is 0.1. For this loss the gradient is `\frac{\partial Loss}{\partial W} = -\frac{\sum x(y - \hat{y})}{m}`, which at W = 0.1 evaluates to `-\frac{57}{4} = -14.25`. Hence:

`W_{new} = 0.1 + 0.1 \times 14.25 = 1.525`

which is approximately equal to 1.5 (rounded here just to keep the arithmetic simple).

`W_{new} = 1.5`

Hence the new weight W will be 1.5.
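The same update can be checked numerically. The sketch below (my own code, not the author's) computes the gradient of the loss with respect to W and applies one gradient-descent step:

```python
X = [1, 2, 3, 4]
y = [2, 4, 6, 8]

W_old = 0.1
eta = 0.1  # learning rate

m = len(X)
y_hat = [W_old * x for x in X]

# dLoss/dW for loss = sum((y - y_hat)^2) / (2m)
grad = -sum(x * (yi - yhi) for x, yi, yhi in zip(X, y, y_hat)) / m
print(grad)  # ≈ -14.25 (negative, so the weight will increase)

W_new = W_old - eta * grad
print(W_new)  # ≈ 1.525, rounded to 1.5 in the text above
```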

    2nd Iteration :

| X | y | `\hat{y} = W \times X + c` (where W = 1.5) |
|---|---|---|
| 1 | 2 | `1 \times 1.5 = 1.5` |
| 2 | 4 | `2 \times 1.5 = 3` |
| 3 | 6 | `3 \times 1.5 = 4.5` |
| 4 | 8 | `4 \times 1.5 = 6` |

Now we have to calculate the loss, i.e. the error.

`loss = \frac{\sum (y - \hat{y})^{2}}{2m}`

`loss =  \frac{(2 - 1.5)^{2} + (4 - 3)^{2} + (6 - 4.5)^{2} + (8 - 6)^{2}}{2m}`

`loss =  \frac{(2 - 1.5)^{2} + (4 - 3)^{2} + (6 - 4.5)^{2} + (8 - 6)^{2}}{2 \times 4}`

`loss =  \frac{(0.5)^{2} + (1)^{2} + (1.5)^{2} + (2)^{2}}{8}`

`loss =  \frac{(0.25) + (1) + (2.25) + (4)}{8}`

`loss =  \frac{7.5}{8}`

`loss = 0.9375 \approx 0.94`

The visualization will be:

Fig 3. 2nd iteration X vs y plot (desmos.com)
    
Fig 4. 2nd iteration weights vs loss (desmos.com)

Now the weight should be updated:

The equation of gradient descent is : 

`W_{new} = W_{old} - \eta\frac{\partial Loss}{\partial W_{old}}`

The gradient `\frac{\partial Loss}{\partial W_{old}}` is still negative, because at W = 1.5 we remain to the left of the minimum, so the weight increases again:

`W_{new} = W_{old} + \eta \left| \frac{\partial Loss}{\partial W_{old}} \right|`

where `\eta`, the learning rate, is 0.1. At W = 1.5 the gradient is `-\frac{\sum x(y - \hat{y})}{m} = -\frac{15}{4} = -3.75`, so:

`W_{new} = 1.5 + 0.1 \times 3.75 = 1.875`

which is approximately equal to 1.9 (again rounded to keep the arithmetic simple).

`W_{new} = 1.9`

Hence the new weight W will be 1.9.

3rd Iteration :

| X | y | `\hat{y} = W \times X + c` (where W = 1.9) |
|---|---|---|
| 1 | 2 | `1 \times 1.9 = 1.9` |
| 2 | 4 | `2 \times 1.9 = 3.8` |
| 3 | 6 | `3 \times 1.9 = 5.7` |
| 4 | 8 | `4 \times 1.9 = 7.6` |

Now we have to calculate the loss, i.e. the error.

`loss = \frac{\sum (y - \hat{y})^{2}}{2m}`

`loss =  \frac{(2 - 1.9)^{2} + (4 - 3.8)^{2} + (6 - 5.7)^{2} + (8 - 7.6)^{2}}{2m}`

`loss =  \frac{(2 - 1.9)^{2} + (4 - 3.8)^{2} + (6 - 5.7)^{2} + (8 - 7.6)^{2}}{2 \times 4}`

`loss =  \frac{(0.1)^{2} + (0.2)^{2} + (0.3)^{2} + (0.4)^{2}}{8}`

`loss =  \frac{(0.01) + (0.04) + (0.09) + (0.16)}{8}`

`loss =  \frac{0.3}{8}`

`loss = 0.0375 \approx 0.04`

The visualization will be:

Fig 5. X vs y (desmos.com)
    
Fig 6. weights vs loss (desmos.com)
As you can see, the loss is now about 0.04, which is very low, and the weight is already near the optimum. The process nonetheless continues for the number of iterations specified by the user, so it will run for n iterations in total. The change in the weight becomes barely noticeable because the loss is already so small; from here onward the weight stays close to 2 (roughly between 1.9 and 2.1) and the loss shrinks a little further on every iteration.
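Putting the whole procedure together, a minimal gradient-descent loop (my own sketch of the process described above, not the author's code) shows the weight settling very close to 2 within a handful of iterations:

```python
# Training data
X = [1, 2, 3, 4]
y = [2, 4, 6, 8]

W = 0.1            # initial weight
eta = 0.1          # learning rate
n_iterations = 20  # chosen by the user
m = len(X)

for i in range(n_iterations):
    y_hat = [W * x for x in X]
    loss = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat)) / (2 * m)
    grad = -sum(x * (yi - yhi) for x, yi, yhi in zip(X, y, y_hat)) / m
    W = W - eta * grad
    print(f"iteration {i + 1}: loss = {loss:.4f}, W = {W:.4f}")

# The loss shrinks toward 0 and W converges to roughly 2.
```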
        
Suppose that after all n iterations the weight we get is 2. Now let's see how the model predicts the test cases. The test cases are:

X : 6, 7, 8
y : 12, 14, 16

`\hat{y} = 2 \times 6 = 12`
`\hat{y} = 2 \times 7 = 14`
`\hat{y} = 2 \times 8 = 16`

ŷ : 12, 14, 16
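In code, the prediction step is just one multiplication per test point (a trivial sketch, assuming the final weight of 2 from above):

```python
W = 2               # final weight after training
X_test = [6, 7, 8]  # test inputs

y_pred = [W * x for x in X_test]
print(y_pred)  # [12, 14, 16]
```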
  

Summary : 

In this blog, we worked through a complete example of linear regression, which should make the mechanics clear. From the next blog onward, we will focus on the assumptions and other theoretical details.
                                                                                                                          -Santosh Saxena

