Linear regression (VII) ML-9

                Greetings! This blog covers the prerequisites required before getting into the code of linear regression. In the previous blog, we discussed the concepts of linear regression. This blog focuses on:
  1. Equations that can be used in the code.
  2. Solving the gradient descent equation.
  3. The algorithm of linear regression.

Some prerequisites before moving to code : 

Equations that are used in linear regression :

 `\hat{y} = W1 \times X1 + W0` (where W0 and W1 are the weights) .... (i)
                         OR
`\hat{y} = W^\mathsf{T} \times X` (where W and X are vectors)

 `loss = \frac{\sum(y - \hat{y})^{2}}{2m}` ..... (ii)

`W_{new} = W_{old} - \eta \frac{\partial loss}{\partial W_{old}}` (where `\eta` is the learning rate) .... (iii)
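
As a quick illustration of how equations (i) and (ii) look in code, here is a minimal NumPy sketch with made-up toy data; the names `X1`, `y`, `W1`, `W0` are placeholders, not the exact variables of the next post:

```python
import numpy as np

# placeholder data: one feature, m data points
X1 = np.array([1.0, 2.0, 3.0, 4.0])
y  = np.array([3.0, 5.0, 7.0, 9.0])
m  = len(y)

W1, W0 = 0.5, 0.0                             # current weights

y_hat = W1 * X1 + W0                          # equation (i): prediction
loss  = np.sum((y - y_hat) ** 2) / (2 * m)    # equation (ii): loss
print(loss)
```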

        As you know, it is difficult to implement equation (iii) directly because of the differentiation. To overcome this, we have to solve the differentiation by hand. Do not memorize the equations below; just observe how they are derived. These derived equations are what we will use in the code, because it is difficult to compute the derivative symbolically inside the code.

In `\frac{\partial loss}{\partial W} `, we will replace the loss with the loss equation (refer to equation (ii)).

`\frac{\partial loss}{\partial W} ` =  `\frac{\partial \frac{\sum(y - \hat{y})^{2}}{2m}}{\partial W} `

`\frac{\partial loss}{\partial W} ` =  `\frac{\partial \sum(y - \hat{y})^{2}}{2m \partial W} `

Now, we will replace `\hat{y}` with its expression (refer to equation (i)).

`\frac{\partial loss}{\partial W} ` =  `\frac{\partial \sum(y - (W1 \times X1  + W0))^{2}}{2m \partial W} `

`\frac{\partial loss}{\partial W} ` =  `\frac{1}{2m} \times \frac{\partial \sum(y - W1 \times X1  - W0 )^{2}}{\partial W} `

Where,

y = the true target value
X = the features
W0, W1, W = the weights
m = the total number of data points

Hence, for W1 (the weights that are multiplied by independent features, i.e. X):

`\frac{\partial loss}{\partial W1} ` =  `\frac{\partial \sum(y - W1 \times X1  - W0 )^{2}}{2m \partial W1} `

`\frac{\partial loss}{\partial W1} ` =  `\frac{1}{2m} \times \frac{\partial \sum(y - W1 \times X1 - W0 )^{2}}{\partial W1} `

`\frac{\partial loss}{\partial W1} ` = `\frac{1}{2m} \times 2 \times \sum(y - W1 \times X1  - W0 ) \times (-X1)`

`\frac{\partial loss}{\partial W1} ` =  ` - \frac{1}{m} \times \sum(y - W1 \times X1  - W0 ) \times (X1)`

`\frac{\partial loss}{\partial W1} ` =  ` - \frac{1}{m} \times \sum(y - (W1 \times X1  + W0 )) \times (X1)`

Replacing `W1 \times X1 + W0` with `\hat{y}`:

`\frac{\partial loss}{\partial W1} ` =  ` - \frac{1}{m} \times \sum(y - \hat{y} ) \times (X1)` ... (refer to equation (i)).

Similarly, the equations for W2, W3, ..., Wn have the same form; only X1 is replaced by the feature that corresponds to that weight (see the sketch below).
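
For instance, a minimal NumPy sketch of this W1 gradient (the data and variable names are placeholders for illustration, with `y_hat` computed from equation (i)):

```python
import numpy as np

X1 = np.array([1.0, 2.0, 3.0, 4.0])   # placeholder independent feature
y  = np.array([3.0, 5.0, 7.0, 9.0])   # placeholder true values
W1, W0 = 0.5, 0.0
m  = len(y)

y_hat      = W1 * X1 + W0                            # equation (i)
gradient_1 = -(1.0 / m) * np.sum((y - y_hat) * X1)   # d(loss)/d(W1)
# for any other weight Wj, multiply by its own feature Xj instead of X1
```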

For W0, i.e. the constant (the weight that is not multiplied by an independent feature X):

`\frac{\partial loss}{\partial W0} ` =  `\frac{\partial \sum(y - W1 \times X1  - W0 )^{2}}{2m \partial W0} `

`\frac{\partial loss}{\partial W0} ` =  `\frac{1}{2m} \times \frac{\partial \sum(y - W1 \times X1  - W0 )^{2}}{\partial W0} `

`\frac{\partial loss}{\partial W0} ` = `\frac{1}{2m} \times 2 \times \sum(y - W1 \times X1  - W0 ) \times (-1)`

`\frac{\partial loss}{\partial W0} ` =  ` - \frac{1}{m} \times \sum(y - W1 \times X1  - W0 )`

`\frac{\partial loss}{\partial W0} ` =  ` - \frac{1}{m} \times \sum(y - (W1 \times X1  + W0 ))`

Replacing `W1 \times X1 + W0` with `\hat{y}`:

`\frac{\partial loss}{\partial W0} ` =  ` - \frac{1}{m} \times \sum(y - \hat{y} ) ` ... (refer to equation (i)).
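
And the matching sketch for the constant W0, under the same placeholder setup, just without the feature multiplier:

```python
import numpy as np

X1 = np.array([1.0, 2.0, 3.0, 4.0])   # placeholder feature
y  = np.array([3.0, 5.0, 7.0, 9.0])   # placeholder true values
W1, W0 = 0.5, 0.0
m  = len(y)

y_hat      = W1 * X1 + W0                     # equation (i)
gradient_0 = -(1.0 / m) * np.sum(y - y_hat)   # d(loss)/d(W0): no X term
```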

Summary of the gradient descent equations :

(Remember at least one)

Very Basic Equation : 

while (not converged)
{

W0 new `= W0 old - \eta \times` gradient_0

W1 new `= W1 old - \eta \times` gradient_1
...
Wn new `= Wn old - \eta \times` gradient_n
}

Detailed equation : 

gradient = `\frac{\partial Loss}{\partial W}`

while (not converged)
{

W0 new `= W0 old - \eta \frac{\partial Loss}{\partial W0_{old}}`

W1 new `= W1 old - \eta \frac{\partial Loss}{\partial W1_{old}}`
...
Wn new `= Wn old - \eta \frac{\partial Loss}{\partial Wn_{old}}`

}

Extremely detailed equation : 

gradient = `\frac{\partial Loss}{\partial W}`

`\frac{\partial Loss}{\partial W}` = ` - \frac{1}{m} \times \sum(y - \hat{y} ) \times (X)`

`\frac{\partial Loss}{\partial W0} ` =  ` - \frac{1}{m} \times \sum(y - \hat{y} ) ` .. (for the constant)

while (not converged)
{
    W0 new = W0 old `- \eta \times (- \frac{1}{m} \times \sum(y - \hat{y}) ) `

    W1 new = W1 old `- \eta \times (- \frac{1}{m} \times \sum(y - \hat{y}) \times (X1) ) `
    ...
    Wn new = Wn old `- \eta \times (- \frac{1}{m} \times \sum(y - \hat{y}) \times (Xn) ) `
}
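
Putting the update rule above into code, here is one possible NumPy sketch for the single-feature case (a fixed number of iterations stands in for the convergence check; the data and the learning rate are placeholder values):

```python
import numpy as np

X1  = np.array([1.0, 2.0, 3.0, 4.0])   # placeholder feature
y   = np.array([3.0, 5.0, 7.0, 9.0])   # placeholder targets (here y = 2*x + 1)
m   = len(y)
eta = 0.05                              # learning rate
W1, W0 = 0.0, 0.0                       # initial weights

for _ in range(1000):                   # stand-in for "while (not converged)"
    y_hat = W1 * X1 + W0                                    # equation (i)
    W0 -= eta * (-(1.0 / m) * np.sum(y - y_hat))            # update for the constant
    W1 -= eta * (-(1.0 / m) * np.sum((y - y_hat) * X1))     # update for the feature weight

print(W0, W1)   # should move close to W0 ≈ 1, W1 ≈ 2 for this toy data
```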
 

ALGORITHM: 

  1. START
  2. Ask the user for the learning rate (eta) and the number of iterations (n_iter).
  3. Initialize the weights (small continuous values).
  4. Repeat steps 5 and 6 n_iter times.
  5. Calculate the loss.
  6. Optimize (adjust the weights in the direction that minimizes the loss).
  7. END
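
One possible way to translate this algorithm into Python (a hedged sketch under the placeholder data above; the function name `linear_regression_gd` is made up for illustration and is not the exact code of the next blog, and step 2's user input is replaced by function arguments):

```python
import numpy as np

def linear_regression_gd(X1, y, eta=0.05, n_iter=1000):
    """Fit y ≈ W1*X1 + W0 with batch gradient descent (single feature)."""
    m = len(y)
    W1, W0 = 0.01, 0.01                                       # step 3: small initial weights
    for _ in range(n_iter):                                   # step 4: iterate
        y_hat = W1 * X1 + W0
        loss = np.sum((y - y_hat) ** 2) / (2 * m)             # step 5: calculate loss
        W0 -= eta * (-(1.0 / m) * np.sum(y - y_hat))          # step 6: optimize W0
        W1 -= eta * (-(1.0 / m) * np.sum((y - y_hat) * X1))   # step 6: optimize W1
    return W1, W0, loss

# usage with placeholder data
X1 = np.array([1.0, 2.0, 3.0, 4.0])
y  = np.array([3.0, 5.0, 7.0, 9.0])
print(linear_regression_gd(X1, y))
```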


Summary : 

  1. It is difficult to put equations that contain differentiation directly into the code. Hence, the differentiation has to be solved by hand first (a quick symbolic check is sketched below).
  2. `\frac{\partial loss}{\partial W} ` =  ` - \frac{1}{m} \times \sum(y - \hat{y} ) \times (X)` for the weights that are multiplied with independent features.
  3. `\frac{\partial loss}{\partial W0} ` =  ` - \frac{1}{m} \times \sum(y - \hat{y} )` for the constant.
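
If you want to double-check points 2 and 3 without redoing the algebra, SymPy can differentiate the loss of a single data point symbolically (this is only a sanity check; the training code itself still uses the hand-derived expressions):

```python
import sympy as sp

y, X1, W1, W0, m = sp.symbols('y X1 W1 W0 m')

# loss contribution of one data point; the full loss is a sum of such terms
loss = (y - (W1 * X1 + W0)) ** 2 / (2 * m)

print(sp.diff(loss, W1))   # equivalent to -(1/m) * (y - y_hat) * X1 per point
print(sp.diff(loss, W0))   # equivalent to -(1/m) * (y - y_hat) per point
```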
 
