Greetings! This blog contains the code for linear regression implemented in Python without sklearn. In the previous blog, we covered almost everything regarding the theory of linear regression. This blog consists of:
- Code of linear regression
- Explanation of Code
Those who are not comfortable with coding, or who do not have a coding background, can read the explanation. If the coding part is still unclear, it will be covered in further blogs.
Linear Regression without sklearn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

1.  class Linear_Regression:
2.
3.      def __init__(self, eta=0.01, n_iter=1000, random_state=1):
4.          self.eta = eta
5.          self.n_iter = n_iter
6.          self.random_state = random_state
7.
8.      def predict(self, value):
9.          return np.dot(np.transpose(self.w[1:]), value) + self.w[0]
10.
11.     def calculate_loss(self):
12.         loss = 0
13.         for i in range(len(self.x)):
14.             value = (self.y.values[i] - self.predict(self.x.values[i]))**2
15.             loss = loss + (value/len(self.x))
16.         return loss[0]
17.
18.     def calculate_gradient(self, x):
19.         gradient = 0
20.         for i in range(len(self.x)):
21.             value = (self.y.values[i] - self.predict(self.x.values[i]))*(x.values[i])
22.             value = value/len(self.y)
23.             gradient = gradient + value
24.         return -gradient
25.
26.     def calculate_constant(self):
27.         constant = 0
28.         for i in range(len(self.x)):
29.             value = (self.y.values[i] - self.predict(self.x.values[i]))
30.             value = value/len(self.y)
31.             constant = constant + value
32.         return -constant
33.
34.
35.     def fit(self, x, y):
36.         self.x = pd.DataFrame(x)
37.         self.y = pd.DataFrame(y)
38.         regen = np.random.RandomState(self.random_state)
39.         self.w = regen.normal(loc=0.0, scale=0.01, size=1 + self.x.shape[1])
40.         self.error = []
41.         self.weights = []
42.         for i in range(self.n_iter):
43.             self.error.append(self.calculate_loss())
44.             self.weights.append(list(self.w))
45.             self.w[0] = self.w[0] - self.eta*(self.calculate_constant())
46.             for j in range(1, len(self.w)):
47.                 gradient = self.calculate_gradient(self.x.iloc[:, j-1])
48.                 self.w[j] = self.w[j] - self.eta*(gradient)
Explanation:
Modules
We will use 3 basic modules to build this entire linear regression:
- NumPy: for mathematical calculations
- Pandas: for arranging data and creating data frames
- Matplotlib: for visualization and graphs
Class Linear Regression
The class Linear_Regression is defined at line 1. This syntax is used because we are following the OOP (object-oriented programming) approach to develop linear regression. Linear_Regression is the name of the class. A class, in simple terms, is nothing but a road map or plan. Nothing in this class gets executed unless we create an object of that class (discussed further below). Hence, we define all the instructions (what to do and how to do it) inside the class.
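As a quick illustration of this idea, here is a minimal sketch (the Car class and its names are made up for illustration; it is not part of the regression code):

class Car:                        # blueprint only: nothing runs yet
    def __init__(self, color):    # constructor, runs when an object is created
        self.color = color

my_car = Car("red")               # creating the object executes __init__
print(my_car.color)               # prints: red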
def __init__( argument list )
def __init__() is available from lines 3 to 6. def means we are defining a function (a function performs certain instructions when called). def __init__() is a special type of function: it is the constructor of the class. The constructor of a class, in simple terms, gets called as soon as we declare an object. We don't have to call the constructor ourselves; it gets called (meaning executed, or run) automatically when we declare the object.
Whatever arguments we want to give to the class should be written in the argument list of the __init__ function, i.e., in the brackets of the __init__ function. In linear regression, we are giving 3 arguments (see the sketch after this list):
- learning rate as eta
- number of iterations for gradient descent as n_iter
- random state (discussed further below).
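Here is a small sketch of how these arguments are passed when creating an object (the values below are just examples):

lr_default = Linear_Regression()                         # uses eta=0.01, n_iter=1000, random_state=1
lr_custom = Linear_Regression(eta=0.001, n_iter=5000)    # overrides eta and n_iter, keeps random_state=1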
There is also a self keyword in the argument list of the __init__ function. Every variable that starts with the self prefix is a class variable. Hence, it can be accessed in any function of that class, and we do not have to mention it in the argument list of any other class function (method); we just need to pass the self keyword to every class function (method) as an argument. If you want to access a class function (method) from within the same class, the self prefix is also required. More information about self will be covered in further blogs. For now, we are creating variables with the self keyword so that they can be accessed in all the methods of the class. On lines 4, 5, and 6, we are creating 3 class variables:
self.eta = eta --> self.eta takes its value from the eta variable; eta is the learning rate passed by the user when creating an object, which is why it is also in the argument list of the __init__ function. This self.eta can then be accessed in all the methods (functions).
self.n_iter = n_iter --> similarly, n_iter is the number of iterations.
self.random_state = random_state --> similarly for random_state; it is discussed further below.
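To see why self matters, here is a tiny sketch (the Counter class is a made-up example, unrelated to the regression code) in which one method reads a variable that another method set:

class Counter:
    def __init__(self):
        self.count = 0       # class variable, visible to every method

    def increment(self):
        self.count += 1      # no argument needed: accessed through self

    def show(self):
        print(self.count)    # the same variable, read through self

c = Counter()
c.increment()
c.show()                     # prints: 1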
def fit( argument list )
def fit() is available from lines 35 to 48. This is the main function, responsible for the entire linear regression. This function takes x and y from the user. The meaning of the code, line by line, is given below.
Line No 36
We are creating a class variable to store the value of x. The data should be in the form of a data frame; hence the pandas module is used, because we have to use data frame functions to manipulate the data for processing.
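For instance, a plain Python list becomes a one-column data frame (a minimal sketch):

import pandas as pd

x = [1, 2, 3, 4]
df = pd.DataFrame(x)    # 4 rows, 1 column
print(df.shape)         # prints: (4, 1)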
Line No 37
Similarly, for y we are creating a class variable, i.e., self.y.
Line No 38
Now, we want a generator to produce some numbers for the weights. Hence, we are using the NumPy module, and this function takes the value of random_state because it will generate numbers with that specific randomness (the same random_state always produces the same numbers, which makes results reproducible).

Line No 39
We are creating one class variable, i.e., self.w, to store the weights, and we are using the same generator for initialization. Now, we want the numbers to be small and normally distributed. Hence, we are using the normal function (for a normal distribution) centered at 0.0 with a scale of 0.01 (small weights). The number of weights (W) should be equal to the number of features (X), plus one weight for the constant (W0). Hence we are giving size = 1 + self.x.shape[1]. self.x.shape[1] is nothing but the number of columns, i.e., features (X), and we are adding 1 to get one extra weight for the constant.
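A small sketch of what this initialization produces (the printed values are illustrative; the same seed always gives the same output):

import numpy as np

regen = np.random.RandomState(1)                     # same seed -> same numbers every run
w = regen.normal(loc=0.0, scale=0.01, size=1 + 1)    # 1 constant weight + 1 feature weight
print(w)                                             # e.g. [ 0.01624345 -0.00611756]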
Line No 40
One class variable, i.e., self.error = []. This variable is used to store all the loss values.

Line No 41
Similarly, one class variable, i.e., self.weights = []. This variable is used to store all the weights.

Line No 42
We are using a for loop running from 0 up to n_iter, the number of iterations.
Line No 43
We are calculating the loss with the current weights and storing the value in self.error. self.calculate_loss() is a function you can find on lines 11 to 16; it calculates the loss and returns it so that it gets added to the self.error list.

Line No 44
Similarly, the current weights will get added to the self.weights list.
Line No 45
Now, we have to update the weights to get the minimum loss. Hence, the gradient descent update for the constant is written here. There is a function called self.calculate_constant(), available from lines 26 to 32, which calculates the gradient for the constant.
Line No 46
Iteration from 1 up to the number of weights, to update all the weights one by one. The aim is to reach the weights that give the minimum loss.
Line No 47
A variable, gradient, which gets its value from the function self.calculate_gradient(). This function calculates the gradient for one independent feature (one column of x).
Line No 48
The gradient descent equation for updating the weights, using the gradient we got on line 47.
(Lines 42 to 48 are repeated for n_iter iterations. Hence, after the iterations are finished, the weights we get in self.w will give the minimum loss with respect to the data.)
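For reference, the two updates performed inside this loop can be written in plain notation (n is the number of samples, y_hat_i the prediction for sample i; this matches what calculate_constant and calculate_gradient return):

w0 = w0 - eta * ( -(1/n) * sum(y_i - y_hat_i) )           # constant update (line 45)
wj = wj - eta * ( -(1/n) * sum((y_i - y_hat_i) * x_ij) )  # update for feature weight j (line 48)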
def predict():
This function predicts the value of y with the help of the weights at that point in time. We have already discussed the equations in the previous blog; you can use any of them.
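In plain notation, line 9 computes the linear equation (w0 is the constant weight, w1 to wm are the feature weights):

y_hat = w0 + w1*x1 + w2*x2 + ... + wm*xm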
def calculate_loss():
This function computes the value of the loss with the weights at that point in time. We have already discussed the equation. The sum is calculated with the help of a for loop: in every iteration, the value is added to the loss variable, and that loss variable is returned at the end of the function as the output.
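The same loss (mean squared error) can be written in a single vectorized NumPy line; here is a self-contained sketch with example values:

import numpy as np

y = np.array([2.0, 4.0, 6.0])        # true targets (example values)
y_pred = np.array([1.9, 4.1, 5.8])   # model predictions (example values)

mse = np.mean((y - y_pred) ** 2)     # same quantity the loop on lines 12-16 computes
print(mse)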
def calculate_gradient():
This function returns the value of the gradient at that point. We have already discussed the equation of the gradient in the previous blog. The sum is calculated with the help of a for loop: in every iteration, the generated value is added to the gradient variable.
def calculate_constant():
This function returns the value of the gradient for the constant at that point. We have already discussed the equation in previous blogs.
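Both gradients also have compact vectorized equivalents; here is a self-contained sketch with example values:

import numpy as np

y = np.array([2.0, 4.0, 6.0])        # true targets (example values)
y_pred = np.array([1.9, 4.1, 5.8])   # model predictions (example values)
x_col = np.array([1.0, 2.0, 3.0])    # one feature column (example values)

grad_wj = -np.mean((y - y_pred) * x_col)   # what calculate_gradient returns for this column
grad_w0 = -np.mean(y - y_pred)             # what calculate_constant returns
print(grad_wj, grad_w0)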
How to execute this linear regression class:
x = [1,2,3,4] # Training X Data
y = [2,4,6,8] # Training y Data
lr = Linear_Regression()
lr.fit(x,y)
x_test = [6,7,20,30]
y_test = [12,14,40,60]
y_pred = []
for i in x_test:
    y_pred.append(lr.predict(i)[0])
print("Y True results : ", y_test)
print("Y Predicted results : ", y_pred)Explanation :
In the first 2 lines, we have defined our training data. In the next line, lr is the object of the class Linear_Regression. We haven't passed any parameters, so the default parameters assigned in the __init__ function will be used. If we had given our own arguments, the defaults would be overridden and our parameters would be taken into consideration.

The class variables can be accessed outside the class with the help of lr: instead of self, it will be lr. Hence, all the class variables like self.weights and self.error can be accessed as lr.weights and lr.error. Similarly for functions: self.fit can be accessed as lr.fit(x, y), and to access the predict function the syntax is lr.predict(), because lr is our object (the actual execution of the class, i.e., the roadmap). After lr.fit(), our model is trained.

In the next 2 lines, testing data is defined. y_pred is an empty list in which the model's predictions will be stored. In the next 2 lines, the for loop provides the test data one by one to the model, and the model predicts each value; the predicted values are stored in the y_pred list. In the last 2 lines, y_test and y_pred are both printed, so we can compare the true and the predicted values.
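Since matplotlib was imported at the top, a natural follow-up (a sketch, not part of the original listing; it assumes lr has been trained as above) is to plot the stored losses and verify that gradient descent is converging:

import matplotlib.pyplot as plt

plt.plot(lr.error)         # lr.error holds the loss recorded at every iteration
plt.xlabel("Iteration")
plt.ylabel("Loss")
plt.title("Loss curve during training")
plt.show()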
-Santosh Saxena

