Linear Regression
Linear Regression is a powerful tool used for prediction and forecasting in many fields. It's the starting point for regression analysis from an optimization viewpoint.
Suppose we have data with two features, as in the figure below. The data represent the population of each state (x-axis) and the number of traffic accidents in that state (y-axis). We want to develop a model for this data, and it looks like a linear model (a straight line) can describe the relationship.

Optimization: Finding the Best Line
How do we find the best line, i.e., determine the slope $a$ and intercept $b$ in $\hat{y} = ax + b$? This is the crux of linear regression.
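As a minimal sketch, the model itself is just a straight line; the particular values of $a$ and $b$ below are illustrative assumptions, not fitted parameters:

```python
# Linear model: predict y from x with slope a and intercept b.
def predict(x, a, b):
    return a * x + b

# Hypothetical parameter values (not from the figure): slope 2, intercept 1.
print(predict(3.0, a=2.0, b=1.0))  # 2*3 + 1 = 7.0
```

The whole problem of linear regression is choosing `a` and `b` so that these predictions match the data as closely as possible.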
Cost Function: Measurement of Errors

To determine which line best describes our data, we can define an error $e_i$ for each data point $(x_i, y_i)$ as follows:

$$e_i = \hat{y}_i - y_i$$

Here, $\hat{y}_i$ is the predicted y-value from our model, and $y_i$ is the actual y-value from our data.

To get a positive error irrespective of the sign, we square it:

$$e_i^2 = (\hat{y}_i - y_i)^2$$

And to simplify future differentiation, we divide it by 2:

$$\frac{1}{2} e_i^2 = \frac{1}{2} (\hat{y}_i - y_i)^2$$

Now, the average error over all data points (total number of data points = $N$) is:

$$J = \frac{1}{2N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$

We already know that $\hat{y}_i = a x_i + b$, so we substitute:

$$J(a, b) = \frac{1}{2N} \sum_{i=1}^{N} (a x_i + b - y_i)^2$$
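The cost defined above can be sketched directly in code. The data here are hypothetical points lying exactly on the line $y = 2x + 1$, so the cost at $(a, b) = (2, 1)$ should be zero:

```python
def cost(a, b, xs, ys):
    """Cost J(a, b) = (1/2N) * sum((a*x + b - y)^2)."""
    n = len(xs)
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

print(cost(2.0, 1.0, xs, ys))       # 0.0: the perfect line has zero cost
print(cost(0.0, 0.0, xs, ys) > 0)   # True: a worse line has a higher cost
```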
Cost Function: Visualization
The previously calculated $J(a, b)$ is called a "cost function". The lower its value, the better the model's ability to explain the data.
We can view this cost function as a function of $a$ and $b$, because the $x_i$ and $y_i$ are given in our data.
The challenge now is to find the values of $a$ and $b$ that minimize $J(a, b)$.

Cost Function: Calculation
There are many ways to find the minimum of a function; one common method is gradient descent.
Gradient descent is a method that iteratively moves the parameters in the direction of steepest descent until we reach a minimum of the function.
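The idea can be sketched on a simple one-variable function first. The function $f(w) = (w - 3)^2$, the starting point, and the learning rate below are all illustrative assumptions; the minimum of $f$ is at $w = 3$:

```python
def grad(w):
    # Derivative of f(w) = (w - 3)**2
    return 2 * (w - 3)

w = 0.0       # arbitrary starting point
alpha = 0.1   # learning rate
for _ in range(100):
    w = w - alpha * grad(w)  # step opposite the gradient (steepest descent)

print(round(w, 4))  # ≈ 3.0, the minimizer of f
```

Each step shrinks the distance to the minimizer by a constant factor here, which is why a modest number of iterations already lands very close to $w = 3$.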

Therefore, we can set the parameters to arbitrary initial values and then update them step by step toward the minimum of the cost function.
The parameter vector $(a, b)$ may be updated as follows.
We start with initial guesses for $a$ and $b$ and update these guesses iteratively using the following update rules:

$$a \leftarrow a - \alpha \frac{\partial J}{\partial a}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}$$

where $\alpha$ is the learning rate, and $\frac{\partial J}{\partial a}$ and $\frac{\partial J}{\partial b}$ are the gradients of the cost function with respect to $a$ and $b$:

$$\frac{\partial J}{\partial a} = \frac{1}{N} \sum_{i=1}^{N} (a x_i + b - y_i)\, x_i, \qquad \frac{\partial J}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (a x_i + b - y_i)$$

By repeating these steps, we will eventually find the values of $a$ and $b$ that minimize the cost function $J(a, b)$, thereby giving us the best line that fits our data.
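The whole procedure can be sketched end to end. The data, learning rate, and step count below are illustrative assumptions; the points lie exactly on $y = 2x + 1$, so gradient descent should recover $a \approx 2$ and $b \approx 1$:

```python
def fit_line(xs, ys, alpha=0.05, steps=2000):
    """Fit y ≈ a*x + b by gradient descent on J(a, b) = (1/2N) * sum((a*x + b - y)^2)."""
    n = len(xs)
    a, b = 0.0, 0.0  # arbitrary initial guesses
    for _ in range(steps):
        # Gradients of J with respect to a and b
        grad_a = sum((a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b

# Hypothetical data on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(round(a, 3), round(b, 3))  # ≈ 2.0 1.0
```

The learning rate matters: too large and the updates overshoot and diverge; too small and convergence is needlessly slow.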