Linear Regression
Linear Regression is a powerful tool used for prediction and forecasting in many fields. It's the starting point for regression analysis from an optimization viewpoint.
Suppose we have data with two features, as in the figure below. The data represent the population of each state (x-axis) and the number of traffic accidents in that state (y-axis). We want to develop a model for this data, and it looks like a linear model (a straight line) can describe the relationship.

Optimization: Finding the Best Line
How do we find the best line, i.e., determine the slope $a$ and intercept $b$ in $\hat{y} = ax + b$? This is the crux of linear regression.
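As a minimal sketch, the model itself is just a straight line; the particular values of $a$ and $b$ below are illustrative assumptions, not fitted parameters:

```python
# Linear model: predict y from x with slope a and intercept b.
def predict(x, a, b):
    return a * x + b

# Hypothetical parameter values (not from the figure): slope 2, intercept 1.
print(predict(3.0, a=2.0, b=1.0))  # 2*3 + 1 = 7.0
```

The whole problem of linear regression is choosing `a` and `b` so that these predictions match the data as closely as possible.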
Cost Function: Measurement of Errors

To determine which line best describes our data, we can define an error $e_i$ for each data point $(x_i, y_i)$ as follows:

$$e_i = \hat{y}_i - y_i$$

Here, $\hat{y}_i$ is the predicted y-value from our model, and $y_i$ is the actual y-value from our data.

To get a positive error irrespective of the sign, we square it:

$$e_i^2 = (\hat{y}_i - y_i)^2$$

And to simplify future differentiation, we divide it by 2:

$$\frac{1}{2} e_i^2 = \frac{1}{2} (\hat{y}_i - y_i)^2$$

Now, the average error over all data points (total number of data points = $N$) is:

$$J = \frac{1}{2N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$

We already know that $\hat{y}_i = a x_i + b$, so we substitute:

$$J(a, b) = \frac{1}{2N} \sum_{i=1}^{N} (a x_i + b - y_i)^2$$
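The cost defined above can be sketched directly in code. The data here are hypothetical points lying exactly on the line $y = 2x + 1$, so the cost at $(a, b) = (2, 1)$ should be zero:

```python
def cost(a, b, xs, ys):
    """Cost J(a, b) = (1/2N) * sum((a*x + b - y)^2)."""
    n = len(xs)
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

print(cost(2.0, 1.0, xs, ys))       # 0.0: the perfect line has zero cost
print(cost(0.0, 0.0, xs, ys) > 0)   # True: a worse line has a higher cost
```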
Cost Function: Visualization
The previously calculated $J(a, b)$ is called a "cost function". The lower its value, the better the model's ability to explain the data.
We can view this cost function as a function of $a$ and $b$, because the $x_i$ and $y_i$ are given in our data.
The challenge now is to find the values of $a$ and $b$ that minimize $J(a, b)$.

Cost Function: Calculation
There are many ways to find the minimum of a function; one common method is gradient descent.
Gradient descent is a method that iteratively moves the parameters in the direction of steepest descent until we reach a minimum of the function.
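The idea can be sketched on a simple one-variable function first. The function $f(w) = (w - 3)^2$, the starting point, and the learning rate below are all illustrative assumptions; the minimum of $f$ is at $w = 3$:

```python
def grad(w):
    # Derivative of f(w) = (w - 3)**2
    return 2 * (w - 3)

w = 0.0       # arbitrary starting point
alpha = 0.1   # learning rate
for _ in range(100):
    w = w - alpha * grad(w)  # step opposite the gradient (steepest descent)

print(round(w, 4))  # ≈ 3.0, the minimizer of f
```

Each step shrinks the distance to the minimizer by a constant factor here, which is why a modest number of iterations already lands very close to $w = 3$.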

Therefore, we can set the parameters to arbitrary initial values and then update them step by step toward the minimum of the cost function.
The parameter vector $(a, b)$ may be updated as follows.
We start with initial guesses for $a$ and $b$ and update these guesses iteratively using the following update rules:

$$a \leftarrow a - \alpha \frac{\partial J}{\partial a}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}$$

where $\alpha$ is the learning rate, and $\frac{\partial J}{\partial a}$ and $\frac{\partial J}{\partial b}$ are the gradients of the cost function with respect to $a$ and $b$:

$$\frac{\partial J}{\partial a} = \frac{1}{N} \sum_{i=1}^{N} (a x_i + b - y_i)\, x_i, \qquad \frac{\partial J}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (a x_i + b - y_i)$$

By repeating these steps, we will eventually find the values of $a$ and $b$ that minimize the cost function $J(a, b)$, thereby giving us the best line that fits our data.
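The whole procedure can be sketched end to end. The data, learning rate, and step count below are illustrative assumptions; the points lie exactly on $y = 2x + 1$, so gradient descent should recover $a \approx 2$ and $b \approx 1$:

```python
def fit_line(xs, ys, alpha=0.05, steps=2000):
    """Fit y ≈ a*x + b by gradient descent on J(a, b) = (1/2N) * sum((a*x + b - y)^2)."""
    n = len(xs)
    a, b = 0.0, 0.0  # arbitrary initial guesses
    for _ in range(steps):
        # Gradients of J with respect to a and b
        grad_a = sum((a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b

# Hypothetical data on the line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(round(a, 3), round(b, 3))  # ≈ 2.0 1.0
```

The learning rate matters: too large and the updates overshoot and diverge; too small and convergence is needlessly slow.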