
Linear Regression 📉

Kwanwoo · Sat Oct 09 2021

๐Ÿ“ Linear Regression

Linear Regression is a powerful tool used for prediction and forecasting in many fields. It's the starting point for regression analysis from an optimization viewpoint.
Suppose we have data with two features, as in the figure below. This data represents the number of people in each state (x-axis) and the number of traffic accidents in that state (y-axis). We want to develop a model for this data. It looks like we can use a linear model (a straight line), $\hat{y} = a x + b$, to describe the relationship.
Figure 1. Number of people in each state (x-axis) versus number of traffic accidents (y-axis).
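As a concrete sketch, the linear model $\hat{y} = a x + b$ can be written directly in code. The population and accident counts below are made up for illustration, since the original dataset is not given:

```python
import numpy as np

# Hypothetical data: state population (millions) and annual traffic accidents (thousands).
# These numbers are illustrative, not the figures from the original plot.
x = np.array([0.6, 1.1, 2.0, 3.4, 5.7, 8.9, 12.8, 19.5])
y = np.array([1.0, 1.9, 3.2, 5.1, 8.8, 13.0, 19.4, 29.0])

def predict(x, a, b):
    """Linear model: y_hat = a * x + b."""
    return a * x + b

# A candidate line with guessed parameters a and b.
y_hat = predict(x, a=1.5, b=0.2)
```

Different choices of `a` and `b` give different candidate lines; the rest of the post is about choosing the best one.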

🔧 Optimization: Finding the Best Line 🎯

How do we find the best line (i.e., determine the parameters 'a' and 'b')? This is the crux of linear regression.

💹 Cost Function: Measurement of Errors 📏

Figure 2. The difference between the predicted value of the linear model and the actual data.
To determine which line best describes our data, we can define an error $e_i$ for each data point $(x_i, y_i)$ as follows:

$$e_i = \hat{y}_i - y_i$$

Here, $\hat{y}_i = a x_i + b$ is the predicted y-value from our model, and $y_i$ is the actual y-value from our data.

To get a positive error irrespective of the sign, we square it:

$$e_i^2 = (\hat{y}_i - y_i)^2$$

And to simplify future differentiation, we divide it by 2:

$$\frac{1}{2} e_i^2 = \frac{1}{2} (\hat{y}_i - y_i)^2$$

Now, the average error over all data points (total number of data points = $N$) is:

$$J(a, b) = \frac{1}{2N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$

We already know that $\hat{y}_i = a x_i + b$, so we substitute:

$$J(a, b) = \frac{1}{2N} \sum_{i=1}^{N} (a x_i + b - y_i)^2$$
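The averaged squared error described above translates directly into code. The function name `cost` and the sample data below are my own choices for illustration:

```python
import numpy as np

def cost(a, b, x, y):
    """Mean squared error with the 1/2 factor: J(a, b) = (1/2N) * sum((a*x + b - y)^2)."""
    residuals = a * x + b - y
    return np.mean(residuals ** 2) / 2.0

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # exactly y = 2x

# A perfect fit (a=2, b=0) gives zero cost; a worse line gives a larger cost.
print(cost(2.0, 0.0, x, y))  # 0.0
print(cost(1.0, 0.0, x, y))  # larger than 0
```

The factor of 1/2 does not change which line is best; it only cancels the 2 that appears when differentiating the square.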

🌄 Cost Function: Visualization 🌌

The previously calculated $J(a, b)$ is called a "cost function". The lower its value, the better the model explains the data.
We can view this cost function as a function of $a$ and $b$, because the $x_i$ and $y_i$ are given by our data.
The challenge now is to find the values of $a$ and $b$ that minimize $J(a, b)$.
Figure 3. Cost function and its minimum value in a space where slope and intercept are domain.
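To make the surface in Figure 3 concrete, one can evaluate $J(a, b)$ over a grid of candidate slopes and intercepts and pick out the smallest value. This is a brute-force sketch; the grid ranges and data are illustrative choices:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])  # roughly y = 2x + 1 with noise

# Grid of candidate (a, b) pairs.
a_grid = np.linspace(0.0, 4.0, 81)
b_grid = np.linspace(-2.0, 4.0, 121)
A, B = np.meshgrid(a_grid, b_grid)

# Cost J(a, b) at every grid point (broadcasting over the data axis).
residuals = A[..., None] * x + B[..., None] - y
J = (residuals ** 2).mean(axis=-1) / 2.0

# The grid point with the lowest cost approximates the minimum of the surface.
i, j = np.unravel_index(J.argmin(), J.shape)
print(A[i, j], B[i, j])  # near a = 2, b = 1
```

Grid search only works here because there are two parameters; gradient descent, covered next, scales to many parameters.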

🧮 Cost Function: Calculation 📈

There are many ways to find the minimum value of a function, one common method being gradient descent.
Gradient descent is a method that iteratively moves the parameters in the direction of steepest descent (the negative gradient) until we reach a minimum of the function.
Figure 4. An illustration of gradient descent.
Therefore, after initializing the parameters $(a, b)$ to arbitrary values, we can update them step by step toward the minimum of the cost function.
We start with initial guesses for $a$ and $b$ and update them iteratively using the following rules:

$$a \leftarrow a - \alpha \frac{\partial J}{\partial a}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}$$
Figure 5. The process of finding the minimum value of the cost function using gradient descent.
where $\alpha$ is the learning rate, and $\frac{\partial J}{\partial a}$ and $\frac{\partial J}{\partial b}$ are the gradients of the cost function with respect to $a$ and $b$. For our cost function, these gradients are

$$\frac{\partial J}{\partial a} = \frac{1}{N} \sum_{i=1}^{N} (a x_i + b - y_i)\, x_i, \qquad \frac{\partial J}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} (a x_i + b - y_i)$$

By repeating these steps, we will eventually find the values of $a$ and $b$ that minimize the cost function $J(a, b)$, giving us the best line that fits our data.
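Putting the update rules together, a minimal gradient-descent loop for this cost might look like the following. The learning rate, iteration count, and data are illustrative choices, not values from the post:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0  # data generated from a known line, y = 2x + 1

a, b = 0.0, 0.0  # arbitrary initial guesses
alpha = 0.02     # learning rate

for _ in range(20000):
    residuals = a * x + b - y        # y_hat - y
    grad_a = (residuals * x).mean()  # dJ/da = (1/N) * sum((a*x + b - y) * x)
    grad_b = residuals.mean()        # dJ/db = (1/N) * sum(a*x + b - y)
    a -= alpha * grad_a
    b -= alpha * grad_b

print(a, b)  # approaches a = 2, b = 1
```

If the learning rate is too large the updates overshoot and diverge; too small and convergence is slow, which is why $\alpha$ usually needs a bit of tuning.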

© 2023 Wooz Labs.