
KL Divergence 🔀

Kwanwoo · Mon Sep 27 2021

KL Divergence (🔀 Kullback-Leibler Divergence)

KL Divergence, also known as Relative Entropy, measures how much one probability distribution differs from another. This metric tells you how similar or dissimilar two distributions are.

🎯 Cross Entropy: Binary Case 🕹️

In a binary setting there are only two possible outcomes: 0 or 1. The measure used to quantify the "distance" between the true and predicted values in this case is called Binary Cross Entropy (BCE).

๐Ÿ“ Equation

Where:
  • = true target value
  • = predicted model value

case 1. $y = 1$, $\hat{y} \to 1$: $\mathrm{BCE} = -\log \hat{y} \to 0$

target value matches the model, so the loss is near zero

case 2. $y = 1$, $\hat{y} \to 0$: $\mathrm{BCE} = -\log \hat{y} \to \infty$

target value doesn't match the model, so the loss blows up

case 3. $y = 0$, $\hat{y} \to 1$: $\mathrm{BCE} = -\log (1 - \hat{y}) \to \infty$

target value doesn't match the model, so the loss blows up

case 4. $y = 0$, $\hat{y} \to 0$: $\mathrm{BCE} = -\log (1 - \hat{y}) \to 0$

target value matches the model, so the loss is near zero
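The four cases can be checked numerically with a short Python sketch (the function name and the `eps` clipping constant are my own additions, used to avoid `log(0)`):

```python
import math

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """BCE for a single target/prediction pair.

    eps clips the prediction away from 0 and 1 so log never sees 0.
    """
    y_hat = min(max(y_hat, eps), 1 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# case 1: target 1, prediction near 1 -> loss near zero
print(binary_cross_entropy(1, 0.99))
# case 2: target 1, prediction near 0 -> large loss
print(binary_cross_entropy(1, 0.01))
# case 3: target 0, prediction near 1 -> large loss
print(binary_cross_entropy(0, 0.99))
# case 4: target 0, prediction near 0 -> loss near zero
print(binary_cross_entropy(0, 0.01))
```

Matching predictions (cases 1 and 4) give losses close to zero, while mismatches (cases 2 and 3) are heavily penalized.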

🌈 Cross Entropy: Multiple Cases 🎲

When we're dealing with more than two possible outcomes, the Binary Cross Entropy formula needs to be generalized.
๐Ÿ“ Equation:
Figure 1. Probability distribution for ideal (or normal) dice and abnormally high number of 1 dice.
Figure 1. Probability distribution for ideal (or normal) dice and abnormally high number of 1 dice.
Where:
  • = true probability distribution
  • = predicted probability distribution
Treating the ideal die as the true distribution $p$, the abnormal die's cross entropy $H(p, q)$ comes out higher than the normal die's, because its distribution diverges from the uniform one. Cross entropy is smallest when $q = p$, in which case it equals the entropy $H(p)$.
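A minimal Python sketch of the dice comparison (the exact probabilities of the abnormal die are my own assumption, since the post only says 1 comes up abnormally often):

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p(x_i) * log q(x_i), in nats."""
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))

# True distribution p: an ideal (fair) die
ideal = [1/6] * 6
# Assumed abnormal die: 1 comes up half the time
abnormal = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]

normal_ce = cross_entropy(ideal, ideal)       # equals the fair die's entropy, log 6 ≈ 1.792 nats
abnormal_ce = cross_entropy(ideal, abnormal)  # ≈ 2.034 nats, higher because q diverges from p
print(normal_ce, abnormal_ce)
```

The gap between the two values is exactly the extra cost of modeling a fair die with the abnormal distribution.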

๐Ÿ“ KL Divergence: Relative Comparison of Information Entropy ๐Ÿ“Š

KL Divergence, also called Relative Entropy, shows the difference between two distributions. The key concept here is "divergence", which denotes the level of difference between the two distributions.
๐Ÿ“ Equation:
This can be expanded and rewritten as:
And further simplified into:
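The identity $D_{KL}(p \,\|\, q) = H(p, q) - H(p)$ is easy to verify numerically. A minimal sketch, reusing the dice example (the biased distribution is my own assumption):

```python
import math

def entropy(p):
    """H(p) = -sum_i p_i * log p_i, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log q_i, in nats."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [1/6] * 6                        # ideal (fair) die
q = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]   # assumed abnormal die

# D_KL(p || q) equals H(p, q) - H(p), and vanishes only when q = p
kl = kl_divergence(p, q)
assert abs(kl - (cross_entropy(p, q) - entropy(p))) < 1e-12
assert kl_divergence(p, p) == 0
```

Note the asymmetry: in general $D_{KL}(p \,\|\, q) \neq D_{KL}(q \,\|\, p)$, which is why KL Divergence is not a true distance metric.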

💭 Conclusion

These formulas and concepts are the backbone of many machine learning algorithms and are essential for understanding how models learn from data and make predictions.

© 2023 Wooz Labs.