KL Divergence (Kullback-Leibler Divergence)
KL Divergence, also known as Relative Entropy, measures how much one probability distribution differs from another. This metric gives you a quantitative sense of how similar or dissimilar two distributions are.
Cross Entropy: Binary Case
In the binary case, there are only two possible outcomes: 0 or 1. The measure of the "distance" between the true and predicted values in this setting is called Binary Cross Entropy (BCE).
Equation

$$\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big]$$

Where:
- $y_i$ = true target value
- $\hat{y}_i$ = predicted model value
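The BCE formula above can be sketched directly in code; this is a minimal illustration (not a production loss function), with a small `eps` added as an assumption to guard against `log(0)`:

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy: -(1/N) * sum[y*log(p) + (1-y)*log(1-p)]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip predictions away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident, correct predictions give a low loss:
low = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])
# Confident, wrong predictions give a much higher loss:
high = binary_cross_entropy([1, 0, 1], [0.1, 0.9, 0.2])
```

Note how the loss grows sharply as a confident prediction lands on the wrong side, which is exactly the penalty the logarithm provides.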
Cross Entropy: Multiple Cases
When we're dealing with more than two possible outcomes, the Binary Cross Entropy formula generalizes to a sum over all classes.
Equation:

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$

Where:
- $p$ = true probability distribution
- $q$ = predicted probability distribution
Figure: cross entropy of an abnormal (loaded) die

Figure: cross entropy of a normal (fair) die
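The dice comparison above can be reproduced numerically. As a sketch (the loaded die's probabilities here are illustrative assumptions, not the figures' exact values): when the predicted distribution $q$ matches the true distribution $p$ of a fair die, the cross entropy equals the die's own entropy, $\log 6$; a mismatched $q$ always yields a larger value.

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * log q(x)."""
    return -sum(px * math.log(max(qx, eps)) for px, qx in zip(p, q))

fair = [1 / 6] * 6                        # normal (fair) die
loaded = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]   # abnormal (loaded) die, hypothetical values

# Cross entropy is minimized when q matches p exactly:
h_fair = cross_entropy(fair, fair)        # equals the entropy of a fair die, log 6
h_mismatch = cross_entropy(fair, loaded)  # larger, since q misrepresents p
```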
KL Divergence: Relative Comparison of Information Entropy
KL Divergence, also called Relative Entropy, quantifies the gap between two distributions. The key concept here is "divergence": how much extra information is lost when the predicted distribution is used in place of the true one.
Equation:

$$D_{KL}(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$

This can be expanded and rewritten as:

$$D_{KL}(p \,\|\, q) = \sum_{x} p(x) \log p(x) - \sum_{x} p(x) \log q(x)$$

And further simplified into:

$$D_{KL}(p \,\|\, q) = H(p, q) - H(p)$$

that is, the cross entropy between $p$ and $q$ minus the entropy of $p$.
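The identity $D_{KL}(p \,\|\, q) = H(p, q) - H(p)$ can be checked numerically; the two example distributions below are arbitrary assumptions chosen only for illustration:

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) * log p(x)."""
    return -sum(px * math.log(px) for px in p if px > 0)

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * log q(x)."""
    return -sum(px * math.log(max(qx, eps)) for px, qx in zip(p, q))

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_x p(x) * log(p(x) / q(x))."""
    return sum(px * math.log(px / max(qx, eps)) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.3, 0.2]  # "true" distribution (illustrative)
q = [0.4, 0.4, 0.2]  # "predicted" distribution (illustrative)

d = kl_divergence(p, q)
gap = cross_entropy(p, q) - entropy(p)  # should equal d
```

Since $D_{KL}(p \,\|\, p) = 0$, minimizing cross entropy during training is equivalent to minimizing the KL divergence between the data distribution and the model's predictions.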
Conclusion
These formulas and concepts are the backbone of many machine learning algorithms and are essential for understanding how models learn from data and make predictions.