KL Divergence (Kullback-Leibler Divergence)
KL Divergence, also known as Relative Entropy, measures how much one probability distribution differs from another. This metric gives you a quantitative sense of how similar or dissimilar two distributions are.
Cross Entropy: Binary Case
In the binary case, there are only two possible outcomes: 0 or 1. The measure of the "distance" between the true and predicted values in this setting is called Binary Cross Entropy (BCE).
Equation

$$\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\Big[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big]$$

Where:
- $y_i$ = true target value
- $\hat{y}_i$ = predicted model value
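The BCE formula above can be sketched directly in code; this is a minimal illustration (not a production loss function), with a small `eps` added as an assumption to guard against `log(0)`:

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross entropy: -(1/N) * sum[y*log(p) + (1-y)*log(1-p)]."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip predictions away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Confident, correct predictions give a low loss:
low = binary_cross_entropy([1, 0, 1], [0.9, 0.1, 0.8])
# Confident, wrong predictions give a much higher loss:
high = binary_cross_entropy([1, 0, 1], [0.1, 0.9, 0.2])
```

Note how the loss grows sharply as a confident prediction lands on the wrong side, which is exactly the penalty the logarithm provides.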
Cross Entropy: Multiple Cases
When we're dealing with more than two possible outcomes, the Binary Cross Entropy formula generalizes to a sum over all classes.
Equation:

$$H(p, q) = -\sum_{x} p(x) \log q(x)$$

Where:
- $p$ = true probability distribution
- $q$ = predicted probability distribution
Figure: cross entropy of an abnormal (loaded) die

Figure: cross entropy of a normal (fair) die
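The dice comparison above can be reproduced numerically. As a sketch (the loaded die's probabilities here are illustrative assumptions, not the figures' exact values): when the predicted distribution $q$ matches the true distribution $p$ of a fair die, the cross entropy equals the die's own entropy, $\log 6$; a mismatched $q$ always yields a larger value.

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * log q(x)."""
    return -sum(px * math.log(max(qx, eps)) for px, qx in zip(p, q))

fair = [1 / 6] * 6                        # normal (fair) die
loaded = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]   # abnormal (loaded) die, hypothetical values

# Cross entropy is minimized when q matches p exactly:
h_fair = cross_entropy(fair, fair)        # equals the entropy of a fair die, log 6
h_mismatch = cross_entropy(fair, loaded)  # larger, since q misrepresents p
```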
KL Divergence: Relative Comparison of Information Entropy
KL Divergence, also called Relative Entropy, quantifies the gap between two distributions. The key concept here is "divergence": how much extra information is lost when the predicted distribution is used in place of the true one.
Equation:

$$D_{KL}(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}$$

This can be expanded and rewritten as:

$$D_{KL}(p \,\|\, q) = \sum_{x} p(x) \log p(x) - \sum_{x} p(x) \log q(x)$$

And further simplified into:

$$D_{KL}(p \,\|\, q) = H(p, q) - H(p)$$

that is, the cross entropy between $p$ and $q$ minus the entropy of $p$.
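The identity $D_{KL}(p \,\|\, q) = H(p, q) - H(p)$ can be checked numerically; the two example distributions below are arbitrary assumptions chosen only for illustration:

```python
import math

def entropy(p):
    """H(p) = -sum_x p(x) * log p(x)."""
    return -sum(px * math.log(px) for px in p if px > 0)

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * log q(x)."""
    return -sum(px * math.log(max(qx, eps)) for px, qx in zip(p, q))

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_x p(x) * log(p(x) / q(x))."""
    return sum(px * math.log(px / max(qx, eps)) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.3, 0.2]  # "true" distribution (illustrative)
q = [0.4, 0.4, 0.2]  # "predicted" distribution (illustrative)

d = kl_divergence(p, q)
gap = cross_entropy(p, q) - entropy(p)  # should equal d
```

Since $D_{KL}(p \,\|\, p) = 0$, minimizing cross entropy during training is equivalent to minimizing the KL divergence between the data distribution and the model's predictions.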
Conclusion
These formulas and concepts are the backbone of many machine learning algorithms and are essential for understanding how models learn from data and make predictions.