# Entropy and Entropy Rate in Data Compression

In information theory, entropy is the measure of the uncertainty associated with a random variable.

It is usually referred to as Shannon entropy, which quantifies, in the sense of an expected value, the information contained in a message, usually in units such as bits.

Shannon entropy is a measure of the average information content, i.e., the average number of binary symbols needed to code the output of the source.

Shannon entropy represents an absolute limit on the best possible lossless compression of any communication, under the constraint that the message to be encoded is treated as a sequence of independent and identically distributed random variables.

The entropy rate of a source is a number that depends only on the statistical nature of the source. Consider an arbitrary source, X = {X_{1}, X_{2}, X_{3}, X_{4}, ...}.

##### The following are various models of a source:

**Zero Order Model:** The characters are statistically independent of each other, and every letter of the alphabet is equally likely to occur.

Let *m* be the size of the alphabet. In this case, the entropy rate is given by,

H = log_{2} m bits/character

For example, if the alphabet size is m = 27, then the entropy rate is

H = log_{2} 27 = 4.75 bits/character
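As a quick sketch, the zero-order entropy rate can be computed directly from the alphabet size (the function name here is just illustrative):

```python
import math

# Zero-order model: all m characters are equally likely,
# so the entropy rate is simply log2(m) bits/character.
def zero_order_entropy(m):
    return math.log2(m)

print(zero_order_entropy(27))  # ~4.75 bits/character
```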

**First Order Model:** The characters are statistically independent of each other, but the letters are not necessarily equally likely.

Let m be the size of the alphabet and let P_{i} be the probability of the i^{th} letter of the alphabet. The entropy rate is,

H = - ∑_{i} P_{i} log_{2} P_{i} bits/character
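A minimal sketch of the first-order formula, estimating each P_{i} as the empirical frequency of the i^{th} letter in a sample string (the sample text and function name are illustrative assumptions, not from the source):

```python
import math
from collections import Counter

# First-order model: H = -sum_i P_i * log2(P_i),
# with P_i estimated as the relative frequency of each character.
def first_order_entropy(text):
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Two equally likely characters give exactly 1 bit/character.
print(first_order_entropy("aabb"))  # 1.0
```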

**Second Order Model:** Let P_{j|i} be the conditional probability that the present character is the j^{th} letter in the alphabet given that the previous character is the i^{th} letter. The entropy rate is:

H = - ∑_{i} P_{i} ∑_{j} P_{j|i} log_{2} P_{j|i} bits/character
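The second-order formula can be sketched by estimating P_{i} and P_{j|i} from consecutive character pairs in a sample string; note that -∑_{i} P_{i} ∑_{j} P_{j|i} log_{2} P_{j|i} equals -∑_{i,j} P(i,j) log_{2} P_{j|i}, which is what the loop below accumulates (the sample text is an arbitrary illustration):

```python
import math
from collections import Counter

# Second-order model: H = -sum over pairs of P(i,j) * log2(P(j|i)),
# with probabilities estimated from adjacent character pairs.
def second_order_entropy(text):
    pair_counts = Counter(zip(text, text[1:]))
    first_counts = Counter(text[:-1])
    n_pairs = len(text) - 1
    h = 0.0
    for (i, j), c in pair_counts.items():
        p_pair = c / n_pairs          # estimate of P(i, j) = P_i * P_{j|i}
        p_cond = c / first_counts[i]  # estimate of P(j | i)
        h -= p_pair * math.log2(p_cond)
    return h

# A perfectly predictable alternating source has entropy rate 0.
print(second_order_entropy("ababab"))  # 0.0
```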

**Third Order Model:** Let P_{k|j,i} be the conditional probability that the present character is the k^{th} letter in the alphabet given that the previous character is the j^{th} letter and the one before that is the i^{th} letter. The entropy rate is:

H = - ∑_{i} P_{i} ∑_{j} P_{j|i} ∑_{k} P_{k|j,i} log_{2} P_{k|j,i} bits/character

**General Model:** Let B_{n} denote a block of the first n characters. The entropy rate in the general case is given by,

H = - lim_{n→∞} (1/n) ∑_{B_{n}} P(B_{n}) log_{2} P(B_{n}) bits/character
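The general model can be approximated numerically by computing the block entropy (1/n) H(B_{n}) for increasing block lengths n over a sample string; as n grows, these values approach the entropy rate. This is a rough empirical sketch (the sample text and function name are illustrative, and a short sample will not reach the true limit):

```python
import math
from collections import Counter

# General model: estimate (1/n) * H(B_n) from overlapping length-n blocks.
# The entropy rate is the limit of this quantity as n -> infinity.
def block_entropy_rate(text, n):
    blocks = [text[k:k + n] for k in range(len(text) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    h_n = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h_n / n  # bits/character

text = "the theory of everything and the theory of nothing"
for n in (1, 2, 3, 4):
    print(n, block_entropy_rate(text, n))
```

Larger n captures more of the source's statistical dependencies, but reliable estimates for large n require much longer samples, since each length-n block must be observed often enough.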