In information theory, entropy is a measure of the uncertainty associated with a random variable.

It is usually referred to as Shannon entropy, which quantifies, in the sense of an expected value, the information contained in a message, usually in units such as bits.

Shannon entropy is a measure of the average information content of a source, i.e., the average number of binary symbols needed to encode the output of the source.

Shannon entropy represents an absolute limit on the best possible lossless compression of any communication, under the assumption that the message to be encoded is a sequence of independent and identically distributed random variables.

The entropy rate of a source is a number that depends only on the statistical nature of the source. Consider an arbitrary source, X = {X1, X2, X3, X4, ...}.

##### The following are various models of such a source:

Zero Order Model: The characters are statistically independent of each other, and every letter of the alphabet is equally likely to occur.

Let m be the size of the alphabet. In this case, the entropy rate is given by,

H = log2(m) bits/character

For example, if the alphabet size is m = 27 then the entropy rate would be,

H = log2(27) ≈ 4.75 bits/character
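The zero-order calculation above is a single base-2 logarithm; a minimal sketch in Python (the function name is illustrative):

```python
import math

def zero_order_entropy(m):
    """Entropy rate, in bits/character, of a source emitting m
    equally likely, independent characters."""
    return math.log2(m)

# For a 27-symbol alphabet (26 letters plus space):
print(zero_order_entropy(27))  # ≈ 4.75 bits/character
```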

First Order Model: The characters are statistically independent, but not necessarily equally likely.

Let m be the size of the alphabet and let Pi be the probability of the ith letter in the alphabet. The entropy rate is,

H = - ∑i Pi log2(Pi) bits/character
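The first-order sum can be computed directly from a list of letter probabilities; a minimal sketch, using a hypothetical four-symbol distribution for illustration:

```python
import math

def first_order_entropy(probs):
    """H = -sum_i Pi * log2(Pi); zero-probability symbols
    contribute nothing and are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical source with dyadic probabilities:
probs = [0.5, 0.25, 0.125, 0.125]
print(first_order_entropy(probs))  # 1.75 bits/character
```

With equal probabilities 1/m, this reduces to the zero-order value log2(m).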

Second Order Model: Let Pj|i be the conditional probability that the present character is the jth letter in the alphabet given that the previous character is the ith letter. The entropy rate is:

H = - ∑i,j Pi Pj|i log2(Pj|i) bits/character
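A sketch of the second-order sum, assuming a hypothetical two-symbol source given by its letter probabilities Pi and a conditional probability table Pj|i:

```python
import math

def second_order_entropy(p, p_cond):
    """H = -sum_{i,j} Pi * P(j|i) * log2(P(j|i)),
    skipping zero-probability transitions."""
    m = len(p)
    return -sum(
        p[i] * p_cond[i][j] * math.log2(p_cond[i][j])
        for i in range(m)
        for j in range(m)
        if p[i] > 0 and p_cond[i][j] > 0
    )

# Hypothetical source: each character tends to repeat the previous one.
p = [0.5, 0.5]                 # Pi
p_cond = [[0.9, 0.1],          # Pj|i, row i = previous character
          [0.1, 0.9]]
print(second_order_entropy(p, p_cond))  # ≈ 0.469 bits/character
```

Note that the strong dependence between consecutive characters pushes the rate well below the 1 bit/character of an independent binary source.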

Third Order Model: Let Pk|j,i be the conditional probability that the present character is the kth letter in the alphabet given that the previous character is the jth letter and the one before that is the ith letter. The entropy rate is:

H = - ∑i,j,k Pi Pj|i Pk|j,i log2(Pk|j,i) bits/character
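The third-order sum extends the same pattern with one more conditioning level; a sketch, assuming hypothetical tables Pi, Pj|i, and Pk|j,i:

```python
import math

def third_order_entropy(p, p2, p3):
    """H = -sum_{i,j,k} Pi * P(j|i) * P(k|j,i) * log2(P(k|j,i))."""
    m = len(p)
    return -sum(
        p[i] * p2[i][j] * p3[i][j][k] * math.log2(p3[i][j][k])
        for i in range(m)
        for j in range(m)
        for k in range(m)
        if p[i] > 0 and p2[i][j] > 0 and p3[i][j][k] > 0
    )

# Sanity check: a memoryless fair binary source should give 1 bit/character.
p = [0.5, 0.5]
p2 = [[0.5, 0.5], [0.5, 0.5]]
p3 = [[[0.5, 0.5], [0.5, 0.5]],
      [[0.5, 0.5], [0.5, 0.5]]]
print(third_order_entropy(p, p2, p3))  # 1.0 bits/character
```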

General Model: Let Bn represent a block of the first n characters. The entropy rate in the general case is given by,

H = - limn→∞ (1/n) ∑Bn P(Bn) log2(P(Bn)) bits/character
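For a fixed finite n, the block term (1/n) ∑Bn P(Bn) log2(P(Bn)) can be estimated from empirical n-character block frequencies in a sample text; a rough sketch (the estimator converges to the entropy rate only as n grows and the sample gets long):

```python
import math
from collections import Counter

def block_entropy_rate(text, n):
    """Estimate (1/n) * H(Bn) from the empirical frequencies of
    overlapping n-character blocks in `text`."""
    blocks = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    h_n = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h_n / n

# A perfectly periodic text has low block entropy per character:
print(block_entropy_rate("ababababababab", 2))
```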