Average information is also called entropy.

If we have a set of independent events Ai, which are the outcomes of some experiment S, such that

∪ Ai = S

where S is the sample space, then the average self-information associated with the random experiment is given by,

H = ∑ P(Ai) i(Ai) = − ∑ P(Ai) log P(Ai)

This quantity is called the entropy associated with the experiment.
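The definition above translates directly into code. Below is a minimal Python sketch (not from the text) that computes H, using base-2 logarithms so the result is in bits; the convention 0 log 0 = 0 is handled by skipping zero-probability terms:

```python
import math

def entropy(probs):
    """Average self-information H = -sum P(Ai) * log2 P(Ai), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin: two equally likely outcomes carry 1 bit on average.
print(entropy([0.5, 0.5]))   # 1.0
# A biased coin is more predictable, so its entropy is lower (≈ 0.469 bits).
print(entropy([0.9, 0.1]))
```

Note that the entropy depends only on the probabilities of the outcomes, not on what the outcomes actually are.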

One of the many contributions of Shannon was to show that if the experiment is a source that puts out symbols Ai from a set A, then the entropy is a measure of the average number of binary symbols needed to code the output of the source.

Shannon showed that the best that a lossless compression scheme can do is to encode the output of a source with an average number of bits equal to the entropy of the source.
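This bound can be illustrated with a small sketch (my own example, not from the text). For a source with dyadic probabilities {1/2, 1/4, 1/4}, the prefix code {0, 10, 11} achieves an average length exactly equal to the entropy, 1.5 bits per symbol; for non-dyadic probabilities any real code's average length is strictly above the entropy:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.25]
code_lengths = [1, 2, 2]          # lengths of the codewords 0, 10, 11
avg_bits = sum(p * l for p, l in zip(probs, code_lengths))

print(entropy(probs), avg_bits)   # both are 1.5 bits/symbol
```

Here the codeword lengths are exactly −log2 P(Ai), which is why the code meets the entropy bound with equality.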

Given a set of independent events A1, A2, ....., An with probabilities pi = P(Ai), the following properties are required of the measure of average information H:

• We want H to be a continuous function of the probabilities pi. That is, a small change in pi should only cause a small change in the average information.
• If all events are equally likely, that is, pi = 1/n for all i, then H should be a monotonically increasing function of n. The more possible outcomes there are, the more information should be contained in the occurrence of any particular outcome.
• Suppose we divide the possible outcomes into a number of groups. We indicate the occurrence of a particular event by first indicating the group it belongs to, and then indicating which particular member of the group it is.
• Thus, we get some information first by knowing which group the event belongs to, and then additional information by learning which particular event (from the events in the group) has occurred. The information obtained by indicating the outcome in these two stages should be the same as the information associated with indicating the outcome in a single stage.
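The grouping property above can be checked numerically. In this Python sketch (my own example, with an arbitrary four-outcome source and a hypothetical split into two groups), the single-stage entropy equals the entropy of the group choice plus the probability-weighted entropies within each group:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four outcomes, split into two groups of two.
p = [0.4, 0.1, 0.3, 0.2]
q1, q2 = 0.4 + 0.1, 0.3 + 0.2          # probabilities of the two groups
within1 = [0.4 / q1, 0.1 / q1]         # conditional probabilities inside group 1
within2 = [0.3 / q2, 0.2 / q2]         # conditional probabilities inside group 2

# Stage 1: which group? Stage 2: which member of that group?
two_stage = entropy([q1, q2]) + q1 * entropy(within1) + q2 * entropy(within2)

print(entropy(p), two_stage)           # the two values agree
```

Shannon showed that the logarithmic measure H = −∑ P(Ai) log P(Ai) is, up to a constant factor, the only measure satisfying all three properties.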