One of the most popular ways of representing dependence in the data is through the use of Markov Models.

For models used in lossless compression, we use a specific type of Markov process called a discrete-time Markov chain.

Let {xn} be a sequence of observations. This sequence is said to follow a kth order Markov model if

P(xn | xn-1,.........,xn-k) = P(xn | xn-1,.........,xn-k,.......)

In other words, knowledge of the past k symbols is equivalent to the knowledge of the entire history of the process.

The values took on by the set {xn-1,.........,xn-k} are called the states of the process. If the size of the source alphabet is l, then the number of states is lk.

The most commonly used Markov model is the first-order Markov model, for which

P(xn | xn-1) = P(xn | xn-1,xn-2,xn-3,.....)

The above equation indicates the existence of dependence between samples.

However, they do not describe the form of dependence. We can develop different first-order Markov models depending on our assumption about the form of the dependence between samples.

If we assumed that the dependence was introduced linearly, we could view the data sequence as the output of a linear filter driven by white noise.

The output of such a filter can be given by the difference equation,

xn = ρxn-1 + εn

where εn is a white noise process. This model is often used when developing coding algorithms for speech and images.

The use of the Markov model does not require the assumption of linearity.

##### For example,

Consider a binary image.

The image has only two types of pixels that are white pixels and black pixels.

We know that the appearance of a white pixel as the next observation depends, to some extent, on whether the current pixel is black or white.

Therefore, we can model the pixel process as a discrete-time Markov chain.

Define two states Sand S(Sw would correspond to the case where the current pixel is a white pixel, and Sb corresponds to the case where the current pixel is a black pixel).

We define the transition probabilities P(w/b) and P(b/w), and the probability of being in each state P(Sw) and P(Sb).

The entropy of a finite state process with states Sis simply the average value of the entropy at each state

H = ∑ P(Si) H(Si)