Basic Statistical Descriptions of Data - Mean, Median, Mode & Midrange

Posted On July 21, 2020

Prerequisite: Determine the numeric and categorical attributes in the dataset

The basic statistical descriptions of data help us measure certain properties of the data. One of these properties is the central tendency. Measuring the central tendency tells us where most of the data lies, taking the whole dataset into account.

Let us take a use case. Suppose that we have a set of values and we want a single value that can stand in for the whole dataset and still give a relevant result. A measure of central tendency gives us such a value.

Let us discuss some of the measures of central tendency we can use:

Mean: Suppose that we have a dataset with an attribute “age” recorded for, say, 100 people. Let the corresponding ages be a1, a2, a3, …, a100.

The mean of the ages of these 100 people is their average age, which is equivalent to answering, “what single age best represents the ages of all the people?”

Mathematically, the mean of n values a1, a2, …, an can be defined as:

$$\bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i = \frac{a_1 + a_2 + \cdots + a_n}{n}$$
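As a minimal sketch of this definition, the snippet below computes the mean both by hand (sum divided by count) and with Python's standard `statistics.mean`. The `ages` list is made up for illustration and is not data from this tutorial.

```python
from statistics import mean

# Hypothetical ages, invented for this example
ages = [23, 35, 41, 29, 52]

# Mean by the definition: sum of the values divided by their count
manual_mean = sum(ages) / len(ages)

# The standard library computes the same quantity
library_mean = mean(ages)

print(manual_mean, library_mean)
```

Both approaches should agree; the library call is usually the more readable choice.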

Although the mean is useful, it has some drawbacks. One of them is that it is sensitive to extreme values in the data.

Let us take a case. Suppose that out of 100 people in a company, 95 have a salary in the range 2 Lakhs to 5 Lakhs, but 5 people have a salary above 100 Lakhs. In this case, the mean salary will be around 8 Lakhs. But most of the people have salaries between 2 Lakhs and 5 Lakhs, so this result does not have much significance and is not very useful. We cannot replace the whole dataset with the mean in this scenario. In such cases, we have another measure of central tendency, which is the Median of the data.
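The arithmetic behind this scenario can be checked with a short sketch. The exact salaries are assumptions made for illustration: every one of the 95 typical employees is placed at 3.5 Lakhs (the middle of the 2 to 5 range) and each of the 5 extreme earners at exactly 100 Lakhs.

```python
from statistics import mean

# Assumed salaries in Lakhs (illustrative values, not real data)
typical = [3.5] * 95     # 95 people inside the 2-5 Lakhs range
extreme = [100.0] * 5    # 5 people far above that range
salaries = typical + extreme

mean_all = mean(salaries)       # pulled upward by the 5 extreme values
mean_typical = mean(typical)    # mean of the 95 typical salaries only

print(f"mean with extreme values:    {mean_all:.3f} Lakhs")
print(f"mean without extreme values: {mean_typical:.3f} Lakhs")
```

Under these assumptions, just five values move the mean well outside the range that holds 95% of the salaries.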

Median: When our dataset has skewness (data is asymmetric), calculating the Median could prove to be more beneficial than Mean.

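Median is defined as the centermost value of an ordered numerical dataset. For calculating the Median, it is important for the dataset to be in some order, i.e. it should be sorted. If the number of values is even, there is no single centermost value, and the Median is usually taken as the average of the two middle values.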
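A minimal sketch of this procedure is shown below. The function name `median` and the sample lists are illustrative, and averaging the two middle values for an even number of items is the common convention assumed here.

```python
def median(values):
    """Return the middle value of a numeric dataset.

    For an even number of values, return the average of the two
    middle values (a common convention, assumed here).
    """
    ordered = sorted(values)  # the median is defined on ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_result = median([7, 1, 5])        # sorted: [1, 5, 7]
even_result = median([7, 1, 5, 3])    # sorted: [1, 3, 5, 7]
print(odd_result, even_result)
```

In practice, Python's built-in `statistics.median` follows the same convention and can be used instead.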

Let us take the same case again. Out of 100 people in a company, 95 have a salary in the range 2 Lakhs to 5 Lakhs, but 5 people have a salary above 100 Lakhs. The Median of this dataset will still lie between 2 Lakhs and 5 Lakhs, because the 5 extreme salaries only occupy the top positions of the sorted list and do not change the middle values. So we can see that the Median is not affected by extreme values in the dataset. Therefore, in such scenarios, the Median of the dataset has more significance.
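To see this on concrete numbers, the sketch below compares the mean and the median on one possible version of the salary data. The values are assumptions: 95 salaries spread evenly from 2 to 5 Lakhs and 5 salaries set to exactly 100 Lakhs.

```python
from statistics import mean, median

# Assumed salaries in Lakhs: 95 values evenly spaced from 2 to 5
typical = [2 + 3 * i / 94 for i in range(95)]
# ... plus 5 extreme values (illustrative, not real data)
salaries = typical + [100.0] * 5

salary_mean = mean(salaries)
salary_median = median(salaries)

print(f"mean:   {salary_mean:.2f} Lakhs")
print(f"median: {salary_median:.2f} Lakhs")
```

With this data the median should stay inside the 2 to 5 Lakhs range, while the mean lands above it.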

Mode: This is another measure of central tendency. The mode for a set of data is the value that occurs most frequently in the set. Hence, it can be calculated for both qualitative and quantitative attributes.

A dataset can have more than one mode when two or more values share the highest frequency. A dataset with exactly two modes is known as bimodal. In general, a dataset with two or more modes is known as multimodal.
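As a short illustration, `statistics.multimode` (available in Python 3.8 and later) returns every value that shares the highest frequency, and it works on both numeric and categorical data. The `ages` and `colors` lists are made up for this example.

```python
from statistics import multimode

# Hypothetical quantitative attribute: 25 and 30 both occur twice
ages = [25, 30, 30, 41, 25, 52]

# Hypothetical qualitative attribute: "red" occurs most often
colors = ["red", "blue", "red", "green"]

age_modes = multimode(ages)        # bimodal: two values tie
color_modes = multimode(colors)    # a single mode

print(age_modes, color_modes)
```

A result list with more than one entry indicates a multimodal dataset.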

Midrange: This is defined as the average of the largest and smallest values in the set of values.
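This definition translates directly into code. The helper name `midrange` and the sample ages below are illustrative assumptions.

```python
def midrange(values):
    """Return the average of the smallest and largest values."""
    if not values:
        raise ValueError("midrange of an empty dataset is undefined")
    return (min(values) + max(values)) / 2

# Hypothetical ages, invented for this example
ages = [23, 35, 41, 29, 52]
result = midrange(ages)   # (23 + 52) / 2
print(result)
```

Because it depends only on the two extreme values, the midrange shifts whenever the minimum or maximum changes.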
