What is Data Reduction?

Suppose we have a very large dataset that we need to analyze and mine. Because the dataset is so large, the analysis becomes complex and mining takes so long that the task may be impractical or infeasible.

In such cases, data reduction techniques can be applied to obtain a reduced representation of the data that is much smaller in volume, yet closely maintains the integrity of the original data.

Once the data has been reduced, analysis becomes simpler and takes much less time.

Data Reduction Strategies

We will study the following data reduction strategies:

Dimensionality Reduction

In this technique, we reduce the number of attributes or features of the dataset taken under consideration.
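As a rough illustration, one simple way to reduce the number of attributes is to drop those that carry no information, such as a column whose value never changes. The sketch below assumes this variance-based selection rule; the function name `drop_low_variance`, the threshold, and the sample data are all hypothetical and chosen only for illustration. Other dimensionality reduction methods work differently.

```python
from statistics import pvariance

def drop_low_variance(rows, names, threshold=0.0):
    """Keep only the attributes whose population variance exceeds threshold.

    Hypothetical helper for illustration; rows is a list of equal-length
    tuples of numbers and names holds one attribute name per column.
    """
    columns = list(zip(*rows))  # transpose rows into columns
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    kept_names = [names[i] for i in keep]
    reduced = [tuple(row[i] for i in keep) for row in rows]
    return kept_names, reduced

# Made-up sample data: country_code is constant, so it adds nothing.
names = ["age", "country_code", "income"]
rows = [
    (25, 1, 40000),
    (32, 1, 52000),
    (47, 1, 61000),
    (51, 1, 58000),
]

kept_names, reduced = drop_low_variance(rows, names)
print(kept_names)  # ['age', 'income']
print(reduced[0])  # (25, 40000)
```

The dataset keeps all four records but goes from three attributes to two.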

Numerosity Reduction

In this technique, the original data is replaced by a smaller form of data representation, such as a model, a histogram, or a sample.
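One possible smaller representation is a histogram, where the raw values are replaced by a count per bin. The following is a minimal sketch assuming equal-width bins; the function name `equal_width_histogram`, the bin width, and the price list are hypothetical and used only for illustration.

```python
from collections import Counter

def equal_width_histogram(values, bin_width):
    """Replace raw values with sorted (bin_start, count) pairs.

    Hypothetical helper for illustration; assumes non-negative integers.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    return sorted(counts.items())

# Made-up sample data: 17 individual prices.
prices = [1, 1, 5, 5, 5, 8, 8, 10, 10, 12, 14, 15, 18, 20, 21, 25, 30]

histogram = equal_width_histogram(prices, bin_width=10)
print(histogram)  # [(0, 7), (10, 6), (20, 3), (30, 1)]
```

Here 17 values are summarized by 4 bin counts. The exact values inside each bin are no longer available, so this representation trades detail for size.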

Data Compression

In this technique, transformations are applied so as to obtain a “reduced” or “compressed” representation of the original data.

If the original data can be reconstructed from the compressed data without any information loss, the data reduction is called lossless.

If we can reconstruct only an approximation of the original data, the data reduction is called lossy.
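The difference between lossless and lossy reduction can be sketched with a small example. It assumes Python's standard `zlib` module as the lossless compressor and plain rounding as a stand-in for a lossy transformation; the byte string and the sensor readings are made-up sample data.

```python
import zlib

# Lossless: decompressing returns exactly the original bytes.
original = b"AAAAABBBBBCCCCC" * 100
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(compressed) < len(original))  # True
print(restored == original)             # True

# Lossy: rounding keeps only an approximation of each reading,
# and the discarded fractional part cannot be recovered.
readings = [21.37, 21.41, 22.08, 22.13]
quantized = [round(r) for r in readings]

print(quantized)  # [21, 21, 22, 22]
```

In the lossless case nothing is lost in the round trip. In the lossy case the rounded values are close to the originals, but the original readings cannot be rebuilt from them.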

Dimensionality reduction and numerosity reduction can also be considered forms of data compression.