Prerequisite: Outlier Detection using Supervised Learning Technique

Labeled data for training a supervised outlier detection model is not always available. When the data is unlabeled, we use unsupervised learning techniques to build outlier detection models.

In unsupervised outlier detection, we try to find a pattern that the normal data points (those that are not outliers) follow. Any point that does not follow that pattern can be classified as an outlier.

One pattern we often see in normal data is that the points form clusters, or groups. The normal data points each lie inside one of several groups, while a few points do not belong to any group and lie far away from all of them. Such points, which do not follow the pattern, are classified as outliers.
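The clustering idea above can be sketched in a few lines of Python. This is only a minimal illustration, not a standard algorithm: it uses a simplified density rule (a point counts as part of a group if it has enough close neighbors), and the function name `flag_outliers`, the parameters `eps` and `min_neighbors`, and the toy data are all assumptions made up for this example.

```python
import math

def flag_outliers(points, eps, min_neighbors):
    """Return indices of points that have fewer than `min_neighbors`
    other points within distance `eps` (hypothetical helper)."""
    outliers = []
    for i, p in enumerate(points):
        # Count the other points that lie within eps of p.
        neighbors = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= eps
        )
        if neighbors < min_neighbors:
            outliers.append(i)
    return outliers

# Toy data (made up): two tight groups plus one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),  # group A
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),  # group B
    (9.0, 0.5),                                      # far from both groups
]

outlier_indices = flag_outliers(points, eps=0.5, min_neighbors=2)
print(outlier_indices)  # [8]
```

Only the last point is flagged, because every other point has three neighbors inside its own group. In practice the result depends heavily on how `eps` and `min_neighbors` are chosen.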

This approach may not be effective in all cases, because normal data does not always follow a strong pattern; it may be spread out almost uniformly. An example of such a scenario is computer virus detection, where the normal activities are very diverse and many do not fall into high-quality clusters. In such cases, an unsupervised method may have a high error rate: it may mislabel many normal objects as outliers (false positives) and let many real outliers go undetected (false negatives).
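To make this failure mode concrete, here is a small sketch under stated assumptions. The "normal" data is an evenly spaced grid (a stand-in for diverse, unclustered activity), the detector is a simple neighbor-count rule, and the names `flag_outliers`, `eps`, `min_neighbors`, and the extra "anomaly" point are all hypothetical.

```python
import math

def flag_outliers(points, eps, min_neighbors):
    """Return indices of points that have fewer than `min_neighbors`
    other points within distance `eps` (hypothetical helper)."""
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= eps
        )
        if neighbors < min_neighbors:
            outliers.append(i)
    return outliers

# Hypothetical normal data: a 5x5 grid with spacing 1.0, so no clusters form.
normal_points = [(float(x), float(y)) for x in range(5) for y in range(5)]

# Strict setting: no grid point has a neighbor within 0.5,
# so every normal point is flagged as an outlier.
flagged_strict = flag_outliers(normal_points, eps=0.5, min_neighbors=2)
false_positive_rate = len(flagged_strict) / len(normal_points)
print(false_positive_rate)  # 1.0

# Loose setting: add one point we assume is a real anomaly that happens
# to sit among the normal points, then relax eps to stop the false alarms.
anomaly = (2.5, 2.5)
all_points = normal_points + [anomaly]
flagged_loose = flag_outliers(all_points, eps=1.5, min_neighbors=2)
print(flagged_loose)  # [] -> the assumed anomaly goes undetected
```

With the strict threshold every normal point is a false positive; with the loose threshold nothing is flagged, including the point assumed to be anomalous. Neither setting works well because the normal data has no cluster structure for the rule to exploit.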

One more issue we can face in outlier detection is that a data point that seems to be an outlier may actually be noise, that is, a random error or variance in a measurement rather than a genuinely unusual object.