Prerequisite: What are Outliers & What is Outlier Detection?

Outliers are abnormal data points in a dataset. When performing outlier detection with supervised learning techniques, we build a model of either the normal data or the abnormal data.

Outlier detection using supervised learning is modeled as a classification problem: given samples labeled as normal or outlier, the task is to build a classifier that recognizes outliers.

The learning model can be built in two ways -

  • We build a model for normal data, and any data point that does not match the model is considered an outlier.
  • Or, we build a model for abnormal data, and any data point that does not match the model is considered normal.
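
The first approach can be sketched in a few lines of Python. This is only a minimal illustration, assuming one-dimensional data and a simple mean and standard deviation summary as the "model" of normal data; the function names, the sample readings, and the 3-standard-deviation cutoff are all hypothetical choices, not part of any particular method.

```python
from statistics import mean, stdev

def fit_normal_model(normal_values):
    """Summarize the labeled normal samples by their mean and standard deviation."""
    return mean(normal_values), stdev(normal_values)

def is_outlier(x, model, max_z=3.0):
    """Flag x when it lies more than max_z standard deviations from the mean."""
    mu, sigma = model
    return abs(x - mu) > max_z * sigma

# Hypothetical training data: only samples labeled "normal" are used to fit the model.
normal_readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
model = fit_normal_model(normal_readings)

# Anything that does not match the model of normal data is treated as an outlier.
new_points = [10.05, 14.0, 9.6]
labels = ["outlier" if is_outlier(x, model) else "normal" for x in new_points]
print(labels)
```

The second approach is the mirror image: fit the summary on samples labeled as outliers and treat every point that does not match it as normal.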

Challenges faced by Supervised Outlier Detection

We might face the following challenges when building our supervised learning model for detecting outliers -

  • Outliers are far fewer than normal samples, i.e. the dataset is imbalanced. In such cases, we need methods for handling imbalanced classes, such as oversampling the outliers to increase their share of the training set used to fit the classifier. A shortage of outlier examples can limit the capability of the classifier.
  • In applications where the goal is to filter out outliers, catching the outliers matters more than avoiding mislabeling normal data as outliers. In other words, a high recall on the outlier class is more important than a low number of false alarms.
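
Both challenges can be illustrated with a small Python sketch. It assumes plain random oversampling (duplicating outlier samples until the classes are balanced) and a hypothetical classifier that outputs an outlier score between 0 and 1; the helper names, the data, and the thresholds are made up for illustration.

```python
import random

def oversample_minority(samples, labels, minority_label, seed=0):
    """Duplicate randomly chosen minority samples until both classes are the same size."""
    rng = random.Random(seed)
    minority = [s for s, y in zip(samples, labels) if y == minority_label]
    majority_count = len(labels) - len(minority)
    extra = [rng.choice(minority) for _ in range(majority_count - len(minority))]
    return samples + extra, labels + [minority_label] * len(extra)

def recall_and_precision(y_true, y_pred, positive=1):
    """Compute recall and precision for the positive (outlier) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision

# Challenge 1: imbalanced data (label 1 = outlier, label 0 = normal).
X = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1, 5.0, 4.2]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
balanced_X, balanced_y = oversample_minority(X, y, minority_label=1)
print(balanced_y.count(0), balanced_y.count(1))

# Challenge 2: lowering the decision threshold trades precision for recall.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.55, 0.2, 0.45, 0.7, 0.9]  # hypothetical outlier scores
recall_high, precision_high = recall_and_precision(y_true, [int(s >= 0.5) for s in scores])
recall_low, precision_low = recall_and_precision(y_true, [int(s >= 0.4) for s in scores])
print(recall_high, precision_high)
print(recall_low, precision_low)
```

With the lower threshold every outlier is caught, at the cost of more normal points being flagged; whether that trade is acceptable depends on the application. Oversampling should be applied to the training split only, so that duplicated outliers do not leak into the evaluation data.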