In supervised learning, we worked with labeled training data, in which each example has both input features and a target label.
In unsupervised learning, we still have training data, but no labels are attached to it. One of the most common unsupervised learning problems in practice is clustering. The task is to group together data points that have similar properties. This gives us clusters of points in the given dataset, as shown below.
In the diagram below we have identified four clusters.
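The text does not name a particular clustering algorithm. The following is only a minimal sketch using k-means, one widely used method that is swapped in here purely as an illustration. It assumes the points are tuples of numbers and that the number of clusters `k` is known in advance. The function names `kmeans` and `farthest_first` are hypothetical, and the farthest-first initialization was chosen to keep the example deterministic (k-means++ is the more common randomized choice). Because k-means assigns points by Euclidean distance to a centroid, it tends to find compact, roughly round clusters.

```python
import math


def farthest_first(points, k):
    """Pick k initial centroids, each as far as possible from those already chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        # The next centroid is the point whose nearest chosen centroid is farthest away.
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, iterations=100):
    """Group points (tuples of numbers) into k clusters with Lloyd's k-means."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid where it was
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids
```

With four well-separated groups of points and `k=4`, each group should end up sharing a single label, and each centroid should settle at the mean of its group.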
It is important to note that not every data point has to belong to a cluster, and clusters need not be circular like the ones shown above. They can be ellipsoidal or take any other shape that the data points happen to form.
If a data point does not lie inside any cluster and is very far from all of them, it is called an outlier.
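The definition of an outlier can be made concrete in several ways. As one minimal sketch, assume cluster centroids have already been found (for example by a clustering algorithm) and that the user picks a distance `threshold`. A point is then flagged as an outlier when even its nearest centroid is farther away than that threshold. `find_outliers` and `threshold` are hypothetical names. A single fixed threshold is a simplifying assumption, and clusters of very different sizes may call for a separate threshold per cluster.

```python
import math


def find_outliers(points, centroids, threshold):
    """Return the points whose distance to every centroid exceeds threshold."""
    outliers = []
    for p in points:
        # Distance to the closest cluster centre decides whether p is "far from all clusters".
        nearest = min(math.dist(p, c) for c in centroids)
        if nearest > threshold:
            outliers.append(p)
    return outliers
```

For example, with centroids at `(0, 0)` and `(10, 10)` and a threshold of 3, the point `(30, -5)` is flagged, while `(0.5, 0.5)` and `(9.5, 10)` are not.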
Applications of Unsupervised Learning –
- Customer Data – Discover classes of customers
- Image Pixels – Discover regions in an image
- Words – Find words that are similar to a given word
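The image-pixel application can be illustrated with a small sketch. Clustering the intensity values of a grayscale image splits its pixels into groups, such as a dark background and a bright foreground, and the resulting label map outlines the regions. The sketch assumes the image is a list of rows of numeric intensities and again uses k-means (here in one dimension) as the clustering method. `segment_gray_image` is a hypothetical name, and practical segmentation usually also considers colour channels and pixel positions.

```python
def segment_gray_image(image, k=2, iterations=20):
    """Label each pixel of a 2-D grayscale image by k-means on its intensity."""
    values = [v for row in image for v in row]
    # Spread the initial centroids evenly between the darkest and brightest pixel.
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)] if k > 1 else [lo]
    for _ in range(iterations):
        # Assignment step: put each intensity in the group of its nearest centroid.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        # Update step: each centroid becomes the mean of its group (unchanged if empty).
        new_centroids = [sum(g) / len(g) if g else centroids[c] for c, g in enumerate(groups)]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    # Map every pixel to the index of its nearest intensity centroid.
    return [[min(range(k), key=lambda c: abs(v - centroids[c])) for v in row] for row in image]
```

On the 3×3 image `[[10, 12, 200], [11, 210, 205], [9, 13, 198]]`, the function returns `[[0, 0, 1], [0, 1, 1], [0, 0, 1]]`, separating the dark pixels from the bright region.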