Principal Component Analysis (PCA) is a technique used to reduce the dimensions, or in simple words, the attributes of a dataset, to a lower dimension while retaining as much of the information (variance) in the data as possible. The new dimensions generated by the process are called Principal Components.

Let us take an example. Suppose that we have the iris dataset. This dataset has 4 features. We have no direct way of drawing a scatter plot for a 4-dimensional dataset, but if we reduce the dimensions to 3 or 2, we can easily create a 3D or 2D scatter plot.
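We can quickly verify the shape of the dataset before transforming it; this short check (not part of the tutorial steps below) confirms that iris has 150 samples and 4 features:

```python
from sklearn.datasets import load_iris

# The iris data matrix has 150 samples and 4 features
# (sepal length, sepal width, petal length, petal width).
iris = load_iris()
print(iris.data.shape)     # (150, 4)
print(iris.feature_names)
```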

Scikit-Learn Implementation

We will use the PCA class from the sklearn.decomposition Python module to reduce the dimensionality of the iris dataset, and then create a 3D plot of the data along the generated components (eigenvectors).

Importing the libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA


Loading the dataset

Code

iris = load_iris()


Extracting the target vector

Code

target_species = iris.target


Reducing Dimensions

Now, this is the step where we reduce the 4-dimensional iris dataset to 3 dimensions.

Code

x_reduced = PCA(n_components=3).fit_transform(iris.data)

Here, the number of principal components to keep is set by the "n_components" parameter.
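Because reducing dimensions always discards some variance, it is worth checking how much information the kept components retain. A minimal sketch using the explained_variance_ratio_ attribute of a fitted PCA object:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Fit PCA and inspect the fraction of the original variance
# that each principal component retains.
pca = PCA(n_components=3)
pca.fit(load_iris().data)

print(pca.explained_variance_ratio_)
# For iris, the first component alone captures over 90% of the
# variance, and the first three together capture over 99%.
print(pca.explained_variance_ratio_.sum())
```

So very little information is lost when going from 4 dimensions down to 3 on this dataset.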

Creating a 3D scatterplot of the new components

Code

fig = plt.figure()
# Axes3D(fig) no longer attaches the axes to the figure in recent
# Matplotlib versions; add a 3D subplot instead.
axes = fig.add_subplot(projection='3d')
axes.set_title('Iris Dataset by PCA', size=14)
axes.set_xlabel('First eigenvector')
axes.set_ylabel('Second eigenvector')
axes.set_zlabel('Third eigenvector')
axes.xaxis.set_ticklabels([])
axes.yaxis.set_ticklabels([])
axes.zaxis.set_ticklabels([])
axes.scatter(x_reduced[:, 0], x_reduced[:, 1], x_reduced[:, 2], c=target_species)
plt.show()

Output

As we can see from the above graph, the 4 dimensions of the iris dataset have been reduced to 3.

The 3 colors in the scatter plot correspond to the 3 categorical classes (species) to be predicted in the dataset.
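The integer labels in target_species (0, 1, and 2) map to species names, which can be looked up on the dataset object itself:

```python
from sklearn.datasets import load_iris

iris = load_iris()
# The color of each point in the scatter plot is driven by its
# integer class label; the corresponding species names are:
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
```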