Principal Component Analysis (PCA) is a technique used to reduce the dimensions, or in simple words the attributes, of a dataset to a lower number while preserving as much of the information (variance) in the data as possible. The new dimensions generated by the process are called principal components.

Let us take an example. Suppose that we have the iris dataset, which has 4 features. We have no way of drawing a scatter plot for a 4-dimensional dataset, but if we reduce the dimensions to 3 or 2, we can certainly create a 3D or 2D scatter plot.

### Scikit-Learn Implementation

We will use the PCA class of the sklearn.decomposition module to reduce the dimensionality of the iris dataset, and then create a 3D plot of the resulting components.

#### Importing the libraries

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
```

#### Loading the dataset

Code

`iris = load_iris()`

#### Extracting the target vector

Code

`target_species = iris.target`

#### Reducing Dimensions

Now, this is the step where we reduce the 4-dimensional iris dataset to 3 dimensions.

Code

`x_reduced = PCA(n_components=3).fit_transform(iris.data)`

Here, the number of principal components is defined by the `n_components` parameter.
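Note that reducing dimensions does discard some information. As a quick check (a small sketch, separate from the plotting code in this article), the `explained_variance_ratio_` attribute of a fitted `PCA` object reports the fraction of the total variance that each component retains:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()

# Fit PCA and inspect how much variance each component retains
pca = PCA(n_components=3)
pca.fit(iris.data)

print(pca.explained_variance_ratio_)        # fraction per component
print(pca.explained_variance_ratio_.sum())  # total fraction retained
```

For iris, the three components together retain well over 99% of the variance, so very little information is lost in this particular reduction.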

#### Creating a 3D scatterplot of the new components

Code

```python
fig = plt.figure()
axes = fig.add_subplot(projection='3d')
axes.set_title('Iris Dataset by PCA', size=14)
axes.set_xlabel('First eigenvector')
axes.set_ylabel('Second eigenvector')
axes.set_zlabel('Third eigenvector')
axes.set_xticklabels([])
axes.set_yticklabels([])
axes.set_zticklabels([])
axes.scatter(x_reduced[:, 0], x_reduced[:, 1], x_reduced[:, 2], c=target_species)
plt.show()
```

Output

As we can see from the graph above, the 4 dimensions of the iris dataset have been reduced to 3.

The 3 colors in the scatter plot correspond to the 3 categorical classes (species) to be predicted in the dataset.
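If you would rather have a flat 2D view, the same approach works with `n_components=2` (a small sketch following the same pattern as the code above):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()

# Project the 4-dimensional data onto the first two principal components
x_2d = PCA(n_components=2).fit_transform(iris.data)

plt.scatter(x_2d[:, 0], x_2d[:, 1], c=iris.target)
plt.xlabel('First principal component')
plt.ylabel('Second principal component')
plt.title('Iris Dataset by PCA (2D)')
plt.show()
```

The resulting array has one row per sample and two columns, one for each retained component.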