# Principal Component Analysis | Scikit-Learn Implementation

Principal Component Analysis (PCA) is a technique used to reduce the number of dimensions (in simple words, the attributes or features) of a dataset while retaining as much of the variance in the data as possible. Some information is usually lost in the process, but it is the part that contributes least to the spread of the data. The new dimensions generated by the process are called Principal Components.
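To make the idea concrete, here is a minimal NumPy sketch of one common way to compute principal components, through the eigenvectors of the covariance matrix. The function name `pca_project` and the toy data are assumptions made for illustration only; scikit-learn's own implementation is SVD-based and may differ in details such as the sign of each component.

```python
import numpy as np

def pca_project(data, n_components):
    """Project data onto its top principal components (illustrative sketch)."""
    # Center each feature so that variance is measured around the mean.
    centered = data - data.mean(axis=0)
    # Covariance matrix of the features (columns are the variables).
    covariance = np.cov(centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    # Keep the directions with the largest variance, largest first.
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]
    # The projected data has n_components columns.
    return centered @ components, eigenvalues[order]

# Hypothetical toy data: 100 samples, 4 features with very different spreads.
rng = np.random.default_rng(0)
toy_data = rng.normal(size=(100, 4)) * np.array([5.0, 2.0, 1.0, 0.1])

projected, variances = pca_project(toy_data, n_components=3)
print(projected.shape)  # (100, 3)
print(variances)        # variance captured by each kept component
```

Each eigenvalue equals the variance of the data along the matching component, which is why sorting by eigenvalue keeps the most informative directions first.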

Let us take an example. Suppose that we have the iris dataset, which has 4 features per sample. We cannot draw a scatter plot of 4-dimensional data directly, but if we reduce the data to 3 or 2 dimensions, we can create a 3D or 2D scatter plot.

### Scikit-Learn Implementation

We will use the **PCA** class of the **sklearn.decomposition** Python module to reduce the dimensionality of the iris dataset. We will then create a 3D plot of the data projected onto the generated components (eigenvectors).

#### Importing the libraries

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

from sklearn.datasets import load_iris

from sklearn.decomposition import PCA

#### Loading the dataset

**Code**

iris = load_iris()

#### Extracting the target vector

**Code**

target_species = iris.target

#### Reducing Dimensions

This is the step where we reduce the 4-dimensional iris dataset to 3 dimensions.

**Code**

x_reduced = PCA(n_components=3).fit_transform(iris.data)

Here, the number of principal components to keep is set by the **n_components** parameter.
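A quick way to judge a choice of **n_components** is to check how much of the variance the kept components retain. The sketch below uses the `explained_variance_ratio_` attribute of a fitted PCA object; the variable names `pca` and `ratios` are chosen here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()

# Keep a reference to the fitted PCA object so its attributes can be inspected.
pca = PCA(n_components=3)
x_reduced = pca.fit_transform(iris.data)

# Fraction of the total variance captured by each kept component, largest first.
ratios = pca.explained_variance_ratio_
print(ratios)

# Total fraction of the variance retained by the 3 components together.
print(ratios.sum())
```

If the total is close to 1, very little information was discarded by dropping the remaining dimension.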

#### Creating a 3D scatterplot of the new components

**Code**

fig = plt.figure()

axes = fig.add_subplot(projection='3d')

axes.set_title('Iris Dataset by PCA', size=14)

axes.set_xlabel('First eigenvector')

axes.set_ylabel('Second eigenvector')

axes.set_zlabel('Third eigenvector')

axes.xaxis.set_ticklabels([])

axes.yaxis.set_ticklabels([])

axes.zaxis.set_ticklabels([])

axes.scatter(x_reduced[:, 0], x_reduced[:, 1], x_reduced[:, 2], c=target_species)

plt.show()

**Output**

As we can see from the above graph, the 4 dimensions of the iris dataset have been reduced to 3 dimensions.

The 3 colors in the scatter plot correspond to the 3 classes (species) to be predicted in the dataset.
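The same approach works for a 2D plot, as mentioned at the start. The following is a minimal sketch that assumes 2 components are enough for a visual overview; the variable name `x_2d` and the legend are additions made here for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()

# Reduce the 4 features to 2 components for a flat scatter plot.
x_2d = PCA(n_components=2).fit_transform(iris.data)

fig, ax = plt.subplots()
scatter = ax.scatter(x_2d[:, 0], x_2d[:, 1], c=iris.target)
ax.set_title('Iris Dataset by PCA (2 components)', size=14)
ax.set_xlabel('First principal component')
ax.set_ylabel('Second principal component')

# One legend entry per class, labelled with the species names.
handles, _ = scatter.legend_elements()
ax.legend(handles, iris.target_names, title='Species')
```

When running this as a script rather than in a notebook, call `plt.show()` at the end to display the figure.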