Principal Component Analysis

Data preparation

In this series of exercises, we illustrate PCA on the wine data already used for clustering. We first load the data.

wine <- read.csv(here::here("labs/data/Wine.csv"))
row.names(wine) <- paste("W", c(1:nrow(wine)), sep="") 
wine[,-12] <- scale(wine[,-12])
library(reticulate)
use_condaenv("MLBA")
import pandas as pd
from sklearn.preprocessing import StandardScaler

wine = pd.read_csv('../../data/Wine.csv')
wine.index = ["W" + str(i) for i in range(1, len(wine)+1)]
scaler = StandardScaler()
wine.iloc[:, :-1] = scaler.fit_transform(wine.iloc[:, :-1])

Note that scaling the variables is optional here: applying PCA to the correlation matrix is equivalent to applying it to the scaled features. We could alternatively use the unscaled data; the results would of course be different and would depend on the scales of the variables themselves. The choice depends on the practical application.
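
As a quick sanity check (a minimal sketch; it re-reads the raw file because wine has already been scaled in place above), the eigenvalues of the correlation matrix of the raw features coincide with the variances of the components of a PCA run on the scaled features:

wine_raw <- read.csv(here::here("labs/data/Wine.csv"))[, -12]  # unscaled features, illustrative copy
eigen(cor(wine_raw))$values            # eigenvalues of the correlation matrix
prcomp(wine_raw, scale. = TRUE)$sdev^2 # variances of the PCs on scaled data: identical values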

Principal Component Analysis

Before running the PCA, it is often useful to represent the correlations between the features.

library(ggcorrplot)
ggcorrplot(cor(wine[,-12]))
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 8))
sns.heatmap(wine.iloc[:, :-1].corr(), annot=True, cmap="coolwarm", center=0)
plt.show()
Warning

Please note that all the interpretations are based on the R outputs. The Python outputs can be slightly different; these differences are explained wherever possible.

We see that some features are (negatively or positively) correlated, like alcohol and density. This means that, to some extent, they carry the same information. They could thus be combined into a single feature, which is in fact what PCA does.
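
For instance, we can check the alcohol–density relationship directly (a quick sketch, assuming the columns are named alcohol and density as on the heatmap; correlations are unchanged by the scaling applied above):

cor(wine$alcohol, wine$density)  # clearly negative, as seen on the heatmap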

We now compute the principal components. Several R functions exist for this; we use PCA from the package FactoMineR, complemented by factoextra, which provides nice visualizations.

library(FactoMineR)
library(factoextra)
wine_pca <- PCA(wine[,-12], ncp = 11, graph = FALSE)
wine_pca
from sklearn.decomposition import PCA

pca = PCA(n_components=11)
wine_pca_result = pca.fit(wine.iloc[:, :-1])
wine_pca = pd.DataFrame(pca.transform(wine.iloc[:, :-1]))
wine_pca

The object wine_pca contains all the PCA results. We will analyze them now.

Note: we set ncp=11 because there can be at most 11 principal components (as many as features) and we want to keep the results for all of them. The results themselves would not change if we set, for example, ncp=5; simply, only 5 PCs would be kept in the R object.
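
A minimal sketch to convince yourself: refitting with ncp = 5 gives exactly the same eigenvalues, only fewer components are stored.

wine_pca5 <- PCA(wine[,-12], ncp = 5, graph = FALSE)
all.equal(wine_pca$eig, wine_pca5$eig)                   # TRUE: identical eigenvalues
c(ncol(wine_pca$ind$coord), ncol(wine_pca5$ind$coord))   # 11 vs 5 components kept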

Circle of correlations and interpretation

To produce a circle of correlations:

fviz_pca_var(wine_pca)
import numpy as np
import matplotlib.pyplot as plt

features = wine.columns[:-1]
loading_matrix = pd.DataFrame(pca.components_.T, columns=[f"PC{i+1}" for i in range(11)], index=features)
loading_matrix = loading_matrix.iloc[:, :2]

# Normalize the PCA loadings
loading_matrix_normalized = loading_matrix / np.sqrt(np.sum(loading_matrix**2, axis=0))

fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(loading_matrix_normalized["PC1"], loading_matrix_normalized["PC2"], color="r", marker="o", s=100)

for i, feature in enumerate(loading_matrix_normalized.index):
    ax.arrow(0, 0, loading_matrix_normalized.loc[feature, "PC1"], loading_matrix_normalized.loc[feature, "PC2"], color="r", alpha=0.8)
    ax.text(loading_matrix_normalized.loc[feature, "PC1"], loading_matrix_normalized.loc[feature, "PC2"], feature, color="black", fontsize=12)

ax.axhline(0, color="black", linewidth=1)
ax.axvline(0, color="black", linewidth=1)
ax.set_xlim(-1.1, 1.1)
ax.set_ylim(-1.1, 1.1)
circle = plt.Circle((0, 0), 1, color='gray', fill=False, linestyle="--", linewidth=1)
ax.add_artist(circle)

ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("Circle of Correlations")
ax.grid()
plt.show()
Note

Please note the following for this plot:

  • The loading values have been normalized to lie between -1 and 1. This normalization will not be applied in the rest of the Python plots.

  • To get the same result as in R, the coordinates on PC2 should be flipped. The plot appears mirrored because the sign of a principal component is arbitrary: PCA computes eigenvectors, and an eigenvector multiplied by -1 is still an eigenvector, so the orientation of the components can differ between implementations. You can get the same results as R with the code below:

import numpy as np
import matplotlib.pyplot as plt

# Normalize the PCA loadings
loading_matrix_normalized = loading_matrix / np.sqrt(np.sum(loading_matrix**2, axis=0))

# Add this if you want the same results
loading_matrix_normalized["PC2"] = -loading_matrix_normalized["PC2"]

fig, ax = plt.subplots(figsize=(8, 8))
ax.scatter(loading_matrix_normalized["PC1"], loading_matrix_normalized["PC2"], color="r", marker="o", s=100)

for i, feature in enumerate(loading_matrix_normalized.index):
    ax.arrow(0, 0, loading_matrix_normalized.loc[feature, "PC1"], loading_matrix_normalized.loc[feature, "PC2"], color="r", alpha=0.8)
    ax.text(loading_matrix_normalized.loc[feature, "PC1"], loading_matrix_normalized.loc[feature, "PC2"], feature, color="black", fontsize=12)

ax.axhline(0, color="black", linewidth=1)
ax.axvline(0, color="black", linewidth=1)
ax.set_xlim(-1.1, 1.1)
ax.set_ylim(-1.1, 1.1)
circle = plt.Circle((0, 0), 1, color='gray', fill=False, linestyle="--", linewidth=1)
ax.add_artist(circle)

ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("Circle of Correlations")
ax.grid()
plt.show()

This plot shows the first two principal components. We see that:

  • PC1 explains \(31.4\%\) of the variance of the data, PC2 explains \(15.3\%\). In total, \(46.7\%\) of the variance of the data is explained by these two components.
  • PC1 is positively correlated with density and residual.sugar, and negatively with alcohol, which confirms that alcohol and density are negatively correlated.
  • PC2 is positively correlated with pH, and negatively with fixed.acidity and, to a lesser extent, citric.acid.
  • Features with short arrows (volatile.acidity, sulphates, etc.) are not well represented on this plane and are not interpreted here; a numerical check follows this list.
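
These readings can be checked numerically: in FactoMineR, wine_pca$var$cor contains the correlations between the features and the components, and wine_pca$var$cos2 their quality of representation (a minimal sketch):

round(wine_pca$var$cor[, 1:2], 2)            # correlations of the features with PC1 and PC2
round(rowSums(wine_pca$var$cos2[, 1:2]), 2)  # quality of representation on the (PC1, PC2) plane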

To interpret the dimensions even better, we can extract the contribution of each feature to a given dimension. Below, for PC1.

fviz_contrib(wine_pca, choice = "var", axes = 1)
explained_variance = pca.explained_variance_ratio_
# Each principal component (row of pca.components_) is a unit vector, so the squared
# loadings of one PC sum to 1; multiplying by 100 gives each feature's contribution in %.
square_loading_matrix = loading_matrix**2
contributions = square_loading_matrix * 100

contributions = contributions["PC1"].sort_values(ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(x=contributions.values, y=contributions.index, palette="viridis")
plt.xlabel("Contribution (%)")
plt.ylabel("Features")
plt.title("Feature Contributions for PC1")
plt.show()

We recover our conclusions from the circle (for PC1).
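
The same information is available numerically in wine_pca$var$contrib (in %, summing to 100 for each component), and the same plot can be drawn for any other axis:

round(wine_pca$var$contrib[, 1:2], 1)               # contributions (%) of the features to PC1 and PC2
## fviz_contrib(wine_pca, choice = "var", axes = 2) # same plot, for PC2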

Map of the individual and biplot

We can represent the wines on the (PC1, PC2) map. To better interpret the map, we add the correlation circle to it, giving a biplot.

## fviz_pca_ind(wine_pca) ## only the individuals
fviz_pca_biplot(wine_pca) ## biplot
import numpy as np
import matplotlib.pyplot as plt

# Define the scores for the first two principal components
scores = wine_pca.iloc[:, :2]

# Define the loadings for the first two principal components
loadings = pca.components_.T[:, :2]

# Scale the loadings by the square root of the variance
loadings_scaled = loadings * np.sqrt(pca.explained_variance_[:2])

# Calculate the scaling factor for the arrows
arrow_max = 0.9 * np.max(np.max(np.abs(scores)))
scale_factor = arrow_max / np.max(np.abs(loadings_scaled))

# Create a scatter plot of the scores
plt.figure(figsize=(10, 8))
plt.scatter(scores.iloc[:, 0], scores.iloc[:, 1], s=50, alpha=0.8)

# Add arrows for each variable's loadings
for i, variable in enumerate(wine.columns[:-1]):
    plt.arrow(0, 0, loadings_scaled[i, 0]*scale_factor, loadings_scaled[i, 1]*scale_factor, color='r', alpha=0.8)
    plt.text(loadings_scaled[i, 0]*scale_factor, loadings_scaled[i, 1]*scale_factor, variable, color='black', fontsize=12)

# Add axis labels and title
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.title('Biplot of Wine Dataset')

# Add grid lines
plt.grid()

# Show the plot
plt.show()

It’s a bit difficult to see all the patterns, but, for instance:

  • Wine 168 has a large fixed.acidity and a low pH.
  • Wine 195 has a large alcohol and a low density.
  • etc.

When we say “large” or “low”, it is not in absolute value but relative to the data set, i.e., “larger than the average”; the average being at the center of the graph (PC1=0, PC2=0).
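
These statements can also be read off the numbers (a minimal sketch; the feature names are those shown on the correlation heatmap): the coordinates of the wines on the map and their scaled feature values tell the same story.

wine_pca$ind$coord[c("W168", "W195"), 1:2]  # positions of these wines on the (PC1, PC2) map
wine[c("W168", "W195"), c("fixed.acidity", "pH", "alcohol", "density")]  # scaled feature values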

How many dimensions

For a graphical representation on a single graph, we keep only two PCs. But if we use PCA to reduce the dimension of our data set, or if we want to represent the data on several graphs, then we need to know how many components are needed to reach a certain level of variance. This can be assessed by looking at the eigenvalues (scree plot).

fviz_eig(wine_pca, addlabels = TRUE, ncp=11)
plt.figure(figsize=(10, 6))
plt.plot(range(1, 12), explained_variance * 100, 'o-')
plt.xticks(range(1, 12), [f"PC{i}" for i in range(1, 12)])
plt.xlabel("Principal Components")
plt.ylabel("Explained Variance (%)")
plt.title("Scree Plot")
plt.grid()
plt.show()

If we want to achieve \(75\%\) of representation of the data (i.e., of the variance of the data), we need 5 dimensions. This means that the three biplots below represent \(>75\%\) of the data (in fact \(84.6\%\)).

library(gridExtra)
p1 <- fviz_pca_biplot(wine_pca, axes = 1:2) 
p2 <- fviz_pca_biplot(wine_pca, axes = 3:4) 
p3 <- fviz_pca_biplot(wine_pca, axes = 5:6) 
grid.arrange(p1, p2, p3, nrow = 2, ncol=2)
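
The number of components needed for a given share of variance can also be read from wine_pca$eig, whose third column contains the cumulative percentage of variance (a minimal sketch):

head(wine_pca$eig)                 # eigenvalue, % of variance, cumulative %
which(wine_pca$eig[, 3] >= 75)[1]  # smallest number of PCs reaching 75% of the variance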

Using PCA to represent clustering results

We now combine clustering and PCA: we make clusters and represent them on the map of the individuals.

First, we make a clustering (below, \(k=4\) for the example). Then, we use the groups (as a factor) to color the individuals in the biplot.

wine_hc <- hclust(dist(wine[,-12], method = "manhattan"))
wine_clust <- cutree(wine_hc, k = 4)
fviz_pca_biplot(wine_pca,
             col.ind = factor(wine_clust))
from sklearn.cluster import AgglomerativeClustering
from matplotlib.cm import get_cmap
from matplotlib.patches import Patch

clustering = AgglomerativeClustering(n_clusters=4, metric='manhattan', linkage='average')
wine_clust = clustering.fit_predict(wine.iloc[:, :-1])

cmap = get_cmap("viridis", 4)
colors = cmap(np.array(wine_clust) / 3)

# Define legend labels based on cluster number
labels = ['Cluster {}'.format(i+1) for i in range(4)]

# Rescale the loading matrix
loading_matrix_rescaled = loading_matrix.div(loading_matrix.std())

plt.figure(figsize=(10, 10))
plt.scatter(wine_pca[0], wine_pca[1], c=colors, marker="o", s=100, alpha=0.7)

for i, feature in enumerate(loading_matrix_rescaled.index):
    plt.arrow(0, 0, loading_matrix_rescaled.loc[feature, "PC1"], loading_matrix_rescaled.loc[feature, "PC2"], color="r", alpha=0.8)
    plt.text(loading_matrix_rescaled.loc[feature, "PC1"], loading_matrix_rescaled.loc[feature, "PC2"], feature, color="black", fontsize=12)

plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("PCA Biplot with Clusters")
plt.grid()

# Create legend patches and group them in a Legend instance
legend_patches = [Patch(color=cmap(i/3), label=labels[i]) for i in range(4)]
plt.legend(handles=legend_patches, title='', loc='upper right', bbox_to_anchor=(1.0, 1.0))

plt.show()

We can see that the clusters are almost separated along the dimensions (a numerical check of the cluster profiles follows this list):

  • Cluster 1: larger PC1 and PC2. That is, large pH (low acidity) and large density, residual sugar (low alcohol), etc.
  • Cluster 3: smaller PC1 and larger PC2. That is, larger alcohol (lower density and sugar) and large pH (low acidity).
  • Cluster 2 is fuzzier. Larger acidity (lower pH) and larger alcohol.
  • Cluster 4 is apparently linked to larger citric acidity.
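
To back up this reading, we can compute the average (scaled) feature values per cluster: positive values mean above the data-set average, negative values below (a minimal sketch):

cluster_means <- aggregate(wine[,-12], by = list(cluster = wine_clust), FUN = mean)
round(cluster_means, 2)  # average scaled feature value per cluster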