Performance measurement for supervised learning algorithms is simple because the evaluation can be done by comparing the predictions against the labels. For an unsupervised learning problem, however, there are no labels and therefore no ground truth. We thus need other evaluation methods to determine how well our clustering algorithm performs. First, let's define what makes a good clustering algorithm.
A good clustering algorithm has two characteristics:
1) A good clustering algorithm has a small within-cluster variance: all data points in a cluster are similar to each other.
2) A good clustering algorithm has a large between-cluster variance: data points in one cluster are dissimilar to data points in other clusters.
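These two quantities can be computed directly. The following sketch (with hypothetical toy data, not from the original article) measures the within-cluster and between-cluster variance of a clustering:

```python
import numpy as np

# Hypothetical toy data: 6 points in 2D, assigned to 2 clusters
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.8, 5.2]])
labels = np.array([0, 0, 0, 1, 1, 1])

overall_mean = X.mean(axis=0)
within, between = 0.0, 0.0
for c in np.unique(labels):
    cluster = X[labels == c]
    centroid = cluster.mean(axis=0)
    # within-cluster variance: squared distances to the cluster centroid
    within += ((cluster - centroid) ** 2).sum()
    # between-cluster variance: size-weighted squared distance
    # of the centroid to the overall mean
    between += len(cluster) * ((centroid - overall_mean) ** 2).sum()

print(within, between)  # small within, large between -> good clustering
```

For this toy data the within-cluster variance is tiny while the between-cluster variance is large, which is exactly the signature of a good clustering.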
All clustering performance measurements are based on these two characteristics. Generally, there are two types of evaluation metrics for clustering: extrinsic measures, which require ground-truth labels, and intrinsic measures, which do not.
In the following chapters, we dive into these cluster performance measurements.
The Rand Index (RI) measures the similarity between the cluster assignments by making pair-wise comparisons. A higher score signifies higher similarity.
– A pair of samples that is placed in the same cluster by both assignments counts as a "true positive" (TP)
– A pair of samples that is placed in different clusters by both assignments counts as a "true negative" (TN)
The Rand Index always takes on a value between 0 and 1 and a higher index stands for better clustering.
\text{RI} = \frac{\text{TP} + \text{TN}}{\text{Total number of possible pairs}}
where the total number of possible pairs is calculated by
\text{Total number of possible pairs} = \frac{n(n-1)}{2}
where n is the number of samples in the dataset.
Because there might be many cases of “true negative” pairs by randomness, the Adjusted Rand Index (ARI) adjusts the RI by discounting a chance normalization term.
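The standard chance-corrected definition (not spelled out in the text above; here E[RI] denotes the expected Rand Index under random cluster assignments) is:

\text{ARI} = \frac{\text{RI} - E[\text{RI}]}{\max(\text{RI}) - E[\text{RI}]}

An ARI close to 0 means the agreement is no better than random labeling, while 1 means the two assignments match perfectly.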
The following picture shows an example of how the Rand Index is calculated. We have a dataset that consists of 6 samples (A-F) and two clustering algorithms that predict, for each sample, whether it belongs to cluster 1, 2, or 3 (see the first table in the picture). In the second table, all possible sample pairs are listed with the predicted cluster of algorithms 1 and 2.
To calculate the Rand Index, we search for all pairs on which both algorithms agree: the "true positive" and "true negative" pairs.
Based on the values we found in this example, we can compute the Rand Index:
\text{RI} = \frac{\text{TP} + \text{TN}}{\text{Total number of possible pairs}} = \frac{10}{15} = 0.67
I also verified my example with a short Python implementation in Google Colab. You can see that the Rand Index from the Python implementation matches the manual calculation.
from sklearn.metrics import rand_score, adjusted_rand_score

labels = [1, 1, 1, 2, 2, 3]
labels_pred = [1, 1, 2, 2, 3, 3]

RI = rand_score(labels, labels_pred)
# Rand Index: 0.6666666666666666
ARI = adjusted_rand_score(labels, labels_pred)
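The pairwise counting can also be done from scratch. The following sketch enumerates all sample pairs and counts those on which both assignments agree (same cluster in both, or different clusters in both):

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand Index computed by comparing all sample pairs."""
    agree = 0
    pairs = list(combinations(range(len(labels_a)), 2))
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # agreement covers both TP (same/same) and TN (different/different)
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)

print(rand_index([1, 1, 1, 2, 2, 3], [1, 1, 2, 2, 3, 3]))  # 0.666...
```

With the 6 samples from the example this yields 10 agreeing pairs out of 15, the same 0.67 as the manual calculation and `sklearn.metrics.rand_score`.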
Mutual Information (MI) measures the agreement between the cluster assignments. A higher score signifies higher similarity.
There are two different variations of Mutual Information:
– Normalized Mutual Information (NMI): MI divided by the average of the cluster entropies.
– Adjusted Mutual Information (AMI): MI adjusted for chance by discounting a chance normalization term.
The normalized Mutual Information always takes on a value between 0 and 1, and a higher value stands for better clustering.
\text{NMI}(U, V) = \frac{\text{MI}(U, V)}{\frac{1}{2}[H(U) + H(V)]}
\text{AMI}(U, V) = \frac{\text{MI}(U, V) - E[\text{MI}]}{\frac{1}{2}[H(U) + H(V)] - E[\text{MI}]}
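Both variants are available in scikit-learn. A minimal sketch, reusing the label lists from the Rand Index example above:

```python
from sklearn.metrics import (
    normalized_mutual_info_score,
    adjusted_mutual_info_score,
)

labels = [1, 1, 1, 2, 2, 3]
labels_pred = [1, 1, 2, 2, 3, 3]

# NMI with the arithmetic-mean normalization shown in the formula above
NMI = normalized_mutual_info_score(labels, labels_pred)
# AMI additionally discounts the chance term E[MI]
AMI = adjusted_mutual_info_score(labels, labels_pred)
print(NMI, AMI)
```

Note that scikit-learn's default `average_method='arithmetic'` corresponds exactly to the (H(U)+H(V))/2 denominator in the formulas above.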
Assume we have m=3 classes and k=2 clusters, as shown in the following picture.
The Entropy of Class Labels is calculated over the entire dataset and is therefore independent of the clustering output. This means you can calculate the Entropy of Class Labels prior to clustering.
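The entropy of a label assignment follows the Shannon formula H(U) = -Σ p_i · log(p_i), where p_i is the fraction of samples with label i. A minimal sketch (using natural log, as scikit-learn does internally; the label list reuses the example from above):

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy of a label assignment, using the natural log."""
    n = len(labels)
    # p_i = count_i / n for every distinct label
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

# class labels from the earlier example: p = 1/2, 1/3, 1/6
class_labels = [1, 1, 1, 2, 2, 3]
print(entropy(class_labels))  # ~1.011
```

Because this value depends only on the class labels, it stays fixed no matter which clustering algorithm you evaluate against it.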