
Evaluation Metrics for Classification Models

Stephen Cheng

 

Intro

For supervised learning models, evaluation typically involves comparing the predictions made by the model with the ground truth labels that are provided in the dataset. Here are some common evaluation metrics used for assessing the performance of classification models.

Confusion Matrix

A confusion matrix (also called an error matrix) is a tabular summary of a classifier's results. Each row counts the instances of a ground-truth class and each column counts the instances of a predicted class, so every cell shows how often a given true class was predicted as a given class.
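
For a binary problem the matrix has four cells: TN, FP, FN, and TP. As a quick illustration with scikit-learn's confusion_matrix (which the implementation section below relies on as well), rows correspond to ground-truth labels and columns to predicted labels, so with labels [0, 1] the layout is [[TN, FP], [FN, TP]]:

from sklearn.metrics import confusion_matrix

# Tiny illustrative example: rows are ground truth, columns are predictions
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(confusion_matrix(y_true, y_pred))
# [[1 1]   -> 1 TN, 1 FP
#  [0 2]]  -> 0 FN, 2 TP

The counts in this table are the building blocks for all of the metrics below.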

  • Accuracy: (TP + TN) / (TP + TN + FP + FN)

Measures the proportion of correctly classified instances out of the total instances. It’s suitable for balanced datasets but can be misleading for imbalanced datasets. Higher accuracy values indicate a higher proportion of correct predictions.

  • Precision: TP / (TP + FP), i.e., TP out of all predicted positives

Measures the proportion of true positive predictions out of all positive predictions. Higher precision values indicate fewer false positive predictions. Precision is useful when the cost of false positives is high.

  • Recall (Sensitivity)/TPR: TP / (TP + FN), i.e., TP out of all actual positives

Measures the proportion of true positive predictions out of all actual positives. It focuses on the ability of the model to capture positive instances. Higher recall values indicate fewer false negative predictions.

  • Specificity/TNR: TN / (TN + FP), i.e., TN out of all actual negatives

Specificity, also known as the true negative rate, measures the proportion of true negative predictions out of all actual negatives, i.e., how well the model identifies negative instances. Higher specificity values indicate fewer false positive predictions.

  • False Positive Rate (FPR): FP / (FP + TN), i.e., FP out of all actual negatives

Also known as the false alarm rate, it measures the proportion of actual negatives that are incorrectly classified as positive. It is the complement of specificity (FPR = 1 - specificity). Lower FPR values indicate fewer false positive predictions.

  • F1-score: 2 * (Precision * Recall) / (Precision + Recall)

The harmonic mean of precision and recall provides a balance between the two metrics. It is useful when there is an uneven class distribution. A short worked example plugging sample counts into all of these formulas follows this list.
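
To make the formulas concrete, here is a minimal sketch that plugs hypothetical confusion-matrix counts (made up purely for illustration) into each definition:

# Hypothetical confusion-matrix counts, chosen only for illustration
TP, TN, FP, FN = 40, 45, 5, 10

accuracy    = (TP + TN) / (TP + TN + FP + FN)        # 85 / 100 = 0.85
precision   = TP / (TP + FP)                         # 40 / 45 ≈ 0.889
recall      = TP / (TP + FN)                         # 40 / 50 = 0.80
specificity = TN / (TN + FP)                         # 45 / 50 = 0.90
fpr         = FP / (FP + TN)                         # 5 / 50  = 0.10
f1 = 2 * (precision * recall) / (precision + recall) # ≈ 0.842

print(accuracy, precision, recall, specificity, fpr, f1)

Note that fpr equals 1 - specificity, which matches the definitions above.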

Implementation with Python

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Example ground truth and predicted labels
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 1]

# Calculate accuracy
accuracy = accuracy_score(y_true, y_pred)
print("Accuracy:", accuracy)

# Calculate precision
precision = precision_score(y_true, y_pred)
print("Precision:", precision)

# Calculate recall
recall = recall_score(y_true, y_pred)
print("Recall:", recall)

# Calculate F1-score
f1 = f1_score(y_true, y_pred)
print("F1-score:", f1)

# Calculate confusion matrix
conf_matrix = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:")
print(conf_matrix)

# Calculate False Positive Rate (FPR)
TN = conf_matrix[0, 0]
FP = conf_matrix[0, 1]
FPR = FP / (FP + TN)
print("False Positive Rate (FPR):", FPR)

Jun 24, 2024
