Easy Way

Classification Report

from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 1, 1, 1, 2, 2, 3]
y_pred = [0, 2, 1, 1, 2, 0, 0, 2, 2, 0]

print(classification_report(y_true, y_pred))
              precision    recall  f1-score   support

           0       0.25      0.50      0.33         2
           1       1.00      0.40      0.57         5
           2       0.50      1.00      0.67         2
           3       0.00      0.00      0.00         1

    accuracy                           0.50        10
   macro avg       0.44      0.47      0.39        10
weighted avg       0.65      0.50      0.49        10
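If the per-class numbers are needed programmatically rather than as a formatted string, classification_report can also return them as a dictionary via output_dict=True; a minimal sketch:

from sklearn.metrics import classification_report

y_true = [0, 0, 1, 1, 1, 1, 1, 2, 2, 3]
y_pred = [0, 2, 1, 1, 2, 0, 0, 2, 2, 0]

# output_dict=True returns the same numbers as a nested dict instead of text
report = classification_report(y_true, y_pred, output_dict=True)
print(report['1'])          # e.g. {'precision': 1.0, 'recall': 0.4, ...}
print(report['macro avg'])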

Confusion Matrix

  • Row: True Label
  • Column: Predicted Label

  • TP example: with GT [0, 0, 1, 1, 2, 2] and predictions [0, 0, 1, 1, 2, 2], everything is correct.
  • FP example: with GT [0, 0, 1, 1, 2, 2] and predictions [0, 0, 0, 0, 0, 0], the FPs for class 0 are the 1s predicted as 0 and the 2s predicted as 0 (the 0s predicted as 0 are TPs); see the sketch after this list.
  • FN example: with GT [0, 0, 1, 1, 2, 2] and predictions [2, 2, 0, 0, 1, 1], every label is wrong: the cat is called a dog, the car is called a person, and so on.
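A quick way to verify the FP example above (the variable names here are just for illustration): for any class, the false positives are its column total minus the diagonal entry.

from sklearn.metrics import confusion_matrix

# FP example from above: GT [0, 0, 1, 1, 2, 2], everything predicted as 0
gt   = [0, 0, 1, 1, 2, 2]
pred = [0, 0, 0, 0, 0, 0]

cm = confusion_matrix(gt, pred)
# column 0 holds every prediction of class 0;
# subtracting the diagonal entry (the TPs) leaves the FPs
fp_0 = cm[:, 0].sum() - cm[0, 0]
print(fp_0)  # 4: the two 1s and the two 2s predicted as 0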
from sklearn.metrics import confusion_matrix
import seaborn as sns

y_true = [0, 0, 1, 1, 1, 1, 1, 2, 2, 3]
y_pred = [0, 2, 1, 1, 2, 0, 0, 2, 2, 0]

cm = confusion_matrix(y_true, y_pred)

print(cm)
sns.heatmap(cm, cmap='YlGnBu')
[[1 0 1 0]
 [2 2 1 0]
 [0 0 2 0]
 [1 0 0 0]]
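The bare heatmap above has no counts or axis labels; a minimal sketch of a more readable version, following the row/column convention above:

import matplotlib.pyplot as plt
import seaborn as sns

# annot=True writes the count into each cell, fmt='d' keeps it an integer
sns.heatmap(cm, annot=True, fmt='d', cmap='YlGnBu')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()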
 

Performance Test

Accuracy

\[\text{Accuracy} = \frac{TP + TN}{N}\]
from sklearn.metrics import accuracy_score

y_true = [0, 0, 1, 1, 1, 1, 1, 2, 2, 3]
y_pred = [0, 2, 1, 1, 2, 0, 0, 2, 2, 0]

acc = accuracy_score(y_true, y_pred)
# normalize=False returns the number of correct predictions instead of a ratio
acc_count = accuracy_score(y_true, y_pred, normalize=False)

print('n:', len(y_true))
print(f'Accuracy     : {acc:.2}')
print(f'Correct count: {acc_count}')
n: 10
Accuracy     : 0.5
Correct count: 5

Accuracy from Confusion Matrix

import numpy as np

def cal_accuracy(cm):
    """
    Sum the diagonal (all correctly classified samples) and divide by the total N
    """
    return np.diagonal(cm).sum() / cm.sum()

cm = confusion_matrix(y_true, y_pred)
print('Accuracy:', cal_accuracy(cm))
Accuracy: 0.5

Recall (Sensitivity, True Positive Rate)

\[\text{True Positive Rate} = \frac{TP}{TP + FN}\]
  • Drawback: if every sample is predicted as positive, every TP is caught and FN becomes 0, so recall comes out as 1 (see the sketch below)
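A minimal sketch of that failure mode on a toy binary problem (the data here is made up for illustration):

from sklearn.metrics import recall_score

# mostly negative ground truth, but the model predicts positive for everything
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

# every true positive is caught and FN is 0, so recall is a perfect 1.0
print(recall_score(y_true, y_pred))  # 1.0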

average parameter

  • None : compute the recall of each class separately
  • binary : (default) only meaningful for binary classification
  • micro : compute the total true positives, false negatives, and false positives over all classes
  • macro : compute the unweighted mean of the per-label recalls, so label imbalance is not taken into account
  • weighted : like macro, but each label's recall is weighted by its support
from sklearn.metrics import recall_score

y_true = [0, 0, 1, 1, 1, 1, 1, 2, 2, 3]
y_pred = [0, 2, 1, 1, 2, 0, 0, 2, 2, 0]


recalls = recall_score(y_true, y_pred, average=None)
recall_micro = recall_score(y_true, y_pred, average='micro')
recall_macro = recall_score(y_true, y_pred, average='macro')
recall_weighted = recall_score(y_true, y_pred, average='weighted')

print('Recalls          :', recalls)
print(f'Recall (micro)   : {recall_micro:.2}')
print(f'Recall (macro)   : {recall_macro:.2}')
print(f'Recall (weighted): {recall_weighted:.2}')
Recalls          : [0.5 0.4 1.  0. ]
Recall (micro)   : 0.5
Recall (macro)   : 0.47
Recall (weighted): 0.5

Recall From Confusion Matrix

def cal_recall(cm, average=None):
    # per-class recall: diagonal entry / row sum (nan_to_num guards empty rows)
    data = [np.nan_to_num(cm[i, i] / cm[i, :].sum()) for i in range(cm.shape[0])]
    data = np.array(data)

    if average is None:
        return data
    elif average == 'macro':
        # unweighted mean over classes
        return data.mean()
    elif average == 'micro':
        # weight each class by its support (row sum); equivalent to total TP / N
        weight = cm.sum(axis=1)
        return (data * weight).sum() / weight.sum()
    return data

cm = confusion_matrix(y_true, y_pred)
print('recalls        :', cal_recall(cm, average=None))
print('recalls (macro):', cal_recall(cm, average='macro'))
print('recalls (micro):', cal_recall(cm, average='micro'))
recalls        : [0.5 0.4 1.  0. ]
recalls (macro): 0.475
recalls (micro): 0.5

Precision

\[\text{Precision} = \frac{TP}{TP + FP} = \frac{TP}{\text{Predicted Yes}}\]
  • Drawback: a class with no FPs looks 100% correct (see the sketch after this list).
    • For class 1 below, none of the other labels (0, 2, 3) is ever predicted as 1, so there are no false positives.
    • In other words, predict "cat" correctly just once and call everything else "dog", and precision reports a perfect score.
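A minimal sketch of that failure mode with made-up labels (1 = "cat", 0 = "dog"):

from sklearn.metrics import precision_score

# one confident "cat" prediction, everything else called "dog"
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

# TP=1 and FP=0 for class 1, so precision is a perfect 1.0
# even though four of the five cats were missed
print(precision_score(y_true, y_pred))  # 1.0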
from sklearn.metrics import precision_score

y_true = [0, 0, 1, 1, 1, 1, 1, 2, 2, 3]
y_pred = [0, 2, 1, 1, 2, 0, 0, 2, 2, 0]

precisions = precision_score(y_true, y_pred, average=None)
precision_micro = precision_score(y_true, y_pred, average='micro')
precision_macro = precision_score(y_true, y_pred, average='macro')
precision_weighted = precision_score(y_true, y_pred, average='weighted')

print('Precisions       :', precisions)
print(f'Precision (micro)   : {precision_micro:.2}')
print(f'Precision (macro)   : {precision_macro:.2}')
print(f'Precision (weighted): {precision_weighted:.2}')
Precisions       : [0.25 1.   0.5  0.  ]
Precision (micro)   : 0.5
Precision (macro)   : 0.44
Precision (weighted): 0.65

Precision from Confusion Matrix

def cal_precision(cm, average=None):
    # per-class precision: diagonal entry / column sum
    # (nan_to_num guards classes that are never predicted)
    data = [np.nan_to_num(cm[i, i] / cm[:, i].sum()) for i in range(cm.shape[0])]
    data = np.array(data)

    if average is None:
        return data
    elif average == 'macro':
        # unweighted mean over classes
        return data.mean()
    elif average == 'micro':
        # weight each class by how often it was predicted (column sum)
        weight = cm.sum(axis=0)
        return (data * weight).sum() / weight.sum()
    return data

print(f'Precisions        : {cal_precision(cm)}')
print(f'Precision (macro) : {cal_precision(cm, average="macro"):.2}')
print(f'Precision (micro) : {cal_precision(cm, average="micro"):.2}')
Precisions        : [0.25 1.   0.5  0.  ]
Precision (macro) : 0.44
Precision (micro) : 0.5

F1 Score

\[\text{F1 Score} = 2 \cdot \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}\]
from sklearn.metrics import f1_score

y_true = [0, 0, 1, 1, 1, 1, 1, 2, 2, 3]
y_pred = [0, 2, 1, 1, 2, 0, 0, 2, 2, 0]

f1 = f1_score(y_true, y_pred, average=None)
f1_micro = f1_score(y_true, y_pred, average='micro')
f1_macro = f1_score(y_true, y_pred, average='macro')
f1_weighted = f1_score(y_true, y_pred, average='weighted')

print('F1       :', f1)
print(f'F1 (micro)   : {f1_micro:.2}')
print(f'F1 (macro)   : {f1_macro:.2}')
print(f'F1 (weighted): {f1_weighted:.2}')
F1       : [0.33333333 0.57142857 0.66666667 0.        ]
F1 (micro)   : 0.5
F1 (macro)   : 0.39
F1 (weighted): 0.49
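
F1 from Confusion Matrix

For completeness, the same per-class F1 values can be recovered from the confusion matrix by combining the cal_precision and cal_recall helpers defined above; cal_f1 below is a minimal sketch, not an sklearn function.

def cal_f1(cm):
    """Per-class F1 from the per-class precision and recall."""
    precision = cal_precision(cm)
    recall = cal_recall(cm)
    # nan_to_num guards the 0/0 case when both precision and recall are 0
    return np.nan_to_num(2 * precision * recall / (precision + recall))

cm = confusion_matrix(y_true, y_pred)
print('F1:', cal_f1(cm))  # should match f1_score(y_true, y_pred, average=None) above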