Multiclass Classification in Supervised Learning

What is Multiclass Classification?

Multiclass Classification is a type of Supervised Learning used to predict which of several possible categories an observation belongs to.

Like regression and binary classification, it follows the same iterative process of training, validation, and evaluation, with a portion of the data held back to validate the model.
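As a rough illustration of holding back part of the data, the sketch below shuffles a small list of labeled observations and splits it into training and validation sets. The `holdout_split` helper, the 30% validation fraction, and the fixed seed are assumptions made for this example, not part of any particular library.

```python
import random

# Hypothetical labeled observations: (feature value, class label).
observations = [(167, 0), (172, 0), (225, 2), (197, 1), (189, 1), (232, 2), (158, 0)]


def holdout_split(rows, validation_fraction=0.3, seed=0):
    """Shuffle the rows and reserve a fraction of them for validation."""
    rows = list(rows)                      # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)      # fixed seed keeps the split repeatable
    n_val = max(1, round(len(rows) * validation_fraction))
    return rows[n_val:], rows[:n_val]      # (training rows, validation rows)


train_rows, validation_rows = holdout_split(observations)
print(len(train_rows), len(validation_rows))  # 5 2
```

In practice a library helper for splitting is usually preferable, and with very small datasets a single split gives only a noisy estimate of model quality.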

Examples

Predicting email categories (spam / work email / promotional email)
Predicting disease types (flu / cold / allergy)
Predicting penguin species (Adelie / Chinstrap / Gentoo)

Example: Penguin Species Classification

We observe the **flipper length of the penguins (x)** and use it to predict the **species of the penguins (y)**.

Species Codes

0: Adelie
1: Chinstrap
2: Gentoo

Sample Data

Flipper Length (x)	Species (y)
167	0
172	0
225	2
197	1
189	1
232	2
158	0

Objective: Train a model that takes **flipper length (x)** as input and predicts **penguin species (y)**.

Training a Multiclass Classification Model

Types of Multiclass Classification Algorithms

One-vs-Rest (OvR) – Train multiple binary classification models, each predicting one category vs. all other categories.
Multinomial Classification – Train a multiclass classification model that calculates the probabilities for all categories.

One-vs-Rest (OvR) Algorithm

Concept

Train a binary classifier for each category to calculate the probability of belonging to that category:

f0(x) = P(y=0 | x)
f1(x) = P(y=1 | x)
f2(x) = P(y=2 | x)
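The idea can be sketched in plain Python: fit one binary logistic regression per species on the sample data, then predict the species whose model returns the highest score. The gradient-descent trainer, the feature scaling, and the learning rate and step count are assumptions chosen for this sketch, not a reference implementation.

```python
import math

# Sample data from the table above: flipper length and species code.
X = [167, 172, 225, 197, 189, 232, 158]
y = [0, 0, 2, 1, 1, 2, 0]

# Standardize the feature so gradient descent behaves well (a choice made for this sketch).
MEAN = sum(X) / len(X)
STD = math.sqrt(sum((v - MEAN) ** 2 for v in X) / len(X))


def scale(v):
    return (v - MEAN) / STD


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_binary(xs, targets, lr=0.5, steps=2000):
    """Fit a one-feature logistic regression with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x_i, t_i in zip(xs, targets):
            err = sigmoid(w * x_i + b) - t_i
            grad_w += err * x_i
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


xs = [scale(v) for v in X]
classes = sorted(set(y))

# One binary model per class: "is it class k?" versus "is it anything else?"
models = {k: train_binary(xs, [1 if label == k else 0 for label in y]) for k in classes}


def ovr_scores(flipper_length):
    """Return f_k(x) for every class k."""
    z = scale(flipper_length)
    return {k: sigmoid(w * z + b) for k, (w, b) in models.items()}


def ovr_predict(flipper_length):
    """Pick the class whose binary model is most confident."""
    scores = ovr_scores(flipper_length)
    return max(scores, key=scores.get)


print(ovr_scores(160), ovr_predict(160))
```

Because the three scores come from separately trained models, they are not guaranteed to sum to 1; only their ranking is used for the final prediction.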

Characteristics

Simple to apply and practical when the number of categories is small.
Requires one binary model per category, so training and prediction costs grow as the number of categories increases.

Multinomial Classification

Concept

Calculate the probability for each category, then select the category with the highest probability as the predicted value.

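Softmax Function

The softmax function converts the model's raw score for each category (z_k) into probabilities that are all between 0 and 1 and sum to 1:

softmax(z)_k = exp(z_k) / Σ_j exp(z_j)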

Example

Softmax output calculation:
Class 0 (Adelie): 20%
Class 1 (Chinstrap): 30%
Class 2 (Gentoo): 50% (highest)
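A minimal sketch of the softmax calculation behind an output like this one. The raw scores (`logits`) are hypothetical values picked so that the result reproduces the 20% / 30% / 50% example; a trained model would produce its own scores.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for Adelie, Chinstrap, Gentoo.
logits = [math.log(0.2), math.log(0.3), math.log(0.5)]
probs = softmax(logits)
predicted_class = probs.index(max(probs))

print([round(p, 2) for p in probs])  # [0.2, 0.3, 0.5]
print(predicted_class)               # 2 (Gentoo)
```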

Characteristics

Computationally efficient, suitable for cases with many categories.
Trains a single model for all categories at once, reducing training cost.

Evaluating Multiclass Classification Models

Test Data

Flipper Length (x)	Actual Species (y)	Predicted Species (ŷ)
165	0	0
171	0	0
205	2	1
195	1	1
183	1	1
221	2	2
214	2	2

Confusion Matrix for Multiclass Classification

Predicted \ Actual	0	1	2
0	2	0	0
1	0	2	1
2	0	0	2
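A confusion matrix can be built directly from the actual and predicted columns of the test data. The sketch below uses the same layout as the table, with rows for predicted classes and columns for actual classes; the `confusion_matrix` helper is written for this example only.

```python
# Test data from the table above: actual and predicted species codes.
y_true = [0, 0, 2, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2]


def confusion_matrix(actual, predicted, n_classes=3):
    """Count predictions; rows are predicted classes, columns are actual classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, p in zip(actual, predicted):
        matrix[p][a] += 1
    return matrix


cm = confusion_matrix(y_true, y_pred)
for row in cm:
    print(row)
```

Some libraries use the opposite orientation (rows for actual classes, columns for predicted classes), so it is worth checking the convention before reading off false positives and false negatives.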

Interpretation

Class 0:
Correctly predicted 2 times (TP)
No incorrect predictions (FP = 0, FN = 0)
Class 1:
Correctly predicted 2 times (TP)
One class 2 penguin was wrongly predicted as class 1 (FP)
Class 2:
Correctly predicted 2 times (TP)
One class 2 penguin was missed and predicted as class 1 (FN)

Calculating Evaluation Metrics

Metrics for Each Category

Category	TP	TN	FP	FN	Accuracy	Recall	Precision	F1 Score
0	2	5	0	0	1.00	1.00	1.00	1.00
1	2	4	1	0	0.86	1.00	0.67	0.80
2	2	4	0	1	0.86	0.67	1.00	0.80
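The per-category figures can be reproduced by treating each class in turn as the "positive" class and everything else as "negative". The helper names below are assumptions for this sketch.

```python
# Test data: actual and predicted species codes.
y_true = [0, 0, 2, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2]


def per_class_counts(actual, predicted, k):
    """Return (TP, TN, FP, FN) when class k is treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == k and p == k)
    fp = sum(1 for a, p in pairs if a != k and p == k)
    fn = sum(1 for a, p in pairs if a == k and p != k)
    tn = len(pairs) - tp - fp - fn
    return tp, tn, fp, fn


def per_class_metrics(tp, tn, fp, fn):
    """Return (accuracy, recall, precision, F1) from the four counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, recall, precision, f1


results = {}
for k in (0, 1, 2):
    counts = per_class_counts(y_true, y_pred, k)
    results[k] = (counts, per_class_metrics(*counts))
    print(k, counts, [round(m, 2) for m in results[k][1]])
```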

Overall Evaluation Metrics

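Overall Accuracy

Calculated from the per-category totals (TP = 6, TN = 13, FP = 1, FN = 1): (TP + TN) / (TP + TN + FP + FN) = (6 + 13) / (6 + 13 + 1 + 1) = 19/21 ≈ 0.90

About 90% of the per-category decisions are correct. Counting whole predictions instead, 6 of the 7 test penguins (about 86%) were assigned the right species.

Overall Recall

Calculated: TP / (TP + FN) = 6 / (6 + 1) ≈ 0.86

86% of actual categories were correctly identified.

Overall Precision

Calculated: TP / (TP + FP) = 6 / (6 + 1) ≈ 0.86

86% of samples predicted as a given category are correct.

Overall F1 Score

Calculated: (2 × Precision × Recall) / (Precision + Recall) = (2 × 0.86 × 0.86) / (0.86 + 0.86) = 0.86

F1 Score = 0.86, indicating good model performance.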
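One way to arrive at these overall figures is to add up the per-category TP, TN, FP, and FN counts and apply the usual formulas to the totals. This is a sketch of that aggregation under the assumption that the article's overall metrics are computed this way; other averaging schemes (for example, averaging the per-category scores) can give different numbers.

```python
# Per-category counts from the metrics table: (TP, TN, FP, FN).
counts = {0: (2, 5, 0, 0), 1: (2, 4, 1, 0), 2: (2, 4, 0, 1)}

# Sum each count across the three categories.
tp = sum(c[0] for c in counts.values())
tn = sum(c[1] for c in counts.values())
fp = sum(c[2] for c in counts.values())
fn = sum(c[3] for c in counts.values())

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 2), round(recall, 2), round(precision, 2), round(f1, 2))
# 0.9 0.86 0.86 0.86
```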

Conclusion

Multiclass classification is suitable for prediction problems involving more than two categories. **Common algorithms include OvR and multinomial classification (which uses the softmax function)**. When evaluating a multiclass model, metrics can be calculated for each category or aggregated into overall metrics.

Further Learning

Scikit-learn Multiclass Classification
TensorFlow Classification Tutorial
