Classification
Definition:
Classification is the supervised learning task of predicting a categorical class label from input data. Each example in the dataset belongs to one of a predefined set of classes.
Characteristics:
- Outputs are discrete.
- The goal is to assign each input to a single class.
- Classes can be binary (two classes) or multiclass (more than two classes).
Examples:
- Classifying emails as spam or not spam (binary classification).
- Classifying iris flowers into one of three species (multiclass classification).
Types of Classification:
- Binary Classification: distinguishing between exactly two classes.
- Multiclass Classification: distinguishing among more than two classes.
- Multilabel Classification: assigning multiple class labels to each instance (see the sketch below).
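The multilabel case is easiest to see in how the targets are stored: each instance carries a set of labels rather than a single one. A minimal sketch, assuming scikit-learn and made-up topic tags for illustration:

# Hypothetical multilabel targets: each document gets a *set* of topic tags.
from sklearn.preprocessing import MultiLabelBinarizer

y = [{"sports"}, {"politics", "economy"}, {"sports", "economy"}]
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y)      # one indicator column per label
print(mlb.classes_)           # ['economy' 'politics' 'sports']
print(Y)                      # first row is [0 0 1]: only the 'sports' label applies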
Key Concepts:
- The class labels are discrete and come from a finite set.
- Often expressed as a yes/no question in binary classification (e.g., “Is this email spam?”).
- The predicted class labels are often encoded numerically but represent categories (e.g., 0, 1, 2 for iris species; see the sketch below).
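As a rough sketch of the iris example above, the task can be set up with scikit-learn (an assumed choice of library and model; the notes do not prescribe either). The predicted labels come back as the encoded integers 0, 1, 2:

# Minimal multiclass classification sketch (illustrative model choice).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)          # y holds the encoded species labels 0, 1, 2
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)    # any classifier would do here
clf.fit(X_train, y_train)
print(clf.predict(X_test[:5]))             # discrete labels drawn from {0, 1, 2}
print(clf.score(X_test, y_test))           # accuracy on held-out data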
Regression
Definition:
Regression is the supervised learning task of predicting a continuous numerical value based on input features.
Characteristics:
- Outputs are continuous and often real-valued numbers.
- The model predicts a numeric quantity rather than a class.
Examples:
- Predicting a person’s annual income from age, education, and location.
- Predicting crop yield given weather and other factors.
Key Concepts:
- Unlike classification, the output is a continuous value.
- The task is about estimating the underlying function that maps inputs to continuous outputs.
- Outputs can theoretically be any number within a range, reflecting real-world quantities.
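A minimal regression sketch, again assuming scikit-learn; the bundled diabetes dataset stands in for the income and crop-yield examples above, and the target is a continuous number rather than a class:

# Minimal regression sketch: predict a continuous target with a linear model.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)      # y is a continuous disease-progression score
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = LinearRegression().fit(X_train, y_train)
print(reg.predict(X_test[:3]))             # real-valued predictions, not class labels
print(reg.score(X_test, y_test))           # R^2 on held-out data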
Distinguishing Between Classification and Regression
An intuitive way to differentiate the two is based on the continuity of the output:
- If the output is discrete (categorical classes), the problem is classification.
- If the output is continuous (numerical values), the problem is regression.
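In practice this rule of thumb amounts to looking at the target values themselves. A small illustrative check (the arrays here are made up):

import numpy as np

# A handful of repeated integer labels suggests classification ...
y_species = np.array([0, 1, 2, 1, 0, 2])
print(np.unique(y_species))        # [0 1 2] -> small finite set of classes

# ... while many distinct real values suggest regression.
y_prices = np.array([215000.0, 189500.0, 402300.0, 257800.0])
print(np.unique(y_prices).size)    # roughly as many values as samples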
Practical Examples and Representations:
- The Iris dataset is a classic example for classification, with three species as classes.
- For regression, datasets might involve predicting house prices, temperatures, or yields, with outputs as continuous numbers.
- Input data can be numerical or categorical, but models require proper encoding and representation (e.g., one-hot encoding for categorical variables; a brief sketch follows this list).
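For the encoding point, a brief sketch using pandas get_dummies (one of several ways to one-hot encode; the column names are invented for illustration):

import pandas as pd

# Hypothetical mixed-type input: one numeric and one categorical feature.
df = pd.DataFrame({"age": [25, 40, 31],
                   "city": ["Paris", "Tokyo", "Paris"]})

# One-hot encode the categorical column so models see only numbers.
encoded = pd.get_dummies(df, columns=["city"])
print(encoded.columns.tolist())   # ['age', 'city_Paris', 'city_Tokyo']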
Summary and Usage
- Classification and regression are foundational supervised learning tasks.
- Choosing the right algorithm depends on the nature of the output (categorical vs continuous).
- Preprocessing and feature representation are critical for both tasks to achieve good performance.
- Many algorithms can be adapted for either task, but the interpretation and training differ accordingly (see the sketch below).
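To illustrate the last point, many libraries ship the same algorithm family in both flavours. A sketch assuming scikit-learn's decision trees, reusing the iris data and treating petal width as a made-up continuous target:

# Same tree-based family, two task flavours.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier().fit(X, y)               # target: discrete species labels
reg = DecisionTreeRegressor().fit(X[:, :3], X[:, 3])   # target: continuous petal width

print(clf.predict(X[:2]))        # class labels from {0, 1, 2}
print(reg.predict(X[:2, :3]))    # real-valued petal widths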