1. What are Linear Models?
Linear models are a class of models that make predictions using a linear function of the input features: the prediction is computed as a weighted sum of the input features plus a bias term. They have been studied extensively for more than a century and remain widely used because of their simplicity, interpretability, and effectiveness in many scenarios.
2. Mathematical Formulation
For regression, the general form of a linear model's prediction is:
ŷ = w₁x₁ + w₂x₂ + … + wₚxₚ + b
where:
- ŷ is the predicted output,
- xᵢ is the i-th input feature,
- wᵢ is the learned weight (coefficient) for feature xᵢ,
- b is the intercept (bias term),
- p is the number of features.
In vector form:
ŷ = wᵀx + b
where w = (w₁, w₂, …, wₚ) and x = (x₁, x₂, …, xₚ).
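To make the formula concrete, here is a minimal NumPy sketch of a single prediction; the weight, feature, and bias values are made up purely for illustration:

```python
import numpy as np

# Hypothetical learned parameters for a model with p = 3 features
w = np.array([0.5, -1.2, 3.0])   # weight vector (w1, w2, w3)
b = 2.0                          # intercept (bias term)

# One input example x = (x1, x2, x3)
x = np.array([1.0, 0.5, 2.0])

# Prediction: weighted sum of the features plus the bias, i.e. ŷ = wᵀx + b
y_hat = np.dot(w, x) + b
print(y_hat)  # 0.5*1.0 + (-1.2)*0.5 + 3.0*2.0 + 2.0 = 7.9
```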
3. Interpretation and Intuition
- The prediction is a linear combination of the features: each feature contributes in proportion to its weight.
- The model captures linear relationships between the features and the target.
- Despite this simplicity, when the data has many features, linear models can approximate complex functions, and can even fit the training data perfectly when the number of features is at least the number of samples (see the sketch below).
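As a quick illustration of that last point, the following sketch fits scikit-learn's LinearRegression to random toy data with 10 samples but 20 features; because the features outnumber the samples, the least-squares fit reproduces the training targets essentially exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Toy data: 10 samples but 20 features, so features >= samples
X = rng.normal(size=(10, 20))
y = rng.normal(size=10)

model = LinearRegression().fit(X, y)

# R² on the training data is (numerically) 1.0: a perfect fit,
# which says nothing about how the model would generalize
print(model.score(X, y))
```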
4. Linear Models for Regression
Ordinary Least Squares (OLS) / Linear Regression
- The classic linear regression model estimates w and b by minimizing the sum of squared differences between the observed and predicted values.
- Objective: minimize the residual sum of squares, Σᵢ (yᵢ − ŷᵢ)² summed over the N training samples, with respect to w and b, where yᵢ are the true outputs and ŷᵢ the predicted outputs.
- This is a convex optimization problem with a closed-form solution obtained via linear algebra (see the sketch below).
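As a rough sketch of both routes, the snippet below solves the least-squares problem directly with NumPy's linear algebra routines and compares the result to scikit-learn's LinearRegression on the same synthetic data (the data-generating coefficients are arbitrary):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data from a known linear relationship plus a little noise
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(100, 2))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.7 + rng.normal(scale=0.1, size=100)

# Closed-form OLS: append a column of ones for the intercept and solve
# the least-squares problem min ||Xa @ theta - y||^2 with linear algebra
Xa = np.column_stack([X, np.ones(len(X))])
theta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
print("closed form :", theta[:2], theta[2])

# scikit-learn's LinearRegression minimizes the same residual sum of squares
model = LinearRegression().fit(X, y)
print("scikit-learn:", model.coef_, model.intercept_)
```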
5. Linear Models for Classification
- Linear models are also used extensively for classification tasks.
- For example, Logistic Regression models the probability of a class as a logistic function applied to the linear combination of the features.
- Similarly, Linear Support Vector Machines (SVMs) seek a separating hyperplane defined by a linear function (both classifiers are sketched below).
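A minimal sketch of both classifiers on synthetic data (the dataset parameters are arbitrary, chosen only to keep the example small):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC

# Synthetic binary classification problem
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Logistic regression: the class probability is a logistic (sigmoid)
# function applied to the linear combination of the features
log_reg = LogisticRegression().fit(X, y)
print(log_reg.predict_proba(X[:3]))    # class probabilities for 3 samples

# Linear SVM: a separating hyperplane defined by a linear function
svm = LinearSVC().fit(X, y)
print(svm.decision_function(X[:3]))    # signed distances to the hyperplane
```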
6. When Do Linear Models Perform Well?
- They are particularly effective when the number of features is large relative to the number of samples, since they can fit complex combinations of features.
- They are efficient to train on very large datasets where training more complex models is computationally prohibitive.
- They often serve as baseline models or as components in more complex pipelines.
7. Limitations and Failure Cases
- In low-dimensional spaces, or when the true decision boundary is non-linear, linear models may underperform.
- They cannot naturally capture complex, non-linear relationships unless combined with feature transformations or kernel methods (e.g., kernelized SVMs); see the sketch below.
- Feature scaling and careful regularization are often needed to avoid overfitting or underfitting.
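For instance, expanding the inputs with polynomial features lets a linear classifier draw a curved boundary in the original feature space; the sketch below contrasts plain and polynomially expanded features on scikit-learn's two-moons toy dataset (the degree and noise settings are arbitrary):

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Two interleaving half-moons: the true decision boundary is non-linear
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# A plain linear classifier is limited to a straight decision boundary
linear_clf = make_pipeline(StandardScaler(), LogisticRegression())
print("plain features:     ", linear_clf.fit(X, y).score(X, y))

# Polynomial feature expansion turns the same linear model into a
# non-linear classifier in the original two-dimensional space
poly_clf = make_pipeline(PolynomialFeatures(degree=3),
                         StandardScaler(), LogisticRegression())
print("polynomial features:", poly_clf.fit(X, y).score(X, y))
```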
8. Key Variants
- Ordinary Least Squares (OLS): minimizes squared error, with no regularization.
- Ridge Regression: adds L2 regularization to penalize large weights.
- Lasso Regression: adds L1 regularization for feature selection and sparsity.
- Elastic Net: combines the L1 and L2 penalties.
- These variants apply different techniques for parameter estimation and complexity control (sketched below).
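A brief sketch of these variants in scikit-learn; the alpha and l1_ratio values are arbitrary and would normally be tuned, e.g. by cross-validation:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge

# Synthetic data where only a few of the 20 features are truly informative
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

models = {
    "OLS":         LinearRegression(),                   # no regularization
    "Ridge":       Ridge(alpha=1.0),                     # L2 penalty on the weights
    "Lasso":       Lasso(alpha=1.0),                     # L1 penalty, encourages sparsity
    "Elastic Net": ElasticNet(alpha=1.0, l1_ratio=0.5),  # mix of L1 and L2
}

for name, model in models.items():
    model.fit(X, y)
    n_nonzero = int((model.coef_ != 0).sum())
    print(f"{name:11s} non-zero weights: {n_nonzero}/20")
```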
9. Summary
- Linear models predict through a weighted sum of the features.
- They are computationally efficient and interpretable.
- They perform well with many features or on large datasets.
- They may be outperformed in non-linear or low-dimensional settings.
- They are integral to classical and modern machine learning workflows.