Lasso | Ridge Regression (Quick Revision)

Navaneeth Sharma
3 min read · Oct 2, 2021

The Must-Learn Regression Techniques

Welcome back! Today we will quickly go through LASSO and Ridge regression, which are modifications of linear regression. This article is part of the revision series, so I assume you already know linear regression (not necessarily LASSO and Ridge; this blog will give you a very high-level idea of both).

Lasso and Ridge regression are regression models that add a penalty to a loss function such as the least-squares loss or logistic loss. The general equation that holds for both is

Cost(w) = Loss(w) + λ · Penalty(w)

General expression for a penalized loss function

LASSO Regression

For Lasso, the penalty will be

λ · Σ_j |w_j|

L1 regularization, also known as the Lasso penalty

Here the term lambda is a regularization parameter. It is a hyperparameter, so we need to choose it carefully: the higher the lambda, the more the cost function is penalized for the size of the weights while learning. The weights w are made absolute before being summed and added to the loss function, which is called the L1 penalty. Hence the name Least Absolute Shrinkage and Selection Operator (LASSO).
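
To make this concrete, here is a minimal sketch using scikit-learn (my choice of library; the original post shows no code). Note that scikit-learn names the regularization parameter alpha rather than lambda:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data: five features, but only the first two actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

# alpha plays the role of lambda; a larger alpha means a stronger penalty.
model = Lasso(alpha=0.1)
model.fit(X, y)

# The L1 penalty drives the weights of the irrelevant features to exactly zero.
print(model.coef_)
```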

Pros

  • It can reduce overfitting of the linear model.
  • It is also widely used to find the important features when features are multicollinear: among a group of correlated features, Lasso tends to select just one and completely penalizes the others. [Note: the weights assigned to the other features become exactly zero.]

Cons

  • It performs feature selection poorly on bootstrapped datasets. Because bootstrapped samples vary a lot, Lasso may choose a different feature each time when the data contains multiple collinear features.
  • When there are several highly correlated features, Lasso selects one of them essentially at random, which can be non-intuitive.

Ridge Regression

For Ridge, the penalty will be

λ · Σ_j w_j²

L2 regularization, also known as the Ridge penalty

The term lambda plays a similar role as in Lasso: the higher the value of lambda, the larger the penalty. Here, instead of the absolute value, the square of each weight is used to regularize. This approach is predominantly used in applied mathematics, machine learning, signal processing, and related areas.
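
Here is the same kind of scikit-learn sketch for Ridge (again, the library choice and toy data are mine):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Same toy setup: only the first two features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

# As with Lasso, scikit-learn calls lambda alpha; larger alpha = stronger shrinkage.
model = Ridge(alpha=1.0)
model.fit(X, y)

# The weights shrink toward zero but, unlike Lasso, rarely become exactly zero.
print(model.coef_)
```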

Pros

  • Reduces overfitting of the model. Like most regularization techniques, Ridge reduces overfitting when a proper lambda value is used.
  • Computationally efficient. Ridge regression is lightweight and requires less computation than algorithms such as SVMs; with squared-error loss it even has a closed-form solution, as shown in the sketch after this list.
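
One reason Ridge is so cheap: with squared-error loss it has the closed-form solution w = (XᵀX + λI)⁻¹ Xᵀy. A minimal NumPy sketch (the function name ridge_closed_form is mine, and the intercept is ignored for simplicity):

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y with a single linear solve."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

# Reuse the same kind of toy data as in the sketches above.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

print(ridge_closed_form(X, y, lam=1.0))
```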

Cons

  • Ridge shrinks the weights toward zero but, unlike Lasso, never sets them exactly to zero, so it cannot completely drop features; all features remain in the model, and a large lambda can introduce a high bias.

Cool! That’s it for this time. We have gone through Lasso and Ridge regression and learned the pros and cons of each.

Thank you for your time. Let’s meet next time with more exciting revision material.
