Classification Losses (Quick Revision)

Navaneeth Sharma
4 min read · Oct 10, 2021

Cross Entropy Loss and Hinge Loss Revision

Photo by Aaron Burden on Unsplash

Welcome to another blog of the Machine Learning / Data Science Revision Series. This is blog #5 of the series. As this is a revision, I assume you already have some background on classification algorithms and their optimization.

Classification is the task of assigning things to categories based on their features. In machine learning, it means categorizing data points using ML algorithms. Classification can be done with numerous algorithms: SVMs, logistic regression, tree-based algorithms, or neural networks. The non-tree-based algorithms mostly use either hinge loss or cross-entropy loss. (Tree-based algorithms use a special set of losses, which we will discuss in the coming parts.) Let's dive into the What? Why? and When? of these loss functions, as is our tradition :-).

Cross Entropy Loss

What?

Cross-Entropy is a measure of the difference between two probability distributions. What does that mean? Entropy is a measure of randomness: the more random (uncertain) a distribution is, the higher its entropy. Cross-entropy extends this idea to two distributions: it measures how costly it is, on average, to encode samples from the true distribution using the predicted distribution. The closer the predicted distribution is to the true one, the lower the cross-entropy.
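To make this concrete, here is a minimal NumPy sketch of entropy and cross-entropy; the distributions p and q below are made-up examples, not from any real model.

```python
import numpy as np

def entropy(p):
    # H(p) = -sum p * log(p): the randomness of a single distribution
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p + 1e-12))

def cross_entropy(p, q):
    # H(p, q) = -sum p * log(q): cost of describing p using q
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return -np.sum(p * np.log(q + 1e-12))

p = np.array([0.7, 0.2, 0.1])   # "true" distribution (made-up)
q = np.array([0.6, 0.3, 0.1])   # predicted distribution (made-up)
print(entropy(p))                # lower bound for cross_entropy(p, .)
print(cross_entropy(p, q))       # >= entropy(p); equal only when q == p
```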

There are two types of Cross-Entropy Losses

  1. Binary Cross-Entropy
  2. Categorical Cross-Entropy

Binary Cross-Entropy is the loss function for data that has only two kinds of labels. The loss is calculated as

Binary Cross-Entropy: L = -(1/N) Σᵢ [ yᵢ log(pᵢ) + (1 - yᵢ) log(1 - pᵢ) ], where yᵢ ∈ {0, 1} is the true label and pᵢ is the predicted probability of the positive class.

It is also sometimes called logistic loss (or log loss). In neural networks, it is generally used when the activation function of the last layer is a sigmoid, so that the output is a probability between 0 and 1.
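Here is a minimal NumPy sketch of binary cross-entropy; the labels and predicted probabilities are made-up values for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # BCE = -(1/N) * sum( y*log(p) + (1-y)*log(1-p) )
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])      # e.g. sigmoid outputs (made-up)
print(binary_cross_entropy(y_true, y_pred))  # ~0.40
```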

Categorical Cross-Entropy is generally used when there are more than two kinds of labels in the data. The equation generalizes the binary case: instead of two terms, the sum runs over all classes. The softmax activation function is applied at the last layer of the model, and categorical cross-entropy is then calculated on the probabilities produced by softmax.

Categorical Cross-Entropy: L = -(1/N) Σᵢ Σ_c y_{i,c} log(p_{i,c}), where y_{i,c} is 1 if sample i belongs to class c (one-hot encoding) and p_{i,c} is the predicted probability of class c for sample i.

Why?

Binary Cross-Entropy fits binary classification tasks. Because it directly compares the predicted probability with the true label, it lets us estimate the output probability accurately using this loss function.
Categorical Cross-Entropy best suits multi-class classification. The labels are one-hot encoded before training. This loss is usually used in algorithms such as neural networks: in the last layer, the softmax activation function produces probabilities across the classes of prediction, the cross-entropy of those probabilities against the one-hot labels is computed, and the backpropagation algorithm is then performed.
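A minimal NumPy sketch of this softmax + one-hot + cross-entropy pipeline is below; the logits and labels are made-up for illustration.

```python
import numpy as np

def softmax(logits):
    # subtract the row max for numerical stability before exponentiating
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_cross_entropy(y_onehot, probs, eps=1e-12):
    # CCE = -(1/N) * sum over samples and classes of y * log(p)
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1))

logits = np.array([[2.0, 0.5, -1.0],     # raw scores from the last layer (made-up)
                   [0.1, 1.5,  0.3]])
y_onehot = np.array([[1, 0, 0],          # one-hot encoded labels
                     [0, 1, 0]])

probs = softmax(logits)
print(categorical_cross_entropy(y_onehot, probs))
```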

When?

Cross-Entropy loss is used in algorithms such as Logistic Regression, Neural Networks, and most deep-learning-based classification models. For a binary classification task, go for binary cross-entropy loss; if there are more than two classes in the output labels, go for categorical cross-entropy.
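For illustration, here is one way that choice might look in Keras (a sketch assuming TensorFlow/Keras; the layer sizes and input shape are made-up):

```python
import tensorflow as tf

# Binary classification: one sigmoid output + binary cross-entropy
binary_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
binary_model.compile(optimizer="adam", loss="binary_crossentropy")

# Multi-class classification: softmax outputs + categorical cross-entropy
multi_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
multi_model.compile(optimizer="adam", loss="categorical_crossentropy")
```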

Hinge Loss

What?

Support Vector Machines (SVMs) use Hinge Loss as their loss function. The significant advantage of hinge loss is that it gives a convex optimization problem. However, while convex, it is non-smooth: it is not differentiable at the hinge point. Logistic regression, which uses cross-entropy loss, has a smooth and differentiable objective. So hinge loss requires a slightly different kind of optimization (for example, subgradient methods), which is a bit trickier.

Hinge Loss: L = max(0, 1 - y · f(x)), where y ∈ {-1, +1} is the true label and f(x) is the raw (signed) output of the classifier.
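A minimal NumPy sketch of hinge loss over a batch; the labels and decision-function scores below are made-up values.

```python
import numpy as np

def hinge_loss(y_true, scores):
    # y_true in {-1, +1}; scores are raw classifier outputs f(x)
    # the loss is zero when a sample is correctly classified with margin >= 1
    return np.mean(np.maximum(0.0, 1.0 - y_true * scores))

y_true = np.array([1, -1, 1, -1])
scores = np.array([2.3, -0.5, 0.4, 1.2])   # made-up decision function values
print(hinge_loss(y_true, scores))           # the misclassified last sample is penalized most
```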

Why?

Since SVM builds a maximum-margin decision boundary, hinge loss plays a key role. There are several modifications of hinge loss (such as squared hinge) for easier optimization. The loss is typically used with linear predictors. With kernels, SVMs implicitly map lower-dimensional data into higher dimensions, which makes it easier to find a separating decision boundary at the cost of extra computation.

When?

Libraries that implement SVMs use hinge loss internally. If you are building an SVM from scratch, then dig into hinge loss and its optimization details. As far as I know, most algorithms other than SVMs do not use hinge loss.
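For example, scikit-learn exposes hinge loss for linear classifiers; a minimal sketch (with a tiny made-up dataset) might look like this:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Tiny made-up dataset: two features, binary labels
X = np.array([[0.0, 0.1], [0.2, 0.4], [0.9, 0.8], [1.0, 1.1]])
y = np.array([0, 0, 1, 1])

# SGDClassifier with loss="hinge" trains a linear SVM via (sub)gradient descent
clf = SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.1, 0.2], [0.95, 1.0]]))
```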

I highly recommend revising SVMs as a whole rather than just hinge loss.

Cool! We have revised the major loss functions for classification: Cross-Entropy loss and Hinge Loss.

Thank you for your time. Let's meet next time with more exciting revision material.
