What is the difference between LDA and PCA for dimensionality reduction?

Question

I have a dataset with n dimensions. What would be the ideal algorithm to approach it?

Abhi · Answer 1 · Aug 4, 2018

PCA is a Dimensionality Reduction algorithm.

Basically, it's a machine learning technique for extracting hidden factors from a dataset. It:

Describes your data with a smaller number of components that explain most of the variance in the data
Reduces the number of dimensions in the data so that computational complexity is reduced

Working of PCA:

Consider a scenario where you have data on the x and y axes.

Applying PCA generates components that are orthogonal to each other and hence uncorrelated. This also solves the problem of multicollinearity.
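
To make the "orthogonal and uncorrelated" point concrete, here is a minimal NumPy sketch. The data is synthetic and made up for illustration (the seed, the sample size and the 0.8 slope are all assumptions), and the eigendecomposition route shown is just one common way to compute PCA, not necessarily how any particular library does it.

```python
import numpy as np

rng = np.random.default_rng(0)               # fixed seed; illustrative data only
x = rng.normal(size=500)
y = 0.8 * x + 0.3 * rng.normal(size=500)     # y is strongly correlated with x
X = np.column_stack([x, y])

# Center the data, then eigendecompose its covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigh handles symmetric matrices

# Sort the components by decreasing variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the centered data onto the principal components.
scores = Xc @ eigvecs

corr_before = np.corrcoef(X, rowvar=False)[0, 1]
corr_after = np.corrcoef(scores, rowvar=False)[0, 1]
print(f"correlation between x and y:      {corr_before:.3f}")
print(f"correlation between PC1 and PC2:  {corr_after:.3f}")
```

The original x and y are strongly correlated, while the two component scores should come out uncorrelated up to floating-point error, which is why PCA is often used as a remedy for multicollinearity.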

PCA reduces dimensions, but with multi-class data it is also necessary to reduce dimensions in a way that preserves the separation between classes. LDA (Linear Discriminant Analysis) is an algorithm designed for this. Let's discuss it in detail:

Reduces Dimensions
Searches for a linear combination of variables that best separates 2 classes
Reduces the degree of overfitting

Working of LDA:

Assume a set of D-dimensional samples {x^(1), x^(2), …, x^(N)}, N1 of which belong to class ω1 and N2 to class ω2.

Obtain a scalar y by projecting each sample x onto a line: y = w^T x

Of all the possible lines, select the one that maximizes the separability of the scalars.
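
As a rough sketch of the two-class case, the code below uses the standard Fisher formulation, in which separability is measured as the squared gap between the projected class means divided by the summed projected scatter, and the maximizing direction is proportional to S_W^{-1} (m1 - m2). The function names, the synthetic data and the comparison direction are all assumptions made for illustration.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Return the unit vector w proportional to S_W^{-1} (m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the two per-class scatter matrices.
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_w, m1 - m2)
    return w / np.linalg.norm(w)

def fisher_criterion(w, X1, X2):
    """Squared gap between projected means over summed projected scatter."""
    y1, y2 = X1 @ w, X2 @ w                  # y = w^T x for every sample
    scatter = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
    return (y1.mean() - y2.mean()) ** 2 / scatter

# Two illustrative 2-D classes sharing an elongated covariance.
rng = np.random.default_rng(1)
cov = [[3.0, 2.5], [2.5, 3.0]]
X1 = rng.multivariate_normal([0.0, 0.0], cov, size=200)   # class ω1
X2 = rng.multivariate_normal([3.0, 1.0], cov, size=200)   # class ω2

w_fisher = fisher_direction(X1, X2)
w_axis = np.array([1.0, 0.0])                # naive choice: project onto the x axis

J_fisher = fisher_criterion(w_fisher, X1, X2)
J_axis = fisher_criterion(w_axis, X1, X2)
print("Fisher direction:", np.round(w_fisher, 3))
print(f"criterion along Fisher direction: {J_fisher:.4f}")
print(f"criterion along x axis:           {J_axis:.4f}")
```

Under this criterion no other line scores higher than the Fisher direction on the same samples, so the x-axis projection serves only as a baseline for comparison.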

Seema · Answer 2 · Mar 7, 2019

Principal Component Analysis (PCA) is an unsupervised learning algorithm: it ignores the class labels and finds the directions (the so-called principal components) that maximize the variance in a dataset. In other words, PCA is basically a summarization of the data. PCA does not select a subset of features and discard the others. Instead, it derives new features, each a linear combination of the existing ones, that best describe the variation in the data.

PCA works with the eigenvectors and eigenvalues of the covariance matrix, which is equivalent to fitting straight principal-component lines to the variance of the data. Why? Because the eigenvectors trace the principal lines of force, and each eigenvalue gives the variance along its eigenvector. In other words, PCA determines the lines of variance in the dataset, called principal components, with the first principal component having the maximum variance, the second having the second-largest variance, and so on.
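
The ordering of components by variance can be sketched as follows. This is a minimal NumPy illustration on synthetic data (five features built from two made-up latent factors); the 0.95 threshold for choosing how many components to keep is an arbitrary assumption, not a rule.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: 5 observed features driven by 2 latent factors plus small noise.
latent = rng.normal(size=(300, 2))
mixing = np.array([[1.0, 1.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0, 1.0]])
X = latent @ mixing + 0.05 * rng.normal(size=(300, 5))

# Eigenvectors of the covariance matrix are the principal components;
# eigenvalues are the variances along them.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigvals)[::-1]            # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained_ratio = eigvals / eigvals.sum()
cumulative = np.cumsum(explained_ratio)
k = int(np.searchsorted(cumulative, 0.95) + 1)   # 0.95 is an arbitrary threshold

X_reduced = Xc @ eigvecs[:, :k]
print("explained variance ratio:", np.round(explained_ratio, 3))
print("components kept:", k, "| reduced shape:", X_reduced.shape)
```

Because the data here is generated from two latent factors, almost all of the variance should land in the first two components. In practice a library class such as scikit-learn's PCA exposes the same quantities.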

Linear Discriminant Analysis is a supervised algorithm as it takes the class label into consideration. It is a way to reduce ‘dimensionality’ while at the same time preserving as much of the class discrimination information as possible.

LDA helps you find the boundaries around clusters of classes. It projects your data points onto a line so that the clusters are as separated as possible, with the points of each cluster lying close to their centroid.

So the question arises: how are these clusters defined, and how do we get the reduced feature set in the case of LDA?

Basically, LDA finds the centroid of the data points in each class. For example, with thirteen different features, LDA finds the centroid of each class in that thirteen-dimensional feature space. On this basis, it determines a new dimension, which is simply an axis that should satisfy two criteria:
1. Maximize the distance between the centroids of the classes.
2. Minimize the variation (which LDA calls scatter and is represented by s^2) within each class.
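
The two criteria above can be sketched for more than two classes using the common scatter-matrix formulation: build a between-class scatter matrix S_B from the centroids and a within-class scatter matrix S_W, then take the leading eigenvectors of S_W^{-1} S_B. The helper name lda_axes and the synthetic thirteen-feature, three-class data are assumptions for illustration only.

```python
import numpy as np

def lda_axes(X, y, n_components):
    """Return the top discriminant axes and their eigenvalues."""
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))                   # within-class scatter
    S_b = np.zeros((d, d))                   # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        centroid = Xc.mean(axis=0)           # centroid of this class
        S_w += (Xc - centroid).T @ (Xc - centroid)
        gap = (centroid - overall_mean)[:, None]
        S_b += len(Xc) * (gap @ gap.T)
    # Eigenvectors of S_w^{-1} S_b; round-off imaginary parts are dropped.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs.real[:, order], eigvals.real[order]

# Illustrative data: 3 classes, 13 features, made-up centroids.
rng = np.random.default_rng(2)
n_classes, n_per_class, n_features = 3, 60, 13
centroids = rng.normal(scale=2.0, size=(n_classes, n_features))
y = np.repeat(np.arange(n_classes), n_per_class)
X = centroids[y] + rng.normal(size=(len(y), n_features))

W, ratios = lda_axes(X, y, n_components=n_classes - 1)
Z = (X - X.mean(axis=0)) @ W                 # reduced feature set
print("projected shape:", Z.shape)
print("between/within scatter ratio per axis:", np.round(ratios, 2))
```

With C classes, S_B has rank at most C - 1, so LDA yields at most C - 1 useful axes (two here), regardless of how many original features there are. scikit-learn's LinearDiscriminantAnalysis is a library option for the same reduction.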

PCA tends to perform better when the number of samples per class is small, whereas LDA tends to work better on large datasets with multiple classes, where class separability is an important factor in reducing dimensionality.