What is Principal Component Analysis and how do I create its model in R?
Jul 17, 2018 696 views

## 2 answers to this question.

Principal Component Analysis (PCA) is a method for dimensionality reduction. Often a single observation is described by many dimensions (features), some of them redundant or correlated, which makes the data hard to analyze. That is why it is useful to reduce the number of dimensions.

The concept of Principal Component Analysis is this:

• The data is transformed to a new space with the same number of dimensions or fewer. These new dimensions (features) are known as principal components.
• The first principal component captures the maximum amount of variance from the features in the original data.
• The second principal component is orthogonal to the first and captures the maximum amount of the remaining variance.
• The same holds for every later component: each is orthogonal to (and uncorrelated with) all the previous ones and explains no more variance than the one before it.
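The properties listed above can be checked by hand. The following is a minimal sketch, assuming the built-in iris data set as example data (it is not part of the original question); it computes the components from an eigendecomposition of the covariance matrix, and the variable names are illustrative only.

```r
# Example data (an assumption for illustration): the four numeric iris columns
X <- as.matrix(iris[, 1:4])

# Centre each feature so that the covariance matrix describes spread around the mean
Xc <- scale(X, center = TRUE, scale = FALSE)

# Eigendecomposition of the covariance matrix:
# eigenvectors = directions of the principal components,
# eigenvalues  = variance captured along each direction
eig <- eigen(cov(Xc))
variances <- eig$values
loadings <- eig$vectors

# Project the centred data onto the new axes
scores <- Xc %*% loadings

# Each component explains no more variance than the previous one
print(variances)

# The directions are orthogonal: this is (numerically) the identity matrix
print(round(crossprod(loadings), 10))

# The projected data are uncorrelated: off-diagonal entries are (numerically) zero
print(round(cor(scores), 10))
```

The variance of each column of `scores` equals the corresponding eigenvalue, which is what "captures the maximum amount of variance" refers to.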

You can perform PCA in R with the prcomp() function.
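A short sketch of a prcomp() call follows. The USArrests data set ships with R and is used here only as an assumed example; scaling the variables is a choice that suits features measured in different units, not a requirement.

```r
# Example data (an assumption for illustration): built-in USArrests, 50 rows x 4 numeric columns
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Standard deviation, proportion of variance and cumulative proportion per component
print(summary(pca))

# Loadings: how each original feature contributes to each component
print(pca$rotation)

# Scores: the observations expressed in the new coordinate system
print(head(pca$x))

# Proportion of variance explained, computed directly from the standard deviations
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))
```

To reduce dimensionality, keep only the first few columns of `pca$x`, for example `pca$x[, 1:2]`.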


Principal component analysis (PCA) is routinely employed on a wide range of problems, from outlier detection to predictive modeling. PCA projects observations described by $p$ variables onto a few orthogonal components, defined by the directions in which the data 'stretch' the most, which gives a simplified overview. PCA is particularly powerful when dealing with multicollinearity and with variables that outnumber the samples ($p \gg n$).
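To illustrate the $p \gg n$ case, here is a small sketch on simulated data (the sizes and the random matrix are assumptions made for the example). prcomp() still runs when variables outnumber samples, and it returns at most min(n, p) components.

```r
set.seed(1)

# Simulated data (illustrative only): 10 samples, 50 variables
n <- 10
p <- 50
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Number of components returned: min(n, p)
print(length(pca$sdev))

# One loading vector of length p per component
print(dim(pca$rotation))

# After centring, n samples span at most n - 1 directions,
# so the last component carries essentially no variance
print(pca$sdev[n])
```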

PCA is an unsupervised method, meaning it always looks for the greatest sources of variation regardless of any group structure or outcome in the data. Its counterpart, partial least squares (PLS), is a supervised method. It performs the same sort of covariance decomposition, but builds a user-defined number of components (frequently called latent variables) that minimize the sum of squared errors (SSE) when predicting a specified outcome with ordinary least squares (OLS).

Although there is a plethora of PCA methods available for R, I will only introduce two:

• prcomp, a function from the stats package, which is installed and loaded by default with R
• pcaMethods, a Bioconductor package that I frequently use for my own PCAs
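As a rough sketch of how the two options compare, the code below runs prcomp() and, only if pcaMethods happens to be installed, the pca() function from that package on the same data. The USArrests data set and the choice of two components are assumptions made for the example; pcaMethods is installed from Bioconductor rather than CRAN.

```r
# Example data (an assumption for illustration)
X <- as.matrix(USArrests)

# Option 1: prcomp from the stats package
pca_base <- prcomp(X, center = TRUE, scale. = TRUE)
print(head(pca_base$x[, 1:2]))

# Option 2: pcaMethods, run only when the package is available
if (requireNamespace("pcaMethods", quietly = TRUE)) {
  # scale = "uv" requests unit-variance scaling; nPcs sets how many components to keep
  pca_bioc <- pcaMethods::pca(X, method = "svd", nPcs = 2,
                              center = TRUE, scale = "uv")
  print(head(pcaMethods::scores(pca_bioc)))
  print(pcaMethods::loadings(pca_bioc))
} else {
  message("pcaMethods is not installed; skipping the second example")
}
```

The signs of the components may differ between implementations, since a principal direction is only defined up to a sign flip.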
