How do I remove redundant data from a dataset?

0 votes
I have a data set with 100 columns and 18,854 rows. How do I eliminate redundant data?
Nov 13, 2018 in Data Analytics by Ali
• 10,450 points
31 views

1 answer to this question.

0 votes

You can use a dimensionality reduction method such as principal component analysis (PCA).

Dimensionality reduction is the process of reducing the number of variables (columns) under consideration by obtaining a smaller set of principal variables.

PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated numeric variables into a set of linearly uncorrelated variables called principal components. The components are ordered by how much variance they explain, so keeping only the first few of them reduces the number of columns while retaining most of the information.
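To make the idea concrete, here is a minimal sketch of PCA in plain Python, assuming all columns are numeric. The toy data, the function names (center, covariance, top_eigenpair, pca) and the use of power iteration with deflation are my own illustrative choices, not something from the question; on a real 100-column data set you would normally rely on a numerical library rather than hand-written loops.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so every column has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=200):
    """Power iteration: approximate the largest eigenvalue and a unit eigenvector."""
    d = len(matrix)
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # matrix maps v to zero: no variance left
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v gives the eigenvalue for the unit vector v.
    value = sum(v[i] * matrix[i][j] * v[j] for i in range(d) for j in range(d))
    return value, v


def pca(rows, n_components):
    """Return (scores, variances) for the first n_components principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(n_components):
        value, vector = top_eigenpair(cov)
        components.append(vector)
        variances.append(value)
        # Deflation: remove the direction just found before searching again.
        cov = [[cov[i][j] - value * vector[i] * vector[j] for j in range(d)]
               for i in range(d)]
    # Project every centered row onto the kept components.
    scores = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
              for row in centered]
    return scores, variances


# Hypothetical toy data: the third column is exactly twice the first,
# so it carries no information of its own.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
rows = [[x, y, 2.0 * x] for x, y in zip(xs, ys)]

scores, variances = pca(rows, n_components=2)
full_cov = covariance(center(rows))
total_variance = sum(full_cov[i][i] for i in range(3))
print("variance kept by 2 components:", sum(variances))
print("total variance of 3 columns:  ", total_variance)
```

Because the third column is an exact multiple of the first, two components are enough to account for essentially all of the variance in the three original columns. On real data you would instead pick the number of components by looking at how much of the total variance the leading components explain.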

answered Nov 13, 2018 by Maverick
• 10,040 points
