How do I remove unnecessary redundant data from a dataset?

I have a data set with 100 columns and 18854 rows. How do I eliminate redundant data?
Nov 13, 2018 in Data Analytics by Ali

1 answer to this question.


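If the redundancy is between columns (that is, many of the 100 columns are correlated with one another), you can use a dimensionality reduction method such as principal component analysis (PCA).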

Dimensionality reduction is the process of reducing the number of variables (columns) under consideration by deriving a smaller set of principal variables that retains most of the information in the original data.

PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The components are ordered by how much of the data's variance they explain, so you can keep the first few and drop the rest.
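To make this concrete, here is a minimal sketch of PCA in plain Python. It is an illustration under assumptions, not a recommended implementation: the toy `data` table and the helper names (`center`, `covariance`, `top_component`, `pca`) are made up for this example, and the eigenvectors are found with simple power iteration plus deflation so that the code needs only the standard library. For a real 18854 x 100 data set you would more likely use an existing library routine (for example, scikit-learn provides a `PCA` class).

```python
import math
import random


def center(rows):
    """Subtract each column's mean so every column has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(cov, iters=500, seed=0):
    """Power iteration: approximate dominant eigenvector and eigenvalue."""
    rng = random.Random(seed)
    d = len(cov)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # nothing left to explain (zero matrix)
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                 for i in range(d))
    return v, eigval


def pca(rows, k):
    """Return (scores, components, variances) for the first k components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(k):
        v, lam = top_component(cov)
        components.append(v)
        variances.append(lam)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    # Project each centered row onto the components.
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in centered]
    return scores, components, variances


# Hypothetical toy table: column 3 is exactly 2 x column 1, so it is redundant.
data = [
    [1.0, 2.0, 2.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 3.0, 8.0],
    [5.0, 5.0, 10.0],
]

scores, components, variances = pca(data, 3)
total = sum(variances)
for i, var in enumerate(variances, start=1):
    print(f"component {i}: share of variance = {var / total:.4f}")
```

Because the third column is an exact multiple of the first, the third component's share of the variance comes out as essentially zero, so two components carry all the information in this toy table. On real data the usual approach is the same: look at the variance shares, keep enough leading components to cover the fraction you care about, and work with the resulting `scores` instead of the original columns.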

answered Nov 13, 2018 by Maverick
