How does data cleaning play a vital role in data analysis?

0 votes
I want to know how data cleaning plays a vital role in data analysis.
Jul 23, 2018 in Data Analytics by DataKing99
• 8,130 points
142 views

2 answers to this question.

–1 vote

Data cleaning matters in analysis for several reasons:

  • Data from multiple sources rarely arrives in one consistent format. Cleaning transforms it into a format that data analysts or data scientists can work with.
  • Cleaner data generally improves the accuracy of machine learning models.
  • It is a cumbersome process: as the number of data sources increases, the time needed to clean the data grows sharply, driven by both the number of sources and the volume of data they generate.
  • Cleaning is often said to take up to 80% of the time spent on an analysis task, which makes it a critical part of the work.
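
As a rough illustration of the first point, here is a minimal Python sketch of bringing records from two sources into one format. The sources, field names, and date formats are all made up for the example; real data will need its own rules.

```python
from datetime import datetime

# Hypothetical records from two sources that use different
# field names, letter casing, and date formats.
source_a = [{"customer": " Alice ", "signup": "2018-07-23"}]
source_b = [{"Name": "BOB", "SignupDate": "23/07/2018"}]


def from_a(rec):
    """Convert a source A record to the common format."""
    return {
        "name": rec["customer"].strip().title(),
        "signup": datetime.strptime(rec["signup"], "%Y-%m-%d").date(),
    }


def from_b(rec):
    """Convert a source B record to the common format."""
    return {
        "name": rec["Name"].strip().title(),
        "signup": datetime.strptime(rec["SignupDate"], "%d/%m/%Y").date(),
    }


# After conversion, every record has the same keys and value types.
combined = [from_a(r) for r in source_a] + [from_b(r) for r in source_b]
print(combined)
```

Each new source needs its own conversion function, which is one reason the effort grows as sources are added.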
answered Jul 23, 2018 by CodingByHeart77
• 3,680 points
0 votes

Data is the foundation of any analysis. Unclean data produces inaccurate and ambiguous results.

Let's take an example to understand this further:

Suppose you want to group clothes of the same color together using the following table:

S no.   Item name   Color   Size
1       T-shirt     bleu    S
2       T-shirt     Blue    M
3       Jeans       Black   L
4       T-shirt     Blue    XL
5       Jeans       Black   L

Because the data is unclean ("bleu" is a misspelling of "Blue"), the analysis will show three color categories instead of two. This is why data cleaning is a vital part of data analysis.
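
To make this concrete, here is a small Python sketch that counts the color categories in the table above before and after cleaning. The `COLOR_FIXES` map and the `clean_color` helper are hypothetical names introduced for this example; in practice you would build the correction map by inspecting the data.

```python
from collections import Counter

# Rows copied from the table above: (S no., item name, color, size).
rows = [
    (1, "T-shirt", "bleu", "S"),
    (2, "T-shirt", "Blue", "M"),
    (3, "Jeans", "Black", "L"),
    (4, "T-shirt", "Blue", "XL"),
    (5, "Jeans", "Black", "L"),
]

# Hypothetical map of known misspellings to their corrected form.
COLOR_FIXES = {"bleu": "blue"}


def clean_color(raw):
    """Trim whitespace, lowercase, and apply known spelling fixes."""
    value = raw.strip().lower()
    return COLOR_FIXES.get(value, value)


# Count categories on the raw values, then on the cleaned values.
raw_counts = Counter(color for _, _, color, _ in rows)
clean_counts = Counter(clean_color(color) for _, _, color, _ in rows)

print(len(raw_counts), dict(raw_counts))
print(len(clean_counts), dict(clean_counts))
```

The raw column yields three categories (bleu, Blue, Black), while the cleaned column yields two (blue, black).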
answered Jul 23, 2018 by Anmol
• 3,620 points
