How does data cleaning play a vital role in data analysis?

0 votes
I want to know how data cleaning plays a vital role in analysis.
Jul 24, 2018 in Data Analytics by DataKing99
• 8,240 points
4,965 views

2 answers to this question.

–1 vote

Data cleaning matters in analysis because:

  • Cleaning data from multiple sources transforms it into a consistent format that data analysts or data scientists can work with.
  • Clean data helps to increase the accuracy of machine learning models.
  • It is a cumbersome process: as the number of data sources increases, the time taken to clean the data grows quickly, driven by both the number of sources and the volume of data they generate.
  • Cleaning alone might take up to 80% of the total time, which makes it a critical part of any analysis task.
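
To illustrate the first point, here is a minimal sketch of bringing records from two sources into one common format. The sources, field names, and date formats are all made up for this example; real data would need its own rules.

```python
from datetime import datetime

# Hypothetical records from two sources that describe the same kind of data
# but use different field names, casing, and date formats.
source_a = [{"name": "Alice ", "signup": "2018-07-24"}]
source_b = [{"Name": "BOB", "SignupDate": "24/07/2018"}]


def normalize(record, name_key, date_key, date_format):
    """Return a record in the common format: trimmed title-case name, date object."""
    return {
        "name": record[name_key].strip().title(),
        "signup": datetime.strptime(record[date_key], date_format).date(),
    }


# Each source gets its own mapping into the shared format.
combined = (
    [normalize(r, "name", "signup", "%Y-%m-%d") for r in source_a]
    + [normalize(r, "Name", "SignupDate", "%d/%m/%Y") for r in source_b]
)

for row in combined:
    print(row["name"], row["signup"])
```

Every new source needs its own mapping like this, which is one reason the cleaning effort grows as sources are added.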


answered Jul 24, 2018 by CodingByHeart77
• 3,740 points
0 votes

Data is the core of any analysis. Unclean data generates inaccurate and ambiguous results.

Let's take an example to understand this further:

Suppose you want to group clothes of the same color together using the following table:

S. no.  Item name  Color  Size
1       T-shirt    bleu   S
2       T-shirt    Blue   M
3       Jeans      Black  L
4       T-shirt    Blue   XL
5       Jeans      Black  L

 

Because the data is unclean, the analysis will show three color categories (bleu, Blue, and Black) instead of two. This is why data cleaning is a vital step in data analysis.
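
A rough sketch of the same example in Python, using only the standard library. The rows come from the table above; the `COLOR_FIXES` map and the `clean_color` helper are assumptions made for this illustration, not a general-purpose spelling corrector.

```python
from collections import Counter

# Rows from the table above: (serial no., item name, color, size)
rows = [
    (1, "T-shirt", "bleu", "S"),
    (2, "T-shirt", "Blue", "M"),
    (3, "Jeans", "Black", "L"),
    (4, "T-shirt", "Blue", "XL"),
    (5, "Jeans", "Black", "L"),
]

# Hypothetical map of known misspellings to their corrected form.
COLOR_FIXES = {"bleu": "blue"}


def clean_color(raw):
    """Trim whitespace, lowercase, and apply known spelling fixes."""
    value = raw.strip().lower()
    return COLOR_FIXES.get(value, value)


# Count color categories before and after cleaning.
raw_counts = Counter(color for _, _, color, _ in rows)
clean_counts = Counter(clean_color(color) for _, _, color, _ in rows)

print("Before cleaning:", len(raw_counts), "categories", dict(raw_counts))
print("After cleaning: ", len(clean_counts), "categories", dict(clean_counts))
```

Before cleaning, the misspelled "bleu" is counted as its own category. After cleaning, it is merged with the other blue items, leaving two categories.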
answered Jul 24, 2018 by Abhi
• 3,720 points
