What are the data quality checks we do in our real-time Big Data projects?

Example 1: How can we verify that the count of records loaded into HDFS matches the count in the source?
Example 2: How can we verify that the records loaded into HDFS are correct?
Sep 4 in Big Data Hadoop by Madhan

1 answer to this question.


You can use a checksum to compare the source file with the file uploaded to HDFS.

Try this (assuming the source file is on the local filesystem):

$ hdfs dfs -cat /file/in/hdfs | md5sum

$ cat /file/at/source | md5sum

If these two commands print the same hash, the file was not corrupted during the upload.
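
For the record-count check in Example 1, you can do the same comparison with line counts instead of hashes (this assumes a plain-text file where each record is one line):

$ hdfs dfs -cat /file/in/hdfs | wc -l

$ wc -l < /file/at/source

If both commands print the same number, all records were loaded.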

answered Sep 4 by Tina
Thanks for the help, but I am copying from a MySQL table to HDFS. In that scenario, if one record is corrupted, how can we know?

I'm not 100% sure, but I think you can use CRC32 checksums as follows:

For the MySQL table, use the command below to get the checksum:

CHECKSUM TABLE <tablename>;

And in HDFS, use this:

hdfs dfs -Ddfs.checksum.combine.mode=COMPOSITE_CRC -checksum /path/to/file
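
Note that CHECKSUM TABLE hashes MySQL's internal row representation, while the HDFS checksum covers the raw bytes of the exported file, so the two values usually won't be directly comparable; each is better suited to detecting changes on its own side. A simpler end-to-end completeness check (it won't catch corruption inside a row, but it will catch dropped or duplicated rows) is to compare row counts on both sides. Here is a minimal sketch, assuming a delimited-text export where one HDFS line equals one table row; the database, table, user, and path names are placeholders:

# Count rows in the MySQL table (placeholder connection details).
MYSQL_COUNT=$(mysql -N -u dbuser -p -e "SELECT COUNT(*) FROM mydb.mytable")

# Count lines across the files your loader wrote to HDFS.
HDFS_COUNT=$(hdfs dfs -cat /data/mytable/part-* | wc -l)

if [ "$MYSQL_COUNT" -eq "$HDFS_COUNT" ]; then
  echo "Row counts match: $MYSQL_COUNT"
else
  echo "Count mismatch: MySQL=$MYSQL_COUNT HDFS=$HDFS_COUNT"
fi

If you are copying the table with Sqoop, its --validate option performs this row-count comparison for you after the import:

sqoop import --connect jdbc:mysql://dbhost/mydb --table mytable --target-dir /data/mytable --validate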
