How to find incorrect files/records in Hive?

Suppose a JSON file contains 1000 records and all of them are loaded into a Hive table. If one of those records is incorrect, how can I find that bad record?
Jul 25 in Big Data Hadoop by Robby

1 answer to this question.

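A value with the wrong data type causes the generated MapReduce job to crash. Setting ignore.malformed.json does not seem to fix it, presumably because the line is still syntactically valid JSON and only the type of one value is wrong.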

Here is the sample data, mixed2.json:

{"f1":"hello", "f2":7}
{"f1":"goodbye", "f2":8}
{"f1":"this", "f2":9}
{"f1":"that", "f2":"ten"}

Here is the sample Hive script, mixed2.hive. The first query (on f1) works; the other two (on f2 and on *) crash. It would be nicer to get NULL or some other marker for the bad value. Note that get_json_object() returns the bad string as-is, so it prints "ten".

drop table mixed2;

create table mixed2 (f1 string, f2 int)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties ("ignore.malformed.json" = "true")
stored as textfile;

load data inpath '/tmp/mixed2.json' overwrite into table mixed2;

select f1 from mixed2;
select f2 from mixed2;
select * from mixed2;

Declare the column as string instead of int. The SerDe can read the numbers into strings, and you can then CAST them in Hive. A value that cannot be converted, such as "ten", casts to NULL, which is how the bad record can be identified.
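The string-column approach can be sketched roughly as follows. This is only an illustration, not a tested script: the table name mixed2_raw is made up for the example, and it assumes the SerDe reads the numeric f2 values into the string column as described above. It also assumes the sample file is available at /tmp/mixed2.json again, since load data inpath moves the file into the table's directory.

```sql
-- Hypothetical staging table: every column is declared as string,
-- so a value of the wrong type cannot break the read.
drop table if exists mixed2_raw;

create table mixed2_raw (f1 string, f2 string)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile;

-- Assumes the file is present at this path (an earlier load would have moved it).
load data inpath '/tmp/mixed2.json' overwrite into table mixed2_raw;

-- Find the bad records: f2 is present, but it does not convert to int.
select f1, f2
from mixed2_raw
where f2 is not null
  and cast(f2 as int) is null;

-- Read the data with the intended type; unconvertible values come back as NULL.
select f1, cast(f2 as int) as f2
from mixed2_raw;
```

With the sample data, the first select should return only the record whose f2 is "ten", provided the assumptions above hold for your SerDe version.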

Irregularities like this can be handled up to a point, but if the schema changes entirely, the data cannot be loaded at all.

answered Jul 25 by Ritu
