How to find incorrect records in a Hive table?

Suppose a JSON file contains 1000 records and all of them are saved into a Hive table. If one of those records is incorrect, how can I find it?
Jul 25 in Big Data Hadoop by Robby

1 answer to this question.

A value with the wrong data type causes the generated MapReduce job to crash, and setting ignore.malformed.json does not seem to fix it.

Here is the sample data, mixed2.json:

{"f1":"hello", "f2":7}
{"f1":"goodbye", "f2":8}
{"f1":"this", "f2":9}
{"f1":"that", "f2":"ten"}

Here is the sample Hive script, mixed2.hive. The first query (on f1) works; the other two (on f2 and *) crash. It would be nicer to get NULL or some other placeholder for the bad value. Note that get_json_object() returns the bad string as-is, so it prints "ten"!

drop table mixed2;

create table mixed2 (f1 string, f2 int)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
with serdeproperties ("ignore.malformed.json" = "true")
stored as textfile;

load data inpath '/tmp/mixed2.json' overwrite into table mixed2;

select f1 from mixed2;
select f2 from mixed2;
select * from mixed2;

You should therefore declare the column as string instead of int. The SerDe can read the numbers into strings, and you can then CAST them to int in Hive.
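
The string-column approach can be sketched roughly as follows. This is only an illustration under a few assumptions: the table name mixed2_raw is made up for this example, the file is the same mixed2.json shown earlier, and it relies on Hive's CAST returning NULL when a string cannot be converted to int. It only flags values for which the cast yields NULL, so other kinds of bad data may need additional checks.

```sql
-- Hypothetical staging table: same layout as mixed2, but f2 is read as a string
-- so that a non-numeric value no longer breaks the query.
drop table if exists mixed2_raw;

create table mixed2_raw (f1 string, f2 string)
row format serde 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile;

load data inpath '/tmp/mixed2.json' overwrite into table mixed2_raw;

-- Suspect records: f2 is present, but CAST cannot convert it and yields NULL.
select f1, f2
from mixed2_raw
where f2 is not null
  and cast(f2 as int) is null;

-- Remaining records, with f2 converted to the intended type.
select f1, cast(f2 as int) as f2
from mixed2_raw
where cast(f2 as int) is not null;
```

With the sample data, the first query would be expected to surface the record whose f2 is "ten", while the second keeps the rows whose f2 converts cleanly.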

Minor anomalies like this can be handled to some extent, but if the schema changes entirely, the data cannot be loaded at all.

answered Jul 25 by Ritu
