Parquet and Avro File Handling with Hive

Question

How do we work with the Parquet and Avro file formats in Hive?
Do we need to add or download any dependencies for them? If yes, what are the steps?

score 0 · Answer 1 · Jul 25, 2019

For Avro, you can define the table as shown below.

CREATE TABLE table_name
PARTITIONED BY (y string, m string, d string, h string, hh string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'hdfs:///location/data'
TBLPROPERTIES ('avro.schema.url'='hdfs://location/schema/schema.avsc');

Then place the data files in partition directories under that location, for example:


/location/data/y=2016/m=02/d=03/h=03/hh=12/data.avro 

/location/data/y=2016/m=02/d=03/h=03/hh=13/data2.avro

With that layout, the files can be registered as partitions of the table and queried. All the required jars should already be present in Hive; if they are not, avro-json-1.0-SNAPSHOT.jar needs to be added.
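As a minimal sketch of that registration step, assuming the table is the table_name defined above and the files already sit under its location: MSCK REPAIR TABLE scans the location for partition directories the metastore has not seen yet. The jar path in the ADD JAR line is a hypothetical placeholder and is only relevant if the Avro classes are missing from your Hive installation.

```sql
-- Only needed if the Avro SerDe classes are not on Hive's classpath.
-- The path is a hypothetical placeholder; point it at wherever the jar lives.
ADD JAR /path/to/avro-json-1.0-SNAPSHOT.jar;

-- Scan the table location and add any partition directories
-- (y=.../m=.../...) that the metastore does not know about yet.
MSCK REPAIR TABLE table_name;

-- Check which partitions were registered.
SHOW PARTITIONS table_name;
```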

For Parquet, one solution is to create a table dynamically from the Avro schema, and then create a new Parquet-format table from the Avro one.

Here is code from the Hive source that should help:

CREATE TABLE avro_test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS AVRO TBLPROPERTIES ('avro.schema.url'='myHost/myAvroSchema.avsc');


CREATE EXTERNAL TABLE parquet_test LIKE avro_test STORED AS PARQUET LOCATION 'hdfs://myParquetFilesPath';
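If the goal is to convert the Avro data itself rather than to read Parquet files that already exist, one possible follow-up is to copy the rows across. This is only a sketch: it assumes both tables from the statements above exist and that the location of parquet_test is writable. The DESCRIBE is there just to confirm that the columns were derived from the Avro schema.

```sql
-- Confirm that parquet_test picked up the columns defined by the Avro schema.
DESCRIBE parquet_test;

-- Rewrite the Avro rows as Parquet files under parquet_test's location.
-- Note: OVERWRITE replaces whatever data the table currently holds.
INSERT OVERWRITE TABLE parquet_test
SELECT * FROM avro_test;
```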