How to append data to a parquet file?

+1 vote

I am trying to append some data to my parquet file and for that, I'm using the following code:

ParquetWriter<GenericRecord> parquetWriter = new ParquetWriter(path, writeSupport, CompressionCodecName.SNAPPY, BLOCK_SIZE, PAGE_SIZE);

final GenericRecord record = new GenericData.Record(avroSchema);
parquetWriter.write(record);

But this creates a new file instead of appending to the existing one. What should I do to append to the file?

Jan 11, 2019 in Big Data Hadoop by slayer
• 29,270 points
2,417 views

1 answer to this question.

+1 vote

Try using the Spark API to append to the file. Refer to the following code:

df.write.mode('append').parquet('parquet_data_file')

answered Jan 11, 2019 by Omkar
• 69,000 points
How can I achieve this using Java's ParquetWriter API?
It creates a second Parquet file; it does not append data to the existing one.

Hi,

It will append the data. Are you saying that it creates a new partition?

Follow the example below; it should give you some idea.

val data = Seq(("James ","","Smith","36636","M",3000),
  ("Michael ","Rose","","40288","M",4000),
  ("Robert ","","Williams","42114","M",4000),
  ("Maria ","Anne","Jones","39192","F",4000),
  ("Jen","Mary","Brown","","F",-1))

val columns = Seq("firstname","middlename","lastname","dob","gender","salary")
import spark.sqlContext.implicits._

val df = data.toDF(columns:_*)

df.write.parquet("/user/people.parquet")

val parqDF = spark.read.parquet("/user/people.parquet")

parqDF.show()
+---------+----------+--------+-----+------+------+
|firstname|middlename|lastname|  dob|gender|salary|
+---------+----------+--------+-----+------+------+
|  Robert |          |Williams|42114|     M|  4000|
|   Maria |      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|     |     F|    -1|
|   James |          |   Smith|36636|     M|  3000|
| Michael |      Rose|        |40288|     M|  4000|
+---------+----------+--------+-----+------+------+

df.write.mode("append").parquet("/user/people.parquet")

val parqDF = spark.read.parquet("/user/people.parquet")

parqDF.show()
+---------+----------+--------+-----+------+------+
|firstname|middlename|lastname|  dob|gender|salary|
+---------+----------+--------+-----+------+------+
|  Robert |          |Williams|42114|     M|  4000|
|   Maria |      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|     |     F|    -1|
|  Robert |          |Williams|42114|     M|  4000|
|   Maria |      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|     |     F|    -1|
|   James |          |   Smith|36636|     M|  3000|
| Michael |      Rose|        |40288|     M|  4000|
|   James |          |   Smith|36636|     M|  3000|
| Michael |      Rose|        |40288|     M|  4000|
+---------+----------+--------+-----+------+------+
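
On the Java ParquetWriter question raised in the comments: a Parquet file keeps its metadata in a footer that is written when the writer is closed, so the usual pattern is to treat the dataset as a directory and write each new batch as an additional file in it. Spark's append mode works the same way: it adds new part files under /user/people.parquet. Below is a minimal sketch of that pattern using only the Java standard library. The PartFiles class, the nextPartFile method, and the part-<uuid>.parquet naming are hypothetical names chosen for illustration, and the sketch only picks the file name; it does not write any Parquet data itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Hypothetical helper: treats a directory as the dataset and hands out a fresh file name per batch. */
final class PartFiles {
    private PartFiles() {}

    /** Returns a fresh path of the form datasetDir/part-<random uuid>.parquet. */
    static Path nextPartFile(Path datasetDir) throws IOException {
        // Make sure the dataset directory exists before handing out a name inside it.
        Files.createDirectories(datasetDir);
        String name = "part-" + UUID.randomUUID() + ".parquet";
        return datasetDir.resolve(name);
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the dataset directory here.
        Path dataset = Files.createTempDirectory("people.parquet");
        Path first = nextPartFile(dataset);
        Path second = nextPartFile(dataset);
        // Assumption: in real code, each path would be given to its own ParquetWriter,
        // which writes one batch of records and is then closed.
        System.out.println(first.getFileName());
        System.out.println(second.getFileName());
    }
}
```

Each batch then gets its own writer and its own file, and a reader that is pointed at the directory (for example spark.read.parquet on the directory path) sees the rows from all of the files together.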
