How to append data to a parquet file

+1 vote

I am trying to append some data to my parquet file and for that, I'm using the following code:

ParquetWriter<GenericRecord> parquetWriter = new ParquetWriter(path, writeSupport, CompressionCodecName.SNAPPY, BLOCK_SIZE, PAGE_SIZE);

final GenericRecord record = new GenericData.Record(avroSchema);
parquetWriter.write(record);

But this creates a new file, it does not append the file. What should I do to append the file?

Jan 11, 2019 in Big Data Hadoop by slayer
• 29,350 points
15,812 views

1 answer to this question.

+1 vote

Try using Spark API to append the file. Refer to the following code:

df.write.mode('append').parquet('parquet_data_file')

answered Jan 11, 2019 by Omkar
• 69,210 points
How to achieve this using java's ParquetWriter API?
It creates second parquet file, it does not append data to the existing one

Hi,

It will append the data. Are you saying that it creates a new partitions?

Follow the bellow example it will give you some idea.

 val data = Seq(("James ","","Smith","36636","M",3000),
     |       ("Michael ","Rose","","40288","M",4000),
     |       ("Robert ","","Williams","42114","M",4000),
     |       ("Maria ","Anne","Jones","39192","F",4000),
     |       ("Jen","Mary","Brown","","F",-1)
     |     );

val columns= Seq("firstname","middlename","lastname","dob","gender","salary");
import spark.sqlContext.implicits._

val df = data.toDF(columns:_*)

df.write.parquet("/user/people.parquet")

val parqDF = spark.read.parquet("/user/people.parquet")

parqDF.show()
+---------+----------+--------+-----+------+------+
|firstname|middlename|lastname|  dob|gender|salary|
+---------+----------+--------+-----+------+------+
|  Robert |          |Williams|42114|     M|  4000|
|   Maria |      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|     |     F|    -1|
|   James |          |   Smith|36636|     M|  3000|
| Michael |      Rose|        |40288|     M|  4000|
+---------+----------+--------+-----+------+------+

df.write.mode("append").parquet("/user/people.parquet")

val parqDF = spark.read.parquet("/user/people.parquet")

parqDF.show()
+---------+----------+--------+-----+------+------+
|firstname|middlename|lastname|  dob|gender|salary|
+---------+----------+--------+-----+------+------+
|  Robert |          |Williams|42114|     M|  4000|
|   Maria |      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|     |     F|    -1|
|  Robert |          |Williams|42114|     M|  4000|
|   Maria |      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|     |     F|    -1|
|   James |          |   Smith|36636|     M|  3000|
| Michael |      Rose|        |40288|     M|  4000|
|   James |          |   Smith|36636|     M|  3000|
| Michael |      Rose|        |40288|     M|  4000|
+---------+----------+--------+-----+------+------+

Related Questions In Big Data Hadoop

0 votes
1 answer

How to import data in sqoop as a Parquet file?

Sqoop allows you to import the file ...READ MORE

answered May 15, 2019 in Big Data Hadoop by Nanda
10,373 views
0 votes
1 answer

How can I append data to an existing file in HDFS?

You have to do some configurations as ...READ MORE

answered Jul 25, 2019 in Big Data Hadoop by ravikiran
• 4,620 points
8,076 views
0 votes
1 answer
0 votes
7 answers

How to run a jar file in hadoop?

I used this command to run my ...READ MORE

answered Dec 10, 2018 in Big Data Hadoop by Dasinto
25,649 views
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
10,619 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
2,215 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
104,951 views
–1 vote
1 answer

Hadoop dfs -ls command?

In your case there is no difference ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by kurt_cobain
• 9,390 points
4,296 views
0 votes
1 answer

How to create a parquet table in hive and store data in it from a hive table?

Please use the code attached below for ...READ MORE

answered Jan 28, 2019 in Big Data Hadoop by Omkar
• 69,210 points
18,087 views
0 votes
1 answer

Hadoop Hive Hbase: How to insert data into Hbase using Hive (JSON file)?

You can use the get_json_object function to parse the ...READ MORE

answered Nov 15, 2018 in Big Data Hadoop by Omkar
• 69,210 points
2,648 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP