How to append data to a parquet file

+1 vote

I am trying to append some data to my Parquet file, and for that I'm using the following code:

ParquetWriter<GenericRecord> parquetWriter = new ParquetWriter<>(path, writeSupport, CompressionCodecName.SNAPPY, BLOCK_SIZE, PAGE_SIZE);

final GenericRecord record = new GenericData.Record(avroSchema);
parquetWriter.write(record);

But this creates a new file; it does not append to the existing one. What should I do to append data to the file?

Jan 11, 2019 in Big Data Hadoop by slayer
• 29,300 points
5,984 views

1 answer to this question.

+1 vote

Try using the Spark API to append the data. Refer to the following code:

df.write.mode('append').parquet('parquet_data_file')

answered Jan 11, 2019 by Omkar
• 69,110 points
How can I achieve this using Java's ParquetWriter API?
It creates a second Parquet file; it does not append data to the existing one.

Hi,

It will append the data. Are you saying that it creates a new partition?

Follow the example below; it should give you some idea.

val data = Seq(("James ","","Smith","36636","M",3000),
  ("Michael ","Rose","","40288","M",4000),
  ("Robert ","","Williams","42114","M",4000),
  ("Maria ","Anne","Jones","39192","F",4000),
  ("Jen","Mary","Brown","","F",-1)
)

val columns = Seq("firstname","middlename","lastname","dob","gender","salary")
import spark.sqlContext.implicits._

val df = data.toDF(columns:_*)

df.write.parquet("/user/people.parquet")

val parqDF = spark.read.parquet("/user/people.parquet")

parqDF.show()
+---------+----------+--------+-----+------+------+
|firstname|middlename|lastname|  dob|gender|salary|
+---------+----------+--------+-----+------+------+
|  Robert |          |Williams|42114|     M|  4000|
|   Maria |      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|     |     F|    -1|
|   James |          |   Smith|36636|     M|  3000|
| Michael |      Rose|        |40288|     M|  4000|
+---------+----------+--------+-----+------+------+

df.write.mode("append").parquet("/user/people.parquet")

val parqDF = spark.read.parquet("/user/people.parquet")

parqDF.show()
+---------+----------+--------+-----+------+------+
|firstname|middlename|lastname|  dob|gender|salary|
+---------+----------+--------+-----+------+------+
|  Robert |          |Williams|42114|     M|  4000|
|   Maria |      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|     |     F|    -1|
|  Robert |          |Williams|42114|     M|  4000|
|   Maria |      Anne|   Jones|39192|     F|  4000|
|      Jen|      Mary|   Brown|     |     F|    -1|
|   James |          |   Smith|36636|     M|  3000|
| Michael |      Rose|        |40288|     M|  4000|
|   James |          |   Smith|36636|     M|  3000|
| Michael |      Rose|        |40288|     M|  4000|
+---------+----------+--------+-----+------+------+
