Create dataframe for Avro file

0 votes
I have an Avro file. I want to do some operations on it. I was wondering if it is possible to work with Avro files using dataframe?
Jul 22, 2019 in Apache Spark by Ritu
2,723 views

1 answer to this question.

0 votes

Yes, we can work with Avro files using dataframe. The easiest way to work with Avro data files in Spark applications is by using the DataFrame API. The spark-avro library includes Avro methods in SQLContext for reading and writing Avro files:

Scala Example with Function

import com.databricks.spark.avro._

val sqlContext = new SQLContext(sc)

// The Avro records are converted to Spark types, filtered, and
// then written back out as Avro records
val df = sqlContext.read.avro("input_dir")
df.filter("age > 5").write.avro("output_dir")

You can also specify "com.databricks.spark.avro" in the format method:

Scala Example with Format

import com.databricks.spark.avro._

val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("com.databricks.spark.avro").load("input_dir")
df.filter("age > 5").write.format("com.databricks.spark.avro").save("output_dir")
answered Jul 22, 2019 by Rishi

Related Questions In Apache Spark

+1 vote
1 answer

How to write Spark DataFrame to Avro Data File?

Hi@akhtar, Since Avro library is external to Spark, ...READ MORE

answered Nov 4, 2020 in Apache Spark by MD
• 95,460 points
3,292 views
0 votes
1 answer

How to create RDD from an external file source in scala?

Hi, To create an RDD from external file ...READ MORE

answered Jul 4, 2019 in Apache Spark by Gitika
• 65,770 points
1,776 views
0 votes
0 answers

How to create RDD as string file?

Can anyone suggest how to create RDD ...READ MORE

Jul 5, 2019 in Apache Spark by anand
916 views
+1 vote
1 answer

How to convert JSON file to AVRO file and vise versa

Try including the package while starting the ...READ MORE

answered Aug 26, 2019 in Apache Spark by Karan
3,590 views
+1 vote
2 answers
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
11,057 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
2,559 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
108,980 views
0 votes
1 answer

How to create dataframe for the comma delimited file?

 Refer to the below command used: val df ...READ MORE

answered Jul 5, 2019 in Apache Spark by karan
3,470 views
+2 votes
14 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

answered Apr 5, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar 88,819 views
webinar REGISTER FOR FREE WEBINAR X
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP