Create dataframe for Avro file

0 votes
I have an Avro file. I want to do some operations on it. I was wondering if it is possible to work with Avro files using dataframe?
Jul 22, 2019 in Apache Spark by Ritu
1,543 views

1 answer to this question.

0 votes

Yes, we can work with Avro files using dataframe. The easiest way to work with Avro data files in Spark applications is by using the DataFrame API. The spark-avro library includes Avro methods in SQLContext for reading and writing Avro files:

Scala Example with Function

import com.databricks.spark.avro._

val sqlContext = new SQLContext(sc)

// The Avro records are converted to Spark types, filtered, and
// then written back out as Avro records
val df = sqlContext.read.avro("input_dir")
df.filter("age > 5").write.avro("output_dir")

You can also specify "com.databricks.spark.avro" in the format method:

Scala Example with Format

import com.databricks.spark.avro._

val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("com.databricks.spark.avro").load("input_dir")
df.filter("age > 5").write.format("com.databricks.spark.avro").save("output_dir")
answered Jul 22, 2019 by Rishi

Related Questions In Apache Spark

+1 vote
1 answer

How to write Spark DataFrame to Avro Data File?

Hi@akhtar, Since Avro library is external to Spark, ...READ MORE

answered Nov 4, 2020 in Apache Spark by MD
• 95,320 points
1,198 views
0 votes
1 answer

How to create RDD from an external file source in scala?

Hi, To create an RDD from external file ...READ MORE

answered Jul 4, 2019 in Apache Spark by Gitika
• 65,970 points
791 views
0 votes
0 answers

How to create RDD as string file?

Can anyone suggest how to create RDD ...READ MORE

Jul 5, 2019 in Apache Spark by anand
312 views
+1 vote
1 answer

How to convert JSON file to AVRO file and vise versa

Try including the package while starting the ...READ MORE

answered Aug 26, 2019 in Apache Spark by Karan
1,485 views
+1 vote
2 answers
+1 vote
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
8,034 views
0 votes
1 answer

hadoop.mapred vs hadoop.mapreduce?

org.apache.hadoop.mapred is the Old API  org.apache.hadoop.mapreduce is the ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 11,380 points
1,366 views
+2 votes
11 answers

hadoop fs -put command?

Hi, You can create one directory in HDFS ...READ MORE

answered Mar 16, 2018 in Big Data Hadoop by nitinrawat895
• 11,380 points
66,825 views
0 votes
1 answer

How to create dataframe for the comma delimited file?

 Refer to the below command used: val df ...READ MORE

answered Jul 5, 2019 in Apache Spark by karan
1,903 views
+2 votes
14 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

answered Apr 5, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar 74,805 views