Create dataframe for Avro file

I have an Avro file and want to perform some operations on it. Is it possible to work with Avro files using DataFrames?
Jul 22, 2019 in Apache Spark by Ritu

1 answer to this question.

Yes, you can work with Avro files using DataFrames. The easiest way to work with Avro data files in Spark applications is the DataFrame API. Once its package is imported, the spark-avro library adds avro methods to the DataFrame reader and writer obtained from SQLContext, so you can read and write Avro files directly:

Scala Example with Function

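import com.databricks.spark.avro._
import org.apache.spark.sql.SQLContext

// sc is an existing SparkContext (for example, the one provided by spark-shell)
val sqlContext = new SQLContext(sc)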

// The Avro records are converted to Spark types, filtered, and
// then written back out as Avro records
val df = sqlContext.read.avro("input_dir")
df.filter("age > 5").write.avro("output_dir")

You can also specify "com.databricks.spark.avro" in the format method:

Scala Example with Format

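import org.apache.spark.sql.SQLContext

// No spark-avro import is needed here; the format string selects the data source
// sc is an existing SparkContext (for example, the one provided by spark-shell)
val sqlContext = new SQLContext(sc)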
val df = sqlContext.read.format("com.databricks.spark.avro").load("input_dir")
df.filter("age > 5").write.format("com.databricks.spark.avro").save("output_dir")
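
Note that from Spark 2.4 onward, Avro support ships as a separate Apache Spark module (spark-avro) and the data source can be selected with the short name "avro". The following is a minimal sketch of the same read, filter, and write flow under that assumption. It is not part of the original answer: the application name is hypothetical, and the spark-avro module still has to be put on the classpath (for example with --packages org.apache.spark:spark-avro_2.12:<your Spark version>).

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.4 or later, started with the spark-avro module available.
// "AvroDataFrameExample" is a hypothetical application name.
val spark = SparkSession.builder()
  .appName("AvroDataFrameExample")
  .getOrCreate()

// Read the Avro files under input_dir into a DataFrame
val df = spark.read.format("avro").load("input_dir")

// Keep only rows with age > 5 and write them back out as Avro
df.filter("age > 5").write.format("avro").save("output_dir")
```

In spark-shell a SparkSession named spark already exists, so the builder call can be skipped there.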
answered Jul 22, 2019 by Rishi
