Create dataframe for Avro file

I have an Avro file and want to perform some operations on it. Is it possible to work with Avro files using a DataFrame?
Jul 22 in Apache Spark by Ritu

1 answer to this question.

Yes, you can work with Avro files using a DataFrame. The DataFrame API is the easiest way to work with Avro data files in Spark applications. The spark-avro library adds avro methods to the reader and writer obtained through SQLContext, so you can read and write Avro files directly:

Scala Example with Function

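import org.apache.spark.sql.SQLContext
// Brings the implicit avro(...) methods into scope on read and write
import com.databricks.spark.avro._

// sc is an existing SparkContext (for example, the one provided by spark-shell)
val sqlContext = new SQLContext(sc)

// The Avro records are converted to Spark types, filtered, and
// then written back out as Avro records
val df = sqlContext.read.avro("input_dir")
df.filter("age > 5").write.avro("output_dir")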

You can also specify "com.databricks.spark.avro" in the format method:

Scala Example with Format

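import org.apache.spark.sql.SQLContext

// sc is an existing SparkContext (for example, the one provided by spark-shell)
val sqlContext = new SQLContext(sc)

// The data source is selected by name here, so the implicit avro(...) methods
// are not used; the spark-avro library must still be on the classpath
val df = sqlContext.read.format("com.databricks.spark.avro").load("input_dir")
df.filter("age > 5").write.format("com.databricks.spark.avro").save("output_dir")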
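
On newer Spark versions (2.4 and later), Avro support ships as the org.apache.spark spark-avro module, and the short format name "avro" can be used with a SparkSession instead of SQLContext. The sketch below assumes that module is on the classpath; the object name, application name, and the Scala suffix in the package coordinate are placeholders, and "input_dir" and "output_dir" are the same example paths used above.

```scala
import org.apache.spark.sql.SparkSession

object AvroDataFrameSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: Spark 2.4+ with the spark-avro module available, e.g. started with
    //   spark-submit --packages org.apache.spark:spark-avro_2.12:<your Spark version> ...
    // (the _2.12 suffix must match the Scala version of your Spark build)
    val spark = SparkSession.builder()
      .appName("AvroDataFrameSketch") // hypothetical application name
      .getOrCreate()

    // Read the Avro files into a DataFrame using the short format name
    val df = spark.read.format("avro").load("input_dir")

    // Inspect how the Avro schema was mapped to Spark SQL types
    df.printSchema()

    // Same filter as in the examples above, written back out as Avro
    df.filter("age > 5").write.format("avro").save("output_dir")

    spark.stop()
  }
}
```

Once the data is loaded, df is an ordinary DataFrame, so any other DataFrame operation (select, groupBy, join, and so on) can be applied before writing the result.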
answered Jul 22 by Rishi
