Scala: Convert text file data into ORC format using a data frame

+1 vote
Hi
Can you please help and provide an example of how to convert text file data into ORC format and JSON format using a data frame?
Aug 1, 2019 in Apache Spark by Rishi
3,642 views

1 answer to this question.

+1 vote

Converting a text file to ORC:

Using Spark, the text file is first loaded into a data frame and then written out in ORC format. Below is the Scala program.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object OrcConv {

  def main(args: Array[String]) {

    val conf = new SparkConf().setMaster("local").setAppName("OrcConv")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)

    // Load the comma-separated text file as an RDD of lines
    val file = sc.textFile("path")

    // Build the schema: two string columns, "name" and "age"
    val schemaString = "name age"
    val schema = StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

    // Split each line on commas and map it to a Row matching the schema
    val rowRDD = file.map(_.split(",")).map(p => Row(p(0), p(1)))
    val fileSchemaRDD = sqlContext.createDataFrame(rowRDD, schema)

    // Write the data frame out in ORC format
    fileSchemaRDD.write.orc("path")
  }
}

To convert the text file to JSON instead, you only need to change one line in the above program: replace

fileSchemaRDD.write.orc("path")

with

fileSchemaRDD.write.json("path")
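
If you are on Spark 2.3 or later, where SparkSession and built-in ORC support replace HiveContext, the same conversion can be sketched as below. This is a minimal sketch and not part of the original answer; the OrcConv2 object name, the paths, and the comma delimiter are placeholder assumptions.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object OrcConv2 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local")
      .appName("OrcConv2")
      .getOrCreate()

    // Same two-column string schema as the original program
    val schema = StructType(Seq(
      StructField("name", StringType, true),
      StructField("age", StringType, true)))

    // Read the comma-separated text file directly as a data frame
    val df = spark.read.schema(schema).option("delimiter", ",").csv("input-path")

    // Write it out in ORC and JSON formats
    df.write.orc("orc-output-path")
    df.write.json("json-output-path")
  }
}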

Hope this helps!!

If you need to learn more about Scala, it's recommended to join a Scala Certification course today.

Thank you!

answered Aug 1, 2019 by Esha
Can you please provide the Maven dependencies for this?
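
For the Spark 1.x code above, a likely minimal set of dependencies is spark-core, spark-sql, and spark-hive (HiveContext and the ORC data source live in spark-hive in Spark 1.x). The exact versions below are assumptions and should be matched to your cluster; they are shown as sbt coordinates, which map one-to-one to Maven groupId/artifactId/version entries.

// build.sbt sketch -- versions and Scala binary version are assumptions
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.3",
  "org.apache.spark" %% "spark-sql"  % "1.6.3",
  "org.apache.spark" %% "spark-hive" % "1.6.3"  // provides HiveContext and ORC support
)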
