Scala: Convert text file data into ORC format using data frame

Hi,
Can you please help and provide an example of how to convert text file data into ORC format and JSON format using a data frame?
Aug 1, 2019 in Apache Spark by Rishi


Converting a text file to ORC:

Using Spark, the text file is first loaded into a data frame and then stored in ORC format. Below is the Scala program.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object OrcConv {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setMaster("local").setAppName("OrcConv")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)

    // Read the raw text file and define a schema for its columns
    val file = sc.textFile("path")
    val schemaString = "name age"
    val schema = StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true)))

    // Split each comma-separated line into a Row, then build the data frame
    val rowRDD = file.map(_.split(",")).map(p => Row(p(0), p(1)))
    val fileSchemaRDD = sqlContext.createDataFrame(rowRDD, schema)

    // Write the data frame out in ORC format
    fileSchemaRDD.write.orc("path")

  }

}
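The line-parsing step in the program above (splitting each comma-separated line into fields that become the columns of a Row) can be illustrated in plain Scala, independent of Spark. The sample lines here are hypothetical:

```scala
// Each line of the input text file is comma-separated, e.g. "Alice,30".
// map(_.split(",")) turns a line into an array of fields; the fields
// then become the column values of each Row in the program above.
val lines = Seq("Alice,30", "Bob,25")
val parsed = lines.map(_.split(",")).map(p => (p(0), p(1)))
parsed.foreach { case (name, age) => println(s"name=$name, age=$age") }
```

Note that if a line contains fewer than two fields, `p(1)` would throw an `ArrayIndexOutOfBoundsException`, so real input may need validation first.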

For converting the text file to JSON, you only have to make one change in the above program: replace the statement

fileSchemaRDD.write.orc("path")

with

fileSchemaRDD.write.json("path")
answered Aug 1, 2019 by Esha
Can you please provide the mvn dependencies for this?
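In response to the comment above, here is a sketch of the Maven dependencies this program would need. It assumes a Spark 1.x build (the HiveContext API used above was removed in Spark 2); the exact version number and Scala suffix are assumptions and should be adjusted to match your cluster.

```xml
<!-- Assumed: Spark 1.6.x compiled against Scala 2.10; adjust to your environment -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.3</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.3</version>
  </dependency>
  <!-- HiveContext and the ORC data source live in spark-hive -->
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.6.3</version>
  </dependency>
</dependencies>
```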
