Scala: Convert text file data into ORC format using data frame

0 votes
Hi
Can you please help and provide an example of how to convert text file data into ORC format and JSON format using a DataFrame?
Aug 1 in Apache Spark by Rishi
15 views

1 answer to this question.

0 votes

Converting a text file to ORC:

Using Spark, the text file is first read in, converted into a DataFrame, and then written out in ORC format. Below is the Scala program.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object OrcConv {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setMaster("local").setAppName("OrcConv")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)

    // Read the text file as an RDD of lines
    val file = sc.textFile("path")

    // Build a schema with two string columns: name and age
    val schemaString = "name age"
    val schema = StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

    // Split each comma-separated line into a Row
    val rowRDD = file.map(_.split(",")).map(p => Row(p(0), p(1)))

    // Create the DataFrame and write it out in ORC format
    val fileSchemaRDD = sqlContext.createDataFrame(rowRDD, schema)
    fileSchemaRDD.write.orc("path")
  }
}
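As an optional sanity check (not part of the original answer), you can read the ORC output back into a DataFrame and print a few rows; the path below is a placeholder for wherever you wrote the ORC files:

// Read the ORC files back and display a few rows to verify the conversion
val orcDF = sqlContext.read.orc("path")
orcDF.show(5)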

To convert the text file to JSON instead, you only need to change one line in the above program: replace the statement

fileSchemaRDD.write.orc("path")

with

fileSchemaRDD.write.json("path")
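If you are on Spark 2.x or later, the same conversion can be written with SparkSession instead of SparkContext/HiveContext. A minimal sketch, assuming a comma-separated input file with name and age columns; the object name and the paths are placeholders:

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object OrcConvSpark2 {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local").appName("OrcConvSpark2").getOrCreate()

    // Same two-column string schema as above
    val schema = StructType(Seq(StructField("name", StringType, true), StructField("age", StringType, true)))

    // Read the text file and split each comma-separated line into a Row
    val rowRDD = spark.sparkContext.textFile("path").map(_.split(",")).map(p => Row(p(0), p(1)))

    val df = spark.createDataFrame(rowRDD, schema)
    df.write.orc("orcOutputPath")   // or df.write.json("jsonOutputPath")
  }
}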
answered Aug 1 by Esha
