Scala: Convert text file data into ORC format using data frame

+1 vote
Hi,
Can you please help me and provide an example of how to convert text file data into ORC format and JSON format using a data frame?
Aug 1, 2019 in Apache Spark by Rishi
1,228 views

1 answer to this question.

+1 vote

Converting a text file to ORC:

Using Spark, the text file is first read and converted into a data frame, which is then written out in ORC format. Below is the Scala program.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.types.{StructType, StructField, StringType}

object OrcConv {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setMaster("local").setAppName("OrcConv")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)

    // Read the input text file; each line is expected to look like "name,age"
    val file = sc.textFile("path")

    // Build the schema: two nullable string columns, "name" and "age"
    val schemaString = "name age"
    val schema = StructType(schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, true)))

    // Split each line on commas and wrap the fields in a Row
    val rowRDD = file.map(_.split(",")).map(p => Row(p(0), p(1)))

    // Create the data frame and write it out in ORC format
    val fileSchemaRDD = sqlContext.createDataFrame(rowRDD, schema)
    fileSchemaRDD.write.orc("path")

  }

}
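As a quick sanity check, the split-and-wrap step from the program above can be previewed in plain Scala, without a Spark cluster. The sample lines below are made up for illustration; they stand in for the contents of the text file at "path":

```scala
object RowPreview {
  def main(args: Array[String]): Unit = {
    // Stand-in for sc.textFile("path"): two hypothetical "name,age" lines
    val lines = Seq("alice,30", "bob,25")

    // Same transformation as in the Spark program: split each line on
    // commas, then keep the first two fields
    val rows = lines.map(_.split(",")).map(p => (p(0), p(1)))

    rows.foreach { case (name, age) => println(s"name=$name, age=$age") }
  }
}
```

This prints one "name=..., age=..." line per input line, confirming the split logic before you run it on real data.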

To convert the text file to JSON instead, you only need to change one line in the above program: replace the statement

fileSchemaRDD.write.orc("path")

with

fileSchemaRDD.write.json("path")
answered Aug 1, 2019 by Esha
can you please provide mvn dependencies for this
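In case it helps: the answer above uses the Spark 1.x HiveContext API, so a Maven build would need spark-core, spark-sql, and spark-hive. The coordinates below are a sketch, not a definitive list; the version (1.6.3) and Scala suffix (_2.11) are assumptions and should be matched to your own Spark and Scala versions.

```
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>1.6.3</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>1.6.3</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_2.11</artifactId>
  <version>1.6.3</version>
</dependency>
```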
