Spark: Saving a CSV file

It would be great if you could tell me what I am doing wrong in the code below. I just want to save the output in Ans3AppleStore.csv. I think the last part of the code is where I need to change something.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName("mod5sol")
  val sc: SparkContext = new SparkContext(conf)
  val sqlContext: SQLContext = new SQLContext(sc)
  // Read the CSV, treating the first line as column names
  val df: DataFrame = sqlContext.read.format("csv").option("header", "true").load("AppleStore.csv")
  df.registerTempTable("Apple5")
  // size_bytes / 1024 is kilobytes; dividing by 1024 again gives megabytes
  sqlContext.sql("select size_bytes Size, (size_bytes/1024) In_KB, ((size_bytes/1024)/1024) In_MB from Apple5")
    .write.format("csv").save("Ans3AppleStore.csv")
}
May 22 in Apache Spark by Jai

1 answer to this question.

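Spark writes one part file per partition into the output path, so save("Ans3AppleStore.csv") creates a folder with that name rather than a single file. If you need a single output file (still inside a folder), you can repartition to one partition. This is preferred if the upstream data is large, because the stages before the write keep their parallelism, but it requires a shuffle:

df
  .repartition(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("mydata.csv")

or coalesce, which avoids the shuffle but can reduce the upstream computation to a single task:

df
  .coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("mydata.csv")

On Spark 2.0 and later, the built-in "csv" format can be used in place of "com.databricks.spark.csv".

In both cases all data will be written to a single part file in the mydata.csv folder (mydata.csv/part-00000; newer Spark versions may add a unique suffix to the part file name).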
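
If a plain file named Ans3AppleStore.csv is needed rather than a folder, one option is to move the single part file after the write finishes. The following is only a minimal sketch under stated assumptions: the output folder is on the local filesystem (not HDFS), it contains exactly one part file, and the names SinglePartFile and movePartFile are hypothetical helpers that are not part of Spark.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object SinglePartFile {
  // Hypothetical helper, not a Spark API.
  // Looks for the single file whose name starts with "part-" in outputDir
  // and moves it to target. Marker files such as _SUCCESS are ignored.
  // Assumes a local filesystem path.
  def movePartFile(outputDir: Path, target: Path): Path = {
    // listFiles() returns null if outputDir is not a directory
    val entries = Option(outputDir.toFile.listFiles()).getOrElse(Array.empty[java.io.File])
    val parts = entries.filter(f => f.isFile && f.getName.startsWith("part-"))
    require(parts.length == 1,
      s"expected exactly one part file in $outputDir, found ${parts.length}")
    // Replace the target if it already exists
    Files.move(parts.head.toPath, target, StandardCopyOption.REPLACE_EXISTING)
  }
}
```

For example, after writing with coalesce(1) to a temporary folder (say Ans3AppleStore_tmp, a made-up name), calling SinglePartFile.movePartFile(Paths.get("Ans3AppleStore_tmp"), Paths.get("Ans3AppleStore.csv")) would leave a single CSV file at the target path. On HDFS the same idea would need the Hadoop filesystem API instead of java.nio.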

answered May 22 by Rishi
