Spark: Saving output as a CSV file

It would be great if you could tell me what I am doing wrong in the code below. I just want to save the output in Ans3AppleStore.csv. I think it is the last part of the code that needs to change.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName("mod5sol")
  val sc: SparkContext = new SparkContext(conf)
  val sqlContext: SQLContext = new SQLContext(sc)
  // Read the input CSV, treating the first line as the header
  val df: DataFrame = sqlContext.read.format("csv").option("header", "true").load("AppleStore.csv")
  df.registerTempTable("Apple5")
  // Run the query and write the result in CSV format
  var dfsize = sqlContext.sql("select size_bytes Size, (size_bytes/1024) In_MB, ((size_bytes/1024)/1024) In_GB from Apple5")
    .write.format("csv").save("Ans3AppleStore.csv")
}

May 22 in Apache Spark by Jai

1 answer to this question.

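Spark writes its output as a folder of part files, one per partition, so save("Ans3AppleStore.csv") creates a folder with that name rather than a single file. If you need a single output file (still inside a folder), you can repartition to one partition first (preferred if the upstream data is large, but it requires a shuffle):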

df
  .repartition(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("mydata.csv")

Or use coalesce, which merges the partitions without a full shuffle:

df
  .coalesce(1)
  .write.format("com.databricks.spark.csv")
  .option("header", "true")
  .save("mydata.csv")

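Either way, all data will be written to a single part file inside the mydata.csv folder, e.g. mydata.csv/part-00000 (newer Spark versions append a suffix to the part file name). On Spark 2.x and later, the built-in format("csv") used in the question works in place of "com.databricks.spark.csv".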
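
If you then want a plain file rather than a folder, one option is to move the single part file out of the output folder after the write finishes. The following is only a minimal sketch, assuming the output is on the local filesystem (not HDFS) and that the folder holds exactly one part file. The names SingleCsv and promotePartFile are made up for this illustration and are not part of Spark:

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object SingleCsv {
  // Hypothetical helper: moves the only part-* file in a Spark output
  // folder to `target`. Assumes a local filesystem and a write done with
  // repartition(1) or coalesce(1), so exactly one part file exists.
  def promotePartFile(outputDir: Path, target: Path): Path = {
    // The glob skips _SUCCESS and hidden .crc files, which do not start with "part-"
    val stream = Files.newDirectoryStream(outputDir, "part-*")
    val part =
      try {
        val it = stream.iterator()
        require(it.hasNext, s"no part file found in $outputDir")
        val first = it.next()
        require(!it.hasNext, s"more than one part file in $outputDir")
        first
      } finally stream.close()
    // Overwrite the target if it already exists
    Files.move(part, target, StandardCopyOption.REPLACE_EXISTING)
  }
}
```

For example, after saving to the folder mydata.csv, you could call SingleCsv.promotePartFile(Paths.get("mydata.csv"), Paths.get("mydata-single.csv")), with Paths imported from java.nio.file. On HDFS you would need the Hadoop FileSystem API instead.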

answered May 22 by Rishi
