Need to load 40 GB of data into Elasticsearch using Spark

+1 vote
I am working on a pseudo-distributed Spark cluster on a system with 2 cores (4 logical processors) and 30 GB RAM. The data is in 80 CSV files of 500 MB each. With the default configuration, a simple Spark job takes 2 hours. Please advise on what to consider for performance improvement.
Jul 16, 2019 in Apache Spark by Amit
• 130 points
202 views
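The first thing to look at is how much parallelism the job actually gets out of the machine. Below is a minimal PySpark sketch for the read side; the input path, the local[4] master, the memory figure, and the two-column schema are illustrative assumptions, not details from the thread:

# Run with something like:
#   spark-submit --master local[4] --driver-memory 16g load_csv.py
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = (SparkSession.builder
         .appName("csv-to-es")
         # The default of 200 shuffle partitions is far too many for 4 cores
         .config("spark.sql.shuffle.partitions", "16")
         .getOrCreate())

# Hypothetical two-column schema: supplying a schema explicitly avoids
# the extra full pass over the 40 GB that inferSchema=True would cost.
schema = StructType([
    StructField("id", StringType(), True),
    StructField("value", DoubleType(), True),
])

df = spark.read.csv("/data/input/*.csv", header=True, schema=schema)
df = df.repartition(16)  # a few waves of tasks per core, none too large

On a single box, running local[4] uses all 4 logical processors directly; a pseudo-distributed cluster on the same machine mostly adds scheduling overhead.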
Probably because your data is too large?

1 answer to this question.

–1 vote
Did you find any documents or examples for this issue? I have the same situation and have been trying to find something, but so far I haven't found anything.
answered Nov 5, 2019 by Begum
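For the write side, the usual route is the elasticsearch-hadoop connector. A minimal sketch continuing from the df above, assuming Elasticsearch runs on the same machine on port 9200; the index name csv_data is made up, and the batch settings are common starting points from the connector's documentation rather than values tested on this workload:

# Submit with the connector on the classpath, e.g.:
#   spark-submit --packages org.elasticsearch:elasticsearch-spark-20_2.11:7.3.0 load_csv.py
(df.write
   .format("org.elasticsearch.spark.sql")
   .option("es.nodes", "localhost")
   .option("es.port", "9200")
   # Bigger bulk requests mean fewer round trips; raise gradually and
   # watch the Elasticsearch heap and bulk-rejection counters.
   .option("es.batch.size.entries", "10000")
   .option("es.batch.size.bytes", "10mb")
   # Skip the per-bulk index refresh during the load; refresh once at the end.
   .option("es.batch.write.refresh", "false")
   .mode("append")
   .save("csv_data"))

Each partition writes its own bulk stream, so 16 partitions means 16 concurrent writers against a single local node; if Elasticsearch starts rejecting bulks, coalescing to fewer partitions before the write is the usual lever.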
