Need to load 40 GB of data into Elasticsearch using Spark

I am working on a pseudo-distributed Spark cluster on a system with 2 cores, 4 logical processors, and 30 GB of RAM. The data is in 80 CSV files of 500 MB each. With the default configuration, a simple Spark job takes 2 hours. Please advise on what to consider to improve performance.
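
For reference, the figures in the question can be turned into an average ingest rate, which gives a baseline to compare against after any tuning. The sketch below is only a back-of-the-envelope calculation plus a dictionary of elasticsearch-hadoop write settings that are commonly adjusted for bulk loads. The host, port, and batch values are illustrative assumptions, not settings taken from this thread, and the 2-hour runtime is treated as covering the whole 40 GB.

```python
# Back-of-the-envelope numbers for the job described in the question.
FILES = 80
FILE_SIZE_MB = 500
RUNTIME_HOURS = 2
LOGICAL_PROCESSORS = 4

total_mb = FILES * FILE_SIZE_MB                      # 40,000 MB, i.e. about 40 GB
throughput_mb_s = total_mb / (RUNTIME_HOURS * 3600)  # average ingest rate
per_processor_mb_s = throughput_mb_s / LOGICAL_PROCESSORS

# elasticsearch-hadoop write settings that are commonly tuned for bulk loads.
# All values here are assumptions for illustration, not recommendations.
es_write_options = {
    "es.nodes": "localhost",            # assumed: Elasticsearch runs on the same machine
    "es.port": "9200",                  # assumed: default HTTP port
    "es.batch.size.bytes": "5mb",       # connector default is 1mb
    "es.batch.size.entries": "5000",    # connector default is 1000
    "es.batch.write.refresh": "false",  # do not force an index refresh when a task finishes writing
}

print(f"total: {total_mb} MB")
print(f"average: {throughput_mb_s:.2f} MB/s")
print(f"per logical processor: {per_processor_mb_s:.2f} MB/s")
```

With PySpark and the elasticsearch-hadoop connector on the classpath, a dictionary like this could be passed to a DataFrame writer, for example `df.write.format("org.elasticsearch.spark.sql").options(**es_write_options).mode("append").save("my_index")`, where `df` and `my_index` are hypothetical names. Whether larger batches help depends on the Elasticsearch node, so the values would need to be measured rather than assumed.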

Jul 17, 2019 in Apache Spark by Amit
• 130 points • 1,803 views

Probably because your data is too large?

commented Dec 11, 2019 by Hashil

1 answer to this question.

Did you find any documentation or examples for this issue? I am in the same situation and have been trying to find something on it, but I haven't found anything.

answered Nov 5, 2019 by Begum

