Spark error: stack overflow when unioning a lot of RDDs

0 votes

I get a stack overflow error when unioning a lot of RDDs. When I use "++" to combine many RDDs, the job fails with a StackOverflowError.

This is the way I generated the combined RDD:

val collection = (for (
  path <- files
) yield sc.textFile(path)).reduce(_ union _)

Can anyone suggest how to resolve this?

Jul 31 in Apache Spark by Sunny
21 views

1 answer to this question.

0 votes

Hey,

Use SparkContext.union(...) instead to union many RDDs at once.

You don't want to union them one at a time like that: each call to RDD.union() adds a new step to the lineage (an extra set of stack frames on any computation), whereas SparkContext.union() combines all the RDDs in a single step. Keeping the lineage shallow is what avoids the stack overflow.

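As a minimal sketch (assuming files is a Seq of input paths and sc is your SparkContext, as in the question):

// Build all the RDDs first, then union them in a single step,
// so the lineage stays shallow instead of growing by one level per RDD.
val rdds = files.map(path => sc.textFile(path))
val collection = sc.union(rdds)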

answered Jul 31 by Gitika
• 25,300 points
