Why do we use sc.parallelize?


Could you please clarify: if an RDD is already distributed over the nodes of a cluster and will be acted upon in parallel, what is the use of parallelize? Why do we use sc.parallelize?

Jul 11 in Apache Spark by Sumit

Spark revolves around the concept of a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel.

Now, an RDD can be created in two ways:

First, by referencing an external dataset present in HDFS or on the local file system, e.g.:

sc.textFile("/user/edureka_425640/patient_records.txt")
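To make this concrete, a minimal sketch in the spark-shell (the path is the one from the example above; the variable names are purely illustrative):

val records = sc.textFile("/user/edureka_425640/patient_records.txt")
val lineCount = records.count()   // the action runs in parallel across the file's partitions

Spark partitions the file automatically, so transformations and actions on this RDD are distributed across the cluster.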

Second, by parallelizing an existing collection using sc.parallelize. The sc.parallelize API is used to load user-created data that does not necessarily come from a file or directory:

val data = Array(1, 2, 3, 4, 5)       // a local collection in the driver program
val distData = sc.parallelize(data)   // distributed across the cluster as an RDD

So, when we use sc.parallelize, we are using it purely for RDD creation: it turns a local collection that exists only in the driver into a distributed dataset that Spark can then operate on in parallel.
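As a minimal sketch in the spark-shell (assuming the standard sc SparkContext; the second argument and the variable names are just illustrative), the RDD returned by sc.parallelize behaves like any other RDD and can optionally be given a number of partitions:

val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data, 2)   // optional second argument: number of partitions
val sum = distData.reduce(_ + _)         // action runs in parallel across partitions; sum = 15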

answered Jul 11 by Suman
