Trending questions in Apache Spark

0 votes
1 answer

Spark shuffle service port number

The default port that shuffle service runs ...READ MORE

Mar 1, 2019 in Apache Spark by Omkar
• 69,210 points
632 views
0 votes
1 answer

Why is Spark map output compressed?

Spark thinks that it is a good ...READ MORE

Feb 24, 2019 in Apache Spark by Wasim
873 views
0 votes
1 answer

Invalid syntax in spark

There's a problem with your syntax. There ...READ MORE

Jan 31, 2019 in Apache Spark by Omkar
• 69,210 points
1,837 views
0 votes
1 answer

Where can I get spark-terasort.jar (not the .scala file) to run Spark TeraSort on Windows?

Hi! I found 2 links on github where ...READ MORE

Feb 13, 2019 in Apache Spark by Omkar
• 69,210 points
1,147 views
0 votes
1 answer

Companion objects in Scala

When a singleton object is named the ...READ MORE

Feb 24, 2019 in Apache Spark by Uma
624 views
0 votes
1 answer

Unresolved dependency issue on sbt package command

Check if you are able to access ...READ MORE

Jan 3, 2019 in Apache Spark by Omkar
• 69,210 points
2,438 views
0 votes
2 answers

How to use RDD filter with other function?

val x = sc.parallelize(1 to 10, 2)   // ...READ MORE

Aug 17, 2018 in Apache Spark by zombie
• 3,790 points
9,234 views
+1 vote
1 answer

Spark interview

Preparing for an interview? We have something ...READ MORE

Feb 7, 2019 in Apache Spark by Edureka
• 2,960 points
605 views
0 votes
1 answer

Error using double map.

You have forgotten to mention the case ...READ MORE

Feb 11, 2019 in Apache Spark by Omkar
• 69,210 points
434 views
0 votes
1 answer

Changing Column position in spark dataframe

Yes, you can reorder the dataframe elements. You need ...READ MORE

Apr 19, 2018 in Apache Spark by Ashish
• 2,650 points
13,272 views
0 votes
1 answer

Query regarding a spark split logic

First, import the data in Spark and ...READ MORE

Feb 9, 2019 in Apache Spark by Omkar
• 69,210 points
382 views
0 votes
1 answer

Error while using Spark SQL filter API

You have to use "===" instead of ...READ MORE

Feb 4, 2019 in Apache Spark by Omkar
• 69,210 points
559 views
–1 vote
1 answer

Not able to use sc in spark shell

Seems like master and worker are not ...READ MORE

Jan 3, 2019 in Apache Spark by Omkar
• 69,210 points
1,407 views
0 votes
1 answer

Languages supported by Apache Spark?

Apache Spark supports the following four languages:  Scala, ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
6,527 views
0 votes
1 answer

How to get ID of a map task in Spark?

You can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...READ MORE

Nov 20, 2018 in Apache Spark by Frankie
• 9,830 points
3,087 views
–1 vote
1 answer

Deciding number of spark context objects

How many spark context objects you should ...READ MORE

Jan 16, 2019 in Apache Spark by Omkar
• 69,210 points
489 views
0 votes
1 answer

Spark and Scala Auxiliary constructor doubt

println("Slayer") is an anonymous block and gets ...READ MORE

Jan 8, 2019 in Apache Spark by Omkar
• 69,210 points
530 views
0 votes
1 answer

Is there an API for implementing graphs in Spark?

GraphX is the Spark API for graphs and ...READ MORE

Jan 5, 2019 in Apache Spark by Frankie
• 9,830 points
499 views
0 votes
1 answer

How to add third party java jars for use in PySpark?

You can add external jars as arguments ...READ MORE

Jul 4, 2018 in Apache Spark by nitinrawat895
• 11,380 points

edited Nov 19, 2021 by Sarfaraz
8,354 views
0 votes
1 answer

How to open/stream .zip files through Spark?

You can try and check this below ...READ MORE

Nov 20, 2018 in Apache Spark by Frankie
• 9,830 points
2,254 views
0 votes
1 answer

Filter, Option or FlatMap in spark

If, for option 2, you mean have ...READ MORE

Nov 9, 2018 in Apache Spark by Frankie
• 9,830 points
2,472 views
+1 vote
2 answers

Apache Spark vs Apache Spark 2

Spark 2 doesn't differ much architecture-wise from ...READ MORE

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,390 points
8,599 views
0 votes
1 answer

Is 'sparkline' a method?

I suggest you check two things: that jquery.sparkline.js is actually ...READ MORE

Nov 9, 2018 in Apache Spark by Frankie
• 9,830 points
1,011 views
+1 vote
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

May 29, 2018 in Apache Spark by Shubham
• 13,490 points
7,959 views
0 votes
1 answer

How can I minimize data transfers when working with Spark?

Minimizing data transfers and avoiding shuffling helps ...READ MORE

Sep 19, 2018 in Apache Spark by zombie
• 3,790 points
2,653 views
0 votes
1 answer

How to find max value in pair RDD?

Use Array.maxBy method: val a = Array(("a",1), ("b",2), ...READ MORE

May 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points
7,672 views
0 votes
1 answer

What are the levels of parallelism in Spark Streaming?

In order to reduce the processing ...READ MORE

Jul 27, 2018 in Apache Spark by zombie
• 3,790 points
4,472 views
0 votes
1 answer

Internal work of Spark

Spark revolves around the concept of a ...READ MORE

Oct 11, 2018 in Apache Spark by nitinrawat895
• 11,380 points
756 views
0 votes
1 answer

Difference between sparkContext, JavaSparkContext, SQLContext, & SparkSession?

Yes, there is a difference between the ...READ MORE

Jul 4, 2018 in Apache Spark by nitinrawat895
• 11,380 points
4,941 views
0 votes
1 answer

When running Spark on Yarn, do I need to install Spark on all nodes of Yarn Cluster?

No, it is not necessary to install ...READ MORE

Jun 14, 2018 in Apache Spark by nitinrawat895
• 11,380 points
5,726 views
0 votes
1 answer

Is there any way to check the Spark version?

There are 2 ways to check the ...READ MORE

Apr 19, 2018 in Apache Spark by nitinrawat895
• 11,380 points
8,051 views
+1 vote
2 answers

Hadoop 3 compatibility with older versions of Hive, Pig, Sqoop and Spark

Hadoop 3 is not widely used in ...READ MORE

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,390 points
5,480 views
0 votes
1 answer

Persistence Levels in Spark

Spark has various persistence levels to store ...READ MORE

Jun 8, 2018 in Apache Spark by kurt_cobain
• 9,390 points
5,533 views
0 votes
1 answer

In what kind of use cases has Spark outperformed Hadoop in processing?

I can list some but there can ...READ MORE

Sep 19, 2018 in Apache Spark by zombie
• 3,790 points
906 views
0 votes
1 answer

What happens to RDD when one of the nodes goes down?

Whenever a node goes down, Spark knows ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,600 views
0 votes
1 answer

How to stop INFO messages displaying on Spark console?

Just do the following: Edit your conf/log4j.properties file ...READ MORE

Aug 21, 2018 in Apache Spark by nitinrawat895
• 11,380 points
2,123 views
0 votes
1 answer

Efficient way to read specific columns from parquet file in spark

As parquet is a column based storage ...READ MORE

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,390 points
7,334 views
0 votes
1 answer

Does Spark provide the storage layer too?

No, it doesn’t provide a storage layer but ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,351 views
0 votes
1 answer

When not to use foreachPartition and mapPartition?

With mapPartitions() or foreachPartition(), you can only ...READ MORE

Apr 30, 2018 in Apache Spark by Data_Nerd
• 2,390 points
6,690 views
0 votes
1 answer

Functions of Spark SQL?

Spark SQL is capable of: Loading data from ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,245 views
0 votes
2 answers

Which cluster type should I choose for Spark?

Spark is agnostic to the underlying cluster ...READ MORE

Aug 21, 2018 in Apache Spark by zombie
• 3,790 points
1,691 views
0 votes
1 answer

Ways to create RDD in Apache Spark

There are two popular ways using which ...READ MORE

Jun 19, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,858 views
0 votes
1 answer

What do we mean by an RDD in Spark?

The full form of RDD is a ...READ MORE

Jun 18, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,808 views
+1 vote
1 answer

Getting null values in Spark DataFrame while reading data from HBase

Can you share the screenshots for the ...READ MORE

Jul 31, 2018 in Apache Spark by kurt_cobain
• 9,390 points
2,113 views
0 votes
1 answer

What is the difference between Apache Spark SQLContext vs HiveContext?

Spark 2.0+ Spark 2.0 provides native window functions ...READ MORE

May 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points
4,389 views
0 votes
1 answer

How is RDD in Spark different from Distributed Storage Management? Can anyone help me with this?

Some of the key differences between an RDD and ...READ MORE

Jul 26, 2018 in Apache Spark by zombie
• 3,790 points
1,301 views
+1 vote
3 answers

Which cluster type should I choose for Spark?

In my opinion, start with a standalone ...READ MORE

Jun 27, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,237 views
0 votes
1 answer

How to convert an RDD object to a DataFrame in Spark

SQLContext has a number of createDataFrame methods ...READ MORE

May 30, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,716 views
0 votes
1 answer

What makes Spark faster than MapReduce?

Let's first look at mapper side differences Map ...READ MORE

Jul 27, 2018 in Apache Spark by Neha
• 6,300 points
1,221 views
0 votes
1 answer

How to stop messages from being displayed on spark console?

In your log4j.properties file you need to ...READ MORE

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,390 points
5,061 views