Trending questions in Apache Spark

0 votes
1 answer

Spark event log location

Unless and until you have not changed ...READ MORE

Mar 6, 2019 in Apache Spark by Rohit
636 views
0 votes
1 answer

Changing port for Block Managers

By default, the port of which the ...READ MORE

Mar 10, 2019 in Apache Spark by Siri
478 views
0 votes
1 answer

Why is Spark map output compressed?

Spark thinks that it is a good ...READ MORE

Feb 24, 2019 in Apache Spark by Wasim
1,067 views
0 votes
1 answer

Companion objects in Scala

When a singleton object is named the ...READ MORE

Feb 24, 2019 in Apache Spark by Uma
1,002 views
0 votes
1 answer

where can i get spark-terasort.jar and not .scala file, to do spark terasort in windows.

Hi! I found 2 links on github where ...READ MORE

Feb 13, 2019 in Apache Spark by Omkar
• 69,220 points
1,320 views
0 votes
1 answer

Unresolved dependency issue on sbt package command

Check if you are able to access ...READ MORE

Jan 3, 2019 in Apache Spark by Omkar
• 69,220 points
2,903 views
0 votes
2 answers

How to use RDD filter with other function?

val x = sc.parallelize(1 to 10, 2)   // ...READ MORE

Aug 17, 2018 in Apache Spark by zombie
• 3,790 points
9,629 views
0 votes
1 answer

Changing Column position in spark dataframe

Yes, you can reorder the dataframe elements. You need ...READ MORE

Apr 19, 2018 in Apache Spark by Ashish
• 2,650 points
13,709 views
0 votes
1 answer

Error using double map.

You have forgotten to mention the case ...READ MORE

Feb 11, 2019 in Apache Spark by Omkar
• 69,220 points
702 views
+1 vote
1 answer

Spark interview

Preparing for an interview? We have something ...READ MORE

Feb 7, 2019 in Apache Spark by Edureka
• 2,960 points
787 views
0 votes
1 answer

Error while using Spark SQL filter API

You have to use "===" instead of ...READ MORE

Feb 4, 2019 in Apache Spark by Omkar
• 69,220 points
761 views
0 votes
1 answer

Query regarding a spark split logic

First, import the data in Spark and ...READ MORE

Feb 9, 2019 in Apache Spark by Omkar
• 69,220 points
511 views
0 votes
1 answer

Languages supported by Apache Spark?

Apache Spark supports the following four languages:  Scala, ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
7,102 views
–1 vote
1 answer

Not able to use sc in spark shell

Seems like master and worker are not ...READ MORE

Jan 3, 2019 in Apache Spark by Omkar
• 69,220 points
1,726 views
0 votes
1 answer

How to get ID of a map task in Spark?

you can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...READ MORE

Nov 20, 2018 in Apache Spark by Frankie
• 9,830 points
3,425 views
–1 vote
1 answer

Deciding number of spark context objects

How many spark context objects you should ...READ MORE

Jan 16, 2019 in Apache Spark by Omkar
• 69,220 points
745 views
0 votes
1 answer

Spark and Scale Auxiliary constructor doubt

println("Slayer") is an anonymous block and gets ...READ MORE

Jan 8, 2019 in Apache Spark by Omkar
• 69,220 points
717 views
0 votes
1 answer

How to add third party java jars for use in PySpark?

You can add external jars as arguments ...READ MORE

Jul 4, 2018 in Apache Spark by nitinrawat895
• 11,380 points

edited Nov 19, 2021 by Sarfaraz 8,673 views
0 votes
1 answer

Is there an API for implementing graphs in Spark?

GraphX is the Spark API for graphs and ...READ MORE

Jan 5, 2019 in Apache Spark by Frankie
• 9,830 points
704 views
0 votes
1 answer

How to open/stream .zip files through Spark?

You can try and check this below ...READ MORE

Nov 20, 2018 in Apache Spark by Frankie
• 9,830 points
2,458 views
0 votes
1 answer

Filter, Option or FlatMap in spark

If, for option 2, you mean have ...READ MORE

Nov 9, 2018 in Apache Spark by Frankie
• 9,830 points
2,703 views
+1 vote
2 answers

Apache Spark vs Apache Spark 2

Spark 2 doesn't differ much architecture-wise from ...READ MORE

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,390 points
9,114 views
+1 vote
1 answer

How can I write a text file in HDFS not from an RDD, in Spark program?

Yes, you can go ahead and write ...READ MORE

May 29, 2018 in Apache Spark by Shubham
• 13,490 points
8,413 views
0 votes
1 answer

Is 'sparkline' a method?

I suggest you to check 2 things That jquery.sparkline.js is actually ...READ MORE

Nov 9, 2018 in Apache Spark by Frankie
• 9,830 points
1,233 views
0 votes
1 answer

How can I minimize data transfers when working with Spark?

Minimizing data transfers and avoiding shuffling helps ...READ MORE

Sep 19, 2018 in Apache Spark by zombie
• 3,790 points
3,072 views
0 votes
1 answer

How to find max value in pair RDD?

Use Array.maxBy method: val a = Array(("a",1), ("b",2), ...READ MORE

May 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points
7,955 views
0 votes
1 answer

What are the levels of parallelism in spark streaming ?

> In order to reduce the processing ...READ MORE

Jul 27, 2018 in Apache Spark by zombie
• 3,790 points
4,848 views
0 votes
1 answer

When running Spark on Yarn, do I need to install Spark on all nodes of Yarn Cluster?

No, it is not necessary to install ...READ MORE

Jun 14, 2018 in Apache Spark by nitinrawat895
• 11,380 points
6,212 views
0 votes
1 answer

Difference between sparkContext, JavaSparkContext, SQLContext, & SparkSession?

Yes, there is a difference between the ...READ MORE

Jul 4, 2018 in Apache Spark by nitinrawat895
• 11,380 points
5,286 views
0 votes
1 answer

Is there any way to check the Spark version?

There are 2 ways to check the ...READ MORE

Apr 19, 2018 in Apache Spark by nitinrawat895
• 11,380 points
8,489 views
0 votes
1 answer

Internal work of Spark

Spark revolves around the concept of a ...READ MORE

Oct 11, 2018 in Apache Spark by nitinrawat895
• 11,380 points
907 views
+1 vote
2 answers

Hadoop 3 compatibility with older versions of Hive, Pig, Sqoop and Spark

Hadoop 3 is not widely used in ...READ MORE

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,390 points
5,864 views
0 votes
1 answer

Persistence Levels in Spark

Spark has various persistence levels to store ...READ MORE

Jun 8, 2018 in Apache Spark by kurt_cobain
• 9,390 points
5,989 views
0 votes
1 answer

Efficient way to read specific columns from parquet file in spark

As parquet is a column based storage ...READ MORE

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,390 points
7,804 views
0 votes
1 answer

What happens to RDD when one of the nodes goes down?

Whenever a node goes down, Spark knows ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,795 views
0 votes
1 answer

In what kind of use cases has Spark outperformed Hadoop in processing?

I can list some but there can ...READ MORE

Sep 19, 2018 in Apache Spark by zombie
• 3,790 points
1,093 views
0 votes
1 answer

How to stop INFO messages displaying on Spark console?

Just do the following: Edit your conf/log4j.properties file ...READ MORE

Aug 21, 2018 in Apache Spark by nitinrawat895
• 11,380 points
2,335 views
0 votes
1 answer

When not to use foreachPartition and mapPartition?

With mapPartion() or foreachPartition(), you can only ...READ MORE

Apr 30, 2018 in Apache Spark by Data_Nerd
• 2,390 points
7,102 views
0 votes
1 answer

Does Spark provide the storage layer too?

No, it doesn’t provide storage layer but ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,522 views
0 votes
2 answers

Which cluster type should I choose for Spark?

Spark is agnostic to the underlying cluster ...READ MORE

Aug 21, 2018 in Apache Spark by zombie
• 3,790 points
2,001 views
0 votes
1 answer

Functions of Spark SQL?

Spark SQL is capable of: Loading data from ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,480 views
0 votes
1 answer

Ways to create RDD in Apache Spark

There are two popular ways using which ...READ MORE

Jun 19, 2018 in Apache Spark by nitinrawat895
• 11,380 points
4,038 views
0 votes
1 answer

What do we mean by an RDD in Spark?

The full form of RDD is a ...READ MORE

Jun 18, 2018 in Apache Spark by nitinrawat895
• 11,380 points
4,026 views
+1 vote
1 answer

getting null values in spark dataframe while reading data from hbase

Can you share the screenshots for the ...READ MORE

Jul 31, 2018 in Apache Spark by kurt_cobain
• 9,390 points
2,296 views
0 votes
1 answer

What is the difference between Apache Spark SQLContext vs HiveContext?

Spark 2.0+ Spark 2.0 provides native window functions ...READ MORE

May 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points
4,572 views
+1 vote
3 answers

Which cluster type should I choose for Spark?

According to me, start with a standalone ...READ MORE

Jun 27, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,541 views
0 votes
1 answer

How is RDD in Spark different from Distributed Storage Management? Can anyone help me with this ?

Some of the key differences between an RDD and ...READ MORE

Jul 26, 2018 in Apache Spark by zombie
• 3,790 points
1,494 views
0 votes
1 answer

How to stop messages from being displayed on spark console?

In your log4j.properties file you need to ...READ MORE

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,390 points
5,508 views
0 votes
1 answer

What makes Spark faster than MapReduce?

Let's first look at mapper side differences Map ...READ MORE

Jul 27, 2018 in Apache Spark by Neha
• 6,300 points
1,459 views
0 votes
1 answer

How to convert rdd object to dataframe in spark

SqlContext has a number of createDataFrame methods ...READ MORE

May 30, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,914 views