Most answered questions in Apache Spark

0 votes
1 answer

How to stop INFO messages displaying on Spark console?

Just do the following: Edit your conf/log4j.properties file ...

Aug 21, 2018 in Apache Spark by nitinrawat895
• 11,380 points
2,128 views
0 votes
1 answer

What makes Spark faster than MapReduce?

Let's first look at mapper side differences Map ...

Jul 27, 2018 in Apache Spark by Neha
• 6,300 points
1,231 views
0 votes
1 answer

What are the levels of parallelism in Spark Streaming?

In order to reduce the processing ...

Jul 27, 2018 in Apache Spark by zombie
• 3,790 points
4,478 views
0 votes
1 answer

How is an RDD in Spark different from Distributed Storage Management? Can anyone help me with this?

Some of the key differences between an RDD and ...

Jul 26, 2018 in Apache Spark by zombie
• 3,790 points
1,306 views
0 votes
1 answer

PySpark Config?

Mainly, we use SparkConf because we need ...

Jul 26, 2018 in Apache Spark by kurt_cobain
• 9,390 points
639 views
+1 vote
1 answer

Getting null values in Spark DataFrame while reading data from HBase

Can you share the screenshots for the ...

Jul 31, 2018 in Apache Spark by kurt_cobain
• 9,390 points
2,120 views
0 votes
1 answer

How to add third-party Java jars for use in PySpark?

You can add external jars as arguments ...

Jul 4, 2018 in Apache Spark by nitinrawat895
• 11,380 points

edited Nov 19, 2021 by Sarfaraz
8,361 views
0 votes
1 answer

Difference between SparkContext, JavaSparkContext, SQLContext, & SparkSession?

Yes, there is a difference between the ...

Jul 4, 2018 in Apache Spark by nitinrawat895
• 11,380 points
4,948 views
0 votes
1 answer

Difference between Spark ML & Spark MLlib package

org.apache.spark.mllib is the old Spark API while ...

Jul 5, 2018 in Apache Spark by Shubham
• 13,490 points
1,853 views
0 votes
1 answer

Spark streaming with Kafka dependency error

Your error is with the version of ...

Jul 5, 2018 in Apache Spark by Shubham
• 13,490 points
1,153 views
+1 vote
1 answer

map vs mapValues in Spark

There is a difference between the two: mapValues ...

Jun 29, 2018 in Apache Spark by nitinrawat895
• 11,380 points
15,406 views
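
As a rough illustration of the map vs mapValues entry above: on a pair RDD, map receives the whole (key, value) pair, whereas mapValues receives only the value and leaves the key untouched. The sketch below models that contract on an ordinary Python list. The helper names `map_pairs` and `map_values` are hypothetical stand-ins, not Spark API, and nothing here runs on a cluster.

```python
# Plain-Python model of the map vs mapValues contract on key-value pairs.
# map_pairs and map_values are hypothetical helpers, not part of Spark.

def map_pairs(pairs, f):
    """Apply f to each whole (key, value) pair; f may change the key."""
    return [f(pair) for pair in pairs]

def map_values(pairs, f):
    """Apply f to each value only; every key is carried over unchanged."""
    return [(key, f(value)) for key, value in pairs]

pairs = [("a", 1), ("b", 2), ("c", 3)]

# map sees the full pair, so it is free to rewrite the key as well.
upper_and_doubled = map_pairs(pairs, lambda kv: (kv[0].upper(), kv[1] * 2))

# map_values sees only the value, so the keys cannot change.
doubled = map_values(pairs, lambda v: v * 2)

print(upper_and_doubled)  # [('A', 2), ('B', 4), ('C', 6)]
print(doubled)            # [('a', 2), ('b', 4), ('c', 6)]
```

In Spark itself, the guarantee that mapValues cannot alter keys is what lets it keep an existing partitioner, which a general map cannot promise.
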
0 votes
1 answer

Which is better in terms of speed, Shark or Spark?

Spark is a framework for distributed data ...

Jun 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points
756 views
0 votes
1 answer

Spark Driver roles

A Spark driver (aka an application’s driver ...

Jun 21, 2018 in Apache Spark by Ashish
• 2,650 points
792 views
0 votes
1 answer

Spark standalone client mode

spark-submit \ --class org.apache.spark.examples.SparkPi \ --deploy-mode client \ --master spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT ...

Jun 20, 2018 in Apache Spark by Ashish
• 2,650 points
611 views
0 votes
1 answer

Ways to create RDD in Apache Spark

There are two popular ways using which ...

Jun 19, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,866 views
0 votes
1 answer

Minimizing Data Transfers in Spark

Minimizing data transfers and avoiding shuffling helps ...

Jun 19, 2018 in Apache Spark by Data_Nerd
• 2,390 points
1,170 views
0 votes
1 answer

How does an RDD persist data in Spark?

There are two methods to persist the ...

Jun 18, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,201 views
0 votes
1 answer

What do we mean by an RDD in Spark?

The full form of RDD is a ...

Jun 18, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,819 views
0 votes
1 answer

When running Spark on YARN, do I need to install Spark on all nodes of the YARN cluster?

No, it is not necessary to install ...

Jun 14, 2018 in Apache Spark by nitinrawat895
• 11,380 points
5,741 views
0 votes
1 answer

Is it mandatory to start Hadoop to run a Spark application?

No, it is not mandatory, but there ...

Jun 14, 2018 in Apache Spark by nitinrawat895
• 11,380 points
694 views
0 votes
1 answer

Persistence Levels in Spark

Spark has various persistence levels to store ...

Jun 8, 2018 in Apache Spark by kurt_cobain
• 9,390 points
5,544 views
0 votes
1 answer

What is Shark?

Shark is a tool, developed for people ...

Jun 8, 2018 in Apache Spark by kurt_cobain
• 9,390 points
760 views
+1 vote
1 answer

Kafka Feature

Here are some of the important features of ...

Jun 7, 2018 in Apache Spark by Data_Nerd
• 2,390 points
1,600 views
0 votes
1 answer

SQLInterpreter in Spark

SQL Interpreter & Optimizer handles the functional ...

Jun 7, 2018 in Apache Spark by kurt_cobain
• 9,390 points
495 views
0 votes
1 answer

How to find the number of elements present in an array in a Spark DataFrame column?

You can select the column and apply ...

Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points
21,816 views
0 votes
1 answer

Convert the given Spark RDD object to a Spark DataFrame.

You can create a DataFrame from the ...

Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points
844 views
0 votes
1 answer

Different Spark Ecosystem

Spark has various components: Spark SQL (Shark) - for ...

Jun 4, 2018 in Apache Spark by kurt_cobain
• 9,390 points
716 views
0 votes
1 answer

Parquet File

Parquet is a columnar format file supported ...

Jun 4, 2018 in Apache Spark by Data_Nerd
• 2,390 points
852 views
0 votes
1 answer

Is Hadoop mandatory for Spark?

No, not mandatory, but there is no ...

Jun 1, 2018 in Apache Spark by kurt_cobain
• 9,390 points
430 views
0 votes
1 answer

How does partitioning work in Spark?

By default a partition is created for ...

May 31, 2018 in Apache Spark by nitinrawat895
• 11,380 points
972 views
0 votes
1 answer

How to import the dependencies of Spark MLlib into an Eclipse project?

I would recommend you create & build ...

May 31, 2018 in Apache Spark by Shubham
• 13,490 points
1,815 views
0 votes
1 answer

Spark Machine Learning pipeline works fine in Spark 1.6, but it gives an error when executed on Spark 2.x?

You need to change the following: val pipeline ...

May 31, 2018 in Apache Spark by Shubham
• 13,490 points
797 views
0 votes
1 answer

What is Spark Piping?

Spark provides a pipe() method on RDDs. ...

May 31, 2018 in Apache Spark by kurt_cobain
• 9,390 points
2,017 views
0 votes
1 answer

Akka in Spark

Spark uses Akka basically for scheduling. All ...

May 31, 2018 in Apache Spark by Data_Nerd
• 2,390 points
1,921 views
0 votes
1 answer

How to convert an RDD object to a DataFrame in Spark

SQLContext has a number of createDataFrame methods ...

May 30, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,720 views
0 votes
1 answer

Is there any way to uncache RDD?

An RDD can be uncached using unpersist(). So, use ...

May 30, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,497 views
+1 vote
1 answer

How can I write a text file in HDFS, not from an RDD, in a Spark program?

Yes, you can go ahead and write ...

May 29, 2018 in Apache Spark by Shubham
• 13,490 points
7,969 views
0 votes
1 answer

What do the parameters in local[a,b,c] mean?

The SparkContext.createTaskScheduler method parses the master parameter. Local: 1 ...

May 29, 2018 in Apache Spark by Shubham
• 13,490 points
526 views
0 votes
1 answer

How to save and retrieve the Spark RDD from HDFS?

You can save the RDD using the saveAsObjectFile and saveAsTextFile methods. ...

May 29, 2018 in Apache Spark by Shubham
• 13,490 points
13,058 views
0 votes
1 answer

Is it possible to run Spark and Mesos along with Hadoop?

Yes, it is possible to run Spark ...

May 29, 2018 in Apache Spark by Data_Nerd
• 2,390 points
584 views
0 votes
1 answer

What is Sliding Window?

Sliding Window controls transmission of data packets ...

May 28, 2018 in Apache Spark by nitinrawat895
• 11,380 points
2,390 views
0 votes
1 answer

What is new in Spark 2.3?

Here are the changes in the new version ...

May 28, 2018 in Apache Spark by kurt_cobain
• 9,390 points
652 views
0 votes
1 answer

What is the difference between Apache Spark SQLContext vs HiveContext?

Spark 2.0+: Spark 2.0 provides native window functions ...

May 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points
4,394 views
0 votes
1 answer

How to find max value in pair RDD?

Use the Array.maxBy method: val a = Array(("a",1), ("b",2), ...

May 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points
7,681 views
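
The teaser above points at Scala's Array.maxBy for picking the pair with the largest value. A comparable sketch in plain Python, assuming the pairs have already been collected into a local list, uses the built-in `max` with a key function. The third pair in the sample data is made up for illustration and does not come from the original answer.

```python
# Find the (key, value) pair with the largest value in a local list of pairs.
# The pair ("c", 5) is illustrative sample data, not taken from the original answer.
pairs = [("a", 1), ("b", 2), ("c", 5)]

# key= tells max to compare pairs by their second element (the value).
largest = max(pairs, key=lambda kv: kv[1])

print(largest)  # ('c', 5)
```

This only applies once the data is local; for a large distributed RDD the comparison would need to run as a Spark action rather than on a collected list.
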
0 votes
1 answer

How can I compare the elements of the RDD using MapReduce?

You have to use the comparison operator ...

May 24, 2018 in Apache Spark by Shubham
• 13,490 points
3,174 views
0 votes
1 answer

Getting an error while connecting to ZooKeeper in Kafka - Spark Streaming integration

I guess you need to provide this kafka.bootstrap.servers ...

May 24, 2018 in Apache Spark by Shubham
• 13,490 points
2,607 views
+1 vote
1 answer

Can anyone explain what is RDD in Spark?

RDD is a fundamental data structure of ...

May 24, 2018 in Apache Spark by Shubham
• 13,490 points
2,434 views
0 votes
1 answer

How to set keys & access tokens for Twitter Spark streaming?

Either you have to create a Twitter4j.properties ...

May 24, 2018 in Apache Spark by Shubham
• 13,490 points
1,450 views
0 votes
1 answer

Is it better to have one large parquet file or lots of smaller parquet files?

Ideally, you would use snappy compression (default) ...

May 23, 2018 in Apache Spark by nitinrawat895
• 11,380 points
13,327 views
0 votes
1 answer

What's the difference between 'filter' and 'where' in Spark SQL?

Both 'filter' and 'where' in Spark SQL ...

May 23, 2018 in Apache Spark by nitinrawat895
• 11,380 points
33,851 views