Latest questions in Apache Spark

0 votes
1 answer

Does Spark provide the storage layer too?

No, it doesn’t provide storage layer but ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,354 views
0 votes
1 answer

Functions of Spark SQL?

Spark SQL is capable of: Loading data from ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,249 views
0 votes
1 answer

Languages supported by Apache Spark?

Apache Spark supports the following four languages: Scala, ...READ MORE

Sep 3, 2018 in Apache Spark by nitinrawat895
• 11,380 points
6,537 views
0 votes
2 answers

Which cluster type should I choose for Spark?

Spark is agnostic to the underlying cluster ...READ MORE

Aug 21, 2018 in Apache Spark by zombie
• 3,790 points
1,705 views
0 votes
1 answer

How to stop INFO messages displaying on Spark console?

Just do the following: Edit your conf/log4j.properties file ...READ MORE

Aug 21, 2018 in Apache Spark by nitinrawat895
• 11,380 points
2,127 views
0 votes
1 answer

What makes Spark faster than MapReduce?

Let's first look at the mapper-side differences. Map ...READ MORE

Jul 27, 2018 in Apache Spark by Neha
• 6,300 points
1,228 views
0 votes
1 answer

What are the levels of parallelism in Spark Streaming?

In order to reduce the processing ...READ MORE

Jul 27, 2018 in Apache Spark by zombie
• 3,790 points
4,477 views
+1 vote
3 answers

What is the difference between RDDs and DataFrames in Apache Spark?

Comparison of Spark RDD and DataFrame: 1. Release ...READ MORE

Aug 28, 2018 in Apache Spark by shams
• 3,670 points
42,417 views
+1 vote
6 answers

groupByKey vs reduceByKey in Apache Spark.

ReduceByKey is the best for production. READ MORE

Mar 3, 2019 in Apache Spark by anonymous
75,821 views
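
For the groupByKey vs reduceByKey question above, a minimal sketch may help. It assumes a local SparkSession and a small made-up list of (word, count) pairs; the app name and data are illustrative only. Both calls give the same totals here, but reduceByKey combines values inside each partition before the shuffle, which is the usual reason it is preferred for aggregations.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("byKeySketch").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data: (word, count) pairs.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1), ("b", 1), ("a", 1)))

// reduceByKey merges values per key inside each partition, then shuffles the partial sums.
val viaReduce = pairs.reduceByKey(_ + _).collectAsMap()

// groupByKey shuffles every individual value, then the sum runs after the shuffle.
val viaGroup = pairs.groupByKey().mapValues(_.sum).collectAsMap()

println(viaReduce) // totals per key: a -> 3, b -> 2
println(viaGroup)  // same totals, computed with more data shuffled
```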
0 votes
3 answers

Can anyone explain fold() operation in Spark?

Fold in Spark: fold is a very powerful ...READ MORE

Aug 23, 2018 in Apache Spark by samarth295
• 2,220 points
11,988 views
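
As a rough illustration for the fold() question above, the sketch below assumes a local SparkSession and an RDD of the numbers 1 to 10 split into two partitions. The point it tries to show is that the zero value is applied once per partition and once more when the partition results are merged, so it should normally be the identity of the operation.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("foldSketch").getOrCreate()
val sc = spark.sparkContext

// Numbers 1 to 10 in exactly 2 partitions.
val nums = sc.parallelize(1 to 10, 2)

// 0 is the identity for addition, so this is a plain sum.
val total = nums.fold(0)(_ + _)

// A non-identity zero value is added once per partition (2 times)
// and once more when the partition results are merged: 55 + 3.
val skewed = nums.fold(1)(_ + _)

println(total)  // 55
println(skewed) // 58
```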
0 votes
1 answer

How is an RDD in Spark different from Distributed Storage Management? Can anyone help me with this?

Some of the key differences between an RDD and ...READ MORE

Jul 26, 2018 in Apache Spark by zombie
• 3,790 points
1,304 views
0 votes
3 answers

I don't understand the reason behind Spark RDD being immutable.

There are a few reasons for keeping RDD ...READ MORE

Apr 18, 2019 in Apache Spark by santlal561987@gmail.com
12,206 views
0 votes
1 answer

PySpark Config?

Mainly, we use SparkConf because we need ...READ MORE

Jul 26, 2018 in Apache Spark by kurt_cobain
• 9,390 points
639 views
+1 vote
1 answer

Getting null values in Spark DataFrame while reading data from HBase

Can you share the screenshots for the ...READ MORE

Jul 31, 2018 in Apache Spark by kurt_cobain
• 9,390 points
2,120 views
+1 vote
8 answers

How to print the contents of RDD in Apache Spark?

Save it to a text file: line.saveAsTextFile("alicia.txt") Print contains ...READ MORE

Dec 10, 2018 in Apache Spark by Akshay
60,796 views
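
For the question above on printing RDD contents, one common approach is to bring elements back to the driver first. The sketch below assumes a local SparkSession and a made-up RDD of integers. On a cluster, calling `rdd.foreach(println)` directly prints on the executors rather than the driver, so `take` or `collect` is usually what people want.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("printRddSketch").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample RDD.
val rdd = sc.parallelize(1 to 100)

// Safe for large RDDs: only the first 10 elements are sent to the driver.
val firstTen = rdd.take(10)
firstTen.foreach(println)

// Only for small RDDs: collect() pulls every element into driver memory.
val everything = rdd.collect()
everything.foreach(println)
```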
0 votes
2 answers

How to use RDD filter with other function?

val x = sc.parallelize(1 to 10, 2)   // ...READ MORE

Aug 17, 2018 in Apache Spark by zombie
• 3,790 points
9,248 views
0 votes
1 answer

How to add third party java jars for use in PySpark?

You can add external jars as arguments ...READ MORE

Jul 4, 2018 in Apache Spark by nitinrawat895
• 11,380 points

edited Nov 19, 2021 by Sarfaraz 8,361 views
0 votes
1 answer

Difference between SparkContext, JavaSparkContext, SQLContext, & SparkSession?

Yes, there is a difference between the ...READ MORE

Jul 4, 2018 in Apache Spark by nitinrawat895
• 11,380 points
4,948 views
+1 vote
2 answers

How can I convert Spark Dataframe to Spark RDD?

Assuming your RDD[row] is called rdd, you ...READ MORE

Jul 9, 2018 in Apache Spark by zombie
• 3,790 points
19,899 views
0 votes
1 answer

Difference between Spark ML & Spark MLlib package

org.apache.spark.mllib is the old Spark API while ...READ MORE

Jul 5, 2018 in Apache Spark by Shubham
• 13,490 points
1,853 views
0 votes
3 answers

Sorting rows in descending order in Spark SQL

df.orderBy($"col".desc) - this works as well READ MORE

Jul 5, 2020 in Apache Spark by Sai
• 160 points
16,347 views
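
To go with the descending-sort question above, here is a small sketch of three equivalent spellings. It assumes a local SparkSession and a made-up DataFrame with `name` and `score` columns; the `$"col"` syntax from the answer needs `import spark.implicits._` to be in scope.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, desc}

// Local session for illustration only; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("sortSketch").getOrCreate()
import spark.implicits._ // enables $"..." and toDF

// Hypothetical sample data.
val df = Seq(("a", 3), ("b", 1), ("c", 2)).toDF("name", "score")

// Three equivalent ways to sort by score, highest first.
val sortedA = df.orderBy($"score".desc)
val sortedB = df.orderBy(col("score").desc)
val sortedC = df.orderBy(desc("score"))

sortedA.show()
```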
0 votes
1 answer

Spark streaming with Kafka dependency error

Your error is with the version of ...READ MORE

Jul 5, 2018 in Apache Spark by Shubham
• 13,490 points
1,151 views
+1 vote
1 answer

map vs mapValues in Spark

There is a difference between the two: mapValues ...READ MORE

Jun 29, 2018 in Apache Spark by nitinrawat895
• 11,380 points
15,404 views
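
For the map vs mapValues question above, one difference that can be checked directly is what happens to the partitioner. The sketch assumes a local SparkSession and a tiny made-up pair RDD that has been hash-partitioned. mapValues cannot change keys, so Spark keeps the partitioner; map could change keys, so the partitioner is dropped.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

// Local session for illustration only; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("mapValuesSketch").getOrCreate()
val sc = spark.sparkContext

// Hypothetical pair RDD with an explicit partitioner.
val byKey = sc.parallelize(Seq(("a", 1), ("b", 2))).partitionBy(new HashPartitioner(2))

// mapValues only touches the value, so the partitioner is preserved.
val viaMapValues = byKey.mapValues(_ * 10)

// map receives the whole (key, value) tuple, so the partitioner is discarded.
val viaMap = byKey.map { case (k, v) => (k, v * 10) }

println(viaMapValues.partitioner.isDefined) // true
println(viaMap.partitioner.isDefined)       // false
```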
+1 vote
3 answers

Which cluster type should I choose for Spark?

In my view, start with a standalone ...READ MORE

Jun 27, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,244 views
+2 votes
14 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

Apr 5, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar 87,601 views
0 votes
1 answer

Which is better in terms of speed, Shark or Spark?

Spark is a framework for distributed data ...READ MORE

Jun 26, 2018 in Apache Spark by nitinrawat895
• 11,380 points
756 views
0 votes
1 answer

Spark Driver roles

A Spark driver (aka an application’s driver ...READ MORE

Jun 21, 2018 in Apache Spark by Ashish
• 2,650 points
792 views
0 votes
2 answers

Parquet Files Advantages

Parquet is a columnar format supported by ...READ MORE

Jul 4, 2018 in Apache Spark by zombie
• 3,790 points
1,899 views
0 votes
2 answers

map() and flatMap()

map(): Return a new distributed dataset formed by ...READ MORE

Jul 4, 2018 in Apache Spark by zombie
• 3,790 points
841 views
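
A short sketch for the map() and flatMap() question above, assuming a local SparkSession and two made-up lines of text: map yields exactly one output element per input element, while flatMap flattens the sequences returned by the function.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("flatMapSketch").getOrCreate()
val sc = spark.sparkContext

// Hypothetical input lines.
val lines = sc.parallelize(Seq("hello world", "hi"))

// map: one Array[String] per line, so the RDD still has 2 elements.
val mapped = lines.map(_.split(" "))

// flatMap: the arrays are flattened, so the RDD has one element per word.
val flat = lines.flatMap(_.split(" "))

println(mapped.count()) // 2
println(flat.count())   // 3
```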
0 votes
1 answer

Spark standalone client mode

spark-submit \ --class org.apache.spark.examples.SparkPi \ --deploy-mode client \ --master spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT ...READ MORE

Jun 20, 2018 in Apache Spark by Ashish
• 2,650 points
611 views
0 votes
1 answer

Ways to create RDD in Apache Spark

There are two popular ways using which ...READ MORE

Jun 19, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,866 views
0 votes
3 answers

Lineage Graph in Spark

Whenever a series of transformations are performed ...READ MORE

Aug 28, 2018 in Apache Spark by shams
• 3,670 points
11,092 views
0 votes
1 answer

Minimizing Data Transfers in Spark

Minimizing data transfers and avoiding shuffling helps ...READ MORE

Jun 19, 2018 in Apache Spark by Data_Nerd
• 2,390 points
1,168 views
0 votes
1 answer

How does an RDD persist data in Spark?

There are two methods to persist the ...READ MORE

Jun 18, 2018 in Apache Spark by nitinrawat895
• 11,380 points
1,199 views
0 votes
1 answer

What do we mean by an RDD in Spark?

The full form of RDD is a ...READ MORE

Jun 18, 2018 in Apache Spark by nitinrawat895
• 11,380 points
3,816 views
0 votes
1 answer

When running Spark on YARN, do I need to install Spark on all nodes of the YARN cluster?

No, it is not necessary to install ...READ MORE

Jun 14, 2018 in Apache Spark by nitinrawat895
• 11,380 points
5,738 views
0 votes
1 answer

Is it mandatory to start Hadoop to run a Spark application?

No, it is not mandatory, but there ...READ MORE

Jun 14, 2018 in Apache Spark by nitinrawat895
• 11,380 points
693 views
0 votes
1 answer

Persistence Levels in Spark

Spark has various persistence levels to store ...READ MORE

Jun 8, 2018 in Apache Spark by kurt_cobain
• 9,390 points
5,544 views
0 votes
1 answer

What is Shark?

Shark is a tool, developed for people ...READ MORE

Jun 8, 2018 in Apache Spark by kurt_cobain
• 9,390 points
758 views
+1 vote
1 answer

Kafka Feature

Here are some of the important features of ...READ MORE

Jun 7, 2018 in Apache Spark by Data_Nerd
• 2,390 points
1,600 views
0 votes
1 answer

SQLInterpreter in Spark

SQL Interpreter & Optimizer handles the functional ...READ MORE

Jun 7, 2018 in Apache Spark by kurt_cobain
• 9,390 points
495 views
0 votes
1 answer

How to find the number of elements present in the array in a Spark DataFrame column?

You can select the column and apply ...READ MORE

Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points
21,815 views
0 votes
3 answers

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as following: var notFollowingList=List(9.8,7,6,3,1) df.filter(col("uid").isin(notFollowingList:_*)) You can ...READ MORE

Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points
92,007 views
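
The `isin` filter from the answer above can be tried end to end with the sketch below. It assumes a local SparkSession, a made-up DataFrame with a single integer `uid` column holding 1 to 10, and a hypothetical integer list. Prefixing the condition with `!` keeps the rows that are not in the list.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("isinSketch").getOrCreate()
import spark.implicits._

// Hypothetical data: uid values 1 through 10.
val df = (1 to 10).toDF("uid")

// Hypothetical list of values to match against.
val notFollowingList = List(9, 8, 7, 6, 3, 1)

// isin takes varargs, so the list is expanded with : _*
val matching = df.filter(col("uid").isin(notFollowingList: _*))

// Negate the condition to keep only rows whose uid is NOT in the list.
val nonMatching = df.filter(!col("uid").isin(notFollowingList: _*))

matching.show()
nonMatching.show()
```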
0 votes
1 answer

Convert the given Spark RDD object to a Spark DataFrame.

You can create a DataFrame from the ...READ MORE

Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points
844 views
0 votes
1 answer

Different Spark Ecosystem

Spark has various components: Spark SQL (Shark)- for ...READ MORE

Jun 4, 2018 in Apache Spark by kurt_cobain
• 9,390 points
715 views
0 votes
1 answer

Parquet File

Parquet is a columnar format file supported ...READ MORE

Jun 4, 2018 in Apache Spark by Data_Nerd
• 2,390 points
851 views
0 votes
1 answer

Hadoop mandatory for Spark?

No, not mandatory, but there is no ...READ MORE

Jun 1, 2018 in Apache Spark by kurt_cobain
• 9,390 points
430 views
0 votes
1 answer

How does partitioning work in Spark?

By default a partition is created for ...READ MORE

May 31, 2018 in Apache Spark by nitinrawat895
• 11,380 points
970 views
+1 vote
8 answers

How to replace null values in Spark DataFrame?

Hi, In Spark, fill() function of DataFrameNaFunctions class is used to replace ...READ MORE

Dec 15, 2020 in Apache Spark by MD
• 95,440 points
74,249 views
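
As a hedged illustration of the `fill()` approach mentioned in the answer above, the sketch below assumes a local SparkSession and a made-up two-column DataFrame (`n` numeric, `s` string) with one all-null row. The replacement values are arbitrary.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("naFillSketch").getOrCreate()
import spark.implicits._

// Hypothetical data: Option values become nullable columns.
val df = Seq[(Option[Int], Option[String])](
  (Some(1), Some("x")),
  (None, None)
).toDF("n", "s")

// A numeric value only fills nulls in numeric columns.
val numbersFilled = df.na.fill(0)

// A Map fills each named column with its own replacement value.
val allFilled = df.na.fill(Map("n" -> 0, "s" -> "unknown"))

allFilled.show()
```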
0 votes
1 answer

How to import the dependencies of Spark MLlib into an Eclipse project?

I would recommend you create & build ...READ MORE

May 31, 2018 in Apache Spark by Shubham
• 13,490 points
1,815 views