Most answered questions in Apache Spark

+2 votes
14 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

Apr 4, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar 68,048 views
+5 votes
11 answers

Concatenate columns in apache spark dataframe

its late but this how you can ...READ MORE

Mar 21, 2019 in Apache Spark by anonymous
62,276 views
+1 vote
8 answers

How to print the contents of RDD in Apache Spark?

Save it to a text file: line.saveAsTextFile("alicia.txt") Print contains ...READ MORE

Dec 10, 2018 in Apache Spark by Akshay
42,763 views
+1 vote
8 answers

How to replace null values in Spark DataFrame?

Hi, In Spark, fill() function of DataFrameNaFunctions class is used to replace ...READ MORE

Dec 15, 2020 in Apache Spark by MD
• 95,140 points
56,351 views
+1 vote
6 answers

groupByKey vs reduceByKey in Apache Spark.

ReduceByKey is the best for production. READ MORE

Mar 3, 2019 in Apache Spark by anonymous
43,791 views
0 votes
5 answers

How to change the spark Session configuration in Pyspark?

You aren't actually overwriting anything with this ...READ MORE

Dec 13, 2020 in Apache Spark by Gitika
• 65,870 points
54,322 views
+2 votes
4 answers

use length function in substring in spark

You can use the function expr val data ...READ MORE

May 3, 2018 in Apache Spark by kurt_cobain
• 9,390 points
33,286 views
+1 vote
3 answers

map() vs flatMap() in Spark

Spark map function expresses a one-to-one transformation. ...READ MORE

Jun 17, 2019 in Apache Spark by vishal
• 180 points
25,798 views
+1 vote
3 answers

What is the difference between rdd and dataframes in Apache Spark ?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

Aug 27, 2018 in Apache Spark by shams
• 3,660 points
35,275 views
0 votes
3 answers

Can anyone explain fold() operation in Spark?

Fold in spark Fold is a very powerful ...READ MORE

Aug 22, 2018 in Apache Spark by samarth295
• 2,220 points
8,262 views
0 votes
3 answers

I don't understand the reason behind Spark RDD being immutable.

There are few reasons for keeping RDD ...READ MORE

Apr 18, 2019 in Apache Spark by santlal561987@gmail.com
6,489 views
0 votes
3 answers

Sorting rows in descending order in Spark SQL

df.orderBy($"col".desc) - this works as well READ MORE

Jul 5, 2020 in Apache Spark by Sai
• 160 points
14,317 views
+1 vote
3 answers

Which cluster type should I choose for Spark?

According to me, start with a standalone ...READ MORE

Jun 27, 2018 in Apache Spark by nitinrawat895
• 11,380 points
298 views
0 votes
3 answers

Lineage Graph in Spark

Whenever a series of transformations are performed ...READ MORE

Aug 27, 2018 in Apache Spark by shams
• 3,660 points
6,525 views
0 votes
3 answers

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as following: var notFollowingList=List(9.8,7,6,3 ...READ MORE

Jun 5, 2018 in Apache Spark by Shubham
• 13,480 points
77,631 views
0 votes
3 answers

How to transpose Spark DataFrame?

Please check the below mentioned links for ...READ MORE

Dec 31, 2018 in Apache Spark by anonymous
14,853 views
0 votes
2 answers

5)Using which one of the given choices will you create an RDD with specific partitioning?

Hi, @Ritu, option b for you, as Hash Partitioning ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,870 points
210 views
+2 votes
2 answers

py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

Using findspark is expected to solve the ...READ MORE

Jun 20, 2020 in Apache Spark by suvasish
7,510 views
0 votes
2 answers

java.lang.StringIndexOutOfBoundsException: String index out of range: 1

When using the Java substring() method, a ...READ MORE

Mar 13, 2020 in Apache Spark by evanbung
• 180 points
3,274 views
+1 vote
2 answers

sparkstream.textfilstreaming(localpathdirectory). I am getting empty results

Hey @c.kothamasu You should copy your file to ...READ MORE

Nov 7, 2019 in Apache Spark by Manas
169 views
+1 vote
2 answers

Spark: Can we add column to dataframe?

Yes we can add columns to the ...READ MORE

Oct 24, 2019 in Apache Spark by Siva
• 160 points
3,039 views
+1 vote
2 answers

Spark: Dataframe vs Dataset

Recently, there are two new data abstractions ...READ MORE

Jul 29, 2019 in Apache Spark by Jackie
28,479 views
0 votes
2 answers

Error : split value is not a member of org.apache.spark.sql.Row

var d=rdd2col.rdd.map(x=>x.split(",")) or val names=rd ...READ MORE

Aug 5, 2020 in Apache Spark by Ramkumar Ramasamy.
4,802 views
+1 vote
2 answers

What is sparkContext?

SparkContext sets up internal services and establishes ...READ MORE

Dec 5, 2019 in Apache Spark by anonymous
1,044 views
0 votes
2 answers

How to execute a function in apache-scala?

Function Definition : def test():Unit{ var a=10 var b=20 var c=a+b } calling ...READ MORE

Aug 5, 2020 in Apache Spark by Ramkumar Ramasamy
191 views
+1 vote
2 answers

How do I get number of columns in each line from a delimited file??

Instead of spliting on '\n'. You should ...READ MORE

Aug 7, 2019 in Apache Spark by ashish
2,521 views
0 votes
2 answers

Which cluster type should I choose for Spark?

Spark is agnostic to the underlying cluster ...READ MORE

Aug 21, 2018 in Apache Spark by zombie
• 3,790 points
384 views
0 votes
2 answers

How to use RDD filter with other function?

val x = sc.parallelize(1 to 10, 2)   // ...READ MORE

Aug 16, 2018 in Apache Spark by zombie
• 3,790 points
5,524 views
+1 vote
2 answers

How can I convert Spark Dataframe to Spark RDD?

Assuming your RDD[row] is called rdd, you ...READ MORE

Jul 9, 2018 in Apache Spark by zombie
• 3,790 points
12,788 views
0 votes
2 answers

Parquet Files Advantages

Parquet is a columnar format supported by ...READ MORE

Jul 3, 2018 in Apache Spark by zombie
• 3,790 points
934 views
0 votes
2 answers

map() and flatmap()

map(): Return a new distributed dataset formed by ...READ MORE

Jul 3, 2018 in Apache Spark by zombie
• 3,790 points
316 views
0 votes
2 answers

In a Spark DataFrame how can I flatten the struct?

// Collect data from input avro file ...READ MORE

Jul 4, 2019 in Apache Spark by Dhara dhruve
3,581 views
0 votes
2 answers

Difference between createOrReplaceTempView and registerTempTable

I am pretty sure createOrReplaceTempView just replaced ...READ MORE

Sep 18, 2020 in Apache Spark by Nathan Mott
7,811 views
+1 vote
2 answers

Apache Spark vs Apache Spark 2

Spark 2 doesn't differ much architecture-wise from ...READ MORE

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,390 points
6,876 views
+1 vote
2 answers

Hadoop 3 compatibility with older versions of Hive, Pig, Sqoop and Spark

Hadoop 3 is not widely used in ...READ MORE

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,390 points
3,838 views
0 votes
1 answer

Spark Core How to fetch max n rows of an RDD function without using Rdd.max()

Hi@Prasant, If Spark Streaming is not supporting tuple, ...READ MORE

Dec 3, 2020 in Apache Spark by MD
• 95,140 points
169 views
0 votes
1 answer

What will be printed when the below code is executed?

Option b) .List(0,3,5) The takeOrdered method returns the smallest n elements in a ...READ MORE

Nov 26, 2020 in Apache Spark by Gitika
• 65,870 points
138 views
0 votes
1 answer

What will be printed when the below code is executed?

Option D)  runtime error READ MORE

Nov 26, 2020 in Apache Spark by Gitika
• 65,870 points
188 views
0 votes
1 answer

What class is declared in the blow code?

Option D: String class READ MORE

Nov 26, 2020 in Apache Spark by Gitika
• 65,870 points
95 views
0 votes
1 answer

In AWS, if user wants to run spark, then on top of which one of the following can the user do it?

Hi@ritu, AWS has lots of services. For spark ...READ MORE

Nov 26, 2020 in Apache Spark by MD
• 95,140 points
108 views
0 votes
1 answer

Which one of the following commands is used to start python-spark?

Hi@ritu, To start your python spark shell, you ...READ MORE

Nov 26, 2020 in Apache Spark by MD
• 95,140 points
89 views
0 votes
1 answer

What will be printed when the below code is executed ?

Option a) List(5,100,10) The take method returns the first n elements in an ...READ MORE

Nov 26, 2020 in Apache Spark by Gitika
• 65,870 points
71 views
0 votes
1 answer

What is the output of the following code?

rror: expected class or object definition sc.parallelize(Array(1L,("SFO")),(2L,("ORD")),(3L,("DFW")))) ^ one error ...READ MORE

Nov 26, 2020 in Apache Spark by Gitika
• 65,870 points
78 views
0 votes
1 answer

The number of stages in a job is equal to the number of RDDs in DAG. however, under one of the cgiven conditions, the scheduler can truncate the lineage. identify it.

Hi@ritu, Spark's internal scheduler may truncate the lineage of the RDD graph if ...READ MORE

Nov 25, 2020 in Apache Spark by akhtar
• 38,180 points
302 views
0 votes
1 answer

16)What allows spark to periodically persist data about an application such that it can recover from failures?

Hi@Edureka, Checkpointing is a process of truncating RDD ...READ MORE

Nov 25, 2020 in Apache Spark by MD
• 95,140 points
152 views
0 votes
1 answer

The number of stages in a job is equal to the number of RDDs in DAG. however, under one of the cgiven conditions, the scheduler can truncate the lineage. identify it.

Hi@Edureka, Spark's internal scheduler may truncate the lineage of the RDD graph ...READ MORE

Nov 25, 2020 in Apache Spark by MD
• 95,140 points
381 views
0 votes
1 answer

13)Refer the input and identify the output if the below code is run

Option c)  Run time error - A READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,870 points
74 views
0 votes
1 answer

What does the following code print?

error: expected class or object definition sc.parallelize (Array(1L, ...READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,870 points
168 views
0 votes
1 answer

From the following graph code ,which code snippet will return the no.of flight routes?

Hey, @Ritu, I am getting error in your ...READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,870 points
71 views