Trending questions in Apache Spark

0 votes
5 answers

How to change the Spark session configuration in PySpark?

You aren't actually overwriting anything with this ...

Dec 14, 2020 in Apache Spark by Gitika
• 65,910 points
121,294 views
0 votes
3 answers

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as follows: var notFollowingList=List(9.8,7,6,3,1) df.filter(col("uid").isin(notFollowingList:_*)) You can ...

Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points
91,814 views
+2 votes
14 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...

Apr 5, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar
87,324 views
+1 vote
6 answers

groupByKey vs reduceByKey in Apache Spark.

reduceByKey is the better choice for production.

Mar 3, 2019 in Apache Spark by anonymous
75,562 views
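To show why reduceByKey is usually preferred over groupByKey, here is a minimal plain-Python model (no Spark required) of the data each one sends through the shuffle. It assumes a toy setup in which each partition is a list of (key, value) pairs, and the helper names are hypothetical. reduceByKey combines values inside each partition before shuffling, so at most one record per key per partition moves; groupByKey moves every pair.

```python
from collections import defaultdict
from operator import add

# Toy model only: each "partition" is a plain list of (key, value) pairs.

def group_by_key_shuffle(partitions):
    """groupByKey-style: every pair crosses the shuffle unchanged."""
    shuffled = [pair for part in partitions for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)


def reduce_by_key_shuffle(partitions, func):
    """reduceByKey-style: combine inside each partition, then shuffle."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())  # at most one record per key per partition
    merged = {}
    for key, value in shuffled:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged, len(shuffled)


partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]
grouped, grouped_records = group_by_key_shuffle(partitions)
reduced, reduced_records = reduce_by_key_shuffle(partitions, add)
print({k: sum(v) for k, v in grouped.items()}, grouped_records)  # {'a': 3, 'b': 3} 6
print(reduced, reduced_records)  # {'a': 3, 'b': 3} 4
```

In PySpark the corresponding calls would be rdd.groupByKey().mapValues(sum) and rdd.reduceByKey(add). Both produce the same totals; the record counts above only model shuffle volume and are not Spark measurements.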
+1 vote
8 answers

How to replace null values in Spark DataFrame?

Hi, in Spark, the fill() function of the DataFrameNaFunctions class is used to replace ...

Dec 15, 2020 in Apache Spark by MD
• 95,440 points
74,008 views
+1 vote
8 answers

How to print the contents of RDD in Apache Spark?

Save it to a text file: line.saveAsTextFile("alicia.txt") Print contains ...

Dec 10, 2018 in Apache Spark by Akshay
60,588 views
+5 votes
11 answers

Concatenate columns in Apache Spark DataFrame

It's late, but this is how you can ...

Mar 21, 2019 in Apache Spark by anonymous
71,155 views
+1 vote
2 answers

Spark: Dataframe vs Dataset

Recently, there are two new data abstractions ...

Jul 29, 2019 in Apache Spark by Jackie
45,066 views
+1 vote
3 answers

map() vs flatMap() in Spark

Spark map function expresses a one-to-one transformation. ...

Jun 17, 2019 in Apache Spark by vishal
• 180 points
37,987 views
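The one-to-one versus one-to-many distinction can be seen with plain Python lists. The sketch below uses hypothetical helper names and no Spark: the map-style helper returns exactly one output per input, while the flatMap-style helper lets each input yield zero or more outputs and flattens them into a single list, which is the per-element contract of RDD.map and RDD.flatMap.

```python
from itertools import chain


def spark_like_map(func, items):
    """One output element per input element (one-to-one)."""
    return [func(x) for x in items]


def spark_like_flat_map(func, items):
    """Each input yields an iterable; the results are flattened (one-to-many)."""
    return list(chain.from_iterable(func(x) for x in items))


lines = ["hello world", "", "spark"]
mapped = spark_like_map(str.split, lines)
flat = spark_like_flat_map(str.split, lines)
print(mapped)  # [['hello', 'world'], [], ['spark']]
print(flat)    # ['hello', 'world', 'spark']
```

Note that the empty line still produces one (empty) element under map but contributes nothing under flatMap. In PySpark the equivalent would be sc.parallelize(lines).flatMap(lambda s: s.split()).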
+1 vote
1 answer

Is there an efficient way of dealing with null values when using concat in pyspark.sql version 2.3.4?

When you concatenate any string with a ...

Nov 6, 2019 in Apache Spark by Rishi
37,782 views
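The answer above is cut off, but the underlying behavior is documented for Spark SQL: concat returns null when any argument is null, while concat_ws skips null arguments. The plain-Python sketch below models those two rules, with None standing in for SQL NULL; the function names are hypothetical. In PySpark the real functions are pyspark.sql.functions.concat and concat_ws (separator first), and wrapping columns in coalesce(col, lit("")) is another common way to neutralize nulls before concat.

```python
from typing import Optional


def sql_concat(*parts: Optional[str]) -> Optional[str]:
    """Models Spark SQL concat: the result is NULL if any argument is NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def sql_concat_ws(sep: str, *parts: Optional[str]) -> str:
    """Models Spark SQL concat_ws: NULL arguments are skipped."""
    return sep.join(p for p in parts if p is not None)


print(sql_concat("John", None, "Smith"))          # None
print(sql_concat_ws(" ", "John", None, "Smith"))  # John Smith
```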
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve given input columns

The string Productivity has to be enclosed between single ...

Jul 10, 2019 in Apache Spark by Tina
41,941 views
+2 votes
2 answers

py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

Using findspark is expected to solve the ...

Jun 21, 2020 in Apache Spark by suvasish
21,033 views
0 votes
1 answer

Error: No module named 'findspark'

Hi @akhtar, to import this module in your program, ...

May 6, 2020 in Apache Spark by MD
• 95,440 points
19,645 views
+1 vote
3 answers

What is the difference between RDDs and DataFrames in Apache Spark?

Comparison between Spark RDD and DataFrame: 1. Release ...

Aug 28, 2018 in Apache Spark by shams
• 3,670 points
42,293 views
+2 votes
4 answers

Use the length function in substring in Spark

You can use the expr function: val data ...

May 3, 2018 in Apache Spark by kurt_cobain
• 9,390 points
41,761 views
0 votes
1 answer

ImportError: No module named 'pyspark'

Hi @akhtar, by default pyspark is not present in ...

May 6, 2020 in Apache Spark by MD
• 95,440 points
14,872 views
0 votes
1 answer

1) Given the sfpd RDD, which of the following is used to create a pair RDD consisting of tuples of the form (Category, 1) in Scala?

Hi @Ritu, when creating a pair RDD from ...

Nov 23, 2020 in Apache Spark by Gitika
• 65,910 points
5,534 views
+1 vote
1 answer

Reading a text file through spark data frame

Try this: val df = sc.textFile("HDFS://nameservice1/user/edureka_168049/Structure_IT/samplefile.txt") df.collect() val df = ...

Jul 24, 2019 in Apache Spark by Suri
25,705 views
+1 vote
1 answer

is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [51, 53, 10, 10]

Hi @akhtar, here you are trying to read a ...

Feb 3, 2020 in Apache Spark by MD
• 95,440 points
17,160 views
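The numbers in this error are ASCII byte values. A Parquet file ends with the 4-byte magic "PAR1", which is [80, 65, 82, 49]; the bytes actually found, [51, 53, 10, 10], decode to "35\n\n", which suggests a plain-text file (for example a CSV) was handed to the Parquet reader. The sketch below, using a hypothetical helper name, checks the last four bytes of a local file against that magic.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # bytes 80, 65, 82, 49


def has_parquet_tail_magic(path: str) -> bool:
    """Return True if the file ends with the Parquet magic bytes."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < len(PARQUET_MAGIC):
            return False
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        return f.read() == PARQUET_MAGIC


print(list(PARQUET_MAGIC))  # [80, 65, 82, 49]
print(list(b"35\n\n"))      # [51, 53, 10, 10]

# A text file whose last bytes are "35\n\n" fails the check.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"id,value\n1,35\n\n")
    text_path = tmp.name
print(has_parquet_tail_magic(text_path))  # False
os.remove(text_path)
```

This only checks the trailing magic; a file that passes can still be an invalid Parquet file, but one that fails is certainly not Parquet.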
0 votes
2 answers

5) Which of the given choices would you use to create an RDD with specific partitioning?

Hi @Ritu, the answer is option b, as hash partitioning ...

Nov 23, 2020 in Apache Spark by Gitika
• 65,910 points
3,530 views
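Hash partitioning assigns each record to a partition by taking its key's hash modulo the number of partitions, so all records sharing a key end up in the same partition. The plain-Python sketch below models that rule with a hypothetical helper; Python's built-in hash stands in for the hash function Spark actually uses (the JVM hashCode in Scala's HashPartitioner, portable_hash in PySpark), so concrete partition numbers will differ from Spark's. Integer keys are used so the output is deterministic. In Spark itself, a pair RDD is hash-partitioned with rdd.partitionBy(4) in PySpark or rdd.partitionBy(new HashPartitioner(4)) in Scala.

```python
from collections import defaultdict


def hash_partition(pairs, num_partitions):
    """Model of hash partitioning: partition = hash(key) mod num_partitions.

    Python's % already returns a non-negative result for a positive divisor,
    mirroring the non-negative modulo Spark applies to negative hash codes.
    """
    partitions = defaultdict(list)
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return dict(partitions)


pairs = [(1, "a"), (5, "b"), (2, "c"), (-3, "d"), (1, "e")]
parts = hash_partition(pairs, 4)
print(parts)  # {1: [(1, 'a'), (5, 'b'), (-3, 'd'), (1, 'e')], 2: [(2, 'c')]}
```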
0 votes
1 answer

The number of stages in a job is equal to the number of RDDs in the DAG. However, under one of the given conditions, the scheduler can truncate the lineage. Identify it.

Hi @Edureka, Spark's internal scheduler may truncate the lineage of the RDD graph ...

Nov 26, 2020 in Apache Spark by MD
• 95,440 points
3,340 views
0 votes
1 answer

What are some of the things you can monitor in the Spark Web UI?

Option c) Mapr Jobs that are submitted

Nov 25, 2020 in Apache Spark by Gitika
• 65,910 points
3,164 views
0 votes
1 answer

12) Which one of the given flows correctly describes the Spark Streaming architecture?

Hi @ritu, you need to learn the architecture of ...

Nov 23, 2020 in Apache Spark by MD
• 95,440 points
3,111 views
0 votes
0 answers

What allows spark to periodically persist data about an application such that it can recover from failures? [closed]

What allows spark to periodically persist data ...

Nov 26, 2020 in Apache Spark by ritu
• 960 points

closed Nov 26, 2020 by MD
2,440 views
0 votes
1 answer

The number of stages in a job is equal to the number of RDDs in the DAG. However, under one of the given conditions, the scheduler can truncate the lineage. Identify it.

Hi @ritu, Spark's internal scheduler may truncate the lineage of the RDD graph if ...

Nov 25, 2020 in Apache Spark by akhtar
• 38,230 points
2,248 views
0 votes
1 answer

4) Spark Streaming converts streaming data into DStreams. Which one of the given statements about DStreams is true?

Hi @ritu, Spark DStream (Discretized Stream) is the basic ...

Nov 23, 2020 in Apache Spark by MD
• 95,440 points
2,294 views
0 votes
1 answer

How to create a distance vector in PySpark (Euclidean distance)

Hi @dani, you can find the Euclidean distance using ...

Oct 16, 2020 in Apache Spark by MD
• 95,440 points
3,899 views
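The Euclidean distance between vectors x and y is sqrt(sum over i of (x_i - y_i)^2). The answer's exact method is truncated, so the sketch below simply implements that formula in plain Python, plus a hypothetical helper that builds a "distance vector" from a list of points to one reference point. If you are already working with pyspark.ml.linalg vectors, Vectors.squared_distance returns the squared distance, whose square root is the same value.

```python
import math


def euclidean_distance(x, y):
    """sqrt(sum((x_i - y_i)^2)) for two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_vector(points, reference):
    """Distance from each point to a fixed reference point."""
    return [euclidean_distance(p, reference) for p in points]


points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(distance_vector(points, (0.0, 0.0)))  # [0.0, 5.0, 10.0]
```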
0 votes
1 answer

6) What allows Spark Streaming to provide fault tolerance for network sources of data?

Hi @ritu, fault tolerance is the property that enables ...

Dec 1, 2020 in Apache Spark by MD
• 95,440 points
2,055 views
0 votes
1 answer

Spark Core: How to fetch the max n rows of an RDD without using rdd.max()

Hi @Prasant, if Spark Streaming does not support tuples, ...

Dec 3, 2020 in Apache Spark by MD
• 95,440 points
1,768 views
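One way to get the n largest elements without rdd.max() is RDD.top(n), or RDD.takeOrdered(n, key=...) for a custom ordering, in PySpark. Conceptually, each partition keeps only its own n largest values and the driver merges those candidates. The plain-Python model below sketches that two-phase idea with heapq.nlargest; partitions are simulated as lists and the helper name is hypothetical.

```python
import heapq


def top_n(partitions, n, key=None):
    """Model of a distributed top-n: local top-n per partition, then merge."""
    candidates = []
    for part in partitions:
        # Each partition contributes at most n candidates.
        candidates.extend(heapq.nlargest(n, part, key=key))
    # The global top-n is the top-n of all partition-local candidates.
    return heapq.nlargest(n, candidates, key=key)


partitions = [[5, 100, 10], [42, 7], [99, 1, 63]]
print(top_n(partitions, 3))  # [100, 99, 63]

# Rows as (name, score) tuples, ranked by score.
rows = [[("a", 3), ("b", 9)], [("c", 7)]]
print(top_n(rows, 2, key=lambda r: r[1]))  # [('b', 9), ('c', 7)]
```

Keeping only n candidates per partition is what makes this cheap: the driver merges at most n times the number of partitions values instead of the whole dataset.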
0 votes
1 answer

What will be printed when the below code is executed?

Option D) Runtime error

Nov 26, 2020 in Apache Spark by Gitika
• 65,910 points
2,018 views
0 votes
1 answer

Which one of the following commands is used to see the structure of the DataFrame?

Hi @Ritu, if you want to see the ...

Nov 25, 2020 in Apache Spark by Gitika
• 65,910 points
1,899 views
+1 vote
1 answer

How to write Spark DataFrame to Avro Data File?

Hi @akhtar, since the Avro library is external to Spark, ...

Nov 4, 2020 in Apache Spark by MD
• 95,440 points
2,740 views
0 votes
1 answer

What does the following code print?

error: expected class or object definition sc.parallelize (Array(1L, ...

Nov 25, 2020 in Apache Spark by Gitika
• 65,910 points
1,770 views
0 votes
1 answer

How do you load this multiline data in spark as a single record?

Hi @Ruben, I think you can add an escape ...

Nov 23, 2020 in Apache Spark by MD
• 95,440 points
1,781 views
0 votes
1 answer

How to read a dataframe based on an avro schema?

Hi, I am able to understand your requirement. ...

Oct 30, 2020 in Apache Spark by MD
• 95,440 points
2,759 views
0 votes
1 answer

16) What allows Spark to periodically persist data about an application such that it can recover from failures?

Hi @Edureka, checkpointing is a process of truncating RDD ...

Nov 26, 2020 in Apache Spark by MD
• 95,440 points
1,618 views
0 votes
1 answer

7) From a SchemaRDD, data can be cached by which one of the given choices?

Hi @Ritu, according to the official documentation of Spark 1.2, ...

Nov 23, 2020 in Apache Spark by Gitika
• 65,910 points
1,550 views
0 votes
1 answer

Which one of the following commands is used to start python-spark?

Hi @ritu, to start your Python Spark shell, you ...

Nov 26, 2020 in Apache Spark by MD
• 95,440 points
1,127 views
0 votes
0 answers

17) From the given choices, identify the value returned by $"whatever".

17) From the given choices, identify the value ...

Nov 25, 2020 in Apache Spark by ritu
• 960 points
1,221 views
0 votes
1 answer

In AWS, if a user wants to run Spark, on top of which one of the following can they do it?

Hi @ritu, AWS has lots of services. For Spark ...

Nov 26, 2020 in Apache Spark by MD
• 95,440 points
1,109 views
0 votes
1 answer

What will be printed when the below code is executed?

Option a) List(5,100,10). The take method returns the first n elements in an ...

Nov 26, 2020 in Apache Spark by Gitika
• 65,910 points
1,076 views
0 votes
1 answer

What does the below code print?

Option d) Runtime error.

Nov 25, 2020 in Apache Spark by Gitika
• 65,910 points
957 views
0 votes
1 answer

From the code below, what is the most appropriate next step in the ML process?

Hi @ritu, the most appropriate step according to me ...

Nov 25, 2020 in Apache Spark by MD
• 95,440 points
882 views
0 votes
0 answers

What does the below code print? [closed]

What does the below code print? val AgeDs ...

Nov 25, 2020 in Apache Spark by ritu
• 960 points

closed Nov 25, 2020 by Gitika
913 views
0 votes
1 answer

13) Refer to the input and identify the output if the code below is run.

Option c) Runtime error - A

Nov 25, 2020 in Apache Spark by Gitika
• 65,910 points
807 views