Trending questions in Apache Spark

0 votes
5 answers

How to change the Spark session configuration in PySpark?

You aren't actually overwriting anything with this ...READ MORE

Dec 14, 2020 in Apache Spark by Gitika
• 65,730 points
127,516 views
0 votes
3 answers

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as follows: var notFollowingList=List(9.8,7,6,3,1) df.filter(col("uid").isin(notFollowingList:_*)) You can ...READ MORE

Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points
93,429 views
+2 votes
14 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

Apr 5, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar
90,540 views
+1 vote
6 answers

groupByKey vs reduceByKey in Apache Spark.

reduceByKey is usually the better choice for production. READ MORE

Mar 3, 2019 in Apache Spark by anonymous
78,114 views
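
The usual reason reduceByKey is preferred for aggregations is that it combines values per key inside each partition before the shuffle, while groupByKey sends every record across. The sketch below models that difference in plain Python and does not call Spark; the partition layout and the function names `group_by_key_model` and `reduce_by_key_model` are assumptions made for illustration.

```python
from collections import defaultdict

# Two hypothetical partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]

def group_by_key_model(parts):
    """Model of groupByKey: every record crosses the shuffle unchanged."""
    shuffled = [pair for part in parts for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)

def reduce_by_key_model(parts, func):
    """Model of reduceByKey: combine inside each partition, then shuffle."""
    shuffled = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    result = {}
    for key, value in shuffled:
        result[key] = func(result[key], value) if key in result else value
    return result, len(shuffled)

grouped, grouped_shuffled = group_by_key_model(partitions)
sums_from_groups = {key: sum(values) for key, values in grouped.items()}
sums, reduced_shuffled = reduce_by_key_model(partitions, lambda x, y: x + y)
```

Both paths give the same per-key sums, but in this model the reduce path shuffles one record per key per partition instead of every record.
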
+1 vote
8 answers

How to replace null values in Spark DataFrame?

Hi, in Spark, the fill() function of the DataFrameNaFunctions class is used to replace ...READ MORE

Dec 15, 2020 in Apache Spark by MD
• 95,460 points
76,686 views
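
As a rough illustration of what a per-column fill does, here is a plain-Python model that works on a list of dicts rather than a Spark DataFrame. The column names and the helper `fill_model` are hypothetical; in PySpark the corresponding call would be `df.na.fill({...})` with a column-to-value dict.

```python
def fill_model(rows, replacements):
    """Replace None with a per-column default; other values are untouched."""
    return [
        {
            col: replacements[col] if val is None and col in replacements else val
            for col, val in row.items()
        }
        for row in rows
    ]

# Hypothetical rows with missing values.
rows = [
    {"name": "Ann", "age": None},
    {"name": None, "age": 31},
]
filled = fill_model(rows, {"name": "unknown", "age": 0})
```
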
+1 vote
8 answers

How to print the contents of RDD in Apache Spark?

Save it to a text file: line.saveAsTextFile("alicia.txt") Print contains ...READ MORE

Dec 10, 2018 in Apache Spark by Akshay
62,921 views
+5 votes
11 answers

Concatenate columns in Apache Spark DataFrame

It's late, but this is how you can ...READ MORE

Mar 21, 2019 in Apache Spark by anonymous
73,494 views
0 votes
0 answers

How to import pyspark in Jupyter Notebook

When I tried to import Pyspark I am getting ...READ MORE

Apr 3, 2023 in Apache Spark by Navyasilpa

edited Mar 5
239 views
0 votes
0 answers

How to import pyspark in Jupyter

I tried to import pyspark in jupyter ...READ MORE

Apr 3, 2023 in Apache Spark by Navyasilpa

edited Mar 5
208 views
+1 vote
2 answers

Spark: Dataframe vs Dataset

Recently, there are two new data abstractions ...READ MORE

Jul 29, 2019 in Apache Spark by Jackie
46,336 views
0 votes
0 answers

How to read a nested avro file format in spark dataframe

The avro file format contains nested data. ...READ MORE

Nov 16, 2022 in Apache Spark by Devang

edited Mar 4
221 views
+1 vote
3 answers

map() vs flatMap() in Spark

Spark map function expresses a one-to-one transformation. ...READ MORE

Jun 17, 2019 in Apache Spark by vishal
• 180 points
39,255 views
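
The one-to-one versus one-to-many distinction can be sketched in plain Python without Spark. The helpers `map_model` and `flat_map_model` and the sample lines are assumptions for illustration; they mirror how `rdd.map` and `rdd.flatMap` treat a function that returns a list.

```python
from itertools import chain

# Hypothetical input records.
lines = ["hello spark", "map vs flatMap"]

def map_model(func, items):
    """One output element per input element."""
    return [func(item) for item in items]

def flat_map_model(func, items):
    """Each input yields a sequence; the sequences are flattened into one list."""
    return list(chain.from_iterable(func(item) for item in items))

mapped = map_model(str.split, lines)          # a list of word lists
flattened = flat_map_model(str.split, lines)  # a single list of words
```
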
+1 vote
1 answer

Is there any efficient way of dealing with null values during concat functionality of pyspark.sql version 2.3.4?

When you concatenate any string with a ...READ MORE

Nov 6, 2019 in Apache Spark by Rishi
41,082 views
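
In Spark SQL, `concat` returns null when any input is null, while `concat_ws` skips null inputs. The plain-Python model below illustrates those two behaviours plus the common workaround of substituting a default first (in Spark, typically via `coalesce`). The function names are hypothetical and nothing here calls Spark.

```python
def concat_model(*values):
    """Model of concat: the result is null (None) if any input is null."""
    if any(value is None for value in values):
        return None
    return "".join(values)

def concat_ws_model(sep, *values):
    """Model of concat_ws: null inputs are skipped, the rest are joined."""
    return sep.join(value for value in values if value is not None)

def concat_with_default(*values, default=""):
    """Model of replacing nulls with a default before concatenating."""
    return "".join(default if value is None else value for value in values)

plain = concat_model("a", None, "c")
with_separator = concat_ws_model("-", "a", None, "c")
with_default = concat_with_default("a", None, "c")
```
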
0 votes
0 answers

How can I implement the CROSS APPLY function of T-SQL in PySpark

How can I implement the CROSS APPLY function ...READ MORE

May 30, 2022 in Apache Spark by anonymous

edited Mar 4
262 views
0 votes
0 answers

Pyspark: Aggregate and filtering code error

Hi guys, I am a beginner at pyspark ...READ MORE

Apr 22, 2022 in Apache Spark by Saadat

edited Mar 4
252 views
0 votes
0 answers

PySpark: Finding the top three countries with confirmed COVID cases

Hi guys, I am a beginner at PySpark ...READ MORE

Apr 22, 2022 in Apache Spark by Saadat

edited Mar 4
189 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve given input columns

The string Productivity has to be enclosed between single ...READ MORE

Jul 10, 2019 in Apache Spark by Tina
43,604 views
0 votes
0 answers

Scala / SparkSQL dataframes filter issue "data type mismatch"

My problem is I have a code ...READ MORE

Mar 24, 2022 in Apache Spark by Hamza

edited Mar 4
117 views
0 votes
0 answers

Access value in arrays of structs spark scala

Hi, I have a dataset with the ...READ MORE

Mar 24, 2022 in Apache Spark by anonymous

edited Mar 4
115 views
0 votes
1 answer

What will be printed when the below code is executed?

Option a) 443 READ MORE

Mar 8, 2023 in Apache Spark by anonymous

edited Mar 5
2,772 views
0 votes
1 answer

What will be printed when the below code is executed?

List 5 100 10 READ MORE

Feb 7, 2023 in Apache Spark by Subbu

edited Mar 5
1,819 views
0 votes
1 answer

12) Which one of the given flows correctly describes the Spark Streaming Architecture?

C. Data streams divided into batches > ...READ MORE

Jul 3, 2022 in Apache Spark by anonymous

edited Mar 5
4,189 views
+2 votes
2 answers

py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

Using findspark is expected to solve the ...READ MORE

Jun 21, 2020 in Apache Spark by suvasish
23,531 views
0 votes
0 answers

Execute Spark.sql query within withColumn clause in Spark Scala

I have a dataframe which has one ...READ MORE

Sep 14, 2021 in Apache Spark by Pinksrider

edited Mar 4
163 views
0 votes
1 answer

Error: No module named 'findspark'

Hi@akhtar, To import this module in your program, ...READ MORE

May 6, 2020 in Apache Spark by MD
• 95,460 points
21,072 views
0 votes
0 answers

AWS logs are not being written to CloudWatch after certain steps

I have an AWS job which reads ...READ MORE

Jul 30, 2021 in Apache Spark by Anjali

edited Mar 4
140 views
0 votes
1 answer

What is the difference between persist() and cache() in apache spark?

Using the cache technique we can save intermediate ...READ MORE

Dec 27, 2022 in Apache Spark by Deepthi

edited Mar 5
4,000 views
+2 votes
4 answers

use length function in substring in spark

You can use the function expr val data ...READ MORE

May 3, 2018 in Apache Spark by kurt_cobain
• 9,350 points
44,040 views
0 votes
0 answers

Create Hive table using Dataframe getting error

Code: srcDF.write.mode(tblmode).saveAsTable(s"${dbName}.${tgtHiveTableName}") error: 21/06/04 22:11:45 ERROR pa.TrxNbrx: org.apache.spark.SparkException: ...READ MORE

Jun 5, 2021 in Apache Spark by Rajesh

edited Mar 4
130 views
0 votes
0 answers

What parameters are required for a "windowed" operation such as reduceByKeyAndWindow?

a) Window length b) Sliding interval c) Window Length ...READ MORE

Jun 4, 2021 in Apache Spark by anonymous

edited Mar 4
142 views
0 votes
1 answer

How to create dataframe for the comma delimited file?

.option("sep", delimeter) READ MORE

Oct 28, 2022 in Apache Spark by anonymous

edited Mar 5
3,858 views
+1 vote
3 answers

What is the difference between RDDs and DataFrames in Apache Spark?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

Aug 28, 2018 in Apache Spark by shams
• 3,670 points
43,768 views
0 votes
1 answer

What are some of the things you can monitor in the Spark Web UI?

The stages which are running slow READ MORE

Apr 29, 2021 in Apache Spark by anonymous

edited Mar 5
4,224 views
0 votes
1 answer

ImportError: No module named 'pyspark'

Hi@akhtar, By default pyspark is not present in ...READ MORE

May 6, 2020 in Apache Spark by MD
• 95,460 points
16,024 views
0 votes
1 answer

How to select all columns with group by?

Try df.select(df("*")).groupBy("id").agg(sum("salary")) READ MORE

Sep 17, 2021 in Apache Spark by Parimi Pavan

edited Mar 5
14,741 views
+1 vote
1 answer

is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [51, 53, 10, 10]

Hi@akhtar, Here you are trying to read a ...READ MORE

Feb 3, 2020 in Apache Spark by MD
• 95,460 points
19,051 views
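
The byte lists in this error are ASCII codes: [80, 65, 82, 49] is "PAR1", the marker a Parquet file ends with, and [51, 53, 10, 10] is "35" followed by two newlines, which suggests a text file. A minimal sketch for checking a local file's tail before handing it to a Parquet reader follows; the helper names and the sample file contents are made up so that the tail matches the bytes in the error.

```python
import os
import tempfile

PARQUET_MAGIC = b"PAR1"  # bytes [80, 65, 82, 49]

def tail_bytes(path, n=4):
    """Return the last n bytes of a local file."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        handle.seek(max(size - n, 0))
        return handle.read(n)

def looks_like_parquet(path):
    """True if the file ends with the Parquet magic number."""
    return tail_bytes(path) == PARQUET_MAGIC

# Hypothetical text file whose tail reproduces the bytes from the error.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"id,score\n1,35\n\n")
    sample_path = tmp.name

found = list(tail_bytes(sample_path))
is_parquet = looks_like_parquet(sample_path)
os.remove(sample_path)
```

If the check fails, the file is probably in another format (CSV, JSON, plain text) and should be read with the matching reader instead.
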
0 votes
0 answers

Real time Project challenges in Spark Data pipeline

Can anybody highlight some challenges they have ...READ MORE

Apr 6, 2021 in Apache Spark by anonymous

edited Mar 4
144 views
+1 vote
1 answer

Reading a text file through spark data frame

Try this: val df = sc.textFile("HDFS://nameservice1/user/edureka_168049/Structure_IT/samplefile.txt") df.collect() val df = ...READ MORE

Jul 24, 2019 in Apache Spark by Suri
26,807 views
0 votes
1 answer

Why Partitions are immutable in Spark?

Partitions use HDFS API. READ MORE

Aug 25, 2022 in Apache Spark by anonymous

edited Mar 5
2,493 views
0 votes
2 answers

5) Using which one of the given choices will you create an RDD with specific partitioning?

Hi, @Ritu, option b for you, as Hash Partitioning ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,730 points
4,851 views
0 votes
1 answer

The number of stages in a job is equal to the number of RDDs in the DAG. However, under one of the given conditions, the scheduler can truncate the lineage. Identify it.

Hi@Edureka, Spark's internal scheduler may truncate the lineage of the RDD graph ...READ MORE

Nov 26, 2020 in Apache Spark by MD
• 95,460 points
4,484 views