Trending questions in Apache Spark

0 votes
5 answers

How to change the SparkSession configuration in PySpark?

You aren't actually overwriting anything with this ...READ MORE

Dec 14, 2020 in Apache Spark by Gitika
• 65,730 points
127,506 views
0 votes
3 answers

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as follows: var notFollowingList=List(9.8,7,6,3,1) df.filter(col("uid").isin(notFollowingList:_*)) You can ...READ MORE

Jun 6, 2018 in Apache Spark by Shubham
• 13,490 points
93,424 views
+2 votes
14 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

Apr 5, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar 90,526 views
+1 vote
6 answers

groupByKey vs reduceByKey in Apache Spark.

ReduceByKey is the best for production. READ MORE

Mar 3, 2019 in Apache Spark by anonymous
78,097 views
+1 vote
8 answers

How to replace null values in Spark DataFrame?

Hi, in Spark, the fill() function of the DataFrameNaFunctions class is used to replace ...READ MORE

Dec 15, 2020 in Apache Spark by MD
• 95,460 points
76,674 views
+1 vote
8 answers

How to print the contents of RDD in Apache Spark?

Save it to a text file: line.saveAsTextFile("alicia.txt") Print contains ...READ MORE

Dec 10, 2018 in Apache Spark by Akshay
62,907 views
+5 votes
11 answers

Concatenate columns in apache spark dataframe

It's late, but this is how you can ...READ MORE

Mar 21, 2019 in Apache Spark by anonymous
73,478 views
0 votes
0 answers

How to import pyspark in Jupyter Notebook

When I try to import PySpark, I get ...READ MORE

Apr 3, 2023 in Apache Spark by Navyasilpa

edited Mar 5 237 views
0 votes
0 answers

How to import pyspark in Jupyter

I tried to import pyspark in jupyter ...READ MORE

Apr 3, 2023 in Apache Spark by Navyasilpa

edited Mar 5 206 views
+1 vote
2 answers

Spark: Dataframe vs Dataset

Recently, there are two new data abstractions ...READ MORE

Jul 29, 2019 in Apache Spark by Jackie
46,330 views
0 votes
0 answers

How to read a nested avro file format in spark dataframe

The avro file format contains nested data. ...READ MORE

Nov 16, 2022 in Apache Spark by Devang

edited Mar 4 218 views
+1 vote
3 answers

map() vs flatMap() in Spark

Spark map function expresses a one-to-one transformation. ...READ MORE

Jun 17, 2019 in Apache Spark by vishal
• 180 points
39,245 views
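
To make the one-to-one versus one-to-many distinction in the map() vs flatMap() teaser concrete, here is a minimal sketch in plain Python. It only imitates the semantics and does not use Spark itself; the helper names `spark_like_map` and `spark_like_flat_map` are hypothetical and are not part of any Spark API.

```python
from itertools import chain

def spark_like_map(func, records):
    """Hypothetical helper: one output element per input element (one-to-one)."""
    return [func(r) for r in records]

def spark_like_flat_map(func, records):
    """Hypothetical helper: func returns an iterable per element,
    and all of those iterables are flattened into a single list."""
    return list(chain.from_iterable(func(r) for r in records))

lines = ["hello spark", "map vs flatMap"]

# map keeps one (nested) result per input line
mapped = spark_like_map(str.split, lines)

# flatMap flattens the per-line word lists into one list of words
flattened = spark_like_flat_map(str.split, lines)

print(mapped)     # [['hello', 'spark'], ['map', 'vs', 'flatMap']]
print(flattened)  # ['hello', 'spark', 'map', 'vs', 'flatMap']
```

With a real RDD the same split function would be passed to `rdd.map(...)` or `rdd.flatMap(...)`; the difference in output shape is the same as shown here.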
+1 vote
1 answer

Is there any efficient way of dealing null values during concat functionality of pyspark.sql version 2.3.4?

When you concatenate any string with a ...READ MORE

Nov 6, 2019 in Apache Spark by Rishi
41,073 views
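
As background for the concat question above: in Spark SQL, `concat` returns null as soon as any input is null, while `concat_ws` skips null inputs. The sketch below imitates those two behaviours in plain Python, without Spark, and adds a coalesce-style alternative. All three function names are hypothetical stand-ins, and `None` plays the role of SQL null.

```python
from typing import Optional

def concat_like(*values: Optional[str]) -> Optional[str]:
    """Imitates concat: any None input makes the whole result None."""
    if any(v is None for v in values):
        return None
    return "".join(values)

def concat_ws_like(sep: str, *values: Optional[str]) -> str:
    """Imitates concat_ws: None inputs are skipped before joining."""
    return sep.join(v for v in values if v is not None)

def concat_coalesced(*values: Optional[str], default: str = "") -> str:
    """Imitates wrapping each input in coalesce(col, default) before concat."""
    return "".join(default if v is None else v for v in values)

print(concat_like("a", None, "c"))          # None
print(concat_ws_like("-", "a", None, "c"))  # a-c
print(concat_coalesced("a", None, "c"))     # ac
```

In PySpark itself, the usual equivalents would be `concat_ws` with an empty separator, or `coalesce(col, lit(""))` around each column passed to `concat`.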
0 votes
0 answers

How can I implement the CROSS APPLY function of T-SQL in PySpark

How can I implement the CROSS APPLY function ...READ MORE

May 30, 2022 in Apache Spark by anonymous

edited Mar 4 260 views
0 votes
0 answers

Pyspark: Aggregate and filtering code error

Hi guys, I am a beginner at pyspark ...READ MORE

Apr 22, 2022 in Apache Spark by Saadat

edited Mar 4 250 views
0 votes
0 answers

PySpark: Finding the top three countries with confirmed COVID cases

Hi guys, I am a beginner at pyspark ...READ MORE

Apr 22, 2022 in Apache Spark by Saadat

edited Mar 4 187 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve given input columns

The string Productivity has to be enclosed between single ...READ MORE

Jul 10, 2019 in Apache Spark by Tina
43,602 views
0 votes
0 answers

Scala / SparkSQL dataframes filter issue "data type mismatch"

My problem is I have a code ...READ MORE

Mar 24, 2022 in Apache Spark by Hamza

edited Mar 4 115 views
0 votes
0 answers

Access value in arrays of structs spark scala

Hi, I have a dataset with the ...READ MORE

Mar 24, 2022 in Apache Spark by anonymous

edited Mar 4 114 views
0 votes
1 answer

What will be printed when the below code is executed?

Option a) 443 READ MORE

Mar 8, 2023 in Apache Spark by anonymous

edited Mar 5 2,765 views
0 votes
1 answer

What will be printed when the below code is executed ?

List 5 100 10 READ MORE

Feb 7, 2023 in Apache Spark by Subbu

edited Mar 5 1,814 views
0 votes
1 answer

12) Which one of the given flows correctly describes the Spark Streaming architecture?

C.  Data streams divided into batches > ...READ MORE

Jul 3, 2022 in Apache Spark by anonymous

edited Mar 5 4,184 views
+2 votes
2 answers

py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

Using findspark is expected to solve the ...READ MORE

Jun 21, 2020 in Apache Spark by suvasish
23,529 views
0 votes
0 answers

Execute Spark.sql query within withColumn clause in Spark Scala

I have a dataframe which has one ...READ MORE

Sep 14, 2021 in Apache Spark by Pinksrider

edited Mar 4 159 views
0 votes
1 answer

Error: No module named 'findspark'

Hi @akhtar, to import this module in your program, ...READ MORE

May 6, 2020 in Apache Spark by MD
• 95,460 points
21,068 views
0 votes
0 answers

AWS logs are not being written to CloudWatch after certain steps

I have an AWS job which reads ...READ MORE

Jul 30, 2021 in Apache Spark by Anjali

edited Mar 4 139 views
0 votes
1 answer

What is the difference between persist() and cache() in apache spark?

Using the cache technique, we can save intermediate ...READ MORE

Dec 27, 2022 in Apache Spark by Deepthi

edited Mar 5 3,995 views
+2 votes
4 answers

use length function in substring in spark

You can use the function expr val data ...READ MORE

May 3, 2018 in Apache Spark by kurt_cobain
• 9,350 points
44,028 views
0 votes
0 answers

Create Hive table using Dataframe getting error

Code: srcDF.write.mode(tblmode).saveAsTable(s"${dbName}.${tgtHiveTableName}") error: 21/06/04 22:11:45 ERROR pa.TrxNbrx: org.apache.spark.SparkException: ...READ MORE

Jun 5, 2021 in Apache Spark by Rajesh

edited Mar 4 126 views
0 votes
0 answers

What parameters are required for a "windowed" operation such as reduceByKeyAndWindow?

a) Window length b) sliding interval c) Window Length ...READ MORE

Jun 4, 2021 in Apache Spark by anonymous

edited Mar 4 141 views
0 votes
1 answer

How to create dataframe for the comma delimited file?

.option("sep", delimeter) READ MORE

Oct 28, 2022 in Apache Spark by anonymous

edited Mar 5 3,852 views
+1 vote
3 answers

What is the difference between rdd and dataframes in Apache Spark ?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

Aug 28, 2018 in Apache Spark by shams
• 3,670 points
43,759 views
0 votes
1 answer

What are some of the things you can monitor in the Spark Web UI?

The stages which are running slow READ MORE

Apr 29, 2021 in Apache Spark by anonymous

edited Mar 5 4,219 views
0 votes
1 answer

ImportError: No module named 'pyspark'

Hi @akhtar, by default PySpark is not present in ...READ MORE

May 6, 2020 in Apache Spark by MD
• 95,460 points
16,019 views
0 votes
1 answer

How to select all columns with group by?

Try df.select(df("*")).groupBy("id").agg(sum("salary")) READ MORE

Sep 17, 2021 in Apache Spark by Parimi Pavan

edited Mar 5 14,735 views
+1 vote
1 answer

is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [51, 53, 10, 10]

Hi @akhtar, here you are trying to read a ...READ MORE

Feb 3, 2020 in Apache Spark by MD
• 95,460 points
19,044 views
0 votes
0 answers

Real time Project challenges in Spark Data pipeline

Can anybody highlight some challenges they have ...READ MORE

Apr 6, 2021 in Apache Spark by anonymous

edited Mar 4 141 views
+1 vote
1 answer

Reading a text file through spark data frame

Try this: val df = sc.textFile("HDFS://nameservice1/user/edureka_168049/Structure_IT/samplefile.txt") df.collect() val df = ...READ MORE

Jul 24, 2019 in Apache Spark by Suri
26,805 views
0 votes
1 answer

Why are partitions immutable in Spark?

Partitions use HDFS API. READ MORE

Aug 25, 2022 in Apache Spark by anonymous

edited Mar 5 2,487 views
0 votes
2 answers

5) Using which one of the given choices will you create an RDD with specific partitioning?

Hi, @Ritu, option b for you, as Hash Partitioning ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,730 points
4,846 views
0 votes
1 answer

The number of stages in a job is equal to the number of RDDs in the DAG. However, under one of the given conditions, the scheduler can truncate the lineage. Identify it.

Hi @Edureka, Spark's internal scheduler may truncate the lineage of the RDD graph ...READ MORE

Nov 26, 2020 in Apache Spark by MD
• 95,460 points
4,481 views