Trending questions in Apache Spark

0 votes
2 answers

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as follows: var notFollowingList=List(9.8,7,6,3 ...READ MORE

Jun 5, 2018 in Apache Spark by Shubham
• 13,380 points
55,603 views
+5 votes
11 answers

Concatenate columns in apache spark dataframe

It's late, but this is how you can ...READ MORE

Mar 21, 2019 in Apache Spark by anonymous
50,415 views
0 votes
11 answers

How to create new column with function in Spark Dataframe?

val coder: (Int => String) = v ...READ MORE

Apr 4, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar
48,842 views
0 votes
7 answers

How to replace null values in Spark DataFrame?

In Spark 2.x you can directly use ...READ MORE

Mar 28 in Apache Spark by gaurav
40,813 views
0 votes
4 answers

How to change the SparkSession configuration in PySpark?

You can dynamically load properties. First create ...READ MORE

Dec 10, 2018 in Apache Spark by Vini
30,670 views
0 votes
5 answers

groupByKey vs reduceByKey in Apache Spark.

ReduceByKey is the best for production. READ MORE

Mar 3, 2019 in Apache Spark by anonymous
24,466 views
0 votes
1 answer

How to unzip a folder to individual files in HDFS?

Hi, @Amey, You can go through this regarding ...READ MORE

May 26 in Apache Spark by Gitika
• 29,370 points
44 views
0 votes
1 answer

How do I view my public key in Git Bash? Running cat <path> says "No such file or directory".

Hey, @KK, You can fix this issue may be ...READ MORE

May 26 in Apache Spark by Gitika
• 29,370 points
38 views
+1 vote
1 answer

How to read .mp4 (video file) stored at HDFS using pyspark?

Hi@Amey, You can enable WebHDFS to do this ...READ MORE

5 days ago in Apache Spark by MD
• 23,580 points
79 views
0 votes
7 answers

How to print the contents of RDD in Apache Spark?

Save it to a text file: line.saveAsTextFile("alicia.txt") Print contains ...READ MORE

Dec 10, 2018 in Apache Spark by Akshay
25,762 views
0 votes
1 answer

Where can I get the best Spark tutorials for beginners?

Hi@akhtar There are lots of online courses available ...READ MORE

May 14 in Apache Spark by MD
• 23,580 points
71 views
0 votes
1 answer

How to create a not null column in case class in spark

Hi@Deepak, In your test class you passed empid ...READ MORE

May 14 in Apache Spark by MD
• 23,580 points
74 views
0 votes
0 answers

Is there a platform where we can submit a Spark application?

Looking for a platform where we can ...READ MORE

May 12 in Apache Spark by anonymous
• 120 points
67 views
+1 vote
1 answer

How to convert a PySpark DataFrame to a pandas DataFrame?

Hi@akhtar, To convert pyspark dataframe into pandas dataframe, ...READ MORE

May 7 in Apache Spark by MD
• 23,580 points
205 views
+1 vote
1 answer

Optimal column count for ORC and Parquet

Hi@Amey, It depends on your use case. Both ...READ MORE

May 7 in Apache Spark by MD
• 23,580 points
80 views
0 votes
1 answer

Error: No module named 'findspark'

Hi@akhtar, To import this module in your program, ...READ MORE

May 6 in Apache Spark by MD
• 23,580 points
88 views
0 votes
1 answer

ImportError: No module named 'pyspark'

Hi@akhtar, By default pyspark is not present in ...READ MORE

May 6 in Apache Spark by MD
• 23,580 points
75 views
+1 vote
1 answer

py4j.protocol.Py4JError: org.apache.spark.api.python.PythonUtils.getEncryptionEnabled does not exist in the JVM

Hi@akhtar, This error may occur, if you don't ...READ MORE

Apr 7 in Apache Spark by MD
• 23,580 points
1,010 views
0 votes
1 answer

"java.lang.ClassNotFoundException" in Spark on Amazon EMR

Hi@akhtar, In /etc/spark/conf/spark-defaults.conf, append the path of your custom ...READ MORE

Apr 29 in Apache Spark by MD
• 23,580 points
59 views
0 votes
1 answer

error: Caused by: org.apache.spark.SparkException: Failed to execute user defined function.

Hi@akhtar, I think you got this error due to version mismatch ...READ MORE

Apr 22 in Apache Spark by MD
• 23,580 points
76 views
0 votes
1 answer

Spark: Dataframe vs Dataset

Recently, there are two new data abstractions ...READ MORE

Jul 29, 2019 in Apache Spark by Jackie
11,536 views
0 votes
1 answer

How to parse a textFile to csv in pyspark?

Hi, Use the code given below; it will ...READ MORE

Apr 13 in Apache Spark by MD
• 23,580 points
93 views
0 votes
1 answer

env: ‘python’: No such file or directory in pyspark.

Hi@akhtar, This error occurs because your python version ...READ MORE

Apr 7 in Apache Spark by MD
• 23,580 points
197 views
+1 vote
1 answer

Reading a text file through spark data frame

Try this: val df = sc.textFile("HDFS://nameservice1/user/edureka_168049/Structure_IT/samplefile.txt") df.collect() val df = ...READ MORE

Jul 24, 2019 in Apache Spark by Suri
10,885 views
+2 votes
4 answers

use length function in substring in spark

You can use the function expr val data ...READ MORE

May 3, 2018 in Apache Spark by kurt_cobain
• 9,310 points
25,241 views
0 votes
1 answer

Unable to run select query with selected columns on a temp view registered in spark application

from pyspark.sql.types import FloatType fname = [1.0,2.4,3.6,4.2,45.4] df=spark.createDataFrame(fname, ...READ MORE

Mar 28 in Apache Spark by GAURAV
• 140 points
208 views
0 votes
2 answers

map() vs flatMap() in Spark

Spark map function expresses a one-to-one transformation. ...READ MORE

Jun 17, 2019 in Apache Spark by vishal
• 180 points
14,216 views
+1 vote
3 answers

What is the difference between RDDs and DataFrames in Apache Spark?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

Aug 27, 2018 in Apache Spark by shams
• 3,580 points
25,221 views
0 votes
0 answers

How to parse an S3 XML file to find tags using apache spark

How can one parse an S3 XML ...READ MORE

Mar 18 in Apache Spark by anonymous
• 120 points
93 views
0 votes
2 answers

java.lang.StringIndexOutOfBoundsException: String index out of range: 1

When using the Java substring() method, a ...READ MORE

Mar 13 in Apache Spark by evanbung
• 180 points
770 views
+1 vote
1 answer

is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [51, 53, 10, 10]

Hi@akhtar, Here you are trying to read a ...READ MORE

Feb 3 in Apache Spark by MD
• 23,580 points
1,355 views
0 votes
0 answers

One Hot Encoding in Apache Spark

The following code that I wrote for ...READ MORE

Feb 11 in Apache Spark by Manish
• 120 points
347 views
0 votes
1 answer

Difference between map() and mapPartitions() functions in Spark?

Hi@akhtar, Both map() and mapPartitions() are the ...READ MORE

Jan 29 in Apache Spark by MD
• 23,580 points
673 views
0 votes
1 answer

What is the difference between Spark Streaming and Spark Structured Streaming?

Hi@akhtar Generally, Spark Streaming is used for real-time ...READ MORE

Feb 4 in Apache Spark by MD
• 23,580 points
289 views
0 votes
1 answer

How to create multiple producers in Apache Kafka?

Hi@akhtar, To create multiple producers you have to ...READ MORE

Feb 6 in Apache Spark by MD
• 23,580 points
172 views
0 votes
1 answer

Does Spark Streaming provide checkpointing?

Hi@akhtar, Yes, Spark streaming uses checkpoint. Checkpoint is ...READ MORE

Feb 4 in Apache Spark by MD
• 23,580 points
97 views
0 votes
1 answer

Does Spark SQL provide indexing to improve processing speed?

Hi@akhtar, There is no concept of indexing in ...READ MORE

Feb 4 in Apache Spark by MD
• 23,580 points
85 views
0 votes
1 answer

What are Dstreams?

Hi@akhtar, Dstreams are the basic abstraction that is ...READ MORE

Feb 4 in Apache Spark by MD
• 23,580 points
35 views
0 votes
0 answers

Not able to get output in Spark Streaming?

Hi everyone, I tried to count individual words ...READ MORE

Feb 4 in Apache Spark by akhtar
• 10,930 points
67 views
0 votes
1 answer

Cannot create directory /hive/xzxz/_temporary/0. Name node is in safe mode.

Hi@akhtar, Here you are trying to save csv ...READ MORE

Feb 3 in Apache Spark by MD
• 23,580 points
51 views
0 votes
1 answer

Caused by: java.lang.NumberFormatException: Empty String

Hi@akhtar, As we know text files are in ...READ MORE

Jan 31 in Apache Spark by MD
• 23,580 points
162 views
0 votes
0 answers

env : R : No such file or directory

Hi, I tried to set SparkR. But I ...READ MORE

Jan 31 in Apache Spark by Hasid
• 370 points
105 views
0 votes
0 answers

What is PageRank in GraphX?

Hi, I am new to Spark. Can somebody ...READ MORE

Jan 31 in Apache Spark by akhtar
• 10,930 points
97 views
0 votes
0 answers

Error: Package: R-core-devel-3.6.0-1el7.x86_64 (epel) Requires: pcre2-devel

Hi, I am getting this error when try ...READ MORE

Jan 31 in Apache Spark by Hasid
• 370 points
85 views
0 votes
0 answers

Not enough space to cache rdd_80_1 in memory!

Hi everyone, I'm new to Spark. I am working ...READ MORE

Jan 29 in Apache Spark by akhtar
• 10,930 points
116 views