Trending questions in Apache Spark

0 votes
2 answers

Filtering a row in Spark DataFrame based on matching values from a list

Use the function as follows: var notFollowingList=List(9.8,7,6,3 ...READ MORE

Jun 5, 2018 in Apache Spark by Shubham
• 13,450 points
63,654 views
0 votes
12 answers

How to create a new column with a function in a Spark DataFrame?

val coder: (Int => String) = v ...READ MORE

Apr 4, 2019 in Apache Spark by anonymous

edited Apr 5, 2019 by Omkar
56,679 views
0 votes
7 answers

How to replace null values in Spark DataFrame?

In Spark 2.x you can directly use ...READ MORE

Mar 28 in Apache Spark by gaurav
47,618 views
+5 votes
11 answers

Concatenate columns in Apache Spark DataFrame

It's late, but this is how you can ...READ MORE

Mar 21, 2019 in Apache Spark by anonymous
55,860 views
0 votes
4 answers

How to change the SparkSession configuration in PySpark?

You can dynamically load properties. First create ...READ MORE

Dec 10, 2018 in Apache Spark by Vini
39,344 views
0 votes
5 answers

groupByKey vs reduceByKey in Apache Spark.

ReduceByKey is the best for production. READ MORE

Mar 3, 2019 in Apache Spark by anonymous
31,094 views
0 votes
7 answers

How to print the contents of RDD in Apache Spark?

Save it to a text file: line.saveAsTextFile("alicia.txt") Print contains ...READ MORE

Dec 10, 2018 in Apache Spark by Akshay
32,169 views
0 votes
1 answer

Spark: DataFrame vs Dataset

Recently, there are two new data abstractions ...READ MORE

Jul 29, 2019 in Apache Spark by Jackie
18,290 views
0 votes
1 answer

How to insert data into Cassandra table using Spark DataFrame?

Hi@akhtar, You can write the spark dataframe in ...READ MORE

5 hours ago in Apache Spark by MD
• 54,980 points
13 views
0 votes
1 answer

I am not able to run the Apache Spark program on macOS

Hi@Srinath, It seems you didn't set Hadoop for ...READ MORE

9 hours ago in Apache Spark by MD
• 54,980 points
13 views
0 votes
1 answer

How can I get all executors' pending jobs and stages for a particular SparkSession?

Hi@Neha, You can find all the job status ...READ MORE

Aug 19 in Apache Spark by MD
• 54,980 points
67 views
0 votes
1 answer

File not found exception while processing the Spark job in YARN cluster mode with a multinode Hadoop cluster

Hi@Ganendra, I am not sure what's the issue, ...READ MORE

Jul 30 in Apache Spark by MD
• 54,980 points
128 views
0 votes
1 answer

Unable to submit the Spark job in deployment mode - multinode cluster (using Ubuntu machines) with YARN master

Hi@Ganendra, As you said you launched a multinode cluster, ...READ MORE

Jul 29 in Apache Spark by MD
• 54,980 points
105 views
0 votes
0 answers

Unable to get the job status and group ID in a Java Spark standalone program with Databricks

package com.dataguise.test; import java.io.IOException; import java.util.concurrent.CountDownLatch; import java.util.concurrent.TimeUnit; import org.apache.spark.SparkContext; import org.apache.spark.SparkJobInfo; import ...READ MORE

Jul 23 in Apache Spark by kamboj
• 140 points

recategorized Jul 28 by Gitika
144 views
0 votes
2 answers

map() vs flatMap() in Spark

Spark map function expresses a one-to-one transformation. ...READ MORE

Jun 17, 2019 in Apache Spark by vishal
• 180 points
19,416 views
+1 vote
3 answers

What is the difference between RDDs and DataFrames in Apache Spark?

Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE

Aug 27, 2018 in Apache Spark by shams
• 3,630 points
29,704 views
0 votes
1 answer

How to run a Spark job from EC2 to EMR?

Hi, You can follow the below-given steps to ...READ MORE

Jun 25 in Apache Spark by MD
• 54,980 points
270 views
+1 vote
1 answer

Reading a text file through spark data frame

Try this: val df = sc.textFile("HDFS://nameservice1/user/edureka_168049/Structure_IT/samplefile.txt") df.collect() val df = ...READ MORE

Jul 24, 2019 in Apache Spark by Suri
14,767 views
+1 vote
1 answer

How to convert a PySpark DataFrame to a pandas DataFrame?

Hi@akhtar, To convert pyspark dataframe into pandas dataframe, ...READ MORE

May 7 in Apache Spark by MD
• 54,980 points
2,306 views
+2 votes
4 answers

Use the length function in substring in Spark

You can use the function expr val data ...READ MORE

May 3, 2018 in Apache Spark by kurt_cobain
• 9,320 points
28,773 views
0 votes
1 answer

Can the number of executor cores be greater than the total number of Spark tasks?

Hi@Rishi, Yes, it is possible. If executor no. ...READ MORE

Jun 17 in Apache Spark by MD
• 54,980 points
133 views
0 votes
1 answer

Can the number of Spark tasks be greater than the number of executor cores?

Hi@Rishi, Yes, number of spark tasks can be ...READ MORE

Jun 17 in Apache Spark by MD
• 54,980 points
118 views
0 votes
1 answer

After installing Hadoop 3.0.1 I can't access the Spark shell or Hive shell.

Hi@abdul, Hadoop 3.0.1 has lots of new features. ...READ MORE

Jun 16 in Apache Spark by MD
• 54,980 points
126 views
0 votes
1 answer

ImportError: No module named 'pyspark'

Hi@akhtar, By default pyspark is not present in ...READ MORE

May 6 in Apache Spark by MD
• 54,980 points
1,253 views
0 votes
1 answer

Error: No module named 'findspark'

Hi@akhtar, To import this module in your program, ...READ MORE

May 6 in Apache Spark by MD
• 54,980 points
1,086 views
0 votes
1 answer

How to unzip a folder to individual files in HDFS?

Hi, @Amey, You can go through this regarding ...READ MORE

May 26 in Apache Spark by Gitika
• 36,850 points
171 views
0 votes
1 answer

I want to see my public key, but running the cat <path> command in Git Bash says no such file or directory.

Hey, @KK, You can fix this issue may be ...READ MORE

May 26 in Apache Spark by Gitika
• 36,850 points
105 views
+1 vote
1 answer

How to read .mp4 (video file) stored at HDFS using pyspark?

Hi@Amey, You can enable WebHDFS to do this ...READ MORE

May 29 in Apache Spark by MD
• 54,980 points
179 views
0 votes
1 answer

How to create a non-null column in a case class in Spark

Hi@Deepak, In your test class you passed empid ...READ MORE

May 14 in Apache Spark by MD
• 54,980 points
315 views
0 votes
1 answer

Where can I get the best Spark tutorials for beginners?

Hi@akhtar There are lots of online courses available ...READ MORE

May 14 in Apache Spark by MD
• 54,980 points
141 views
0 votes
0 answers

Do we have any platform where we can submit a Spark application?

looking for a platform where we can ...READ MORE

May 12 in Apache Spark by anonymous
• 120 points
115 views
+1 vote
1 answer

Optimal column count for ORC and Parquet

Hi@Amey, It depends on your use case. Both ...READ MORE

May 7 in Apache Spark by MD
• 54,980 points
180 views
0 votes
3 answers

Sorting rows in descending order in Spark SQL

df.orderBy($"col".desc) - this works as well READ MORE

Jul 5 in Apache Spark by Sai
• 150 points
13,319 views
0 votes
1 answer

"java.lang.ClassNotFoundException" in Spark on Amazon EMR

Hi@akhtar, In /etc/spark/conf/spark-defaults.conf, append the path of your custom ...READ MORE

Apr 29 in Apache Spark by MD
• 54,980 points
380 views
0 votes
1 answer

env : R : No such file or directory

Hi@akhtar, I also got this error. I am able to ...READ MORE

Jul 21 in Apache Spark by MD
• 54,980 points
235 views
0 votes
1 answer

Not enough space to cache rdd_80_1 in memory!

Hi@akhtar, Currently, you are running with the default ...READ MORE

Jul 21 in Apache Spark by MD
• 54,980 points
265 views
0 votes
1 answer

What is PageRank in GraphX?

Hi@akhtar, The PageRank algorithm outputs a probability distribution ...READ MORE

Jul 21 in Apache Spark by MD
• 54,980 points
157 views
0 votes
1 answer

error: Caused by: org.apache.spark.SparkException: Failed to execute user defined function.

Hi@akhtar, I think you got this error due to version mismatch ...READ MORE

Apr 22 in Apache Spark by MD
• 54,980 points
334 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve given input columns

The string Productivity has to be enclosed between single ...READ MORE

Jul 10, 2019 in Apache Spark by Tina
12,668 views
0 votes
1 answer

env: ‘python’: No such file or directory in pyspark.

Hi@akhtar, This error occurs because your python version ...READ MORE

Apr 7 in Apache Spark by MD
• 54,980 points
850 views
0 votes
1 answer

How to parse a text file to CSV in PySpark?

Hi, Use the code given below, it will ...READ MORE

Apr 13 in Apache Spark by MD
• 54,980 points
352 views
+1 vote
1 answer

is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [51, 53, 10, 10]

Hi@akhtar, Here you are trying to read a ...READ MORE

Feb 3 in Apache Spark by MD
• 54,980 points
2,855 views
0 votes
1 answer

Unable to run select query with selected columns on a temp view registered in spark application

from pyspark.sql.types import FloatType fname = [1.0,2.4,3.6,4.2,45.4] df=spark.createDataFrame(fname, ...READ MORE

Mar 28 in Apache Spark by GAURAV
• 140 points
412 views
0 votes
2 answers

Error: value split is not a member of org.apache.spark.sql.Row

var d=rdd2col.rdd.map(x=>x.split(",")) or val names=rd ...READ MORE

Aug 5 in Apache Spark by Ramkumar Ramasamy.
3,062 views