Trending questions in Apache Spark

0 votes
1 answer

What class is declared in the code below?

Option D: String class

Nov 26, 2020 in Apache Spark by Gitika
• 65,850 points
219 views
0 votes
1 answer

From the following graph code, which code snippet will return the number of flight routes?

Hey, @Ritu, I am getting an error in your ...

Nov 25, 2020 in Apache Spark by Gitika
• 65,850 points
236 views
0 votes
1 answer

How to create a distance vector in PySpark (Euclidean distance)?

Hi @dani, you can find the Euclidean distance using ...

Oct 16, 2020 in Apache Spark by MD
• 95,360 points
1,941 views
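For reference, the Euclidean distance between two vectors p and q is the square root of the sum of squared component differences. A minimal plain-Python sketch of that formula follows. It is not the PySpark API the answer refers to, and the helper names `euclidean_distance` and `distance_vector` are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Return sqrt(sum((p_i - q_i)^2)) for two equal-length vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_vector(points: List[Sequence[float]], reference: Sequence[float]) -> List[float]:
    """Distance from every point to one reference point (hypothetical helper)."""
    return [euclidean_distance(pt, reference) for pt in points]


print(distance_vector([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)], (0.0, 0.0)))  # [0.0, 5.0, 10.0]
```

In PySpark itself, `pyspark.ml.linalg.Vectors.squared_distance(v1, v2)` returns the squared distance, so taking its square root gives the same value.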
0 votes
0 answers

What is the output of the following code? [closed]

What is the output of the following ...

Nov 25, 2020 in Apache Spark by Edureka
• 200 points
closed Nov 26, 2020 by MD
206 views
0 votes
1 answer

2) What will be printed when the code below is executed?

Hi, @Ritu, List(5,100,10) is printed. The take method returns the first n elements in ...

Nov 23, 2020 in Apache Spark by Gitika
• 65,850 points
196 views
0 votes
1 answer

Spark - how to solve the question below?

Option D, runtime error

Nov 23, 2020 in Apache Spark by Gitika
• 65,850 points
156 views
0 votes
1 answer

How to read Avro Partition Data?

Hi @akhtar, when we try to retrieve the data ...

Nov 4, 2020 in Apache Spark by MD
• 95,360 points
668 views
0 votes
1 answer

How to insert data into Cassandra table using Spark DataFrame?

Hi @akhtar, you can write the Spark DataFrame in ...

Sep 21, 2020 in Apache Spark by MD
• 95,360 points
2,051 views
0 votes
1 answer

How to implement my clustering algorithm in PySpark (without using a ready-made library, for example k-means)?

Hi @dani, as you said you are a beginner ...

Oct 14, 2020 in Apache Spark by MD
• 95,360 points
673 views
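As one illustration of writing a clustering routine by hand instead of calling a library, here is a plain-Python sketch of Lloyd's k-means iteration. Assumptions: 2-D points, the first k points used as initial centroids, and a fixed iteration cap; the function name `kmeans` is hypothetical. In PySpark, the same assign and update steps would typically be expressed as RDD or DataFrame transformations.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[Point]:
    """Lloyd's algorithm: alternate the assignment and centroid-update steps."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids


print(kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], k=2))
# [(0.0, 0.5), (10.0, 10.5)]
```

Seeding with the first k points keeps the example deterministic; real implementations usually use random or k-means++ initialisation.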
0 votes
1 answer

Facing an issue while reading a TSV file in PySpark

Hi @khyati, you are getting this type of output ...

Sep 28, 2020 in Apache Spark by MD
• 95,360 points
1,070 views
+1 vote
1 answer

How to convert a PySpark DataFrame to a pandas DataFrame?

Hi @akhtar, to convert a PySpark DataFrame into a pandas DataFrame, ...

May 7, 2020 in Apache Spark by MD
• 95,360 points
7,052 views
0 votes
1 answer

Ranger KMS - Curl command

Hi @Shllpa, in general, we get the 401 status code ...

Sep 29, 2020 in Apache Spark by MD
• 95,360 points
424 views
0 votes
1 answer

I am not able to run an Apache Spark program on macOS

Hi @Srinath, it seems you didn't set Hadoop for ...

Sep 21, 2020 in Apache Spark by MD
• 95,360 points
729 views
+1 vote
1 answer

is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [51, 53, 10, 10]

Hi @akhtar, here you are trying to read a ...

Feb 3, 2020 in Apache Spark by MD
• 95,360 points
10,543 views
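The numbers in that error are byte values: [80, 65, 82, 49] is ASCII "PAR1", the 4-byte magic number Parquet writes at the start and end of every file, while [51, 53, 10, 10] decodes to "35\n\n", which suggests the file being read is plain text rather than Parquet. A small plain-Python check along these lines can confirm which files in a directory are real Parquet files before Spark reads them. The function name `looks_like_parquet` is hypothetical.

```python
import os

PARQUET_MAGIC = b"PAR1"  # bytes [80, 65, 82, 49]


def looks_like_parquet(path: str) -> bool:
    """True if the file both starts and ends with the 4-byte Parquet magic number."""
    if os.path.getsize(path) < 2 * len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)  # the error message checks these tail bytes
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC


print(list(PARQUET_MAGIC))       # [80, 65, 82, 49]
print(bytes([51, 53, 10, 10]))   # b'35\n\n'
```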
0 votes
1 answer

File not found exception while processing a Spark job in YARN cluster mode on a multi-node Hadoop cluster

Hi @Ganendra, I am not sure what the issue is, ...

Jul 30, 2020 in Apache Spark by MD
• 95,360 points
2,437 views
0 votes
1 answer

How can I get all executors' pending jobs and stages of a particular SparkSession?

Hi @Neha, you can find all the job status ...

Aug 19, 2020 in Apache Spark by MD
• 95,360 points
433 views
0 votes
1 answer

Py4JJavaError: An error occurred while calling o310.csv. : java.net.ConnectException: Call From master/192.168.56.101 to master:9000

Hi @akhtar, I think your HDFS cluster is not ...

May 7, 2020 in Apache Spark by MD
• 95,360 points
4,558 views
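A "Call From ... to master:9000" connection error typically indicates that nothing is accepting connections on that host and port. A quick way to check from the same machine before rerunning the job is a plain TCP probe. Below is a minimal Python sketch; the helper name `is_port_open` is hypothetical, while `master` and `9000` are the host and port taken from the error message.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers connection refused, timeouts and name-resolution failures
        return False


# Example: probe the NameNode address from the error message.
# print(is_port_open("master", 9000))
```

If the probe fails, starting HDFS (for example with `start-dfs.sh`) and confirming that `fs.defaultFS` in core-site.xml points at the same host and port are reasonable next checks.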
0 votes
0 answers

Unable to get the job status and group ID in a Java Spark standalone program with Databricks

package com.dataguise.test; import java.io.IOException; import java.util.concurrent.CountDownLatch; import java.util.concurrent.TimeUnit; import org.apache.spark.SparkContext; import org.apache.spark.SparkJobInfo; import ...

Jul 23, 2020 in Apache Spark by kamboj
• 140 points
recategorized Jul 28, 2020 by Gitika
999 views
0 votes
1 answer

Unable to submit a Spark job in deployment mode on a multi-node cluster (Ubuntu machines) with YARN master

Hi @Ganendra, as you said you launched a multi-node cluster, ...

Jul 29, 2020 in Apache Spark by MD
• 95,360 points
693 views
0 votes
1 answer

How to run a Spark job from EC2 to EMR?

Hi, you can follow the steps given below to ...

Jun 25, 2020 in Apache Spark by MD
• 95,360 points
1,444 views
0 votes
1 answer

org.apache.spark.sql.AnalysisException: cannot resolve "`id`" given input columns

I have used a header-less CSV file ...

Jul 14, 2019 in Apache Spark by Puneet
16,276 views
0 votes
2 answers

Error : split value is not a member of org.apache.spark.sql.Row

var d=rdd2col.rdd.map(x=>x.split(",")) or val names=rd ...

Aug 5, 2020 in Apache Spark by Ramkumar Ramasamy.
7,749 views
0 votes
1 answer

How to create a not-null column in a case class in Spark?

Hi @Deepak, in your test class you passed empid ...

May 14, 2020 in Apache Spark by MD
• 95,360 points
2,618 views
0 votes
1 answer

Can the executor core be greater than the total number of spark tasks?

Hi @Rishi, yes, it is possible. If executor no. ...

Jun 17, 2020 in Apache Spark by MD
• 95,360 points
868 views
0 votes
1 answer

env: ‘python’: No such file or directory in pyspark.

Hi @akhtar, this error occurs because your Python version ...

Apr 7, 2020 in Apache Spark by MD
• 95,360 points
3,799 views
0 votes
1 answer

Can the number of Spark tasks be greater than the number of executor cores?

Hi @Rishi, yes, the number of Spark tasks can be ...

Jun 17, 2020 in Apache Spark by MD
• 95,360 points
386 views
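The relationship behind both of these executor-core questions is simple arithmetic: each executor core runs one task at a time (assuming the default `spark.task.cpus=1`), so a stage with more tasks than available cores runs in several waves, and a stage with fewer tasks leaves some cores idle. A rough plain-Python sketch, ignoring scheduling overhead and data locality; the function names are hypothetical.

```python
import math


def task_slots(num_executors: int, cores_per_executor: int, task_cpus: int = 1) -> int:
    """Number of tasks that can run at the same time across all executors."""
    return num_executors * (cores_per_executor // task_cpus)


def task_waves(num_tasks: int, num_executors: int, cores_per_executor: int,
               task_cpus: int = 1) -> int:
    """How many rounds ("waves") are needed to run every task of one stage."""
    return math.ceil(num_tasks / task_slots(num_executors, cores_per_executor, task_cpus))


print(task_waves(10, 2, 4))  # 10 tasks on 8 slots -> 2
print(task_waves(3, 2, 4))   # 3 tasks on 8 slots -> 1 (5 slots stay idle)
```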
0 votes
1 answer

After installing Hadoop 3.0.1, I can't access the Spark shell or Hive shell.

Hi @abdul, Hadoop 3.0.1 has lots of new features. ...

Jun 16, 2020 in Apache Spark by MD
• 95,360 points
376 views
0 votes
1 answer

How to unzip a folder to individual files in HDFS?

Hi, @Amey, you can go through this regarding ...

May 26, 2020 in Apache Spark by Gitika
• 65,850 points
1,117 views
0 votes
3 answers

Sorting rows in descending order in Spark SQL

df.orderBy($"col".desc) - this works as well

Jul 5, 2020 in Apache Spark by Sai
• 160 points
15,030 views
0 votes
1 answer

"java.lang.ClassNotFoundException" in Spark on Amazon EMR

Hi @akhtar, in /etc/spark/conf/spark-defaults.conf, append the path of your custom ...

Apr 29, 2020 in Apache Spark by MD
• 95,360 points
2,040 views
0 votes
2 answers

java.lang.StringIndexOutOfBoundsException: String index out of range: 1

When using the Java substring() method, a ...

Mar 13, 2020 in Apache Spark by evanbung
• 180 points
4,941 views
+1 vote
1 answer

How to read an .mp4 (video file) stored in HDFS using PySpark?

Hi @Amey, you can enable WebHDFS to do this ...

May 29, 2020 in Apache Spark by MD
• 95,360 points
827 views
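With WebHDFS enabled, a file can be fetched over HTTP through the `op=OPEN` operation of the WebHDFS REST API. A minimal sketch that builds such a URL follows. The host `namenode.example.com` and the path `/videos/sample.mp4` are placeholders, the helper name `webhdfs_open_url` is hypothetical, and port 9870 is assumed as the Hadoop 3 NameNode HTTP default (Hadoop 2 used 50070).

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def webhdfs_open_url(host: str, path: str, port: int = 9870,
                     user: Optional[str] = None) -> str:
    """Build a WebHDFS OPEN URL for an absolute HDFS path."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    params: Dict[str, str] = {"op": "OPEN"}
    if user:
        params["user.name"] = user  # only meaningful with simple (non-Kerberos) auth
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{urlencode(params)}"


url = webhdfs_open_url("namenode.example.com", "/videos/sample.mp4", user="hdfs")
print(url)  # http://namenode.example.com:9870/webhdfs/v1/videos/sample.mp4?op=OPEN&user.name=hdfs
# The NameNode replies with a redirect to a DataNode; urllib.request.urlopen(url)
# follows it, and .read() on the response returns the raw .mp4 bytes.
```

Alternatively, PySpark's `SparkContext.binaryFiles()` returns each file as a (path, bytes) pair without going through WebHDFS.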
+1 vote
1 answer

Optimal column count for ORC and Parquet

Hi @Amey, it depends on your use case. Both ...

May 8, 2020 in Apache Spark by MD
• 95,360 points
1,148 views
0 votes
1 answer

error: Caused by: org.apache.spark.SparkException: Failed to execute user defined function.

Hi @akhtar, I think you got this error due to a version mismatch ...

Apr 22, 2020 in Apache Spark by MD
• 95,360 points
1,847 views
+1 vote
1 answer

How to assign a column in Spark Dataframe (PySpark) as a Primary Key?

Spark does not have any concept of ...

Jan 12, 2020 in Apache Spark by Sirish
• 160 points
6,233 views
0 votes
1 answer

How to parse a textFile to CSV in PySpark?

Hi, use the code given below; it will ...

Apr 13, 2020 in Apache Spark by MD
• 95,360 points
2,242 views
0 votes
1 answer

I want to see my public key by running the cat <path> command in Git Bash, but it says no such file or directory.

Hey, @KK, you can fix this issue, maybe ...

May 26, 2020 in Apache Spark by Gitika
• 65,850 points
269 views
0 votes
1 answer

Not enough space to cache rdd_80_1 in memory!

Hi @akhtar, currently, you are running with the default ...

Jul 22, 2020 in Apache Spark by MD
• 95,360 points
1,182 views
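Under Spark's unified memory manager, the memory shared by execution and storage is roughly (JVM heap - 300 MB reserved) x `spark.memory.fraction` (default 0.6 in Spark 2.x and later). `spark.memory.storageFraction` (default 0.5) of that region is protected for cached blocks. A back-of-the-envelope sketch of this formula in Python; the function names are hypothetical, and the real figure is somewhat lower because the JVM reports slightly less usable heap than `-Xmx`.

```python
RESERVED_MB = 300  # memory Spark sets aside before sizing the unified region


def unified_memory_mb(heap_mb: float, memory_fraction: float = 0.6) -> float:
    """Execution + storage memory shared under the unified memory manager."""
    return (heap_mb - RESERVED_MB) * memory_fraction


def storage_memory_mb(heap_mb: float, memory_fraction: float = 0.6,
                      storage_fraction: float = 0.5) -> float:
    """Part of the unified region protected from eviction for cached blocks."""
    return unified_memory_mb(heap_mb, memory_fraction) * storage_fraction


print(round(unified_memory_mb(1024), 1))  # 434.4 with a 1g heap
print(round(storage_memory_mb(4096), 1))  # 1138.8 with a 4g heap
```

Raising `spark.driver.memory` or `spark.executor.memory`, or caching with a storage level that can spill to disk such as `MEMORY_AND_DISK`, are the usual ways around this warning.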
0 votes
1 answer

Where can I get the best Spark tutorials for beginners?

Hi @akhtar, there are lots of online courses available ...

May 14, 2020 in Apache Spark by MD
• 95,360 points
309 views
0 votes
0 answers

Is there any platform where we can submit a Spark application?

Looking for a platform where we can ...

May 12, 2020 in Apache Spark by anonymous
• 120 points
284 views
0 votes
1 answer

ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING, org.apache.spark.sql.catalyst.errors.package$TreeNodeException

Hi @akhtar, you may resolve this exception by increasing the ...

Apr 29, 2020 in Apache Spark by MD
• 95,360 points
801 views
0 votes
1 answer

Unable to run select query with selected columns on a temp view registered in spark application

from pyspark.sql.types import FloatType fname = [1.0,2.4,3.6,4.2,45.4] df=spark.createDataFrame(fname, ...

Mar 29, 2020 in Apache Spark by GAURAV
• 140 points
2,133 views
0 votes
1 answer

Difference between the map() and mapPartitions() functions in Spark

Hi @akhtar, both map() and mapPartitions() are the ...

Jan 29, 2020 in Apache Spark by MD
• 95,360 points
4,763 views
0 votes
1 answer

env : R : No such file or directory

Hi @akhtar, I also got this error. I am able to ...

Jul 22, 2020 in Apache Spark by MD
• 95,360 points
794 views
0 votes
1 answer

What is PageRank in GraphX?

Hi @akhtar, the PageRank algorithm outputs a probability distribution ...

Jul 22, 2020 in Apache Spark by MD
• 95,360 points
456 views
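As a rough illustration of the idea, here is a plain-Python power-iteration sketch of the standard normalized PageRank formulation, in which the ranks form a probability distribution summing to 1. It is not GraphX's own implementation, whose scaling of the ranks may differ. The damping factor 0.85 (equivalent to GraphX's default reset probability of 0.15), the no-dangling-nodes restriction, and all names are assumptions of this sketch.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power iteration; assumes every node has at least one outgoing link."""
    nodes = list(links)
    n = len(nodes)
    ranks = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}  # random-jump share
        for src, targets in links.items():
            share = damping * ranks[src] / len(targets)
            for dst in targets:
                new[dst] += share  # follow-a-link share
        delta = sum(abs(new[v] - ranks[v]) for v in nodes)
        ranks = new
        if delta < tol:
            break
    return ranks


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print({k: round(v, 3) for k, v in ranks.items()})  # {'a': 0.388, 'b': 0.215, 'c': 0.397}
```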
0 votes
1 answer

Why do we use sc.parallelize?

Spark revolves around the concept of a ...

Jul 11, 2019 in Apache Spark by Suman
12,047 views
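`sc.parallelize()` turns a local, driver-side collection into an RDD by splitting it into `numSlices` partitions. The sketch below approximates that split in plain Python. It assumes the position-based slicing Spark's `ParallelCollectionRDD` applies to ordinary sequences, where partition i covers positions i*n/numSlices up to (i+1)*n/numSlices. Ranges are special-cased in Spark and are not modeled here, and the function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def slice_like_parallelize(data: Sequence[T], num_slices: int) -> List[List[T]]:
    """Split data into num_slices contiguous chunks whose sizes differ by at most one."""
    if num_slices < 1:
        raise ValueError("num_slices must be positive")
    n = len(data)
    return [list(data[(i * n) // num_slices:((i + 1) * n) // num_slices])
            for i in range(num_slices)]


print(slice_like_parallelize(list(range(10)), 3))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]
```

In PySpark the call would be `sc.parallelize(data, 3)`. PySpark batches elements before handing them to the JVM, so per-partition contents can differ slightly from this sketch; `rdd.glom().collect()` shows the actual split.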
–1 vote
0 answers

How to parse an S3 XML file to find tags using apache spark

How can one parse an S3 XML ...

Mar 18, 2020 in Apache Spark by anonymous
• 110 points
1,236 views