Most answered questions in Apache Spark

0 votes
1 answer

From the code below, what is the most appropriate next step in the ML process?

Hi @ritu, the most appropriate step according to me ...

Nov 25, 2020 in Apache Spark by MD
• 95,440 points
913 views
0 votes
1 answer

What are some of the things you can monitor in the Spark Web UI?

Option c) Mapr Jobs that are submitted

Nov 25, 2020 in Apache Spark by Gitika
• 65,910 points
3,239 views
0 votes
1 answer

What does the code below print?

Option d) Runtime error.

Nov 25, 2020 in Apache Spark by Gitika
• 65,910 points
974 views
0 votes
1 answer

Which one of the following commands is used to see the structure of the DataFrame?

Hi @Ritu, if you want to see the ...

Nov 25, 2020 in Apache Spark by Gitika
• 65,910 points
1,943 views
0 votes
1 answer

12) Which one of the given flows correctly describes the Spark Streaming architecture?

Hi @ritu, you need to learn the architecture of ...

Nov 23, 2020 in Apache Spark by MD
• 95,440 points
3,195 views
0 votes
1 answer

Spark - how to solve the question below?

Option d) Runtime error.

Nov 23, 2020 in Apache Spark by Gitika
• 65,910 points
594 views
0 votes
1 answer

7) From a SchemaRDD, data can be cached by which one of the given choices?

Hi @Ritu, according to the official documentation of Spark 1.2, ...

Nov 23, 2020 in Apache Spark by Gitika
• 65,910 points
1,605 views
0 votes
1 answer

6) What allows Spark Streaming to provide fault tolerance for network sources of data?

Hi @ritu, fault tolerance is the property that enables ...

Dec 1, 2020 in Apache Spark by MD
• 95,440 points
2,111 views
0 votes
1 answer

4) Spark Streaming converts streaming data into DStreams. Which one of the given statements about DStreams is true?

Hi @ritu, Spark DStream (Discretized Stream) is the basic ...

Nov 23, 2020 in Apache Spark by MD
• 95,440 points
2,370 views
0 votes
1 answer

2) What will be printed when the code below is executed?

Hi @Ritu, List(5,100,10) is printed. The take method returns the first n elements in ...

Nov 23, 2020 in Apache Spark by Gitika
• 65,910 points
582 views
0 votes
1 answer

1) Given the sfpd RDD, to create a pair RDD consisting of tuples of the form (Category, 1) in Scala, which of the following is used?

Hi @Ritu, when creating a pair RDD from ...

Nov 23, 2020 in Apache Spark by Gitika
• 65,910 points
5,627 views
0 votes
1 answer

How do you load this multiline data in Spark as a single record?

Hi @Ruben, I think you can add an escape ...

Nov 23, 2020 in Apache Spark by MD
• 95,440 points
1,846 views
0 votes
1 answer

How to read Avro Partition Data?

Hi @akhtar, when we try to retrieve the data ...

Nov 4, 2020 in Apache Spark by MD
• 95,440 points
1,563 views
+1 vote
1 answer

How to write Spark DataFrame to Avro Data File?

Hi @akhtar, since the Avro library is external to Spark, ...

Nov 4, 2020 in Apache Spark by MD
• 95,440 points
2,797 views
0 votes
1 answer

How to read a DataFrame based on an Avro schema?

Hi, I am able to understand your requirement. ...

Oct 30, 2020 in Apache Spark by MD
• 95,440 points
2,827 views
0 votes
1 answer

How to create a distance vector in PySpark (Euclidean distance)?

Hi @dani, you can find the Euclidean distance using ...

Oct 16, 2020 in Apache Spark by MD
• 95,440 points
4,009 views
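
The answer preview above is cut off, so the following is not the method it describes. It is a minimal plain-Python sketch of the Euclidean distance itself, assuming each record is a fixed-length list of numbers. The names `euclidean` and `distance_vector` are hypothetical; in PySpark the same per-row logic would still need to be applied through whatever mechanism fits your data (for example a UDF or an RDD `map`).

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_vector(point: Sequence[float],
                    others: Sequence[Sequence[float]]) -> List[float]:
    """Distances from one point to every vector in `others`, in order."""
    return [euclidean(point, other) for other in others]


if __name__ == "__main__":
    origin = [0.0, 0.0]
    points = [[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]]
    print(distance_vector(origin, points))
```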
0 votes
1 answer

How to implement my clustering algorithm in PySpark (without using a ready-made library, for example k-means)?

Hi @dani, as you said you are a beginner ...

Oct 14, 2020 in Apache Spark by MD
• 95,440 points
1,365 views
0 votes
1 answer

Ranger KMS - Curl command

Hi @Shllpa, in general, we get the 401 status code ...

Sep 29, 2020 in Apache Spark by MD
• 95,440 points
1,086 views
0 votes
1 answer

Facing an issue while reading a TSV file in PySpark

Hi @khyati, you are getting this type of output ...

Sep 28, 2020 in Apache Spark by MD
• 95,440 points
2,090 views
0 votes
1 answer

How to insert data into a Cassandra table using a Spark DataFrame?

Hi @akhtar, you can write the Spark DataFrame in ...

Sep 21, 2020 in Apache Spark by MD
• 95,440 points
3,449 views
0 votes
1 answer

I am not able to run the Apache Spark program on macOS

Hi @Srinath, it seems you didn't set Hadoop for ...

Sep 21, 2020 in Apache Spark by MD
• 95,440 points
1,148 views
0 votes
1 answer

How can I get all executors' pending jobs and stages of a particular SparkSession?

Hi @Neha, you can find all the job status ...

Aug 19, 2020 in Apache Spark by MD
• 95,440 points
976 views
0 votes
1 answer

File not found exception while processing the Spark job in YARN cluster mode with a multinode Hadoop cluster

Hi @Ganendra, I am not sure what the issue is, ...

Jul 30, 2020 in Apache Spark by MD
• 95,440 points
4,107 views
0 votes
1 answer

Unable to submit the Spark job in deployment mode - multinode cluster (using Ubuntu machines) with YARN master

Hi @Ganendra, as you said you launched a multinode cluster, ...

Jul 29, 2020 in Apache Spark by MD
• 95,440 points
1,777 views
0 votes
1 answer

How to run a Spark job from EC2 to EMR?

Hi, you can follow the steps given below to ...

Jun 25, 2020 in Apache Spark by MD
• 95,440 points
2,182 views
0 votes
1 answer

Can the number of Spark tasks be greater than the number of executor cores?

Hi @Rishi, yes, the number of Spark tasks can be ...

Jun 17, 2020 in Apache Spark by MD
• 95,440 points
1,642 views
0 votes
1 answer

Can the number of executor cores be greater than the total number of Spark tasks?

Hi @Rishi, yes, it is possible. If executor no. ...

Jun 17, 2020 in Apache Spark by MD
• 95,440 points
1,849 views
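
The two questions above both come down to how many tasks can run at once. Here is a minimal plain-Python sketch of that arithmetic, under a simplified model: each executor runs `cores_per_executor // cpus_per_task` tasks concurrently, and tasks are picked up in rounds ("waves"). The function names are hypothetical and this is not Spark's scheduler code; `cpus_per_task` stands in for the `spark.task.cpus` setting.

```python
import math


def concurrent_slots(num_executors: int, cores_per_executor: int,
                     cpus_per_task: int = 1) -> int:
    """Number of tasks that can run at the same time across all executors."""
    if num_executors <= 0 or cores_per_executor <= 0 or cpus_per_task <= 0:
        raise ValueError("all arguments must be positive")
    return num_executors * (cores_per_executor // cpus_per_task)


def task_waves(num_tasks: int, num_executors: int, cores_per_executor: int,
               cpus_per_task: int = 1) -> int:
    """Rounds needed to run num_tasks if every slot is refilled in lockstep."""
    slots = concurrent_slots(num_executors, cores_per_executor, cpus_per_task)
    if slots == 0:
        raise ValueError("no task fits on an executor with these settings")
    return math.ceil(num_tasks / slots)


def idle_slots(num_tasks: int, num_executors: int, cores_per_executor: int,
               cpus_per_task: int = 1) -> int:
    """Slots left unused when there are fewer tasks than slots."""
    slots = concurrent_slots(num_executors, cores_per_executor, cpus_per_task)
    return max(slots - num_tasks, 0)


if __name__ == "__main__":
    # More tasks than cores: 10 tasks on 2 executors x 2 cores queue up in waves.
    print(task_waves(10, 2, 2))
    # More cores than tasks: 3 tasks on 2 executors x 4 cores leave slots idle.
    print(idle_slots(3, 2, 4))
```

In this model, more tasks than cores simply means the extra tasks wait for a free slot, and more cores than tasks means some cores sit idle for that stage.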
0 votes
1 answer

After installing Hadoop 3.0.1 I can't access the Spark shell or Hive shell.

Hi @abdul, Hadoop 3.0.1 has lots of new features. ...

Jun 16, 2020 in Apache Spark by MD
• 95,440 points
848 views
0 votes
1 answer

How to unzip a folder to individual files in HDFS?

Hi @Amey, you can go through this regarding ...

May 26, 2020 in Apache Spark by Gitika
• 65,910 points
2,361 views
0 votes
1 answer

I want to see my public key, but running the cat <path> command in Git Bash says "No such file or directory".

Hey @KK, you can fix this issue may be ...

May 26, 2020 in Apache Spark by Gitika
• 65,910 points
614 views
0 votes
1 answer

Where can I get the best Spark tutorials for beginners?

Hi @akhtar, there are lots of online courses available ...

May 14, 2020 in Apache Spark by MD
• 95,440 points
591 views
0 votes
1 answer

How to create a NOT NULL column in a case class in Spark?

Hi @Deepak, in your test class you passed empid ...

May 14, 2020 in Apache Spark by MD
• 95,440 points
4,576 views
+1 vote
1 answer

How to read an .mp4 (video file) stored in HDFS using PySpark?

Hi @Amey, you can enable WebHDFS to do this ...

May 29, 2020 in Apache Spark by MD
• 95,440 points
1,630 views
+1 vote
1 answer

Optimal column count for ORC and Parquet

Hi @Amey, it depends on your use case. Both ...

May 8, 2020 in Apache Spark by MD
• 95,440 points
2,178 views
0 votes
1 answer

Py4JJavaError: An error occurred while calling o310.csv. : java.net.ConnectException: Call From master/192.168.56.101 to master:9000

Hi @akhtar, I think your HDFS cluster is not ...

May 7, 2020 in Apache Spark by MD
• 95,440 points
6,975 views
+1 vote
1 answer

How to convert a PySpark DataFrame to a pandas DataFrame?

Hi @akhtar, to convert a PySpark DataFrame into a pandas DataFrame, ...

May 7, 2020 in Apache Spark by MD
• 95,440 points
7,960 views
0 votes
1 answer

Error: No module named 'findspark'

Hi @akhtar, to import this module in your program, ...

May 6, 2020 in Apache Spark by MD
• 95,440 points
19,804 views
0 votes
1 answer

ImportError: No module named 'pyspark'

Hi @akhtar, by default pyspark is not present in ...

May 6, 2020 in Apache Spark by MD
• 95,440 points
14,975 views
0 votes
1 answer

ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING, org.apache.spark.sql.catalyst.errors.package$TreeNodeException

Hi @akhtar, you may resolve this exception by increasing the ...

Apr 29, 2020 in Apache Spark by MD
• 95,440 points
1,680 views
0 votes
1 answer

"java.lang.ClassNotFoundException" in Spark on Amazon EMR

Hi @akhtar, in /etc/spark/conf/spark-defaults.conf, append the path of your custom ...

Apr 29, 2020 in Apache Spark by MD
• 95,440 points
3,306 views
0 votes
1 answer

error: Caused by: org.apache.spark.SparkException: Failed to execute user defined function.

Hi @akhtar, I think you got this error due to a version mismatch ...

Apr 22, 2020 in Apache Spark by MD
• 95,440 points
3,748 views
0 votes
1 answer

How to parse a textFile to CSV in PySpark?

Hi, use the code given below; it will ...

Apr 13, 2020 in Apache Spark by MD
• 95,440 points
3,678 views
0 votes
1 answer

env: ‘python’: No such file or directory in PySpark.

Hi @akhtar, this error occurs because your Python version ...

Apr 7, 2020 in Apache Spark by MD
• 95,440 points
5,946 views
0 votes
1 answer

Unable to run a select query with selected columns on a temp view registered in a Spark application

from pyspark.sql.types import FloatType fname = [1.0,2.4,3.6,4.2,45.4] df=spark.createDataFrame(fname, ...

Mar 29, 2020 in Apache Spark by GAURAV
• 140 points
3,257 views
0 votes
1 answer

How to create multiple producers in Apache Kafka?

Hi @akhtar, to create multiple producers you have to ...

Feb 6, 2020 in Apache Spark by MD
• 95,440 points
3,626 views