Most answered questions in Apache Spark

0 votes
1 answer

From the code below, what is the most appropriate next step in the ML process?

Hi@ritu, The most appropriate step according to me ...READ MORE

Nov 25, 2020 in Apache Spark by MD
• 95,440 points
911 views
0 votes
1 answer

What are some of the things you can monitor in the Spark Web UI?

Option c) Mapr Jobs that are submitted READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,910 points
3,233 views
0 votes
1 answer

What does the code below print?

Option d) Run time error. READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,910 points
974 views
0 votes
1 answer

Which one of the following commands is used to see the structure of the DataFrame?

Hi @Ritu If you want to see the ...READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,910 points
1,943 views
0 votes
1 answer

12) Which one of the given flows correctly describes the Spark Streaming architecture?

Hi@ritu, You need to learn the Architecture of ...READ MORE

Nov 23, 2020 in Apache Spark by MD
• 95,440 points
3,191 views
0 votes
1 answer

Spark - how to solve the question below?

Option d) Runtime error. READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,910 points
593 views
0 votes
1 answer

7) From a SchemaRDD, data can be cached by which one of the given choices?

Hi, @Ritu, According to the official documentation of Spark 1.2, ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,910 points
1,605 views
0 votes
1 answer

6) What allows Spark Streaming to provide fault tolerance for network sources of data?

Hi@ritu, Fault tolerance is the property that enables ...READ MORE

Dec 1, 2020 in Apache Spark by MD
• 95,440 points
2,108 views
0 votes
1 answer

4) Spark Streaming converts streaming data into DStreams. Which one of the given statements about DStreams is true?

Hi@ritu, Spark DStream (Discretized Stream) is the basic ...READ MORE

Nov 23, 2020 in Apache Spark by MD
• 95,440 points
2,366 views
0 votes
1 answer

2) What will be printed when the code below is executed?

Hi, @Ritu, List(5,100,10) is printed. The take method returns the first n elements in ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,910 points
581 views
0 votes
1 answer

1) Given the sfpd RDD, to create a pair RDD consisting of tuples of the form (Category, 1) in Scala, which of the following is used?

Hi, @Ritu, When creating a pair RDD from ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,910 points
5,626 views
0 votes
1 answer

How do you load this multiline data in spark as a single record?

Hi@Ruben, I think you can add an escape ...READ MORE

Nov 23, 2020 in Apache Spark by MD
• 95,440 points
1,844 views
0 votes
1 answer

How to read Avro Partition Data?

Hi@akhtar, When we try to retrieve the data ...READ MORE

Nov 4, 2020 in Apache Spark by MD
• 95,440 points
1,560 views
+1 vote
1 answer

How to write Spark DataFrame to Avro Data File?

Hi@akhtar, Since Avro library is external to Spark, ...READ MORE

Nov 4, 2020 in Apache Spark by MD
• 95,440 points
2,794 views
0 votes
1 answer

How to read a dataframe based on an avro schema?

Hi, I am able to understand your requirement. ...READ MORE

Oct 30, 2020 in Apache Spark by MD
• 95,440 points
2,824 views
0 votes
1 answer

How to create a distance vector in PySpark (Euclidean distance)?

Hi@dani, You can find the euclidean distance using ...READ MORE

Oct 16, 2020 in Apache Spark by MD
• 95,440 points
4,002 views
0 votes
1 answer

How to implement my own clustering algorithm in PySpark (without using a ready-made library such as k-means)?

Hi@dani, As you said you are a beginner ...READ MORE

Oct 14, 2020 in Apache Spark by MD
• 95,440 points
1,359 views
0 votes
1 answer

Ranger KMS - Curl command

Hi@Shllpa, In general, we get the 401 status code ...READ MORE

Sep 29, 2020 in Apache Spark by MD
• 95,440 points
1,082 views
0 votes
1 answer

Facing an issue while reading a TSV file in PySpark

Hi@khyati, You are getting this type of output ...READ MORE

Sep 28, 2020 in Apache Spark by MD
• 95,440 points
2,088 views
0 votes
1 answer

How to insert data into Cassandra table using Spark DataFrame?

Hi@akhtar, You can write the spark dataframe in ...READ MORE

Sep 21, 2020 in Apache Spark by MD
• 95,440 points
3,445 views
0 votes
1 answer

I am not able to run the Apache Spark program on macOS

Hi@Srinath, It seems you didn't set Hadoop for ...READ MORE

Sep 21, 2020 in Apache Spark by MD
• 95,440 points
1,147 views
0 votes
1 answer

How can I get all executors' pending jobs and stages for a particular SparkSession?

Hi@Neha, You can find all the job status ...READ MORE

Aug 19, 2020 in Apache Spark by MD
• 95,440 points
973 views
0 votes
1 answer

File not found exception while processing the spark job in yarn cluster mode with multinode hadoop cluster

Hi@Ganendra, I am not sure what's the issue, ...READ MORE

Jul 30, 2020 in Apache Spark by MD
• 95,440 points
4,106 views
0 votes
1 answer

Unable to submit the spark job in deployment mode - multinode cluster(using ubuntu machines) with yarn master

Hi@Ganendra, As you said you launched a multinode cluster, ...READ MORE

Jul 29, 2020 in Apache Spark by MD
• 95,440 points
1,776 views
0 votes
1 answer

How to run a Spark job from EC2 to EMR?

Hi, You can follow the below-given steps to ...READ MORE

Jun 25, 2020 in Apache Spark by MD
• 95,440 points
2,181 views
0 votes
1 answer

Can the number of Spark tasks be greater than the number of executor cores?

Hi@Rishi, Yes, the number of Spark tasks can be ...READ MORE

Jun 17, 2020 in Apache Spark by MD
• 95,440 points
1,641 views
0 votes
1 answer

Can the number of executor cores be greater than the total number of Spark tasks?

Hi@Rishi, Yes, it is possible. If executor no. ...READ MORE

Jun 17, 2020 in Apache Spark by MD
• 95,440 points
1,849 views
0 votes
1 answer

After installing Hadoop 3.0.1, I can't access the Spark shell or the Hive shell.

Hi@abdul, Hadoop 3.0.1 has lots of new features. ...READ MORE

Jun 16, 2020 in Apache Spark by MD
• 95,440 points
847 views
0 votes
1 answer

How to unzip a folder to individual files in HDFS?

Hi, @Amey, You can go through this regarding ...READ MORE

May 26, 2020 in Apache Spark by Gitika
• 65,910 points
2,354 views
0 votes
1 answer

I want to see my public key, but running the cat <path> command in Git Bash says "no such file or directory".

Hey, @KK, You can fix this issue may be ...READ MORE

May 26, 2020 in Apache Spark by Gitika
• 65,910 points
614 views
0 votes
1 answer

Where can I get the best Spark tutorials for beginners?

Hi@akhtar There are lots of online courses available ...READ MORE

May 14, 2020 in Apache Spark by MD
• 95,440 points
591 views
0 votes
1 answer

How to create a not-null column in a case class in Spark?

Hi@Deepak, In your test class you passed empid ...READ MORE

May 14, 2020 in Apache Spark by MD
• 95,440 points
4,574 views
+1 vote
1 answer

How to read .mp4 (video file) stored at HDFS using pyspark?

Hi@Amey, You can enable WebHDFS to do this ...READ MORE

May 29, 2020 in Apache Spark by MD
• 95,440 points
1,629 views
+1 vote
1 answer

Optimal column count for ORC and Parquet

Hi@Amey, It depends on your use case. Both ...READ MORE

May 8, 2020 in Apache Spark by MD
• 95,440 points
2,176 views
0 votes
1 answer

Py4JJavaError: An error occurred while calling o310.csv. : java.net.ConnectException: Call From master/192.168.56.101 to master:9000

Hi@akhtar, I think your HDFS cluster is not ...READ MORE

May 7, 2020 in Apache Spark by MD
• 95,440 points
6,973 views
+1 vote
1 answer

How to convert a PySpark DataFrame to a pandas DataFrame?

Hi@akhtar, To convert a PySpark DataFrame into a pandas DataFrame, ...READ MORE

May 7, 2020 in Apache Spark by MD
• 95,440 points
7,957 views
0 votes
1 answer

Error: No module named 'findspark'

Hi@akhtar, To import this module in your program, ...READ MORE

May 6, 2020 in Apache Spark by MD
• 95,440 points
19,802 views
0 votes
1 answer

ImportError: No module named 'pyspark'

Hi@akhtar, By default pyspark is not present in ...READ MORE

May 6, 2020 in Apache Spark by MD
• 95,440 points
14,970 views
0 votes
1 answer

ERROR thriftserver.SparkExecuteStatementOperation: Error executing query, currentState RUNNING, org.apache.spark.sql.catalyst.errors.package$TreeNodeException

Hi@akhtar, You may resolve this exception, by increasing the ...READ MORE

Apr 29, 2020 in Apache Spark by MD
• 95,440 points
1,679 views
0 votes
1 answer

"java.lang.ClassNotFoundException" in Spark on Amazon EMR

Hi@akhtar, In /etc/spark/conf/spark-defaults.conf, append the path of your custom ...READ MORE

Apr 29, 2020 in Apache Spark by MD
• 95,440 points
3,301 views
0 votes
1 answer

error: Caused by: org.apache.spark.SparkException: Failed to execute user defined function.

Hi@akhtar, I think you got this error due to version mismatch ...READ MORE

Apr 22, 2020 in Apache Spark by MD
• 95,440 points
3,744 views
0 votes
1 answer

How to parse a text file to CSV in PySpark?

Hi, Use this below given code, it will ...READ MORE

Apr 13, 2020 in Apache Spark by MD
• 95,440 points
3,676 views
0 votes
1 answer

env: ‘python’: No such file or directory in pyspark.

Hi@akhtar, This error occurs because your python version ...READ MORE

Apr 7, 2020 in Apache Spark by MD
• 95,440 points
5,943 views
0 votes
1 answer

Unable to run select query with selected columns on a temp view registered in spark application

from pyspark.sql.types import FloatType fname = [1.0,2.4,3.6,4.2,45.4] df=spark.createDataFrame(fname, ...READ MORE

Mar 29, 2020 in Apache Spark by GAURAV
• 140 points
3,256 views
0 votes
1 answer

How to create multiple producers in Apache Kafka?

Hi@akhtar, To create multiple producers you have to ...READ MORE

Feb 6, 2020 in Apache Spark by MD
• 95,440 points
3,624 views