Most viewed questions in Apache Spark

0 votes
2 answers

5) Using which one of the given choices will you create an RDD with specific partitioning?

Hi, @Ritu, option b for you, as Hash Partitioning ...READ MORE

Nov 23, 2020 in Apache Spark by Gitika
• 65,910 points
3,657 views
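
The answer teaser above names hash partitioning. As a rough illustration of the idea only, here is a minimal pure-Python sketch that does not use Spark: each key is sent to the partition given by its hash modulo the number of partitions. The helper names `partition_for` and `hash_partition` are hypothetical and are not part of any Spark API. In PySpark, `rdd.partitionBy(numPartitions)` on a pair RDD applies a hash-based partitioner by default.

```python
# Pure-Python sketch of hash partitioning (no Spark required).
# Assumption: a key's partition is hash(key) modulo the partition count.

def partition_for(key, num_partitions):
    """Return the partition index for a key (hypothetical helper)."""
    # Python's % yields a non-negative result for a positive modulus,
    # so negative hashes still map to a valid partition index.
    return hash(key) % num_partitions


def hash_partition(pairs, num_partitions):
    """Group (key, value) pairs into num_partitions buckets by key hash."""
    buckets = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        buckets[partition_for(key, num_partitions)].append((key, value))
    return buckets


# Integer keys are used because their hash is stable across Python runs.
pairs = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
buckets = hash_partition(pairs, 2)
for index, bucket in enumerate(buckets):
    print(index, bucket)
```

Because equal keys always land in the same bucket, later key-based operations can find all values for a key in one partition.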
+1 vote
1 answer

Spark: java.io.FileNotFoundException

Hello, From the error I get that the ...READ MORE

Dec 13, 2019 in Apache Spark by Alexandru
• 510 points
3,650 views
+1 vote
1 answer

_spark_metadata/0 doesn't exist while Compacting batch 9 Structured streaming error

Please check https://kb.databricks.com/streaming/file-sink-str ...READ MORE

Nov 20, 2019 in Apache Spark by anonymous
3,645 views
0 votes
1 answer

How to create multiple producers in Apache Kafka?

Hi@akhtar, To create multiple producers you have to ...READ MORE

Feb 6, 2020 in Apache Spark by MD
• 95,440 points
3,639 views
0 votes
1 answer

How is SparkSQL different from HQL and SQL?

Hi, SparkSQL is a special component on the ...READ MORE

Jul 3, 2019 in Apache Spark by Gitika
• 65,910 points
3,467 views
0 votes
1 answer

What is the difference between Spark Streaming and Spark Structured Streaming?

Hi@akhtar Generally, Spark Streaming is used for real time ...READ MORE

Feb 4, 2020 in Apache Spark by MD
• 95,440 points
3,464 views
0 votes
1 answer

How to insert data into Cassandra table using Spark DataFrame?

Hi@akhtar, You can write the spark dataframe in ...READ MORE

Sep 21, 2020 in Apache Spark by MD
• 95,440 points
3,454 views
0 votes
1 answer

Copy file from local to HDFS from the Spark job in YARN mode

Refer to the below code: import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.FileSystem import ...READ MORE

Jul 24, 2019 in Apache Spark by Yogi
3,436 views
0 votes
1 answer

The number of stages in a job is equal to the number of RDDs in the DAG. However, under one of the given conditions, the scheduler can truncate the lineage. Identify it.

Hi@Edureka, Spark's internal scheduler may truncate the lineage of the RDD graph ...READ MORE

Nov 26, 2020 in Apache Spark by MD
• 95,440 points
3,431 views
+1 vote
1 answer

Scala: Convert text file data into ORC format using data frame

Converting text file to Orc: Using Spark, the ...READ MORE

Aug 1, 2019 in Apache Spark by Esha
3,355 views
0 votes
1 answer

What is the difference between persist() and cache() in Apache Spark?

Hi, persist() allows the user to specify ...READ MORE

Jul 3, 2019 in Apache Spark by Gitika
• 65,910 points
3,342 views
0 votes
1 answer

How to save RDD in Apache Spark?

Hey, There are a few methods provided by the ...READ MORE

Jul 23, 2019 in Apache Spark by Gitika
• 65,910 points
3,317 views
0 votes
1 answer

"java.lang.ClassNotFoundException" in Spark on Amazon EMR

Hi@akhtar, In /etc/spark/conf/spark-defaults.conf, append the path of your custom ...READ MORE

Apr 29, 2020 in Apache Spark by MD
• 95,440 points
3,310 views
0 votes
1 answer

How to add package com.databricks.spark.avro in spark?

Start spark shell using below line of ...READ MORE

Jul 23, 2019 in Apache Spark by Ritu
3,297 views
0 votes
1 answer

What is Spark Core?

It is not like a CPU to ...READ MORE

Mar 8, 2019 in Apache Spark by Raj
3,273 views
0 votes
1 answer

Unable to run select query with selected columns on a temp view registered in spark application

from pyspark.sql.types import FloatType fname = [1.0,2.4,3.6,4.2,45.4] df=spark.createDataFrame(fname, ...READ MORE

Mar 29, 2020 in Apache Spark by GAURAV
• 140 points
3,265 views
0 votes
1 answer

What are some of the things you can monitor in the Spark Web UI?

Option c) Mapr Jobs that are submitted READ MORE

Nov 25, 2020 in Apache Spark by Gitika
• 65,910 points
3,250 views
0 votes
1 answer

load/save text file in spark

The reason you are able to load ...READ MORE

Jul 22, 2019 in Apache Spark by Giri
3,207 views
+1 vote
1 answer

How to convert JSON file to AVRO file and vice versa

Try including the package while starting the ...READ MORE

Aug 26, 2019 in Apache Spark by Karan
3,205 views
0 votes
1 answer

12) Which one of the given flows correctly describes the Spark Streaming Architecture?

Hi@ritu, You need to learn the Architecture of ...READ MORE

Nov 23, 2020 in Apache Spark by MD
• 95,440 points
3,202 views
0 votes
1 answer

How can I compare the elements of the RDD using MapReduce?

You have to use the comparison operator ...READ MORE

May 24, 2018 in Apache Spark by Shubham
• 13,490 points
3,175 views
0 votes
1 answer

How to get Spark SQL configuration?

First create a Spark session like this: val ...READ MORE

Mar 18, 2019 in Apache Spark by John
3,171 views
0 votes
1 answer

SparkContext.addFile() not able to update file.

Spark by default won't let you overwrite ...READ MORE

Mar 10, 2019 in Apache Spark by Siri
3,168 views
0 votes
1 answer

How to get ID of a map task in Spark?

You can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...READ MORE

Nov 20, 2018 in Apache Spark by Frankie
• 9,830 points
3,104 views
0 votes
1 answer

Disable Web UI for Spark Application

You can disable it like this: val sc ...READ MORE

Mar 6, 2019 in Apache Spark by Rohit
3,042 views
0 votes
1 answer

Cache tables in Apache Spark SQL

Caching the tables puts the whole table ...READ MORE

May 4, 2018 in Apache Spark by Data_Nerd
• 2,390 points
3,026 views
0 votes
1 answer

How to create dataframe for the comma delimited file?

Refer to the below command used: val df ...READ MORE

Jul 5, 2019 in Apache Spark by karan
2,994 views
0 votes
1 answer

Scala: error: value unary_+ is not a member of (Int, Int)

All prefix operators' symbols are predefined: +, -, ...READ MORE

Jul 22, 2019 in Apache Spark by karan
2,972 views
0 votes
1 answer

error: reassignment to val

Hi, This error is generated only when you ...READ MORE

Jul 5, 2019 in Apache Spark by Gitika
• 65,910 points
2,940 views
0 votes
1 answer

What is Spark UI and how to monitor a Spark job?

Hey, Jobs: to view all the Spark jobs. Stages: ...READ MORE

Aug 6, 2019 in Apache Spark by Gitika
• 65,910 points
2,904 views
0 votes
1 answer

How to change committer algorithm version in Spark?

To change to version 2, run the ...READ MORE

Mar 10, 2019 in Apache Spark by Siri
2,891 views
0 votes
1 answer

How to read a dataframe based on an avro schema?

Hi, I am able to understand your requirement. ...READ MORE

Oct 30, 2020 in Apache Spark by MD
• 95,440 points
2,835 views
0 votes
1 answer

Error : split value is not a member of org.apache.spark.sql.Row

spark.read.csv is used when loading into a ...READ MORE

Jul 22, 2019 in Apache Spark by Firoz
2,811 views
0 votes
1 answer

Scala: 30: error: value partitions is not a member of String

Try this code: val rdd = sc.textFile("file.txt", 5) rdd.partitions.size Output ...READ MORE

Jul 29, 2019 in Apache Spark by Nijit
2,807 views
+1 vote
1 answer

How to write Spark DataFrame to Avro Data File?

Hi@akhtar, Since Avro library is external to Spark, ...READ MORE

Nov 4, 2020 in Apache Spark by MD
• 95,440 points
2,803 views
0 votes
1 answer

How to start spark history server?

Hey, You can use this command to start ...READ MORE

Jul 25, 2019 in Apache Spark by Gitika
• 65,910 points
2,782 views
0 votes
1 answer

Apache Spark, usage of yield.

Yield is used in sequence comprehensions. It is ...READ MORE

Feb 22, 2019 in Apache Spark by Saruj
2,749 views
0 votes
1 answer

What is Piping in Spark?

Hi, Spark provides a pipe() method on RDDs. ...READ MORE

Jul 3, 2019 in Apache Spark by Gitika
• 65,910 points
2,720 views
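
The teaser above mentions the `pipe()` method on RDDs, which sends the elements of each partition to an external process and turns that process's output lines into the resulting elements. As a rough illustration only, the following pure-Python sketch mimics that flow for a single partition without Spark. The helper `pipe_partition` and the upper-casing command are hypothetical and are not part of Spark.

```python
import subprocess
import sys


def pipe_partition(elements, command):
    """Write elements to the command's stdin, one per line, and
    return the lines the command prints on stdout (hypothetical helper)."""
    result = subprocess.run(
        command,
        input="\n".join(str(e) for e in elements) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the external command fails
    )
    return result.stdout.splitlines()


# Hypothetical external command: a small Python one-liner that upper-cases
# every input line. Any executable that reads stdin would work the same way.
upper_cmd = [
    sys.executable,
    "-c",
    "import sys\nfor line in sys.stdin: print(line.strip().upper())",
]

piped = pipe_partition(["spark", "pipe", "demo"], upper_cmd)
print(piped)
```

The external program only sees text lines, so elements are converted to strings on the way in and come back as strings on the way out.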
+1 vote
0 answers

How to create a list of RDDs (or RDD of RDDs, if possible) from a single JavaRDD<List<Integer>> in Java?

Hi, I have the input RDD as a ...READ MORE

Jan 11, 2020 in Apache Spark by itsroops
• 130 points
2,679 views
0 votes
1 answer

How can I minimize data transfers when working with Spark?

Minimizing data transfers and avoiding shuffling helps ...READ MORE

Sep 19, 2018 in Apache Spark by zombie
• 3,790 points
2,666 views
0 votes
1 answer

Error while reading multiline Json

peopleDF: org.apache.spark.sql.DataFrame = [_corrupt_record: string] The above that ...READ MORE

May 23, 2019 in Apache Spark by Conny
2,643 views
0 votes
1 answer

How do spark extra listeners work?

Yes. You can use extra listeners by setting ...READ MORE

Feb 24, 2019 in Apache Spark by Rishi
2,640 views
0 votes
1 answer

How to enable Spark event logging?

To make Spark store the event logs, ...READ MORE

Mar 6, 2019 in Apache Spark by Rohit
2,630 views
0 votes
1 answer

Getting error while connecting zookeeper in Kafka - Spark Streaming integration

I guess you need to provide this kafka.bootstrap.servers ...READ MORE

May 24, 2018 in Apache Spark by Shubham
• 13,490 points
2,607 views
+1 vote
1 answer

By default, how many partitions are created in an RDD in Apache Spark?

Well, it depends on the block of ...READ MORE

Aug 2, 2019 in Apache Spark by Gitika
• 65,910 points
2,555 views
0 votes
1 answer

Why is collect in SparkR slow?

It's not the collect() that is slow. ...READ MORE

May 3, 2018 in Apache Spark by Data_Nerd
• 2,390 points
2,553 views
0 votes
0 answers

One Hot Encoding in Apache Spark

The following code that I wrote for ...READ MORE

Feb 11, 2020 in Apache Spark by Manish
• 120 points
2,527 views
0 votes
1 answer