Most voted questions in Apache Spark

0 votes
1 answer

Different components of the Spark ecosystem

Spark has various components: Spark SQL (Shark), for ...

Jun 4, 2018 in Apache Spark by kurt_cobain
• 9,260 points
85 views
0 votes
1 answer

Parquet File

Parquet is a columnar file format supported ...

Jun 4, 2018 in Apache Spark by Data_Nerd
• 2,360 points
109 views
0 votes
1 answer

Is Hadoop mandatory for Spark?

No, it is not mandatory, but there is no ...

Jun 1, 2018 in Apache Spark by kurt_cobain
• 9,260 points
46 views
0 votes
1 answer

How does partitioning work in Spark?

By default, a partition is created for ...

May 31, 2018 in Apache Spark by nitinrawat895
• 10,730 points
70 views
0 votes
6 answers

How to replace null values in Spark DataFrame?

Hi, I hope this will help for ...

Feb 5 in Apache Spark by Srinivasreddy
• 140 points
23,586 views
0 votes
1 answer

How to import the dependencies of Spark MLlib into an Eclipse project?

I would recommend you create & build ...

May 31, 2018 in Apache Spark by Shubham
• 13,310 points
324 views
0 votes
1 answer

Spark Machine Learning pipeline works fine in Spark 1.6, but gives an error when executed on Spark 2.x?

You need to change the following: val pipeline ...

May 31, 2018 in Apache Spark by Shubham
• 13,310 points
107 views
0 votes
1 answer

What is Spark Piping?

Spark provides a pipe() method on RDDs. ...

May 31, 2018 in Apache Spark by kurt_cobain
• 9,260 points
575 views
0 votes
1 answer

Akka in Spark

Spark uses Akka basically for scheduling. All ...

May 31, 2018 in Apache Spark by Data_Nerd
• 2,360 points
332 views
0 votes
1 answer

How to convert an RDD object to a DataFrame in Spark?

SQLContext has a number of createDataFrame methods ...

May 30, 2018 in Apache Spark by nitinrawat895
• 10,730 points
1,574 views
0 votes
1 answer

Is there any way to uncache an RDD?

An RDD can be uncached using unpersist(). So, use ...

May 30, 2018 in Apache Spark by nitinrawat895
• 10,730 points
143 views
0 votes
1 answer

How can I write a text file to HDFS, not from an RDD, in a Spark program?

Yes, you can go ahead and write ...

May 29, 2018 in Apache Spark by Shubham
• 13,310 points
1,564 views
0 votes
4 answers

How to change the Spark session configuration in PySpark?

You can dynamically load properties. First create ...

Dec 10, 2018 in Apache Spark by Vini
16,471 views
0 votes
1 answer

What do the parameters in local[a,b,c] mean?

SparkContext.createTaskScheduler parses the master parameter. Local: 1 ...

May 29, 2018 in Apache Spark by Shubham
• 13,310 points
100 views
0 votes
1 answer

How to save a Spark RDD to HDFS and retrieve it?

You can save the RDD using the saveAsObjectFile and saveAsTextFile methods. ...

May 29, 2018 in Apache Spark by Shubham
• 13,310 points
2,519 views
0 votes
1 answer

Is it possible to run Spark and Mesos along with Hadoop?

Yes, it is possible to run Spark ...

May 29, 2018 in Apache Spark by Data_Nerd
• 2,360 points
66 views
0 votes
1 answer

What is Sliding Window?

Sliding Window controls transmission of data packets ...

May 28, 2018 in Apache Spark by nitinrawat895
• 10,730 points
32 views
0 votes
1 answer

What is new in Spark 2.3?

Here are the changes in the new version ...

May 28, 2018 in Apache Spark by kurt_cobain
• 9,260 points
70 views
0 votes
1 answer

What is the difference between Apache Spark SQLContext and HiveContext?

Spark 2.0+: Spark 2.0 provides native window functions ...

May 25, 2018 in Apache Spark by nitinrawat895
• 10,730 points
2,126 views
0 votes
1 answer

How to find max value in pair RDD?

Use the Array.maxBy method: val a = Array(("a",1), ("b",2), ...

May 25, 2018 in Apache Spark by nitinrawat895
• 10,730 points
2,591 views
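
The maxBy idea from the answer above can be tried locally without a cluster. What follows is a minimal plain-Python sketch, not Spark code. The first two sample pairs come from the teaser; the third pair is hypothetical, and `max_by_value` is an illustrative helper name, not part of any Spark API.

```python
from typing import Iterable, Tuple


def max_by_value(pairs: Iterable[Tuple[str, int]]) -> Tuple[str, int]:
    """Return the (key, value) pair that has the largest value."""
    # The key= function plays the role of the selector given to Scala's maxBy,
    # here picking the second element of each pair.
    return max(pairs, key=lambda kv: kv[1])


# Sample data: ("a", 1) and ("b", 2) appear in the teaser; ("c", 3) is made up.
pairs = [("a", 1), ("b", 2), ("c", 3)]
best = max_by_value(pairs)
print(best)
```

On a real PySpark pair RDD, `rdd.max(key=lambda kv: kv[1])` should behave the same way, since `RDD.max` accepts an optional key function.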
0 votes
1 answer

How can I compare the elements of the RDD using MapReduce?

You have to use the comparison operator ...

May 24, 2018 in Apache Spark by Shubham
• 13,310 points
453 views
0 votes
3 answers

How to transpose Spark DataFrame?

Please check the links mentioned below for ...

Dec 31, 2018 in Apache Spark by anonymous
6,642 views
0 votes
1 answer

Getting an error while connecting to ZooKeeper in Kafka - Spark Streaming integration

I guess you need to provide this kafka.bootstrap.servers ...

May 24, 2018 in Apache Spark by Shubham
• 13,310 points
698 views
0 votes
1 answer

Can anyone explain what an RDD is in Spark?

RDD is a fundamental data structure of ...

May 24, 2018 in Apache Spark by Shubham
• 13,310 points
636 views
0 votes
1 answer

How to set keys & access tokens for Twitter Spark streaming?

Either you have to create a Twitter4j.properties ...

May 24, 2018 in Apache Spark by Shubham
• 13,310 points
169 views
0 votes
2 answers

In a Spark DataFrame, how can I flatten the struct?

// Collect data from input avro file ...

Jul 4 in Apache Spark by Dhara dhruve
1,203 views
0 votes
1 answer

Is it better to have one large parquet file or lots of smaller parquet files?

Ideally, you would use snappy compression (default) ...

May 23, 2018 in Apache Spark by nitinrawat895
• 10,730 points
2,529 views
0 votes
1 answer

What's the difference between 'filter' and 'where' in Spark SQL?

Both 'filter' and 'where' in Spark SQL ...

May 23, 2018 in Apache Spark by nitinrawat895
• 10,730 points
7,594 views
0 votes
1 answer

Why does sortBy transformation trigger a Spark job?

Actually, sortBy/sortByKey depends on RangePartitioner (JVM). So ...

May 8, 2018 in Apache Spark by kurt_cobain
• 9,260 points
154 views
0 votes
1 answer

How to get the number of elements in a partition?

rdd.mapPartitions(iter => Array(iter.size).iterator, true) This command will ...

May 8, 2018 in Apache Spark by kurt_cobain
• 9,260 points
240 views
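
The per-partition counting pattern in the answer above can be illustrated locally. This is a plain-Python sketch under stated assumptions, not Spark code: partitions are modelled as ordinary lists, and `map_partitions` and `count_elements` are hypothetical helper names that stand in for the RDD method and the function passed to it.

```python
from typing import Callable, Iterable, Iterator, List


def count_elements(partition: Iterator[int]) -> Iterator[int]:
    """Yield a single number: how many elements the partition iterator held."""
    yield sum(1 for _ in partition)


def map_partitions(
    partitions: List[List[int]],
    func: Callable[[Iterator[int]], Iterable[int]],
) -> List[List[int]]:
    """Local stand-in for mapPartitions: call func once per partition iterator."""
    return [list(func(iter(p))) for p in partitions]


# Hypothetical data split into three partitions, one of them empty.
partitions = [[1, 2, 3], [4, 5], []]
sizes = [n for part in map_partitions(partitions, count_elements) for n in part]
print(sizes)
```

With a real PySpark RDD, the same `count_elements` function could be passed as `rdd.mapPartitions(count_elements).collect()`, which should return one count per partition.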
0 votes
1 answer

How is Apache Spark different from the Hadoop approach?

In Hadoop MapReduce the input data is ...

May 7, 2018 in Apache Spark by BD Master
99 views
0 votes
1 answer

start-master and start-all?

sbin/start-master.sh : Starts a master instance on ...

May 7, 2018 in Apache Spark by kurt_cobain
• 9,260 points
149 views
0 votes
1 answer

Cache tables in Apache Spark SQL

Caching the tables puts the whole table ...

May 4, 2018 in Apache Spark by Data_Nerd
• 2,360 points
865 views
0 votes
1 answer

Spark Monitoring with Ganglia

Ganglia looks like a good option for ...

May 4, 2018 in Apache Spark by kurt_cobain
• 9,260 points
269 views
0 votes
1 answer

Spark cannot access local file anymore?

By default, it will access HDFS. ...

May 3, 2018 in Apache Spark by kurt_cobain
• 9,260 points
47 views
0 votes
1 answer

Can I read a CSV represented as a string into Apache Spark?

You can use the following command. This ...

May 3, 2018 in Apache Spark by kurt_cobain
• 9,260 points
68 views
0 votes
1 answer

Why is collect in SparkR slow?

It's not the collect() that is slow. ...

May 3, 2018 in Apache Spark by Data_Nerd
• 2,360 points
107 views
0 votes
1 answer

When not to use foreachPartition and mapPartitions?

With mapPartitions() or foreachPartition(), you can only ...

Apr 30, 2018 in Apache Spark by Data_Nerd
• 2,360 points
2,524 views
0 votes
1 answer

Why is Spark faster than Hadoop MapReduce?

Firstly, it's the in-memory computation: if the file ...

Apr 30, 2018 in Apache Spark by shams
• 3,580 points
156 views
0 votes
1 answer

How to get Spark dataset metadata?

There are a bunch of functions that ...

Apr 26, 2018 in Apache Spark by kurt_cobain
• 9,260 points
463 views
0 votes
1 answer

Difference between createOrReplaceTempView and registerTempTable

createOrReplaceTempView() creates/replaces a local temp view with the DataFrame provided. The lifetime of this ...

Apr 25, 2018 in Apache Spark by kurt_cobain
• 9,260 points
2,315 views
0 votes
1 answer

Kill a running Spark application

You can copy the application ID from ...

Apr 25, 2018 in Apache Spark by kurt_cobain
• 9,260 points
199 views
0 votes
1 answer

How to stop messages from being displayed on spark console?

In your log4j.properties file you need to ...

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,260 points
1,340 views
0 votes
1 answer

Efficient way to read specific columns from a Parquet file in Spark

As Parquet is a column-based storage ...

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,260 points
1,423 views
0 votes
1 answer

reduceByKey or reduceByKeyLocally: which should be preferred?

Yes, they both merge the values using ...

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,260 points
618 views
0 votes
1 answer

Which query to use for better performance, join in SQL or using Dataset API?

DataFrames and Spark SQL performed about the ...

Apr 19, 2018 in Apache Spark by kurt_cobain
• 9,260 points
133 views
0 votes
1 answer

Is there any way to check the Spark version?

There are 2 ways to check the ...

Apr 19, 2018 in Apache Spark by nitinrawat895
• 10,730 points
1,460 views
0 votes
1 answer

Changing column position in a Spark DataFrame

Yes, you can reorder the DataFrame elements. You need ...

Apr 19, 2018 in Apache Spark by Ashish
• 2,630 points
4,946 views
–1 vote
1 answer

Deciding the number of SparkContext objects

How many SparkContext objects you should ...

Jan 16 in Apache Spark by Omkar
• 67,660 points
45 views
–1 vote
1 answer

Not able to use sc in Spark shell

It seems like the master and worker are not ...

Jan 3 in Apache Spark by Omkar
• 67,660 points
177 views