Trending questions in Apache Spark

0 votes
1 answer

Difference between Spark ML & Spark MLlib package

org.apache.spark.mllib is the old Spark API while ...

Jul 5, 2018 in Apache Spark by Shubham
• 13,480 points
927 views
0 votes
2 answers

Parquet Files Advantages

Parquet is a columnar format supported by ...

Jul 3, 2018 in Apache Spark by zombie
• 3,790 points
934 views
0 votes
1 answer

Efficient way to read specific columns from a Parquet file in Spark

As Parquet is a column-based storage ...

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,390 points
3,888 views
0 votes
1 answer

How to stop messages from being displayed on spark console?

In your log4j.properties file you need to ...

Apr 24, 2018 in Apache Spark by kurt_cobain
• 9,390 points
3,672 views
0 votes
1 answer

Spark streaming with Kafka dependency error

Your error is with the version of ...

Jul 5, 2018 in Apache Spark by Shubham
• 13,480 points
554 views
0 votes
2 answers

map() and flatMap()

map(): Return a new distributed dataset formed by ...

Jul 3, 2018 in Apache Spark by zombie
• 3,790 points
316 views
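
To make the difference concrete, here is a minimal sketch in plain Python rather than Spark itself. The helpers `map_like` and `flat_map_like` are hypothetical names that only imitate the semantics of `map()` and `flatMap()` on an ordinary list: the first yields one output per input, the second flattens each returned sequence into a single result.

```python
from itertools import chain

# Hypothetical stand-ins for RDD.map and RDD.flatMap, working on plain lists.
def map_like(func, items):
    # One output element per input element.
    return [func(x) for x in items]

def flat_map_like(func, items):
    # func returns a sequence per element; the sequences are flattened.
    return list(chain.from_iterable(func(x) for x in items))

lines = ["hello world", "apache spark"]
mapped = map_like(str.split, lines)
flattened = flat_map_like(str.split, lines)

print(mapped)     # [['hello', 'world'], ['apache', 'spark']]
print(flattened)  # ['hello', 'world', 'apache', 'spark']
```

With real Spark the same shape should apply: `map` would give a dataset of word lists, while `flatMap` would give a dataset of individual words.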
0 votes
1 answer

How does an RDD persist data in Spark?

There are two methods to persist the ...

Jun 18, 2018 in Apache Spark by nitinrawat895
• 11,380 points
707 views
0 votes
1 answer

Persistence Levels in Spark

Spark has various persistence levels to store ...

Jun 8, 2018 in Apache Spark by kurt_cobain
• 9,390 points
1,096 views
0 votes
1 answer

How can I compare the elements of the RDD using MapReduce?

You have to use the comparison operator ...

May 24, 2018 in Apache Spark by Shubham
• 13,480 points
1,722 views
0 votes
1 answer

Getting error while connecting to ZooKeeper in Kafka - Spark Streaming integration

I guess you need to provide this kafka.bootstrap.servers ...

May 24, 2018 in Apache Spark by Shubham
• 13,480 points
1,643 views
0 votes
1 answer

Which is better in terms of speed, Shark or Spark?

Spark is a framework for distributed data ...

Jun 25, 2018 in Apache Spark by nitinrawat895
• 11,380 points
173 views
0 votes
1 answer

Minimizing Data Transfers in Spark

Minimizing data transfers and avoiding shuffling helps ...

Jun 19, 2018 in Apache Spark by Data_Nerd
• 2,390 points
403 views
0 votes
1 answer

What is Spark Piping?

Spark provides a pipe() method on RDDs. ...

May 31, 2018 in Apache Spark by kurt_cobain
• 9,390 points
1,159 views
0 votes
1 answer

Spark Driver roles

A Spark driver (aka an application’s driver ...

Jun 21, 2018 in Apache Spark by Ashish
• 2,650 points
187 views
0 votes
1 answer

Spark standalone client mode

spark-submit --class org.apache.spark.examples.SparkPi --deploy-mode client --master spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT ...

Jun 20, 2018 in Apache Spark by Ashish
• 2,650 points
196 views
0 votes
1 answer

How to import the dependencies of Spark MLlib into eclipse project?

I would recommend you create & build ...

May 31, 2018 in Apache Spark by Shubham
• 13,480 points
962 views
+1 vote
1 answer

Can anyone explain what is RDD in Spark?

RDD is a fundamental data structure of ...

May 24, 2018 in Apache Spark by Shubham
• 13,480 points
1,188 views
0 votes
1 answer

Cache tables in Apache Spark SQL

Caching the tables puts the whole table ...

May 4, 2018 in Apache Spark by Data_Nerd
• 2,390 points
2,030 views
0 votes
1 answer

Is it mandatory to start Hadoop to run spark application?

No, it is not mandatory, but there ...

Jun 14, 2018 in Apache Spark by nitinrawat895
• 11,380 points
231 views
0 votes
1 answer

Akka in Spark

Spark uses Akka basically for scheduling. All ...

May 31, 2018 in Apache Spark by Data_Nerd
• 2,390 points
785 views
0 votes
1 answer

Kafka Feature

Here are some of the important features of ...

Jun 7, 2018 in Apache Spark by Data_Nerd
• 2,390 points
397 views
0 votes
1 answer

Is there any way to uncache RDD?

An RDD can be uncached using unpersist(). So, use ...

May 30, 2018 in Apache Spark by nitinrawat895
• 11,380 points
699 views
0 votes
1 answer

Convert the given Spark RDD object to a Spark DataFrame

You can create a DataFrame from the ...

Jun 5, 2018 in Apache Spark by Shubham
• 13,480 points
336 views
0 votes
1 answer

What is Shark?

Shark is a tool, developed for people ...

Jun 8, 2018 in Apache Spark by kurt_cobain
• 9,390 points
187 views
0 votes
1 answer

How to get Spark dataset metadata?

There are a bunch of functions that ...

Apr 26, 2018 in Apache Spark by kurt_cobain
• 9,390 points
2,009 views
0 votes
1 answer

SQLInterpreter in Spark

SQL Interpreter & Optimizer handles the functional ...

Jun 7, 2018 in Apache Spark by kurt_cobain
• 9,390 points
186 views
0 votes
1 answer

Parquet File

Parquet is a columnar format file supported ...

Jun 4, 2018 in Apache Spark by Data_Nerd
• 2,390 points
300 views
0 votes
1 answer

Different Spark Ecosystem

Spark has various components: Spark SQL (Shark) - for ...

Jun 4, 2018 in Apache Spark by kurt_cobain
• 9,390 points
215 views
0 votes
1 answer

Spark Machine Learning pipeline works fine in Spark 1.6, but it gives error when executed on Spark 2.x?

You need to change the following: val pipeline ...

May 31, 2018 in Apache Spark by Shubham
• 13,480 points
353 views
0 votes
1 answer

How does partitioning work in Spark?

By default a partition is created for ...

May 31, 2018 in Apache Spark by nitinrawat895
• 11,380 points
321 views
0 votes
1 answer

Hadoop mandatory for Spark?

No, not mandatory, but there is no ...

Jun 1, 2018 in Apache Spark by kurt_cobain
• 9,390 points
172 views
0 votes
1 answer

How to set keys & access tokens for Twitter Spark streaming?

Either you have to create a Twitter4j.properties ...

May 24, 2018 in Apache Spark by Shubham
• 13,480 points
465 views
0 votes
1 answer

Is it possible to run Spark and Mesos along with Hadoop?

Yes, it is possible to run Spark ...

May 29, 2018 in Apache Spark by Data_Nerd
• 2,390 points
251 views
0 votes
1 answer

What do the parameters in local[a,b,c] mean?

SparkContext.createTaskScheduler property parses the master parameter Local: 1 ...

May 29, 2018 in Apache Spark by Shubham
• 13,480 points
220 views
0 votes
1 answer

What is Sliding Window?

Sliding Window controls transmission of data packets ...

May 28, 2018 in Apache Spark by nitinrawat895
• 11,380 points
217 views
0 votes
1 answer

Spark 2.3? What is new in it?

Here are the changes in new version ...

May 28, 2018 in Apache Spark by kurt_cobain
• 9,390 points
203 views
0 votes
1 answer

start-master and start-all?

sbin/start-master.sh : Starts a master instance on ...

May 7, 2018 in Apache Spark by kurt_cobain
• 9,390 points
958 views
0 votes
1 answer

How to get the number of elements in partition?

rdd.mapPartitions(iter => Array(iter.size).iterator, true) This command will ...

May 8, 2018 in Apache Spark by kurt_cobain
• 9,390 points
751 views
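
The idea behind the `mapPartitions` one-liner above can be sketched in plain Python, without Spark. Here partitions are modelled as a list of lists, and `map_partitions` and `count_elements` are hypothetical helper names introduced only for illustration: the function receives an iterator over one partition and emits a single count for it.

```python
# Hypothetical stand-in for RDD.mapPartitions: func sees one whole
# partition (as an iterator) at a time and returns an iterable of results.
def map_partitions(func, partitions):
    results = []
    for part in partitions:
        results.extend(func(iter(part)))
    return results

def count_elements(iterator):
    # Emit exactly one value per partition: its element count.
    yield sum(1 for _ in iterator)

# Assumed layout: three partitions, the last one empty.
partitions = [[1, 2, 3], [4, 5], []]
sizes = map_partitions(count_elements, partitions)

print(sizes)  # [3, 2, 0]
```

The result has one entry per partition, so an empty partition shows up as 0 rather than being dropped.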
0 votes
1 answer

Why does sortBy transformation trigger a Spark job?

Actually, sortBy/sortByKey depends on RangePartitioner (JVM). So ...

May 8, 2018 in Apache Spark by kurt_cobain
• 9,390 points
567 views
0 votes
1 answer

Spark Monitoring with Ganglia

Ganglia looks like a good option for ...

May 4, 2018 in Apache Spark by kurt_cobain
• 9,390 points
729 views
0 votes
1 answer

reduceByKey or reduceByKeyLocally, which should be preferred?

Yes, they both merge the values using ...

Apr 20, 2018 in Apache Spark by kurt_cobain
• 9,390 points
1,193 views
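
As a rough illustration of what "merging the values" by key means, here is a plain-Python sketch, not Spark code. The helpers `reduce_partition` and `reduce_by_key` are hypothetical names: values are first combined inside each partition and the partial results are then merged, which is why the supplied function should be associative. In Spark itself, `reduceByKey` returns a distributed RDD, while `reduceByKeyLocally` returns the merged result to the driver as a map; this sketch only mirrors the merge logic.

```python
from operator import add

def reduce_partition(func, pairs):
    # Combine values that share a key within a single partition.
    acc = {}
    for key, value in pairs:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc

def reduce_by_key(func, partitions):
    # Merge the per-partition partial results into one dictionary.
    merged = {}
    for part in partitions:
        for key, value in reduce_partition(func, part).items():
            merged[key] = func(merged[key], value) if key in merged else value
    return merged

# Assumed input: key-value pairs spread over two partitions.
partitions = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5)]]
totals = reduce_by_key(add, partitions)

print(totals)  # {'a': 4, 'b': 6, 'c': 5}
```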
0 votes
1 answer

Why is collect in SparkR slow?

It's not the collect() that is slow. ...

May 3, 2018 in Apache Spark by Data_Nerd
• 2,390 points
628 views
0 votes
1 answer

Can I read a CSV represented as a string into Apache Spark?

You can use the following command. This ...

May 3, 2018 in Apache Spark by kurt_cobain
• 9,390 points
606 views
0 votes
1 answer

How is Apache Spark different from the Hadoop approach?

In Hadoop MapReduce the input data is ...

May 7, 2018 in Apache Spark by BD Master
311 views
0 votes
1 answer

Spark Kill Running Application

You can copy the application ID from ...

Apr 25, 2018 in Apache Spark by kurt_cobain
• 9,390 points
799 views
0 votes
1 answer

Why is Spark faster than Hadoop MapReduce?

Firstly, it's the in-memory computation; if the file ...

Apr 30, 2018 in Apache Spark by shams
• 3,660 points
565 views
0 votes
1 answer

Spark cannot access local file anymore?

By default it will access the HDFS. ...

May 3, 2018 in Apache Spark by kurt_cobain
• 9,390 points
152 views
0 votes
1 answer

Which query to use for better performance, join in SQL or using Dataset API?

DataFrames and Spark SQL performed about the ...

Apr 19, 2018 in Apache Spark by kurt_cobain
• 9,390 points
654 views