org.apache.spark.mllib is the old Spark API while ...READ MORE
Parquet is a columnar format supported by ...READ MORE
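As a quick hedged sketch, assuming a SparkSession named spark and placeholder paths, reading and writing Parquet looks like:

val df = spark.read.parquet("hdfs:///data/in.parquet")  // columnar read: only the needed columns are scanned
df.write.parquet("hdfs:///data/out.parquet")            // write the result back in Parquet format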
There are a bunch of functions that ...READ MORE
Mainly, we use SparkConf because we need ...READ MORE
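For illustration, a minimal sketch of the usual setup, where the app name and master URL are placeholder assumptions:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("MyApp")    // placeholder application name
  .setMaster("local[*]")  // placeholder master; use your cluster URL in practice
val sc = new SparkContext(conf)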
You have to use the comparison operator ...READ MORE
Your error is with the version of ...READ MORE
I guess you need to provide the kafka.bootstrap.servers ...READ MORE
Sliding Window controls transmission of data packets ...READ MORE
RDD is a fundamental data structure of ...READ MORE
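A minimal sketch of creating one, assuming an existing SparkContext named sc:

val rdd = sc.parallelize(Seq(1, 2, 3, 4))  // distribute a local collection across the cluster
println(rdd.count())                       // an action that triggers the actual computation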
map(): Return a new distributed dataset formed by ...READ MORE
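For example, a small sketch assuming a SparkContext named sc:

val doubled = sc.parallelize(Seq(1, 2, 3)).map(_ * 2)  // transformation: lazily builds a new RDD
doubled.collect()                                      // Array(2, 4, 6)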
Here are some of the important features of ...READ MORE
Minimizing data transfers and avoiding shuffling helps ...READ MORE
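One common way to avoid a shuffle is broadcasting a small lookup table instead of joining; a sketch assuming a SparkContext named sc and an invented lookup map:

val lookup = sc.broadcast(Map(1 -> "a", 2 -> "b"))      // shipped to each executor once
val tagged = sc.parallelize(Seq(1, 2, 3))
  .map(k => (k, lookup.value.getOrElse(k, "unknown")))  // map-side join, no shuffle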
There are two methods to persist the ...READ MORE
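A sketch of both, assuming a SparkContext named sc and a placeholder path; note a storage level cannot be changed once assigned:

import org.apache.spark.storage.StorageLevel

val rdd = sc.textFile("hdfs:///data/input")  // placeholder path
rdd.cache()                                  // shorthand for persist(StorageLevel.MEMORY_ONLY)
// or instead (choose one per RDD):
// rdd.persist(StorageLevel.MEMORY_AND_DISK) // spill partitions to disk when memory is full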
Spark provides a pipe() method on RDDs. ...READ MORE
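A hedged sketch, assuming a SparkContext named sc and that the external command rev is available on every worker:

val piped = sc.parallelize(Seq("abc", "def"))
  .pipe("rev")   // each element is written to the command's stdin, one line each
piped.collect()  // Array(cba, fed)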
Caching the tables puts the whole table ...READ MORE
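For instance, with a SparkSession named spark and a hypothetical table name:

spark.catalog.cacheTable("people")    // pull the whole table into executor memory
spark.catalog.uncacheTable("people")  // release the memory when done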
Spark uses Akka basically for scheduling. All ...READ MORE
Spark is a framework for distributed data ...READ MORE
I would recommend you create & build ...READ MORE
A Spark driver (aka an application’s driver ...READ MORE
spark-submit \ --class org.apache.spark.examples.SparkPi \ --deploy-mode client \ --master spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT ...READ MORE
It's not the collect() that is slow. ...READ MORE
An RDD can be uncached using unpersist(). So use ...READ MORE
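A small sketch, assuming a SparkContext named sc and a placeholder path:

val rdd = sc.textFile("hdfs:///data/input")  // placeholder path
rdd.cache()      // mark the RDD for caching
rdd.count()      // action that actually materializes the cache
rdd.unpersist()  // drop the cached blocks from memory and disk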
No, it is not mandatory, but there ...READ MORE
Either you have to create a Twitter4j.properties ...READ MORE
You can create a DataFrame from the ...READ MORE
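As a hedged sketch of one common variant, creating a DataFrame from a local Seq of case classes, assuming a SparkSession named spark:

case class Person(name: String, age: Int)

import spark.implicits._
val df = Seq(Person("Ann", 30), Person("Bob", 25)).toDF()  // schema inferred from the case class
df.show()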
Shark is a tool, developed for people ...READ MORE
You can use the following command. This ...READ MORE
sbin/start-master.sh : Starts a master instance on ...READ MORE
Parquet is a columnar file format supported ...READ MORE
rdd.mapPartitions(iter => Array(iter.size).iterator, true) This command will ...READ MORE
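Applied to a small RDD, assuming a SparkContext named sc, it returns one size per partition:

val rdd = sc.parallelize(1 to 10, 3)  // force 3 partitions
rdd.mapPartitions(iter => Array(iter.size).iterator, true).collect()  // e.g. Array(3, 3, 4)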
By default a partition is created for ...READ MORE
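You can override the default at read time; a sketch assuming a SparkContext named sc and a placeholder path:

val rdd = sc.textFile("hdfs:///data/big.txt", 8)  // ask for at least 8 partitions instead of one per block
println(rdd.getNumPartitions)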
Spark has various components: Spark SQL (Shark) - for ...READ MORE
SQL Interpreter & Optimizer handles the functional ...READ MORE
You need to change the following: val pipeline ...READ MORE
Actually, sortBy/sortByKey depend on RangePartitioner (JVM). So ...READ MORE
Yes, it is possible to run Spark ...READ MORE
Yes, they both merge the values using ...READ MORE
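A side-by-side sketch, assuming a SparkContext named sc:

val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
pairs.reduceByKey(_ + _).collect()              // one merge function: e.g. Array((a,3), (b,3))
pairs.aggregateByKey(0)(_ + _, _ + _).collect() // zero value plus separate seqOp and combOp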
Here are the changes in new version ...READ MORE
No, not mandatory, but there is no ...READ MORE
SparkContext.createTaskScheduler parses the master parameter. Local: 1 ...READ MORE
By default, it will access HDFS. ...READ MORE
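The URI scheme decides which filesystem is used; a sketch with placeholder hosts and paths, assuming a SparkContext named sc:

val fromHdfs  = sc.textFile("hdfs://namenode:9000/data/input.txt")  // explicit HDFS URI
val fromLocal = sc.textFile("file:///tmp/input.txt")                // force the local filesystem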
You can copy the application ID from ...READ MORE
In Hadoop MapReduce the input data is ...READ MORE
Ganglia looks like a good option for ...READ MORE
Firstly, it's the in-memory computation. If the file ...READ MORE
DataFrames and SparkSQL performed about the ...READ MORE