questions/apache-spark
I would recommend you create & build ...READ MORE
You need to change the following: val pipeline ...READ MORE
Spark provides a pipe() method on RDDs. ...READ MORE
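A minimal sketch of the pipe() method, assuming a live SparkContext `sc` and that the piped command (here `cat`, purely for illustration) exists on every worker:

```scala
// Stream each partition's elements through an external process.
// Assumes a running SparkContext `sc`.
val nums = sc.parallelize(Seq("1", "2", "3"))

// Each element is written to the process's stdin, one per line;
// each line of its stdout becomes an element of the result RDD.
val piped = nums.pipe("cat")

piped.collect() // `cat` echoes its input, so the same elements come back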
Spark basically uses Akka for scheduling. All ...READ MORE
SqlContext has a number of createDataFrame methods ...READ MORE
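One of those overloads (shown here on the newer SparkSession entry point, assuming a live session `spark` and hypothetical sample data) builds a DataFrame from an RDD of Rows plus an explicit schema:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Assumes a running SparkSession `spark`.
val rows = spark.sparkContext.parallelize(Seq(Row(1, "a"), Row(2, "b")))

val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)
))

val df = spark.createDataFrame(rows, schema)
```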
An RDD can be uncached using unpersist(). So, use ...READ MORE
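A short sketch of the cache/uncache round trip, assuming a live SparkContext `sc`:

```scala
// Cache, use, then uncache an RDD. Assumes a running SparkContext `sc`.
val rdd = sc.parallelize(1 to 100).cache() // mark for caching
rdd.count()                                // first action materializes the cache

// unpersist() drops the RDD's blocks from memory/disk;
// blocking = true waits until the blocks are actually freed.
rdd.unpersist(blocking = true)
```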
You aren't actually overwriting anything with this ...READ MORE
The SparkContext.createTaskScheduler method parses the master parameter. Local: 1 ...READ MORE
You can save the RDD using saveAsObjectFile and saveAsTextFile method. ...READ MORE
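Both save methods in one sketch, assuming a live SparkContext `sc` and writable output paths (the `/tmp/...` paths are placeholders):

```scala
// Two ways to persist an RDD to storage. Assumes a running SparkContext `sc`.
val rdd = sc.parallelize(Seq("a", "b", "c"))

// Plain text: one element per line, one part-file per partition.
rdd.saveAsTextFile("/tmp/out-text")

// Java-serialized SequenceFile; read it back with sc.objectFile[String]("/tmp/out-obj").
rdd.saveAsObjectFile("/tmp/out-obj")
```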
Yes, it is possible to run Spark ...READ MORE
Sliding Window controls transmission of data packets ...READ MORE
Here are the changes in new version ...READ MORE
Spark 2.0+ Spark 2.0 provides native window functions ...READ MORE
Use Array.maxBy method: val a = Array(("a",1), ("b",2), ...READ MORE
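Completing that idea with hypothetical sample data: maxBy returns the element for which the given function is largest.

```scala
// maxBy picks the element with the largest value under the given function.
val a = Array(("a", 1), ("b", 2), ("c", 3)) // illustrative data
val biggest = a.maxBy(_._2)
// biggest == ("c", 3)
```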
You have to use the comparison operator ...READ MORE
Please check the below mentioned links for ...READ MORE
I guess you need provide this kafka.bootstrap.servers ...READ MORE
Either you have to create a Twitter4j.properties ...READ MORE
// Collect data from input avro file ...READ MORE
Ideally, you would use snappy compression (default) ...READ MORE
Both 'filter' and 'where' in Spark SQL ...READ MORE
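A quick sketch of the equivalence, assuming a live SparkSession `spark` and hypothetical data — where() is simply an alias for filter() on Dataset/DataFrame:

```scala
// Assumes a running SparkSession `spark`.
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

// These two produce the same plan and the same result:
df.filter($"id" > 1).show()
df.where($"id" > 1).show()
```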
Actually, sortBy/sortByKey depend on RangePartitioner (JVM). So ...READ MORE
rdd.mapPartitions(iter => Array(iter.size).iterator, true) This command will ...READ MORE
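Spelled out as a sketch, assuming a live SparkContext `sc` (an Iterator works just as well as wrapping the size in an Array):

```scala
// Count the records in each partition. Assumes a running SparkContext `sc`.
val rdd = sc.parallelize(1 to 10, numSlices = 4)

// Emit one size per partition; preservesPartitioning = true keeps
// any existing partitioner on the result.
val sizes = rdd.mapPartitions(iter => Iterator(iter.size), preservesPartitioning = true)

sizes.collect() // one per-partition count for each of the 4 partitions
```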
In Hadoop MapReduce the input data is ...READ MORE
sbin/start-master.sh : Starts a master instance on ...READ MORE
Caching the tables puts the whole table ...READ MORE
Ganglia looks like a good option for ...READ MORE
By default it will access the HDFS. ...READ MORE
You can use the following command. This ...READ MORE
It's not the collect() that is slow. ...READ MORE
With mapPartitions() or foreachPartition(), you can only ...READ MORE
Firstly, it's the in-memory computation: if the file ...READ MORE
There are a bunch of functions that ...READ MORE
I am pretty sure createOrReplaceTempView just replaced ...READ MORE
you can copy the application id from ...READ MORE
In your log4j.properties file you need to ...READ MORE
As parquet is a column based storage ...READ MORE
Yes, they both merge the values using ...READ MORE
DataFrames and SparkSQL performed almost about the ...READ MORE
There are 2 ways to check the ...READ MORE
Yes, you can reorder the dataframe elements. You need ...READ MORE
How can one parse an S3 XML ...READ MORE
What is the benefit of repartition(1) and ...READ MORE
How many SparkContext objects should you ...READ MORE
Seems like master and worker are not ...READ MORE
Hey, Java's "if-else": In Java, "if-else" is a statement, ...READ MORE