Apache Spark questions
No, it doesn’t provide a storage layer, but ...READ MORE
Spark SQL is capable of: Loading data from ...READ MORE
Apache Spark supports the following four languages: Scala, ...READ MORE
Spark is agnostic to the underlying cluster ...READ MORE
Just do the following: Edit your conf/log4j.properties file ...READ MORE
Let's first look at mapper-side differences. Map ...READ MORE
> In order to reduce the processing ...READ MORE
Comparison between Spark RDD vs DataFrame 1. Release ...READ MORE
ReduceByKey is the best for production. READ MORE
Fold in spark Fold is a very powerful ...READ MORE
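The fold semantics that Spark's RDD.fold borrows from Scala collections can be sketched with the standard library (a plain List here rather than an RDD, purely for illustration):

```scala
object FoldDemo extends App {
  // fold takes a neutral "zero" element and a binary operator.
  // Spark applies the operator within each partition and then across
  // partition results, so the zero must be the operator's identity
  // (0 for addition) or it gets counted once per partition.
  val nums = List(1, 2, 3, 4, 5)
  val sum = nums.fold(0)(_ + _)
  println(sum)   // 15
}
```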
Some of the key differences between an RDD and ...READ MORE
There are a few reasons for keeping RDD ...READ MORE
Mainly, we use SparkConf because we need ...READ MORE
Can you share the screenshots for the ...READ MORE
Save it to a text file: line.saveAsTextFile("alicia.txt") Print contains ...READ MORE
val x = sc.parallelize(1 to 10, 2) // ...READ MORE
You can add external jars as arguments ...READ MORE
Yes, there is a difference between the ...READ MORE
Assuming your RDD[row] is called rdd, you ...READ MORE
org.apache.spark.mllib is the old Spark API while ...READ MORE
df.orderBy($"col".desc) - this works as well READ MORE
Your error is with the version of ...READ MORE
There is a difference between the two: mapValues ...READ MORE
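The distinction carries over from Scala's own collections: mapValues transforms only the values of a pair collection and leaves the keys untouched, while map sees the whole pair. A stdlib sketch (a plain Map rather than a pair RDD, purely for illustration):

```scala
object MapValuesDemo extends App {
  val scores = Map("a" -> 1, "b" -> 2)

  // mapValues: keys unchanged, only values transformed
  val doubled = scores.mapValues(_ * 2).toMap

  // map: receives the full (key, value) pair, so keys may change too
  val renamed = scores.map { case (k, v) => (k.toUpperCase, v) }

  println(doubled)   // Map(a -> 2, b -> 4)
  println(renamed)   // Map(A -> 1, B -> 2)
}
```

In Spark the same split matters for performance: mapValues on a pair RDD preserves the partitioner, while map does not.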
In my opinion, start with a standalone ...READ MORE
val coder: (Int => String) = v ...READ MORE
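A complete function literal with that type annotation looks like the following sketch (the body shown is an assumption, since the original answer is truncated at the `v`):

```scala
object CoderDemo extends App {
  // A named function value of type Int => String.
  // The body ("value-" + v) is illustrative only.
  val coder: Int => String = v => "value-" + v

  println(coder(42))   // value-42
}
```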
Spark is a framework for distributed data ...READ MORE
A Spark driver (aka an application’s driver ...READ MORE
Parquet is a columnar format supported by ...READ MORE
map(): Return a new distributed dataset formed by ...READ MORE
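The same contract holds for map on ordinary Scala collections, which makes it easy to sketch without a Spark cluster (a plain List stands in for the distributed dataset):

```scala
object MapDemo extends App {
  // map applies the function to every element and returns a new
  // collection; the source is left unchanged, mirroring the
  // immutability guarantee RDD transformations provide.
  val nums = List(1, 2, 3)
  val squared = nums.map(n => n * n)
  println(squared)   // List(1, 4, 9)
}
```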
spark-submit \ --class org.apache.spark.examples.SparkPi \ --deploy-mode client \ --master spark://$SPARK_MASTER_IP:$SPARK_MASTER_PORT ...READ MORE
There are two popular ways using which ...READ MORE
Whenever a series of transformations are performed ...READ MORE
Minimizing data transfers and avoiding shuffling helps ...READ MORE
There are two methods to persist the ...READ MORE
The full form of RDD is a ...READ MORE
No, it is not necessary to install ...READ MORE
No, it is not mandatory, but there ...READ MORE
Spark has various persistence levels to store ...READ MORE
Shark is a tool, developed for people ...READ MORE
Here are some of the important features of ...READ MORE
SQL Interpreter & Optimizer handles the functional ...READ MORE
You can select the column and apply ...READ MORE
Use the function as follows: var notFollowingList=List(9.8,7,6,3,1) df.filter(col("uid").isin(notFollowingList:_*)) You can ...READ MORE
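The `: _*` ascription in that snippet is plain Scala, not Spark: it expands a sequence into individual arguments for a varargs parameter, which is how a List can be handed to isin. A stdlib sketch of the mechanism (the isin helper below is hypothetical, standing in for Column.isin):

```scala
object VarargsDemo extends App {
  // A varargs parameter (Int*) accepts any number of arguments;
  // `list: _*` splats a collection into that parameter list.
  def isin(allowed: Int*)(x: Int): Boolean = allowed.contains(x)

  val notFollowing = List(7, 6, 3, 1)
  println(isin(notFollowing: _*)(6))   // true
  println(isin(notFollowing: _*)(5))   // false
}
```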
You can create a DataFrame from the ...READ MORE
Spark has various components: Spark SQL (Shark)- for ...READ MORE
Parquet is a columnar format file supported ...READ MORE
No, it is not mandatory, but there is no ...READ MORE
By default a partition is created for ...READ MORE
Hi. In Spark, the fill() function of the DataFrameNaFunctions class is used to replace ...READ MORE
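The idea behind fill() — replace missing values with a default — can be sketched on a plain collection of nullable values (not a DataFrame; an illustration only, using Option to absorb nulls):

```scala
object FillDemo extends App {
  // Each null is replaced with the default "unknown", analogous to
  // df.na.fill("unknown") over a string column.
  val raw: List[String] = List("a", null, "c")
  val filled = raw.map(s => Option(s).getOrElse("unknown"))
  println(filled)   // List(a, unknown, c)
}
```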
I would recommend you create & build ...READ MORE