questions/apache-spark
Hi @Ritu, option B for you, as hash partitioning ...READ MORE
Please check https://kb.databricks.com/streaming/file-sink-str ...READ MORE
Hello, From the error I get that the ...READ MORE
Hi @akhtar, To create multiple producers, you have to ...READ MORE
Hi @akhtar, Generally, Spark Streaming is used for real-time ...READ MORE
Hi, SparkSQL is a special component on the ...READ MORE
Hi @akhtar, You can write the Spark DataFrame in ...READ MORE
Refer to the below code: import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.FileSystem import ...READ MORE
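The Hadoop FileSystem snippet above is cut off; a minimal sketch of how those imports are typically used (the path here is a hypothetical placeholder, not from the original answer):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Obtain a handle to the filesystem configured for this Hadoop/Spark setup
val fs = FileSystem.get(new Configuration())

// Example check; "/tmp/example.txt" is a placeholder path
val exists = fs.exists(new Path("/tmp/example.txt"))
```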
Hi @Edureka, Spark's internal scheduler may truncate the lineage of the RDD graph ...READ MORE
Converting a text file to ORC: using Spark, the ...READ MORE
Hi, persist() allows the user to specify ...READ MORE
Hey, There are a few methods provided by the ...READ MORE
Hi @akhtar, In /etc/spark/conf/spark-defaults.conf, append the path of your custom ...READ MORE
Start the Spark shell using the below line of ...READ MORE
Try to put the Kafka client for ...READ MORE
It is not like a CPU to ...READ MORE
from pyspark.sql.types import FloatType fname = [1.0, 2.4, 3.6, 4.2, 45.4] df = spark.createDataFrame(fname, ...READ MORE
Option c) MapR. Jobs that are submitted ...READ MORE
The reason you are able to load ...READ MORE
Try including the package while starting the ...READ MORE
Hi @ritu, You need to learn the architecture of ...READ MORE
You have to use the comparison operator ...READ MORE
First create a Spark session like this: val ...READ MORE
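The session-creation snippet above is truncated; a minimal sketch of what it presumably continues into (the app name and master URL are placeholder assumptions):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("example-app")  // hypothetical application name
  .master("local[*]")      // run locally on all cores; point at a cluster URL in production
  .getOrCreate()
```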
Spark by default won't let you overwrite ...READ MORE
Hi @ritu, I think the problem can be solved ...READ MORE
you can access task information using TaskContext: import org.apache.spark.TaskContext sc.parallelize(Seq[Int](), ...READ MORE
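The TaskContext snippet above is cut off mid-expression; a hedged sketch of how such an example usually completes (assuming `sc` is an active SparkContext):

```scala
import org.apache.spark.TaskContext

// Each task inspects its own TaskContext and reports its partition id
sc.parallelize(Seq[Int](), 4).mapPartitions { _ =>
  val ctx = TaskContext.get()
  Iterator(ctx.partitionId())
}.collect()
```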
You can disable it like this: val sc ...READ MORE
Caching the tables puts the whole table ...READ MORE
Refer to the below command used: val df ...READ MORE
All prefix operators' symbols are predefined: +, -, ...READ MORE
Hi, This error is only generated when you ...READ MORE
Hey, Jobs: to view all the Spark jobs; Stages: ...READ MORE
To change to version 2, run the ...READ MORE
Hi, I am able to understand your requirement. ...READ MORE
spark.read.csv is used when loading into a ...READ MORE
Try this code: val rdd = sc.textFile("file.txt", 5) rdd.partitions.size Output ...READ MORE
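Laid out as a runnable snippet, the code above looks like this (assuming `sc` is an active SparkContext and "file.txt" exists):

```scala
// Load a text file, requesting a minimum of 5 partitions
val rdd = sc.textFile("file.txt", 5)

// Inspect how many partitions Spark actually created (at least 5)
rdd.partitions.size
```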
Hi @akhtar, Since the Avro library is external to Spark, ...READ MORE
Hey, You can use this command to start ...READ MORE
Yield is used in sequence comprehensions. It is ...READ MORE
Hi, Spark provides a pipe() method on RDDs. ...READ MORE
Hi, I have the input RDD as a ...READ MORE
Minimizing data transfers and avoiding shuffling helps ...READ MORE
peopleDF: org.apache.spark.sql.DataFrame = [_corrupt_record: string] The output above shows that ...READ MORE
Yes. You can use extra listeners by setting ...READ MORE
To make Spark store the event logs, ...READ MORE
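A common way to enable event logging is via spark-defaults.conf; a sketch using Spark's standard properties (the log directory below is a placeholder, adjust it for your storage):

```
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs:///spark-logs
```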
I guess you need to provide the kafka.bootstrap.servers ...READ MORE
Well, it depends on the block of ...READ MORE
It's not the collect() that is slow. ...READ MORE
The following code that I wrote for ...READ MORE
Hey, @Ritu, According to the question, the answer ...READ MORE