5 Things One Must Know About Spark
Contents of the Webinar
1. Low Latency
2. Streaming support
3. Machine Learning and Graph
4. Data Frame API Introduction
5. Spark Integration with Hadoop
Like Hadoop, Spark is a framework. In the image below, Spark Core is the processing engine that exposes the core Spark API, and it is written in Scala.
Spark cuts down read/write I/O to disk
Spark stores its data as RDDs (Resilient Distributed Datasets), which are in-memory collections of data distributed across the machines of a cluster. There are limitations, however, since memory is finite. A distinctive feature of Spark is that it stores data depending on the kind of infrastructure available.
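As a rough illustration of the idea (plain Python, not Spark's actual API), the sketch below models an RDD-like collection as a list of in-memory partitions, where each partition stands in for the slice of data held on one machine. The helper names `partition` and `map_partitions` are hypothetical and exist only for this example.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def partition(data: List[T], num_partitions: int) -> List[List[T]]:
    """Split data into num_partitions round-robin slices (hypothetical helper)."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def map_partitions(parts: List[List[T]], fn: Callable[[T], U]) -> List[List[U]]:
    """Apply fn to every element, one partition at a time, entirely in memory."""
    return [[fn(x) for x in part] for part in parts]


# Toy data: each inner list stands in for the slice held by one machine.
parts = partition(list(range(10)), 3)
squared = map_partitions(parts, lambda x: x * x)

# Combine the per-partition results, as a final "collect" step would.
total = sum(sum(part) for part in squared)
print(parts)
print(total)
```

In a real cluster the partitions would live on different machines and the per-partition work would run in parallel. Here everything runs in one process purely to show the shape of the data.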
Spark Streaming is used for processing real-time streaming data.
It uses the DStream (discretized stream), a series of RDDs, to process real-time data.
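The "series of RDDs" idea can be sketched in plain Python as micro-batching: incoming events are grouped into fixed time windows, and each window is then processed as an ordinary batch. This is a toy model under stated assumptions, not the Spark Streaming API. The function `discretize`, the one-second interval, and the sample events are all made up for illustration, and events are assumed to arrive sorted by timestamp.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def discretize(
    events: Iterable[Tuple[float, str]], batch_interval: float
) -> Iterator[List[str]]:
    """Group (timestamp, value) events into consecutive fixed-size time windows.

    Hypothetical helper: assumes events are sorted by timestamp and that
    windows start at time 0. Empty windows are emitted as empty batches.
    """
    batch: List[str] = []
    current_window = 0
    for ts, value in events:
        window = int(ts // batch_interval)
        # Close every window that ended before this event arrived.
        while window > current_window:
            yield batch
            batch = []
            current_window += 1
        batch.append(value)
    yield batch  # flush the final batch


# Made-up event stream: (seconds since start, word).
events = [(0.2, "a"), (0.7, "b"), (1.1, "a"), (2.5, "c"), (2.9, "a")]

batches = list(discretize(events, batch_interval=1.0))

# Each batch is processed independently, here with a simple word count.
per_batch_counts: List[Dict[str, int]] = [dict(Counter(b)) for b in batches]
print(batches)
print(per_batch_counts)
```

The point of the sketch is that once the stream is cut into batches, the same batch-processing logic can be reused on each one.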
Cyclic Data flows
1. All jobs in Spark comprise a series of operators and run on a set of data.
2. All the operators in a job are used to construct a DAG.
3. The DAG is optimized by rearranging and combining operators where possible.
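The three steps above can be sketched with a toy model in plain Python. It is only an illustration of "combining operators", not Spark's real optimizer: the job is a simple linear chain of operators (the simplest possible DAG), and the hypothetical helper `fuse_maps` merges consecutive map operators into one so the data is traversed fewer times.

```python
from typing import Any, Callable, List, Tuple

# An operator is a (kind, function) pair, where kind is "map" or "filter".
Operator = Tuple[str, Callable[[Any], Any]]


def compose(f: Callable[[Any], Any], g: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Return a function that applies f first and then g."""
    return lambda x: g(f(x))


def fuse_maps(ops: List[Operator]) -> List[Operator]:
    """Combine each run of consecutive map operators into a single map."""
    fused: List[Operator] = []
    for kind, fn in ops:
        if kind == "map" and fused and fused[-1][0] == "map":
            prev_fn = fused[-1][1]
            fused[-1] = ("map", compose(prev_fn, fn))
        else:
            fused.append((kind, fn))
    return fused


def run(ops: List[Operator], data: List[Any]) -> List[Any]:
    """Execute the operators in order over an in-memory list."""
    for kind, fn in ops:
        if kind == "map":
            data = [fn(x) for x in data]
        elif kind == "filter":
            data = [x for x in data if fn(x)]
        else:
            raise ValueError(f"unknown operator kind: {kind}")
    return data


# A job is a series of operators applied to a set of data.
job: List[Operator] = [
    ("map", lambda x: x + 1),
    ("map", lambda x: x * 2),
    ("filter", lambda x: x % 3 == 0),
    ("map", lambda x: x - 1),
]

optimized = fuse_maps(job)
result = run(optimized, [1, 2, 3, 4, 5])
print([kind for kind, _ in optimized])
print(result)
```

The optimized plan has three operators instead of four and produces the same result as the original job, which is the property any such rewrite has to preserve.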
Support for data frames
Data frame features
Ability to scale from kilobytes to petabytes of data.
Support for a wide array of data formats and storage systems.
Seamless integration with all big data tooling and infrastructure via Spark.
Questions asked during the webinar
Mesos vs. YARN
Mesos and YARN are both resource managers. YARN is popular because it is part of Hadoop. Mesos is less widely used, although it serves the same purpose.
Got a question for us? Please mention it in the comments section and we will get back to you.