
9 Oct 2015

5 Things One Must Know About Spark

Contents of the Webinar

1. Low Latency

2. Streaming support

3. Machine Learning and Graph

4. Data Frame API Introduction

5. Spark Integration with Hadoop

Spark Architecture

Like Hadoop, Spark is a framework. In the architecture diagram, Spark Core is the processing engine: it provides the core Spark API and is itself written in Scala.

Low Latency

Spark cuts down read/write I/O to disk.

Spark stores its data as RDDs (Resilient Distributed Datasets), which are collections of data distributed across the machines of a cluster and held in memory where possible. There are limitations, however, because memory is finite. A distinctive feature of Spark is that it adapts how it stores data to the available infrastructure: an RDD can be kept in memory, on disk, or both.
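The effect of keeping a partitioned collection in memory can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's API: `ToyRDD`, `load_partition`, and the `disk_reads` counter are hypothetical names introduced only for illustration, and the "disk read" is simulated.

```python
from typing import Callable, Dict, List


class ToyRDD:
    """Toy stand-in for an RDD: a dataset split into partitions, optionally cached."""

    def __init__(self, num_partitions: int, load_partition: Callable[[int], List[int]]):
        self.num_partitions = num_partitions
        self._load = load_partition          # simulates an expensive disk read
        self._cache: Dict[int, List[int]] = {}
        self._cache_enabled = False
        self.disk_reads = 0                  # how many times we "went to disk"

    def cache(self) -> "ToyRDD":
        """Request that partitions stay in memory after their first read."""
        self._cache_enabled = True
        return self

    def _partition(self, index: int) -> List[int]:
        if index in self._cache:
            return self._cache[index]        # served from memory, no disk I/O
        self.disk_reads += 1
        data = self._load(index)
        if self._cache_enabled:
            self._cache[index] = data
        return data

    def sum(self) -> int:
        """An action that touches every partition."""
        return sum(sum(self._partition(i)) for i in range(self.num_partitions))


def load_partition(index: int) -> List[int]:
    # Hypothetical loader: partition i holds the numbers 10*i .. 10*i + 9.
    return list(range(10 * index, 10 * index + 10))


# Without caching, every action re-reads all four partitions.
uncached = ToyRDD(4, load_partition)
uncached.sum()
uncached.sum()

# With caching, only the first action reads from "disk".
cached = ToyRDD(4, load_partition).cache()
cached.sum()
cached.sum()

print(uncached.disk_reads, cached.disk_reads)
```

In this sketch the second action on the cached dataset performs no simulated disk reads, which is the intuition behind the low-latency claim for jobs that reuse the same data.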

Streaming support

Event Processing

Used for processing real-time streaming data.

It uses the DStream (discretized stream), a series of RDDs, to process real-time data.
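To make "a series of RDDs" concrete, here is a minimal plain-Python sketch of the micro-batching idea: incoming events are cut into fixed time intervals, and each interval is then processed as a small, ordinary batch. It does not use the Spark Streaming API; `discretize`, `word_counts`, the one-second interval, and the sample events are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, payload)


def discretize(events: Iterable[Event], batch_interval: float) -> List[List[str]]:
    """Group events into consecutive batches of `batch_interval` seconds."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for timestamp, payload in events:
        buckets[int(timestamp // batch_interval)].append(payload)
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    # Emit empty batches as well, so every interval yields exactly one batch.
    return [buckets.get(i, []) for i in range(first, last + 1)]


def word_counts(batch: List[str]) -> Dict[str, int]:
    """The per-batch computation: an ordinary batch job over one interval."""
    counts: Dict[str, int] = {}
    for word in batch:
        counts[word] = counts.get(word, 0) + 1
    return counts


events = [(0.2, "spark"), (0.9, "hadoop"), (1.1, "spark"), (3.5, "spark")]
batches = discretize(events, batch_interval=1.0)
results = [word_counts(batch) for batch in batches]

for i, result in enumerate(results):
    print(f"batch {i}: {result}")
```

The same batch function runs unchanged on every interval, which is why a stream modeled this way can reuse batch-processing logic.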

Cyclic Data flows

1. All jobs in Spark comprise a series of operators and run on a set of data.

2. All the operators in a job are used to construct a DAG.

3. The DAG (directed acyclic graph) is optimized by rearranging and combining operators where possible.
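The three steps above can be sketched with a toy pipeline in plain Python. A linear chain of operators is the simplest case of a DAG, and the only "optimization" shown is combining adjacent map operators into a single pass. This is an illustration under those assumptions, not Spark's scheduler: `fuse_maps`, `run`, and the sample job are hypothetical.

```python
from typing import Any, Callable, Iterable, List, Tuple

Op = Tuple[str, Callable[[Any], Any]]  # ("map", fn) or ("filter", predicate)


def compose(f: Callable[[Any], Any], g: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Return a function that applies f, then g."""
    return lambda x: g(f(x))


def fuse_maps(ops: List[Op]) -> List[Op]:
    """Combine each run of adjacent map operators into one map operator."""
    fused: List[Op] = []
    for kind, fn in ops:
        if kind == "map" and fused and fused[-1][0] == "map":
            fused[-1] = ("map", compose(fused[-1][1], fn))
        else:
            fused.append((kind, fn))
    return fused


def run(ops: List[Op], data: Iterable[Any]) -> List[Any]:
    """Execute the operators in order; each operator is one pass over the data."""
    items = list(data)
    for kind, fn in ops:
        if kind == "map":
            items = [fn(x) for x in items]
        elif kind == "filter":
            items = [x for x in items if fn(x)]
        else:
            raise ValueError(f"unknown operator: {kind}")
    return items


# A job is a series of operators that runs on a set of data.
job: List[Op] = [
    ("map", lambda x: x + 1),
    ("map", lambda x: x * 2),
    ("filter", lambda x: x % 4 == 0),
    ("map", lambda x: x - 1),
]

optimized = fuse_maps(job)

print(len(job), len(optimized))
print(run(job, range(5)), run(optimized, range(5)))
```

The optimized plan has fewer operators, and therefore fewer passes over the data, while producing the same result as the original job.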

Support for data frames

Data frame features

Ability to scale from kilobytes (KB) to petabytes (PB) of data.
Support for a wide array of data formats and storage systems.
Seamless integration with all big data tooling and infrastructure via Spark.

Questions asked during the webinar

Mesos Vs YARN

Mesos and YARN are both cluster resource managers. YARN is the more popular of the two because it is part of Hadoop; Mesos is less widely used, although it offers similar functionality.
