Free Webinar on ‘Big Data Processing with Scala and Spark’

Big Data Processing with Spark and Scala

The above video is the recorded webinar session on the topic “Big Data Processing with Spark and Scala”, held on 27 July 2014.

Introduction to Spark & Scala:

Apache Spark is a fast, general-purpose engine for large-scale data processing, originally developed in the AMPLab at UC Berkeley. Spark fits well into the Hadoop open-source ecosystem because it can read from and write to the Hadoop Distributed File System (HDFS) and can run on Hadoop YARN. Unlike Hadoop MapReduce, however, Spark is not tied to the two-stage map-then-reduce paradigm: it generalizes the MapReduce computation model, which addresses many of MapReduce's limitations while also improving performance and ease of use. Spark provides primitives for in-memory cluster computing that let user programs load data into a cluster’s memory and query it repeatedly, which makes it well suited to iterative workloads such as machine learning algorithms.
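
The in-memory, query-it-repeatedly model described above can be sketched with Spark's RDD API in Scala. This is a minimal example under stated assumptions, not a production job: it assumes a Spark 3.x `spark-sql` dependency on the classpath, runs in local mode (`local[*]`) rather than on a YARN or standalone cluster, and uses a hypothetical `CachedWordCount` object with made-up sample input. The word count itself is the classic MapReduce-style computation written as a chain of transformations, and `cache()` keeps the result in memory so that later actions can read it instead of recomputing it from the input.

```scala
import org.apache.spark.sql.SparkSession

object CachedWordCount {
  // Word count (the classic MapReduce example) expressed as Spark transformations.
  // Returns the full count map and the number of words seen more than once.
  def run(lines: Seq[String]): (Map[String, Long], Long) = {
    // Assumption: local mode for the demo; on a cluster the master is usually
    // supplied at submit time instead of being hard-coded.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CachedWordCount")
      .getOrCreate()
    try {
      val counts = spark.sparkContext
        .parallelize(lines)
        .flatMap(_.split("\\s+"))          // "map" side: split lines into words
        .filter(_.nonEmpty)
        .map(word => (word.toLowerCase, 1L))
        .reduceByKey(_ + _)                // "reduce" side: sum the counts per word
        .cache()                           // mark the result for in-memory storage

      // First action: computes the RDD and populates the cache.
      val all = counts.collectAsMap().toMap
      // Second action: can read the cached partitions instead of recomputing the lineage.
      val repeated = counts.filter { case (_, n) => n > 1 }.count()
      (all, repeated)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (all, repeated) = run(Seq("Spark and Scala", "spark runs on Hadoop", "Scala"))
    println(all)
    println(s"Words seen more than once: $repeated")
  }
}
```

Running `main` prints the per-word counts followed by the number of words that occur more than once. On a real cluster you would normally drop the hard-coded `master` setting and choose it when submitting the job.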

The name Scala stands for ‘Scalable Language’. Scala is an object-oriented language, and its scalability comes from a careful integration of object-oriented and functional language concepts. The language supports advanced component architectures through classes and traits. Scala also includes first-class functions and a library of efficient immutable data structures.
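
To make these features concrete, here is a small plain-Scala sketch (no Spark required). The names `Describable`, `Table`, and `ScalaFeatures` are hypothetical and chosen only for illustration: a trait mixed into an immutable case class shows the object-oriented side, while a function stored in a value and passed to a higher-order method shows first-class functions. `List.filter` returns a new list rather than modifying the original one.

```scala
// Object-oriented side: a trait that any class can mix in.
trait Describable {
  def name: String
  def sizeGb: Double
  // Concrete method provided by the trait itself.
  def describe: String = s"$name ($sizeGb GB)"
}

// An immutable case class that mixes in the trait.
final case class Table(name: String, sizeGb: Double) extends Describable

object ScalaFeatures {
  // Functional side: a function is a first-class value that can be stored...
  val isLarge: Table => Boolean = _.sizeGb >= 100.0

  // ...and passed to a higher-order method. List.filter returns a new list.
  def select(tables: List[Table], predicate: Table => Boolean): List[Table] =
    tables.filter(predicate)

  def main(args: Array[String]): Unit = {
    val tables = List(Table("logs", 250.0), Table("users", 2.5), Table("clicks", 120.0))
    val large = select(tables, isLarge)
    large.map(_.describe).foreach(println)   // logs (250.0 GB), then clicks (120.0 GB)
    println(tables.size)                     // the original list still has 3 elements
  }
}
```

Because `Table` is a case class, its fields are immutable `val`s and equality is structural, so two `Table` values with the same fields compare equal.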

Topics covered in the Video & Presentation:

What is Big Data?
What is Spark?
Why Spark?
Spark Ecosystem
A note about Scala
Why Scala?
Hello Spark

Spark Features:

Fast Analytics
Real-Time Stream Processing
Fault Tolerant
Powerful and Integrated Data Processing
Easy to Use

Please visit this link for more details about our course ‘Big Data Processing with Scala and Spark.’
Feel free to drop us a line for any clarifications.

Sree Eedupuganti says:
Feb 23, 2015 at 1:18 pm GMT
Hi everyone, I am trying to access data from Hive in Spark. When I run a query, I can't see whether the jobs are completed or still running, but I do get the output in the terminal. Any suggestions, please?
Reply
Netra says:
Aug 21, 2014 at 3:07 am GMT
Will Spark replace MapReduce or YARN in the future, or are they complementary?
Reply
- EdurekaSupport says:
  Sep 18, 2014 at 10:11 am GMT
  Hi Netra, Spark is not a replacement, as each has its own features. Spark runs on a YARN cluster as well.
  Reply
venkata murty maddula says:
Jul 28, 2014 at 1:39 pm GMT
Excellent ….
Reply
- EdurekaSupport says:
  Jul 30, 2014 at 8:35 am GMT
  Thanks Venkata!! Feel free to go through our other blog posts as well.
  Reply
Kaustav Ray says:
Jul 28, 2014 at 5:56 am GMT
Being a fresher in data analytics, can I opt to learn Spark before learning Hadoop? [I understand Java and have also worked with R.]
Reply
- EdurekaSupport says:
  Aug 20, 2014 at 1:00 am GMT
  Absolutely Kaustav!! You can go for Spark. Since you already know Java, you can also go for Hadoop. You can call us at US: 1800 275 9730 (Toll Free) or India: +91 88808 62004 to discuss in detail. You can also go through this link for more information: https://www.edureka.co/big-data-hadoop-training-certification
  Reply
Amitabh says:
Jul 27, 2014 at 2:05 pm GMT
Can we use Spark for unstructured data?
Reply
- EdurekaSupport says:
  Jul 28, 2014 at 5:47 am GMT
  Absolutely Amitabh!! Spark can be used for unstructured data. We can either do some data cleansing first and then bring that data into Spark, or do the cleansing in Spark itself.
  Reply
Karuna Devanagavi says:
Jul 27, 2014 at 4:53 am GMT
Are Scala and Spark also integrated in the Cloudera virtual machine?
Reply
- EdurekaSupport says:
  Jul 28, 2014 at 5:53 am GMT
  Hi Karuna, you can install Spark on CDH4 (Cloudera) using Cloudera Manager. You can refer to the following link for this: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Installation-Guide/cmig_spark_installation_standalone.html
  Spark (which includes Scala) comes integrated with CDH 5.1. You can refer to the following link: http://blog.cloudera.com/blog/2014/05/apache-spark-1-0-is-released/
  Reply
