27 Jul 2014

Big Data Processing with Spark and Scala

The above video is the recorded webinar session on the topic “Big Data Processing with Spark and Scala”, held on 27th July’14.

Introduction to Spark & Scala:

Apache Spark is a fast, general-purpose engine for large-scale data processing, originally developed in the AMPLab at UC Berkeley. Spark is a good fit for the Hadoop open-source community because it can run on top of the Hadoop Distributed File System (HDFS). Unlike Hadoop MapReduce, Spark is not tied to the two-stage MapReduce paradigm: it generalizes the MapReduce computation model while improving performance and ease of use. Spark provides primitives for in-memory cluster computing that let user programs load data into a cluster’s memory and query it repeatedly, which makes it well suited to iterative workloads such as machine learning algorithms.
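The load-once, query-repeatedly pattern can be sketched roughly as follows. This is only an illustration, assuming the Spark core library is on the classpath and that local mode is acceptable; the object name, the method name and the sample log lines are hypothetical and are not taken from the webinar.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CacheAndQuery {
  // Hypothetical helper: counts ERROR and WARN lines in a small data set.
  def countLevels(sc: SparkContext, lines: Seq[String]): (Long, Long) = {
    // Distribute the lines as an RDD and mark it to be kept in memory.
    // cache() is lazy: the data is materialized by the first action below.
    val logs = sc.parallelize(lines).cache()

    // First query: computes the RDD and populates the cache.
    val errors = logs.filter(_.startsWith("ERROR")).count()
    // Second query: can reuse the cached partitions instead of recomputing.
    val warnings = logs.filter(_.startsWith("WARN")).count()

    (errors, warnings)
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM; a real cluster would use another master URL.
    val conf = new SparkConf().setAppName("CacheAndQuery").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data standing in for a file loaded from HDFS.
      val sample = Seq("INFO job started", "ERROR disk full", "WARN slow task", "ERROR node lost")
      val (errors, warnings) = countLevels(sc, sample)
      println(s"errors=$errors warnings=$warnings")
    } finally {
      sc.stop()
    }
  }
}
```

On a real cluster the data would typically come from storage such as HDFS rather than from an in-memory sequence, and caching pays off only when the same data set is queried more than once.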

The name Scala is short for ‘Scalable Language’. Scala is an object-oriented language, and its scalability is the result of a careful integration of object-oriented and functional language concepts. The language supports advanced component architectures through classes and traits. Scala also includes first-class functions and a library of efficient immutable data structures.
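As a small illustration of those features, the sketch below combines a trait, a class, a first-class function and an immutable list. All names and the durations in it are made up for the example and do not come from the webinar.

```scala
// A trait defines reusable behaviour that classes can mix in.
trait Greeter {
  def name: String
  def greet: String = s"Hello, $name"
}

// A case class is immutable by default; its `name` field implements the trait's abstract member.
final case class Topic(name: String, minutes: Int) extends Greeter

object ScalaSketch {
  // Functions are first-class values: this one is stored in a val and passed to filter below.
  val isLong: Topic => Boolean = _.minutes > 10

  // An immutable List (hypothetical agenda with invented durations).
  val agenda: List[Topic] =
    List(Topic("Spark", 20), Topic("Scala", 10), Topic("Hello Spark", 15))

  // filter and map return new lists; the original agenda is never modified.
  def longTopics: List[String] = agenda.filter(isLong).map(_.name)

  def totalMinutes: Int = agenda.map(_.minutes).sum

  def main(args: Array[String]): Unit = {
    println(agenda.head.greet)          // Hello, Spark
    println(longTopics.mkString(", "))  // Spark, Hello Spark
    println(totalMinutes)               // 45
  }
}
```

The same style of passing small functions to `filter` and `map` is what Spark's own Scala API builds on, which is one reason the two are usually taught together.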

Topics covered in the Video & Presentation:
  • What is Big Data?
  • What is Spark?
  • Why Spark?
  • Spark Ecosystem
  • A note about Scala
  • Why Scala?
  • Hello Spark

Spark Features:

  • Fast Analytics
  • Real-Time Stream Processing
  • Fault Tolerant
  • Powerful and Integrated Data Processing
  • Easy to use

Please visit this link for more details about our course ‘Big Data Processing with Scala and Spark.’
Feel free to drop us a line for any clarifications.

Comments
11 Comments
  • Sree Eedupuganti

    Hi everyone, I am trying to access data from Hive in Spark. When I run a query, I can’t see whether the job is running or completed, but I do get the output in the terminal. Any suggestions, please?

  • Netra

    Is Spark going to replace MapReduce or YARN in the future, or are they complementary?

    • EdurekaSupport

      Hi Netra, Spark is not a replacement, as each has its own features. Spark can also run on a YARN cluster.

  • venkata murty maddula

    Excellent ….

    • EdurekaSupport

      Thanks Venkata!! Feel free to go through our other blog posts as well.

  • Kaustav Ray

    Being a fresher in data analytics, can I opt for learning Spark before learning Hadoop? [I understand Java and have also worked with R.]

    • EdurekaSupport

      Absolutely Kaustav!! You can go for Spark. Since you already know Java, you can also go for Hadoop. You can call us at US: 1800 275 9730 (Toll Free) or India: +91 88808 62004 to discuss in detail. You can also go through this link for more information: http://www.edureka.co/big-data-and-hadoop

  • Amitabh

    Can we use Spark for unstructured data?

    • EdurekaSupport

      Absolutely Amitabh!! Spark can be used for unstructured data. We can either cleanse the data before bringing it into Spark or do the cleansing in Spark itself.

  • Karuna Devanagavi

    Are Scala and Spark also integrated in the Cloudera virtual machine?