27 Jul 2014

Big Data Processing with Spark and Scala

The video above is a recording of the webinar “Big Data Processing with Spark and Scala”, held on 27 July 2014.

Introduction to Spark & Scala:

Apache Spark is a fast, general-purpose engine for large-scale data processing, originally developed in the AMPLab at UC Berkeley. Spark is a good fit for the Hadoop open-source community because it can run on top of the Hadoop Distributed File System (HDFS) and read the data stored there. Unlike Hadoop MapReduce, Spark is not tied to the two-stage MapReduce paradigm: it generalizes the MapReduce computation model while improving both performance and ease of use. Spark provides primitives for in-memory cluster computing that let user programs load data into a cluster’s memory and query it repeatedly, which makes it well suited to iterative workloads such as machine learning algorithms.
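To make the "load once, query repeatedly" idea concrete, here is a minimal sketch using Spark's RDD API. It assumes Spark is on the classpath and runs in local mode; the object name `CachedQueries`, the method `countLevels`, and the sample log lines are hypothetical and were not part of the webinar. A real job would more likely read from HDFS than from an in-memory sequence.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CachedQueries {
  // Returns (number of ERROR lines, number of WARN lines) for a tiny sample log.
  def countLevels(sc: SparkContext): (Long, Long) = {
    // Hypothetical sample data, distributed as an RDD.
    val lines = sc.parallelize(Seq(
      "INFO job started",
      "WARN disk usage high",
      "ERROR task failed",
      "ERROR node lost"
    ))

    // cache() marks the RDD to be kept in memory once the first action computes it,
    // so later queries can reuse it instead of recomputing it.
    val cached = lines.cache()

    // Two separate queries over the same cached data set.
    val errors = cached.filter(_.startsWith("ERROR")).count()
    val warnings = cached.filter(_.startsWith("WARN")).count()
    (errors, warnings)
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark inside this JVM using all available cores.
    val conf = new SparkConf().setAppName("CachedQueries").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (errors, warnings) = countLevels(sc)
      println(s"errors=$errors warnings=$warnings")
    } finally {
      sc.stop()
    }
  }
}
```

On a data set this small the cache makes no measurable difference; the benefit shows up when the same large data set is scanned many times, as in iterative machine learning algorithms.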

The name Scala is a blend of ‘Scalable Language’. Scala is an object-oriented language, and its scalability is the result of a careful integration of object-oriented and functional language concepts. The language supports advanced component architectures through classes and traits. Scala also includes first-class functions and a rich library of immutable data structures.
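The features named above can be seen together in a few lines of plain Scala. This is only an illustrative sketch: the names `Greeter`, `Engine`, `ScalaFeatures`, `double` and `applyTwice` are made up for the example, and it needs nothing beyond the Scala standard library.

```scala
// A trait is a reusable unit of behaviour that classes can mix in.
trait Greeter {
  def name: String
  def greet: String = s"Hello, $name"
}

// A case class is an immutable value type; here it mixes in the trait.
final case class Engine(name: String) extends Greeter

object ScalaFeatures {
  // A first-class function: a value of type Int => Int.
  val double: Int => Int = _ * 2

  // A higher-order function: it takes another function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  def main(args: Array[String]): Unit = {
    val numbers = List(1, 2, 3)        // immutable list
    val doubled = numbers.map(double)  // builds a new list, the original is unchanged

    println(doubled)                   // List(2, 4, 6)
    println(numbers)                   // List(1, 2, 3)
    println(applyTwice(double, 5))     // 20
    println(Engine("Spark").greet)     // Hello, Spark
  }
}
```

Passing functions such as `double` to `map` is the same style Spark's own API uses for its transformations.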

Topics covered in the Video & Presentation:
  • What is Big Data?
  • What is Spark?
  • Why Spark?
  • Spark Ecosystem
  • A note about Scala
  • Why Scala?
  • Hello Spark

Spark Features:

  • Fast Analytics
  • Real-Time Stream Processing
  • Fault Tolerant
  • Powerful and Integrated Data Processing
  • Easy to use

Please visit this link for more details about our course ‘Big Data Processing with Scala and Spark.’
Feel free to drop us a line for any clarifications.



  • Sree Eedupuganti says:

    Hi everyone, I am trying to access data from Hive in Spark. When I run a query, I can’t see whether the jobs are running or completed, but I do get the output in the terminal. Any suggestions, please?

  • Netra says:

    Will Spark replace MapReduce or YARN in the future, or are they complementary?

    • EdurekaSupport says:

      Hi Netra, Spark is not a replacement; they are complementary, as each has its own features. Spark can run on a YARN cluster as well.

  • venkata murty maddula says:

    Excellent ….

    • EdurekaSupport says:

      Thanks Venkata!! Feel free to go through our other blog posts as well.

  • Kaustav Ray says:

    As a fresher in data analytics, can I learn Spark before learning Hadoop? [ I understand Java and have also worked with R. ]

    • EdurekaSupport says:

      Absolutely Kaustav!! You can go for Spark. Since you already know Java, you can also go for Hadoop. You can call us at US: 1800 275 9730 (Toll Free) or India: +91 88808 62004 to discuss in detail. You can also go through this link for more information:

  • Amitabh says:

    Can we use Spark for unstructured data?

    • EdurekaSupport says:

      Absolutely Amitabh!! Spark can be used for unstructured data. We can either cleanse the data first and then bring it into Spark, or do the cleansing in Spark itself.

  • Karuna Devanagavi says:

    Are Scala and Spark also integrated in the Cloudera virtual machine?
