PySpark Certification Training Course
The above video is the recorded webinar session on the topic “Big Data Processing with Spark and Scala”, held on 27th July’14.
Apache Spark is a fast, general-purpose engine for large-scale data processing, originally developed in the AMPLab at UC Berkeley. Spark is a good fit for the Hadoop open-source community because it runs on top of the Hadoop Distributed File System (HDFS), but it has the added advantage of not being tied to the two-stage MapReduce paradigm: Spark addresses the limitations of Hadoop MapReduce by generalizing the MapReduce computation model while dramatically improving performance and ease of use. Spark provides primitives for in-memory cluster computing that let user programs load data into a cluster’s memory and query it repeatedly, making it well suited to iterative machine learning algorithms.
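The “load once, query repeatedly” idea can be sketched in a few lines of the Spark shell. This is a minimal sketch, assuming a running `spark-shell` session where `sc` (the `SparkContext`) is predefined; the HDFS path and the log contents are hypothetical.

```scala
// Run inside spark-shell, where `sc` (SparkContext) is predefined.
// The file path below is hypothetical.
val lines = sc.textFile("hdfs:///data/events.log")

// cache() asks Spark to keep this RDD in cluster memory once the
// first action computes it, so later queries avoid re-reading HDFS.
val errors = lines.filter(_.contains("ERROR")).cache()

// The first action materializes and caches the RDD ...
val total = errors.count()

// ... and repeated queries then run against the in-memory data,
// which is what makes iterative algorithms fast on Spark.
val timeouts = errors.filter(_.contains("timeout")).count()
```

Without `cache()`, each `count()` would re-read and re-filter the file from HDFS; with it, only the first action pays that cost.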
Scala is an acronym for ‘Scalable Language’. Scala is an object-oriented language, and its scalability is the result of a careful integration of object-oriented and functional language concepts. The language supports advanced component architectures through classes and traits. Scala also includes first-class functions and a library with a rich set of immutable data structures.
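The three features mentioned above — traits, first-class functions, and immutable collections — can be seen together in a small self-contained sketch (the `Greeter`/`Person` names are illustrative, not from the source):

```scala
// Traits support component-style architectures: a trait can declare
// abstract members and provide concrete behavior to be mixed in.
trait Greeter {
  def name: String
  def greet: String = s"Hello, $name"
}

// A class mixes in the trait and supplies the abstract member.
class Person(val name: String) extends Greeter

object ScalaFeatures {
  def main(args: Array[String]): Unit = {
    // First-class functions: a function stored in a value ...
    val double: Int => Int = _ * 2

    // ... and passed to higher-order methods on an immutable List.
    val xs = List(1, 2, 3)
    val ys = xs.map(double) // List(2, 4, 6); xs itself is unchanged

    println(new Person("Ada").greet) // Hello, Ada
    println(ys.sum)                  // 12
  }
}
```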
Please visit this link for more details about our course ‘Big Data Processing with Scala and Spark.’
Feel free to drop us a line for any clarifications.
Hi everyone, I am trying to access data from Hive in Spark. When I run a query I get the output in the terminal, but I can’t see the job listed as either completed or running. Any suggestions please?
Is Spark a replacement for MapReduce or YARN in the future, or are they complementary?
Hi Netra, Spark is not a replacement; each has its own features and they are complementary. Spark can also run on a YARN cluster.
Thanks Venkata!! Feel free to go through our other blog posts as well.
Being a fresher in data analytics, can I opt for learning Spark before learning Hadoop? [I understand Java and have also worked with R.]
Absolutely Kaustav!! You can go for Spark. Since you already know Java, you can also go for Hadoop. You can call us at US: 1800 275 9730 (Toll Free) or India: +91 88808 62004 to discuss in detail. You can also go through this link for more information: https://www.edureka.co/big-data-hadoop-training-certification
Can we use Spark for unstructured data?
Absolutely Amitabh!! Spark can be used for unstructured data. You can either cleanse the data before bringing it into Spark, or do the cleansing in Spark itself.
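A hedged sketch of the “cleanse it in Spark” approach: the parsing function below is ordinary Scala, shown here on a local collection so it is self-contained. Because Spark RDDs expose the same `map`/`flatMap`/`filter` operations as Scala collections, the identical function could be applied on a cluster with `sc.textFile(path).flatMap(parse)`. The log format and field names are hypothetical.

```scala
object Cleanse {
  // Hypothetical raw log lines: free-form text, some malformed.
  val raw = List(
    "2014-07-27 INFO user=alice action=login",
    "garbage line with no fields",
    "2014-07-27 WARN user=bob action=upload"
  )

  // Extract key=value pairs; return None for lines we cannot parse.
  def parse(line: String): Option[Map[String, String]] = {
    val fields = line.split("\\s+").collect {
      case kv if kv.contains("=") =>
        val Array(k, v) = kv.split("=", 2)
        k -> v
    }
    if (fields.isEmpty) None else Some(fields.toMap)
  }

  // flatMap drops the unparseable lines; on a Spark RDD the call is
  // identical in shape: sc.textFile(path).flatMap(parse)
  val structured = raw.flatMap(parse)

  def main(args: Array[String]): Unit =
    structured.foreach(println)
}
```

The result is a collection of structured records (maps of field names to values), which downstream Spark jobs can then aggregate or join.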
Are Scala and Spark also integrated in the Cloudera virtual machine?
Hi Karuna, You can install Spark on CDH 4 (Cloudera) using Cloudera Manager. You can refer to the following link for this: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Installation-Guide/cmig_spark_installation_standalone.html
Spark (Scala is included along with Spark) comes integrated with CDH 5.1. You can refer to the following link: http://blog.cloudera.com/blog/2014/05/apache-spark-1-0-is-released/