How should I prepare for the CCA 175 exam?

0 votes
I want to get the Cloudera certification for their Hadoop and Spark Developer exam, i.e., CCA 175.

I have roughly 2 months of time to prepare.

Any thoughts on how to approach this?
May 10, 2018 in Career Counselling by Data_Nerd
• 2,340 points
1,067 views

3 answers to this question.

+1 vote

Edureka has one of the most detailed and comprehensive courses on Apache Spark and Hadoop available online. But before signing up for any online training, go through the outline below to get a basic grasp of the technology and its fundamentals.

To learn Spark and Hadoop, you need to start with the basics, i.e., Big Data and the emergence of Hadoop.

Next, focus on the main reason Hadoop became popular: HDFS (Hadoop Distributed File System).

After that, take a deep dive into the Hadoop ecosystem and learn the various tools inside it along with their functionality, so that you can build a solution tailored to your requirements.

The main components of HDFS are NameNode and DataNode.

NameNode

It is the master daemon that maintains and manages the DataNodes (slave nodes). It records the metadata of all the files stored in the cluster, e.g. location of blocks stored, the size of the files, permissions, hierarchy, etc. It records each and every change that takes place to the file system metadata.

For example, if a file is deleted in HDFS, the NameNode will immediately record this in the EditLog. It regularly receives a Heartbeat and a block report from all the DataNodes in the cluster to ensure that the DataNodes are live. It keeps a record of all the blocks in HDFS and in which nodes these blocks are stored.

DataNode

These are slave daemons that run on each slave machine. The actual data is stored on DataNodes. They are responsible for serving read and write requests from clients. They are also responsible for creating, deleting, and replicating blocks based on the decisions taken by the NameNode.
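To make the NameNode/DataNode split a bit more concrete, here is a minimal Python sketch of the kind of bookkeeping described above. It is not HDFS code: the function names are made up for illustration, the 128 MB block size and replication factor of 3 are the usual defaults, and the round-robin placement is an assumption (real HDFS uses a rack-aware placement policy).

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # usual HDFS default block size (128 MB)
REPLICATION = 3                 # usual HDFS default replication factor


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a NameNode-style block map: block index -> DataNodes holding a replica.

    Hypothetical helper. Placement is plain round-robin, which is an
    assumption made for this sketch, not the real HDFS policy.
    """
    num_blocks = math.ceil(file_size / block_size)
    replicas = min(replication, len(datanodes))
    block_map = {}
    for block in range(num_blocks):
        block_map[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replicas)
        ]
    return block_map


def live_datanodes(last_heartbeat, now, timeout):
    """Return the DataNodes whose last heartbeat is recent enough to count as live."""
    return sorted(dn for dn, ts in last_heartbeat.items() if now - ts <= timeout)


# A 300 MB file needs three 128 MB blocks (the last one is only partly full).
block_map = plan_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
```

The point is only that the NameNode holds the map (metadata) while the DataNodes hold the bytes; a DataNode that stops sending heartbeats drops out of the live list and its blocks would need re-replicating.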

For processing, we use YARN (Yet Another Resource Negotiator). The components of YARN are the ResourceManager and the NodeManager.

ResourceManager

It is a cluster-level component (one per cluster) and runs on the master machine. It manages resources and schedules applications running on top of YARN.

NodeManager

It is a node level component (one on each node) and runs on each slave machine. It is responsible for managing containers and monitoring resource utilization in each container. It also keeps track of node health and log management. It continuously communicates with ResourceManager to remain up-to-date.
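As a rough illustration of the ResourceManager's job, the sketch below places container requests on NodeManagers that still have free memory. The function name and the greedy first-fit strategy are assumptions for this example; real YARN schedulers (Capacity, Fair) are considerably more involved.

```python
def allocate_containers(node_free_mb, requests_mb):
    """Greedy first-fit placement of container requests (hypothetical helper).

    node_free_mb: dict of NodeManager name -> free memory in MB
    requests_mb:  list of requested container sizes in MB
    Returns a list of (request_mb, node_name or None) in request order.
    """
    free = dict(node_free_mb)  # work on a copy so the caller's dict is untouched
    placements = []
    for size in requests_mb:
        chosen = None
        for node in sorted(free):  # deterministic order: nm1, nm2, ...
            if free[node] >= size:
                free[node] -= size
                chosen = node
                break
        placements.append((size, chosen))  # None means the request has to wait
    return placements


placements = allocate_containers({"nm1": 4096, "nm2": 2048}, [2048, 2048, 2048, 1024])
```

Here the last request gets no node because the cluster is full, which is the situation where an application would sit waiting for resources.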

So, you can perform parallel processing on HDFS using MapReduce.
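The classic first MapReduce exercise is word count. The sketch below mimics the three phases (map, shuffle, reduce) in plain Python on one machine; the function names are illustrative and this is not the Hadoop API, just the shape of the computation.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts for one word."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["Hadoop stores data", "Spark processes data"])
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the shuffle moves data across the network before the reducers run.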

Next come the concepts of Pig, Hive, and HBase.

Moving on to Spark, you need to learn Scala, as spark-shell runs on Scala by default.

  • Scala is a general-purpose programming language that aims to express common programming patterns in a concise, elegant, and type-safe way.
  • It supports both object-oriented and functional programming styles, thus helping programmers to be more productive.

Moving further, you need to learn about RDDs, which are the basic building blocks of any Spark code.

  • An RDD (Resilient Distributed Dataset) is a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner.
  • An RDD is a read-only collection of objects partitioned across a set of machines, and a lost partition can be rebuilt.
  • RDDs can be created from multiple data sources, e.g., a Scala collection, the local file system, Hadoop, Amazon S3, an HBase table, etc.
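The "read-only, partitioned, can be rebuilt" idea is easier to see in code. Below is a toy Python class, not Spark's API: ToyRDD and its methods are made up for illustration, and everything runs in a single process. Each transformation returns a new object that only remembers how to compute its partitions, so a lost partition can be recomputed from that recipe.

```python
class ToyRDD:
    """A read-only, partitioned collection that remembers how it was built."""

    def __init__(self, num_partitions, compute):
        self.num_partitions = num_partitions
        self._compute = compute  # function: partition index -> list of items

    @classmethod
    def parallelize(cls, data, num_partitions):
        data = list(data)

        def compute(i):
            # Partition i takes every num_partitions-th element, starting at i.
            return data[i::num_partitions]

        return cls(num_partitions, compute)

    def map(self, fn):
        # Nothing is computed here; we only extend the recipe (the lineage).
        return ToyRDD(self.num_partitions, lambda i: [fn(x) for x in self._compute(i)])

    def filter(self, pred):
        return ToyRDD(
            self.num_partitions, lambda i: [x for x in self._compute(i) if pred(x)]
        )

    def compute_partition(self, i):
        """Rebuild a single partition from the lineage, e.g. after losing it."""
        return self._compute(i)

    def collect(self):
        """Action: actually run the recipe on every partition."""
        return [x for i in range(self.num_partitions) for x in self._compute(i)]


rdd = ToyRDD.parallelize(range(1, 7), 2).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
result = rdd.collect()
```

Note that map and filter do no work until collect is called, which mirrors the split between transformations and actions in Spark.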

Spark SQL is another main component of Spark; it is very important for processing structured data in a SQL-style format.

Next comes Spark's machine learning library, i.e., MLlib, and how it is used to run various ML algorithms through Spark (regression and K-means clustering).
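If K-means is new to you, it helps to see the algorithm itself before using it through MLlib. This is a minimal one-dimensional version in plain Python, not MLlib code; the function name and the fixed iteration count are assumptions for the sketch.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Plain K-means on numbers: assign each point to its nearest centroid,
    then move every centroid to the mean of its assigned points."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # A centroid with no points keeps its previous position.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


centers = kmeans_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5])
```

With two obvious groups in the data, the centroids settle on the mean of each group after a couple of iterations.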

Flume also plays an important role in streaming data, and so does Kafka.

Spark itself can process streaming data, which is done through Spark Streaming using DStreams.

Edureka’s Apache Spark and Scala Certification training offers a detailed course specifically designed for the CCA175 exam, covering all the above-mentioned topics.

Edureka provides a good list of Spark videos. I would recommend going through the Edureka Spark Playlist as well as the Spark Tutorial.
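A DStream is essentially a sequence of small batches (RDDs), one per batch interval. The plain-Python sketch below imitates that micro-batch idea; the function names and the sample events are made up, and unlike Spark Streaming it simply skips intervals that received no data.

```python
from collections import Counter


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-length batches."""
    batches = {}
    for ts, value in events:
        batches.setdefault(int(ts // batch_interval), []).append(value)
    # Intervals with no events are skipped in this sketch.
    return [batches[k] for k in sorted(batches)]


def count_words_per_batch(events, batch_interval):
    """Run the same word count on every batch, one result per batch."""
    return [
        dict(Counter(word for line in batch for word in line.split()))
        for batch in micro_batches(events, batch_interval)
    ]


# Hypothetical events: (arrival time in seconds, text line)
events = [(0.5, "spark kafka"), (1.2, "spark"), (2.7, "flume spark")]
per_batch = count_words_per_batch(events, batch_interval=2)
```

The same per-batch logic is applied over and over as new data arrives, which is why batch-style Spark code carries over to streaming fairly directly.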

There are a lot of Hadoop Videos too.

Hope this helps.

answered May 10, 2018 by kurt_cobain
• 9,260 points
0 votes

CCA 175 is an important exam for people who want to excel in Hadoop and Spark. Since you have limited time and the exam matters a lot, you should go for Edureka's course on Hadoop and Spark. It covers everything, and the instructors are well versed in the topics and provide ample examples for better understanding.

answered Jun 25, 2018 by zombie
• 3,690 points
0 votes

If you are preparing for the Cloudera CCA175 exam and need help, DumpsStar is one option. It offers CCA175 exam dumps that it says are verified by Cloudera specialists, along with a self-assessment tool that evaluates your performance and points out weak areas.

You can find CCA175 material on DumpsStar that aims to help you clear the exam on the first attempt. I recommend taking a look at DumpsStar first: it can help you understand the topics and the actual exam format, and where to focus your energy.

answered Mar 14 by erichamm
• 140 points

© 2018 Brain4ce Education Solutions Pvt. Ltd. All rights Reserved.