What is CCA-175 Spark and Hadoop Developer Certification?

Last updated on Jul 21,2020 6.5K Views
Tech Enthusiast working as a Research Analyst at Edureka. Curious about learning... Tech Enthusiast working as a Research Analyst at Edureka. Curious about learning more about Data Science and Big-Data Hadoop.

What is CCA-175 Spark and Hadoop Developer Certification?

edureka.co

CCA-175 Spark and Hadoop Developer Certification is the emblem of Precision, Proficiency, and Perfection in Apache Hadoop Development. There is a lofty demand for CCA-175 Certified Developers in the current IT-industry. Let us understand more about this.

What is CCA-175 Spark and Hadoop Developer Certification?

CCA-175 is basically an Apache Hadoop with Apache Spark and Scala Training and Certification Program. The major objective of this program is to help Hadoop developers to establish a formidable command, over the current traditional Hadoop Development protocols with advanced tools and operational procedures.

The program deals with the following segments:

Apache Hadoop

Apache Hadoop is a powerful software framework designed to provide distributed storage and processing system. It is open-source software designed an deployed by the Apache Foundation.

Apache Spark

Apache Spark is a Lightning Fast Data Processing Tool used on top of Apache Hadoop Distributed File System(HDFS). It is an Open-Source Data Processing tool developed by Apache Foundation.

Scala Programming Language

Scala is an advanced programmed language developed using Java Programming language. It is used to execute data processing commands in Spark on top of Hadoop.

 

CCA-175 Spark and Hadoop Developer Certification Exam Format

Let me give you a brief understanding of CCA-175 Spark and Hadoop Developer Certification Exam.

The total number of questions asked in this particular exam will be in between 8 to 12. The questions will be purely performance-based (hands-on) tasks on Cloudera Enterprise cluster. 

The total time limit provided for this particular exam is exactly 120 minutes.

Any Hadoop Developer attending this Examination is expected to score at least 70% to clear this examination and get certified.

The current pricing of this examination is US$ 295$ to 300$. (21,000 rupees INR to 22,000 INR)

 

Sample Question from CCA-175 exam

Solution:

Step 1:

sqoop import 
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" 
--username retail_dba 
--password cloudera 
--table orders 
--compress 
--compression-codec org.apache.hadoop.io.compress.SnappyCodec 
--target-dir /user/cloudera/problem1/orders 
--as-avrodatafile;

Step 2:

sqoop import 
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" 
--username retail_dba 
--password cloudera 
--table order_items 
--compress 
--compression-codec org.apache.hadoop.io.compress.SnappyCodec 
--target-dir /user/cloudera/problem1/order-items 
--as-avrodatafile;

Step3:

//start a new terminal and fire this command.
spark-shell
//now, continue executing these commands below.
import com.databricks.spark.avro._;
var ordersDF = sqlContext.read.avro("/user/cloudera/problem1/orders");
var orderItemDF = sqlContext.read.avro("/user/cloudera/problem1/order-items");

Step 4:

var joinedOrderDataDF = ordersDF.join(orderItemDF,ordersDF("order_id")===orderItemDF("order_item_order_id"))

Step 4a:

import org.apache.spark.sql.functions._;
var dataFrameResult = joinedOrderDataDF.groupBy(to_date(from_unixtime(col("order_date")/1000)).alias("order_formatted_date"),col("order_status")).agg(round(sum("order_item_subtotal"),2).alias("total_amount"),countDistinct("order_id").alias("total_orders")).orderBy(col("order_formatted_date").desc,col("order_status"),col("total_amount").desc,col("total_orders"));
dataFrameResult.show();

Step 4b:

joinedOrderDataDF.registerTempTable("order_joined");
var sqlResult = sqlContext.sql("select to_date(from_unixtime(cast(order_date/1000 as bigint))) as order_formatted_date, order_status, cast(sum(order_item_subtotal) as DECIMAL (10,2)) as total_amount, count(distinct(order_id)) as total_orders from order_joined group by to_date(from_unixtime(cast(order_date/1000 as bigint))), order_status order by order_formatted_date desc,order_status,total_amount desc, total_orders");
sqlResult.show();

Step 4c:

var comByKeyResult =joinedOrderDataDF.map(x=> ((x(1).toString,x(3).toString),(x(8).toString.toFloat,x(0).toString))).combineByKey((x:(Float, String))=>(x._1,Set(x._2)),(x:(Float,Set[String]),y:(Float,String))=>(x._1 + y._1,x._2+y._2),(x:(Float,Set[String]),y:(Float,Set[String]))=>(x._1+y._1,x._2++y._2)).map(x=> (x._1._1,x._1._2,x._2._1,x._2._2.size)).toDF().orderBy(col("_1").desc,col("_2"),col("_3").desc,col("_4"));
comByKeyResult.show();

Step 5:

sqlContext.setConf("spark.sql.parquet.compression.codec","gzip");
dataFrameResult.write.parquet("/user/cloudera/problem1/result4a-gzip");
sqlResult.write.parquet("/user/cloudera/problem1/result4b-gzip");
comByKeyResult.write.parquet("/user/cloudera/problem1/result4c-gzip");

Step 6:

sqlContext.setConf("spark.sql.parquet.compression.codec","snappy");
dataFrameResult.write.parquet("/user/cloudera/problem1/result4a-snappy");
sqlResult.write.parquet("/user/cloudera/problem1/result4b-snappy");
comByKeyResult.write.parquet("/user/cloudera/problem1/result4c-snappy");

Step 7:

dataFrameResult.map(x=> x(0) + "," + x(1) + "," + x(2) + "," + x(3)).saveAsTextFile("/user/cloudera/problem1/result4a-csv")
sqlResult.map(x=> x(0) + "," + x(1) + "," + x(2) + "," + x(3)).saveAsTextFile("/user/cloudera/problem1/result4b-csv")
comByKeyResult.map(x=> x(0) + "," + x(1) + "," + x(2) + "," + x(3)).saveAsTextFile("/user/cloudera/problem1/result4c-csv")

Step 8a:

Fire up a new terminal and type in this command
mysql -h localhost -u retail_dba -p
//default password is:
cloudera 

Step 8b:

create table retail_db.result(order_date varchar(255) not null,order_status varchar(255) not null, total_orders int, total_amount numeric, constraint pk_order_result primary key (order_date,order_status)); 

Step 8c:

sqoop export 
--table result 
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" 
--username retail_dba 
--password cloudera 
--export-dir "/user/cloudera/problem1/result4a-csv" 
--columns "order_date,order_status,total_amount,total_orders"

Let us now get into the CCA-175 Spark and Certification offered by Edureka.

 

Edureka CCA-175 Certification

Apache Spark and Scala Certification Training is designed to prepare you for the Cloudera Hadoop and Spark Developer Certification Exam (CCA175). You will gain in-depth knowledge on Apache Spark and the Spark Ecosystem,

Edureka Training includes RDD, Spark SQL, Spark MLlib and Spark Streaming. You will get comprehensive knowledge on Scala Programming language, HDFS, Sqoop, Flume, Spark GraphX and Messaging System such as Kafka.

Required Skills for CCA-175

Data Ingestion:

Transform Stage Store:

Data Analysis:

Salary of a Hadoop Developer

Hadoop Developer is one of the most highly paid profiles in the current IT Industry. The Salary trends for a Hadoop developer have increased decently since last few years. Let us check out the raise in the salary trends for Hadoop developers compared to other profiles.

Let us now discuss the salary trends for a Hadoop Developer in different countries based on the experience. Firstly, let us consider the United States of America. Based On Experience, the Big Data professionals working in the domains are offered with respective salaries as described below.

The freshers or entry-level salaries start at 75,000 USD to 80,000 USD while the candidates with considerable experience are being offered 125,000 USD to 150,000 USD per annum.

Followed by the United Nations of America, we will now discuss the Hadoop Developer Salary Trends in India.

The Salary trends for a Hadoop Developer in India for the freshers or entry-level salaries start at 400,00 INR to 500,000 INR while the candidates with considerable experience are being offered 4,500,000  to 5,000,000 .

Next, let us take a look at the salary trends for Hadoop Developers in the United Kingdom.

The Salary trends for a Hadoop Developer in India for the freshers or entry-level salaries start at 25,000 Pounds to 30,000 Pounds and on the other hand, for an experienced candidate, the salary offered is 80,000 Pounds to 90,000 Pounds.

So, with this, we come to an end of this “CCA-175 Spark and Hadoop Developer Certification” article. I hope we sparked a little light upon your knowledge about Spark, Scala and Hadoop along with CCA-175 certification features and its importance.

This article based on Apache Spark and Scala Certification Training is designed to prepare you for the Cloudera Hadoop and Spark Developer Certification Exam (CCA175). You will get in-depth knowledge on Apache Spark and the Spark Ecosystem, which includes Spark RDD, Spark SQL, Spark MLlib and Spark Streaming. You will get comprehensive knowledge on Scala Programming language, HDFS, Sqoop, Flume, Spark GraphX and Messaging System such as Kafka.

BROWSE COURSES