Introduction to Apache MapReduce and HDFS

Karthik Mannepalli says:
Jul 1, 2016 at 4:39 am GMT
I am sorry, there was a typo in my previous question:
If the assumption is write once, read many times, does that mean we cannot use HDFS for transactional data?
Karthik Mannepalli says:
Jul 1, 2016 at 4:38 am GMT
If the assumption is write once, read many times, does that mean we can use HDFS for transactional data?
Khalid says:
Jun 1, 2016 at 4:23 am GMT
I expected to see concise discussions of the HDFS components here (NameNode, DataNode and Secondary NameNode), but there are none.
Khalid says:
Jun 1, 2016 at 4:11 am GMT
Under 5. Data Replication and Fault Tolerance, the default HDFS block size is given as 64 MB. That is true for Hadoop 1.x, but since Hadoop 2.0 the default has been 128 MB. This blog was posted in May 2013 and apparently has not been updated since, so it would be good to update it.
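
To make the block-size point concrete, here is a minimal sketch of the arithmetic. It assumes a hypothetical 1 GB file and the default replication factor of 3; the helper names `block_count` and `replica_count` are made up for illustration and are not part of any Hadoop API.

```python
MB = 1024 * 1024

def block_count(file_size_bytes: int, block_size_bytes: int) -> int:
    """Blocks needed to hold a file; the last block may be only partly filled."""
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes

def replica_count(file_size_bytes: int, block_size_bytes: int, replication: int = 3) -> int:
    """Total block replicas stored across the cluster for one file."""
    return block_count(file_size_bytes, block_size_bytes) * replication

file_size = 1024 * MB  # hypothetical 1 GB file, chosen only for illustration

for label, block_size in (("Hadoop 1.x default (64 MB)", 64 * MB),
                          ("Hadoop 2.x default (128 MB)", 128 * MB)):
    print(label,
          "blocks:", block_count(file_size, block_size),
          "replicas:", replica_count(file_size, block_size))
```

With the 64 MB default the file splits into 16 blocks, and with the 128 MB default into 8, so the larger block size halves the number of block entries to be tracked for the same file.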
Kumar says:
Feb 4, 2015 at 7:47 am GMT
Hi edureka, I would like some resume formats for a Hadoop developer. Please forward them if you have any. I'm new to this technology.
This is my mail id: akumarhadoop@gmail.com
Thanks in advance.
Kumar
- EdurekaSupport says:
  Feb 9, 2015 at 11:55 am GMT
  Hi Kumar, our support team will share the sample resumes with you only after you enroll in our ‘Big Data & Hadoop’ course.
  - Kushal says:
    Jun 22, 2016 at 5:20 am GMT
    Hi edureka team, please share some resume formats for a Hadoop developer that relate to the Big Data course. You can send them to me at kbvprasad@gmail.com; Kushal.alester@gmail.com
Abhishek says:
Jan 27, 2015 at 2:07 pm GMT
Can you please elaborate point #3?
“As HDFS is designed more for batch processing rather than interactive
use by users. The emphasis is on high throughput of data access rather
than low latency of data access. HDFS focuses not so much on storing the
data but how to retrieve it at the fastest possible speed, especially
while analyzing logs. In HDFS, reading the complete data is more
important than the time taken to fetch a single record from the data.”
- EdurekaSupport says:
  Jul 6, 2015 at 6:42 am GMT
  Hi Abhishek, batch processing is a technique for running jobs without any manual intervention once they have been submitted with the required information (input, program name). The system keeps track of the submitted jobs and executes them in first-come, first-served order.
  In interactive mode, the user works with the system through an interface. The system takes inputs from the user and returns the results to the user through that interface.
  In Hadoop, once a job is submitted, it reads its input from and writes its results to the locations given in the command. Hence we call it batch processing.
  Throughput is the number of processes completed per unit of time, whereas latency is the delay between submitting a job and getting the desired outcome.
  In Hadoop, we concentrate on increasing throughput rather than decreasing latency while processing a job, because we need to retrieve the output at the fastest possible speed irrespective of the size of the data.
  Hope this helps!
  - Karthik Mannepalli says:
    Jul 1, 2016 at 4:50 am GMT
    @EdurekaSupport – Doesn’t increasing throughput reduce latency? Don’t both go hand in hand? Please correct me if I am wrong.
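
The difference between throughput and latency discussed in this thread can be sketched with a toy model. This is an illustration under stated assumptions, not a measurement of Hadoop: the `System` class, the two example systems and all of their numbers are hypothetical. Each request is modelled as a fixed startup delay followed by streaming at a fixed bandwidth.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class System:
    name: str
    startup_seconds: float      # fixed cost paid before any data flows
    bandwidth_mb_per_s: float   # sustained read rate once streaming starts

    def completion_time(self, data_mb: float) -> float:
        """Seconds from submitting a request to receiving all of its data."""
        return self.startup_seconds + data_mb / self.bandwidth_mb_per_s

    def throughput(self, data_mb: float) -> float:
        """Effective MB/s over the whole request, startup included."""
        return data_mb / self.completion_time(data_mb)

# Hypothetical numbers, chosen only to show the trade-off.
batch = System("batch-oriented", startup_seconds=20.0, bandwidth_mb_per_s=1000.0)
interactive = System("interactive", startup_seconds=0.01, bandwidth_mb_per_s=50.0)

single_record_mb = 0.001    # one small record
full_scan_mb = 1_000_000.0  # roughly 1 TB read end to end

for system in (batch, interactive):
    print(system.name,
          "| single record:", round(system.completion_time(single_record_mb), 3), "s",
          "| full scan:", round(system.completion_time(full_scan_mb), 1), "s",
          "| scan throughput:", round(system.throughput(full_scan_mb), 1), "MB/s")
```

In this model the batch-oriented system has far higher throughput on the full scan but much worse latency for a single record, so the two do not necessarily improve together: a design can trade a slow start for a high sustained read rate.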
Dr M. NAGABHUSHANA RAO says:
Jan 22, 2015 at 8:50 am GMT
Nice to see the edureka blog. edureka is trying to spread more knowledge about big data. Thanks to its team for their hard work.
- EdurekaSupport says:
  Jan 22, 2015 at 11:43 am GMT
  Thanks a lot, Dr. Rao. Please feel free to go through our other blog posts as well.
Deepak Sharma says:
Jan 18, 2015 at 12:44 pm GMT
Could you please elaborate on point #7 a bit more?
And also on the line “Apache HDFS provides interfaces for applications to relocate themselves nearer to where the data is located”?
- EdurekaSupport says:
  Jan 19, 2015 at 8:13 am GMT
  Hi Deepak,
  Let us assume that we have submitted a job and the JobTracker now needs to choose which TaskTracker node to allocate it to.
  When assigning the job, the JobTracker first finds out which nodes hold the data and checks whether those nodes are available to run the job/task. If so, it assigns the task to those TaskTracker nodes and then transfers the computed results to whichever other nodes require them. If not, it assigns the task to the TaskTracker nodes nearest to the nodes where the data resides.
  The JobTracker tries to assign tasks to the nodes where the data resides because the data in HDFS is huge, and transferring it could take more time than the actual computation (the part that actually matters) because of network congestion or other issues. Hence it is better to move the computed results (less data) than the actual data (huge data).
  Hope this helps!
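
The placement preference described above can be sketched roughly as follows. This is a simplified illustration, not the actual JobTracker code: `pick_node`, its parameters and the node and rack names are all hypothetical, and "nearest" is approximated here as "on the same rack as a node that holds the data", which is an assumption.

```python
from typing import Dict, Optional, Set

def pick_node(replica_nodes: Set[str],
              free_nodes: Set[str],
              rack_of: Dict[str, str]) -> Optional[str]:
    """Choose where to run a task, preferring nodes close to its input data."""
    # 1. Best case: a free node that already holds the data (nothing to copy).
    data_local = sorted(replica_nodes & free_nodes)
    if data_local:
        return data_local[0]

    # 2. Next best: a free node on the same rack as some node holding the data.
    replica_racks = {rack_of[n] for n in replica_nodes if n in rack_of}
    same_rack = sorted(n for n in free_nodes if rack_of.get(n) in replica_racks)
    if same_rack:
        return same_rack[0]

    # 3. Fallback: any free node; the data must then cross the network.
    remaining = sorted(free_nodes)
    return remaining[0] if remaining else None

# Hypothetical three-node cluster spread over two racks.
rack_of = {"node1": "rackA", "node2": "rackA", "node3": "rackB"}

print(pick_node({"node1"}, {"node1", "node3"}, rack_of))  # node holding the data is free
print(pick_node({"node1"}, {"node2", "node3"}, rack_of))  # falls back to the same rack
print(pick_node({"node1"}, {"node3"}, rack_of))           # falls back to any free node
```

Sorting the candidates only makes the choice deterministic for the example; a real scheduler would also weigh load and available task slots.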
Sushobhit Rajan says:
Jul 12, 2014 at 3:50 pm GMT
Nicely Explained
- EdurekaSupport says:
  Jul 24, 2014 at 1:45 pm GMT
  Thanks Sushobhit!!! Feel free to go through our other blog posts as well.
  - srini says:
    Mar 29, 2019 at 5:37 am GMT
    Hi Team,
    Can you please share the sample Hadoop resumes? I have already enrolled. Please send them to my mail ID: srenivas35@gmail.com
Gaurav Dighe says:
Jan 23, 2014 at 12:05 pm GMT
Very nice information about Hadoop. Keep up the good work.
Hope to see some more topics on DataFlow and MapReduce.


What is HDFS (Hadoop Distributed File System)?

Assumptions and Goals/Objectives behind HDFS:

1. Large Data Sets:

2. Write Once, Read Many Model:

3. Streaming Data Access:

4. Commodity Hardware:

5. Data Replication and Fault Tolerance:

6. High Throughput:

7. Moving Computation is better than Moving Data:

8. File System Namespace:
