What are the prerequisites to learn Hadoop in java perspective?

0 votes
I have been a C++ developer for about 10 years. I need to pick up Java just for Hadoop. I doubt I will be doing any thing else in Java. So, I would like a list of things I would need to pick up. Of course, I would need to learn the core language, but what else?

I did Google around for this and this could be seen as a possible duplicate of "I want to learn Java. Show me how?" but it's not. Java is a huge programming language with lots, of libraries and what I need to learn will depend largely on what I am using Hadoop for. But I suppose it is possible to say something like don't bother learning this. This will be quite useful too.
Oct 11, 2018 in Big Data Hadoop by Neha
• 6,280 points

1 answer to this question.

0 votes

In my day job, I've just spent some time helping a C++ person to pick up enough Java to use some Java libraries via JNI (Java Native Interface) and then shared memory into their primarily C++ application. Here are some of the key things I noticed:

  1. You cannot manage for anything beyond a toy project without an IDE. The very first thing you should do is download a popular Java IDE (Eclipse is a fine choice, but there are also alternatives including Netbeans and IntelliJ). Do not be tempted to try and manage with vi / emacs and javac / make. You will be living in a cave and not realising it. Once you're up to speed with even basic IDE functions you will be literally dozens of times more poductive than without an IDE.
  2. Learn how to layout a simple project structure and packages. There will be simple walkthroughs of how to do this on the Eclipse site or elsewhere. Never put anything into the default package.
  3. Java has a type system whereby the reference and primitive types are relatively separate for historic / performance reasons.
  4. Java's generics are not the same as C++ templates. Read up on "type erasure".
  5. You may wish to understand how Java's GC works. Just google "mark and sweep" - at first, you can just settle for the naivest mental model and then learn the details of how a modern production GC would do it later.
  6. The core of the Collections API should be learned without delay. Map / HashMap, List / ArrayList & LinkedList and Set should be enough to get going.
  7. Learn modern Java concurrency. Thread is an assembly-language level primitive compared to some of the cool stuff in java.util.concurrent. Learn ConcurrentHashMap, Atomic*, Lock, Condition, CountDownLatch, BlockingQueue and the threadpools from Executors. Good books here are those by Brian Goetz and Doug Lea.
  8. As soon as you want to use 3rd party libraries, you'll need to learn how the classpath works. It's not rocket science, but it is a bit verbose.

If you're a low-level C++ guy, then you may find some of this interesting also:

  1. Java has virtual dispatch by default. The keyword static on a Java method is used to indicate a class method. private Java methods use invokespecial dispatch, which is a dispatch onto the exact type in use.
  2. On an Oracle VM at least, objects comprise two machine words of header (the mark word and the class word). The mark word is a bunch of flags the VM uses - notably for thread synchronization. The class word you can think of as a pointer to the VM's representation of the Class object (which is where the vtables for methods live). Following the class word are the member fields of the instance of the object.
  3. Java .class files are an intermediate language, and not really that similar to x86 object code. In particular there are lots more useful tools for .class files (including the javap disassembler which ships with the JVM)
  4. The Java equivalent of the symbol table is called the Constant Pool. It's typed and it has a lot of information in it - arguably more than the x86 object code equivalent.
  5. Java virtual method dispatch consists of looking up the correct method to be called in the Constant Pool and then converting that to an offset into a vtable. Then walking up the class hierarchy until a not-null value is found at that vtable offset.
  6. Java starts off interpreted and then goes compiled (for Oracle and some other VMs anyway). The switch to compiled mode is done method-by-method on a as-need basis. When benchmarking and perf tuning you need to make sure that you've warmed the system up before you start, and that you should typically profile at the method level to start with. The optimizations that are made can be quite aggressive / optimistic (with a check and a fallback if the assumptions are violated) - so perf tuning is a bit of an art

    I hope this helps you :)
answered Oct 11, 2018 by Frankie
• 9,810 points

Related Questions In Big Data Hadoop

0 votes
11 answers
0 votes
1 answer

What are the different ways to load data from Hadoop to Azure Data Lake?

I would recommend you to go through ...READ MORE

answered Apr 18, 2018 in Big Data Hadoop by coldcode
• 2,020 points
0 votes
1 answer

What are the site-specific configuration files in Hadoop?

There are different site specific configuration to ...READ MORE

answered Apr 9 in Big Data Hadoop by Gitika
• 25,360 points
0 votes
1 answer

What are the relational operators available related to loading and storing in pig language?

Hey, For loading data and storing it into ...READ MORE

answered May 3 in Big Data Hadoop by Gitika
• 25,360 points
0 votes
1 answer

Moving files in Hadoop using the Java API?

I would recommend you to use FileSystem.rename(). ...READ MORE

answered Apr 15, 2018 in Big Data Hadoop by Shubham
• 13,310 points
0 votes
1 answer

Hadoop giving java.io.IOException, in mkdir Java code.

I am not sure about the issue. ...READ MORE

answered May 3, 2018 in Big Data Hadoop by Shubham
• 13,310 points
0 votes
1 answer

Hadoop Mapreduce word count Program

Firstly you need to understand the concept ...READ MORE

answered Mar 16, 2018 in Data Analytics by nitinrawat895
• 10,730 points
0 votes
10 answers

hadoop fs -put command?

put syntax: put <localSrc> <dest> copy syntax: copyFr ...READ MORE

answered Dec 7, 2018 in Big Data Hadoop by Aditya
0 votes
1 answer

What are the problems in creating volume in mapr hadoop?

It sounds like the user you are ...READ MORE

answered Aug 21, 2018 in Big Data Hadoop by Frankie
• 9,810 points
+1 vote
1 answer

What is the technique to know the Default scheduler in hadoop?

Default scheduler in hadoop is JobQueueTaskScheduler, which is ...READ MORE

answered Oct 30, 2018 in Big Data Hadoop by Frankie
• 9,810 points