Is Java alone Sufficient for Hadoop?

Question

I have been a C++ developer for about 10 years. I need to pick up Java just for Hadoop. I doubt I will be doing any thing else in Java. So, I would like a list of things I would need to pick up. Of course, I would need to learn the core language, but what else?

I did Google around for this and this could be seen as a possible duplicate of "I want to learn Java. Show me how?" but it's not. Java is a huge programming language with lots, of libraries and what I need to learn will depend largely on what I am using Hadoop for. But I suppose it is possible to say something like don't bother learning this. This will be quite useful too.

Frankie · Answer

In my day job, I've just spent some time helping a C++ person to pick up enough Java to use some Java libraries via JNI (Java Native Interface) and then shared memory into their primarily C++ application. Here are some of the key things I noticed:You cannot manage for anything beyond a toy project without an IDE. The very first thing you should do is download a popular Java IDE (Eclipse is a fine choice, but there are also alternatives including Netbeans and IntelliJ). Do not be tempted to try and manage with vi / emacs and javac / make.&#160;You will be living in a cave and not realising it. Once you're up to speed with even basic IDE functions you will be literally dozens of times more poductive than without an IDE.Learn how to layout a simple project structure and packages. There will be simple walkthroughs of how to do this on the Eclipse site or elsewhere. Never put anything into the default package.Java has a type system whereby the reference and primitive types are relatively separate for historic / performance reasons.Java's generics are&#160;not&#160;the same as C++ templates. Read up on "type erasure".You may wish to understand how Java's GC works. Just google "mark and sweep" - at first, you can just settle for the naivest mental model and then learn the details of how a modern production GC would do it later.The core of the Collections API should be learned without delay. Map / HashMap, List / ArrayList & LinkedList and Set should be enough to get going.Learn modern Java concurrency. Thread is an assembly-language level primitive compared to some of the cool stuff in java.util.concurrent. Learn ConcurrentHashMap, Atomic*, Lock, Condition, CountDownLatch, BlockingQueue and the threadpools from Executors. Good books here are those by Brian Goetz and Doug Lea.As soon as you want to use 3rd party libraries, you'll need to learn how the classpath works. It's not rocket science, but it is a bit verbose.If you're a low-level C++ guy, then you may find some of this interesting also:Java has virtual dispatch by default. The keyword static on a Java method is used to indicate a class method. private Java methods use invokespecial dispatch, which is a dispatch onto the exact type in use.On an Oracle VM at least, objects comprise two machine words of header (the mark word and the class word). The mark word is a bunch of flags the VM uses - notably for thread synchronization. The class word you can think of as a pointer to the VM's representation of the Class object (which is where the vtables for methods live). Following the class word are the member fields of the instance of the object.Java .class files are an intermediate language, and not really that similar to x86 object code. In particular there are lots more useful tools for .class files (including the javap disassembler which ships with the JVM)The Java equivalent of the symbol table is called the Constant Pool. It's typed and it has a lot of information in it - arguably more than the x86 object code equivalent.Java virtual method dispatch consists of looking up the correct method to be called in the Constant Pool and then converting that to an offset into a vtable. Then walking up the class hierarchy until a not-null value is found at that vtable offset.Java starts off interpreted and then goes compiled (for Oracle and some other VMs anyway). The switch to compiled mode is done method-by-method on a as-need basis. When benchmarking and perf tuning you need to make sure that you've warmed the system up before you start, and that you should typically profile at the method level to start with. The optimizations that are made can be quite aggressive / optimistic (with a check and a fallback if the assumptions are violated) - so perf tuning is a bit of an art.Hope this helps!Check out our Java Certification Program and learn more about it.Thanks!

Is Java alone Sufficient for Hadoop

Your comment on this question:

1 answer to this question.

Your answer

Your comment on this answer:

Related Questions In Big Data Hadoop

Is java necessary for Hadoop?

Which programming language is most preferable for working Hadoop? Java and Python? and which one is most easy to learn?

Is there a good online tutorial for Hadoop development on a Windows 7 machine?

What Distributed Cache is actually used for in Hadoop?

Moving files in Hadoop using the Java API?

Hadoop giving java.io.IOException, in mkdir Java code.

Hadoop Mapreduce word count Program

hadoop.mapred vs hadoop.mapreduce?

How does Hadoop/Spark is used for building large analytics report?

Which is the Real Time Monitoring tool/API for Hadoop?

Subscribe to our Newsletter, and get personalized recommendations.

TRENDING CERTIFICATION COURSES

TRENDING MASTERS COURSES

COMPANY

WORK WITH US

DOWNLOAD APP

CATEGORIES

CATEGORIES

TRENDING BLOG ARTICLES

TRENDING BLOG ARTICLES