Big Data and Hadoop (170 Blogs) Become a Certified Professional

Apache Flink: The Next Gen Big Data Analytics Framework For Stream And Batch Data Processing

Last updated on May 22,2019 5.3K Views

Awanish
Awanish is a Sr. Research Analyst at Edureka. He has rich expertise... Awanish is a Sr. Research Analyst at Edureka. He has rich expertise in Big Data technologies like Hadoop, Spark, Storm, Kafka, Flink. Awanish also...

Apache Flink is an open source platform for distributed stream and batch data processing. It can run on Windows, Mac OS and Linux OS. In this blog post, let’s discuss how to set up Flink cluster locally. It is similar to Spark in many ways – it has APIs for Graph and Machine learning processing like Apache Spark – but Apache Flink and Apache Spark are not exactly the same.

To set up Flink cluster, you must have java 7.x or higher installed on your system. Since I have Hadoop-2.2.0 installed at my end on CentOS ( Linux ), I have downloaded Flink package which is compatible with Hadoop 2.x. Run below command to download Flink package.

Command: wget http://archive.apache.org/dist/flink/flink-1.0.0/flink-1.0.0-bin-hadoop2-scala_2.10.tgz

Command-Apache-Flink

Untar the file to get the flink directory.

Command: tar -xvf Downloads/flink-1.0.0-bin-hadoop2-scala_2.10.tgz

Command: ls

Flink-directory-Apache-Flink

Add Flink environment variables in .bashrc file.

Command: sudo gedit .bashrc

Variables-Apache-Flink

You need to run the below command so that the changes in .bashrc file are activated

Command: source .bashrc

Now go to flink directory and start the cluster locally.

Command: cd flink-1.0.0

Command: bin/start-local.sh

Once you have started the cluster, you will be able to see a new daemon JobManager running.

Command: jps

Job-manager-Apache-Flink

Open the browser and go to http://localhost:8081 to see Apache Flink web UI.

Web-UI-Apache-Flink

Let us run a simple wordcount example using Apache Flink.

Before running the example install netcat on your system ( sudo yum install nc ).

Now in a new terminal run the below command.

Command: nc -lk 9000

Command2-Apache-Flink

Run the below given command in the flink terminal. This command runs a program which takes the streamed data as input and performs wordcount operation on that streamed data.

Command: bin/flink run examples/streaming/SocketTextStreamWordCount.jar –hostname localhost –port 9000

Streamed-data-Apache-Flink

In the web ui, you will be able to see a job in running state.

Running-jobs-Apache-Flink

Web-dashboard-Apache-Flink

Run below command in a new terminal, this will print the data streamed and processed.

Command: tail -f log/flink-*-jobmanager-*.out

Job-manager-2-Apache-Flink

Now go to the terminal where you started netcat and type something.

Terminal-Apache-Flink

The moment you press enter button on your keyword after you typed some data on netcat terminal, wordcount operation will be applied on that data and the output will be printed here ( flink’s jobmanager log ) within milliseconds!

Job-manager-log-Apache-Flink

Within a very very short span of time, data will be streamed, processed and printed.

There is much more to learn about Apache Flink. We will touch upon other Flink topics in our upcoming blog.

Got a question for us? Mention them in the comment section and we will get back to you. 

Related Posts:

Get Started with Big Data and Hadoop

Apache Falcon: New Data Management Platform for the Hadoop Ecosystem

Comments
1 Comment

Join the discussion

Browse Categories

webinar REGISTER FOR FREE WEBINAR
REGISTER NOW
webinar_success Thank you for registering Join Edureka Meetup community for 100+ Free Webinars each month JOIN MEETUP GROUP

Subscribe to our Newsletter, and get personalized recommendations.

image not found!
image not found!

Apache Flink: The Next Gen Big Data Analytics Framework For Stream And Batch Data Processing

edureka.co