Drilling Down On Apache Drill, the New-Age Query Engine

Become a Certified Professional

Apache Drill is the industry’s first schema-free SQL Engine. Drill isn’t the world’s first query engine, but it’s the first that strikes the fine balance between flexibility and speed. Drill is designed to scale to several thousands of nodes and query petabytes of data at interactive speeds that BI/Analytics environments require.

It can integrate with several data sources like Hive, HBase, MongoDB, file system, RDBMS. Also, input formats like Avro, CSV, TSV, PSV, Parquet, Hadoop Sequence files, and many others can be used in Drill with ease.

Why Apache Drill?

The biggest advantage of Apache Drill is that it can discover the schema on the fly as you query any data. Moreover, it can work with your BI tools like Tableau, Qlikview, MicroStrategy etc for better analytics.

Here’s a quote from an industry analyst that summarizes the value of Apache Drill:

“Drill isn’t just about SQL-on-Hadoop. It’s about SQL-on-pretty-much-anything, immediately, and without formality.”

– Andrew Burst, Gigaom Research, January 2015

Drillbit is Apache Drill’s daemon that runs on each node in the cluster. It uses ZooKeeper for all the communication in the cluster and maintaisn cluster membership. It is responsible for accepting requests from the client, processing the queries, and returning results to the client. The drillbit which receives the request from the client is called ‘foreman’. It generates the execution plan, the execution fragments are sent to other drillbits running in the cluster.

Drillbits-Apache-Drill

One more advantage is that the installation and setup of drill is pretty simple. Let us learn how to install Apache Drill.

The first step is to download the drill package.

Command: wget https://archive.apache.org/dist/drill/drill-1.5.0/apache-drill-1.5.0.tar.gz

Command: tar -xvf apache-drill-1.5.0.tar.gz

Command: ls

Next, set the environment variables in .bashrc file.

Command: sudo gedit .bashrc

export DRILL_HOME=/home/edureka/apache-drill-1.5.0

export PATH=$PATH:/home/edureka/apache-drill-1.5.0/bin

This command will update the changes:

Command: source .bashrc

Now go to drill conf directory and edit drill-override.conf file with cluster id and zookeeper host & port, we will run it on a local cluster.

Command: cd apache-drill-1.5.0

Command: sudo gedit conf/drill-override.conf

By default, DRILL_MAX_DIRECT_MEMORY will be 8 GB in drill-env.sh, and we need to keep it according to the memory we have.

Command: sudo gedit conf/drill-env.sh

To install drill only in a single node, you can use embedded mode, where it will run locally. It will automatically start the drillbit service when you run this command.

Command: ./bin/drill-embedded

You can run a simple query to check the installation.

Command: select * from sys.options WHERE type = ‘SYSTEM’ and name like ‘security%’;

To check the web console of Apache Drill, we need to go to localhost:8047 in the web browser.

You can run your query from the Query tab as well.

To run drill in distributed mode, you need to edit cluster ID and add ZooKeeper information in drill-override.conf as below.

Then we need to start ZooKeeper service on each node. After that you have to start the drillbit service on each node with this command.

Command: ./bin/drillbit.sh start

Command: jps

Now, we use below command to start the drill shell.

Now, we can execute our queries on the cluster in the distributed mode.

This is the first blog post in a two-part Apache Drill blog series. The second blog in the series is coming soon.

Got a question for us? Mention them in the comment section and we will get back to you.

Related Posts:

Get Started with Apache Spark and Scala

Get Started with Big Data and Hadoop

Drilling Down On Apache Drill, the New-Age Query Engine

Why Apache Drill?

Recommended videos for you

What is Big Data and Why Learn Hadoop!!!

Bulk Loading Into HBase With MapReduce

New-Age Search through Apache Solr

Power of Python With BigData

Hadoop Architecture – Hadoop Tutorial on HDFS Architecture

Big Data Tutorial – Get Started With Big Data And Hadoop

Real-Time Analytics with Apache Storm

Logistic Regression In Data Science

Spark SQL | Apache Spark

Ways to Succeed with Hadoop in 2015

Hadoop Tutorial – A Complete Tutorial For Hadoop

5 Scenarios: When To Use & When Not to Use Hadoop

Advanced Security In Hadoop Cluster

Tailored Big Data Solutions Using MapReduce Design Patterns

Top Hadoop Interview Questions and Answers – Ace Your Interview

When not to use Hadoop

Big Data Processing With Apache Spark

Is Hadoop A Necessity For Data Science?

Administer Hadoop Cluster

MapReduce Tutorial – All You Need To Know About MapReduce

Recommended blogs for you

Apache Spark with Hadoop – Why it Matters?

Splunk Careers – Your Pathway To Hot Big Data Jobs

Apache Hadoop HDFS Architecture

What are the Key Terminologies in Hadoop Security?

Hadoop and Java Job Trends

How Predictive Analysis can Help you Combat Employee Attrition

Apache Falcon: New Data Management Platform For The Hadoop Ecosystem

Install Puppet – Install Puppet in Four Simple Steps

How to become a Hadoop Administrator?

What are Kafka Streams and How are they implemented?

Apache Sqoop Tutorial – Import/Export Data Between HDFS and RDBMS

Scala Functional Programming

Real Time Storm Project

Top 5 Hadoop Admin Tasks

All You Need To Know About Splunk

Top Hadoop Interview Questions To Prepare In 2024 – Apache Hive

5 Reasons When to and When not to use Hadoop

Hadoop Components that you Need to know about

Rio Olympics 2016: Big Data powers the biggest sporting spectacle of the year!

Introduction to Hadoop 2.0 and Advantages of Hadoop 2.0 over 1.0

Join the discussion Cancel reply

Trending Courses in Big Data

Azure Data Engineer Certification (DP-203) Co ...

PySpark Course Online Training

Big Data Hadoop Certification Training Course

Apache Spark and Scala Certification Training ...

Apache Kafka Certification Training Course

Splunk Certification Training: Power User and ...

Leveraging Big Data for Business Intelligence ...

ELK Stack Training & Certification

Apache Solr Certification Training

Apache Storm Certification Training

Browse Categories

Subscribe to our Newsletter, and get personalized recommendations.

Drilling Down On Apache Drill, the New-Age Query Engine