Drilling Down On Apache Drill, the New-Age Query Engine

Apache Drill is the industry’s first schema-free SQL engine. Drill isn’t the world’s first query engine, but it’s the first that strikes a fine balance between flexibility and speed. Drill is designed to scale to several thousand nodes and to query petabytes of data at the interactive speeds that BI and analytics environments require.

Drill integrates with several data sources, such as Hive, HBase, MongoDB, file systems, and RDBMSs. It also reads input formats such as Avro, CSV, TSV, PSV, Parquet, Hadoop sequence files, and many others.

Why Apache Drill?

The biggest advantage of Apache Drill is that it discovers the schema on the fly as you query the data, so you do not have to define a schema up front. It also works with BI tools such as Tableau, QlikView, and MicroStrategy.

Here’s a quote from an industry analyst that summarizes the value of Apache Drill:

“Drill isn’t just about SQL-on-Hadoop. It’s about SQL-on-pretty-much-anything, immediately, and without formality.”

– Andrew Brust, Gigaom Research, January 2015

Drillbit is Apache Drill’s daemon, and it runs on each node in the cluster. It uses ZooKeeper for cluster coordination and to maintain cluster membership. A drillbit is responsible for accepting requests from the client, processing the queries, and returning results to the client. The drillbit that receives a request from the client is called the ‘foreman’. The foreman generates the execution plan and sends the execution fragments to the other drillbits running in the cluster.

One more advantage is that Drill is simple to install and set up. Let us learn how to install Apache Drill.

The first step is to download the Drill package, extract it, and list the directory to confirm that the files are there.

Command: wget https://archive.apache.org/dist/drill/drill-1.5.0/apache-drill-1.5.0.tar.gz

Command: tar -xvf apache-drill-1.5.0.tar.gz

Command: ls

Next, set the environment variables in the .bashrc file in your home directory.

Command: gedit ~/.bashrc

export DRILL_HOME=/home/edureka/apache-drill-1.5.0

export PATH=$PATH:$DRILL_HOME/bin

This command reloads the file so that the changes take effect in the current shell:

Command: source ~/.bashrc

Now go to the Drill conf directory and edit the drill-override.conf file to set the cluster ID and the ZooKeeper host and port. We will run it on a local cluster.

Command: cd apache-drill-1.5.0

Command: gedit conf/drill-override.conf
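As a rough illustration of what the edited file can look like, the sketch below writes a minimal drill-override.conf. The helper name write_drill_override is hypothetical, and the values drillbits1 and localhost:2181 are assumptions for a single local ZooKeeper; use the cluster ID and ZooKeeper address of your own setup. The demo writes to a scratch directory so that the real conf directory is not touched.

```shell
#!/bin/sh
# Sketch only: write_drill_override is a hypothetical helper, not part of Drill.
write_drill_override() {
  conf_dir=$1      # directory that holds drill-override.conf
  cluster_id=$2    # must be identical on every drillbit in the cluster
  zk_connect=$3    # ZooKeeper host:port list, comma separated
  cat > "$conf_dir/drill-override.conf" <<EOF
drill.exec: {
  cluster-id: "$cluster_id",
  zk.connect: "$zk_connect"
}
EOF
}

# Demo against a scratch directory (assumed values for a local setup).
demo_dir=$(mktemp -d)
write_drill_override "$demo_dir" "drillbits1" "localhost:2181"
cat "$demo_dir/drill-override.conf"
```

To apply it for real, you would pass the conf directory of the Drill installation as the first argument instead of the scratch directory.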

By default, DRILL_MAX_DIRECT_MEMORY is set to 8 GB in drill-env.sh. Adjust it to fit the memory available on your machine.

Command: gedit conf/drill-env.sh
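If you are unsure what value to pick, the sketch below shows one way to derive a starting point from the machine's total RAM. The helper suggest_direct_memory is hypothetical, and the "half of total RAM" rule is only an illustrative assumption, not a Drill recommendation. The "8G" style of the value is assumed to match the format used in drill-env.sh.

```shell
#!/bin/sh
# Sketch only: suggest_direct_memory is a hypothetical helper.
# Given total RAM in kB, print half of it in whole GB (for example "8G").
# The 50% ratio is an arbitrary illustration, not an official guideline.
suggest_direct_memory() {
  kb=$1
  gb=$(( kb / 1024 / 1024 / 2 ))
  # Never suggest less than 1G.
  if [ "$gb" -lt 1 ]; then
    gb=1
  fi
  printf '%sG\n' "$gb"
}

# On Linux, total RAM can be read from /proc/meminfo (the value is in kB).
if [ -r /proc/meminfo ]; then
  total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "DRILL_MAX_DIRECT_MEMORY=\"$(suggest_direct_memory "$total_kb")\""
fi
```

The printed line is only a suggestion to copy into drill-env.sh by hand; leave enough memory for the JVM heap and for other services on the same node.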

To run Drill on a single node only, you can use embedded mode, where it runs locally. The command below starts the drillbit service automatically and opens the Drill shell.

Command: ./bin/drill-embedded

You can run a simple query to check the installation.

Command: SELECT * FROM sys.options WHERE type = 'SYSTEM' AND name LIKE 'security%';

To open the Apache Drill web console, go to localhost:8047 in a web browser.

You can run your query from the Query tab as well.
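The same web port also accepts queries over HTTP, which is handy for scripting. The sketch below builds the JSON body for Drill's REST query endpoint (/query.json) and wraps the curl call in a function. The helper names drill_query_payload and drill_rest_query are hypothetical, and the endpoint's behavior should be checked against the Drill version you are running. Only the payload builder is executed here, since the curl call needs a live drillbit.

```shell
#!/bin/sh
# Sketch only: both helpers below are hypothetical, not part of Drill.

# Build the JSON body for a SQL query.
drill_query_payload() {
  # Escape backslashes and double quotes so the SQL is a valid JSON string.
  # Multi-line queries are not handled by this simple escaping.
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"queryType":"SQL","query":"%s"}' "$escaped"
}

# Submit a query to a drillbit's web port (defaults to localhost:8047).
# Defined but not called here, because it needs a running drillbit.
drill_rest_query() {
  curl -s -X POST -H 'Content-Type: application/json' \
    -d "$(drill_query_payload "$1")" \
    "http://${2:-localhost:8047}/query.json"
}

# Show the request body for the installation check query (no trailing semicolon).
drill_query_payload "SELECT * FROM sys.options WHERE type = 'SYSTEM' AND name LIKE 'security%'"
```

With a drillbit running, calling drill_rest_query with the same SQL string would return the result rows as JSON.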

To run Drill in distributed mode, set the cluster ID and add the ZooKeeper information in drill-override.conf on each node. Every drillbit in the cluster must use the same cluster ID.

Then start the ZooKeeper service on each node. After that, start the drillbit service on each node with this command.

Command: ./bin/drillbit.sh start

Command: jps
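The jps output lists the running Java processes, so a Drillbit entry means the service started. A minimal sketch of that check is shown below. The helper has_drillbit is hypothetical, and the sample output uses made-up process IDs so that the example runs without a cluster.

```shell
#!/bin/sh
# Sketch only: has_drillbit is a hypothetical helper. It reads `jps` output on
# stdin and succeeds if a process whose main class is Drillbit is listed.
has_drillbit() {
  grep -q '^[0-9][0-9]* Drillbit$'
}

# Sample jps output (made-up PIDs) used to demonstrate the check.
sample='2101 QuorumPeerMain
2345 Drillbit
2490 Jps'

if printf '%s\n' "$sample" | has_drillbit; then
  echo "drillbit is running"
else
  echo "drillbit not found"
fi
```

On a real node you would pipe the live output into the check instead, for example jps | has_drillbit.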

Next, start the Drill shell and connect it to the cluster.
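The exact command is not shown here. One common way to connect, sketched below, is to start sqlline with a JDBC URL that points at ZooKeeper. The helper drill_jdbc_url is hypothetical, and the values localhost:2181, drillbits1, and the ZooKeeper root drill are assumptions that must match your drill-override.conf. The sketch prints the command instead of running it, because it needs a live cluster.

```shell
#!/bin/sh
# Sketch only: drill_jdbc_url is a hypothetical helper that builds a URL of the
# form jdbc:drill:zk=<hosts>/<zk root>/<cluster id>.
drill_jdbc_url() {
  zk_hosts=$1              # e.g. "localhost:2181" or "zk1:2181,zk2:2181"
  cluster_id=$2            # cluster-id from drill-override.conf
  zk_root=${3:-drill}      # ZooKeeper root; "drill" is assumed here
  printf 'jdbc:drill:zk=%s/%s/%s\n' "$zk_hosts" "$zk_root" "$cluster_id"
}

url=$(drill_jdbc_url "localhost:2181" "drillbits1")
# Print the command rather than run it, since it needs a running cluster.
echo "./bin/sqlline -u $url"
```

Run the printed command from the Drill installation directory once ZooKeeper and the drillbits are up.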

Now we can execute our queries on the cluster in distributed mode.

This is the first blog post in a two-part Apache Drill blog series. The second blog in the series is coming soon.

Got a question for us? Mention it in the comments section and we will get back to you.
