Published on Jan 17,2019
544 Views
Email Post

Data analysis is a very complex process and there has always been attempts to ease it. There are many tools for analytics, and even the popular tech giant Amazon provides an AWS service named Amazon Athena. This Amazon Athena tutorial will guide you through the basics and advance usage of Amazon Athena.

Amazon Athena is an interactive data analysis tool used to process complex queries in relatively less time. It is server-less hence, there is no hassle of setting up and doesn’t require managing the infrastructure. It is not a Database service hence, you just pay for the queries you run. You just point your data in S3, define the schema required and with a standard SQL you are good to go.

The topics covered in this article are as follows:

Introduction To Amazon Athena

On November 20, 2016, Amazon launched Athena as one of its services. As described earlier, Amazon Athena is a serverless query service that makes analysis of data, using standard SQL, stored in Amazon S3 simpler. With few clicks in the AWS Management Console, customers can point Amazon Athena at their data stored in Amazon S3 and run queries using standard SQL to get results in seconds.

With Amazon Athena, there is no infrastructure to set up or manage, and the customer pays only for the queries they run. Amazon Athena scales automatically, executing queries in parallel, which gives fast results, even with a large dataset and complex queries. Now, that you what is Amazon Athena let me take you through the difference it has compared to SQL Server.

Difference Between Microsoft SQL Server And Amazon Athena

Features

Microsoft SQL Server

Amazon Athena

DEFINITION

Microsoft SQL Server is a database management and analysis system.Amazon Athena is an interactive query service that makes data analysis easy.

USAGE

Used for DML, DCL, DDL and TCL operations on Database.Used for DML operations on Database.

BENEFITS

1. Reliable and easy to use.

2. High performance.

3. Easy to maintain.

4. Easy server installation.

5. Multiple tools integration possible.

1. Easy to use.

2. High performance.

3. No maintenance required.

4. No server configuration required.

5. Multiple tools integration possible.

INTEGRATION1. Sequlize

2. SQLDep

3. Presto

1. Amazon S3

2. AWS Glue

3. Presto

LIMITATIONS1. Limited RDS storage.

2. Limited instances.

3. Can not handle recursion.

1. No DDL’s supported. 

2. Works with external table only.

3. User Defined Functions not supported.

MySQL vs Amazon Athena


Use Of Amazon Athena

If you are a Data Analyst and have an experience of analyzing data stored on S3, you will relate to this,

Data Analysts/Developers: Do you offer Storage?
AWS: Yes.
Data Analysts/Developers: Do you have tools for Analytics?
AWS: Not sure.” 

Amazon worked on this and came up with Amazon Athena. Now, you have a tool to play with your data. Athena helps you analyze unstructured, semi-structured and structured data that is stored in Amazon S3. Using Athena you can create dynamic queries for your dataset. Athena also works with AWS Glue to give you a better way to store the metadata in S3.

Using AWS CloudFormation and Athena, you can use named queries. Named query allows you to name your query and then call it using the name. 

This interactive service from AWS can be used by Data Scientists, developers to take a sneak peak into the table instead of running the complete query. It is also used to fetch data from S3, load it to different data stores using Athena JDBC driver, for log storage/analysis and Data Warehousing events.

Now that you know Athena is an interesting tool, let’s find out in this Amazon Athena tutorial how to get your hands on this amazing service from Amazon.

Accessing Amazon Athena

 Accessing Athena is very easy and it can be done by either:

These are few of the ways to access Amazon Athena. By now, you pretty much know everything important about Amazon Athena. Let’s me walk you through the different features of Athena.

Features Of Athena

Out of the many services provided by Amazon, Athena is one of the services. It has many features that makes it suitable for Data Analysis. Let’s take a look at the different features one by one.

  1. Easy Implementation: Athena doesn’t require installation. It can be accessed directly from the AWS Console also directly by AWS CLI.
  2. Serverless: It is serverless, so the end-user doesn’t need to worry about infrastructure, configuration, scaling or failure. Athena takes care of everything on its own.
  3. Pay per query: Athena charges you only for the query you run, i.e. the amount of data that is managed per query. You can save a lot if you can compress them and format your dataset accordingly.
  4. Fast: Athena is a very fast analytics tool. It can perform complex queries in less time by breaking the complex queries into simpler ones and run them parallelly, then combine the results to give the desired output.
  5. Secure: With the help of IAM policies and AWS Identity, Athena gives you complete control over the data set. As the data is stored in S3 buckets, IAM policies can help you manage control to users.
  6. Highly available: With the assurance of AWS, Athena is highly available and the user can execute queries round the clock. As AWS is 99.999% available, so is Athena.
  7. Integration: The best feature of Athena is that it can be integrated with AWS Glue. AWS Glue will help the user to create a better-unified data repository. This helps you create better versioning of data, better tables, views, etc.

Great isn’t it? Athena provides many features at the same time, it is cost-efficient.

By now you must be impressed by AWS Athena. Now that you know quite a lot about Athena. Let’s roll our sleeves and understand the working of Athena by performing a small demo. In this Amazon Athena tutorial, we will work on two Demos, let’s find out what are they. 

Demo – I (Creating Tables In Athena)

As you know all about Amazon Athena, let’s take a dive on how to query your data stored as .json file in Amazon S3 using Athena.

  1. Create multiple JSON files containing entries
  2. Store the files to S3 bucket
  3. Create an external table for the files stored in S3
  4. Write a Query for accessing the data

Let’s understand how to do the above-said tasks one by one. 

  1. Create JSON Files. (Create the data without using newline character)
    JSON File- Amazon Athena Tutorial                                                                      JSON File- Amazon Athena Tutorial

  2. We will access S3 bucket using AWS CLI
    1. Configure IAM UserConfigure IAM User - Amazon Athena Tutorial-Edureka
    2. Create S3 BucketCreate S3 Bucket- Amazon Athena Tutorial- Edureka
    3. Copy files to S3 BucketCopy Json File To S3- Amazon Athena Tutorial- Edureka
  3. Create External Table in Athena. There are two ways of doing this: 
    1. Using AWS Glue Crawler
    2. Manually
  4. We will create it manually:
    1. Create table.
      Create External Table1-Amazon Athena Tutorial- Edureka                                                                    Amazon Athena Console
    2. Create a new database if you don’t have one. Give a table name. Give the location of your file.
      Creating external table2-Amazon Athena Tutorial- Edureka                                                                    Amazon Athena Console
    3. Select the type of file you will be working with. Select the architecture of the data in your file.
      Creating external table4-Amazon Athena Tutorial- Edureka                                                                   Amazon Athena Console
    4. As the entered data is not that complex, we don’t need a partition. Click on “Create Table”.
      Creating external table5-Amazon Athena Tutorial- Edureka                                                    Amazon Athena Console – Amazon Athena Tutorial
    5. Athena will auto-generate the Query for creating External Table and run it.
      Creating external table6-Amazon Athena Tutorial- Edureka                                                                Amazon Athena Console

      You have your external table ready.
  5. We write a query to select all data from the table.
    1. select * from testdb;
    2. Click on Run Query and you have all the information in your table.Running Query- Amazon Athena Tutorial- Edureka                                                                       Amazon Athena Tutorial

Demo – II (Comparison Between Amazon Athena And MySQL)

In this Amazon Athena tutorial, now we will compare MySQL and Athena and understand as to how even simple queries take less time to execute in Athena.

  1. Loading CSV file to MySQL took around 1 hour but in Athena, it took just 3 mins to upload the CSV file to S3 and 0.42 seconds to create a table for the same.athena table-AmazonAthena Tutorial-Edureka
  2. Select query. select * from table.Athena Select query-AmazonAthena Tutorial-EdurekaSelect query in Athena.Mysql Select Query-AmazonAthena Tutorial-EdurekaSelect query in MySQL.
  3. Selecting a specific column from the table.SelectQuery-AmazonAthena Tutorial-EdurekaSelect a specific column in Athena
    SelectQueryMySql-AmazonAthena Tutorial-EdurekaSelect a specific column in MySQL.
  4. Getting the count of a specific column.CountQueryAthena-AmazonAthena Tutorial-EdurekaCount of a specific column in Athena.CountQueryMysql-AmazonAthena Tutorial-EdurekaCount of a specific column in MySQL.
  5. Counting number of records in the table.CountQueryAthena-AmazonAthena Tutorial-EdurekaCount all records in Athena.
    CountQueryMySql-AmazonAthena Tutorial-EdurekaCount all records in MySQL.
  6. Select query with a specified range.SelectQueryAthena-AmazonAthena Tutorial-EdurekaSelect query within the said range in Athena.
    SelectQueryMysql-AmazonAthena Tutorial-EdurekaSelect query within the said range in MySQL.

That was a brief comparison on basic SQL commands between MySql and Amazon Athena.

I hope this blog was informative and helped in gaining an idea about AWS Athena. Choose a career in AWS and get certified in AWS Architect which will boost your professional career. Take a look at the different use case which are revolutionizing business. All The Best!

If you wish to learn more about AWS and the amazing services like Athena, then check out our AWS Architect Course which comes with instructor-led live training and real-life project experience. This training will help you understand the different services offered by AWS in depth and help you achieve mastery over the subject.

Got a question for us? Please mention it in the comments section of ”Amazon Athena” and we will get back to you.

About Author
Priyaj Kumar
Published on Jan 17,2019
Priyaj Kumar is an AWS aspirant working as a Research Analyst at Edureka. He is working on different AWS services and has good knowledge in Python and Java. He is from Dumka, Jharkhand. He loves playing Badminton. You can reach him at www.linkedin.com/in/priyajkumar/

Share on

Browse Categories

Comments
0 Comments