AWS Glue Crawler Creates Partition and File Tables

I have a pretty basic S3 setup that I would like to query using Athena. The data is all stored in one bucket, organized into year/month/day/hour folders.

|--data
|   |--2018
|   |   |--01
|   |   |   |--01
|   |   |   |   |--01
|   |   |   |   |   |--file1.json
|   |   |   |   |   |--file2.json
|   |   |   |   |--02
|   |   |   |   |   |--file3.json
|   |   |   |   |   |--file4.json
...
I then set up an AWS Glue Crawler to crawl s3://bucket/data. The schema in all files is identical. I would expect to get one database table, with partitions on the year, month, day, and hour.
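To make that expectation concrete, here is a minimal sketch of how each object key in the layout above would map onto partition values of a single table. The column names (year, month, day, hour) and the helper name partition_values are illustrative assumptions, not something the crawler is known to produce.

```python
from collections import defaultdict


def partition_values(key, root="data"):
    """Split a key such as 'data/2018/01/01/01/file1.json' into partition values.

    The column names used here are hypothetical labels for the four folder levels.
    """
    parts = key.strip("/").split("/")
    # Expect: root / year / month / day / hour / filename
    if len(parts) != 6 or parts[0] != root:
        raise ValueError(f"unexpected key layout: {key}")
    year, month, day, hour = parts[1:5]
    return {"year": year, "month": month, "day": day, "hour": hour}


# Keys taken from the folder tree shown above.
keys = [
    "data/2018/01/01/01/file1.json",
    "data/2018/01/01/01/file2.json",
    "data/2018/01/01/02/file3.json",
    "data/2018/01/01/02/file4.json",
]

# Group file names by their (year, month, day, hour) partition.
files_by_partition = defaultdict(list)
for key in keys:
    p = partition_values(key)
    partition = (p["year"], p["month"], p["day"], p["hour"])
    files_by_partition[partition].append(key.rsplit("/", 1)[-1])
```

In this picture the four files belong to two partitions of one table, rather than to four separate tables.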

What I get instead are tens of thousands of tables. There is a table for each file, and a table for each parent partition as well. So far as I can tell, separate tables were created for each file/folder, without a single overarching one where I can query across a large date range.

I followed the instructions at https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html to the best of my ability, but I cannot figure out how to structure my partitions or the crawl so that I don't get this huge, mostly worthless dump of tables.
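For reference, this is a sketch of the kind of crawler definition that documentation page describes, using the table grouping policy option ("CombineCompatibleSchemas") passed through boto3's create_crawler. The crawler name, IAM role ARN, and database name are hypothetical placeholders, and I have not confirmed that this setting resolves the behaviour described above.

```python
import json


def build_crawler_args(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue create_crawler call."""
    # Crawler configuration is passed to Glue as a JSON string.
    configuration = {
        "Version": 1.0,
        "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
    }
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Configuration": json.dumps(configuration),
    }


crawler_args = build_crawler_args(
    name="data-crawler",  # hypothetical crawler name
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical role
    database="my_database",  # hypothetical database name
    s3_path="s3://bucket/data/",  # the path from the question
)


def create_crawler(args):
    """Send the definition to Glue; needs boto3 and AWS credentials."""
    import boto3  # imported here so the sketch above runs without boto3 installed

    glue = boto3.client("glue")
    glue.create_crawler(**args)
```

Calling create_crawler(crawler_args) would register the crawler; it is kept as a separate function so the argument building can be inspected without touching AWS.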

Thanks!

Dinesh Singh

dinesh.singh2003@gmail.com
Oct 30 in AWS by Dinesh

