AWS Glue Crawler Creates Partition and File Tables

+2 votes
I have a pretty basic s3 setup that I would like to query against using Athena. The data is all stored in one bucket, organized into year/month/day/hour folders.

|   |--2018
|   |   |--01
|   |   |   |--01
|   |   |   |   |--01
|   |   |   |   |   |--file1.json
|   |   |   |   |   |--file2.json
|   |   |   |   |--02
|   |   |   |   |   |--file3.json
|   |   |   |   |   |--file4.json
I then setup an AWS Glue Crawler to crawl s3://bucket/data. The schema in all files is identical. I would expect that I would get one database table, with partitions on the year, month, day, etc.

What I get instead are tens of thousands of tables. There is a table for each file, and a table for each parent partition as well. So far as I can tell, separate tables were created for each file/folder, without a single overarching one where I can query across a large date range.

I followed instructions to the best of my ability, but cannot figure out how to structure my partitions/scanning such that I don't get this huge, mostly worthless dump of data.


Dinesh Singh
Oct 30, 2019 in AWS by Dinesh
• 140 points

1 answer to this question.

+2 votes
Hi, you need to add this option "Create a single schema for each S3 path" in Schedule(Crawler) and if you have 2 tables you need to add 2 store data(from each table)
answered Jan 30 by Fito
Thanks, @Fito for your contribution.

Please register at Edureka Community and earn credits for every contribution. A contribution could be asking a question, answering, commenting or even upvoting/downvoting an answer or question.

These credits can be used to get a discount on the course. Also, you could become the admin at Edureka Community with certain points.


Related Questions In AWS

0 votes
2 answers

How to skip headers when reading a CSV file in S3 and creating a table in AWS Athena?

Thanks for the answer. This should be clear ...READ MORE

answered Aug 14, 2019 in AWS by athenauserz
+1 vote
1 answer

What are AWS Glue Crawler?

AWS Glue crawler is used to connect ...READ MORE

answered Feb 4, 2019 in AWS by Heena
0 votes
2 answers

Receiving SMS from users and stores in AWS

As far as I know, receiving international ...READ MORE

answered Aug 21, 2018 in AWS by Priyaj
• 57,530 points
+1 vote
2 answers

How do I get my AWS Glue client in JAVA?

Hey, you've been using a correct code ...READ MORE

answered Apr 17, 2018 in AWS by Cloud gunner
• 4,630 points
+1 vote
2 answers

Starting with an AWS Instance with API and AUTHPARAMS

The API is usually much easier to ...READ MORE

answered Apr 17, 2018 in AWS by Cloud gunner
• 4,630 points
+1 vote
3 answers

Log in to AWS using Access Key ID and Secret Access Key ID

Access keys consist of an access key ...READ MORE

answered Aug 17, 2018 in AWS by Priyaj
• 57,530 points
+13 votes
2 answers

Git management technique when there are multiple customers and need multiple customization?

Consider this - In 'extended' Git-Flow, (Git-Multi-Flow, ...READ MORE

answered Mar 26, 2018 in DevOps & Agile by DragonLord999
• 8,420 points
0 votes
1 answer