AWS Glue Crawler Creates Partition and File Tables

+2 votes
I have a pretty basic s3 setup that I would like to query against using Athena. The data is all stored in one bucket, organized into year/month/day/hour folders.

|--data
|   |--2018
|   |   |--01
|   |   |   |--01
|   |   |   |   |--01
|   |   |   |   |   |--file1.json
|   |   |   |   |   |--file2.json
|   |   |   |   |--02
|   |   |   |   |   |--file3.json
|   |   |   |   |   |--file4.json
...
I then setup an AWS Glue Crawler to crawl s3://bucket/data. The schema in all files is identical. I would expect that I would get one database table, with partitions on the year, month, day, etc.

What I get instead are tens of thousands of tables. There is a table for each file, and a table for each parent partition as well. So far as I can tell, separate tables were created for each file/folder, without a single overarching one where I can query across a large date range.

I followed instructions https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html to the best of my ability, but cannot figure out how to structure my partitions/scanning such that I don't get this huge, mostly worthless dump of data.

Thanks!

Dinesh Singh

dinesh.singh2003@gmail.com
Oct 30, 2019 in AWS by Dinesh
• 140 points
1,860 views

1 answer to this question.

+2 votes
Hi, you need to add this option "Create a single schema for each S3 path" in Schedule(Crawler) and if you have 2 tables you need to add 2 store data(from each table)
answered Jan 30 by Fito
Thanks, @Fito for your contribution.

Please register at Edureka Community and earn credits for every contribution. A contribution could be asking a question, answering, commenting or even upvoting/downvoting an answer or question.

These credits can be used to get a discount on the course. Also, you could become the admin at Edureka Community with certain points.

Cheers!

Related Questions In AWS

0 votes
2 answers

How to skip headers when reading a CSV file in S3 and creating a table in AWS Athena?

Thanks for the answer. This should be clear ...READ MORE

answered Aug 14, 2019 in AWS by athenauserz
4,285 views
+1 vote
1 answer

What are AWS Glue Crawler?

AWS Glue crawler is used to connect ...READ MORE

answered Feb 4, 2019 in AWS by Heena
2,322 views
0 votes
2 answers

Receiving SMS from users and stores in AWS

As far as I know, receiving international ...READ MORE

answered Aug 21, 2018 in AWS by Priyaj
• 57,700 points
327 views
+1 vote
2 answers

How do I get my AWS Glue client in JAVA?

Hey, you've been using a correct code ...READ MORE

answered Apr 17, 2018 in AWS by Cloud gunner
• 4,650 points
2,116 views
+1 vote
2 answers

Starting with an AWS Instance with API and AUTHPARAMS

The API is usually much easier to ...READ MORE

answered Apr 17, 2018 in AWS by Cloud gunner
• 4,650 points
1,272 views
+15 votes
2 answers

Git management technique when there are multiple customers and need multiple customization?

Consider this - In 'extended' Git-Flow, (Git-Multi-Flow, ...READ MORE

answered Mar 26, 2018 in DevOps & Agile by DragonLord999
• 8,450 points
732 views
+2 votes
1 answer