Loading data incrementally into Amazon Redshift: S3 vs DynamoDB vs INSERT

0 votes
I have a web app that needs to report on its usage, and I want to use Amazon Redshift as the data warehouse for that purpose. How should I collect the data?

Every time a user interacts with my app, I want to report it. So when should I write the files to S3, and how many? What I mean is:
- If I do not send the information immediately, I might lose it because of a lost connection, or because of a bug in my system while the data is being collected and prepared for S3.
- If I write a file to S3 on every user interaction, I will end up with hundreds of files, each holding minimal data, that need to be managed, sorted, and deleted after being copied to Redshift. That does not seem like a good solution.

What am I missing? Should I use DynamoDB instead? Should I use simple INSERTs into Redshift instead?
If I do need to write the data to DynamoDB, should I delete the holding table after it has been copied? What are the best practices?

In any case, what are the best practices for avoiding data duplication in Redshift?

Appreciate the help!
Mar 3 in Others by Edureka

1 answer to this question.

0 votes
It is best to aggregate event logs before loading them into Amazon Redshift. The first advantage is that you make better use of Redshift's parallel architecture: COPY on a set of larger files in S3 (or from a large DynamoDB table) is significantly faster than individual INSERTs or COPY on a single small file. The second is that you can sort the data before loading it (particularly when the sort order is based on event time). This also improves load performance and reduces the need to VACUUM the table.

You can collect your events in several places before aggregating and loading them into Redshift:

Local file to S3 - The most common approach is to collect your logs on the client/server and upload them to S3 every x MB or y minutes. Several log appenders support this (for example, FluentD or Log4J), and it can be set up through container configuration alone, with no code changes. The disadvantage is that you can lose some logs, because the local log files can be deleted before they are uploaded.
DynamoDB - As @Swami mentioned, DynamoDB is a very good way to accumulate events.
Amazon Kinesis - a newly announced service - is also a good way to stream events from various clients and servers to a central location quickly and reliably. The events are kept in order of insertion, which makes it simple to load them into Redshift pre-sorted afterwards.
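
As a rough illustration of the "every x MB or y minutes" batching idea above, here is a minimal Python sketch. It is not taken from any particular log appender: the class name `EventBuffer`, the `upload` callback, the `event_time` field, and the default thresholds are all assumptions made for the example. The batch is sorted by event time before it is handed to the callback, in line with the sorting advice above.

```python
import json
import time
from typing import Callable, Dict, List, Optional, Tuple


class EventBuffer:
    """Hypothetical in-memory buffer that flushes events in sorted batches.

    A flush happens when the buffered payload reaches max_bytes
    ("every x MB") or when max_age_seconds have passed since the first
    buffered event ("every y minutes"). The age is only checked when a
    new event arrives, so a real system would also flush on a timer
    and on shutdown.
    """

    def __init__(
        self,
        upload: Callable[[str], None],   # assumed callback, e.g. a wrapper around an S3 upload
        max_bytes: int = 5 * 1024 * 1024,
        max_age_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._upload = upload
        self._max_bytes = max_bytes
        self._max_age = max_age_seconds
        self._clock = clock
        self._pending: List[Tuple[float, str]] = []  # (event_time, JSON line)
        self._size = 0
        self._first_at: Optional[float] = None

    def add(self, event: Dict) -> None:
        """Buffer one event; each event must carry a numeric 'event_time'."""
        if not self._pending:
            self._first_at = self._clock()
        line = json.dumps(event)
        self._pending.append((event["event_time"], line))
        self._size += len(line.encode("utf-8")) + 1  # +1 for the newline
        age = self._clock() - self._first_at
        if self._size >= self._max_bytes or age >= self._max_age:
            self.flush()

    def flush(self) -> None:
        """Send all buffered events as one newline-delimited JSON batch."""
        if not self._pending:
            return
        # Sort by event time so the resulting file is pre-sorted for loading.
        ordered = sorted(self._pending, key=lambda pair: pair[0])
        body = "\n".join(line for _, line in ordered) + "\n"
        # If upload raises, the buffer is left intact so the batch can be retried.
        self._upload(body)
        self._pending = []
        self._size = 0
        self._first_at = None
```

In practice the `upload` callback might write the batch to an S3 object under a unique key, which a later COPY then picks up. Anything still in the buffer is lost if the process dies, which is the same trade-off described for local log files above.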
answered Mar 3 by gaurav
