How can I do continuous data ingestion from on-prem data sources to Redshift?

I have a requirement to ingest data from multiple on-prem data sources into Redshift. The ingestion will be a scheduled job that runs every 6 hours. The process should identify the delta records and load only new or changed records into Redshift. All of these processes should also have a restart option. I am trying to do this either entirely with AWS services or with a combination of Python programs and AWS services.
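
A minimal sketch of the delta-and-restart requirement, assuming each source table has a reliable last-modified column (called `updated_at` here, a hypothetical name) and that a small JSON checkpoint file is an acceptable place to keep state. The watermark is committed only after the extracted rows have been staged, so a failed run can simply be rerun. `write_to_s3` is a hypothetical callback standing in for the real upload (for example boto3's `put_object`), and in-memory SQLite stands in for the on-prem database in the demo.

```python
import json
import os
import sqlite3
import tempfile


def load_watermark(path, table, default="1970-01-01 00:00:00"):
    """Return the last committed watermark for `table`, or `default` on the first run."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return json.load(f).get(table, default)


def save_watermark(path, table, watermark):
    """Persist the watermark via write-then-rename so a crash never leaves a torn file."""
    state = {}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    state[table] = watermark
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def extract_delta(conn, table, watermark_col, since):
    """Fetch rows whose watermark column is newer than `since`, oldest first.

    `table` and `watermark_col` are interpolated into the SQL, so they must come
    from trusted configuration, never from user input.
    """
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {watermark_col} > ? ORDER BY {watermark_col}",
        (since,),
    )
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, r)) for r in cur.fetchall()]
    new_mark = rows[-1][watermark_col] if rows else since
    return rows, new_mark


def run_batch(conn, table, watermark_col, checkpoint, write_to_s3):
    """One scheduled run: extract the delta, stage it, then commit the watermark.

    If the hypothetical `write_to_s3` callback raises, the watermark is not
    advanced, so rerunning the batch (the restart case) re-extracts the same delta.
    """
    since = load_watermark(checkpoint, table)
    rows, new_mark = extract_delta(conn, table, watermark_col, since)
    if rows:
        write_to_s3(table, rows)
    save_watermark(checkpoint, table, new_mark)
    return rows


if __name__ == "__main__":
    # In-memory SQLite stands in for the on-prem database in this demo.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE orders (id INTEGER, updated_at TEXT)")
    demo.execute("INSERT INTO orders VALUES (1, '2018-08-08 00:00:00')")
    cp = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    staged = run_batch(demo, "orders", "updated_at", cp, lambda t, r: print(t, len(r)))
    print(len(staged), load_watermark(cp, "orders"))
```

Timestamps are stored as ISO-8601 text here, which sorts correctly as strings. The strict `>` comparison can skip rows that share the last watermark value but commit after the extract runs; a common mitigation is to re-read a small overlap window and deduplicate downstream. The 6-hour schedule itself could come from cron or an EventBridge rule with `rate(6 hours)`.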

My idea is to set up a data flow from the external sources to S3, temporarily launch an EC2 instance for any data processing/wrangling, write the curated data back to S3, terminate the EC2 instance, and then load the data into Redshift using AWS Data Pipeline.
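
For the last step (S3 to Redshift), one common way to load only new or changed rows is a staging-table merge: `COPY` the curated files into a temporary table, delete the matching target rows, and insert the staged ones in a single transaction. This sketch only builds the SQL. The table name, key column, S3 prefix and IAM role ARN are placeholders, and it assumes CSV files whose column order matches the target table.

```python
from typing import List


def build_merge_sql(target: str, key: str, s3_prefix: str, iam_role_arn: str) -> List[str]:
    """Return the statements for a staging-table upsert into `target`.

    `target` may be schema-qualified (e.g. "public.orders"); the temp table gets
    an unqualified name because Redshift temp tables live in a session schema.
    All arguments are placeholders supplied by the caller (assumption).
    """
    stage = target.split(".")[-1] + "_stage"
    return [
        "BEGIN;",
        # Temp table with the same columns as the target.
        f"CREATE TEMP TABLE {stage} (LIKE {target});",
        # Bulk-load the curated files from S3 into the temp table.
        f"COPY {stage} FROM '{s3_prefix}' IAM_ROLE '{iam_role_arn}' FORMAT AS CSV;",
        # Remove the old versions of any rows present in this batch.
        f"DELETE FROM {target} USING {stage} WHERE {target}.{key} = {stage}.{key};",
        # Insert the new and changed rows.
        f"INSERT INTO {target} SELECT * FROM {stage};",
        "END;",
    ]
```

Because everything runs in one transaction, a failed load leaves the target unchanged and the same statements can be rerun, which covers the restart requirement for this step. The statements could be executed with a PostgreSQL-compatible driver such as psycopg2 or redshift_connector. Deduplicate by key during the EC2 wrangling step, since two rows with the same key in one batch would both be inserted.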

Can you suggest some pointers to start with? If you have experience with a similar project, please share it. If possible, please also share a design and associated code for reference.

Aug 8, 2018 in AWS by bug_seeker

1 answer to this question.

I recommend looking into the AWS Schema Conversion Tool (AWS SCT) and AWS Database Migration Service (AWS DMS).

DMS can set up ongoing replication of data from on-prem sources to Redshift, including staging the data in S3. The supported sources are listed in the DMS documentation.
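
As a rough sketch of what the ongoing DMS setup looks like, the code below builds the parameters for the DMS `CreateReplicationTask` API with migration type `full-load-and-cdc`, which performs an initial full copy and then keeps replicating changes. It assumes the replication instance and the source/target endpoints already exist; the ARNs, task identifier, schema name and region are placeholders. Only `create_task` touches AWS, through boto3's `create_replication_task`.

```python
import json


def dms_table_mappings(schema: str, table_pattern: str = "%") -> str:
    """Selection rule that includes every table in `schema` matching `table_pattern`."""
    return json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-" + schema,
            "object-locator": {"schema-name": schema, "table-name": table_pattern},
            "rule-action": "include",
        }]
    })


def replication_task_params(task_id, source_arn, target_arn, instance_arn, schema):
    """Keyword arguments for the DMS CreateReplicationTask API.

    "full-load-and-cdc" copies the existing rows once, then keeps applying
    ongoing changes (change data capture) from the source.
    All identifiers passed in are placeholders (assumption).
    """
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",
        "TableMappings": dms_table_mappings(schema),
    }


def create_task(params, region="us-east-1"):
    """Submit the task; requires boto3 and AWS credentials. The region is a placeholder."""
    import boto3  # imported lazily so the helpers above run without AWS access

    dms = boto3.client("dms", region_name=region)
    return dms.create_replication_task(**params)
```

CDC usually needs some source-side preparation (for Oracle, for example, supplemental logging), so check the DMS documentation for your source engine before starting the task.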

Start with the walkthrough in this blog post: "How to Migrate Your Oracle Data Warehouse to Amazon Redshift Using AWS SCT and AWS DMS"

If this doesn't answer your question, leave a comment and I'll look into it further.

answered Aug 8, 2018 by Priyaj