In AWS Data Pipeline, how can I make sure only one instance of a pipeline is running at any time?


I have a pipeline with two tasks. Task 2 depends on Task 1, and maxActiveInstances is set to 1 for both tasks. Despite this dependency, under certain circumstances Task 2 runs at the same time as Task 1. For example, if Task 2 takes too long and the scheduled start time of the pipeline's next execution is reached, Task 1 of that next execution starts while Task 2 is still running. The same thing happens when backfilling.

Since these two tasks interfere with each other, I don't want them to run at the same time under any circumstances. Ideally, I'd want only one instance of the pipeline (not of the individual tasks) to run at a time, but I can't figure out how to do that.

Here's what the pipeline looks like with uninteresting parts replaced with ...:

{
  "objects": [
    {
      "period": "15 Minutes",
      "name": "Every 15 minutes",
      "id": "DefaultSchedule",
      "type": "Schedule",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    },
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "...",
      "role": "...",
      "pipelineLogUri": "...",
      "scheduleType": "cron",
      "schedule": {
        "ref": "DefaultSchedule"
      },
      "maxActiveInstances": "1",
      "name": "Default",
      "id": "Default"
    },
    {
      "name": "CopyTablesActivity",
      "id": "CopyTablesActivity",
      "workerGroup": "dp01",
      "type": "ShellCommandActivity",
      "command": "..."
    },
    {
      "name": "CreateReportsActivity",
      "id": "CreateReportsActivity",
      "workerGroup": "dp01",
      "type": "ShellCommandActivity",
      "command": "...",
      "dependsOn": {
        "ref": "CopyTablesActivity"
      }
    }
  ],
  "parameters": [...]
}

Sep 19, 2018 in AWS by bug_seeker

1 answer to this question.


On CopyTablesActivity, you could set the lateAfterTimeout attribute to 5 minutes or so, and then add an onLateAction attribute that references a Terminate action. The idea is that if CopyTablesActivity has not finished after 5 minutes, the Terminate action cancels it. For example, the CopyTablesActivity object could look like this:

{
  "name": "CopyTablesActivity",
  "id": "CopyTablesActivity",
  "workerGroup": "dp01",
  "lateAfterTimeout": "5 minutes",
  "type": "ShellCommandActivity",
  "onLateAction": {
    "ref": "DefaultAction1"
  },
  "command": "..."
}

And then, you could define DefaultAction1 as such:

{
  "name": "TerminateTasks",
  "id": "DefaultAction1",
  "type": "Terminate"
}

See this link for more information: https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-terminate.html
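Before uploading an edited definition, it can help to check locally that it is still valid JSON and that every "ref" points at an object that actually exists. A missing comma or a misspelled id such as DefaultAction1 is easy to miss by eye. The following is only a rough sketch using Python's standard json module. The helper name unresolved_refs is made up for illustration and is not part of any AWS tool, and it only looks at single { "ref": ... } values, not lists of refs.

```python
import json


def unresolved_refs(definition_text):
    """Return (object id, field, ref) for every ref that matches no object id.

    Hypothetical local check, not an AWS API. json.loads raises
    json.JSONDecodeError if the text is not valid JSON (e.g. a missing comma).
    """
    definition = json.loads(definition_text)
    objects = definition["objects"]
    known_ids = {obj["id"] for obj in objects}
    missing = []
    for obj in objects:
        for field, value in obj.items():
            # Only single {"ref": "..."} values are handled in this sketch.
            if isinstance(value, dict) and "ref" in value:
                if value["ref"] not in known_ids:
                    missing.append((obj["id"], field, value["ref"]))
    return missing


# Trimmed-down definition holding just the two objects from this answer.
PIPELINE = """
{
  "objects": [
    {
      "name": "CopyTablesActivity",
      "id": "CopyTablesActivity",
      "workerGroup": "dp01",
      "type": "ShellCommandActivity",
      "lateAfterTimeout": "5 minutes",
      "onLateAction": { "ref": "DefaultAction1" },
      "command": "..."
    },
    {
      "name": "TerminateTasks",
      "id": "DefaultAction1",
      "type": "Terminate"
    }
  ]
}
"""

if __name__ == "__main__":
    # An empty list means every ref resolves to a defined object.
    print(unresolved_refs(PIPELINE))
```

This only checks the structure of the file. It does not tell you whether AWS Data Pipeline accepts the definition or how the timeout behaves at run time.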

answered Sep 19, 2018 by Priyaj
