Consider the following two-step algorithm:
Iteration 1: A1 A2 A3
Iteration 2: B2 B3
Where task B{i}
depends on tasks A{i-1}
and A{i}
.
The question is how to execute this workflow with Celery such that:
- No
A
task is executed twice. - Each
B
task can start as soon as bothA
tasks it depends on are finished?
I've tried the following options but none has both properties.
Option 1: Two groups
I have split the two iterations into two separate groups like this.
result = group([A.s(1), A.s(2), A.s(3)]) | group([B.s(2), B.s(3)])
The problem with this execution is that the whole group with A
tasks needs to finish before the B
group can start. This leads to the desired result but not to the optimal utilization of the resources.
Option 2: Chords
result = group([
chord([A.s(1), A.s(2)], B.s(2)),
chord([A.s(2), A.s(3)], B.s(3))
])
The problem here is that A.s(2)
is called twice. I can manage this inside of my application but that requires sort of distributed locks and more careful handling of what is already done and what needs to be done. Executing A.s(2)
twice is not an option. Tasks are idempotent but the execution takes too long.
from Task chaining with shift using Celery
No comments:
Post a Comment