fix: set FailureAction=rollback for swarm services default UpdateConfig

Docker Swarm's default FailureAction is "pause". When a task fails or is
terminated early during a rolling update, Swarm pauses the update and
stops ALL reconciliation, so orphan containers persist indefinitely, even
when healthy. This is the root cause of the orphan container issues reported
in production (services showing Replicas: N/1 with multiple healthy
containers that never get cleaned up).
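
One way to spot an affected service is to look at the update state Swarm
reports for it. The sketch below is illustrative only and not part of this
change: isUpdateStalled and the ServiceInspect shape are hypothetical names,
and the state values are assumed to match the UpdateStatus.State field that
`docker service inspect` returns.

```typescript
// Assumed subset of the JSON returned by `docker service inspect`.
type UpdateState =
  | "updating"
  | "paused"
  | "completed"
  | "rollback_started"
  | "rollback_paused"
  | "rollback_completed";

interface ServiceInspect {
  UpdateStatus?: { State?: UpdateState; Message?: string };
}

// Hypothetical helper: true when an update or a rollback has been paused,
// which is the situation described above where orphans are left behind.
function isUpdateStalled(service: ServiceInspect): boolean {
  const state = service.UpdateStatus?.State;
  return state === "paused" || state === "rollback_paused";
}
```

A service that has never been updated has no UpdateStatus, so the helper
treats it as not stalled.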

Setting FailureAction to "rollback" makes Swarm automatically revert to
the previous working service spec on failure, preventing orphans while
preserving service availability. Also adds a default RollbackConfig with
Order: "start-first" to match the update config (Docker defaults rollback
to "stop-first" otherwise).

Only the default config changes: users who have configured their own
updateConfigSwarm/rollbackConfigSwarm are not affected.
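
The fallback behaviour described above can be sketched as a standalone
function. This is a simplified illustration rather than the actual code:
resolveSwarmConfigs and the two interfaces are hypothetical, and the real
logic is inlined as object spreads inside generateConfigContainer. The sketch
uses `??` instead of a spread, which behaves the same for these object values.

```typescript
// Only the fields touched by this change are modelled here.
interface SwarmUpdateConfig {
  Parallelism: number;
  Order: "start-first" | "stop-first";
  FailureAction?: "pause" | "continue" | "rollback";
}

interface SwarmRollbackConfig {
  Parallelism: number;
  Order: "start-first" | "stop-first";
}

// Defaults introduced by this commit.
const defaultUpdateConfig: SwarmUpdateConfig = {
  Parallelism: 1,
  Order: "start-first",
  FailureAction: "rollback",
};

const defaultRollbackConfig: SwarmRollbackConfig = {
  Parallelism: 1,
  Order: "start-first",
};

// Hypothetical helper: user-supplied configs are passed through untouched,
// and the defaults apply only when a config is missing.
function resolveSwarmConfigs(
  updateConfigSwarm?: SwarmUpdateConfig | null,
  rollbackConfigSwarm?: SwarmRollbackConfig | null,
): { UpdateConfig: SwarmUpdateConfig; RollbackConfig: SwarmRollbackConfig } {
  return {
    RollbackConfig: rollbackConfigSwarm ?? defaultRollbackConfig,
    UpdateConfig: updateConfigSwarm ?? defaultUpdateConfig,
  };
}
```

Note that a user-supplied updateConfigSwarm without a FailureAction is not
patched, so such services keep Docker's own default of "pause".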

Relates to #1669, #2223, #2911, #2150
Jaime Herrero
2026-02-28 17:18:42 -05:00
parent 345023f090
commit fadc7fede5


@@ -550,9 +550,15 @@ export const generateConfigContainer = (
},
},
}),
-...(rollbackConfigSwarm && {
-RollbackConfig: rollbackConfigSwarm,
-}),
+...(rollbackConfigSwarm
+? { RollbackConfig: rollbackConfigSwarm }
+: {
+// default rollback config to match update config
+RollbackConfig: {
+Parallelism: 1,
+Order: "start-first",
+},
+}),
...(updateConfigSwarm
? { UpdateConfig: updateConfigSwarm }
: {
@@ -560,6 +566,7 @@ export const generateConfigContainer = (
UpdateConfig: {
Parallelism: 1,
Order: "start-first",
+FailureAction: "rollback",
},
}),
...(sanitizedStopGracePeriodSwarm !== null &&