Mergify incident

Engine Down

Critical Resolved

Mergify experienced a critical incident on November 19, 2021 affecting Engine, lasting 5h 55m. The incident has been resolved; the full update timeline is below.

Started
Nov 19, 2021, 02:29 AM UTC
Resolved
Nov 19, 2021, 08:24 AM UTC
Duration
5h 55m
Detected by Pingoru
Nov 19, 2021, 02:29 AM UTC

Affected components

Engine

Update timeline

  1. monitoring Nov 19, 2021, 06:57 AM UTC

    The Mergify engine is unable to process most events received.

  2. monitoring Nov 19, 2021, 06:59 AM UTC

    We have fixed the underlying issue and restored the service. We are now monitoring the platform and planning long-term action to prevent this incident from happening again.

  3. identified Nov 19, 2021, 07:00 AM UTC

    We're implementing long-term fixes.

  4. identified Nov 19, 2021, 07:11 AM UTC

    We are continuing to work on a fix for this issue.

  5. resolved Nov 19, 2021, 08:24 AM UTC

    Everything is back to normal.

  6. postmortem Nov 19, 2021, 12:16 PM UTC

    # 19th November @ 1:00 UTC

    * We started receiving more than 5000 events/minute, while our max rate is usually around 1000 events/minute.

    # 19th November @ 3:00 UTC

    * The high load of incoming events continued, and our Redis database filled up, as it had been sized for only 3000 events/minute.
    * Event processing got stuck, and some processes started to crash.

    # 19th November @ 6:00 UTC

    * The engineering team was notified and investigated the issue and possible remediations.
    * The Redis database was replicated for further investigation.
    * We increased the Redis database size to absorb up to 6000 events/minute.
    * The engine started reprocessing events.

    # 19th November @ 6:10 UTC

    * The abusing user was identified and flagged, and its Mergify installation was suspended. Its account was generating 100 commits/s on a repository, triggering the associated CIs. The abusing repository has also been suspended/deleted on the GitHub side.
    * The engine automatically dropped all of that user's events and no longer receives events from it.
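
The postmortem describes a single source flooding the engine until it was identified and suspended by hand. As a rough illustration only, and not Mergify's actual implementation, a per-source sliding-window check could flag such a source automatically. The class name `SourceRateMonitor`, its parameters, and the 1000 events per 60 seconds threshold (borrowed from the usual platform-wide peak quoted above) are all hypothetical choices for this sketch.

```python
from collections import defaultdict, deque


class SourceRateMonitor:
    """Hypothetical sketch: suspend a source whose event rate over a
    sliding window exceeds a fixed limit."""

    def __init__(self, max_events: int, window_seconds: float) -> None:
        self.max_events = max_events
        self.window_seconds = window_seconds
        # source id -> timestamps of events still inside the window
        self._events = defaultdict(deque)
        self.suspended = set()

    def record(self, source: str, timestamp: float) -> bool:
        """Record one event. Return True if accepted, False if dropped."""
        if source in self.suspended:
            return False
        window = self._events[source]
        window.append(timestamp)
        # Evict timestamps that have fallen out of the sliding window.
        while window and window[0] <= timestamp - self.window_seconds:
            window.popleft()
        if len(window) > self.max_events:
            # Over the limit: suspend the source and discard its state.
            self.suspended.add(source)
            del self._events[source]
            return False
        return True


# Assumed threshold: 1000 events per source per 60 seconds.
monitor = SourceRateMonitor(max_events=1000, window_seconds=60)
# Simulate one source emitting 100 events/s for 20 seconds.
accepted = sum(monitor.record("abusive-installation", i / 100) for i in range(2000))
print(accepted, "abusive-installation" in monitor.suspended)
```

Timestamps are passed in explicitly so the check can be replayed against recorded traffic. In a real deployment the counters would more likely live in a shared store than in process memory, which this sketch does not attempt.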