Recart incident

Reporting attribution problem

Minor · Resolved

Recart experienced a minor incident on January 21, 2023, affecting Messaging and lasting 9h 50m. The incident has been resolved; the full update timeline is below.

Started: Jan 21, 2023, 11:16 AM UTC
Resolved: Jan 21, 2023, 09:06 PM UTC
Duration: 9h 50m
Detected by Pingoru: Jan 21, 2023, 11:16 AM UTC

Affected components

Messaging

Update timeline

  1. identified Jan 21, 2023, 11:16 AM UTC

    We identified a problem in our system that affects attribution. As a result, attribution reporting is slower than usual.

  2. identified Jan 21, 2023, 12:14 PM UTC

    We identified the root cause of the problem. The memory usage of one component is unpredictable, which caused many restarts in one of our subsystems. We have found an approach that will let us process all of the bloated messages.

  3. monitoring Jan 21, 2023, 12:57 PM UTC

    We were able to handle the problem. The metrics look good and the system is coping with the load. We will continue to monitor our metrics and will report if the problem comes up again.

  4. monitoring Jan 21, 2023, 09:06 PM UTC

    We were able to fix the problem. Operations are back to normal.

  5. resolved Jan 21, 2023, 09:06 PM UTC

    This incident has been resolved.