Routific incident

New Routific outage

Severity: Critical. Status: Resolved.

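Routific reported a critical incident affecting New Routific on January 22, 2025. The incident record stayed open for 8h 29m, although the vendor's own updates put the actual downtime at about 45 minutes. The incident has been resolved; the full update timeline is below.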

Started: Jan 22, 2025, 06:37 PM UTC
Resolved: Jan 23, 2025, 03:07 AM UTC
Duration: 8h 29m
Detected by Pingoru: Jan 22, 2025, 06:37 PM UTC

Affected components

New Routific

Update timeline

  1. investigating Jan 22, 2025, 06:37 PM UTC

    We are actively working to restore user access to the new Routific. The issue is affecting most users and our engineering team is working on a fix at the moment. More updates will be provided in the next 30 minutes.

  2. investigating Jan 22, 2025, 07:11 PM UTC

    We are actively working to restore user access to the new Routific. This issue is our team's highest priority. We thank you for your patience as we work to resolve this as soon as possible. The next update will come within 30 minutes.

  3. investigating Jan 22, 2025, 07:19 PM UTC

    Access to the new Routific has been restored. Thank you for your patience! We are still investigating the cause of the outage. We will report our findings as soon as possible.

  4. monitoring Jan 22, 2025, 07:22 PM UTC

    The new Routific is stable and operational. Our team will continue monitoring and provide a report on the cause of the issue soon.

  5. resolved Jan 23, 2025, 03:07 AM UTC

    The new Routific continues to be stable and fully operational. Our team has identified the root cause of the issue. A bug in our internal messaging system caused an overload, resulting in about 45 minutes of downtime. We have since fixed the issue and will improve our monitoring systems to identify these types of issues earlier to avoid future disruptions.

  6. postmortem Jan 24, 2025, 05:01 PM UTC

    On January 22, 2025 at approximately 10:16 AM PST (06:16 PM UTC), Routific experienced an outage that prevented users from accessing the new Routific. The disruption lasted around 45 minutes before full service was restored. Our investigation determined that a bug in our internal messaging system caused a backlog of unacknowledged messages, which overwhelmed the system and led to the outage. Our engineering team quickly identified the root cause and deployed a fix to clear the message backlog and stabilize the platform. We have also improved our monitoring and alerting systems to detect similar issues sooner and avoid future disruptions. Thank you for your patience as we worked to resolve this matter.
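
The postmortem attributes the outage to a backlog of unacknowledged messages and mentions improved alerting, but does not describe how that alerting works. As a rough illustration only, the sketch below shows one common way such a check might be built: fire an alert when the unacknowledged-message count stays above a threshold for several consecutive samples. The class name, threshold, and sample values are all hypothetical and are not taken from Routific's systems.

```python
from collections import deque


class BacklogMonitor:
    """Flags a queue whose unacknowledged-message count stays high.

    Hypothetical sketch: names and thresholds are illustrative assumptions.
    """

    def __init__(self, max_unacked: int = 1000, consecutive_samples: int = 3):
        self.max_unacked = max_unacked                  # assumed alert threshold
        self.consecutive_samples = consecutive_samples  # breaches needed in a row
        self._recent = deque(maxlen=consecutive_samples)

    def observe(self, unacked_count: int) -> bool:
        """Record one sample; return True when an alert should fire."""
        self._recent.append(unacked_count)
        window_full = len(self._recent) == self.consecutive_samples
        # Alert only if every sample in the full window breaches the threshold,
        # so a single short spike does not page anyone.
        return window_full and all(n > self.max_unacked for n in self._recent)


# Made-up sample values, one per polling interval.
monitor = BacklogMonitor(max_unacked=1000, consecutive_samples=3)
samples = [200, 1500, 1800, 2400, 300]
alerts = [monitor.observe(n) for n in samples]
print(alerts)
```

Requiring several consecutive breaches trades a little detection delay for fewer false alarms. A real deployment would read the count from the message broker's own metrics rather than from a hard-coded list.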