Trello incident

Trello is slow or unavailable

Critical · Resolved

Trello experienced a critical incident on April 25, 2025, affecting Trello.com, API, and 1 more component, lasting 32m. The incident has been resolved; the full update timeline is below.

Started: Apr 25, 2025, 06:39 PM UTC
Resolved: Apr 25, 2025, 07:12 PM UTC
Duration: 32m
Detected by Pingoru: Apr 25, 2025, 06:39 PM UTC

Affected components

Trello.com, API, Atlassian Support - Support Portal, Atlassian Support Ticketing, Atlassian Support Knowledge Base

Update timeline

  1. monitoring Apr 25, 2025, 06:39 PM UTC

    Trello was slow or unavailable. Our engineering team is actively investigating this incident to determine the root cause. Users affected by this incident may have noticed that Trello was slow or completely unavailable in both the web and mobile apps. Trello operations have recovered. We will update this page as we have additional information.

  2. resolved Apr 25, 2025, 07:12 PM UTC

    Between 18:18 and 18:33 UTC, we experienced an outage for Trello. The issue has been resolved and the service is operating normally. Our teams are investigating and will publish the root cause as soon as it is available.

  3. postmortem May 16, 2025, 06:16 PM UTC

    ### **SUMMARY**

    On April 25, 2025, between 18:18 and 18:33 UTC, Atlassian customers using Trello may have experienced service interruptions. The event was triggered by temporarily reduced capacity following a rollback deployment, which left too few nodes to handle the load. Automated monitoring systems detected the incident within one minute and mitigated it by scaling up the deployment, which put Atlassian systems into a known-good state. The total time to resolution was about 15 minutes.

    ### **IMPACT**

    The impact occurred on April 25, 2025, between 18:18 and 18:33 UTC, and affected Trello. The incident caused service disruption to Trello users, resulting in reduced functionality, slower response times, and errors when performing key actions such as loading boards and cards.

    ### **ROOT CAUSE**

    The root cause of the incident was a failure to scale our nodes to optimal capacity caused by a release rollback. If an issue is found during a deployment, we can roll back to a previous release. In this case, a rollback was executed to a previous release that had already undergone a scaling-down process. When the rollback happened, the scaled-down release did not have enough compute nodes available to handle the high traffic.

    ### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

    We know that outages impact your productivity, and we strive to avoid incidents like these. We are prioritizing the following efforts as next steps:

    * Improve rollback tooling, including UX upgrades and pre-scaling
    * Conduct updated incident response training focused on rollback tooling and best practices

    We apologize to customers whose services were impacted during this incident; we are taking steps designed to improve the platform’s performance and availability.

    Thanks,
    Atlassian Customer Support
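The failure mode in the root cause (rolling back onto a release that has already been scaled down) and the "pre-scaling" remediation can be illustrated with a small sketch. This is not Atlassian's tooling. The `Release` class, the `nodes_required` and `plan_rollback` functions, the per-node capacity figure, and the 25% headroom are all hypothetical and chosen only to show the idea of checking capacity before shifting traffic.

```python
import math
from dataclasses import dataclass


@dataclass
class Release:
    """Hypothetical view of a deployed release (not an Atlassian data model)."""
    name: str
    running_nodes: int  # nodes currently provisioned for this release


def nodes_required(requests_per_sec: float, capacity_per_node: float,
                   headroom: float = 0.25) -> int:
    """Nodes needed to serve the current load plus an assumed safety margin."""
    return math.ceil(requests_per_sec * (1 + headroom) / capacity_per_node)


def plan_rollback(target: Release, requests_per_sec: float,
                  capacity_per_node: float) -> list[str]:
    """Return ordered rollback steps, scaling the target up first if needed."""
    needed = nodes_required(requests_per_sec, capacity_per_node)
    steps = []
    if target.running_nodes < needed:
        # Pre-scale: the target release was scaled down after it was replaced,
        # so bring it back to capacity before it receives any traffic.
        steps.append(
            f"scale {target.name} from {target.running_nodes} to {needed} nodes"
        )
        steps.append(f"wait until {needed} nodes are healthy")
    steps.append(f"shift traffic to {target.name}")
    return steps


if __name__ == "__main__":
    # Illustrative numbers only: a scaled-down previous release under high load.
    previous = Release(name="previous", running_nodes=4)
    for step in plan_rollback(previous, requests_per_sec=9000,
                              capacity_per_node=500):
        print(step)
```

In this sketch the traffic shift is always the last step, so a rollback onto an under-provisioned release is preceded by a scale-up rather than followed by one, which is the ordering the pre-scaling remediation item implies.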