Learnosity incident

Degraded performance affecting Live Progress Report in US-East-1

Learnosity experienced a minor incident on April 16, 2026, affecting the Live Progress (Live Activity by User) report and lasting 1h 15m. The incident has been resolved; the full update timeline is below.

Started: Apr 16, 2026, 02:08 PM UTC
Resolved: Apr 16, 2026, 03:23 PM UTC
Duration: 1h 15m
Detected by Pingoru: Apr 16, 2026, 02:08 PM UTC

Affected components

Live Progress (Live Activity by User) report

Update timeline

investigating Apr 16, 2026, 02:08 PM UTC

As of 13:30 UTC, we are experiencing slowdowns affecting the Live Progress report in the us-east-1 region. Atypical uses of eventbus, such as custom reports or implementations, may also be affected. Learnosity Support and Systems Engineering teams are actively investigating the issue and will follow up with an update and resolution as soon as possible.

resolved Apr 16, 2026, 03:23 PM UTC

As of 14:30 UTC, the Live Progress report event degradation issue in the us-east-1 region has been resolved. Additional capacity processed the event load, and these systems are operating normally. Users of Learnosity's premium Firehose feature may have seen brief additional latency during this scaling, but we've seen no evidence that this affected customers. This, too, was resolved once scaling was complete. Learnosity Support and Systems Engineering teams will follow up with a postmortem once we have completed root cause analysis and finalized any next steps or preventative measures required. Please reach out if you have any questions or concerns.

postmortem May 04, 2026, 02:39 PM UTC

### Affected Systems and Regions

On 2026-04-16, Learnosity experienced a service degradation impacting the Live Progress report. The issue began at approximately 13:12 UTC and was resolved at 14:40 UTC. The total duration of customer impact was approximately 88 minutes.

### Investigation

The issue was detected following elevated error rates on the load balancer serving the eventbus service. Investigation determined that an unhealthy condition within the EC2 instances led to elevated CPU utilization and memory pressure within the Auto Scaling Group, resulting in application instability and repeated service restarts. Inconsistent recovery caused traffic to concentrate on a subset of hosts, further elevating error rates. A secondary effect of the instability was increased request pressure on downstream dependencies.

### Resolution

Service was restored by stabilizing the affected EC2 instance and restoring consistent application availability across the Auto Scaling Group. Load distribution normalized once all instances returned to a healthy state.

### Prevention

Learnosity is implementing the following measures to mitigate:

* Improve service startup and dependency handling to ensure consistent recovery behavior
* Review resource thresholds to reduce the likelihood of similar instability
* Enhance monitoring to detect and respond more quickly to similar conditions
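
The first prevention item, consistent startup and dependency handling, can be illustrated with a generic pattern. The sketch below is an assumption for illustration only: the postmortem does not describe how the eventbus service starts up, and `wait_for_dependency` and all of its parameters are hypothetical names. It shows one common approach, in which a restarting instance polls a dependency with capped exponential backoff and random jitter, so that several instances recovering at once do not retry in lockstep and add pressure downstream.

```python
import random
import time
from typing import Callable


def wait_for_dependency(
    check: Callable[[], bool],
    max_attempts: int = 6,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Return True once check() succeeds, or False after max_attempts failures.

    Hypothetical helper: `check` stands in for any readiness probe
    (for example, a connection test against a downstream service).
    """
    for attempt in range(max_attempts):
        try:
            if check():
                return True
        except Exception:
            # Treat a raised error the same as "not ready yet".
            pass
        if attempt < max_attempts - 1:
            # Exponential backoff capped at max_delay, scaled by a random
            # factor in [0, 1) so that instances restarting together
            # spread their retries out instead of retrying in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(cap * rng())
    return False
```

A service using a helper like this would typically call it before reporting itself healthy to the load balancer, and exit with a clear error if it returns `False`, rather than accepting traffic while a dependency is still unavailable.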