Zaptec incident

Zaptec Portal delay

Critical · Resolved

Zaptec experienced a critical incident on October 8, 2025 affecting the Portal, the API, and one additional component, lasting 17h 43m. The incident has been resolved; the full update timeline is below.

Started
Oct 08, 2025, 12:18 PM UTC
Resolved
Oct 09, 2025, 06:01 AM UTC
Duration
17h 43m
Detected by Pingoru
Oct 08, 2025, 12:18 PM UTC

Affected components

Portal, API, Charger backend, OCPP

Update timeline

  1. investigating Oct 08, 2025, 12:18 PM UTC

    We are currently investigating this issue.

  2. investigating Oct 08, 2025, 12:24 PM UTC

    We’re currently experiencing issues with activating and deactivating chargers in the portal. There’s also some delay in allocating current to the chargers. We are investigating the issue!

  3. investigating Oct 08, 2025, 12:53 PM UTC

    We are continuing to investigate this issue.

  4. investigating Oct 08, 2025, 01:03 PM UTC

    The portal will be temporarily unavailable due to a restart.

  5. investigating Oct 08, 2025, 01:16 PM UTC

    There are still delays in the Zaptec Portal after the restart, and we are continuing to investigate the issue.

  6. investigating Oct 08, 2025, 01:40 PM UTC

    The portal will be unavailable for approximately 15 minutes.

  7. investigating Oct 08, 2025, 01:56 PM UTC

    Portal is up again, but we are still seeing delays. We are continuing to investigate the issue.

  8. investigating Oct 08, 2025, 02:07 PM UTC

    There are still delays in the Zaptec Portal. This affects adding new chargers to installations, and also the Zaptec App. We are continuing to investigate the issue.

  9. investigating Oct 08, 2025, 02:18 PM UTC

    We are continuing to investigate this issue.

  10. investigating Oct 08, 2025, 02:33 PM UTC

    We’re seeing some improvements, but issues are still under investigation.

  11. identified Oct 08, 2025, 02:42 PM UTC

    The issue has been identified and a fix is being implemented.

  12. monitoring Oct 08, 2025, 02:45 PM UTC

    A fix has been implemented and we are monitoring the results.

  13. monitoring Oct 08, 2025, 02:53 PM UTC

    We are continuing to monitor for any further issues.

  14. monitoring Oct 08, 2025, 08:28 PM UTC

    We are continuing to monitor for any further issues.

  15. monitoring Oct 09, 2025, 05:07 AM UTC

    We are continuing to monitor for any further issues.

  16. resolved Oct 09, 2025, 06:01 AM UTC

    This incident has been resolved; a postmortem will be posted later today.

  17. postmortem Oct 09, 2025, 10:54 AM UTC

    On October 8, 2025, our customer portal began showing stale data, followed by disruptions to charging operations. We want to provide a transparent overview of what happened, how we responded, and the steps we are taking to prevent this from happening again.

    Timeline of Events

    13:45: Replication issues detected between database instances. The lag grew slowly but steadily and initially did not attract attention, as it appeared to be a transient issue.
    14:27: Our engineering team began recycling the most heavily loaded backend systems to clean up potentially hanging database connections. No improvement observed.
    14:40: High-traffic API endpoints temporarily disabled to reduce load. No improvement observed.
    15:00: Web services temporarily shut down. The underlying problem persisted despite reduced load.
    15:00: Charging control plane restarted. Load decreased, but the event backlog continued to grow.
    15:10: Web services restored. Event backlog remained unchanged.
    15:37: Portal services shut down. Event backlog remained unchanged.
    16:03: Incident escalated to emergency status.
    16:15: Online detection services temporarily shut down. Event backlog cleared and returned to normal levels.
    16:30: All systems stabilized and resumed normal operations.

    Root Cause Analysis

    Our cloud database service stopped shipping transaction logs, causing the secondary database replica to fall behind. This occurred despite adequate resources being available to handle the incoming transaction volume, and it is clearly visible in the respective server metrics.

    We are still investigating the underlying cause of the log-shipping failure. Current areas of investigation include:

    - Cloud infrastructure networking issues
    - Long-running database transactions caused by application behavior

    We are actively working to reproduce the issue to better understand the failure conditions.

    Preventative Measures and Follow-up

    To prevent similar incidents and improve our response capabilities, we are implementing the following measures:

    - Infrastructure optimization: Reducing workload on our database infrastructure through architectural improvements.
    - Enhanced monitoring: Deploying additional observability tools to detect replication issues earlier.
    - Automated alerting: New alerts configured for replication lag and transaction log queue sizes to enable faster detection and response (a sketch of this idea follows below).

    We are committed to continuous improvement and will provide updates as our investigation progresses.
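
As an illustration of the automated-alerting measure described above, here is a minimal watchdog sketch. It is not Zaptec's actual implementation: the metric source (`fetch_replication_metrics`), the thresholds, and the poll interval are hypothetical placeholders; a real deployment would read these values from the cloud provider's monitoring API or the database's own replication views and page an on-call engineer rather than printing.

```python
import time

# Hypothetical thresholds -- the values Zaptec actually uses are not public.
REPLICATION_LAG_THRESHOLD_S = 300    # alert if the replica is more than 5 minutes behind
LOG_QUEUE_THRESHOLD_MB = 1024        # alert if un-shipped transaction log exceeds ~1 GB
POLL_INTERVAL_S = 60                 # how often to sample the metrics


def fetch_replication_metrics() -> dict:
    """Placeholder metric source.

    In a real deployment this would query the cloud provider's monitoring API
    (or the database's own replication views) for the current replica lag and
    the size of the transaction-log send queue.
    """
    return {"replication_lag_s": 12.0, "log_send_queue_mb": 4.5}


def check_once(alert) -> None:
    """Sample the metrics once and raise an alert for any breached threshold."""
    m = fetch_replication_metrics()
    if m["replication_lag_s"] > REPLICATION_LAG_THRESHOLD_S:
        alert(f"Replication lag {m['replication_lag_s']:.0f}s exceeds "
              f"{REPLICATION_LAG_THRESHOLD_S}s threshold")
    if m["log_send_queue_mb"] > LOG_QUEUE_THRESHOLD_MB:
        alert(f"Log send queue {m['log_send_queue_mb']:.0f} MB exceeds "
              f"{LOG_QUEUE_THRESHOLD_MB} MB threshold")


if __name__ == "__main__":
    while True:
        check_once(alert=print)  # swap print for a pager or chat integration
        time.sleep(POLL_INTERVAL_S)
```

Watching the log send queue as well as the lag itself matters here because, per the root-cause analysis, the replica fell behind even though resources were adequate, so a queue-size alert can fire before user-visible staleness appears.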