GUIDEcx incident

Login Instability Issues

Minor Resolved

GUIDEcx experienced a minor incident on August 19, 2024, lasting 1 hour and 14 minutes. The incident has been resolved; the full update timeline is below.

Started
Aug 19, 2024, 06:21 PM UTC
Resolved
Aug 19, 2024, 07:35 PM UTC
Duration
1 hour and 14 minutes
Detected by Pingoru
Aug 19, 2024, 06:21 PM UTC

Update timeline

  1. resolved Aug 19, 2024, 06:21 PM UTC

    Type: Incident
    Duration: 1 hour and 14 minutes
    Affected Components: Report Navigator and Report Builder, Project Management, Resource Management, Compass Customer Portal, Advanced Time Tracking

    Aug 19, 18:21:44 GMT+0 - Investigating - We are currently investigating this incident that's causing intermittent login issues.

    Aug 19, 18:53:26 GMT+0 - Identified - Diagnosis is complete. We are working on solving the main login issue now.

    Aug 19, 19:12:22 GMT+0 - Monitoring - We implemented a fix and are currently monitoring the result.

    Aug 19, 19:35:30 GMT+0 - Resolved - The login fix was successful. Monitoring has proven stable. All access is restored.

    Aug 19, 23:00:00 GMT+0 - Resolved -

    ## Summary

    During a routine upgrade of our system infrastructure, we encountered an issue related to the rate-limiting of image downloads from an external service. This rate limit disrupted the startup of essential services, leading to a temporary outage that affected the availability of certain features.

    ## What Happened

    The issue occurred during the upgrade process when the rapid and simultaneous restarting of multiple system components led to a higher-than-usual number of download requests within a short time frame. This exceeded the limits set by the external service provider, disrupting the startup of critical services.

    ## How We Fixed It

    * **Enhanced Access:** At **3:10 PM ET**, we upgraded our access to the external service, allowing for higher download limits. A new access credential was created and applied, which allowed the impacted services to restart successfully.
    * **Configuration Update:** We updated our system configurations to ensure more reliable access to required components in the future, reducing the likelihood of similar issues.
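
The failure mode described under "What Happened" can be illustrated with a small calculation. The sketch below is not taken from the vendor's systems: the service count, pulls per service, rate limit, and window length are all hypothetical numbers chosen for illustration. It counts the peak number of download requests that fall inside any single rate-limit window, first when every component restarts at once and then when restarts are staggered.

```python
def max_requests_in_window(timestamps, window_seconds):
    """Return the largest number of requests that fall inside any
    half-open window of length window_seconds."""
    times = sorted(timestamps)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Drop requests that are a full window (or more) older than t.
        while t - times[start] >= window_seconds:
            start += 1
        best = max(best, end - start + 1)
    return best


# Hypothetical values, NOT from the incident report.
SERVICES = 40            # components restarted during the upgrade
PULLS_PER_SERVICE = 5    # image downloads each component needs at startup
LIMIT = 100              # downloads the external service allows per window
WINDOW = 3600            # rate-limit window, in seconds
STAGGER = 180            # delay between restarts in the staggered case

# Case 1: every component restarts at t = 0.
simultaneous = [0] * (SERVICES * PULLS_PER_SERVICE)

# Case 2: one component restarts every STAGGER seconds.
staggered = [
    i * STAGGER for i in range(SERVICES) for _ in range(PULLS_PER_SERVICE)
]

for label, schedule in (("simultaneous", simultaneous), ("staggered", staggered)):
    peak = max_requests_in_window(schedule, WINDOW)
    status = "within limit" if peak <= LIMIT else "rate limited"
    print(f"{label}: peak {peak} pulls per window ({status})")
```

With these assumed numbers, the simultaneous restart puts all 200 pulls into one window and exceeds the limit, while the staggered schedule peaks at 100 and stays within it. The report itself describes raising the limit and caching images rather than staggering restarts; the staggered case is shown only to make the burst effect visible.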
    ## What We've Done to Prevent This in the Future

    To prevent this from happening again, we took several steps:

    * **Image Caching & Version Control:** We implemented changes to cache frequently used components and pin them to specific versions, reducing the need to download them from external sources repeatedly and avoiding future rate limits.
    * **Upgraded Service Plan:** We upgraded our plan with the external service provider to a higher tier, increasing our allowed download capacity and providing more robust support for future operations.
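
The caching and version-pinning step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the vendor's implementation: `PinnedImageCache`, its `fetch` callback, and the image name and version used below are hypothetical. The idea is that a cache keyed by an exact name and version only contacts the external service the first time a given version is requested, and rejecting floating tags such as `latest` keeps the cache key stable.

```python
class PinnedImageCache:
    """Hypothetical in-memory cache keyed by (image name, pinned version)."""

    def __init__(self, fetch):
        # fetch(name, version) stands in for a download from the external service.
        self._fetch = fetch
        self._store = {}
        self.upstream_requests = 0

    def get(self, name, version):
        # Refuse unpinned references: a floating tag can change upstream,
        # so it cannot be served from the cache reliably.
        if not version or version == "latest":
            raise ValueError(f"{name}: a specific pinned version is required")
        key = (name, version)
        if key not in self._store:
            # Cache miss: this is the only path that reaches the external service.
            self.upstream_requests += 1
            self._store[key] = self._fetch(name, version)
        return self._store[key]


def fake_fetch(name, version):
    # Placeholder for a real download; returns dummy bytes.
    return f"{name}:{version}".encode()


cache = PinnedImageCache(fake_fetch)

# Ten components starting up and requesting the same pinned image
# result in a single upstream download.
for _ in range(10):
    cache.get("example-service", "1.4.2")

print(cache.upstream_requests)
```

In this sketch, ten restarts that need the same pinned image count as one request against the external rate limit instead of ten.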