Provation Software incident

Provation Apex is currently experiencing technical difficulties

Severity: Minor. Status: Resolved.

Provation Software experienced a minor incident on March 7, 2024 affecting Provation Apex, lasting 41m. The incident has been resolved; the full update timeline is below.

Started
Mar 07, 2024, 03:19 PM UTC
Resolved
Mar 07, 2024, 04:00 PM UTC
Duration
41m
Detected by Pingoru
Mar 07, 2024, 03:19 PM UTC

Affected components

Provation Apex

Update timeline

  1. identified Mar 07, 2024, 03:19 PM UTC

    Provation Apex is currently experiencing a partial outage, and some users are finding it unreachable. Investigation is underway. New information will be posted here as it becomes available. For Apex Desktop App offline mode, see the Offline Mode Instructions.

  2. resolved Mar 07, 2024, 04:00 PM UTC

    Provation Apex has fully recovered. We apologize for the inconvenience.

  3. postmortem Mar 21, 2024, 11:02 PM UTC

    **Postmortem: Sporadic Error Saving Notes & Printing Issues**

    **Incident Summary**

    On March 7, 2024 at 09:19 CST (15:19 UTC), Apex customers began experiencing sporadic errors when saving notes, along with printing issues. Investigation revealed that **1 of the 4 Apex instances was not processing larger payload traffic successfully**. All Apex instances were cleared, and the issue was resolved at 10:00 CST (16:00 UTC).

    **Root Cause**

    The root cause of the issue was a **lack of available disk space on certain Apex instances**.

    **Detailed Analysis**

    1. **Disk Space Shortage**:
       * The lack of available disk space was identified as the primary issue.
       * Apex instances were unable to process larger payloads due to insufficient disk space.
       * This degraded overall system performance and caused sporadic errors for users.
    2. **Excessive Log Files**:
       * Further investigation revealed that log files were consuming a significant amount of disk space.
       * These log files were not being deleted frequently enough, which led to the disk space shortage.
       * Increasing Apex traffic contributed to the accumulation of log files.
    3. **Log File Management**:
       * The team had not adjusted the log file deletion frequency to account for the increased Apex traffic.
       * As a result, log files were not being purged at an acceptable rate.
       * No alerting mechanism existed to warn the team that disk space was running low.

    **Corrective Actions**

    1. **Immediate Disk Space Cleanup**:
       * The team performed an emergency cleanup to free up disk space on the affected Apex instances.
       * Old log files were removed to alleviate the shortage.
    2. **Log Rotation and Deletion Strategy**:
       * A log rotation and deletion strategy was implemented.
       * Log files are now rotated and deleted at regular intervals based on traffic patterns.
       * The deletion frequency is adjusted dynamically to accommodate increased traffic.
    3. **Alerting System Enhancement**:
       * An alerting system was set up to notify the team when disk space reaches critical levels.
       * Alerts are triggered at predefined thresholds to prevent future incidents.

    **Preventive Measures**

    1. **Capacity Planning**:
       * Regular capacity planning exercises will be conducted to anticipate resource needs.
       * Disk space requirements will be reviewed and adjusted as necessary.
    2. **Automated Log Management**:
       * Automated log management tools will be explored to ensure timely rotation and deletion.
       * Log file sizes will be monitored regularly, and retention policies adjusted accordingly.
    3. **Documentation and Training**:
       * The log management process will be documented and team members trained on it.
       * Everyone on the team should understand the importance of disk space management.
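
The two corrective actions described in the postmortem, threshold-based disk space alerts and age-based log deletion, can be illustrated with a short Python sketch. This is not Provation's implementation: the postmortem does not state the actual thresholds, retention period, log location, or tooling, so the 80%/90% thresholds, the `*.log` file pattern, and all function names below are hypothetical choices made for illustration only.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical thresholds; the postmortem does not give the real values.
WARN_FRACTION = 0.80
CRIT_FRACTION = 0.90


def classify_usage(used_fraction: float) -> str:
    """Map a used-disk fraction (0.0 to 1.0) to an alert level."""
    if used_fraction >= CRIT_FRACTION:
        return "critical"
    if used_fraction >= WARN_FRACTION:
        return "warning"
    return "ok"


def disk_alert_level(path: str) -> str:
    """Return the alert level for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return classify_usage(usage.used / usage.total)


def purge_old_logs(
    log_dir: str, max_age_seconds: float, now: Optional[float] = None
) -> List[str]:
    """Delete *.log files in `log_dir` whose last modification is older
    than `max_age_seconds`, and return the names of the deleted files."""
    now = time.time() if now is None else now
    deleted = []
    for entry in sorted(Path(log_dir).glob("*.log")):
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            deleted.append(entry.name)
    return deleted
```

Under these assumptions, a scheduled job could call `disk_alert_level` on the log volume and page the team on `"critical"`, while `purge_old_logs` runs with a retention window that is shortened as traffic grows. In practice a standard tool such as `logrotate` would likely replace the hand-written purge.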