Bentley Systems incident
IMS - Intermittent Login Authentication
Bentley Systems experienced a critical incident on April 3, 2025 affecting Authentication / Login, lasting 4h 4m. The incident has been resolved; the full update timeline is below.
Update timeline
- investigating Apr 03, 2025, 12:52 PM UTC
Our team is currently investigating an issue with IMS in the US Region. Some users may be having trouble authenticating their login. We are working diligently to identify the root cause of the problem and implement a solution. We will provide an update as we learn more. In the meantime, we apologize for any inconvenience this may cause and appreciate your patience and understanding.
- investigating Apr 03, 2025, 12:52 PM UTC
We are continuing to investigate this issue.
- investigating Apr 03, 2025, 01:54 PM UTC
We are continuing to investigate this issue.
- investigating Apr 03, 2025, 01:54 PM UTC
We are continuing to actively work on the issue. We apologize for any inconvenience this may cause and appreciate your patience and understanding. Thank you.
- investigating Apr 03, 2025, 03:09 PM UTC
We are continuing to actively work on the issue. We apologize for any inconvenience this may cause and appreciate your patience and understanding. Thank you.
- investigating Apr 03, 2025, 03:10 PM UTC
We are continuing to investigate this issue.
- investigating Apr 03, 2025, 04:07 PM UTC
We are continuing to actively work on the issue. We apologize for any inconvenience this may cause and appreciate your patience and understanding. Thank you.
- monitoring Apr 03, 2025, 04:39 PM UTC
We have identified the issue, and the fix has been implemented. We are closely monitoring the performance. Thanks for your patience.
- resolved Apr 03, 2025, 04:57 PM UTC
This incident has been resolved, and IMS is now working normally. Thank you for your patience.
- postmortem Apr 15, 2025, 04:14 PM UTC
**Bentley Systems RCA Report**

Incident Start Date: 04-03-2025
Incident Start Time: 12:04 UTC
Incident End Date: 04-03-2025
Incident End Time: 16:16 UTC
Duration of Incident: 252 minutes
Bentley Systems Problem Ticket: PRB0041210
Incident System: Authentication Servers
Service Impacted: IMS
Customers Impacted: Multiple Customers
ServiceNow Case Number(s): Multiple Cases

**Problem Service Impacted**

Issue: Multiple accounts were unable to authenticate to Bentley services through Bentley’s IMS (identity management system).
Service: IMS

**Root Cause Analysis**

Following a comprehensive investigation, it was determined that the IMS outage on April 3rd, 2025, was caused by a lapse in proactive monitoring by the vendor that operates IMS. This oversight allowed a disk on a server in the US region to reach full capacity. Although the service initially appeared active, the disk's storage volume and operating system eventually reached maximum capacity, which in turn corrupted the configuration database. As a result, end users encountered 500 errors during authentication attempts.

The corrupted database, the disk issues, and the lapse in active monitoring together extended the duration of the outage. Bentley Support collaborated diligently, performing multiple troubleshooting steps. Once the offending server was identified, it was removed from service and normal operations resumed. The remainder of the environment was immediately scanned for other potential disk issues, and none were found.

The incident occurred during peak traffic within the US region. The irregular token requests and 500 gateway errors stemmed from a cascading effect triggered by the disk reaching full capacity.

**Resolution Measures**

Immediate actions were taken to review the remaining servers in the cluster to ensure they were stable. The PingAccess server was pulled from the load balancer and the servers were restarted. The IMS login was restored at 16:06 UTC.
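The root cause described above comes down to a disk filling up without anyone being alerted. Purely as an illustration, a threshold-based disk check could be sketched in Python as follows. The 80% and 90% thresholds and the names `classify_usage`, `check_disk`, and `DiskStatus` are assumptions made for this example; they do not describe the vendor's or Bentley's actual monitoring.

```python
import shutil
from dataclasses import dataclass


@dataclass
class DiskStatus:
    path: str
    used_fraction: float
    level: str  # "ok", "warning", or "critical"


def classify_usage(used_fraction: float, warn_at: float = 0.80, crit_at: float = 0.90) -> str:
    """Map a used-space fraction (0.0 to 1.0) to an alert level.

    The default thresholds are illustrative assumptions, not recommended values.
    """
    if used_fraction >= crit_at:
        return "critical"
    if used_fraction >= warn_at:
        return "warning"
    return "ok"


def check_disk(path: str = ".", warn_at: float = 0.80, crit_at: float = 0.90) -> DiskStatus:
    """Read usage for the filesystem containing `path` and classify it."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    # Guard against pseudo-filesystems that report a total size of zero.
    used_fraction = usage.used / usage.total if usage.total else 0.0
    return DiskStatus(path, used_fraction, classify_usage(used_fraction, warn_at, crit_at))


if __name__ == "__main__":
    status = check_disk(".")
    print(f"{status.path}: {status.used_fraction:.1%} used -> {status.level}")
```

Warning well before 100% matters here because, as the report notes, a completely full disk can corrupt data and stop the server from reporting at all.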
**Next Steps**

We are actively working to remove the vendor from the environment and take over complete operations and maintenance of the IMS systems.

Monitoring Improvements: Monitoring failed to notify us about the increasing disk usage. We are working to replace the vendor's existing monitoring system with new Bentley-operated monitoring, which will proactively notify our Network Operations Center of any pending issues. This work will also identify all steps that can be taken to ensure proper monitoring and communication are in place.

Proactive Process Improvements: The offending server also stopped reporting logs once the disk was full. Accurate logging would have also notified us of an impending issue. Systems will be put in place that monitor the monitors and alert us if system monitors stop reporting.

**Bentley Systems Contact Information**

To obtain the local contact information, please click on the link below:
[Bentley Office Locations](https://www.bentley.com/company/offices/)

You can also follow the status of major events on [www.status.bentley.com](http://www.status.bentley.com)

We apologize for any inconvenience this may have caused. We recognize that any disruption of service is undesirable, so we will continue to research and evaluate potential changes to ensure a consistent, high-level quality of service.
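The "monitor the monitors" step under Next Steps is commonly approached with a heartbeat (dead man's switch) check: each server or monitor reports in periodically, and prolonged silence is itself treated as an alert. The Python sketch below shows the idea only. The class name `HeartbeatWatchdog`, the 300-second silence window, and the server names are hypothetical and are not taken from Bentley's environment.

```python
import time
from typing import Dict, List, Optional


class HeartbeatWatchdog:
    """Tracks when each source last reported and flags sources that went silent."""

    def __init__(self, max_silence_seconds: float) -> None:
        self.max_silence_seconds = max_silence_seconds
        self._last_seen: Dict[str, float] = {}

    def record(self, source: str, now: Optional[float] = None) -> None:
        """Note that `source` reported in (at `now`, or the current time)."""
        self._last_seen[source] = time.time() if now is None else now

    def silent_sources(self, now: Optional[float] = None) -> List[str]:
        """Return sources whose last report is older than the allowed silence."""
        current = time.time() if now is None else now
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if current - seen > self.max_silence_seconds
        )


if __name__ == "__main__":
    # Hypothetical server names; explicit timestamps keep the example deterministic.
    watchdog = HeartbeatWatchdog(max_silence_seconds=300)
    watchdog.record("ims-us-01", now=0)
    watchdog.record("ims-us-02", now=0)
    watchdog.record("ims-us-02", now=400)  # only the second server keeps reporting
    print(watchdog.silent_sources(now=450))  # ['ims-us-01']
```

A check of this kind would need to run on a host separate from the servers it watches, since a server with a full disk may be unable to raise its own alarm.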