FBS incident

Flexmls Photos

FBS experienced a minor incident on May 1, 2025 affecting Flexmls Photos, lasting 1h 45m. The incident has been resolved; the full update timeline is below.

Started: May 01, 2025, 09:41 PM UTC
Resolved: May 01, 2025, 11:27 PM UTC
Duration: 1h 45m
Detected by Pingoru: May 01, 2025, 09:41 PM UTC

Affected components

Flexmls Photos

Update timeline

investigating May 01, 2025, 09:41 PM UTC

We are currently investigating an issue where some property photos are not displaying correctly in our system. This may affect photo visibility in listing searches, reports, or client portals. Our team is actively working to identify the root cause and restore full functionality. We will provide updates here as we learn more and will notify you once the issue is resolved. We apologize for the inconvenience and appreciate your patience as we work to resolve this as quickly as possible.
monitoring May 01, 2025, 10:30 PM UTC

We've implemented a temporary workaround that has restored most photo functionality across the system. At this time, photos should be displaying as expected for the majority of users. We are continuing to monitor the situation closely and are working to confirm that the issue is fully resolved. Thank you for your continued patience.
resolved May 01, 2025, 11:27 PM UTC

This issue has been fully resolved. Photo functionality has been restored across the system, and all affected services are now operating normally. We’ve confirmed that property photos are displaying as expected in listing searches, reports, and client portals. No further impact is anticipated. Thank you for your patience while we worked to address this.
postmortem May 02, 2025, 02:16 PM UTC

# **Incident Summary**

On May 1, 2025, users reported issues with loading listing photos. It was determined that the problem was related to our storage vendor, Backblaze. While deploying changes to our failover mechanisms to restore user-facing services, we were also working with our Content Delivery Network (CDN) vendor, who provided troubleshooting assistance. Shortly after, Backblaze identified and resolved the underlying issue, restoring backend services.

## **Timeline of Events**

| **Time (CDT)** | **Event** |
| --- | --- |
| 15:43 | Initial reports of listing photos not loading |
| 16:17 | Hosting team engaged |
| 16:28 | Additional Hosting team resources engaged |
| 16:41 | [Status page event](http://fbs.statuspage.io/incidents/l42hhhmfgp5g) posted |
| 17:04 | Root cause identified |
| 17:07 | Failover mechanism tweaked/deployed. End-user service restored. |
| 17:15 | Engagement with CDN vendor for additional troubleshooting support. |
| 17:18 | Notification to Backblaze regarding potential storage issues. |
| 17:30 | Status page event updated (Investigating->Monitoring) |
| 17:40 | Backblaze acknowledges the issue and identifies the root cause. |
| 18:00 | Backblaze implements remediation efforts. |
| 18:27 | Status page event closed (Monitoring->Resolved) |

## **Root Cause**

The root cause of the issue was identified by Backblaze as an increase in HTTP 400 responses from their storage platform.

While multiple failover safeguards are in place to prevent storage outages from affecting the end-user experience, they didn’t cover all failure scenarios. This allowed the backend errors to be delivered to the user, instead of engaging our backup storage platforms.

## **Resolution**

The first resolution came shortly after the root cause was identified, when FBS modified the failover mechanism to include this failure scenario.

Backblaze remediated the underlying 400 issue to correct the problems and restore service from their platform.
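As an illustration of the failure mode described under Root Cause, the sketch below shows one way a failover chain can treat an HTTP 400 from the primary store as a reason to try a backup instead of passing the error to the user. It is not FBS's implementation: the `Fetcher` type, `is_usable`, and `fetch_with_failover` are hypothetical names, and the rule that every non-2xx status falls through to the next backend is an assumption made for the example.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: a fetcher takes an object key and returns
# (http_status, body). Real storage clients will look different.
Fetcher = Callable[[str], Tuple[int, bytes]]


def is_usable(status: int) -> bool:
    """Assumption: only 2xx responses are good enough to serve."""
    return 200 <= status < 300


def fetch_with_failover(key: str, backends: Sequence[Fetcher]) -> Tuple[int, bytes]:
    """Try each backend in order and return the first usable response.

    Client-error statuses such as 400 are treated like outages, so an
    error from the primary store engages the backup rather than being
    delivered to the user.
    """
    last: Tuple[int, bytes] = (502, b"")  # returned if no backend responds at all
    for fetch in backends:
        try:
            status, body = fetch(key)
        except Exception:
            # Network failure or timeout: move on to the next backend.
            continue
        if is_usable(status):
            return status, body
        last = (status, body)  # remember the most recent error response
    return last
```

A chain that only fails over on exceptions or 5xx responses would return the primary's 400 directly, which matches the gap described above. Whether statuses such as 404 should also trigger failover is a design choice that depends on how the backup store is populated.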
## **Lessons Learned and Follow-Up Actions**

From this incident, we have identified several areas for improvement:

* **Monitoring Enhancements:** We will enhance monitoring systems to allow for quicker identification of problems, as well as provide more proactive alerts for storage and delivery issues.
* **Vendor Communication Protocol:** We will verify the communication protocols that we have in place with all vendors for incident reporting and escalation.
* **Redundancy and Failover:** We will review and strengthen our redundancy and failover strategies for critical data storage.
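To make the monitoring item more concrete, here is a minimal sketch of a proactive storage alert: a sliding window over recent response statuses that flags when the share of error responses exceeds a threshold. The `ErrorRateMonitor` class, the window size, and the 5% threshold are all assumptions chosen for illustration and do not describe FBS's monitoring stack.

```python
from collections import deque


class ErrorRateMonitor:
    """Tracks the error rate over the most recent `window` responses."""

    def __init__(self, window: int = 100, threshold: float = 0.05) -> None:
        # Each entry is True when the response was an error (status >= 400).
        self.samples: deque = deque(maxlen=window)
        self.threshold = threshold

    def record(self, status: int) -> None:
        """Record one HTTP status observed from the storage backend."""
        self.samples.append(status >= 400)

    def error_rate(self) -> float:
        """Fraction of recorded responses that were errors (0.0 if empty)."""
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def should_alert(self) -> bool:
        """Alert only once the window is full, to avoid noise at startup."""
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and self.error_rate() > self.threshold
```

Counting 4xx responses as errors matters here: a monitor that watched only for 5xx responses or timeouts would not have registered an increase in HTTP 400 responses.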