Ouriginal experienced a minor incident on October 30, 2021 affecting Report Processing, lasting 1d 19h. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Oct 29, 2021, 09:45 PM UTC
We are currently experiencing issues with our external source retrieval and are investigating to find the root cause. During this time some reports have been produced with insufficient results; we will investigate and create new reports for those as soon as the issue has been resolved. During the investigation we have stopped processing, which will delay report generation. We'll update as soon as possible.
- identified Oct 29, 2021, 10:32 PM UTC
The issue has been identified and a fix is being implemented.
- identified Oct 30, 2021, 05:51 AM UTC
We are still working on a fix for the problem with external source retrieval. Next update in 2 h.
- identified Oct 30, 2021, 07:56 AM UTC
We are still working on a fix for the problem with external source retrieval. We are sorry for the inconvenience caused. Next update in 2 h.
- monitoring Oct 30, 2021, 10:03 AM UTC
A fix for the problem has been implemented. We are processing the queue, but expect delays. We are sorry for the inconvenience caused. Next update in 2 h.
- monitoring Oct 30, 2021, 12:08 PM UTC
We are continuing to monitor and are processing the queue, but expect delays. Next update in 2 h.
- monitoring Oct 30, 2021, 02:04 PM UTC
We are continuing to monitor and are processing the queue, but expect delays.
- monitoring Oct 30, 2021, 03:51 PM UTC
The queue has been handled. We will now start creating new reports for those with insufficient results.
- monitoring Oct 31, 2021, 09:00 AM UTC
The queue has been handled. We are creating new reports for those with insufficient results.
- monitoring Oct 31, 2021, 08:02 PM UTC
We are continuing to create new reports for those with insufficient results. From tomorrow, Monday, at 10:00 CET, a message with a link to the newer report will appear in view7 whenever a new report is available.
- resolved Nov 01, 2021, 11:39 AM UTC
This incident has been resolved. Document processing has been running since 9 am CEST on Saturday. The queue of re-analyzed documents has been processed, and the message in View that appears when accessing an old report, indicating that a newer report is available, is now live. A postmortem will follow.
- postmortem Nov 01, 2021, 11:58 AM UTC
This has been an isolated EU incident, so reports generated using our US endpoint ([us.ouriginal.com/api](http://us.ouriginal.com/api)) are NOT affected.

**Course of events**

We halted processing of documents at around 23:00 CEST on Friday because we saw that quality was poor for many of the reports produced. The cause was a service fetching external sources, which had been unstable on several servers and had gone unnoticed for a couple of days. After the problem was resolved on Saturday at 09:00 CEST, we restarted processing, and by about 17:00 CEST on Saturday we had worked through the queue that had built up since Friday.

The poor quality affected many but not all reports, because report generation is distributed over several servers and on some of them the service was still working correctly. Since not all reports were affected, the incident went undetected for a period.

All reports generated from Wednesday 00:01 CEST onward for documents where we saw that quality was lower than expected were enqueued again. We finished processing that queue at approximately 04:00 CET today (Monday morning). Yesterday evening (Sunday), we started pushing the new results back to Canvas, since Canvas keeps the old result until we have pushed the new one.

In parallel, we have implemented a message in the Report view that appears when an old report is accessed, indicating that a new report is available and linking to it. If there is no message in the report, the user is already looking at the new report.

The root cause of the incident was a stability issue in the service fetching external sources, which caused the service to crash when exposed to high loads under specific circumstances.

**How do we prevent this from happening again?**

Following our immediate actions to mitigate the effects of the incident, we are also addressing the stability issue in the service fetching external sources. In addition, we are implementing a failover so that if the service fetching external sources goes down, the job fetching the sources will automatically be distributed to a new instance.

**How does this affect your organisation?**

There are no immediate actions you need to take, since we are taking measures to minimize the inconvenience for all customers. You may, however, want to pass this message on or tell your users that any report produced between Wednesday and Friday that has already been reviewed should be reopened to see whether a new report is available. A message with a link to the new report will appear if an old report is opened. If no such message is visible, the report being viewed is either the latest version or was not affected by this incident. The message in the report went live today (Monday) at 10:00 CET.

We apologize for the inconvenience this may have caused you.

**Yours sincerely,**
**Team Ouriginal**