SignRequest experienced a critical incident on August 13, 2020, affecting the API and Web app and lasting 32 minutes. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- investigating Aug 13, 2020, 12:07 PM UTC
We are currently investigating.
- investigating Aug 13, 2020, 12:07 PM UTC
We are experiencing a service interruption due to a failed automatic certificate renewal.
- resolved Aug 13, 2020, 12:40 PM UTC
This incident has been resolved.
- postmortem Aug 17, 2020, 10:55 AM UTC
August 13, 2020 - TLS certificate incident

**Summary and key takeaways**

* Our certificate expired on August 13, 2020.
* Some of our customers experienced service outages for up to 1 hour.
* The issue has been fully mitigated, and service availability has been restored for everyone.

On August 13, 2020, at 12:00 UTC, we experienced a rare situation where our TLS certificate expired. In theory, this isn’t anything unusual: certificates expire all the time. But on August 13th, the expiring certificate exposed an underlying issue that made our API and website unavailable for some of our customers.

A second after the certificate expired at 12:00 UTC, our Engineering team was notified of a certificate problem with our service. This was unexpected, as our certificates are set up to renew automatically. A quick check in the browser confirmed that something was wrong and that the API was not responding correctly.

We immediately took action to manually renew our certificate in AWS Certificate Manager. This is when we realized that our certificate was ineligible for automatic renewal and that, according to AWS, we had received a notification. We checked the logs and our Health Dashboard and found this to be the case. This quickly changed the direction of the investigation.

We investigated the error message logged when we tried to renew the certificate and found that our DNS Certification Authority Authorization (CAA) record was the culprit. Our investigation into why is ongoing. We requested a new certificate at 12:34 UTC and were issued one at 12:38 UTC. We immediately updated our systems with the new certificate, and our services were operational again at 12:40 UTC.
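The postmortem does not say what was wrong with the CAA record, so the following is only a minimal sketch of the general rule a CA applies when it reads CAA records, not a reconstruction of this incident. It assumes the records have already been fetched and parsed into `(flags, tag, value)` tuples with unquoted values. The names `caa_permits` and `ACM_CA_DOMAINS` are hypothetical, and the issuer domains listed are the ones AWS documents for ACM as far as we know; verify them against current AWS documentation. Wildcard handling (`issuewild`) and climbing to parent domains are deliberately left out.

```python
from typing import Iterable, Set, Tuple

# One parsed CAA record: (flags, tag, value), value without surrounding quotes.
CaaRecord = Tuple[int, str, str]

# Assumption: issuer domains that AWS documents for ACM. Verify before relying on them.
ACM_CA_DOMAINS = {"amazon.com", "amazontrust.com", "awstrust.com", "amazonaws.com"}


def caa_permits(records: Iterable[CaaRecord], ca_domains: Set[str]) -> bool:
    """Return True if the CAA record set allows one of ca_domains to issue.

    Simplified rule for non-wildcard certificates:
    - no "issue" records at all: any CA may issue;
    - otherwise at least one "issue" value must name the CA.
    """
    issuers = [
        # Drop optional parameters after ';' and normalise case.
        value.split(";", 1)[0].strip().lower()
        for _flags, tag, value in records
        if tag.lower() == "issue"
    ]
    if not issuers:
        return True
    # An empty issuer (from the value ";") matches no CA, which forbids issuance.
    return any(issuer in ca_domains for issuer in issuers)


if __name__ == "__main__":
    # Hypothetical record set that names only another CA.
    records = [(0, "issue", "letsencrypt.org")]
    print(caa_permits(records, ACM_CA_DOMAINS))
```

A check like this could run whenever DNS records change, so that a record set which locks out the certificate issuer is caught before the next renewal attempt.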
We updated the status page at 12:07 UTC on August 13, 2020 with the information we had, assigned people to respond to incoming support emails, and started working on mitigations for what we believed was the source of the problem. We have since implemented more robust metrics and error tracking, and set up alarms that notify us in advance when certificates are about to expire, so that we can keep providing uninterrupted service to our customers.
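As an illustration of what an advance expiry alarm might look like, here is a minimal sketch using only the Python standard library. It is not the monitoring SignRequest actually deployed. The function names, the 30-day threshold, and the host `example.com` are all hypothetical. Note that if the certificate has already expired, the TLS handshake in `fetch_not_after` fails with a verification error, which should itself be treated as an alert.

```python
import socket
import ssl
from datetime import datetime, timezone

WARN_DAYS = 30  # hypothetical threshold: alert this many days before expiry


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host and return the certificate's notAfter string."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between now (timezone-aware) and the notAfter timestamp."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def should_alert(not_after: str, now: datetime, warn_days: int = WARN_DAYS) -> bool:
    """True when the certificate expires within warn_days (or already has)."""
    return days_until_expiry(not_after, now) <= warn_days


if __name__ == "__main__":
    host = "example.com"  # hypothetical host to monitor
    not_after = fetch_not_after(host)
    now = datetime.now(timezone.utc)
    if should_alert(not_after, now):
        print(f"ALERT: certificate for {host} expires on {not_after}")
    else:
        print(f"OK: {days_until_expiry(not_after, now):.1f} days left for {host}")
```

Run on a schedule, a check of this kind is independent of the automatic renewal mechanism, so it would still fire when renewal silently becomes ineligible.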