Skeddly experienced a critical incident on August 26, 2018 affecting Action Infrastructure, lasting 31m. The incident has been resolved; the full update timeline is below.
Affected components
Update timeline
- identified Aug 26, 2018, 04:42 AM UTC
We have identified a clock jump on one of our EC2 instances. Action processing has been halted.
- identified Aug 26, 2018, 04:44 AM UTC
The clock has been corrected by NTP. Action processing has resumed. Affected actions are being attended to.
- resolved Aug 26, 2018, 04:57 AM UTC
Action processing has been resumed and affected actions have been restored and/or cancelled as necessary. Our SLA will be applied.
- postmortem Aug 26, 2018, 04:57 AM UTC
This is a repeat of the incident that occurred on August 10, 2018. However, this time, the problem was caught faster.

There are 4 locations where the time issue could have originated:

* The EC2 instance (hardware or VM level)
* The OS
* NTP server
* The application process

Based on deeper investigation following today’s incident, we observed the following:

* The time jump was not localized to the application process because the time jump was recorded in the OS logs.
* The time jump was not caused by the NTP server because the time jump was not logged to the OS logs; however, the time was corrected by the NTP client (which was logged).

Based on the above, the time jump occurred at the EC2 instance (hardware or VM level) or the OS. Further investigation will occur along with more discussions with AWS support.
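
For readers who want to catch this class of problem in their own services, one common approach is to compare the wall clock against a monotonic clock, which is not moved by step corrections. The following is a minimal sketch under that assumption, not a description of Skeddly's own detection. The class name `ClockJumpDetector`, the 5-second threshold, and the injectable clock parameters are all hypothetical and chosen for illustration.

```python
import time
from typing import Callable, Optional


class ClockJumpDetector:
    """Flags wall-clock jumps by comparing elapsed wall time to elapsed monotonic time.

    Hypothetical helper for illustration only; the threshold is an assumed default.
    """

    def __init__(
        self,
        threshold_seconds: float = 5.0,
        wall_clock: Callable[[], float] = time.time,
        monotonic_clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold_seconds = threshold_seconds
        self._wall_clock = wall_clock
        self._monotonic_clock = monotonic_clock
        # Record a baseline reading from both clocks.
        self._last_wall = wall_clock()
        self._last_mono = monotonic_clock()

    def check(self) -> Optional[float]:
        """Return the drift in seconds if it exceeds the threshold, else None.

        A positive value means the wall clock jumped forward relative to the
        monotonic clock; a negative value means it jumped backward.
        """
        wall = self._wall_clock()
        mono = self._monotonic_clock()
        drift = (wall - self._last_wall) - (mono - self._last_mono)
        # Re-baseline so each call measures only the interval since the last call.
        self._last_wall, self._last_mono = wall, mono
        if abs(drift) > self.threshold_seconds:
            return drift
        return None
```

A worker loop could call `check()` before each scheduling decision and pause processing when it returns a value. A check like this only shows that the wall clock moved. It cannot tell whether the jump originated in the hypervisor, the OS, or the NTP client, so it complements the log analysis above and does not replace it.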