Wyng incident

Intermittent performance issues and errors due to Amazon Web Services service disruption

Wyng experienced a minor incident on November 25, 2020 affecting Login, Salesforce Marketing Cloud Integration, and 8 more components, lasting 22h 24m. The incident has been resolved; the full update timeline is below.

Started: Nov 25, 2020, 03:37 PM UTC
Resolved: Nov 26, 2020, 02:02 PM UTC
Duration: 22h 24m
Detected by Pingoru: Nov 25, 2020, 03:37 PM UTC

Affected components

Login
Salesforce Marketing Cloud Integration
Experiences
Metrics
MailChimp Integration
Experience Management
Instagram Content Stream
Oracle Responsys Integration
Webhook Integration
Content Management

Update timeline

identified Nov 25, 2020, 03:37 PM UTC

AWS is reporting a disruption of some services in the US-EAST-1 region. As a result, Wyng customers may experience performance issues or intermittent errors when accessing experiences, submitting forms, or accessing the management dashboard. Content streams may also be impacted.
identified Nov 25, 2020, 04:07 PM UTC

Management dashboard features are working normally.
identified Nov 25, 2020, 06:20 PM UTC

Amazon Web Services is continuing to work towards resolution of the service disruption impacting Wyng and other AWS customers in the US-EAST-1 region. We are re-posting their most recent update here: [09:32 AM PST] The Kinesis Data Streams API is currently impaired in the US-EAST-1 Region. As a result customers are not able to write or read data published to Kinesis streams. CloudWatch metrics and events are also affected, with elevated PutMetricData API error rates and some delayed metrics. While EC2 instances and connectivity remain healthy, some instances are experiencing delayed instance health metrics, but remain in a healthy state. AutoScaling is also experiencing delays in scaling times due to CloudWatch metric delays. The issue is also affecting other services, including ACM, Amplify Console, API Gateway, AppMesh, AppStream2, AppSync, Athena, Batch, CloudFormation, CloudTrail, Cognito, Connect, DynamoDB, EventBridge, Glue, IoT Services, Lambda, LEX, Managed Blockchain, Marketplace, Personalize, RDS, Resource Groups, SageMaker, Support Console, Well Architected, and Workspaces. For further details on each of these services, please see the Personal Health Dashboard. Other services, like S3, remain unaffected by this event. This issue has also affected our ability to post updates to the Service Health Dashboard. We are continuing to work towards resolution.
identified Nov 26, 2020, 01:05 AM UTC

The Amazon Web Services team has identified the root cause and continues to work toward full recovery of the core service failure that is impacting Wyng services. We are actively monitoring and seeing progress toward recovery. Platform login and dashboards are operating normally, but experiences continue to see sporadic failures, including delays or problems submitting forms. We are including for reference the most recent update from https://status.aws.amazon.com/. 4:42 PM PST: We continue to work towards full recovery of the issue affecting the Kinesis Data Streams API in the US-EAST-1 Region, and are observing steady signs of recovery. We will be throttling APIs as we work on recovery. Impact to Cognito User & Identity Pools was resolved at 2:28 PM PST. Other AWS Services that rely on Kinesis Data Stream APIs have begun observing recovery, and will continue to as we work toward full resolution. We are actively working toward full recovery for all affected services, and will provide updates as we have new information to share. [...]
monitoring Nov 26, 2020, 02:32 AM UTC

We continue to see progress toward recovery and are actively monitoring.
monitoring Nov 26, 2020, 06:18 AM UTC

At this time, all Wyng systems are operational. Over the next few hours, a small number of users may experience slower response times; if this occurs, the issue should be temporary, and resolve on its own. Amazon Web Services continues to work to fully restore underlying services, and indicates all systems will be fully restored over the next few hours. We are including for reference the most recent update from https://status.aws.amazon.com/:
resolved Nov 26, 2020, 02:02 PM UTC

All Wyng systems are back to normal operations. The underlying root cause, a failure in the AWS Kinesis sub-system that led to failures in many dependent AWS sub-systems, has been resolved, and AWS indicates that the issue will not recur due to the actions taken.