OpsGenie incident

Delays in Android push notifications

Major Resolved

OpsGenie experienced a major incident on April 4, 2023 affecting Mobile Notification Delivery, lasting 3h 45m. The incident has been resolved; the full update timeline is below.

Started
Apr 04, 2023, 02:30 PM UTC
Resolved
Apr 04, 2023, 06:16 PM UTC
Duration
3h 45m
Detected by Pingoru
Apr 04, 2023, 02:30 PM UTC

Affected components

Mobile Notification Delivery

Update timeline

  1. investigating Apr 04, 2023, 02:30 PM UTC

    We are investigating an issue affecting some Android push notifications. We will provide more details within the next hour.

  2. monitoring Apr 04, 2023, 02:34 PM UTC

    The issue has been identified as caused by an error on Firebase (https://status.firebase.google.com/incidents/9ZPv9faHLen8bzLVSaft). We continue to monitor the situation and will send an update within the next hour.

  3. monitoring Apr 04, 2023, 03:23 PM UTC

    Android push notification delivery is fully operational for now, but we are still monitoring the Firebase outage (https://status.firebase.google.com/incidents/9ZPv9faHLen8bzLVSaft).

  4. monitoring Apr 04, 2023, 06:13 PM UTC

    We are continuing to monitor for any further issues.

  5. resolved Apr 04, 2023, 06:16 PM UTC

    The Firebase incident has been resolved. Android push notifications are operational.

  6. postmortem Apr 26, 2023, 06:41 AM UTC

    ### **SUMMARY**

    On April 4, 2023, between 13:32 and 14:50 UTC, Atlassian customers using Opsgenie experienced significant delays in receiving Android push notifications. The cause was an incident in a third-party messaging service responsible for Android push notification delivery, which in turn affected our systems. Our monitoring tools detected the incident immediately, our on-call engineers were paged, and at 14:50 UTC our systems recovered successfully. The total time to resolution was about 80 minutes.

    ### **IMPACT**

    The impact on Opsgenie lasted from 13:32 to 14:50 UTC on April 4, 2023. The incident resulted only in delays to Android push notifications. These notifications were delivered successfully after the FCM service was restored, and no data loss occurred.

    ### **ROOT CAUSE**

    The issue was caused by an incident in a third-party messaging service that is responsible for delivering push notifications to Android devices.

    ### **REMEDIAL ACTIONS PLAN & NEXT STEPS**

    We know that outages impact your productivity. Our monitoring tools caught the impact immediately, and the responsible team began analyzing the incident right away. We value transparency with our customers and will continue to notify you and take any necessary actions promptly during an incident.

    To handle degradation or outage of messaging channels, Opsgenie recommends that users configure multiple channels of message delivery, including push notifications, mobile SMS, phone calls, and email.

    To improve our response in the future, we will also analyze whether we can employ autoscaling solutions for our systems in case of an outage or high load related to one notification channel.

    We apologize to customers whose services were impacted during this incident.

    Thanks,
    Atlassian Customer Support