Higher Logic incident

Marketing Enterprise (Real Magnet) - Campaign Deployment/Sending Delays

Higher Logic experienced a notice-level incident on September 21, 2023, affecting Marketing Enterprise (Real Magnet) and lasting 19h 14m. The incident has been resolved; the full update timeline is below.

Started: Sep 21, 2023, 09:10 PM UTC
Resolved: Sep 22, 2023, 04:24 PM UTC
Duration: 19h 14m
Detected by Pingoru: Sep 21, 2023, 09:10 PM UTC

Affected components

Marketing Enterprise (Real Magnet)

Update timeline

  1. investigating Sep 21, 2023, 09:10 PM UTC

    A subset of customers are experiencing issues deploying a campaign and/or delays with messages being sent from the campaign module. Our Engineering team has been notified and they are investigating the issue. In the meantime, please leave your campaigns as they are and they will deploy or send the message once the issue has been resolved. We apologize for the inconvenience and appreciate your patience.

  2. investigating Sep 22, 2023, 01:18 AM UTC

    We're still experiencing issues with campaign deployment and message sending. So far, our attempts to resolve those issues have been unsuccessful. We are continuing to investigate and troubleshoot, and we will keep sharing updates. We do not have a timeline for resolution at this time.

  3. monitoring Sep 22, 2023, 12:00 PM UTC

    We implemented a fix around 1 AM ET last night and are monitoring to confirm that it fully resolved the issue. We will provide another update once we have confirmed that the fix is effective.

  4. resolved Sep 22, 2023, 04:24 PM UTC

    We have confirmed that this incident is fully resolved. We have not seen further issues in the past 12 hours. We plan to have a root cause analysis (RCA) ready for distribution in the next 3 business days. Thank you for your patience as we worked to resolve this issue.

  5. postmortem Sep 26, 2023, 02:58 PM UTC

    **Incident Root Cause Analysis**

    **Date: September 25, 2023**

    **What Happened**

    A brief interruption in communications between the central database and a database containing a specific set of customers (Shard 4) resulted in an incorrect attempt to reprocess a batch of records. The attempt to re-insert records with the same unique identifiers failed and blocked the processing of any further campaigns for the customers on Shard 4.

    **Timeline (all times EDT)**

    September 21, 2023

    * 10:30 AM – The first issue was reported to customer support.
    * 4:06 PM – The issue was escalated to development after multiple customers reported the issue happening to their campaigns.
    * 5:33 PM – An attempt was made to resolve the issue by removing one of the records that seemed to be the cause of the blocked campaigns.
    * 8:59 PM – Development was notified that deleting the one record did not allow the campaigns to process again. Additional work by development proceeded.

    September 22, 2023

    * 12:57 AM – All records in the duplicate batch were removed, and campaigns began processing normally.

    **Root Cause**

    A loss of communication between the central database and the customer database (Shard 4) caused a duplicate batch to be reprocessed.

    **Details**

    The campaigns were blocked because the database was trying to insert duplicate records. Once these duplicate records were addressed, the campaigns began processing again.

    **Corrective Actions**

    The duplicate records were removed, allowing the normal processing of other campaigns.

    * Update the database table design to gracefully handle attempts to insert a duplicate record.
    * Add monitoring alerts to detect this error proactively and provide for more timely remediation.
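
The first corrective action, handling duplicate inserts gracefully, can be illustrated with a minimal sketch. This is not Higher Logic's implementation: the postmortem does not name the database engine or schema, so the example assumes SQLite (via Python's standard `sqlite3` module), and the table `campaign_records`, its columns, and the function `insert_batch_idempotent` are hypothetical. It shows one common approach: let the insert skip rows whose unique identifier already exists instead of failing, and count the skipped rows so that a monitoring alert could be driven from that number.

```python
import logging
import sqlite3

logger = logging.getLogger("campaign_batches")


def insert_batch_idempotent(conn, batch):
    """Insert (record_id, payload) rows, skipping IDs that already exist.

    Hypothetical helper for illustration only.
    Returns a tuple (inserted, skipped).
    """
    inserted = 0
    skipped = 0
    with conn:  # one transaction for the batch: commit on success, roll back on error
        for record_id, payload in batch:
            # INSERT OR IGNORE is SQLite syntax: a row that would violate the
            # uniqueness constraint is silently skipped rather than raising.
            cur = conn.execute(
                "INSERT OR IGNORE INTO campaign_records (record_id, payload) "
                "VALUES (?, ?)",
                (record_id, payload),
            )
            if cur.rowcount == 1:
                inserted += 1
            else:
                skipped += 1
    if skipped:
        # A reprocessed batch shows up here; an alert could key off this signal.
        logger.warning("skipped %d duplicate record(s) in batch", skipped)
    return inserted, skipped


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE campaign_records (record_id TEXT PRIMARY KEY, payload TEXT)"
    )
    batch = [("r1", "first"), ("r2", "second")]
    print(insert_batch_idempotent(conn, batch))  # (2, 0)
    # Reprocessing the same batch no longer fails or blocks later work.
    print(insert_batch_idempotent(conn, batch))  # (0, 2)
```

One caveat on the design choice: `INSERT OR IGNORE` also suppresses other constraint violations, so a production system might prefer a conflict clause scoped to the unique identifier. Other database engines offer comparable but differently spelled constructs.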