Fluke MET/TEAM incident

Multiple Bugs in MET/CAL version 11.0.2

Severity: Major
Status: Resolved

Fluke MET/TEAM experienced a major incident on June 19, 2024 affecting MET/CAL, lasting 36d 9h. The incident has been resolved; the full update timeline is below.

Started: Jun 19, 2024, 02:24 PM UTC
Resolved: Jul 26, 2024, 12:15 AM UTC
Duration: 36d 9h
Detected by Pingoru: Jun 19, 2024, 02:24 PM UTC

Affected components

MET/CAL

Update timeline

  1. investigating Jun 18, 2024, 06:20 PM UTC

    The following FSCs, as well as other functionality, are currently known to be impacted: 4180/4181, 53131/53132/53181, 5502A, 5730A, 9100, 9500, PM6680/PM6681, and MATH FSC functions. We are investigating the identified issues and reviewing other areas that may have been impacted, to catch any similar issues that have not yet been reported.

  2. identified Jun 19, 2024, 02:24 PM UTC

    The changes that caused the reported issues have been identified and are being resolved. Investigation is still in progress to determine whether other, as yet unreported, issues exist so that they can be resolved proactively.

  3. identified Jun 20, 2024, 09:10 PM UTC

    A MET/CAL 11.0.3 release is planned for July 12th to resolve all currently known issues as well as any identified during the further review and testing.

  4. identified Jun 20, 2024, 09:11 PM UTC

    A MET/CAL 11.0.3 release is planned for July 12th to resolve all currently known issues as well as any identified during the further review and testing.

  5. identified Jul 10, 2024, 09:41 PM UTC

    The planned release has been pushed to July 19th to allow for extended testing while we confirm that all the issues have been resolved. A beta version of the release is available. It currently resolves many of the issues introduced in version 11.0.2, and using it will also help us accelerate confirmation testing. If you would like to use the beta release, please contact [email protected]

  6. resolved Jul 26, 2024, 12:15 AM UTC

    MET/CAL version 11.0.3 has been released to resolve the known issues identified in this incident.

    MET/TEAM Users: https://us.flukecal.com/literature/software/met-cal/minor-update/update-met-cal-version-1103-met-team-version-331

    MET/CONNECT Users: https://us.flukecal.com/literature/software/met-cal/minor-update/update-met-cal-version-1103-met-connect-version-211

    Any bugs identified in this release should be reported to technical support at [email protected]

  7. postmortem Jul 26, 2024, 12:15 AM UTC

    ### Introduction

    This postmortem aims to provide a transparent account of the issues encountered with MET/CAL version 11.0.2 and the steps we have taken to prevent such incidents in the future.

    ### Incident Summary

    MET/CAL version 11.0.2 was intended to be a minor release, targeting specific known issues primarily related to instrument communications. However, a process failure led to unintended code changes being included in this release.

    ### What Went Wrong

    During development, additional work was undertaken to replace insecure functions with secure counterparts to prevent potential memory corruption issues. These changes were not part of the original scope of version 11.0.2 and were not extensively tested or reviewed due to process failures.

    ### Impact

    The inclusion of these untested changes led to unforeseen issues affecting the stability and performance of the software, impacting our users' experience.

    ### Corrective Actions

    In developing version 11.0.3, we took the following steps to address the issues:

    1. **Code Review**: Re-evaluated all areas where function replacements were made, ensuring correct implementation.
    2. **Additional Protections**: Developed wrapper functions to add extra layers of protection against failures.
    3. **Process Improvements**: Implemented changes to ensure all code alterations undergo adequate review and testing before release.
    4. **Additional Hardware Testing**: Performed extensive regression testing with a variety of FSCs to confirm proper operation in multiple configurations.

    ### Moving Forward

    We apologize for any inconvenience caused by this incident. We are committed to maintaining the highest quality standards and have taken significant steps to improve our processes to prevent similar issues in the future. Thank you for your understanding and continued support.
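The postmortem mentions replacing insecure functions with secure counterparts and adding wrapper functions for extra protection, but it does not say which functions, which language, or how the wrappers were written. The following is only a minimal sketch of what such a wrapper can look like in general, assuming C and a bounded string copy. The names `checked_copy` and `copy_status` are hypothetical and are not taken from MET/CAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for illustration; not from MET/CAL. */
enum copy_status { COPY_OK = 0, COPY_TRUNCATED = 1, COPY_BAD_ARGS = -1 };

/*
 * Hypothetical wrapper around a bounded copy. It validates its arguments,
 * never writes more than dst_size bytes, always NUL-terminates the
 * destination, and reports truncation instead of dropping data silently.
 */
enum copy_status checked_copy(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || dst_size == 0) {
        return COPY_BAD_ARGS;          /* nowhere safe to write */
    }
    if (src == NULL) {
        dst[0] = '\0';                 /* leave dst as a valid empty string */
        return COPY_BAD_ARGS;
    }

    size_t src_len = strlen(src);
    /* Copy at most dst_size - 1 characters, leaving room for the NUL. */
    size_t n = (src_len < dst_size) ? src_len : dst_size - 1;

    assert(n < dst_size);              /* invariant: terminator fits */
    memcpy(dst, src, n);
    dst[n] = '\0';

    return (src_len < dst_size) ? COPY_OK : COPY_TRUNCATED;
}
```

With a 6-byte buffer, copying `"5730A"` returns `COPY_OK`, while copying `"PM6680/PM6681"` stores `"PM668"` and returns `COPY_TRUNCATED`. Returning an explicit status is one possible design choice: it lets callers and regression tests detect a size mismatch instead of discovering it later as corrupted or clipped data.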