JS SDK file not working on IE 11 since 11:30 AM PT (18:30 UTC) on Sunday June 7

Incident Report for LiftIgniter

Postmortem

Starting Sunday, June 7, 2020, 11:26 AM Pacific Time (PDT), LiftIgniter's JS SDK stopped working on Internet Explorer (IE) 11 and lower. The problem was fixed on Tuesday June 9 at 7:18 AM PDT. A minor related problem was fixed on Friday June 19, at 1:48 PM PDT.

As a reminder, LiftIgniter attempts to support only IE 9 and higher, so the affected browser versions are IE 9, IE 10, and IE 11.

Event timeline

  1. On Sunday, June 7, 2020, at 18:26 UTC (11:26 AM PDT), LiftIgniter released a new version of its browser-client, the JS file that supports LiftIgniter's JavaScript SDK.
  2. On Tuesday, June 9, after reports from two customers that the JS file was not working on IE 11, LiftIgniter investigated and pushed a hotfix that caused the JS file to resume working on IE 9, 10, and 11. The fix was pushed at 14:18 UTC (7:18 AM PDT). However, diagnostic and debugging functionalities (specifically, $p("runDiagnostics") and $p("printDebugInfo")) still did not work on IE 9, 10, and 11. More generally, diagnostic and debugging functionalities were unavailable whenever window.Promise did not exist, with IE 9, 10, and 11 as example browsers.
  3. On Friday, June 19, at 20:48 UTC (1:48 PM PDT) we pushed out the fix making diagnostic and debugging functionalities (specifically, $p("runDiagnostics") and $p("printDebugInfo")) available on browsers that do not have window.Promise defined, including IE 9, 10, and 11. The way the fix works: if the user runs $p("runDiagnostics"), and the global window.Promise isn't defined, the user is prompted to load the global promise polyfill using $p("loadPromise"). After this is loaded, the user can rerun the diagnostic command and it should work.
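
The prompt-then-rerun behavior described in item 3 can be sketched roughly as follows. This is an illustrative sketch only, not LiftIgniter's actual code: the function name runDiagnosticsGuarded and its parameters are hypothetical, and only window.Promise, $p("runDiagnostics"), and $p("loadPromise") come from the report itself.

```javascript
// Hypothetical sketch: guard a diagnostic command behind a Promise check.
// globalObj stands in for `window`; loadDebugChunk and log are injected
// so the logic can be exercised outside a browser.
function runDiagnosticsGuarded(globalObj, loadDebugChunk, log) {
  if (typeof globalObj.Promise !== "function") {
    // No global Promise (e.g. IE 9, 10, 11 without a polyfill):
    // ask the user to load the polyfill and rerun the command.
    log('window.Promise is not defined. Run $p("loadPromise"), ' +
        'then rerun $p("runDiagnostics").');
    return false;
  }
  // Promise is available, so the Debug chunk can be loaded.
  loadDebugChunk();
  return true;
}
```

In a browser, a call might look like runDiagnosticsGuarded(window, loadChunk, console.log), where loadChunk is whatever function fetches the Debug chunk.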

Cause and impact

General background of changes we were trying to make

LiftIgniter has an ongoing project to reduce the size of the browser-client JS file. As one step toward this, we moved two of our JS file's diagnostic functions, $p("runDiagnostics") and $p("printDebugInfo"), to a separate "Debug" chunk file that is lazy-loaded the first time somebody runs either command. This makes the main JS file somewhat smaller. Since most end users never run diagnostic functions, a typical user downloads less data, while diagnostic functionality remains available.
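
As a rough illustration of the lazy-loading idea, the sketch below loads a chunk on the first diagnostic call and reuses it afterwards. It is not LiftIgniter's implementation: createLazyDebugRunner, loadDebugChunk, and the callback-based interface are all assumptions made for the example, and the sketch uses callbacks rather than promises so that it does not itself depend on window.Promise.

```javascript
// Hypothetical sketch of lazy-loading a "Debug" chunk on first use.
// loadDebugChunk(callback) is an assumed loader that eventually calls
// callback with an object mapping command names to functions.
function createLazyDebugRunner(loadDebugChunk) {
  var chunk = null;    // cached commands once the chunk has loaded
  var waiting = [];    // calls queued while the chunk is still loading
  var loading = false; // true once a load has been started

  return function run(commandName, onResult) {
    if (chunk) {
      // Chunk already loaded: run the command straight away.
      onResult(chunk[commandName]());
      return;
    }
    waiting.push({ name: commandName, cb: onResult });
    if (loading) {
      return; // a load is already in flight; this call stays queued
    }
    loading = true;
    loadDebugChunk(function (loadedChunk) {
      chunk = loadedChunk;
      var queued = waiting;
      waiting = [];
      queued.forEach(function (call) {
        call.cb(chunk[call.name]());
      });
    });
  };
}
```

The main file then only carries the small runner, and the cost of the diagnostic code is paid only by users who actually invoke it.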

Cause of the JS file not working at all on IE 11 and lower: use of document.currentScript

The implementation of lazy loading used document.currentScript, which is not available on IE 11 and lower. As a result, the JS file crashed on these browsers. Because of where in the code the crash occurred, no errors were reported to LiftIgniter's backend.
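
One common way to avoid a hard dependency on document.currentScript is to feature-detect it and fall back to another lookup. The sketch below shows that general pattern; the report does not say how the hotfix was implemented, so this should not be read as LiftIgniter's fix. The function name findCurrentScript is hypothetical, and the fallback assumes the script runs synchronously while the page is being parsed (it is not reliable for async or dynamically injected scripts).

```javascript
// Hypothetical sketch: locate the running <script> element without
// assuming document.currentScript exists (it is undefined on IE 11
// and lower). `doc` stands in for `document`.
function findCurrentScript(doc) {
  if (doc.currentScript) {
    return doc.currentScript;
  }
  // Fallback assumption: for a synchronously executing script, the
  // last <script> element parsed so far is the one that is running.
  var scripts = doc.getElementsByTagName("script");
  return scripts.length > 0 ? scripts[scripts.length - 1] : null;
}
```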

Cause of diagnostic functions not working after the initial fix: use of window.Promise

After we fixed the issue with document.currentScript, LiftIgniter's JavaScript file was now working on IE 11 and lower. However, the actual loading of the Debug chunk files for diagnostic functions still failed, because this loading relied on window.Promise being available, and window.Promise is not available in IE 11 and lower.

Therefore, on IE 9, 10, and 11, diagnostic functions $p("runDiagnostics") and $p("printDebugInfo") were only available on customer sites that were already doing a global polyfill for window.Promise prior to the execution of the diagnostic function.

Learnings and improvements for the future

LiftIgniter has implemented three broad categories of system improvements to reduce the risk of similar problems in the future:

  1. Automated alerts around traffic volume by browser family
  2. Reinstatement of IE 9 and IE 11 checks in the release process
  3. More IE 9 and IE 11 compatibility testing in pull request review

Automated alerts around traffic volume by browser family

At the time of this faulty release, LiftIgniter's automated alerting covered overall traffic volume, traffic volume by customer, and traffic volume by country. Our release process also included a manual review of global traffic patterns. However, there was no automated alerting on traffic volume by browser family, and the share of traffic from IE 9, 10, and 11 is small enough (about 0.1%) that its impact on overall metrics can hardly be noticed.

We have now added automated alerting around traffic volume by browser family. With this alert in place, if traffic from the "IE" family drops to zero, an alert will trigger in about 30 minutes. The alert does not separately check for traffic levels by individual browser versions (e.g., IE 9 versus IE 10 versus IE 11) because data at that level of granularity can be too noisy.
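
The core check behind such an alert can be sketched as a comparison of per-family request counts between a baseline window and the current window. This is a minimal illustration under assumed inputs, not a description of LiftIgniter's alerting system: the function name, the shape of the count objects, and the minBaseline threshold are all hypothetical.

```javascript
// Hypothetical sketch: flag browser families whose traffic dropped to
// zero. baselineCounts and currentCounts map a family name to a request
// count for a time window; minBaseline filters out families whose
// normal traffic is too small for a zero to be meaningful.
function familiesWithDroppedTraffic(baselineCounts, currentCounts, minBaseline) {
  var dropped = [];
  Object.keys(baselineCounts).forEach(function (family) {
    var baseline = baselineCounts[family];
    var current = currentCounts[family] || 0;
    if (baseline >= minBaseline && current === 0) {
      dropped.push(family);
    }
  });
  return dropped;
}
```

Grouping by family rather than by individual version keeps the baseline counts large enough for a drop to zero to stand out, which matches the reasoning about noise given above.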

Reinstatement of IE 9 and IE 11 checks in the release process

Historically, the release process for the LiftIgniter JS file included manual compatibility checks with IE 9 and IE 11. At some point, we removed these checks. The removal was due to a mix of factors, including the fact that the test sites we were using to test IE 9 and IE 11 compatibility either stopped supporting those browsers or stopped using LiftIgniter. We also thought the checks were probably unnecessary because our incremental changes had rarely broken these browsers.

The recent incident highlights the importance of these checks, so we are adding them back to the release process. We are also switching the test sites to ones that we expect to keep functioning properly and using LiftIgniter for the foreseeable future.

More IE 9 and IE 11 compatibility testing in pull request review

Ideally, we would like to identify browser compatibility problems even before getting to the release stage. One aspect of this is to include compatibility testing when reviewing pull requests that affect the source code of our JS file. We will be doing this more often, particularly for pull requests that use nontrivial JavaScript constructs that may not be available on older browsers.

Posted Jun 22, 2020 - 15:04 UTC

Resolved

On Friday, June 19, 2020, around 20:48 UTC, LiftIgniter released an updated version of the JS SDK file that fixes the unavailability of diagnostic and debugging commands (specifically, $p("runDiagnostics") and $p("printDebugInfo")) on IE 9, 10, and 11. With the new version, if it is found that the browser does not support window.Promise, the user is prompted to load a global promise polyfill using $p("loadPromise"), and then rerun the diagnostic command. This should allow users to access diagnostic and debugging functionality on IE 9, 10, and 11, where it was previously unavailable due to window.Promise not being present on these browsers.

Customers who are loading their own global promise polyfill are unaffected; diagnostic and debugging commands would already have been available to them in IE 9, 10, and 11 prior to this fix.
Posted Jun 22, 2020 - 13:56 UTC

Monitoring

We have deployed the first fix to the JS SDK file (the fix was released at 7:18 AM PDT, or 14:18 UTC). The core functionality of the file should now work on IE 9, 10, and 11. Diagnostic and debugging commands are still not available. We have also communicated with the customers who reached out individually regarding this.
Posted Jun 09, 2020 - 15:02 UTC

Identified

On Sunday June 7, LiftIgniter released a new version of its JavaScript SDK file. The new version used a smaller file, with some diagnostic and debugging functionalities moved to separate chunk files. The logic used to handle these chunk files was not compatible with IE 11 and earlier versions of IE, so the script stopped working for these users.

We have a hotfix ready and are rolling it out. The hotfix still does not provide diagnostic functionalities on IE 11 and below, but at least the core JS SDK works correctly on these browsers now.
Posted Jun 09, 2020 - 14:17 UTC