11 Dec 2017, 17:41

How to not break your site with Service Workers

Over the past few months I’ve been dabbling with Progressive Web Apps (PWAs), and earlier this month I was lucky enough to be invited onto a PWA training course at Google. The course was fantastic and it was good to get a more granular understanding of the underpinning technologies. Coincidentally, you may have seen (very) recent announcements about support for Service Workers in Edge and preview support in Safari. This feels like a great time to use my recent experience at work turning our web app into a PWA, together with what I learnt on the course, to explain how to avoid getting into some very tricky situations.

This post will benefit you most if you have some understanding of Service Workers, but for those of you who aren’t familiar, a Service Worker is:

"...a type of web worker. It's essentially a JavaScript file that runs separately from the main browser thread, intercepting network requests, caching or retrieving resources from the cache, and delivering push messages." - Google

In essence, a Service Worker is, amongst other things, a network proxy that you can program. With Service Workers and the Cache API we can intercept requests and add extra logic and behaviour, for example offline caching, or saving bandwidth on network requests by hitting the cache first.

As a very barebones example, imagine we have a service-worker.js file, registered in our page, that caches the files the page uses:


    var filesToCache = [
        '.',
        './images/cat.gif',
        './images/cat2.gif'
    ];
    var staticCacheName = 'cache-v1';

    // Wait for the Service Worker to reach the 'install' event
    self.addEventListener('install', function(event) {
        event.waitUntil(
            caches.open(staticCacheName)
                .then(function(cache) {
                    // Cache all of our files in the cache named 'cache-v1'
                    return cache.addAll(filesToCache);
                })
        );
    });

After this we could then listen for the fetch event and return the cached files if they match (see a full example here). Sounds powerful, right? It certainly is, and there are many mistakes to make beyond the ones I am about to describe. A few examples:

  1. Alex Pope’s superb and entertaining talk at JSConf 2017 on ‘Zombie Service Workers’
  2. Antonio Calapez’s blog post on migrating Hackages’ domains and the problems Service Workers caused with regard to redirects
  3. Kollegorna’s post on Service Worker gotchas
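
To make the fetch step mentioned above a little more concrete, here is a minimal cache-first sketch. It is an illustration rather than the code from the linked example: the cacheFirst helper and its injected cacheStorage and fetchFn parameters are names I have made up so the logic can be run outside a Service Worker.

```javascript
// Cache-first lookup: return the cached response if there is one,
// otherwise fall back to the network.
// `cacheStorage` and `fetchFn` are hypothetical injected parameters;
// inside a Service Worker they would be `caches` and `fetch`.
function cacheFirst(request, cacheStorage, fetchFn) {
    return cacheStorage.match(request).then(function(cachedResponse) {
        return cachedResponse || fetchFn(request);
    });
}

// Only wire up the listener when actually running inside a Service Worker
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
    self.addEventListener('fetch', function(event) {
        event.respondWith(cacheFirst(event.request, caches, fetch));
    });
}
```

Anything that was cached during install is served straight from the cache, which is exactly why a stale or broken cached file can be so hard to shift.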

The point I want to discuss is how the power of Service Workers caught me off guard a few weeks ago at work, in a very specific and unfortunate way. We had a support call from a customer saying the app wasn’t loading correctly. Upon further investigation it turned out that the cached index.html for a page of our app contained a script tag whose src attribute pointed to a JavaScript file that no longer existed. Let me show you how this might hypothetically pan out in a general case.

A broken dependency

Let’s assume we have a standard web page, and we are using a Service Worker which caches its assets, aside from a script called dependency.js. In our index.html for that page, we have a dependency whose URL dies or changes. We have a user who has visited the page before, so the index.html page in question is cached. Let’s demonstrate what could happen in that scenario if the external script were to break while we try to continue registering the Service Worker:


    <!-- Let's say it's a dead URL. Even if we change this
        we won't be able to bust the cache because the Service Worker
        never registers -->
    <script src="dependency.js"></script>
    <script>

        if ('serviceWorker' in navigator) {

            // dependency.js never loaded, so expectedDependency is undefined
            // and an error is thrown above our registration code
            if (typeof expectedDependency === 'undefined') {
                throw new Error('Oh no');
            }

            // Will never be called because the error is thrown.
            // New Service Worker never has a chance to register/install,
            // cache doesn't bust
            navigator.serviceWorker.register('service-worker.js')
                .then(function(registration) {
                    console.log('Service Worker registration successful with scope: ',
                    registration.scope);
                })
                .catch(function(err) {
                    console.log('Service Worker registration failed: ', err);
                });

        }

    </script>

As you can see, it’s not looking good. We could end up with a scenario where the user essentially caches a broken version of the app and there’s no obvious way out. In order for the site to work again we need to bust the cache, which can only happen if the new Service Worker registers/installs. To resolve this scenario, we’d need to fix the dead URL (which may well be impossible).
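
For context, "busting the cache" usually means something like the sketch below: a new version of the Service Worker bumps the cache name and deletes the old caches when it activates. This is a minimal illustration under my own assumptions, not the code from our app: the 'cache-v2' name and the deleteOldCaches helper (with its injected cacheStorage parameter) are hypothetical.

```javascript
// Hypothetical new cache version, bumped from 'cache-v1'
var staticCacheName = 'cache-v2';

// Delete every cache whose name differs from the current one.
// `cacheStorage` is a hypothetical injected parameter; inside a
// Service Worker it would be the global `caches` object.
function deleteOldCaches(cacheStorage, currentName) {
    return cacheStorage.keys().then(function(names) {
        return Promise.all(
            names
                .filter(function(name) { return name !== currentName; })
                .map(function(name) { return cacheStorage.delete(name); })
        );
    });
}

// Only wire up the listener when actually running inside a Service Worker
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
    self.addEventListener('activate', function(event) {
        event.waitUntil(deleteOldCaches(caches, staticCacheName));
    });
}
```

None of this helps, of course, if the page throws before the new Service Worker ever gets registered, which is the whole problem.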

In reality, if we had been testing our app offline, we would have seen this problem straight away: we would have noticed that Service Worker registration had a hard requirement on dependency.js being loaded. Something to think about!

For a full demo, please see the GitHub repo I created alongside this article.

Changing the Service Worker file name

In a similar vein, let’s say we decided to change the file name of our Service Worker. This can be problematic, as the previously cached index.html may still reference the old Service Worker file name. Let’s demonstrate this:


    if ('serviceWorker' in navigator) {

        // Step 1: Observe the current Service Worker name
        var serviceWorkerName = 'service-worker.js';

        // Step 2: Load the page so `index.html` caches

        // Step 3: Rename the Service Worker file to the new file name
        // var serviceWorkerName = 'service-worker-renamed.js';

        navigator.serviceWorker.register(serviceWorkerName)
            .then(function(registration) {
                console.log('Service Worker registration successful with scope: ',
                registration.scope);
            })
            .catch(function(err) {
                console.log('Service Worker registration failed: ', err);
            });

        // Step 4: The cached `index.html` still points to the old
        // Service Worker, which may have been deleted or have incorrect logic.
    }

Again, for a full demo, please see the GitHub repo.

So, what to do?

Both situations are instances of the same problem: you cache index.html, but the resources it points to have changed (in the first case it’s a dependency, in the second case the Service Worker itself).

How could we mitigate this scenario? Here are some strategies you may want to think about:

  • Let Service Worker registration logic execute before the app/dependency logic, preventing interference from app logic and errors.
  • Test your Service Workers and the registration life cycle. You can write an automated test to guard against these types of problems. See Matt Gaunt’s post for more details on this.
  • If that’s not possible, experiment with removing dependencies locally. How does the app behave? Does Service Worker registration logic still run?
  • Implement conditional statements checking for any required variables, objects, functions etc.
  • Think about whether you could cache the dependencies in question.
  • Think critically about where your assets are hosted. Could this ever cause a problem if something was cached incorrectly?
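
As a rough sketch of the first and fourth strategies above, registration can run before any app logic, and the dependency check can avoid throwing. The helper names registerServiceWorker and dependencyLoaded, and their injected nav and globalObject parameters, are my own for illustration; expectedDependency is the global from the earlier example.

```javascript
// Register the Service Worker before any app or dependency logic runs.
// `nav` is a hypothetical injected parameter (normally `navigator`).
function registerServiceWorker(nav, scriptUrl) {
    if (!nav || !('serviceWorker' in nav)) {
        return Promise.resolve(null);
    }
    return nav.serviceWorker.register(scriptUrl)
        .catch(function(err) {
            console.log('Service Worker registration failed: ', err);
            return null;
        });
}

// Check for the dependency without throwing if the script failed to load.
// `globalObject` is a hypothetical injected parameter (normally `window`).
function dependencyLoaded(globalObject) {
    return typeof globalObject.expectedDependency !== 'undefined';
}

if (typeof window !== 'undefined') {
    // 1. Registration first, so a newer Service Worker can always install
    registerServiceWorker(window.navigator, 'service-worker.js');

    // 2. App logic second, degrading gracefully instead of throwing
    if (!dependencyLoaded(window)) {
        console.log('dependency.js failed to load; skipping features that need it');
    }
}
```

With this ordering, a dead dependency.js can still break the app's features, but it can no longer stop a fixed Service Worker from installing.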

Testing will provide the best safety net against this scenario, and defensive coding practices can help prevent it further. As a final thought: what if something goes drastically wrong in production?

Indeed, there is no way to remotely disable a Service Worker for our users. As a thought experiment (rather than a recommendation), what if we could implement one? Let’s take a look at how we might do that. Let’s start off with a script tag that executes before the rest of our app’s code (notice the lack of async/defer):

    <script src="service-worker-killswitch.js"></script>
    <!-- The rest of your app's JavaScript -->

Here we keep the file empty unless something goes drastically wrong in production and you have users complaining the app is broken. In that case you could fill the file with code for unregistering the Service Worker like this:

    if ('serviceWorker' in navigator) {
        navigator.serviceWorker.getRegistrations().then(function(registrations) {
            for (var registration of registrations) {
                registration.unregister();
            }
        });
    }

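In this scenario the cached index.html would fire off a deregistration, which would stop the Service Worker taking control of the app’s network requests on subsequent page loads, fixing any caching issues. This only works, of course, if the killswitch script itself is not being served from the Service Worker’s cache.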

The killswitch is really not ideal, however. There are many problems with it: apart from feeling like a complete hack, we will struggle to know who has deregistered their Service Worker and who hasn’t, and we’ll always be deregistering people’s Service Workers for some given period of time. We could introduce some additional logic to figure out whether the user has already been through the deregistration process for the most recent killswitch to potentially circumvent this, but this also feels brittle.
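
To illustrate what that additional logic might look like, here is one possible sketch that records in localStorage which killswitch version a user has already been through. It is very much an assumption-laden illustration: the KILLSWITCH_VERSION value, the STORAGE_KEY name, and the runKillswitch helper with its injected nav and storage parameters are all hypothetical.

```javascript
// Hypothetical: bump this each time the killswitch is deployed
var KILLSWITCH_VERSION = '2017-12-11';
// Hypothetical storage key recording the last killswitch a user ran
var STORAGE_KEY = 'killswitch-version';

// `nav` and `storage` are hypothetical injected parameters;
// in the page they would be `navigator` and `localStorage`.
function runKillswitch(nav, storage, version) {
    if (!nav || !('serviceWorker' in nav)) {
        return Promise.resolve(false);
    }
    if (storage.getItem(STORAGE_KEY) === version) {
        // This user has already been through this killswitch
        return Promise.resolve(false);
    }
    return nav.serviceWorker.getRegistrations().then(function(registrations) {
        return Promise.all(registrations.map(function(registration) {
            return registration.unregister();
        }));
    }).then(function() {
        storage.setItem(STORAGE_KEY, version);
        return true;
    });
}

if (typeof window !== 'undefined') {
    runKillswitch(window.navigator, window.localStorage, KILLSWITCH_VERSION);
}
```

This stops us deregistering the same user's Service Worker on every page load, but it relies on localStorage surviving and being available, which is part of why the approach still feels brittle.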

Conclusion

In summary, the Service Worker, Fetch and Cache APIs are powerful and granular tools. They give us a lot of control over the network, caching and offline behaviour. As web developers we need to be aware of the tricky situations new technologies can leave us in and, as I have learnt the hard way, understand them fully before shipping them to our users. Specifically, we have seen that if we cache a version of an app which somehow prevents the cache itself from being busted, we could leave a user with a more or less permanently broken version of the web app.

Hopefully this post has been useful, and simply having an understanding of how you could potentially get into this situation is enough to guide you through implementing Service Workers sensibly. If you want to see the two scenarios in action and an example of a killswitch, please feel free to check out the previously mentioned GitHub repo. Just remember you will need to unregister the Service Workers on localhost when switching between the demos. In Chrome you can do this from the Developer Tools: open the Application panel, select Service Workers, and click Unregister.