Content hashes are fixed in Webpack v5

Just over three years ago I shared the struggles my team at the Financial Times faced trying to get Webpack 4 to compile files with consistent names across the 20+ separate codebases which serve FT.com. We fought to achieve consistency so that our users could navigate between our different services without needing to download the same things over and over again and to avoid teams busting the cache multiple times a day each time they pushed changes into production. Whilst we were eventually able to achieve our goal of generating output with a high level of consistency across our apps, getting there was very difficult and required developing some complex solutions.

Webpack 5 was released in late 2020, soon after my post, and one change to content hashes in particular would have made our struggles almost entirely redundant.

The test case

To demonstrate the changes to content hashes between Webpack 4 and Webpack 5 I’ve chosen to use the Todo MVC Vanilla ES6 app. It has only 7 source files so it’s small but just complex enough to illustrate Webpack’s new and old behaviour clearly.

I’ve visualised the Todo MVC app dependency graph below using Dependency Cruiser and Graphviz. This graphic shows the relationships between all of the app’s source code modules, starting from app.js - the entry point - on the left:

The dependency graph of the Todo MVC app

I’ll bundle the app using the configuration below. To make the differences between versions of Webpack clear, I’ve set up the split chunks plugin to output each module in its own output file - or chunk - rather than bundling everything into a single file. I’ll use the same config file for both the Webpack 4 and Webpack 5 tests.

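const path = require('path')

// Derive a chunk name from a module's file name, e.g. "src/app.js" becomes "app"
const moduleName = (file) => path.basename(file, path.extname(file))

module.exports = {
    mode: 'production',
    entry: {
        main: './src/app.js',
    },
    output: {
        path: path.resolve(__dirname, 'dist'),
        // Append a 6 character content hash to every output file name
        filename: '[name].[contenthash:6].js',
        chunkFilename: '[name].[contenthash:6].js',
    },
    optimization: {
        splitChunks: {
            chunks: 'all',
            cacheGroups: {
                // Place every module into its own chunk, named after its source file
                modulesToChunks: {
                    name: (m) => moduleName(m.resource),
                    // Ignore the default size and request limits so every module is split out
                    enforce: true
                }
            }
        }
    }
}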

Using this configuration Webpack emits 8 files; the 7 source code modules all have an equivalent output chunk generated, and an extra file named main has appeared too which contains the Webpack runtime code needed to stitch all of the separate files together again in the browser. Each file also has a 6 character content hash added to its name which is used to track the changes within.

  • app.5aa1aa.js
  • controller.4265cf.js
  • helpers.26e991.js
  • item.757ae7.js
  • main.a24d27.js
  • store.78b475.js
  • template.a21d0f.js
  • view.1115f5.js

Now that the app is being compiled as planned it’s time to make some changes to the source code and observe what happens to the output file names. I’m going to modify the application’s entry point by changing the order of its dependencies, switching the first reference into last place:

-import Controller from './controller';
import { $on } from './helpers';
import Template from './template';
import Store from './store';
import View from './view';
+import Controller from './controller';

After running Webpack again the chunk which contains the module I edited has a new content hash as expected but this is not the only file which has a new name…

Chunk name  Has changes?  Original hash  New hash
app         Yes           5aa1aa         ed729d
controller  No            4265cf         e502be
helpers     No            26e991         26e991
item        No            757ae7         757ae7
main        No            a24d27         a24d27
store       No            78b475         056f2f
template    No            a21d0f         89aa7e
view        No            1115f5         58dbd6

Oh dear. I made a tiny change to one file but the result is a cascade of new content hashes, affecting 5 of the 8 output files even though I only edited the source of one of them. Shipping a changeset like this would force users to download assets which are effectively the same as the ones they already have, making their experience slower.
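A comparison like the one in the table above is easy to automate. The sketch below is only an illustration: parseName and diffBuilds are hypothetical helper names of my own, and the script assumes every output file follows the name.hash.js pattern used in this post. The file names are the ones recorded above.

```javascript
// Split "app.5aa1aa.js" into its chunk name and content hash
function parseName(file) {
    const [name, hash] = file.split('.')
    return { name, hash }
}

// Return the names of chunks whose hash differs between two builds.
// Chunks which only exist in the second build are reported as well.
function diffBuilds(before, after) {
    const previous = new Map(before.map(parseName).map(({ name, hash }) => [name, hash]))
    return after
        .map(parseName)
        .filter(({ name, hash }) => previous.get(name) !== hash)
        .map(({ name }) => name)
}

const before = [
    'app.5aa1aa.js', 'controller.4265cf.js', 'helpers.26e991.js', 'item.757ae7.js',
    'main.a24d27.js', 'store.78b475.js', 'template.a21d0f.js', 'view.1115f5.js',
]
const after = [
    'app.ed729d.js', 'controller.e502be.js', 'helpers.26e991.js', 'item.757ae7.js',
    'main.a24d27.js', 'store.056f2f.js', 'template.89aa7e.js', 'view.58dbd6.js',
]

console.log(diffBuilds(before, after))
// → [ 'app', 'controller', 'store', 'template', 'view' ]
```

In a real project the two lists could come from reading the dist directory before and after a change.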

Why were there so many changes?

The [contenthash] values appended to the output file names are based not only on the code the files contain but also on data points which track the relationships between the source files. In Webpack 4 the hashes are constructed (roughly) like this:

Output chunk hash = the chunk ID
    + the hashes for each module inside the chunk

JS module hash = the module ID
    + a list of module dependency IDs
    + the module source code
    + the names of exported properties
    + the names of exported properties marked as used

The change I made caused the source code modules to be discovered in a different order, which changed their incrementally assigned IDs. As the module hashes also include the IDs of their dependencies, not only did the module I changed get a new hash but every other module with dependencies did too. The only chunks which didn’t get a new hash are the Webpack runtime and those containing modules with dependents but no dependencies:

The dependency graph of the Todo MVC app with changes highlighted
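The cascade can be reproduced with a toy model of the formula above. This is a rough sketch and not Webpack’s real implementation. It rests on several assumptions of mine: the dependency lists are my approximation of the graph in the figure, IDs are handed out to the most referenced modules first with discovery order breaking ties (my understanding of Webpack 4’s production defaults), SHA-256 stands in for Webpack’s own hash function, and exported names are left out entirely.

```javascript
const { createHash } = require('crypto')

// ASSUMPTION: an approximation of the Todo MVC dependency graph, not an exact copy
const buildGraph = (appImports) => ({
    app: appImports,
    controller: ['item', 'store', 'view'],
    store: ['item'],
    view: ['item', 'helpers', 'template'],
    template: ['item', 'helpers'],
    helpers: [],
    item: [],
})

// Walk the graph depth-first from the entry, recording when each module is first seen
function discoveryOrder(modules, entry) {
    const order = []
    const visit = (name) => {
        if (order.includes(name)) return
        order.push(name)
        modules[name].forEach(visit)
    }
    visit(entry)
    return order
}

// ASSUMPTION: most referenced modules get the lowest IDs, discovery order breaks ties
function assignIds(modules, entry) {
    const order = discoveryOrder(modules, entry)
    const uses = (name) => Object.values(modules).filter((deps) => deps.includes(name)).length
    const sorted = [...order].sort((a, b) => uses(b) - uses(a) || order.indexOf(a) - order.indexOf(b))
    return new Map(sorted.map((name, id) => [name, id]))
}

// Toy module hash = module ID + dependency IDs + a stand-in for the source code
function moduleHashes(modules, entry) {
    const ids = assignIds(modules, entry)
    const hashes = {}
    for (const [name, deps] of Object.entries(modules)) {
        const source = deps.map((dep) => `import './${dep}'`).join('\n')
        const input = [ids.get(name), deps.map((dep) => ids.get(dep)).join(','), source].join('|')
        hashes[name] = createHash('sha256').update(input).digest('hex').slice(0, 6)
    }
    return hashes
}

// The same edit as before: move the controller import from first to last place
const original = buildGraph(['controller', 'helpers', 'template', 'store', 'view'])
const reordered = buildGraph(['helpers', 'template', 'store', 'view', 'controller'])

const hashesBefore = moduleHashes(original, 'app')
const hashesAfter = moduleHashes(reordered, 'app')
const changed = Object.keys(hashesBefore).filter((name) => hashesBefore[name] !== hashesAfter[name])

console.log(changed)
// → [ 'app', 'controller', 'store', 'view', 'template' ]
```

Under these assumptions only app has different source code, yet store, view and template receive new IDs and controller depends on modules whose IDs moved, so all five end up with new hashes while helpers and item keep theirs.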

This hashing behaviour is what caused us so many problems at the FT when trying to make 20+ apps all compile identical assets, because the order in which their dependencies were found and exactly how they were used always varied.

Testing with Webpack v5

I’m going to run through the same steps as before; bundling the original app source code and recording the output file names, then make the change to imports and bundle the app again.

This time however I’m going to use Webpack v5 which uses a new content hash algorithm by default:

$ npm install webpack@^v5

And after completing all of the steps I recorded the following sets of output:

Chunk name  Has changes?  Original hash  Hash after change
app         Yes           0a4820         d97c82
controller  No            42fc14         42fc14
helpers     No            658f80         658f80
item        No            22b3fd         22b3fd
main        No            8dc1e0         8dc1e0
store       No            8ce8c8         8ce8c8
template    No            b04d02         b04d02
view        No            3fa1e6         3fa1e6

This time the change I made did not cause a cascade of name changes to the other files. Instead the new realContentHash feature has generated content hashes based only on the final contents of each chunk, rather than on the combination of build-time data points described earlier 🎉
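The idea of hashing only the final contents can be sketched in a few lines. This is an illustration of the principle rather than Webpack’s internals: contentHash and outputName are hypothetical helpers, SHA-256 is an arbitrary choice of hash function, and the strings stand in for emitted file contents.

```javascript
const { createHash } = require('crypto')

// Hash nothing but the emitted bytes; no IDs or build metadata are mixed in
const contentHash = (content, length = 6) =>
    createHash('sha256').update(content).digest('hex').slice(0, length)

// Hypothetical helper which mirrors the '[name].[contenthash:6].js' pattern
const outputName = (name, content) => `${name}.${contentHash(content)}.js`

// Stand-ins for the output of two separate builds
const helpersBuild1 = 'export const qs = (s) => document.querySelector(s)'
const helpersBuild2 = 'export const qs = (s) => document.querySelector(s)'
const appBuild1 = "import './controller'; import './helpers'"
const appBuild2 = "import './helpers'; import './controller'"

// Identical output keeps its name, different output gets a new one
console.log(outputName('helpers', helpersBuild1) === outputName('helpers', helpersBuild2)) // → true
console.log(outputName('app', appBuild1) === outputName('app', appBuild2)) // → false
```

A file name produced this way can only change when the bytes sent to the browser change, which is the property the table above demonstrates.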

So is it worth upgrading?

If you’re working on projects which are regularly deployed to production and also have regular repeat visitors then I’d recommend upgrading your build processes to use Webpack v5. Its new optimisation options do help to avoid unnecessary cache busting - by default - which will make your website faster and improve the experience for your users.

Screenshot of to-do MVC with one item