Speed up your Gatsby application's build time by 300% with incremental builds

🤔 Introduction

Gatsby Incremental Builds is a new feature in the Gatsby framework that enables build caching. When you build your Gatsby application using gatsby build, it's common for a lot of your site to stay the same - for instance, if I add a new blog post to my site, I might find that the only pages that should change are ones where that new blog post may show up: the archive page, the home page, and of course, the blog post page itself. In the past, Gatsby applications would rebuild everything on your site - while it adds to your site's build time, this ensures that every part of the site stays up-to-date.

With the release of Incremental Builds, Gatsby is now able to introspect into the .cache and public directories created by past application builds, and determine which parts of the site need to be rebuilt. For everything else that's stayed the same, the build process will just pull in existing data: this leads to much faster build times for most applications.

Gatsby strongly encourages you to try incremental builds via Gatsby Cloud, their hosting service. While the incremental build integration in Gatsby Cloud looks quite slick, the underlying work that makes it possible is integrated into the open-source framework, so we can use it in our existing CI tools without having to pay $99/mo for Gatsby's cloud offering.

In this tutorial, I'll show you how to add Incremental Builds to your site using GitHub Actions - a CI/workflow tool built right into GitHub, and free for public repositories - but you can also adapt this code and the principles behind incremental builds into whatever CI tool you're using.

Gatsby's blog post announcing Incremental Builds promises builds of under ten seconds - in my testing, I haven't found it to be that fast, but the speed implications for many sites are quite impressive.

To test Incremental Builds effectively, I used Gatsby's own documentation site. Remarkably, I found that building the Gatsby docs with GitHub Actions without incremental build optimizations took almost thirty minutes! It's a testament to how big JAMStack sites can be that Gatsby can chug along for thirty minutes finding new pages to build. When I introduced incremental builds in my workflow, the build time was reduced to an average of nine minutes - a reduction of roughly 70%, which makes the build more than three times (over 300%) as fast!

Gatsby Documentation Website (gatsbyjs.org/docs)
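
As a quick sanity check on those numbers, here is a small shell sketch that turns two build durations into a percentage reduction and a speedup factor. The build_stats helper is hypothetical (it is not part of Gatsby or GitHub Actions), and the inputs are the rounded thirty- and nine-minute figures from my testing:

```shell
# Hypothetical helper: compare two build durations (same unit) and
# print the percentage reduction and the speedup factor.
build_stats() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    printf "reduction: %.0f%%, speedup: %.1fx\n",
      (before - after) / before * 100, before / after
  }'
}

# Thirty minutes without incremental builds, nine minutes with them.
build_stats 30 9
# prints: reduction: 70%, speedup: 3.3x
```

The "over 3x" figure is the speedup factor; the same change expressed as a reduction in build time is about 70%.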

That being said, for many sites, the additional complexity of caching may not be worth it. In my testing of smaller sites, where the average build time is under a minute, the addition of incremental builds reduced the average build time by mere seconds.

Blog Template (https://github.com/signalnerve/gatsby-incremental-builds-gh-actions-example)

If you find that your site is building that quickly, you may find that other optimizations such as reducing the time to deploy (an exercise I've been working on with wrangler-action, an action I maintain for deploying Cloudflare Workers applications) will be a more effective way to speed up your build/deployment process.

☑️ Guide

If you're looking for a tl;dr about how to enable incremental builds in your project, the process can be reduced to four steps:

  1. Opt into incremental builds with an environment variable
  2. Cache your application's public and .cache directories
  3. Begin building your application
  4. (optional) Add flags to gatsby build to understand how/when files are changing

I'll explore each of these steps through the lens of GitHub Actions, but porting these steps to CircleCI or other CI applications should be fairly straightforward.

If you aren't familiar with GitHub Actions, check out the tutorial I published on YouTube about it. It's super easy to get started, and it's a great thing to have in your tool belt.

🍰 Using a sample workflow

Many readers of this tutorial may not be currently using GitHub Actions with their Gatsby applications - to help you get started, I've provided a sample workflow that installs your project's NPM packages and builds the application. While I personally use the Yarn variant, which has the added benefit of caching your NPM packages (another big improvement to build time), you may prefer to use the straightforward NPM variant. Pick one of them and commit it in your repository as .github/workflows/build.yml:

# .github/workflows/build.yml

on:
  - push

jobs:
  build:
    runs-on: ubuntu-latest
    name: Build
    steps:
      - uses: actions/[email protected]

      # Simple NPM variant
      - name: NPM install
        run: 'npm install'
      - name: Build app
        run: 'npm run build'
      
      # Yarn variant with caching
      - name: Yarn cache directory
        id: yarn-cache-dir
        run: echo "::set-output name=dir::$(yarn cache dir)"
      - name: Yarn cache
        uses: actions/[email protected]
        with:
          path: ${{ steps.yarn-cache-dir.outputs.dir }}
          key: ${{ runner.os }}-yarn-${{ hashFiles('**/yarn.lock') }}
          restore-keys: |
            ${{ runner.os }}-yarn-
      - name: Yarn install
        run: 'yarn install --pure-lockfile'
      - name: Build app
        run: 'yarn run build'

Both workflows make use of the build script as a simple alias for gatsby build. We'll iterate on this further in the next section, but for now, ensure that your package.json contains the build script under the scripts object:

{
  "scripts": {
    "build": "gatsby build"
  }
}

I've created a sample repository that you can also refer to on GitHub, whether you'd like to copy-paste the code, or even fork it for your own projects. You can find it at signalnerve/gatsby-incremental-builds-gh-actions-example.

🧗‍♀️ Opt into incremental builds

As documented in Gatsby's "Experimental Page Build Optimizations for Incremental Data Changes" documentation, opting into Gatsby's new (and experimental) incremental builds feature can be done by providing an environment variable, GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES, and setting it to true:

GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true gatsby build
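
One detail worth knowing about this form: in a POSIX shell, a VAR=value prefix sets the variable for that single command only. The sketch below demonstrates the scoping with a stand-in command (sh -c printing the variable) in place of gatsby build, so it can be run anywhere:

```shell
# Start from a clean slate so the demonstration is predictable.
unset GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES

# The prefix form sets the variable for this one command only.
# `sh -c` is a stand-in for `gatsby build` here.
GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true \
  sh -c 'echo "inside the command: $GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES"'

# Afterwards, the variable is not set in the calling shell.
echo "after the command: ${GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES:-unset}"
```

This is why the flag has to be attached to the command that actually runs gatsby build; setting it on an unrelated earlier command has no effect on the build.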

It's important to understand how the incremental build process works, particularly when a total site rebuild happens, versus an incremental rebuild. When a Gatsby application builds, the content of the site comes from two sources: the code of the site (HTML, CSS, and JavaScript), and data - whether it's internal to the site (Markdown files and other local content), or external (APIs, CMS tools, etc).

Gatsby incremental builds focus on data: when the data from a headless CMS or API changes, Gatsby can compare the current cached version of the data and compute what incremental changes need to happen. When code changes on your site, Gatsby will force a total site rebuild. This is covered in the docs, but I missed it as I was experimenting with this project, so I want to call it out to reduce future confusion. Via the docs linked above:

If there are any changes to code (JS, CSS) the bundling process returns a new webpack compilation hash which causes all pages to be rebuilt.

My preferred way to add the environment flag for opting into incremental builds is via a new script in package.json - this way, we can run the traditional gatsby build via something like yarn run build, but move on to incremental builds by changing nothing but the script we call in CI. To do this, I'll define the build:incremental script in package.json:

{
  "scripts": {
    "build": "gatsby build",
    "build:incremental": "GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true gatsby build"
  }
}

In my application's GitHub Actions workflow, I'll update the build step, and use build:incremental instead of build:

# .github/workflows/build.yml

jobs:
  build:
    name: "Build Gatsby app"
    steps:
      # previous steps
      - name: Build app
        run: 'yarn run build:incremental'

📦 Cache your application's directories

For incremental builds to work, your build workflow needs to cache any artifacts produced when Gatsby builds your application. At the time of writing, these two folders are public and .cache.

GitHub Actions' caching action, actions/cache, supports persisting directories produced during your workflow. To implement it, we'll add actions/cache to our workflow, and for each directory, pass a path and key to the action, indicating that we want to cache the directory:

# .github/workflows/build.yml

jobs:
  build:
    name: "Build Gatsby app"
    steps:
      # previous steps
      - name: Gatsby Cache Folder
        uses: actions/[email protected]
        with:
          key: gatsby-cache-folder
          path: .cache
      - name: Gatsby Public Folder
        uses: actions/[email protected]
        with:
          key: gatsby-public-folder
          path: public
      - name: Build app
        run: 'yarn run build:incremental'

🛠 Begin building your application

With caching and the new build:incremental script added to your workflow, we can now begin using incremental builds! GitHub Actions is event-based, meaning that the workflow will run when events occur in your repository.

Using the workflow provided in this tutorial, our workflow will run on the push event, which is triggered whenever a user pushes commits to the repository. At this point, you can begin to work on your application as you normally would - making changes to your data, adding new content, etc. Incremental builds should kick in on the second commit to your repository after merging your workflow updates:

  1. Commit the new workflow improvements: using the incremental builds environment variable, and caching the public and .cache directories
  2. Make any change to your application (first commit: directories will be cached)
  3. Make an additional change to your application – the previously cached data will be loaded at the beginning of the workflow (second commit: incremental builds should begin here!)
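
To see which of these stages a given run is in, you could add a step before the build that reports whether the cached directories were actually restored. This is a minimal sketch: cache_state is a hypothetical helper, and treating "both directories exist and are non-empty" as a warm cache is my assumption, not something Gatsby itself checks:

```shell
# Hypothetical helper: report "warm" when both Gatsby build
# directories exist and contain files, "cold" otherwise.
cache_state() {
  root="${1:-.}"
  for dir in .cache public; do
    # A missing or empty directory means there is nothing to build on.
    if [ ! -d "$root/$dir" ] || [ -z "$(ls -A "$root/$dir" 2>/dev/null)" ]; then
      echo "cold"
      return 0
    fi
  done
  echo "warm"
}

# Run from the project root, before the build step.
echo "Gatsby cache is $(cache_state .)"
```

A "cold" result on the first run after adding the cache steps is expected; a "cold" result on later runs suggests the cache isn't being restored.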

Here are some screenshots of my experiments with incremental builds. The first repository is the previously mentioned Gatsby docs repository, which takes around thirty minutes to build:

Initial builds for the Gatsby documentation site take, on average, 27 to 30 minutes

When the directories are cached and start being used in the workflow, the build time drops dramatically, down to around nine minutes:

Adding incremental builds reduces the build time by around 70%, making the build over three times as fast

With a smaller repository, signalnerve/gatsby-incremental-builds-gh-actions-example, the build time begins at around two minutes:

Initial builds for the blog template take, on average, 110 to 120 seconds

When incremental builds kick in, the build time reduces to a little over a minute:

Adding incremental builds reduces the build time by around 35%

🚩 (Optional) Add gatsby build flags

To better understand when your content is being cached, Gatsby provides some additional flags that can be passed to gatsby build to provide output regarding incremental builds:

  • --log-pages: outputs file paths that are updated or deleted
  • --write-to-file: creates .cache/newPages.txt and .cache/deletedPages.txt, which are lists of the changed files inside of the public folder

Because we're building our Gatsby application inside of a CI workflow, I prefer to see the changed files via my workflow's output, using the --log-pages flag. To implement this, we can add the --log-pages flag to the build:incremental script:

{
  "scripts": {
    "build:incremental": "GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true gatsby build --log-pages"
  }
}

According to the Gatsby documentation, you should begin to see output like this in your workflow:

success Building production JavaScript and CSS bundles - 82.198s
success run queries - 82.762s - 4/4 0.05/s
success Building static HTML for pages - 19.386s - 2/2 0.10/s
+ success Delete previous page data - 1.512s
info Done building in 152.084 sec
+ info Built pages:
+ Updated page: /about
+ Updated page: /accounts/example
+ info Deleted pages:
+ Deleted page: /test

Done in 154.501 sec
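
If you want a quick count instead of the full list, the log can be filtered with standard tools. A small sketch, assuming the "Updated page:" and "Deleted page:" line prefixes shown in the sample output above; count_page_changes is a hypothetical helper that reads a build log on standard input:

```shell
# Hypothetical helper: count updated and deleted pages in a
# `gatsby build --log-pages` log read from standard input.
count_page_changes() {
  # Read the whole log once so it can be scanned twice.
  log="$(cat)"
  # grep exits non-zero when nothing matches, hence the `|| true`.
  updated="$(printf '%s\n' "$log" | grep -c 'Updated page: ' || true)"
  deleted="$(printf '%s\n' "$log" | grep -c 'Deleted page: ' || true)"
  echo "updated=$updated deleted=$deleted"
}

# Example using the page lines from the sample output above.
printf '%s\n' \
  'Updated page: /about' \
  'Updated page: /accounts/example' \
  'Deleted page: /test' \
  | count_page_changes
# prints: updated=2 deleted=1
```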

As a further exercise, you may find that the --write-to-file flag is a good way to report how your project is changing via GitHub comments, or potentially to tools like Slack or Discord! Since I'm a "team of one" on many of my sites, I haven't taken the time to implement this, but if you try it, let me know - I'd love to include a sample in this tutorial!
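
As a starting point for that exercise, here's a rough sketch of a summary step. It assumes the two files contain one path per line (an assumption on my part, so check the files your own build produces). summarize_pages is a hypothetical helper, and actually posting the text to GitHub, Slack, or Discord is left out:

```shell
# Hypothetical helper: summarize the files written by
# `gatsby build --write-to-file`.
# Assumption: each file lists one path per line.
summarize_pages() {
  cache_dir="${1:-.cache}"
  for name in newPages deletedPages; do
    file="$cache_dir/$name.txt"
    if [ -f "$file" ]; then
      # Count non-empty lines; grep exits non-zero on zero matches.
      count="$(grep -c . "$file" || true)"
    else
      count=0
    fi
    echo "$name: $count"
  done
}

# Run from the project root after the build has finished.
summarize_pages .cache
```

The two lines it prints could then be used as the body of whatever notification you prefer.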

🙅‍♂️ GitHub Actions Caveat

I want to mention a caveat here around the GitHub Actions + Gatsby incremental builds work, which is the interplay between events and caching.

At time of writing, the actions/cache action provided by GitHub only works on push and pull_request events. This means that caching won't work if you're building your Gatsby application via other events, such as the very handy schedule event, which allows you to run workflows on a recurring "cron"-style schedule (e.g. "every hour" or "six times a day"), or the repository_dispatch event, which is commonly used as a webhook for triggering new application builds when your external APIs or CMS data change.

This is currently being fixed by the maintainers of the actions/cache action, with a pull request open to bring caching to all workflow events. In the meantime, this means that for many "true" JAMStack applications, where a lot of data lives outside of your actual repository, you may find that this work isn't super useful quite yet. I've seen movement on that PR in the last few days, as I've been writing this tutorial, so I'm hoping it'll be merged in the next few weeks - when that happens, I'll happily remove this caveat, and opt in to super fast incremental builds on all of my Gatsby projects!
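
Until then, it can help to make the limitation visible in your build logs. A minimal sketch: GITHUB_EVENT_NAME is one of the default environment variables that GitHub Actions sets, while cache_supported_event is a hypothetical helper that encodes the push/pull_request restriction described above:

```shell
# Hypothetical helper: succeed only for the events on which
# actions/cache worked at the time of writing.
cache_supported_event() {
  case "$1" in
    push|pull_request) return 0 ;;
    *) return 1 ;;
  esac
}

# GITHUB_EVENT_NAME is set by GitHub Actions; fall back to
# "unknown" when running the script locally.
event="${GITHUB_EVENT_NAME:-unknown}"
if cache_supported_event "$event"; then
  echo "Event '$event': cached directories can be restored."
else
  echo "Event '$event': no cache available, expect a full rebuild."
fi
```

If the restriction is lifted, the case statement is the only part that needs to change.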

🙋‍♂️ Conclusion

I'm really excited about this work, and about the optimizations that the Gatsby team is making to the framework to reduce build times. In my video about incremental builds (embedded at the beginning of this tutorial), I mentioned that this improvement has made me excited again about optimizing my workflows: I'm taking the momentum from Gatsby incremental builds and bringing it to the other things I use GitHub Actions for, like deploying my projects to Cloudflare Workers using wrangler-action.

Since I completed this work, I've come back to my own custom actions and I'm now focusing on trying to reduce the execution time for all of them - I still haven't reached the "under 10 second builds" statistic that the Gatsby team has mentioned, but I'm getting close!

If you enjoyed this tutorial, consider subscribing to the Bytesized YouTube channel! I covered this effort for the channel and I'd love to hear from you in the video comments about other things you'd like to see covered in the Gatsby world. I release new videos over there on a weekly basis covering software development, especially web development, serverless programming, and JAMStack.

I also organize Byteconf, a free + remote developer conference series, where Gatsby has been covered numerous times at our past conferences. Every talk from the past few years of conferences is on the Bytesized channel, but I'll also link a few of my favorite vids we've done on Gatsby for you to check out below!

💬 Are you using Gatsby incremental builds? Let me know in the comments! I'd love to hear if this has made your site faster, and if you've taken this work and integrated it into your other CI tools.
