Posts By: Will Keleher

18,957 tests in under 6 minutes: ClassDojo's approach to backend testing

We're pretty proud of our backend test suite. We have a lot of tests, and developers can run the full test suite locally in under six minutes. These aren't simple unit tests—they're tests that hit multiple databases and routes with only a minimal amount of stubbing for external dependencies.

9 years ago, we were proud of running 2,000 tests in 3 minutes. Not much has changed from that initial post—we're still writing a bunch of tests, but we've put a lot of effort over the years into making sure our test suite has stayed acceptably fast for people.

Why do we run our tests this way?

First off though, why are we making things so hard for ourselves? When we write our tests, we don't stub out our databases at all. Many of our tests are resource tests—those tests hit a real running server, the resource code issues real queries against Redis/MySQL/MongoDB/memcached containers, and if it makes any changes to those databases, we need to reset the databases fully before the next test run.

We think that the database is an integral part of the system that we're testing. When you stub a database query, that means that you're not testing the query. And I don't know about you, but I've gotten plenty of database queries wrong.

Similarly, we like to run a full server for any resource level tests. The middleware that runs for each resource matters. We want our tests to match our production environment as much as possible.
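To make that concrete, here's a minimal sketch of what a resource test can look like in this style. The route, model, and fixture id are hypothetical, and supertest is just one reasonable way to exercise a real running server; the point is that nothing between the HTTP request and the database is stubbed.

```typescript
import { strict as assert } from "assert";
import request from "supertest";
import { app } from "../src/server"; // hypothetical: however the app exposes its HTTP handler
import { Classroom } from "../src/models/classroom"; // hypothetical model backed by a real MongoDB container
import { teacher1Id } from "./fixtureIds";

describe("POST /api/classrooms", () => {
  it("creates a classroom for a teacher", async () => {
    // The request goes through the real server: routing, middleware, and validation all run.
    const res = await request(app)
      .post("/api/classrooms")
      .send({ teacherId: teacher1Id, name: "Period 3 Science" })
      .expect(200);

    // The assertion reads from the real database rather than from a stub.
    const saved = await Classroom.findById(res.body._id);
    assert.equal(saved?.name, "Period 3 Science");
  });
});
```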

How do we make the tests fast?

You'll see recommendations online to limit this style of testing, where your tests query real databases, not because it produces worse tests, but because it tends to be too slow. We've needed to put a lot of work into test speed over the years, and if you take nothing else away from this post, it should be this: if you treat test speed as an organizational priority, you can make a pretty big impact.

  1. Make sure engineers have fast computers. First off, if we had done nothing else over the past 9 years, our tests would have gotten faster because computers have gotten better over that time period. And we make sure to buy nice computers for engineers on our team because we care that tests and builds are speedy. (The M1 and M2 chips for Macs have been quite nice!)

  2. Use Orbstack rather than Docker Desktop. The single easiest change to speed up our tests was switching from Docker Desktop to Orbstack to run containers locally. On Macs, it is so much faster than Docker Desktop. On some engineers' machines, tests run twice as fast on Orbstack. Aside from speed, we've found that it's been more stable for folks—fewer randomly missing volumes and needing to restart Docker to get things going again. We're huge fans.

    That said, it's worth noting that using Docker/Orbstack will still slow down your tests. If we ran databases directly on our machines rather than through Docker, our tests would be faster. But the extra effort of getting everyone to install and maintain MySQL, Redis, MongoDB, Memcached, and everything else just isn't worth the test speed increase it brings for us. Other organizations might have different trade-offs.

  3. Speed up fixture resets. The slowest part of our tests is resetting fixtures. Whenever one of our tests writes to a database, we need to undo those changes before the next test starts. The core trick to doing this quickly is to only undo changes to the tables that actually changed rather than resetting every single table. All of our database operations go through the same code, so it's relatively straightforward to track which tables are "dirty" and then only reset those tables (there's a short sketch of this after the list).

    A few details:

    • We tested out tracking things at the row level rather than the table level for resets, but it didn't improve performance. For MySQL, our basic table resetting strategy is turning off foreign key checks, truncating the table, and then reloading it with LOAD DATA LOCAL INFILE from a volume that's mounted into the MySQL container.
    • For MongoDB resets, we found that the fastest technique was creating a shadow collection for every collection that we could restore from whenever we needed to.
    • When there's a hard MySQL delete, we don't know whether it might be a cascading delete, so we have code that counts how many rows are in each table. If a table has fewer rows after the test, we reset it. And if it doesn't have fewer rows because data has also been inserted, our regular code will have marked that table as dirty anyway.
    • For MySQL updates, we have some (slightly janky) code to pull out the list of tables that might be updated by the update query when it's a query with multiple tables.
  4. Run tests in parallel. The next important piece of making tests fast is being able to run the test suite in parallel, which means we need multiple copies of our databases. This was a relatively straightforward task that took a lot of blood, sweat, and tears to actually make happen. We use mocha, which supports a --parallel option, to run our tests, so our tests look for MOCHA_WORKER_ID in the environment to decide which database to connect to: test_db_${MOCHA_WORKER_ID} (this is also shown in the sketch after the list).

  5. Measure what's slow. Like any other optimization problem, the first step is measuring how long things actually take. Having guesses about why tests are slow can lead to a ton of wasted effort that doesn't actually move the needle. We haven't done any fancy profiling here—instead, we hook into our existing instrumentation to generate a report of where time is being spent over the course of our tests. It's not perfect, but it gives us a good enough sense of where time is going to be useful.
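Here's a rough sketch of how the dirty-table tracking from item 3 and the worker-specific databases from item 4 can fit together. The helper names are hypothetical, and the restore step is simplified; our real resets use LOAD DATA LOCAL INFILE rather than the callback shown here.

```typescript
import mysql from "mysql2/promise";

// Each mocha worker gets its own database so parallel runs don't interfere.
// MOCHA_WORKER_ID identifies the mocha --parallel worker (fall back to "0" for serial runs).
const workerId = process.env.MOCHA_WORKER_ID ?? "0";
const pool = mysql.createPool({
  host: "127.0.0.1",
  user: "root",
  database: `test_db_${workerId}`,
});

// Every write goes through this helper so we can remember which tables are dirty.
const dirtyTables = new Set<string>();
export async function write(table: string, sql: string, params: unknown[] = []) {
  dirtyTables.add(table);
  return pool.query(sql, params);
}

// Before each test, reset only the tables the previous test touched.
export async function resetDirtyTables(restoreFixtures: (table: string) => Promise<void>) {
  if (dirtyTables.size === 0) return;
  await pool.query("SET FOREIGN_KEY_CHECKS = 0");
  for (const table of dirtyTables) {
    await pool.query(`TRUNCATE TABLE \`${table}\``);
    await restoreFixtures(table); // e.g. reload that table's fixture data
  }
  await pool.query("SET FOREIGN_KEY_CHECKS = 1");
  dirtyTables.clear();
}
```

Wiring resetDirtyTables into a root beforeEach hook keeps individual tests from having to think about cleanup at all.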

The future

We're proud of where our tests are, but there's still a ton of space to improve things over the next 9 years. We're going to keep writing lots of tests, so we need to make sure that those tests continue to be speedy. A few things that are on our minds:

  • Setting up fixture "scenarios." Currently, we always reset our fixtures back to the same base scenario. That base scenario is difficult to change because it can impact a huge number of tests if we're tweaking fixture data that is used in a lot of spots, so it'd be nice to have better support for temporarily setting up a new "base" state that multiple tests can reference.
  • Retrying the row-level resets. In theory, these should be faster than truncating and restoring whole tables, so we want to do some better profiling to understand why they weren't and make that happen.
  • Improving our Redis fixture resets. Redis is fast enough that we've been lazy with our Redis fixture resets—we just flush the db and then restore it, so there's room to improve performance there.
  • Running our tests with some more detailed profiling to generate a flame graph and see whether there are any hotspots in our code that we could improve. Funnily enough, we've actually sped up our production app a few times when optimizing our tests—things that are slow in testing are often slow in production too!

Image of a test file with a fixtureId imported. The editor is hovering over fixtureId.teacher1Id, which shows a wealth of information about which classes, students, parents, and schools that teacher is connected to.

Our tests used to be full of hundreds of random ids like 5233ac4c7220e9000000000d, 642c65770f6aa00887d97974, 53c450afac801fe5b8000019 that all had relationships to one another. 5233ac4c7220e9000000000d was a school, right? Was 642c65770f6aa00887d97974 a parent in that school? What if I needed a teacher and student in that school too—how did I find appropriate test entities? It wasn't an insurmountable problem—you could either poke around the CSVs that generated our fixtures or query the local database—but it slowed down writing and maintaining tests. Some engineers even had a few of these ids memorized; they'd say things like "Oh, but 000d is parent4! Let's use 002f instead. They're in a school." or "I quite like 9797, it's a solid class for story-related tests."

To make navigating our fixture IDs and their relationships a bit simpler, I wrote a simple script to query our database and decorate the names for these fixture ids (e.g., student2Id, school5Id) with the most common relationships for that entity. For a school, we show teachers, parents, students, and classes. For a parent, we show children, teachers, schools, classes, and message threads.

    /**
     * - name: Student TwoParents
     * - **1 current classes**: classroom3Id
     * - **0 past classes**:
     * - **2 parents**: parent10Id, parent11Id
     * - **1 current teachers**: teacher1Id
     * - **schoolId**: none
     */
    export const student5Id = "57875a885eb4ec6cb0184d68";
    

Being able to write a line like import { student5Id } from "../fixtureIds"; and then hover over it to see that if we want a parent for that student, we can use parent10Id, makes writing tests a bit more pleasant. The script to generate this fixture-id file was pretty straightforward:

1. Get ordered lists of all of the entities in our system and assign them names like parent22Id or teacher13Id[^1].
2. Set up a map between an id like 57875a885eb4ec6cb0184d68 and student5Id.
3. For each entity type, write model queries to get all of the relationship IDs that we're interested in.
4. Use JS template strings to create nicely formatted JSDoc-decorated strings and write those strings to a file.
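A condensed sketch of those steps might look something like the following; the model and relationship helpers are hypothetical, and the real script covers every entity type rather than just students.

```typescript
import fs from "fs";
import { Student } from "../src/models/student"; // hypothetical model backed by the fixture database
// hypothetical helpers that return the fixture-id *names* (e.g. "parent10Id") of related entities
import {
  getParentIdNames,
  getCurrentTeacherIdNames,
  getCurrentClassIdNames,
} from "./fixtureRelationships";

async function generateFixtureIdFile(): Promise<void> {
  // Steps 1 & 2: fetch entities in a deterministic order and assign stable names
  // (student1Id, student2Id, ...) based on that order.
  const students = await Student.find({}).sort({ _id: 1 });

  // Steps 3 & 4: look up the relationships we care about and render a
  // JSDoc-decorated export for each id.
  const blocks = await Promise.all(
    students.map(async (student, i) => {
      const name = `student${i + 1}Id`;
      const parents = await getParentIdNames(student._id);
      const teachers = await getCurrentTeacherIdNames(student._id);
      const classes = await getCurrentClassIdNames(student._id);
      return [
        "/**",
        ` * - name: ${student.name}`,
        ` * - **${classes.length} current classes**: ${classes.join(", ") || "none"}`,
        ` * - **${parents.length} parents**: ${parents.join(", ") || "none"}`,
        ` * - **${teachers.length} current teachers**: ${teachers.join(", ") || "none"}`,
        " */",
        `export const ${name} = "${student._id}";`,
      ].join("\n");
    })
  );

  fs.writeFileSync("test/fixtureIds.ts", blocks.join("\n\n") + "\n");
}
```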

One small pain point I ran into was migrating all of our existing test code to reference these new fixture IDs. I wrote a script to find lines like const X = /[0-9a-f]{24}/;, delete those lines, update the variable name with the fixture-id name from the file, and then add an appropriate import statement to the top of the file. (Shell patterns for easy automated code migrations talks through patterns I use to do migrations like this one.)
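In spirit, that migration script was shaped something like the sketch below. Everything here is illustrative: idToFixtureName is a hypothetical map from raw hex ids to fixture-id names, and the plain replaceAll would need more care around identifiers that share a prefix.

```typescript
import fs from "fs";
import { idToFixtureName } from "./fixtureIdMap"; // hypothetical: { "57875a88...": "student5Id", ... }

export function migrateTestFile(path: string): void {
  let source = fs.readFileSync(path, "utf8");
  const importedNames = new Set<string>();

  // Find declarations like: const someStudentId = "57875a885eb4ec6cb0184d68";
  const declaration = /^\s*const (\w+) = "([0-9a-f]{24})";\r?\n/gm;
  for (const [line, localName, id] of source.matchAll(declaration)) {
    const fixtureName = idToFixtureName[id];
    if (!fixtureName) continue; // not a known fixture id; leave the line alone
    source = source.replace(line, ""); // drop the local declaration...
    source = source.replaceAll(localName, fixtureName); // ...and point its usages at the shared export
    importedNames.add(fixtureName);
  }

  if (importedNames.size > 0) {
    source = `import { ${[...importedNames].join(", ")} } from "../fixtureIds";\n` + source;
  }
  fs.writeFileSync(path, source);
}
```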

After setting up these initial fixtureId JSDoc comments, we've added JSDoc comments for more and more of our collections; it's proven to be a useful tool. We also set up a complementary fixtureEntities file that exports the same information as TS documents so that it's straightforward to programmatically find appropriate entities. All in all, it's made our test code nicer to work with—I just wish we'd made the change sooner!

[^1]: Whenever we add any new IDs to our fixtures, we need to make sure that they come after the most recent fixtureID. Otherwise we'll end up with fixtureID conflicts!

When the ClassDojo engineering team was in the office, we loved our information radiators: we had multiple huge monitors showing broken Jenkins builds, alerts, and important performance statistics. They worked amazingly well for helping us keep our CI/CD pipelines fast & unblocked, helped us keep the site up & fast, and helped us build an engineering culture that prioritized the things we showed on the info radiators. They worked well while the whole team was in the office, but when we went fully remote, our initial attempt at moving that same information into a Slack channel failed completely, and we had to find a different way to get the same value.

Open office with a row of 4 monitors displaying production metrics across the back wall

Most teams have an #engineering-bots channel of some sort: it's a channel that quickly becomes full of alerts & broken builds, and that everyone quickly learns to ignore. For most of these things, knowing that something was broken isn't particularly interesting: we want to know what the current state of the world is, and that's impossible to glean from a Slack channel (unless everyone on the team has inhuman discipline around claiming & updating these alerts).

We had, and still have, an #engineering-bots channel that has hundreds of messages in it per day. As far as I know, every engineer on the team has that channel muted because the signal-to-noise ratio in it is far too low. This meant that we occasionally had alerts that we completely missed because they quickly scrolled out of view in the channel, and that we'd have important builds that'd stay broken for weeks. This made any fixes to builds expensive, allowed some small production issues to stay broken, and slowed down our teams.

Slack channel with lots of alerts in it

After about a year of frustration, we decided that we needed to give people a way to set up in-home info-radiators. We had a few requirements for a remote-work info-radiator:

1. It needed to be configurable: teams needed a way to see only their broken builds & the alerts that they cared about. Most of the time, the info-radiator shouldn't show anything at all!
2. It needed to be on an external display: not everyone had an office setup with enough monitor real estate to keep a status page open all the time.
3. It needed to display broken builds from multiple Jenkins instances, broken builds from GitHub Actions, and triggered alerts from Datadog and Pagerduty on a single display.

We set up a script that fetches data from Jenkins, GitHub Actions, Datadog, Pagerduty, and Prowler, transforms that data into an easily consumable JSON file, and uploads that file to S3. We then have a simple progressive web app, installed on small, cheap Android displays, that fetches that JSON file regularly, filters it for the builds that each person cares about, and renders them nicely.
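As a sketch of the handoff between those two halves, the uploaded JSON is just a flat list of items, and each display filters it down client-side. The field names below are illustrative rather than our exact schema.

```typescript
// Shape of the JSON file the aggregation script uploads to S3 (illustrative).
interface RadiatorItem {
  source: "jenkins" | "github-actions" | "datadog" | "pagerduty" | "prowler";
  kind: "build" | "alert";
  name: string;       // e.g. "api • main branch" or "High p95 latency"
  status: "ok" | "broken" | "triggered";
  url: string;        // deep link to the build or alert
  owners: string[];   // teams that care about this item
  updatedAt: string;  // ISO timestamp
}

// On each display, the progressive web app periodically re-fetches the file and
// renders only the broken/triggered items owned by the teams it's configured for.
async function fetchMyItems(jsonUrl: string, myTeams: string[]): Promise<RadiatorItem[]> {
  const res = await fetch(jsonUrl);
  const items: RadiatorItem[] = await res.json();
  return items.filter(
    (item) => item.status !== "ok" && item.owners.some((owner) => myTeams.includes(owner))
  );
}
```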

Picture of an info-radiator with a broken build highlighted; picture of a small Android display running the info-radiator on a desk

These remote info-radiators have made it much simpler to stay on top of alerts & broken builds, and have sped us up as an engineering organization. There's been a lot written about how valuable info-radiators can be for a team, but I never appreciated their value until we didn't have them, and the work we put into making sure we had remote ones has already more than paid for itself.
