Posts By: Will Keleher

XKCD's "Is It Worth the Time?" Considered Harmful

Will Keleher

kelwill

2025-01-08

"Is It Worth the Time?"


Years ago, I wanted to update a bad pattern in our codebase—updating the order of arguments to a function or something like that. This particular pattern was only in about 10 spots, so it would have only taken a minute to search and fix manually, but instead I spent an hour automating the fix using sed and xargs. And I think that was the right choice.

At the time, I had no clue what I was doing in the shell. sed confused me, shell quoting rules were totally opaque, and xargs was a black box. I remember feeling so stupid because I couldn't figure out how to automate this reorder; it was the kind of thing that should be simple, but I kept getting stuck.

But I automated that refactor, and I learned a few things! I learned that the default sed on Macs was quite old, that -r is necessary for extended regular expressions, how to use capture groups, how to use -i with gsed, the difference between ' and " in the shell, how to use xargs with gsed -i, and probably a few other things. If I'd found better resources, I could have learned those things faster, but being comfortable with shell tools has saved me so much time over the years when I've run into situations that aren't fixable manually.[^1]
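For readers more at home in TypeScript than in sed, the capture-group swap behind that kind of refactor can be sketched as follows. `doThing` is the placeholder name from the footnote, `swapDoThingArgs` is a hypothetical helper, and the pattern assumes exactly two arguments with no nested commas or parentheses. It is an illustration, not a robust refactoring tool.

```typescript
// Swap the two arguments of every doThing(a, b) call in a source string.
// Assumes exactly two arguments separated by ", " and no nested commas or
// parentheses; code with nested calls would need a real parser.
function swapDoThingArgs(source: string): string {
  // Group 1: everything up to the first comma. Group 2: everything up to ")".
  const callPattern = /doThing\(([^,]+), ([^)]+)\)/g;
  // $2 and $1 refer back to the capture groups, like \2 and \1 in sed.
  return source.replace(callPattern, "doThing($2, $1)");
}

const sampleSource = "const x = doThing(user, options); doThing(a, b);";
console.log(swapDoThingArgs(sampleSource));
// const x = doThing(options, user); doThing(b, a);
```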

Just a few days ago, I wanted to stitch together 15 markdown documents, format them in a nice-ish way, and then print them. The easy way would have been to copy-paste 15 times. It probably would have taken a minute. Instead, I spent a chunk of time writing a python script to do it for me, and I learned a few more things. I somehow hadn't put together that you can just write html in markdown and it just works because it's left alone during the conversion process. That seems glaringly obvious in retrospect, but I'm glad I learned it. I also learned that pandoc is great for converting markdown to html, that this CSS from killercup works great with pandoc --css pandoc.css --standalone to make the pandoc-converted html prettier, where my Obsidian Vault is stored on disk, and the CSS to make nice page breaks (<div style="page-break-after: always;"></div>) between the 15 markdown documents. Doing the work to automate stitching together these 15 markdown documents was far slower than copy-pasting them, but over the long term, automating builds skills that compound.
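The stitching step itself is small. Here is a minimal sketch in TypeScript rather than the Python I actually used; `stitchMarkdown` and the sample documents are made up for illustration, and reading the files from disk is left out.

```typescript
// Raw HTML in markdown is left alone during conversion, so this div
// survives into the HTML output and forces a page break when printed.
const PAGE_BREAK = '<div style="page-break-after: always;"></div>';

// Join markdown documents into one string with a page break between each
// pair. Blank lines around the div keep it a standalone HTML block.
function stitchMarkdown(documents: string[]): string {
  return documents
    .map((doc) => doc.trim())
    .join(`\n\n${PAGE_BREAK}\n\n`);
}

const combinedNotes = stitchMarkdown([
  "# First note\n\nHello.",
  "# Second note\n\nWorld.\n",
]);
console.log(combinedNotes);
```

The combined string can then be written to a file and handed to pandoc with the --css and --standalone options mentioned above.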

I'll sometimes see people reference XKCD's Is It Worth the Time to argue against automating things that would be faster to do manually, and I think that that mindset is a mistake. Automating the easy things is how you build the skills, mindset, and muscle-memory to automate the hard things. There are situations where the only thing that matters is that a task gets accomplished quickly; in those cases, doing it manually can make sense! But a lot of the time, trying to automate the thing builds skill and capability that will come in handy later.

Aside from building personal capability, reaching for automation is an important part of building an engineering culture that values automation. An engineering culture that values automation is going to find opportunities to reduce toil and speed through projects that another engineering culture might miss. I'd like to have an engineering culture that celebrates the times when the team learned how to automate something, even when automating the thing took much longer than just doing the thing. The accumulation of capability trumps the loss in short-term speed.

So, the next time you want to rewrite the order of arguments in a function that's only used in three spots in a codebase, spend some time figuring out regular expressions instead. The next time you realize you forgot to run a command that's necessary for something, spend an unreasonable amount of effort making it so that the command runs automatically. The next time you have something you need to only do yearly, waste some time building tooling to make it easier for yourself.

[^1]: ag -l doThing | xargs gsed -r -i 's|doThing\(([^,]+), ([^)]+)\)|doThing(\2, \1)|g' is a reasonable way of re-ordering a function's arguments (Bash Patterns I Use Weekly describes how this command works). Code editors should have this built in to their find-and-replace, but I prefer doing it on the shell because it allows me to construct more complex commands that aren't possible with a GUI.

18,957 tests in under 6 minutes: ClassDojo's approach to backend testing

Will Keleher

kelwill

2024-12-13

We're pretty proud of our backend test suite. We have a lot of tests, and developers can run the full test suite locally in under six minutes. These aren't simple unit tests—they're tests that hit multiple databases and routes with only a minimal amount of stubbing for external dependencies.

9 years ago, we were proud of running 2,000 tests in 3 minutes. Not much has changed from that initial post—we're still writing a bunch of tests, but we've put a lot of effort over the years into making sure our test suite has stayed acceptably fast for people.

Why do we run our tests this way?

First off though, why are we making things so hard for ourselves? When we write our tests, we don't stub out our databases at all. Many of our tests are resource tests—those tests hit a real running server, the resource code issues real queries against Redis/MySQL/MongoDB/memcached containers, and if it makes any changes to those databases, we need to reset the databases fully before the next test run.

We think that the database is an integral part of the system that we're testing. When you stub a database query, that means that you're not testing the query. And I don't know about you, but I've gotten plenty of database queries wrong.

Similarly, we like to run a full server for any resource level tests. The middleware that runs for each resource matters. We want our tests to match our production environment as much as possible.

How do we make the tests fast?

You'll see recommendations online to limit this style of testing, where you have full databases that you're querying, not because it's worse, but because it ends up being too slow. We've needed to put a lot of work into test speed over the years, and if you take nothing else away from this post it should be that if you treat test speed as an organizational priority, you can make a pretty big impact.

Make sure engineers have fast computers. First off, if we had done nothing else over the past 9 years, our tests would have gotten faster because computers have gotten better over that time period. And we make sure to buy nice computers for engineers on our team because we care that tests and builds are speedy. (The M1 and M2 chips for Macs have been quite nice!)
Use Orbstack rather than Docker Desktop. The single easiest change to speed up our tests was switching from Docker Desktop to Orbstack to run containers locally. On Macs, it is so much faster than Docker Desktop. On some engineers' machines, tests run twice as fast on Orbstack. Aside from speed, we've found that it's been more stable for folks—fewer randomly missing volumes and needing to restart Docker to get things going again. We're huge fans.

That said, it's worth noting that using Docker or Orbstack will still slow down your tests. If we ran databases directly on our machines rather than in containers, our tests would be faster. But the extra effort to get everyone to install and maintain MySQL, Redis, MongoDB, Memcached, and everything else just isn't worth the test speed increases that it brings for us. Other organizations might have different trade-offs.
Speed up fixture resets. The slowest part of our tests is resetting fixtures. Whenever one of our tests writes to a database, we need to undo those changes before the next test starts. The core trick to doing this quickly is to only undo changes to the tables that actually changed rather than resetting every single table. All of our database operations go through the same code, so it's relatively straightforward to track which tables are "dirty" and then only reset those tables.

A few details:
- We tested out tracking things at the row level rather than the table level for resets, but it didn't improve performance. For MySQL, our basic table resetting strategy is turning off foreign key checks, truncating the table, and then using LOAD DATA LOCAL INFILE to reload it from a volume that's mounted into the MySQL container.
- For MongoDB resets, we found that the fastest technique was creating a shadow collection for every collection that we could restore from whenever we needed to.
- When there's a hard MySQL delete, we don't know whether it might be a cascading delete, so we have code that counts how many rows are in each table. If a table has fewer rows after the test, we should reset it. And if it doesn't have fewer rows because data has also been inserted, our regular code will have marked that table as dirty.
- For MySQL updates, we have some (slightly janky) code to pull out the list of tables that might be updated by the update query when it's a query with multiple tables.
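The dirty-table bookkeeping and the row-count check for cascading deletes can be sketched roughly like this. It is a simplified illustration rather than our actual code: `DirtyTableTracker`, its method names, and the table names are hypothetical, and the real resets (truncate and reload) are left out.

```typescript
// Tracks which tables a test has touched so that only those get reset.
class DirtyTableTracker {
  private dirty = new Set<string>();

  // Called by the (hypothetical) shared database layer on every write.
  markDirty(table: string): void {
    this.dirty.add(table);
  }

  // Compare row counts taken before and after a test. A table that lost
  // rows (for example through a cascading delete) is marked dirty too.
  markShrunkTables(before: Map<string, number>, after: Map<string, number>): void {
    before.forEach((countBefore, table) => {
      const countAfter = after.get(table) ?? 0;
      if (countAfter < countBefore) this.markDirty(table);
    });
  }

  // Return the tables to reset (sorted for stable output) and clear state.
  takeDirtyTables(): string[] {
    const tables = Array.from(this.dirty).sort();
    this.dirty.clear();
    return tables;
  }
}

const tableTracker = new DirtyTableTracker();
tableTracker.markDirty("users"); // an INSERT went through the shared layer
tableTracker.markShrunkTables(
  new Map([["users", 10], ["posts", 5]]),
  new Map([["users", 11], ["posts", 3]]), // posts lost rows via a cascade
);
console.log(tableTracker.takeDirtyTables()); // [ 'posts', 'users' ]
```

Tracking at the table level keeps the bookkeeping cheap: one set insertion per write, and one reset per dirty table between tests.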
Run tests in parallel. The next important piece to fast tests is being able to run the test suite in parallel, which means we need multiple copies of our databases. This was a relatively straightforward task that took a lot of blood, sweat, and tears to actually make happen. We run our tests with Mocha, which supports a --parallel option, so our tests look for MOCHA_WORKER_ID in the environment to decide which database to connect to: test_db_${MOCHA_WORKER_ID}.
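Picking the database per worker can be as small as the sketch below. `testDatabaseName` is a hypothetical helper, and the plain `test_db` fallback for serial runs is an assumption made for the example, not necessarily what our code does.

```typescript
// Choose the per-worker database name from the environment.
// MOCHA_WORKER_ID is the variable our tests look for in parallel mode;
// falling back to "test_db" when it is absent is an assumption.
function testDatabaseName(env: Record<string, string | undefined>): string {
  const workerId = env.MOCHA_WORKER_ID;
  return workerId === undefined ? "test_db" : `test_db_${workerId}`;
}

console.log(testDatabaseName({ MOCHA_WORKER_ID: "3" })); // test_db_3
console.log(testDatabaseName({})); // test_db
```

In a Node test setup, the argument would be `process.env`. Taking the environment as a parameter keeps the helper easy to test.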
Measure what's slow. Like any other optimization problem, the first step is measuring how long things actually take. Having guesses about why tests are slow can lead to a ton of wasted effort that doesn't actually move the needle. We haven't done any fancy profiling here—instead, we hook into our existing instrumentation to generate a report of where time is being spent over the course of our tests. It's not perfect, but it gives us a good enough sense of where time is going to be useful.
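As a rough idea of what such a report needs, here is a minimal sketch of a per-label time accumulator. It is a stand-in for illustration only: `TimingReport` and the labels are hypothetical and do not describe our instrumentation.

```typescript
// Accumulates elapsed time per label (for example "mysql reset") so that a
// test run can end with a summary of where the time went.
class TimingReport {
  private totals = new Map<string, { totalMs: number; calls: number }>();

  record(label: string, elapsedMs: number): void {
    const entry = this.totals.get(label) ?? { totalMs: 0, calls: 0 };
    entry.totalMs += elapsedMs;
    entry.calls += 1;
    this.totals.set(label, entry);
  }

  // One row per label, slowest total first.
  summary(): { label: string; totalMs: number; calls: number }[] {
    const rows: { label: string; totalMs: number; calls: number }[] = [];
    this.totals.forEach((entry, label) => {
      rows.push({ label, ...entry });
    });
    return rows.sort((a, b) => b.totalMs - a.totalMs);
  }
}

const suiteTimings = new TimingReport();
suiteTimings.record("mysql reset", 40);
suiteTimings.record("route handler", 15);
suiteTimings.record("mysql reset", 60);
console.log(suiteTimings.summary());
```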

The future

We're proud of where our tests are, but there's still a ton of space to improve things over the next 9 years. We're going to keep writing lots of tests, so we need to make sure that those tests continue to be speedy. A few things that are on our minds:

Setting up fixture "scenarios." Currently, we always reset our fixtures back to the same base scenario. That base scenario is difficult to change because it can impact a huge number of tests if we're tweaking fixture data that is used in a lot of spots, so it'd be nice to have better support for temporarily setting up a new "base" state that multiple tests can reference.
Retrying the row-level resets. In theory, these should be faster than truncating and restoring the tables, so we want to try out better profiling to make that happen.
Improving our Redis fixture resets. Redis is fast enough that we've been lazy with our Redis fixture resets: we just flush the db and then restore it, so there's room to improve performance.
Profiling our tests in more detail. We'd like to generate a flame graph to see if there are any hotspots in our code that we could improve. Funnily enough, we've actually sped up our production app a few times when optimizing our tests: things that are slow in testing are often slow in production too!

JSDoc comments can make fixtures easier to work with

Will Keleher

kelwill

2024-10-30

Image of test file with a fixtureId imported. The editor is hovering over fixtureId.teacher1Id which shows a wealth of information about which classes, students, parents, and schools that teacher is connected to.

Our tests used to be full of hundreds of random ids like 5233ac4c7220e9000000000d, 642c65770f6aa00887d97974, 53c450afac801fe5b8000019 that all had relationships to one another. 5233ac4c7220e9000000000d was a school, right? Was 642c65770f6aa00887d97974 a parent in that school? What if I needed a teacher and student in that school too? How did I find appropriate test entities? It wasn't an insurmountable problem (you could either poke around the CSVs that generated our fixtures or query the local database), but it slowed down writing and maintaining tests. Some engineers even had a few of these ids memorized; they'd say things like "Oh, but 000d is parent4! Let's use 002f instead. They're in a school." or "I quite like 9797, it's a solid class for story-related tests."

To make navigating our fixture IDs and their relationships a bit simpler, I wrote a simple script to query our database and decorate names for these fixture ids (e.g., student2Id, school5Id) with the most common relationships for that entity. For a school, we show teachers, parents, students, and classes. For a parent, we show children, teachers, schools, classes, and message threads.

/**
 * - name: Student TwoParents
 * - **1 current classes**: classroom3Id
 * - **0 past classes**:
 * - **2 parents**: parent10Id, parent11Id
 * - **1 current teachers**: teacher1Id
 * - **schoolId**: none
 */
export const student5Id = "57875a885eb4ec6cb0184d68";

Being able to write a line like import { student5Id } from "../fixtureIds"; and then hover over it and see that if we wanted a parent for that student, we could use parent10Id, makes writing tests a bit more pleasant. The script to generate this fixture-id file was pretty straightforward:

1. Get ordered lists of all of the entities in our system and assign them names like parent22Id or teacher13Id[^1]
2. Set up a map between an id like 57875a885eb4ec6cb0184d68 and student5Id.
3. For each entity type, write model queries to get all of the relationship IDs that we're interested in.
4. Use JS template strings to create nicely formatted JSDoc-decorated strings and write those strings to a file.
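A stripped-down sketch of the naming, mapping, and rendering steps might look like this. The `FixtureEntity` shape, the function names, and the short placeholder ids are all hypothetical, and the model queries that collect relationships are skipped. The sample data is passed in directly.

```typescript
// Hypothetical shape for one fixture entity and the relationships to show.
interface FixtureEntity {
  id: string; // raw database id
  name: string; // human-readable name from the fixture data
  relationships: { label: string; ids: string[] }[];
}

// Name entities in order (student1Id, student2Id, ...) and record the
// mapping from raw id to that name.
function nameEntities(kind: string, ids: string[], idToName: Map<string, string>): void {
  ids.forEach((id, index) => idToName.set(id, `${kind}${index + 1}Id`));
}

// Render one JSDoc-decorated export using template strings.
function renderFixtureExport(entity: FixtureEntity, idToName: Map<string, string>): string {
  const relationshipLines = entity.relationships.map(({ label, ids }) => {
    const names = ids.map((id) => idToName.get(id) ?? id).join(", ");
    return ` * - **${ids.length} ${label}**: ${names}`;
  });
  const exportName = idToName.get(entity.id) ?? "unknownId";
  return [
    "/**",
    ` * - name: ${entity.name}`,
    ...relationshipLines,
    " */",
    `export const ${exportName} = ${JSON.stringify(entity.id)};`,
  ].join("\n");
}

const fixtureNames = new Map<string, string>();
nameEntities("student", ["aaa", "bbb"], fixtureNames); // placeholder ids
nameEntities("parent", ["ccc"], fixtureNames);
console.log(
  renderFixtureExport(
    { id: "bbb", name: "Student Two", relationships: [{ label: "parents", ids: ["ccc"] }] },
    fixtureNames,
  ),
);
```

Because relationships are rendered through the same id-to-name map, the hover text points at other fixture names rather than at raw ids.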

One small pain point I ran into was migrating all of our existing test code to reference these new fixture IDs. I wrote a script to find lines like const X = /[0-9a-f]{24}/;, delete those lines, update the variable name with the fixture-id name from the file, and then add an appropriate import statement to the top of the file. (Shell patterns for easy automated code migrations talks through patterns I use to do migrations like this one.)
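The core of that migration can be sketched as a pure function over one file's source text. This is a simplified illustration, not the script I ran: `migrateFixtureIds` is a hypothetical name, it only handles quoted 24-character hex ids declared on their own line, and it renames by whole-word match without understanding scopes.

```typescript
// Rewrite one test file: drop `const X = "<24 hex chars>";` lines whose id
// is a known fixture, rename uses of X to the fixture name, and add one
// import statement at the top of the file.
function migrateFixtureIds(
  source: string,
  idToName: Map<string, string>,
  importPath: string,
): string {
  const declaration = /^\s*const\s+(\w+)\s*=\s*["']([0-9a-f]{24})["'];?\s*$/;
  const renames = new Map<string, string>(); // old variable -> fixture name
  const keptLines: string[] = [];

  for (const line of source.split("\n")) {
    const match = declaration.exec(line);
    const fixtureName = match ? idToName.get(match[2]) : undefined;
    if (match && fixtureName) {
      renames.set(match[1], fixtureName); // delete the line, remember the rename
    } else {
      keptLines.push(line);
    }
  }
  if (renames.size === 0) return source; // nothing to migrate

  let body = keptLines.join("\n");
  renames.forEach((fixtureName, oldName) => {
    // \b keeps the rename to whole identifiers only.
    body = body.replace(new RegExp(`\\b${oldName}\\b`, "g"), fixtureName);
  });
  const imported = Array.from(new Set(Array.from(renames.values()))).sort();
  return `import { ${imported.join(", ")} } from "${importPath}";\n${body}`;
}
```

Running a function like this over every test file, and then letting the type checker and the test suite catch anything it missed, is usually enough for a one-off migration.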

After setting up these initial fixtureId JSDoc comments, we've added JSDoc comments for more and more of our collections; it's proven to be a useful tool. We also set up a complementary fixtureEntities file that exports the same information as TS documents so that it's straightforward to programmatically find appropriate entities. All in all, it's made our test code nicer to work with—I just wish we'd made the change sooner!

[^1]: Whenever we add any new IDs to our fixtures, we need to make sure that they come after the most recent fixtureID. Otherwise we'll end up with fixtureID conflicts!
