Playtesting: maintaining high product quality without QA

One aspect of ClassDojo’s engineering culture that tends to surprise people is that we don’t have dedicated QA engineers or testers. We use a variety of strategies to keep our end-to-end feature development fast, continuous, and high quality. These strategies include a strong culture of writing automated tests, along with regular playtesting sessions.

What do we do instead of QA?

At ClassDojo, we prioritize truly continuous deployment of our backend monolith, which means removing as many manual blocking steps as we can, and we’ve built systems to do so safely. Engineers have end-to-end ownership of features, and we rely heavily on writing automated unit and integration tests to ensure that we’re not pushing code that causes unexpected behavior or errors. Our canary containers help automatically roll back any commits that cause errors, and we monitor our logs carefully.

However, as thorough as we try to be, automated tests and monitoring can’t catch everything. To supplement our automated systems, we hold regular playtesting sessions that double as manual testing and context sharing.

What is playtesting?

Playtesting is simply manually testing the products and features the team has built. You may have also heard of the term “dogfooding,” which refers to actually using your product as a real user. Some folks on the team are able to dogfood as legitimate parent users since their kids’ teachers use ClassDojo. However, since the majority of us are not teachers in a classroom nor parents connected to a real classroom, we do intentional playtesting sessions instead.

How do we run a playtesting session?

Who should join?

Individual teams might playtest new features right before shipping, in which case the whole team should join. We also schedule sessions that are open to anyone at the company so teams can share their recent features and solicit feedback from the wider group. For the best insight and context sharing, we encourage including folks from various functions, such as engineers, designers, PMs, marketers, and customer success agents. Playtesters don’t need to prepare anything ahead of time, or even have any knowledge of the flows. They just need to have the right type of device if it’s a mobile feature that’s only released on one platform.

The process

The first step is scheduling a time for the group of testers to get together and play through certain flows. Usually 30-45 minutes is sufficient, depending on how large or complex the feature is. Ahead of time, the product owner should prepare a set of loose guidelines for how to run through the flows, including any necessary setup steps such as installing an alpha build or turning on certain feature flags.

At the beginning of the session, the facilitator or product owner gives a brief overview of the flows and setup, then all the playtesters simply go through the flows independently on their own devices and accounts. Doing this synchronously instead of async lets us quickly identify whether an issue is widespread or a particular edge case, and answer any questions on the spot. We typically use an Asana board where playtesters can add cards for anything that comes up — not just bug reports but also general product feedback and points of confusion.

We reserve 5-10 minutes at the end of the session to go through the cards and make sure the issues are clear. From there, the product owner and team can prioritize them at their next prioritization meeting.

What are the benefits?

Our playtesting often finds bugs in obscure edge cases not covered by automated tests. With an app and user network as complex as ours, it’s nearly impossible to cover all use cases with test fixtures, so it helps to have a variety of people testing on their own real-world accounts.

Playtesting is one of our best methods for cross-team and cross-functional context sharing. Screenshots, product specs, and demos can only go so far in conveying what a new feature really involves. Having teammates actually get their hands on the features is a great way to share what’s being built. It’s also valuable to get fresh perspectives from folks who perhaps hadn’t explored that product area before. It’s like having an in-house focus group to give feedback.

If you have ideas on how we can improve our playtesting or manual testing strategies, reach out! We’d love to hear from you.

    In the dark times before AsyncLocalStorage, it could be hard to tell why a request would occasionally time out. Were there multiple relatively slow queries somewhere in that route's code? Was another request on the same container saturating a database connection pool? Did another request block the event loop? It was possible to use tracing, profiling, and logs to track down problems like these, but it could be tricky; setting up per route metrics using AsyncLocalStorage makes it a ton easier!

    When ClassDojo set up our AsyncLocalStorage-based per-route instrumentation we found things like:

    • a route that occasionally made 30,000+ database requests because it was fanning out over a large list of items
    • another route that blocked the event-loop for 15-20 seconds a few times a day, and caused timeouts for any other requests that our server was handling at the same time
    • a third route that was occasionally fetching 500,000+ items to render simple counts to return to clients

    I wrote about this a bit more in AsyncLocalStorage Makes the Commons Legible. If you're not familiar with ClassDojo, it's a parent-teacher communication platform. Our monolithic NodeJS web-server backend API normally serves ~10,000 requests per second.

    I'd like to go through some of the details of how we set up this per-request instrumentation. For this post, we'll be starting with a relatively standard NodeJS web-server with pre-router middleware, and a router that finds an appropriate route to handle a request. It should look something like this:

        app.use(({ req, res }, next) => {
          const start = Date.now();
          onFinished(res, () => afterResponse(req, res, start));
          next();
        });
        app.use(rateLimitingMiddleware);
        app.use(bodyParserMiddleware);
        app.use(allOfTheRestOfOurMiddleware);

        app.use(router);
        app.use(notFoundMiddleware);

    To add instrumentation to this setup, we'll want to do the following:

    1. Create a per-request async store in our first middleware
    2. Store details about the database requests caused by the HTTP request in that request's async store
    3. Send the request's database request details to our data lake
    4. If any of the database request details violate our per-request limits, log a server-error so that a team can see it & take action

    Starting our per-request async store

    In a NodeJS web server, each middleware calls the next, so if we start an async local storage context in our very first middleware, every subsequent middleware should have access to the same storage context. (I had a lot of trouble understanding why this worked, so I wrote up a simplified gist that hopefully demonstrates what's going on.)
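
    Here's a minimal, standalone sketch in the spirit of that gist (not our production code) showing how a store created at the top of a chain stays visible to everything downstream, even across await boundaries:

        import { AsyncLocalStorage } from "async_hooks";

        const als = new AsyncLocalStorage<{ requestId: number }>();

        const handler = async () => {
          await Promise.resolve(); // crossing an await boundary keeps the async context
          console.log(als.getStore()); // => { requestId: 123 }
        };

        const firstMiddleware = (next: () => Promise<void>) =>
          // everything called from inside run() (including awaited work) sees this store
          als.run({ requestId: 123 }, next);

        firstMiddleware(handler);

    With that in mind, here's the actual middleware: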

        import { AsyncLocalStorage } from "async_hooks";

        export const requestTrackingAsyncLocalStore = new AsyncLocalStorage();

        // requestTrackingAsyncLocalStoreMiddleware wraps the downstream koa middlewares inside an async local storage context
        export function requestTrackingAsyncLocalStoreMiddleware({ req, res }, next) {
          const store = {
            requestCost: {}, // filled in by increment() calls as the request does work
            req,
            res,
          };
          // running the next middleware in the chain in the context of this 'run' makes sure that all calls
          // to getStore() in the scope of this request are bound to the correct store instance
          return requestTrackingAsyncLocalStore.run(store, next);
        }

        // add this to the app! (this would be in a different file)
        app.use(requestTrackingAsyncLocalStoreMiddleware);
        app.use(rateLimitingMiddleware);
        app.use(....);

    Store details about request behavior in our per-request async store

    Now that we have a per-request async local store, we can grab it and start using it! We'll want to learn:

    1. How many database requests do we make over the course of an HTTP request? Are we running into the N+1 query problem on any of our routes?
    2. How long do those database requests take in total? Requests that take a long time can indicate spots where we're doing a lot of expensive work.
    3. How many documents are these requests returning? If we're processing 10,000s of documents in NodeJS, that can slow down a server quite a bit, and we may want to move that work to our database instead.

        import _ from "lodash";

        export function increment(type: "request_count" | "duration" | "document_count", table: string, n: number = 1) {
          const store = requestTrackingAsyncLocalStore.getStore();
          // we'll probably want to track this to see if we're losing async context over the course of a request
          if (!store) return;
          _.set(store, ["requestCost", type], _.get(store, ["requestCost", type], 0) + n);
          _.set(store, ["requestCost", "byTable", table, type], _.get(store, ["requestCost", "byTable", table, type], 0) + n);
        }

    If we add code that wraps our database client's requests, it should hopefully be easy to add these increment calls at an appropriate point.
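
    The details depend on your database client, but a sketch of the idea (the query helper and the mongo-style usage below are hypothetical, not our real client code) might look like:

        // wrap a query helper so every call reports its cost to the current request's async store
        async function instrumentedQuery<T>(table: string, runQuery: () => Promise<T[]>): Promise<T[]> {
          const start = Date.now();
          const docs = await runQuery();
          increment("request_count", table);
          increment("duration", table, Date.now() - start);
          increment("document_count", table, docs.length);
          return docs;
        }

        // usage (assuming a mongo-style client):
        // const students = await instrumentedQuery("student", () => studentCollection.find({ classId }).toArray());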

    Handle the request report

    Once we have this request report, we can do whatever we'd like with it! At ClassDojo, we log a server-error whenever a route is doing anything particularly egregious: that way, we get quick feedback when we've made a mistake. We also use a firehose to send this data to Redshift (our data lake) so that we can easily query it. Either way, this is something that we can do after we're done sending our response to the client:

        import { AsyncResource } from "async_hooks";

        app.use(requestTrackingAsyncLocalStoreMiddleware);
        app.use(({ req, res }, next) => {
          // this use of new AsyncResource will preserve the async context
          res.on("finish", new AsyncResource("requestTrackingLogging").bind(() => {
            const store = requestTrackingAsyncLocalStore.getStore();
            if (!store) throw new Error(`Something has gone awry with our async tracking!`);
            if (isEgregiouslyBad(store.requestCost)) logOutBadRequest(store);
            requestCostFirehose.write(store);
          }));
          next();
        });
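
    The limits themselves aren't anything special. A hypothetical isEgregiouslyBad (the thresholds below are made up, not our real ones) might look like:

        // made-up thresholds: tune these to whatever "egregious" means for your service
        function isEgregiouslyBad(requestCost: { request_count?: number; duration?: number; document_count?: number }): boolean {
          return (
            (requestCost.request_count ?? 0) > 1_000 ||    // far too many database round-trips
            (requestCost.duration ?? 0) > 10_000 ||        // more than 10 seconds of total database time
            (requestCost.document_count ?? 0) > 100_000    // fetching far too many documents
          );
        }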

    Tracking down places where we lose async context

    While the async store might feel like magic, it's not, and some common situations will cause you to lose async context:

    1. Using callbacks rather than promises. In those situations, you'll need to create an AsyncResource to bind the current async context:

        setTimeout(new AsyncResource("timeout").bind(() => doRequestTrackingThings()), 1);
        redisClient.get("key", new AsyncResource("timeout").bind(() => doRequestTrackingThings()));

    2. Some promise libraries might not support async-hooks. Bluebird does, but requires setting asyncHooks to true: Bluebird.config({ asyncHooks: true });.

    It may take a bit of work to track down and fix all of the places where you're losing async context. Setting up your increment calls to log out details about those situations can help!

        export function increment(type: "request_count" | "duration" | "document_count", table: string, n: number = 1) {
          const store = requestTrackingAsyncLocalStore.getStore();
          if (!store) {
            logServerError(`We lack async context for a call to increment ${type} ${table} by ${n}`, new Error().stack);
            return;
          }
          ...
        }

    Increased Observability is great!

    Putting effort into increasing the observability of a system can make that system much easier to manage. For a NodeJS web-server, we've found a lot of benefits in using AsyncLocalStorage to improve per-request visibility: it has let us improve latency on a few routes, reduced our event-loop blocking, and given us a better view of opportunities to improve performance.

      A single large query in SQL can be hard to understand, test, debug, or change in the same way that an over-large function in code can be. A large query is also much harder to write! Feedback loops while writing large queries are slow and you'll often find yourself needing to guess at where the problem in your query is.

      When I started writing analytics queries, I wrote some pretty rough ones that are now hard to debug and maintain! (Apologies to everyone who has had to deal with any of my old SQL queries.) Over time, I think I've gotten better at them, and I wanted to write down some of the things I'm doing differently now that I hope will be useful to other people in similar situations.

      Finally, I'm a product engineer who rarely writes these sorts of queries! This advice is much less applicable to someone who does this daily & is using better & more focused tools to do analyses.

      Temporary tables, views, and with clauses

      Large queries are hard, but it's pretty simple to break a large query into smaller pieces. For analytics queries, I normally create small temporary tables (often with the temporary keyword) that normalize data, filter out deleted rows and rows I'm not interested in, and organize my data into a format that makes querying easy. Views or with clauses can accomplish similar things, but I like using temporary tables for this because they cache results and make subsequent queries faster.

      I also try to put constants into a temporary table or with clause. When working on a query, it can be easy to forget to update a constant in one spot and then get completely meaningless results. (Shoutout to Ben Haley for showing me this trick!)

      All of this might sound a little abstract: let's take a somewhat contrived query and try to refactor it. We want to bucket and count US-based teachers who were active in 2021 by how many classes they created during that time period. Here's what that might look like as a single query:

          select
            case when class_count < 5 then class_count::varchar else 'many' end as bucket,
            count(*)
          from (
            select count(distinct class.classId) as class_count
            from teacher
            join user_teacher ON teacher.teacherId = user_teacher.teacherid
            -- left join class_teacher to make sure we're counting teachers who haven't created classes
            left join class_teacher on class_teacher.teacherId = user_teacher.teacherId and class_teacher.creator
            left join user USING(userId)
            join class using(classId)
            join (
              select distinct teacherId
              from teacher_active
              where active_date between '2021-01-01' and '2022-01-01'
            ) as ats on teacher.teacherId = ats.teacherId
              and class.createdat between '2021-01-01' and '2022-01-01'
              and not class.autocreated_demo
              and lower(user.country) in ('usa', 'us')
            group by teacherId
          )
          group by 1

      This query isn't particularly complex, but it's still enough logic that I'd be a little worried about changing it or verifying that it's correct. I'd be tempted to try to pull out constants and then separate out the filtering logic from the calculation logic.

          drop table if exists _constant;
          create temporary table _constant as (
            -- "end" is a reserved word, so use explicit *_date column names
            select '2021-01-01' as start_date, '2022-01-01' as end_date
          );

          drop table if exists _teacher;
          create temporary table _teacher as (
            -- us_user is probably overkill: this might be better in the `where` clause!
            with us_user as (
              select userId
              from user
              where lower(country) in ('usa', 'us')
            )
            select distinct teacherId
            from teacher_active
            join user_teacher USING(teacherId)
            join us_user using(userid)
            where active_date between (select start_date from _constant)
              and (select end_date from _constant)
          );

          drop table if exists _class;
          create temporary table _class as (
            select classId
            from class
            where class.createdat between (select start_date from _constant)
              and (select end_date from _constant)
              and not class.autocreated_demo
          );

          drop table if exists _classes_created_by_teacher;
          create temporary table _classes_created_by_teacher as (
            with class_creator as (
              select class_teacher.*
              from class_teacher
              join _class USING(classId)
              where class_teacher.creator
            )
            select teacherId, count(distinct classId) as class_count
            from _teacher
            left join class_creator using(teacherId)
            group by teacherId
          );

          select
            case when class_count < 5 then class_count::varchar else 'many' end as bucket,
            count(*)
          from _classes_created_by_teacher
          group by bucket;

      It's arguable whether this is actually better! The initial query is short enough that it's not that much logic to understand: it might be the right size for the team that you're working with. There are also certainly better ways of factoring this same query that could make the logic even more clear. Overall though, I'd much rather work with the updated query:

      • if something is broken in the query, I can easily and quickly examine the tables that I've created to see if my results match my expectations
      • if I have a testing library of some sort, I can set up simple assertions about what the tables I'm using look like
      • the overall query will run faster because results are cached and because query-planners don't always generate optimum plans for large queries. While working on this query, I'll have faster feedback loops
      • I'll be able to tell which parts of this query are slow and optimize if necessary
      • it's easier to focus on adapting & improving a single part
      • the sub-queries I've turned into tables are pieces I could take advantage of later: if I ever tackle similar problems, they could become nice non-temporary cache tables

      I think many data-focused engineers use Jupyter notebooks and pandas to break down large queries. How you break a large query into smaller pieces matters much less than actually doing that breakdown!

      Make feedback loops FAST!

      One of the most frustrating parts of working on a large query is that feedback loops can be slow. Making a change and waiting tens of minutes can completely kill any programming flow or focus that you have.

      • Break up large queries into smaller ones that are quick to run!
      • Use select without a from to quickly test out behavior. You can run queries like select extract('year' from GETDATE()); or select 1 != null, 1 is not null to quickly check your understanding. This can be especially useful for testing out regular expressions and formatting date fields for charts. There's no need for a from clause or to run a full query.
      • If a full table is slow to query, it might make sense to create a temporary table with representative data that you can use to build up your queries
      • Good indexes or sortkeys can drastically improve query speed! The most common mistake I've made is setting up a compound sortkey with a too-precise timestamp followed by other keys I'm interested in. If you use a second or millisecond precision key at the start of a compound sortkey, that key's precision will override any possible benefit from the other keys. So, rather than creating a sortkey like (createdAtMs, event), it's probably better to instead sort and query on an index like (day, event). (This might mean that queries need to include both day and createdAtMs.)
      • Validate your work as you go! The quicker you realize that your assumptions about a table or column are incorrect, the better. Running a query that checks whether a column is unique, what values a column can hold, or just what data looks like can save a ton of time!

      In general, putting effort into how quickly you get feedback while working makes it much easier to find flow and be effective. A little bit of effort put into setting up nice tables, improving data layout, and optimizing sortkeys can pay large dividends.
