Automated and semi-automated code migrations using shell text manipulation tools are great! Turning a migration task that might take multiple days or weeks of engineering effort into one that you can accomplish in a few minutes can be a huge win. I'm not remotely an expert at these migrations, but I thought it'd still be useful to write up the patterns that I use consistently.
Use `ag`, `rg`, or `git grep` to list files
Before anything else, you need to edit the right files! If you don't have a way of finding your codebase's files, you might accidentally edit random cache files, package files, editor files, or other dependencies. Editing those files is a good way to end up throwing away a codebase and cloning it from scratch again.
I normally use `ag -l .` to list files because `ag`, the Silver Searcher, already respects `.gitignore` (`rg --files` and `git ls-files` do the same job). A simple find and replace might look like `ag -l . | xargs gsed -i 's|bad pattern|replacement|'`. It'd be simpler to do that replacement with your editor, but the `ag -l . | xargs gsed -i` pattern is one that you can expand on in a larger script.
Pause for user input: not all migrations are fully automatable
A lot of migrations can't actually be fully automated. In those cases, it can be worth building a miniature tool to make editing faster (and more fun!).
```shell
# spaces in file names will kill this for loop
# thankfully, I've never worked in a code base where people put spaces in filenames
for file in $(ag -l bad_pattern); do
  echo "how should we replace bad_pattern in ${file}? Here's context:"
  ag -C 3 bad_pattern "${file}"
  echo ""
  # -r keeps any backslashes you type instead of treating them as escapes
  read -r good_pattern
  # quoting in sed commands is tricky!
  # using `${var}` rather than $var avoids potential problems here
  # (a "|" in your answer would still end the sed expression early)
  gsed -i "s|bad_pattern|${good_pattern}|" "${file}"
done
```
You can expand this pattern to read a number and pick from a list of predefined replacements, but even something this simple that speeds up going through files makes life better!
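A sketch of that numbered-choice variant, assuming GNU `sed` as `gsed` or `sed`. `BAD_PATTERN`, `OPTIONS`, and `pick_replacement` are placeholders to swap for your own migration, and it uses plain `grep -C 3` for context so it doesn't depend on `ag`.

```shell
#!/usr/bin/env bash
# Assumption: GNU sed as `gsed` or `sed`. The pattern and options below are placeholders.
SED="$(command -v gsed || command -v sed)"
BAD_PATTERN='bad_pattern'
OPTIONS=('good_pattern_a' 'good_pattern_b')

# pick_replacement FILE: show context, list numbered options, and apply the one chosen.
# Anything other than a listed number leaves the file untouched.
pick_replacement() {
  local file="$1" choice i
  echo "How should we replace ${BAD_PATTERN} in ${file}? Here's context:"
  grep -n -C 3 -- "$BAD_PATTERN" "$file"
  for i in "${!OPTIONS[@]}"; do
    printf '  %d) %s\n' "$((i + 1))" "${OPTIONS[i]}"
  done
  echo "  anything else) skip this file"
  read -r choice
  # Accept only 1..N; the regex also rules out leading zeros, which bash would read as octal
  if [[ "$choice" =~ ^[1-9][0-9]*$ ]] && (( choice <= ${#OPTIONS[@]} )); then
    "$SED" -i "s|${BAD_PATTERN}|${OPTIONS[choice - 1]}|g" "$file"
  fi
}

# Usage (still splits on spaces in file names, like the original loop):
# for file in $(git grep -l bad_pattern); do pick_replacement "$file"; done
```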
Handle relative import paths with `for` loops
I've often needed to add a new import statement with a relative path to files as part of a migration, and every time I've been surprised that my editor hasn't been able to help me out more: what am I missing? I normally use a `for` loop that increases both the maximum depth of the files I'm looking at and the number of `../` segments in the path:
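```shell
dots="."
# path of the imported module, relative to the repository root
import_path="/file/path"
for ((depth=0; depth<5; depth++)); do
  # ag needs a pattern: "." matches any non-empty file; --depth 0 means root-level files only
  for file in $(ag -l --depth "$depth" . | grep '\.ts$'); do
    # -q: just test, -F: match the path literally; files handled at a shallower depth are skipped
    if ! grep -qF "$import_path" "$file"; then
      gsed -i "1i import '${dots}${import_path}';" "$file"
    fi
  done
  # one directory deeper needs one more ../
  dots="${dots}/.."
done
```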
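As an alternative to looping over depths (a different technique from the loop above), this sketch computes each file's `../` prefix from the number of `/` characters in its path, so every file is visited once. It swaps `ag` for `git ls-files`, assumes import paths are root-relative with no leading slash, assumes it runs from the repo root, and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumptions: GNU sed as `gsed` or `sed`; run from the repo root;
# IMPORT_PATH is relative to the root, e.g. "lib/logger".
SED="$(command -v gsed || command -v sed)"

# relative_prefix FILE: "./" for a root-level file, otherwise one "../" per directory in FILE's path
relative_prefix() {
  local slashes prefix="" i
  # keep only the "/" characters; their count is the file's depth
  slashes="$(printf '%s' "$1" | tr -cd '/')"
  if [ -z "$slashes" ]; then
    printf './'
    return
  fi
  for (( i = 0; i < ${#slashes}; i++ )); do
    prefix+="../"
  done
  printf '%s' "$prefix"
}

# add_import IMPORT_PATH (hypothetical helper): prepend an import to every tracked .ts file
# that doesn't already mention IMPORT_PATH. NUL-separated names keep spaces safe.
add_import() {
  local import_path="$1" file
  git ls-files -z -- '*.ts' | while IFS= read -r -d '' file; do
    grep -qF -- "$import_path" "$file" && continue
    "$SED" -i "1i import '$(relative_prefix "$file")${import_path}';" "$file"
  done
}
```

Usage: `add_import 'lib/logger'` gives `src/deep/b.ts` the line `import '../../lib/logger';`.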
Rely on your code formatter
Not needing to worry about code formatting is AMAZING. If your codebase is set up with a code formatter (like Prettier or gofmt), you can make changes without worrying about whitespace and let the formatter fix things afterwards. It may even make sense to intentionally remove whitespace from a pattern to make a replacement simpler to write!
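A minimal sketch of leaning on the formatter: the regex tolerates any spacing inside a call, and the replacement is written with no spaces at all, leaving the formatter to restore the house style. `oldFn`, `newFn`, and the argument swap are hypothetical, and Prettier is only run if it happens to be installed.

```shell
#!/usr/bin/env bash
# Assumption: GNU sed as `gsed` or `sed`. oldFn/newFn are hypothetical names.
SED="$(command -v gsed || command -v sed)"

# swap_args FILE...: turn oldFn(a, b) into newFn(b,a), whatever spacing the call had.
# [[:space:]]* soaks up spacing in the match; the output is deliberately unformatted.
swap_args() {
  "$SED" -E -i \
    's|oldFn\([[:space:]]*([^,()[:space:]]+)[[:space:]]*,[[:space:]]*([^,()[:space:]]+)[[:space:]]*\)|newFn(\2,\1)|g' \
    "$@"
  # Let the project's formatter tidy up (Prettier here; `gofmt -w` plays this role for Go)
  if command -v prettier > /dev/null; then
    prettier --write "$@"
  fi
}
```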
Use the right tool for the job
- Some code migrations require a tool that parses the code into an AST (abstract syntax tree) and transforms that tree, rather than editing the raw text. These tools are more powerful & flexible than shell tools, but they take a bit more effort to get working. In NodeJS, there are jscodeshift and codemods. I don't know what's available for other languages.
- Your editor & language might support advanced migrations. If they do, learning how to do those migrations with your editor will likely be more effective than these techniques, or a useful complement to them.
- Command-line tools like `sed`, `awk`, `grep`, and `cut` are designed to deal with text and files. Code is text and files! Other tools work, but they might not be designed to deal with files and streams of text.
- Shell tools are great, but a tool you know well and are excited about using is better than a tool you don't want to learn! Whatever programming language you're most comfortable with should have ways of reading and changing files and text. Having some way of manipulating text & files is important! There are even tools like rb or nq (I wrote this one!) that let you use the Ruby or NodeJS syntax you're familiar with on the command line in a script you're writing.
Use sed: it's designed for this
`sed` is the stream editor, and it's the perfect tool for many code migrations. A surprising number of code migrations boil down to replacing a code pattern that occurs on a single line with a different code pattern: `sed` makes that easy. Here are a few notes:
- If you're on a Mac, you'll want to install a modern version of `sed`: macOS ships BSD `sed`, whose `-i` flag behaves differently. I use GNU sed: `brew install gnu-sed` (Homebrew installs it as `gsed`).
- Use `|` (or anything else!) as your delimiter rather than `/`. `sed` takes the first character after the `s` command as the delimiter, and `/` will show up in things that you want to replace pretty often! Writing `gsed 's|/path/file.js|/path/file.ts|'` is nicer than `gsed 's/\/path\/file.js/\/path\/file.ts/'`.
- In `gsed`, the `--null-data` (`-z`) option separates lines by NUL characters instead of newlines. Source files rarely contain NUL bytes, so the whole file becomes a single 'line' and you can match and edit multiline patterns with `\n`. If you use this, don't forget the `g` flag at the end to get all matches: everything in the file will be on that one 'line' for `sed`.
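- When referring to shell variables, use `${VAR_NAME}` rather than `$VAR_NAME`. The braces mark exactly where the name ends, so text right after it (as in `"${VAR_NAME}_suffix"`) isn't read as part of the name. That makes it simpler to drop variables into `sed` commands.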
- Use `-E` (or `-r` with `gsed`) for extended regular expressions, and use capture groups in your regular expressions: `git grep -l pattern | xargs gsed -Ei 's|pat(tern)|\1s are birds|g'`. Write `-Ei`, not `-iE`: GNU sed would read that `E` as a backup-file suffix for `-i`.
(The "perl pie", `perl -pi -e`, can be another good tool for finding and replacing patterns! It's just not one I know.)
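A few of the notes above, sketched as small functions (the function names and example patterns are hypothetical). It assumes GNU `sed` is available as `gsed` or `sed`, since `-z` and `-i` without a suffix are GNU behavior.

```shell
#!/usr/bin/env bash
# Assumption: GNU sed as `gsed` or `sed`.
SED="$(command -v gsed || command -v sed)"

# 1) "|" as the delimiter: the slashes in the paths need no escaping.
rename_import() { "$SED" -i 's|/path/file.js|/path/file.ts|g' "$@"; }

# 2) -z: with no NUL bytes in the file, the whole file is one "line", so \n can appear
#    in the pattern. This joins `foo(` with whatever follows on the next line.
join_split_call() { "$SED" -z -E -i 's|foo\(\n[[:space:]]*|foo(|g' "$@"; }

# 3) ${VAR} inside double quotes plus a capture group: \1 is whatever ([a-z]+) matched.
rename_prefixed() {
  local old="$1" new="$2"
  shift 2
  "$SED" -E -i "s|${old}_([a-z]+)|${new}_\1|g" "$@"
}
```

For example, `rename_prefixed old new file.ts` turns `old_name old_value` into `new_name new_value`.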
Many migrations might take multiple steps
When you're migrating code, don't worry about migrating everything at once. If you can break down the problem into a few different commands, those individual commands can be simple to write: you might first replace a function call with a different one and then update import statements to require the new function that you added.
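A sketch of that two-step shape, for a hypothetical migration where `legacyFetch(` calls become `httpGet(` imported from a made-up `@acme/http` module. It assumes GNU `sed` as `gsed` or `sed` and that it runs inside a git repository; each step is a separate, simple command.

```shell
#!/usr/bin/env bash
# Hypothetical migration: legacyFetch(...) becomes httpGet(...), imported from '@acme/http'.
# Assumptions: GNU sed as `gsed` or `sed`; run inside a git repo.
SED="$(command -v gsed || command -v sed)"

# Step 1: rename the call everywhere it appears (-F: match the text literally).
rename_calls() {
  git grep -lzF -e 'legacyFetch(' | xargs -0 -r "$SED" -i 's|legacyFetch(|httpGet(|g'
}

# Step 2: add the import to files that now call httpGet but don't import it yet.
add_missing_imports() {
  local file
  git grep -lzF -e 'httpGet(' | while IFS= read -r -d '' file; do
    if ! grep -qF 'import { httpGet }' "$file"; then
      "$SED" -i "1i import { httpGet } from '@acme/http';" "$file"
    fi
  done
}
```

Run `rename_calls`, check the diff, then run `add_missing_imports`.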
When you write a regular expression in a find-and-replace, you can sometimes get false positives. Rather than trying to update your regular expression to skip the false positives, I often find it simpler to write a regular expression to replace those false positives with a temporary pattern, update the remaining matches, and then replace the temporary pattern.
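A sketch of the temporary-pattern trick for a hypothetical rename of `id` to `userId` that must leave `this.id` alone. The three `-e` expressions run in order on each line, which is equivalent to three separate passes. It assumes GNU `sed` (for `\b`) and that the placeholder string appears nowhere in the codebase.

```shell
#!/usr/bin/env bash
# Assumptions: GNU sed as `gsed` or `sed`; the placeholder never occurs in real code.
SED="$(command -v gsed || command -v sed)"
PLACEHOLDER='__KEEP_THIS_ID__'

# rename_id_except_this FILE...: id -> userId everywhere except `this.id`
rename_id_except_this() {
  # 1) park the false positives, 2) do the simple replacement, 3) restore the parked text
  "$SED" -E -i \
    -e "s|this\.id\b|${PLACEHOLDER}|g" \
    -e 's|\bid\b|userId|g' \
    -e "s|${PLACEHOLDER}|this.id|g" \
    "$@"
}
```

For example, `const id = this.id + obj.id;` becomes `const userId = this.id + obj.userId;`.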
With all of this, you'll need to rely on `git` (or another version control system). It's really easy to make mistakes! If you don't have an easy way to undo them, you'll be sad.
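One way to make that undo button reliable, sketched as a hypothetical wrapper: refuse to run on a dirty working tree, so the migration's changes are the only diff, then print a summary to review.

```shell
#!/usr/bin/env bash
# migrate_with_checkpoint CMD... (hypothetical helper; assumes you're inside a git repo)
migrate_with_checkpoint() {
  # --porcelain prints nothing for a clean tree (untracked files count as dirty)
  if [ -n "$(git status --porcelain)" ]; then
    echo "Commit or stash your changes first so the migration's diff stands alone." >&2
    return 1
  fi
  "$@" || return
  git diff --stat
}
# If the result looks wrong, throw it all away with `git restore .` (Git 2.23+)
# or `git checkout -- .` on older versions.
```

Usage: `migrate_with_checkpoint gsed -i 's|old|new|' src/*.ts`.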
Automate ALL the code migrations!
Manipulating text & files like this is a skill, and it's one that takes some practice to learn. Even if it's much slower to automate a code change, spending the time to automate it will help you build the skills to automate larger, more complex, and more valuable code migrations. I remember spending over an hour trying to figure out how to automate changing a pattern that was only in 10 spots in our codebase. It would have taken 5 minutes to do manually, but I'm glad I spent more than 10x the time doing it the slow way with shell tools, because that experience made me capable of tackling more complex migrations that wouldn't be feasible to do manually.