Editing 200 Files with Bash and Perl

I recently had to change 189 files in our code base, all in almost the same way. Rather than doing it manually, I decided to brush up on my command-line text manipulation ... and ended up taking it further than I expected.

The Mission

The changes were pretty simple. In our API code, we have TypeScript definitions for every endpoint. They look something like this:

interface API {
  "/api/widget/:widgetId": {
    GET: {
      params: {
        widgetId: MongoId;
      };
      response: WidgetResponse;
    }
  }
}

You'll notice the params are defined twice: once in the URL key string (as :widgetId) and again in the GET attribute (under params). We are moving to a TypeScript template literal string parser to get the type information out of the URL key string itself, so I wanted to remove the params key from these definitions. But with 189 files to change, the usual manual approach wasn't so inviting.
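To make the idea of a template literal string parser concrete, here is a minimal sketch of how parameter names could be pulled out of a URL key at the type level. The type names PathParamNames and PathParams are hypothetical, and this is an illustration of the technique rather than our actual parser; it also types every parameter as a plain string.

```typescript
// Hypothetical helper: recursively collects the ":name" segments of a route string.
type PathParamNames<Path extends string> =
  Path extends `${string}:${infer Param}/${infer Rest}`
    ? Param | PathParamNames<`/${Rest}`>
    : Path extends `${string}:${infer Param}`
      ? Param
      : never;

// URL parameters arrive as strings, so map every collected name to string.
type PathParams<Path extends string> = {
  [Name in PathParamNames<Path>]: string;
};

// WidgetParams resolves to { widgetId: string }
type WidgetParams = PathParams<"/api/widget/:widgetId">;
const example: WidgetParams = { widgetId: "abc123" };

// A made-up route with two parameters resolves to { classId: string; studentId: string }
const twoParams: PathParams<"/api/class/:classId/student/:studentId"> = {
  classId: "c1",
  studentId: "s1",
};
```

With something like this in place, the params block in each endpoint definition becomes redundant, which is what motivates removing it.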

So, I set myself the challenge of doing it via the command line.

Step 1: Remove the lines

I'll be honest: when I started, this was the only step I had in mind. I needed to do a multi-line find-and-replace to remove params: { ... }. A quick grep showed me that this pattern was unique to the places I wanted to change, though I could have narrowed the search to just our endpoints in src/resources if necessary. For the replacement, I thought sed might be the right tool, but newlines can be challenging to work with ... so I ended up learning my first bit of perl to make this work.

Here's what I ended up doing (I've added line breaks for readability):

grep -r --files-with-matches "params: {" ./src | while IFS= read -r file
do
  perl -0777 -pi -e 's/ *params: {[^}]*};\n//igs' "$file"
done

This one-liner uses grep to recursively search my src directory to find all the files that have the pattern I want to remove. Actually, I usually reach for ag (the silver searcher) or ripgrep, but grep is already available pretty much everywhere. Then, we'll loop over the files and use perl to replace that content.

Like I said, this was my first line of perl, but I'm fairly sure it won't be my last. This technique of using perl for find-and-replace logic is called a perl pie. Here's what it does:

  • -0777 puts perl in "slurp" mode: it reads in the entire file as one string instead of line by line, which is what lets the pattern span several lines.
  • -p wraps the one-liner in an implicit loop that reads each chunk of input, runs your program on it, and prints the result.
  • -i means that perl will change the file in place; if you aren't making this change in a git repo like I am, you can do something like -i.backup and perl will create a copy of the original file, so you aren't making an irreversible change.
  • -e expects an argument that is your one-line program.

Oh, and the program itself:

s/ *params: {[^}]*};\n//igs

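This is typical 's/find/replace/flags' syntax, and you know how regexes work. The flags are case-insensitive (i), global (g), and single-line (s, where . will also match newlines). Strictly speaking, it's the negated character class [^}] that carries the match across line breaks here, since it matches any character other than a closing brace, newlines included.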
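To see what the substitution does without touching any files, here is a small TypeScript sketch that applies an equivalent JavaScript regex to a made-up snippet. The sample text and variable names are illustrative assumptions, not code from our repo, and the literal braces are escaped to be safe.

```typescript
// A made-up fragment shaped like one of our endpoint definitions.
const before = [
  "    GET: {",
  "      params: {",
  "        widgetId: MongoId;",
  "      };",
  "      response: WidgetResponse;",
  "    }",
].join("\n");

// Same pattern as the perl one-liner: leading spaces, "params: {",
// everything up to the first "}", then ";" and the trailing newline.
const paramsBlock = / *params: \{[^}]*\};\n/gi;

const after = before.replace(paramsBlock, "");
// "after" now holds only the GET wrapper and the response line.
```

One caveat of the pattern: [^}]* stops at the first closing brace, so a params block containing a nested object would not be matched cleanly. That is one more reason to review the diff before committing.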

So, this changed the 189 files in exactly the way I wanted. At this point, I was feeling great about my change. I reviewed the changes, committed them, and started the git push.

Step 2: Remove unused imports

Not so fast. Our pre-push hooks caught a TypeScript linting issue:

error TS6133: 'MongoId' is declared but its value is never read.

5 import { MongoId } from "our-types";
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ah, yeah, that makes sense. URL parameters are strings, but we have a MongoId type that's a branded string. I forgot about this step, but that's why we have pre-push checks! We'll need to remove those imports.

How can we do this? Well, let's get a list of the files we changed in our most recent commit:

git show --name-only | grep ^src

We add the grep to only find the files within our top-level src directory (and to remove the commit information).

Then, we need to find all the files that include MongoId only once. If a file references MongoId multiple times, then we don't want to remove the import, because clearly we're still using it. If the file only references MongoId once, we can remove the import ... but we have to consider that it might not be the only thing we're importing on that line. For starters, we can use grep's -c flag to count the number of matching lines per file.

for file in $(git show --name-only | grep ^src)
do
  grep -c MongoId "$file"
done

A simple for loop works here, because I know the only whitespace in the output is the line breaks between the file names. Once we have the count, we can check to see that there's only 1 match:

for file in $(git show --name-only | grep ^src)
do
  if [ "$(grep -c MongoId "$file")" = 1 ]; then echo "..."; fi
done

We're using an if statement here, to check that the occurrence count is 1. If it is, we want to do something. But what? Remember, we might be importing multiple things on that line, so that leaves us with three possible actions:

  1. Remove the whole line when MongoId is the only item imported.
  2. Remove MongoId, when it's the first item imported on that line. Don't miss that following comma!
  3. Remove , MongoId when it's not the first item on that line. Don't miss the preceding comma!

There are many ways we could do this, so let's have some fun with reading input from the command line! To be clear, this isn't the best way to do it. We could easily match our three cases above with perl or sed. But we've already used that pattern in this project, and reading input in a shell script is an incredibly useful tool to have in your toolbox.
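For comparison, the non-interactive route could look something like the sketch below. It is written in TypeScript rather than perl or sed, and removeNamedImport is a hypothetical helper, not something from our code base. It assumes the import sits on a single line with one pair of braces.

```typescript
// Returns the rewritten import line, or null when the whole line should be deleted.
// Lines that are not brace-style imports, or that don't import `name`, come back unchanged.
function removeNamedImport(line: string, name: string): string | null {
  const open = line.indexOf("{");
  const close = line.indexOf("}");
  if (!/^\s*import\b/.test(line) || open === -1 || close < open) {
    return line;
  }

  const names = line
    .slice(open + 1, close)
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  if (names.indexOf(name) === -1) {
    return line;
  }

  const kept = names.filter((part) => part !== name);
  if (kept.length === 0) {
    return null; // case 1: it was the only import, so drop the line
  }
  // cases 2 and 3: rebuilding the list takes care of the stray comma either way
  return line.slice(0, open) + "{ " + kept.join(", ") + " }" + line.slice(close + 1);
}
```

Rebuilding the import list, instead of deleting text around the match, means the leading-comma and trailing-comma cases collapse into one code path.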

At this point, we probably want to move this into an actual shell script, instead of running it like a one-off on the command line:

#!/bin/bash

for file in $(git show --name-only | grep ^src)
do
  # Only act on files where MongoId appears on exactly one line (the import).
  if [ "$(grep -c MongoId "$file")" = 1 ]
  then
    echo ""
    echo "====================="
    echo "1 - remove whole line"
    echo "2 - remove first import"
    echo "3 - remove other import"
    echo ""
    echo "file: $file"
    echo "line: $(grep MongoId "$file" | grep -v "^//")"
    echo -n "> "

    read -r choice

    echo "your choice: $choice"

    case "$choice" in
      1)
        # BSD/macOS sed syntax; GNU sed takes -i without the empty '' argument.
        sed -i '' "/MongoId/d" "$file"
        ;;
      2)
        perl -i -pe "s/MongoId, ?//" "$file"
        ;;
      3)
        perl -i -pe "s/, ?MongoId//" "$file"
        ;;
      *)
        echo "nothing, skipping line"
        ;;
    esac
  fi
done

Don't be intimidated by this; it's mostly echo statements. But we're doing some pretty cool stuff here.

Inside our if statement, we start by echoing some instructions, as well as the file name and the line that we're about to operate on. Then, we read an input from the command line. At this point, the script will pause and wait for us to type some input. Once we hit <enter> the script will resume and assign the value we entered to our choice variable.

Once we have determined our choice, we can do the correct replacement using the bash equivalent of a switch/case statement. For case 1, we're using sed's delete line command d. For cases 2 and 3, we'll use perl instead of sed, because it will operate only on the matched text, and not on the whole line. Finally, the default case will do nothing.

Running this script, we can now walk through the files, one by one, and review each change. It reduces our work to one keystroke per file, which is way less than opening each file, finding the line, and removing the right stuff.

And that's it! While we don't use command-line editing commands every day, keeping these skills sharp will speed up your workflow when the right task comes along.

Our teams at ClassDojo have the freedom to choose how they want to work. Many of our teams have started spending a few hours each day mobbing because we've found it to be an effective form of collaboration. Here's how we do it!

What is Mob Programming?

Mob programming is similar to pair programming, but with more than two people working together. One person, the driver, does the actual typing but everyone is involved in the problem solving. Mob programming is often defined as “All the brilliant minds working on the same thing, at the same time, in the same space, and at the same computer.” We don’t follow the strict definition of mobbing, especially since we are a fully remote team, but we are continuously iterating on an approach that works for us.

Why do we mob?

Woody Zuill has a great writeup about how a whole range of issues, including communication problems and decision-making problems, just faded away once his teams started mobbing, without anyone trying to address those issues directly. We’ve found similar benefits, and I’ll call out just a few:

Focus

When the team is working together on a single task, it means we’re focused on the top priority for our team. Although it may sound more productive to have multiple engineers working in parallel on separate tasks, that often means that the top priority is delayed when waiting for answers to questions. Having the whole team focused on the same thing greatly decreases the amount of context switching we need to do.

Knowledge Sharing

Without mobbing, it’s easy to develop silos of knowledge as individuals become experts in specific areas. Others might gain context through code reviews or knowledge sharing meetings. However, when the whole team works together on a piece of code, it almost eliminates the need for code reviews since all the reviewers were involved in writing it, and everyone already has shared knowledge. Mobbing is also really useful for onboarding new teammates and getting them up to speed.

Quality

More time is spent debugging and refactoring code than writing it. If you mob, you have more eyes on the code while it’s being written, rather than during code review or later when it needs to be updated or refactored. You increase the quality of your output, and that quality increase leads to long-term speed.

Collaboration

Especially with a fully remote engineering team, it can be isolating to only work on individual tasks. There is also the challenge of communication and having to wait for answers to blocking questions. By having everyone attend the mob, we eliminate that waiting time. Questions can be answered immediately and decisions are made as a group.

What does remote mobbing look like at ClassDojo?

Who: Most often, we have all the engineers of the same function (e.g. all the full-stack engineers) on a team join a mob. Depending on the task it can be helpful to have other functions like client engineers or product managers join as well, to quickly answer questions and unblock. The group will naturally include engineers of varying skill levels, which is a good thing! We rotate drivers often, but like to have the less experienced engineers drive as it keeps them engaged and learning.

When: This depends on the team’s preference and availability as well as the nature of the task, but we may schedule mobbing time for anywhere from an hour to almost the entire day, most days of the week. It’s important to block the same time off on each person’s calendar and protect that time from other meetings. During longer sessions, we set a timer to remind ourselves to take breaks often. We generally take a 10-15 minute break after every 45 minutes of focused mobbing.

What: We pick one task to focus on, and it should be the highest priority task for the team. It’s easy to get derailed by PRs that need reviewing, bugs that get reported, questions on Slack, etc., but we make a conscious effort to avoid starting anything new until we finish the current task. The one exception we have is for P-now bugs, which we drop everything else for.

How: No special tools or complex setup required! We simply hop on a Zoom call and the driver shares their screen. If we’re coding, the driver will use their own IDE and when it’s time to switch drivers, the driver pushes the changes to a branch so the next driver can pull the latest. There are tools for collaborative coding, but we’ve found that they don’t offer much benefit over simply having someone share their screen. If we’re in a design phase, we often use Miro as a collaborative whiteboard.

As with everything we do, we have frequent retrospectives to reflect on what’s going well and what could be improved with how we mob, and we are open to trying new ideas. If you have any thoughts, we’d love to hear from you!

Canary releases are pretty great! ClassDojo uses them as part of our continuous delivery pipeline: having a subset of real users use & validate our app before continuing with deploys allows us to safely & automatically deploy many times a day.

Our canary releases are conceptually simple:

  1. we start canary containers with a new container image
  2. we then route some production traffic to these containers
  3. we monitor them: if a container sees a problem, we stop our pipeline. If they don't see problems, we start a full production deploy

Simple enough, right? There are a few details that go into setting up a system like this, and I'd like to take you through how ClassDojo does it. Our pipeline works well for our company's needs, and I think it's a good example of what this kind of canary-gated deploy can look like.

The key pieces of our system:

  1. We have a logging taxonomy that lets us accurately detect server-errors that we want to fix. ("Errors" that we don't want to fix aren't actually errors!)
  2. HAProxy, Consul, and Nomad let us route a subset of production traffic to a group of canary containers running new code
  3. Our canary containers expose a route with the count of seen errors and the count of total requests that a monitoring script in our jenkins pipeline can hit
  4. The monitoring script will stop our deployment if it sees a single error. If it sees 75,000 successful production requests, it will let the deploy go to production. (75,000 is an arbitrary number that gives us a 99.9% chance of catching errors that happen once in every 10^4 requests.)
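The 99.9% figure can be checked with a short calculation. This sketch assumes each request fails independently with the same probability, which is a simplification; chanceOfCatching is a name made up for illustration.

```typescript
// Probability of seeing at least one error in `requests` requests,
// assuming each request independently errors with probability `errorRate`.
function chanceOfCatching(errorRate: number, requests: number): number {
  const chanceOfNoErrors = Math.pow(1 - errorRate, requests);
  return 1 - chanceOfNoErrors;
}

// An error that happens once in every 10^4 requests, watched over 75,000 requests.
const chance = chanceOfCatching(1 / 10_000, 75_000);
// chance comes out a little above 0.999
```

Under the same assumption, rarer errors are much more likely to slip through: an error that happens once in every 10^5 requests would be caught only about half the time in a 75,000-request window.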

Starting canary containers

ClassDojo uses Nomad for our container orchestration, so once we've built a docker image and tagged it with our updated_image_id, we can deploy it by running nomad run api-canary.nomad.

// api-canary.nomad
job "api-canary" {
  group "api-canary-group" {
    count = 8
    task "api-canary-task" {
      driver = "docker"
      config {
        image = "updated_image_id"
      }
      service {
        name = "api-canary"
        port = "webserver_http"
        // this registers this port on these containers with consul as eligible for “canary” traffic
      }
      resources {
        cpu = 5000 # MHz
        memory = 1600

        network {
          port "webserver_http"{}
        }
      }
    }
  }
}

Nomad takes care of running these 8 (count = 8) canary containers on our nomad clients. At this point, we have running containers, but they're not serving any traffic.

Routing traffic to our canary containers

Remember that nomad job file we looked at above? Part of what it was doing was registering a service in consul. We tell consul that the webserver_http port can provide the api-canary service.

service {
  name = "api-canary"
  port = "webserver_http"
}

We use HAProxy for load-balancing, and we use consul-template to generate updated haproxy configs every 30 seconds based on the service information that consul knows about.

backend api
  mode http
  # I'm omitting a *ton* of detail here!
  # See https://engineering.classdojo.com/2021/07/13/haproxy-graceful-server-shutdowns for how we do graceful deploys with HAProxy

{{ range service "api-canary" }}
  server canary_{{ .Address }}:{{ .Port }} {{ .Address }}:{{ .Port }}
{{ end }}

# as far as HAProxy is concerned, the canary containers above should be treated the same as our regularly deployed containers. It will round robin traffic to all of them
{{ range service "api" }}
  server api_{{ .Address }}:{{ .Port }} {{ .Address }}:{{ .Port }}
{{end}}

Monitoring canary

Whenever we see an error, we increment a local counter saying that we saw the error. What counts as an error? For us, an error is something we need to fix (most often 500s or timeouts): if something can't be fixed, it's part of the system, and we need to design around it. If you're curious about our approach to categorizing errors, Creating An Actionable Logging Taxonomy digs into the details. Having an easy way of identifying real problems that should stop a canary deploy is the key piece that makes this system work.

let errorCount: number = 0;
export const getErrorCount = () => errorCount;
export function logServerError(errorDetails: ErrorDetails) {
  errorCount++;
  metrics.increment("serverError");
  winstonLogger.log("error", errorDetails);
}

Similarly, whenever we finish with a request, we increment another counter saying we saw the request. We can then expose both of these counts on our status route. There are probably better ways of publishing this information to our monitoring script rather than via our main server, but it works well enough for our needs.

router.get("/api/errorAndRequestCount", () => {
  return {
    errorCount: getErrorCount(),
    requestCount: getRequestsSeenCount(),
    ...otherInfo,
  };
});
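The route above calls getRequestsSeenCount, which isn't shown. A minimal sketch of what that counter might look like, mirroring the error counter, is below. The name recordRequestFinished is hypothetical, and how it gets called when a request finishes (a middleware, a response hook, and so on) is left as an assumption.

```typescript
// Module-level counter, mirroring errorCount above.
// In a real module these would be exported alongside getErrorCount.
let requestsSeenCount = 0;

const getRequestsSeenCount = (): number => requestsSeenCount;

// Hypothetical hook: call this once for every request the server finishes handling.
function recordRequestFinished(): void {
  requestsSeenCount++;
}
```

Because the counter lives in process memory, each canary container reports only the requests it has handled itself, which is why the monitoring script has to sum the counts across containers.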

Finally, we can use consul-template to re-generate our list of canary hosts & ports, and write a monitoring script to check the /api/errorAndRequestCount route on all of them. If we see an error, we can run nomad job stop api-canary && exit 1, and that will stop our canary containers & our deployment pipeline.

consul-template -template canary.tpl:canary.txt -once

{{ range service "api-canary" }}
  {{ .Address }}:{{ .Port }}
{{end -}}

Our monitoring script watches our canary containers until it sees that they've handled 75,000 requests without an error. (75,000 is a little bit of an arbitrary number: it's large enough that we'll catch relatively rare errors, and small enough that we can serve that traffic on a small number of containers within a few minutes.)

const fs = require("fs");
// canary.txt is generated by consul-template: one address:port per line
const canaryContainers = fs
  .readFileSync("./canary.txt")
  .toString()
  .split("\n")
  .map((s) => s.trim())
  .filter(Boolean);
const fetch = require("node-fetch");
const { execSync } = require("child_process");
const GOAL_REQUEST_COUNT = 75_000;

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

(async function main() {
  while (true) {
    let totalRequestCount = 0;
    for (const container of canaryContainers) {
      // canary.txt holds bare address:port pairs, so fetch needs a scheme; plain HTTP is assumed here
      const { errorCount, requestCount } = await fetch(
        `http://${container}/api/errorAndRequestCount`
      ).then((res) => res.json());
      totalRequestCount += requestCount;
      if (errorCount) {
        // stopping our canary containers is normally handled by the next stage in our pipeline
        // putting it here for illustration
        console.error("oh no! canary failed");
        execSync(`nomad job stop api-canary`);
        return process.exit(1);
      }
    }

    if (totalRequestCount >= GOAL_REQUEST_COUNT) {
      console.log("yay! canary succeeded");
      execSync(`nomad job stop api-canary`);
      return process.exit(0);
    }

    await delay(1000);
  }
})();
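The decision at the heart of that loop can also be pulled out into a pure function, which makes it easy to test without any running containers. This is a sketch under that assumption: judgeCanary, CanaryStatus, and Verdict are names invented for illustration and are not part of our pipeline.

```typescript
// One container's answer from the /api/errorAndRequestCount route.
type CanaryStatus = { errorCount: number; requestCount: number };

type Verdict = "fail" | "pass" | "keep-waiting";

// A single error on any container fails the canary; otherwise the canary
// passes once the containers have handled enough requests between them.
function judgeCanary(statuses: CanaryStatus[], goalRequestCount: number): Verdict {
  let totalRequestCount = 0;
  for (const status of statuses) {
    if (status.errorCount > 0) {
      return "fail";
    }
    totalRequestCount += status.requestCount;
  }
  return totalRequestCount >= goalRequestCount ? "pass" : "keep-waiting";
}
```

The polling loop would then only need to fetch the statuses, call the function, and act on the verdict.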

Nary an Error with Canary

We've been running this canary setup (with occasional changes) for over eight years now. It's been a key part of our continuous delivery pipeline and has let us move quickly and safely. Without it, we would have shipped a lot more errors fully out to production, our overall error rate would likely be higher, and our teams would not be able to move as quickly as they can. Our setup definitely isn't perfect, but it's still hugely valuable, and I hope that sharing our setup will help your team create a better one.

Interested in working in an engineering culture that values automated testing, continuous delivery, and high collaboration? ClassDojo is hiring and we'd love to chat!
