Patterns of the Swarm

On our teams, we do our best to ensure that we're fully focused on the most important thing that our team could be doing. To do that, we often "swarm" on the top priority: this is some internal documentation on what that looks like in practice!

Swarming is when a team is as fully dedicated to their absolute top priority as possible. This is the ideal state for maximum productivity. This is a list of patterns for swarming. No one pattern works in every scenario. Some patterns can be combined! It might be the right move for a group to change patterns in the middle of the work! We're probably also missing great patterns! Get creative! These are just suggestions and starting points.

💡 Also please note that full-team swarming is often impossible or suboptimal. You can't just keep adding more people to the same task and always expect that addition to be effective (e.g., nine women can't make a baby in one month).

Group Programming

Group programming is any kind of programming with more than 1 person on one screen at a time. There are many different kinds and you should decide which one makes sense instead of just adopting one for every situation. You will want to consider the goal of this type of collaboration, the skill levels and background knowledge of everyone involved, the responsibilities of everyone in the group, and the type of work you're trying to accomplish.

Roles

  • "navigator" — the person talking about what needs to be done on a high level
  • "driver" — the person with their hands on the keyboard making that happen
  • "observer" — anyone else that is just observing

Every group programming pattern here has a single navigator and single driver for the sake of simplicity.

Observers will have an extremely hard time staying engaged unless they are very regularly rotated into more engaging roles.

Notes

  • Group programming is most useful when you're introducing new people to a code-base, or working on an integration between two platforms.
  • Group programming is extremely exhausting and should involve heavy, heavy use of breaks. I'd recommend 15 minutes per hour AT MINIMUM and adjusting as you feel necessary.
  • Not everyone in the group needs to be a software developer for this to be useful.
  • Group programming with just two people is generally called Pair programming or pairing.
  • Group programming with more than two people is generally called mob programming or mobbing.

Pair Programming

Peer Pairing

A type of pair programming where both developers have roughly the same level of expertise and background knowledge.

  • Recommended for: Really hard things or really unfun work that is less unfun with a friend.
  • Not Recommended for: Simple straightforward tasks: try Divide and Conquer instead. Cases where the work would benefit from more eyes, or when we want to spread knowledge/skills with more people in similar ways: try mob programming.

Mentor Pairing

A type of pair programming where one developer is trying to teach another developer something as they do the work together. The two developers could have things to learn from each other, and one of them could possibly not even be a developer at all.

You'll want to be very thoughtful about who should be the navigator and who should be the driver depending on your teaching goals. You may or may not want to rotate roles.

  • Recommended for: Cases where we want one person to teach another person something in a hands-on way. Hands-on teaching on real work is generally pretty engaging and practical compared to more academic approaches.
  • Not Recommended for: Times when multiple people need the same mentoring: try Directed Mobbing.

Ping-Pong Pairing

A type of pair programming where the developers practice TDD, with one person writing a test and the other writing the app code that passes it. You could have only one person writing the tests during the session, or the duties could rotate.

  • Recommended for: Really complex and detailed use cases where an adversarial testing technique can drive out a better result.
  • Not Recommended for: Straightforward building (like CRUD stuff).

Mob Programming

Mob programming is group programming with more than 2 people. The additional people beyond the driver and the navigator are almost always observers to keep things simple. The traditional recommendation is to change the driver every 10 minutes. Some examples are Directed Mobbing and Peer Mobbing.

Directed Mobbing

Directed mobbing is a form of mob programming where a group is trying to learn something from a particular expert. In general that expert should never be driving, so that the learners are always engaged and learning in a hands-on fashion. Usually the expert will be the navigator, but there may be a point when the learners can take over the navigation role and the expert continues just as an observer who is around for consultation. If there comes a point where the expert is no longer necessary at all, this style may no longer be useful.

  • Recommended for: teaching/guiding many people the same thing at once.
  • Not Recommended for: groups that have graduated to having the teacher as an observer for an extended amount of time. Move on to something more engaging/efficient!

💡 Check out this great blog post written by Melissa Dirdo for more on ClassDojo's approach to mobbing.

Peer Mobbing

Peer mobbing is a form of mob programming where everyone has roughly the same level of ability and it makes sense to rotate them into navigator/driver positions equally.

  • Recommended for: Stuff the team is working through/learning together that they all want to know/understand; getting the team on the same page about practices/processes/conventions; working through a hairy problem that could use lots of different ideas/perspectives.
  • Not Recommended for: Simple straightforward things; things that are low value for everyone to learn.

Divide and Conquer

If stories are sliced well vertically, a particular story may still have divisible parts along other lines that could allow them to be worked on simultaneously without much overhead. This requires that everyone involved gets in a group and does actual upfront planning and breakdown, but it can be pretty quick ("You do the frontend and I'll do the backend", "You do the model and I'll do the controller"). This does not require any tracking whatsoever in Asana (and probably shouldn't), so once things are broken up well vertically in Asana, feel free to break up a particular story horizontally to collaborate on it if you think that's the best way. You'll probably want to solve integrations between layers first before actually building the layers. There may be other sub-pieces to cut out other than horizontal layers as well!

  • Recommended for: When everything is simple and straightforward and everyone knows what to do and how to do it. When everybody is exhausted from group programming and wants to just bang out some code and listen to Iron Maiden. When someone being mentored is ready to try flying solo to see if they really can apply their new learnings on their own.
  • Not recommended for: Work that can't be parallelized. Teams that don't have widespread capability on the subject matter (they're just going to continuously interrupt each other asking questions). Work that, when broken down, is not straightforward.

    Automated and semi-automated code migrations using shell text manipulation tools are great! Turning a migration task that might take multiple days or weeks of engineering effort into one that you can accomplish in a few minutes can be a huge win. I'm not remotely an expert at these migrations, but I thought it'd still be useful to write up the patterns that I use consistently.

    Use ag, rg, or git grep to list files

    Before anything else, you need to edit the right files! If you don't have a way of finding your codebase's files, you might accidentally edit random cache files, package files, editor files, or other dependencies. Editing those files is a good way to end up throwing away a codebase and cloning it from scratch again.

    I normally use ag -l . to list files because ag, the Silver Searcher, is set up to respect .gitignore already. A simple find and replace might look like ag -l . | xargs gsed -i 's|bad pattern|replacement|'. It'd be simpler to do that replacement with your editor, but the ag -l . | xargs gsed -i pattern is one that you can expand on in a larger script.
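As one way to expand on that one-liner, here is a minimal sketch of a helper that reads file names line by line (so names with spaces survive) and only edits files that contain the pattern. The name replace_in_files and the SED lookup are hypothetical, made up for illustration; the sketch assumes GNU sed and that neither the pattern nor the replacement contains a | character.

```shell
# Prefer gsed (GNU sed on a Mac), fall back to plain sed elsewhere.
SED="$(command -v gsed || command -v sed)"

# Hypothetical helper: reads file names from stdin, one per line, and
# applies a substitution to every file that contains the pattern.
# Assumes GNU sed and that neither argument contains "|".
replace_in_files() {
  local pattern="$1" replacement="$2" file
  while IFS= read -r file; do
    if grep -q -- "${pattern}" "${file}"; then
      "${SED}" -i "s|${pattern}|${replacement}|g" "${file}"
      echo "updated ${file}"
    fi
  done
}
```

Something like ag -l . | replace_in_files 'bad pattern' 'replacement' would then play the same role as the xargs version, with a line of output per edited file.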

    Pause for user input: not all migrations are fully automatable

    A lot of migrations can't actually be fully automated. In those cases, it can be worth building a miniature tool to make editing faster (and more fun!).

    # spaces in file names will kill this for loop
    # thankfully, I've never worked in a code base where people put spaces in filenames
    for file in $(ag -l bad_pattern); do
      echo "how should we replace bad_pattern in ${file}? Here's context:"
      ag -C 3 bad_pattern "${file}"
      echo ""
      # -r keeps backslashes in the typed replacement from being eaten by read
      read -r good_pattern
      # quoting in sed commands is tricky!
      # using `${var}` rather than $var avoids potential problems here
      gsed -i "s|bad_pattern|${good_pattern}|" "${file}"
    done
    

    You can expand this pattern to look for a number and choose an appropriate option, but just having something that speeds up going through files makes life better!
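The numbered-option idea could be sketched roughly like this. The function name choose_replacement and its behavior for non-numeric input are assumptions for illustration, not something from the original script: it prints a menu on stderr, reads one line, and echoes either the chosen option or whatever was typed.

```shell
# Hypothetical helper: prints numbered options on stderr, reads one line
# from stdin, and echoes the chosen replacement on stdout.
# A non-numeric answer is treated as a custom replacement typed by hand.
choose_replacement() {
  local i=1 option choice
  for option in "$@"; do
    echo "  ${i}) ${option}" >&2
    i=$((i + 1))
  done
  printf 'pick a number or type a replacement: ' >&2
  read -r choice
  case "${choice}" in
    '' | *[!0-9]*)
      # Not a number: pass the typed text through unchanged.
      echo "${choice}"
      return 0
      ;;
  esac
  i=1
  for option in "$@"; do
    if [ "${i}" -eq "${choice}" ]; then
      echo "${option}"
      return 0
    fi
    i=$((i + 1))
  done
  echo "no option numbered ${choice}" >&2
  return 1
}
```

Inside the loop, good_pattern="$(choose_replacement first_option second_option)" could stand in for the plain read, since the menu goes to stderr and only the answer is captured.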

    Handle relative import paths with for loops

    I've often needed to add a new import statement with a relative path to files as part of a migration, and every time I've been surprised that my editor hasn't been able to help me out more: what am I missing? I normally use a for loop and increase both the max-depth of files I'm looking at and the number of ../ on the path:

    dots="."
    import_path="/file/path"
    for ((depth=0; depth<5; depth++)); do
     dots="$dots/..";
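     # "." gives ag a pattern to search for; '\.ts$' matches only a real .ts extension
     for file in $(ag -l --depth "${depth}" . | grep '\.ts$'); do
       # -q keeps grep from printing every existing import line
       if ! grep -q "${import_path}" "${file}"; then
         gsed -i "1i import '${dots}${import_path}';" "${file}";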
       fi
     done
    done
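A different approach from the depth loop above is to compute the prefix from each file's own path by counting its directory separators. This is only a sketch under assumptions: relative_prefix is a hypothetical name, and the file paths are assumed to be relative to the directory the import path is rooted at.

```shell
# Hypothetical helper: turns a file path (relative to the directory that
# the import path is rooted at) into the "./.." prefix for that file.
relative_prefix() {
  local path="${1#./}" slashes prefix="."
  # Keep only the slashes: one per directory between the root and the file.
  slashes="$(printf '%s' "${path}" | tr -cd '/')"
  while [ -n "${slashes}" ]; do
    prefix="${prefix}/.."
    slashes="${slashes#/}"
  done
  echo "${prefix}"
}
```

With this, a single pass over the file list could build each import as "$(relative_prefix "${file}")${import_path}". Note that it gives a root-level file the prefix "." where the loop above starts at "./..", so the base may need adjusting for your layout.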
    

    Rely on your code formatter

    Not needing to worry about code formatting is AMAZING. If your codebase is set up with a code formatter (like prettier or gofmt), it allows you to make changes without worrying about whitespace and then let the code formatter fix things later. It may even make sense to intentionally remove white-space from a pattern in order to make a replacement simpler to write!

    Use the right tool for the job

    1. Some code migrations require a tool that looks at the AST rather than the text in a code file and transforms that AST. These tools are more powerful & flexible than shell tools, but they require a bit more effort to get working. In NodeJS, there's jscodeshift and codemods. I don't know what's available for other languages.
    2. Your editor & language might support advanced migrations. If it does, learning how to do those migrations with your editor will likely be more effective than using these techniques or may prove a useful complement to these techniques.
    3. Bash tools like sed, awk, grep, and cut are designed to deal with text and files. Code is text and files! Other tools work, but they might not be designed to deal with files and streams of text.
    4. Shell tools are great, but a tool you know well and are excited about using is better than a tool you don't want to learn! Whatever programming language you're most comfortable with should have ways of dealing with and changing files and text. Having some way of manipulating text & files is important! There are even tools like rb or nq (I wrote this one!) that let you use the Ruby or NodeJS syntax you're familiar with on the command line in a script you're writing.

    Use sed: it's designed for this

    sed is the stream editor, and it's the perfect tool for many code migrations. A surprising number of code migrations boil down to replacing a code pattern that happens on a single line with a different code pattern: sed makes that easy. Here are a few notes:

    1. If you're on a Mac, you'll want to download a modern version of sed. I use gnu-sed: brew install gnu-sed
    2. Use | (or anything else!) as your delimiter rather than /. sed takes the first character after the command as the delimiter, and / will show up in things that you want to replace pretty often! Writing gsed 's|/path/file.js|/path/file.ts|' is nicer than gsed 's/\/path\/file.js/\/path\/file.ts/'.
    3. In gsed, the --null-data (-z) option separates lines by NUL characters which lets you easily match and edit multiline patterns. If you use this, don't forget to use the g flag at the end to get all matches: everything in a file will be on the same 'line' for sed.
    4. When referring to shell variables, use ${VAR_NAME} rather than $VAR_NAME. This will simplify using them in sed commands.
    5. Use -E (or -r with gsed) for extended regular expressions and use capture groups in your regular expressions. git grep -l pattern | xargs gsed -Ei 's|pat(tern)|\1s are birds|g'
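A few of the notes above, sketched as small filters that read stdin and write stdout. The function names are hypothetical and only there to label each note; the -z example assumes GNU sed.

```shell
# Prefer gsed (GNU sed on a Mac), fall back to plain sed elsewhere.
SED="$(command -v gsed || command -v sed)"

# Note 2: "|" as the delimiter keeps paths readable.
rename_extension() {
  "${SED}" 's|/path/file\.js|/path/file.ts|'
}

# Note 3: with -z the whole input is one "line", so a pattern can span
# newlines; the g flag is needed to catch every match.
join_wrapped_arguments() {
  "${SED}" -z 's|,\n[[:space:]]*|, |g'
}

# Note 5: -E enables extended regular expressions; \1 is the first group.
pluralize_terns() {
  "${SED}" -E 's|pat(tern)|\1s are birds|g'
}
```

For example, printf 'call(a,\n     b,\n     c)\n' | join_wrapped_arguments folds the wrapped call onto a single line, which a code formatter can re-wrap afterwards.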

    ("perl pie" (perl -pi -e) can be another good tool for finding and replacing patterns! It's just not one I know.)

    Many migrations might take multiple steps

    When you're migrating code, don't worry about migrating everything at once. If you can break down the problem into a few different commands, those individual commands can be simple to write: you might first replace a function call with a different one and then update import statements to require the new function that you added.

    When you write a regular expression in a find-and-replace, you can sometimes get false positives. Rather than trying to update your regular expression to skip the false positives, I often find it simpler to write a regular expression to replace those false positives with a temporary pattern, update the remaining matches, and then replace the temporary pattern.
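Here is a small sketch of that temporary-pattern trick. The migration itself is made up for illustration: it renames calls to a hypothetical fetch( function to request( while leaving prefetch( alone, and the placeholder __KEEP_PREFETCH__ is assumed not to appear anywhere in the code.

```shell
# Hypothetical migration: fetch( becomes request(, prefetch( is untouched.
# Plain sed is enough here because no GNU-only options are used.
migrate_fetch() {
  sed \
    -e 's|prefetch(|__KEEP_PREFETCH__(|g' \
    -e 's|fetch(|request(|g' \
    -e 's|__KEEP_PREFETCH__(|prefetch(|g'
}
```

The first expression hides the false positives, the second does the real replacement, and the third puts the hidden matches back. The placeholder is uppercase on purpose so the middle expression cannot match it.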

    With all of this, you'll need to rely on git (or another version control system). It's really easy to make mistakes! If you don't have an easy way to undo mistakes, you'll be sad.

    Automate ALL the code migrations!

    Manipulating text & files like this is a skill, and it's one that takes some practice to learn. Even if it's much slower to automate a code change, spending the time to automate it will help you build the skills to automate larger, more complex, and more valuable code migrations. I remember spending over an hour trying to figure out how to automate changing a pattern that was only in 10 spots in our codebase. It would have taken 5 minutes to do manually, but I'm glad I spent 10x the time doing it the slow way with shell tools because that experience made me capable of tackling more complex migrations that wouldn't be feasible to do manually.

      TL;DR: see how your code responds to not being able to reach a dependency

      docker network disconnect <network-id / network-name> <container-id / container-name>
      docker network connect <network-id / network-name> <container-id / container-name>
      

      We recently ran into a production issue where a few of our main API containers were briefly unable to reach our RabbitMQ instance, and this surfaced a latent problem with our reconnection logic. The messages that go into Rabbit are exclusively side effects — things like a push notification when a user sends another user a message. If something briefly interrupts Rabbit connectivity, the API server should still be able to mostly function normally (unlike if it were unable to reach our primary AWS RDS cluster). The API-side Rabbit code was designed to reconnect during a network interruption, buffering messages in memory in the meantime. Our logs showed this process wasn't working as intended — the buffer filled up and we never reconnected to Rabbit.

      After a couple quick fixes didn't resolve the issue, we realized we didn't have a clear picture of what the underlying library node-amqp was doing when it encountered a connection issue. What our investigation found is related to node-amqp specifically, and the Node event emitter model more generally[^1], but the docker network commands we used should be useful for any dockerized service.

      We were working with two different objects, a node-amqp Connection and an associated Channel. When we tried docker killing the Rabbit container, the API-side Rabbit client closed gracefully, which did not match what we saw from our logging. We needed to understand the events emitted by these two objects during an unexpected network problem.

      You can simulate a network partition in any number of ways; after a quick Google search we came across the docker network suite of commands. We took a stab at docker network disconnect and immediately saw the same behavior we saw in production.

      docker network disconnect <network-id / network-name> <container-id / container-name>
      docker network connect <network-id / network-name> <container-id / container-name>
      

      Our specific issue ended up being that the close event on the AMQP connection had an error payload when the connection was not closed cleanly, and no payload when it was. The fix was pretty easy, and we determined it would work as intended by doing a quick docker network connect and watching the debug logging reconnect and flush its buffer of jobs to the Rabbit server.

      We don't yet have an automated test verifying that the reconnection logic works but we plan to soon. This is what's most exciting to me about docker network — automated testing of service behavior in the case of network issues, all inside docker. We want our main API service to respond differently when specific services are unavailable. If a critical dependency like our main AWS RDS cluster is unreachable, we need to shut down, and that's pretty easy to test. Testing nuanced behavior with subcritical dependencies like reconnecting a fixed number of times (or until a fixed-size buffer is full) is trickier, but docker network provides an easy way to do just that!
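As a rough sketch of what such a test harness might look like (this is not our actual test, which does not exist yet), a wrapper can disconnect a container, run a check while the partition is in place, and always reconnect afterwards. The function name with_network_partition and the DOCKER override variable are hypothetical.

```shell
# Hypothetical helper: disconnects a container from a network, runs a
# command while the partition is in place, then reconnects.
# DOCKER can be overridden, e.g. with a stub for a dry run.
with_network_partition() {
  local network="$1" container="$2" rc=0
  shift 2
  "${DOCKER:-docker}" network disconnect "${network}" "${container}" || return 1
  # Run the check; remember its exit status without aborting early.
  "$@" || rc=$?
  # Always reconnect, even when the check failed.
  "${DOCKER:-docker}" network connect "${network}" "${container}"
  return "${rc}"
}
```

A call would take the network name, the container name, and then the command to run during the partition, for example a script that asserts the API still answers requests while Rabbit is unreachable. A second check after the wrapper returns could confirm that the buffer was flushed on reconnect.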

      [^1]: We have a small collection of Go services reading from and writing to RabbitMQ. The error-handling model there is more natural: there's a channel of jobs coming from the guts of the service that need to be sent to the Rabbit server, along with a channel of errors produced by Rabbit. Since jobs and errors both come from the same kind of source, a Go channel, dealing with jobs and errors is as simple as selecting on the two channels.
