Playtesting: maintaining high product quality without QA

One aspect of ClassDojo’s engineering culture that tends to surprise people is that we don’t have dedicated QA engineers or testers. We use a variety of strategies to keep our end-to-end feature development fast, continuous, and high quality. These strategies include a strong culture of writing automated tests, along with regular playtesting sessions.

What do we do instead of QA?

At ClassDojo, we prioritize truly continuous deployment of our backend monolith, which means removing as many manual blocking steps as we can, and we’ve built systems to do so safely. Engineers have end-to-end ownership of features, and we rely heavily on writing automated unit and integration tests to ensure that we’re not pushing code that causes unexpected behavior or errors. Our canary containers help automatically roll back any commits that cause errors, and we monitor our logs carefully.

However, as thorough as we try to be, automated tests and monitoring can’t catch everything. To supplement our automated systems, we hold regular playtesting sessions that double as manual testing and context sharing.

What is playtesting?

Playtesting is simply manually testing the products and features the team has built. You may have also heard of the term “dogfooding,” which refers to actually using your product as a real user. Some folks on the team are able to dogfood as legitimate parent users since their kids’ teachers use ClassDojo. However, since the majority of us are not teachers in a classroom nor parents connected to a real classroom, we do intentional playtesting sessions instead.

How do we run a playtesting session?

Who should join?

Individual teams might playtest new features right before shipping, in which case the whole team should join. We also schedule sessions that are open to anyone at the company so teams can share their recent features and solicit feedback from the wider group. For the best insight and context sharing, we encourage including folks from various functions, such as engineers, designers, PMs, marketers, and customer success agents. Playtesters don’t need to prepare anything ahead of time, or even have any knowledge of the flows. They just need to have the right type of device if it’s a mobile feature that’s only released on one platform.

The process

The first step is scheduling a time for the group of testers to get together and play through certain flows. Usually 30-45 minutes is sufficient, depending on how large or complex the feature is. Ahead of time, the product owner should prepare a set of loose guidelines for how to run through the flows, including any necessary setup steps such as installing an alpha build or turning on certain feature flags.

At the beginning of the session, the facilitator or product owner gives a brief overview of the flows and setup, then all the playtesters simply go through the flows independently on their own devices and accounts. Doing this synchronously instead of async lets us quickly identify whether an issue is widespread or a particular edge case, and answer any questions on the spot. We typically use an Asana board where playtesters can add cards for anything that comes up — not just bug reports but also general product feedback and points of confusion.

We reserve 5-10 minutes at the end of the session to go through the cards and make sure the issues are clear. From there, the product owner and team can prioritize them at their next prioritization meeting.

What are the benefits?

Our playtesting often finds bugs in obscure edge cases not covered by automated tests. With an app and user network as complex as ours, it’s nearly impossible to cover all use cases with test fixtures, so it helps to have a variety of people testing on their own real-world accounts.

Playtesting is one of our best methods for cross-team and cross-functional context sharing. Screenshots, product specs, and demos can only go so far in conveying what a new feature really involves. Having teammates actually get their hands on the features is a great way to share what’s being built. It’s also valuable to get fresh perspectives from folks who perhaps hadn’t explored that product area before. It’s like having an in-house focus group to give feedback.

If you have ideas on how we can improve our playtesting or manual testing strategies, reach out! We’d love to hear from you.

    In the dark times before AsyncLocalStorage, it could be hard to tell why a request would occasionally time out. Were there multiple relatively slow queries somewhere in that route's code? Was another request on the same container saturating a database connection pool? Did another request block the event loop? It was possible to use tracing, profiling, and logs to track down problems like these, but it could be tricky; setting up per route metrics using AsyncLocalStorage makes it a ton easier!

    When ClassDojo set up our AsyncLocalStorage-based per-route instrumentation we found things like:

    • a route that occasionally made 30,000+ database requests because it was fanning out over a large list of items
    • another route that blocked the event-loop for 15-20 seconds a few times a day, and caused timeouts for any other requests that our server was handling at the same time
    • a third route that was occasionally fetching 500,000+ items to render simple counts to return to clients

    I wrote about this a bit more in AsyncLocalStorage Makes the Commons Legible. If you're not familiar with ClassDojo, it's a parent-teacher communication platform. Our monolithic NodeJS web-server backend API normally serves ~10,000 requests per second.

    I'd like to go through some of the details of how we set up this per-request instrumentation. For this post, we'll be starting with a relatively standard NodeJS web-server with pre-router middleware, and a router that finds an appropriate route to handle a request. It should look something like this:

        app.use(({ req, res }, next) => {
          const start = Date.now();
          onFinished(res, () => afterResponse(req, res, start));
          next();
        });
        app.use(rateLimitingMiddleware);
        app.use(bodyParserMiddleware);
        app.use(allOfTheRestOfOurMiddleware);

        app.use(router);
        app.use(notFoundMiddleware);

    To add instrumentation to this setup, we'll want to do the following:

    1. Create a per-request async store in our first middleware
    2. Store details about the database requests caused by the HTTP request in that request's async store
    3. Send the request's database request details to our data lake
    4. If any of the database request details violate our per-request limits, log a server-error so that a team can see it & take action

    Starting our per-request async store

    In a NodeJS web server, each middleware calls the next, so if we start an async local storage context in our very first middleware, every subsequent middleware should have access to the same storage context. (I had a lot of trouble understanding why this worked, so I wrote up a simplified gist that hopefully demonstrates what's going on.)
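
    Here's a minimal, standalone sketch in the spirit of that gist (not our production code) showing how a store created at the top of a chain stays visible to everything downstream, even across await boundaries:

        import { AsyncLocalStorage } from "async_hooks";

        const als = new AsyncLocalStorage<{ requestId: number }>();

        const handler = async () => {
          await Promise.resolve(); // crossing an await boundary keeps the async context
          console.log(als.getStore()); // => { requestId: 123 }
        };

        const firstMiddleware = (next: () => Promise<void>) =>
          // everything called from inside run() (including awaited work) sees this store
          als.run({ requestId: 123 }, next);

        firstMiddleware(handler);

    With that in mind, here's the actual middleware: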

        import { AsyncLocalStorage } from "async_hooks";

        export const requestTrackingAsyncLocalStore = new AsyncLocalStorage();

        // requestTrackingAsyncLocalStoreMiddleware wraps the downstream koa middlewares inside an async local storage context
        export function requestTrackingAsyncLocalStoreMiddleware({ req, res }, next) {
          const store = {
            requestCost: {}, // filled in by increment() calls as the request does work
            req,
            res,
          };
          // running the next middleware in the chain in the context of this 'run' makes sure that all calls
          // to getStore() in the scope of this request are bound to the correct store instance
          return requestTrackingAsyncLocalStore.run(store, next);
        }

        // add this to the app! (this would be in a different file)
        app.use(requestTrackingAsyncLocalStoreMiddleware);
        app.use(rateLimitingMiddleware);
        app.use(....);

    Store details about request behavior in our per-request async store

    Now that we have a per-request async local store, we can grab it and start using it! We'll want to learn:

    1. How many database requests do we make over the course of an HTTP request? Are we running into the N+1 query problem on any of our routes?
    2. How long do those database requests take in total? Requests that take a long time can indicate spots where we're doing a lot of expensive work.
    3. How many documents are these requests returning? If we're processing 10,000s of documents in NodeJS, that can slow down a server quite a bit, and we may want to move that work to our database instead.

        import _ from "lodash";

        export function increment(type: "request_count" | "duration" | "document_count", table: string, n: number = 1) {
          const store = requestTrackingAsyncLocalStore.getStore();
          // we'll probably want to track this to see if we're losing async context over the course of a request
          if (!store) return;
          _.set(store, ["requestCost", type], _.get(store, ["requestCost", type], 0) + n);
          _.set(store, ["requestCost", "byTable", table, type], _.get(store, ["requestCost", "byTable", table, type], 0) + n);
        }

    If we add code that wraps our database client's requests, it should hopefully be easy to add these increment calls at an appropriate point.
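
    The details depend on your database client, but a sketch of the idea (the query helper and the mongo-style usage below are hypothetical, not our real client code) might look like:

        // wrap a query helper so every call reports its cost to the current request's async store
        async function instrumentedQuery<T>(table: string, runQuery: () => Promise<T[]>): Promise<T[]> {
          const start = Date.now();
          const docs = await runQuery();
          increment("request_count", table);
          increment("duration", table, Date.now() - start);
          increment("document_count", table, docs.length);
          return docs;
        }

        // usage (assuming a mongo-style client):
        // const students = await instrumentedQuery("student", () => studentCollection.find({ classId }).toArray());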

    Handle the request report

    Once we have this request report, we can do whatever we'd like with it! At ClassDojo, we log a server-error whenever a route is doing anything particularly egregious: that way, we get quick feedback when we've made a mistake. We also use a firehose to send this data to Redshift (our data lake) so that we can easily query it. Either way, this is something that we can do after we're done sending our response to the client:

        import { AsyncResource } from "async_hooks";

        app.use(requestTrackingAsyncLocalStoreMiddleware);
        app.use(({ req, res }, next) => {
          // this use of new AsyncResource will preserve the async context
          res.on("finish", new AsyncResource("requestTrackingLogging").bind(() => {
            const store = requestTrackingAsyncLocalStore.getStore();
            if (!store) throw new Error(`Something has gone awry with our async tracking!`);
            if (isEgregiouslyBad(store.requestCost)) logOutBadRequest(store);
            requestCostFirehose.write(store);
          }));
          next();
        });
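
    The limits themselves aren't anything special. A hypothetical isEgregiouslyBad (the thresholds below are made up, not our real ones) might look like:

        // made-up thresholds: tune these to whatever "egregious" means for your service
        function isEgregiouslyBad(requestCost: { request_count?: number; duration?: number; document_count?: number }): boolean {
          return (
            (requestCost.request_count ?? 0) > 1_000 ||    // far too many database round-trips
            (requestCost.duration ?? 0) > 10_000 ||        // more than 10 seconds of total database time
            (requestCost.document_count ?? 0) > 100_000    // fetching far too many documents
          );
        }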

    Tracking down places where we lose async context

    While the async store might feel like magic, it's not, and some common situations will cause you to lose async context:

    1. Using callbacks rather than promises. In those situations, you'll need to create an AsyncResource to bind the current async context:

        setTimeout(new AsyncResource("timeout").bind(() => doRequestTrackingThings()), 1);
        redisClient.get("key", new AsyncResource("timeout").bind(() => doRequestTrackingThings()));

    2. Some promise libraries might not support async-hooks. Bluebird does, but requires setting asyncHooks to true: Bluebird.config({ asyncHooks: true });.

    It may take a bit of work to track down and fix all of the places where you're losing async context. Setting up your increment calls to log out details about those situations can help!

        export function increment(type: "request_count" | "duration" | "document_count", table: string, n: number = 1) {
          const store = requestTrackingAsyncLocalStore.getStore();
          if (!store) {
            logServerError(`We lack async context for a call to increment ${type} ${table} by ${n}`, new Error().stack);
            return;
          }
          ...
        }

    Increased Observability is great!

    Putting effort into increasing the observability of a system can make that system much easier to manage. For a NodeJS web-server, we've found a lot of benefits in using AsyncLocalStorage to improve per-request visibility: it has let us improve latency on a few routes, reduced our event-loop blocking, and given us a better view of opportunities to improve performance.

      A single large query in SQL can be hard to understand, test, debug, or change in the same way that an over-large function in code can be. A large query is also much harder to write! Feedback loops while writing large queries are slow and you'll often find yourself needing to guess at where the problem in your query is.

      When I started writing analytics queries, I wrote some pretty rough ones that are now hard to debug and maintain! (Apologies to everyone who has had to deal with any of my old SQL queries.) Over time, I think I've gotten better at them, and I wanted to write down some of the things I'm doing differently now that I hope will be useful to other people in similar situations.

      Finally, I'm a product engineer who rarely writes these sorts of queries! This advice is much less applicable to someone who does this daily & is using better & more focused tools to do analyses.

      Temporary tables, views, and with clauses

      Large queries are hard, but it's pretty simple to break a large query into smaller pieces. For analytics queries, I normally create small temporary tables (often with the temporary keyword) that normalize data, filter out deleted rows and rows I'm not interested in, and organize my data into a format that makes querying easy. Views or with clauses can accomplish similar things, but I like using temporary tables for this because they cache results and make subsequent queries faster.

      I also try to put constants into a temporary table or with clause. When working on a query, it can be easy to forget to update a constant in one spot and then get completely meaningless results. (Shoutout to Ben Haley for showing me this trick!)

      All of this might sound a little abstract: let's take a somewhat contrived query and try to refactor it. We want to bucket and count US-based teachers who were active in 2021 by how many classes they created during that time period. Here's what that might look like as a single query:

          select
            case when class_count < 5 then class_count::varchar else 'many' end as bucket,
            count(*)
          from (
            select count(distinct class.classId) as class_count
            from teacher
            join user_teacher ON teacher.teacherId = user_teacher.teacherid
            -- left join class_teacher to make sure we're counting teachers who haven't created classes
            left join class_teacher on class_teacher.teacherId = user_teacher.teacherId and class_teacher.creator
            left join user USING(userId)
            join class using(classId)
            join (
              select distinct teacherId
              from teacher_active
              where active_date between '2021-01-01' and '2022-01-01'
            ) as ats on teacher.teacherId = ats.teacherId
              and class.createdat between '2021-01-01' and '2022-01-01'
              and not class.autocreated_demo
              and lower(user.country) in ('usa', 'us')
            group by teacherId
          )
          group by 1

      This query isn't particularly complex, but it's still enough logic that I'd be a little worried about changing it or verifying that it's correct. I'd be tempted to try to pull out constants and then separate out the filtering logic from the calculation logic.

          drop table if exists _constant;
          create temporary table _constant as (
            -- "end" is a reserved word, so use explicit *_date column names
            select '2021-01-01' as start_date, '2022-01-01' as end_date
          );

          drop table if exists _teacher;
          create temporary table _teacher as (
            -- us_user is probably overkill: this might be better in the `where` clause!
            with us_user as (
              select userId
              from user
              where lower(country) in ('usa', 'us')
            )
            select distinct teacherId
            from teacher_active
            join user_teacher USING(teacherId)
            join us_user using(userid)
            where active_date between (select start_date from _constant)
              and (select end_date from _constant)
          );

          drop table if exists _class;
          create temporary table _class as (
            select classId
            from class
            where class.createdat between (select start_date from _constant)
              and (select end_date from _constant)
              and not class.autocreated_demo
          );

          drop table if exists _classes_created_by_teacher;
          create temporary table _classes_created_by_teacher as (
            with class_creator as (
              select class_teacher.*
              from class_teacher
              join _class USING(classId)
              where class_teacher.creator
            )
            select teacherId, count(distinct classId) as class_count
            from _teacher
            left join class_creator using(teacherId)
            group by teacherId
          );

          select
            case when class_count < 5 then class_count::varchar else 'many' end as bucket,
            count(*)
          from _classes_created_by_teacher
          group by bucket;

      It's arguable whether this is actually better! The initial query is short enough that it's not that much logic to understand: it might be the right size for the team that you're working with. There are also certainly better ways of factoring this same query that could make the logic even more clear. Overall though, I'd much rather work with the updated query:

      • if something is broken in the query, I can easily and quickly examine the tables that I've created to see if my results match my expectations
      • if I have a testing library of some sort, I can set up simple assertions about what the tables I'm using look like
      • the overall query will run faster because results are cached and because query-planners don't always generate optimum plans for large queries. While working on this query, I'll have faster feedback loops
      • I'll be able to tell which parts of this query are slow and optimize if necessary
      • it's easier to focus on adapting & improving a single part
      • the sub-queries I've turned into tables are pieces I could take advantage of later: if I ever tackle similar problems, they could become nice non-temporary cache tables

      I think many data-focused engineers use Jupyter notebooks and pandas to break down large queries. How you break a large query into smaller pieces matters much less than actually doing that breakdown!

      Make feedback loops FAST!

      One of the most frustrating parts of working on a large query is that feedback loops can be slow. Making a change and waiting tens of minutes can completely kill any programming flow or focus that you have.

      • Break up large queries into smaller ones that are quick to run!
      • Use select without a from to quickly test out behavior. You can run queries like select extract('year' from GETDATE()); or select 1 != null, 1 is not null to quickly check your understanding. This can be especially useful for testing out regular expressions and formatting date fields for charts. There's no need for a from clause or to run a full query.
      • If a full table is slow to query, it might make sense to create a temporary table with representative data that you can use to build up your queries
      • Good indexes or sortkeys can drastically improve query speed! The most common mistake I've made is setting up a compound sortkey with a too-precise timestamp followed by other keys I'm interested in. If you use a second or millisecond precision key at the start of a compound sortkey, that key's precision will override any possible benefit from the other keys. So, rather than creating a sortkey like (createdAtMs, event), it's probably better to instead sort and query on an index like (day, event). (This might mean that queries need to include both day and createdAtMs.)
      • Validate your work as you go! The quicker you realize that your assumptions about a table or column are incorrect, the better. Running a query that checks whether a column is unique, what values a column can hold, or just what data looks like can save a ton of time!

      In general, putting effort into how quickly you get feedback while working makes it much easier to find flow and be effective. A little bit of effort put into setting up nice tables, improving data layout, and optimizing sortkeys can pay large dividends.
