Automating Weekly Business Reviews with DBT Macros

Evan Jones

e-jones

2025-03-19

Background

At ClassDojo, our product teams conduct Weekly Business Reviews (WBRs) to track, analyze, and discuss key metrics in relation to our overall business goals.

One critical metric during the busy back-to-school season is the percentage of new teachers who successfully create a ClassDojo classroom within seven days of signing up. Ensuring a smooth onboarding experience is essential for helping teachers set up their classes and engage students and parents quickly.

Beyond simply tracking the raw number of classrooms created in the current year, we also need additional context to understand trends and impacts:

How does class creation volume compare to the previous year?
What effect did a recently launched product feature have on class creation over the past six weeks?
Are there any sharp spikes or drops in metrics that warrant deeper investigation?

To maintain consistency across teams, we adopted a “6-12” visualization format, displaying data from the previous six weeks and the past 12 months to provide a clear historical perspective.

Now imagine you're on the data team, managing ~20 key metrics that drive weekly product decisions – plus 100 more used for deeper analysis. Now, scale that across all major product teams. ClassDojo as a company is a complex system. It involves hundreds of inputs interacting with each other to drive outputs: business results. How do you build a standardized, scalable, maintainable process to keep track and make sense of all of these inputs and outputs?

Do you manually update a BI tool? Do you precompute every possible metric slice in an OLAP cube? More importantly, how can you enable Data Scientists to define and deploy their own metrics – without relying on dedicated Analytics Engineering support?

Challenge: Scaling metrics generation and visualization

When approaching this challenge as a data team (Data Platform, Analytics Engineering, and Data Science), we aimed to create a scalable, end-to-end solution rather than focusing on a single metric. Our goal was to streamline metric generation across all key business metrics, making the process more efficient and self-serve. Before diving into implementation, we first conducted a discovery phase to understand the “life of a metric” across our data stack – mapping its journey from raw data to final visualization.

Step 1: Understand the life of a ClassDojo metric

How do metrics get generated today?

A core part of the Analytics Engineering mission is to drive product impact by delivering usable, scalable data solutions. Effective metric building requires a deep understanding of both product context and the end-to-end data flow – from upstream data sources to downstream consumption – before writing a single line of SQL.

Product Context = What is the business trying to achieve? What are our goals?

Identifying pain points by auditing the metrics journey

To understand which ~20 key metrics matter most to each team, we start by partnering with stakeholders and asking the right questions. Gaining this context ensures we focus on the most impactful metrics and align data solutions with business needs.

Looking at the ❌ pain points, we saw a clear theme emerge that we didn’t have a consistent, flexible, and scalable way to generate and visualize metrics.

Step 2: Designing for the future

Ideate! What could the tech vision look like for metrics generation?

There are many solutions to building a metrics engine that we explored ranging from DBT’s own semantic layer, Netflix’s DataJunction all the way to doing the modelling in the BI layer like LookML or Metabase models.

In the end, we went with a simple DBT macro approach that (1) accepts a generic SQL statement from an underlying fact table and then (2) creates a wide and tall table that can easily be fed into a visualization template. What follows is a deep dive into that approach.

What does the end-state look like?

Our goal is a standardized, fast-to-query "wide metrics table" that simplifies period-over-period analysis. The backbone of this table is the metrics period (e.g., daily, monthly, MTD, QTD, YTD), with corresponding metric values for each period.

In practice, this table includes 20+ additional metrics columns, providing a comprehensive view of performance trends. Next, we’ll dive into the process of (1) generating time periods and (2) incorporating all key metrics into this structure.

If we’re able to model the above correctly, this would replace many bespoke metrics queries like this one:

SELECT 
    COUNT(DISTINCT CASE 
        WHEN classroom_created_date <= DATE_ADD(joined_date, INTERVAL 7 DAY) 
        THEN teacher_id 
    END) * 100.0 / COUNT(DISTINCT teacher_id) AS percent_created_within_7_days
FROM <fct_teacher_events>
WHERE date_key BETWEEN '2025-01-01' AND '2025-03-31';

… with generic “wide” metrics table queries like this which we can then automate into a visualization tool:

SELECT perc_created_within_7d
FROM <curated_metics_table>
WHERE period_type = "quarter_to_date"

… you could even imagine a metrics cube that has many metric slices like:

SELECT
    period_type,
    metric_name,
    metric_value,
    metric_prev_yoy_value,
    metric_perc_yoy_chg
FROM <curated_metics_cube>
WHERE metric_name = 'perc_created_new_class_within_7d_signup'

Show me the code!

Here’s the technical approach we went with

Start with a high quality fact table as input
Use DBT macro to generate the deep and wide metrics table
Connect our visualization tool

Deep dive on macro generate_wbr_periods()

See this GitHub gist for the basic scaffolding of our approach.

At the very bottom, we see that six CTEs are UNION ALL ’ed to generate the needed “time periods”

-- Union all CTEs at the end
SELECT * FROM daily_stats
UNION ALL
SELECT * FROM weekly_stats
UNION ALL
SELECT * FROM monthly_stats
UNION ALL
SELECT * FROM month_to_date_stats
UNION ALL
SELECT * FROM quarter_to_date_stats
UNION ALL
SELECT * FROM year_to_date_stats

Each of those CTEs has their own unique code for their “time period”. For example, the monthly_stats CTE will have values of:

period_type = 'month' across all rows
period_value is an integer between 1 and 12 to represent month X, e.g. 6 == June
year is an varchar(4) to represent the year, eg. '2024'

Thus, these three columns <period_type, period_value, year> make the unique Primary Key for the output table of this macro.

Setting up the macro parameters

The top of the DBT macro code shows the mandatory and optional arguments. Here are the arguments and a short description.

Calling the macro on an existing fct_ table

To illustrate creating the screenshot earlier, we will call this macro like this:

... other important DBT code here

{{ generate_wbr_periods(
    source_table='fct_teacher_events'
    , date_column='date_key'
    , date_end='2025-02-01'
    , metrics_sql_body="""
   , COUNT(DISTINCT CASE 
        WHEN classroom_created_date <= DATE_ADD(joined_date, INTERVAL 7 DAY) 
        THEN teacher_id 
    END) * 100.0 / COUNT(DISTINCT teacher_id) AS perc_class_created_within7d
    """
) }}

which generates a table like so:

SELECT
    	period_type
    	, period_value
    	, date_key
    	, end_date_key
, "year"

-- metrics_sql_body starts here
, COUNT(DISTINCT CASE 
        WHEN classroom_created_date <= DATE_ADD(joined_date, INTERVAL 7 DAY) 
        THEN teacher_id 
 END) * 100.0 / COUNT(DISTINCT teacher_id) AS perc_class_created_within7d
FROM fct_teacher_events
GROUP BY ...
ORDER BY period_type DESC, "year" DESC, period_value DESC;

Lastly, if one needs to calculate a metric across multiple segments, then more complex DBT Jinja code can be written to employ loops, e.g.

/* Loop through specific platforms */
{% set platforms_array = [
      'ios'
    , 'android'
    , 'web'
    ]
%}

{% set platforms_sql = [] %}
{% for platform in platforms_array %}
    {% set sql_piece %}
        , COUNT(CASE WHEN platform = '{{platform}}' THEN 1 END) AS {{platform}}_count
    {% endset %}
    {% do platforms_sql.append(sql_piece) %}
{% endfor %}

One word of caution when using DBT Jinja macros instead of more out-of-the-box (OOTB) solutions is that over-customization can introduce significant maintenance overhead. While Jinja macros offer flexibility, they can also add complexity – especially if your Analytics Engineering team is new to macros or if the implementation lacks proper documentation and testing.

Extending this macro after we launched to support DS self-serve

After the initial WBR reviews with senior leadership team, we quickly learned that some of our first ~20 key metrics required additional filtering, such as “Those international teachers in countries that began their school year in the first quarter of the calendar”. We could simply add more metrics to the metrics_sql_body section, but we realized it would be more flexible to have all ~20 key metrics filtered, and thus DS could self-serve by configuration of the optional_entity_filter argument in a new DBT Model.

{{ generate_wbr_periods(
    source_table='fct_teacher_events'
    , date_column='date_key'
    , date_end='2025-02-01'
    , metrics_sql_body="""
   , COUNT(DISTINCT CASE 
        WHEN classroom_created_date <= DATE_ADD(joined_date, INTERVAL 7 DAY) 
        THEN teacher_id 
    END) * 100.0 / COUNT(DISTINCT teacher_id) AS perc_class_created_within7d
    """
    , optional_entity_filter="teacher_id IN (SELECT teacher_id FROM analytics.dim_teachers_now WHERE LOWER(country_code) IN ('au', 'za', 'my', 'br', 'co', 'nz', 'sg'))"
) }}

The top-most CTE of the macro would inject this global filter like:

-- Base table
WITH dbt_wide_table AS (
   SELECT *
   FROM fct_teacher_events
   WHERE teacher_id IN (SELECT teacher_id FROM analytics.dim_teachers_now WHERE LOWER(country_code) IN ('au', 'za', 'my', 'br', 'co', 'nz', 'sg'))
)

-- from this point on, all other CTEs are filtered by only these specific teachers

Key takeaways

Tip #1 – Start small.

Standardize on a few critical metrics to start. How do these metrics get generated and reported today? Pick the most important 2 or 3 metrics to start with for a given domain. Don’t overload stakeholders with every metric slice. You can link to deep dive dashboards with filters for exploratory analysis. The aim for these company critical metrics is consistency across teams.

Tip #2 – Map out the lifecycle of a metric

Engaging with partner teams (e.g., Data Science, Product, and Engineering) early on helps uncover common pain points, such as inconsistent definitions, multiple sources of truth, or performance bottlenecks.

Tip #3 – Address the biggest pain points first in your solution

For us, our DBT macro approach helped solve:

✅ Standardized metric generation – A single, reusable macro to generate WBR-ready tables
✅ Automated time-period calculations – Built-in period logic (daily, monthly, quarterly, etc.)
✅ Self-serve analytics for Data Scientists – A structured way to define and onboard new metrics
✅ Consistent reporting look-and-feel – A standard way of presenting metrics for leadership reviews

XKCD's "Is It Worth the Time?" Considered Harmful

Will Keleher

kelwill

2025-01-08

"Is It Worth the Time?"

Is It Worth the Time?

Years ago, I wanted to update a bad pattern in our codebase—updating the order of arguments to a function or something like that. This particular pattern was only in about 10 spots, so it would have only taken a minute to search and fix manually, but instead I spent an hour automating the fix using sed and xargs. And I think that was the right choice.

At the time, I had no clue what I was doing in the shell. sed confused me, shell quoting rules were totally opaque, and xargs was a black box. I remember feeling so stupid because I couldn't figure out how to automate this reoroder; it was the kind of thing that should be simple, but I kept getting stuck.

But I automated that refactor, and I learned a few things! I learned that the default sed on Macs was quite old, that -r is necessary for regular expression mode, how to use capture groups, how to use -i with gsed, the difference between ' and " in the shell, how to use xargs with gsed -i, and probably a few other things. If I'd found better resources, I could have learned those things faster, but being comfortable with shell tools has saved me so much time over the years when I've run into situations that aren't fixable manually.[^1]

Just a few days ago, I wanted to stitch together 15 markdown documents, format them in a nice-ish way, and then print them. The easy way would have been to copy-paste 15 times. It probably would have taken a minute. Instead, I spent a chunk of time writing a python script to do it for me, and I learned a few more things. I somehow hadn't put together that you can just write html in markdown and it just works because it's left alone during the conversion process. That seems glaringly obvious in retrospect, but I'm glad I learned it. I also learned pandoc is great for converting markdown to html, this CSS from killercup works great with pandoc --css pandoc.css --standalone to make the pandoc-converted html prettier, figured out where my Obsidian Vault was stored on disk, and learned the CSS to make nice line-breaks (<div style="page-break-after: always;"></div>) between the 15 markdown documents. Doing the work to automate stitching together these 15 markdown documents was far slower than copy-pasting them, but over the long-term, automating builds compounding skills.

I'll sometimes see people reference XKCD's Is It Worth the Time to argue against automating things that would be faster to do manually, and I think that that mindset is a mistake. Automating the easy things is how you build the skills, mindset, and muscle-memory to automate the hard things. There are situations where the only thing that matters is that a task gets accomplished quickly; in those cases, doing it manually can make sense! But a lot of the time, trying to automate the thing builds skill and capability that will come in handy later.

Aside from building personal capability, reaching for automation is an important part of building an engineering culture that values automation. An engineering culture that values automation is going to find opportunities to reduce toil and speed through projects that another engineering culture might miss. I'd like to have an engineering culture that celebrates the times when the team learned how to automate something, even when automating the thing took much longer than just doing the thing. The accumulation of capability trumps the loss in short-term speed.

So, the next time you want to rewrite the order of arguments in a function that's only used in three spots in a codebase, spend some time figuring out regular expressions instead. The next time you realize you forgot to run a command that's necessary for something, spend an unreasonable amount of effort making it so that the command runs automatically. The next time you have something you need to only do yearly, waste some time building tooling to make it easier for yourself.

[^1]: ag -l doThing | xargs gsed -r 's|doThing$([^,]+), ([^)]+)$|doThing(\2, \1)|g' is a reasonable way of re-ordering a function's arguments (Bash Patterns I Use Weekly describes how this command works). Code editors should have this built in to their find-and-replace, but I prefer doing it on the shell because it allows me to construct more complex commands that aren't possible with a GUI.

18,957 tests in under 6 minutes: ClassDojo's approach to backend testing

Will Keleher

kelwill

2024-12-13

We're pretty proud of our backend test suite. We have a lot of tests, and developers can run the full test suite locally in under six minutes. These aren't simple unit tests—they're tests that hit multiple databases and routes with only a minimal amount of stubbing for external dependencies.

9 years ago, we were proud of running 2,000 tests in 3 minutes. Not much has changed from that initial post—we're still writing a bunch of tests, but we've put a lot of effort over the years into making sure our test suite has stayed acceptably fast for people.

Why do we run our tests this way?

First off though, why are we making things so hard for ourselves? When we write our tests, we don't stub out our databases at all. Many of our tests are resource tests—those tests hit a real running server, the resource code issues real queries against Redis/MySQL/MongoDB/memcached containers, and if it makes any changes to those databases, we need to reset the databases fully before the next test run.

We think that the database is an integral part of the system that we're testing. When you stub a database query, that means that you're not testing the query. And I don't know about you, but I've gotten plenty of database queries wrong.

Similarly, we like to run a full server for any resource level tests. The middleware that runs for each resource matters. We want our tests to match our production environment as much as possible.

How do we make the tests fast?

You'll see recommendations online to limit this style of testing, where you have full databases that you're querying, not because it's worse, but because it ends up being too slow. We've needed to put a lot of work into test speed over the years, and if you take nothing else away from this post it should be that if you treat test speed as an organizational priority, you can make a pretty big impact.

Make sure engineers have fast computers. First off, if we had done nothing else over the past 9 years, our tests would have gotten faster because computers have gotten better over that time period. And we make sure to buy nice computers for engineers on our team because we care that tests and builds are speedy. (The M1 and M2 chips for Macs have been quite nice!)
Use Orbstack rather than Docker Desktop. The single easiest change to speed up our tests was switching from Docker Desktop to Orbstack to run containers locally. On Macs, it is so much faster than Docker Desktop. On some engineers' machines, tests run twice as fast on Orbstack. Aside from speed, we've found that it's been more stable for folks—fewer randomly missing volumes and needing to restart Docker to get things going again. We're huge fans.

That said though, it's worth noting using docker/orbstack will still slow down your tests. If we ran databases directly on our machines rather than through docker, our tests would be faster. But the extra effort to get everyone to install and maintain MySQL, Redis, MongoDB, Memcached, and everything else just isn't worth the test speed increases that it brings for us. Other organizations might have different trade-offs.
Speed up fixture resets. The slowest part of our tests is resetting fixtures. Whenever one of our tests writes to a database, we need to undo those changes before the next test starts. The core trick to doing this quickly is to only undo changes to the tables that actually changed rather than resetting every single table. All of our database operations go through the same code, so it's relatively straightforward to track which tables are "dirty" and then only reset those tables.

A few details:
- We tested out tracking things at the row level rather than the table level for resets, but it didn't improve performance. For MySQL, our basic table resetting strategy is turning off foreign key checks, truncating the table, and then using LOCAL DATA INFILE loads from a volume that's mounted into the MySQL container.
- For MongoDB resets, we found that the fastest technique was creating a shadow collection for every collection that we could restore from whenever we needed to.
- When there's a hard MySQL delete, we don't know whether it might be a cascading delete, so we have code that counts how many rows are in each table. If a table has fewer rows after the test we should reset it. And if it doesn't have fewer rows because data has also been inserted, our regular code will have marked that table as dirty.
- For MySQL updates, we have some (slightly janky) code to pull out the list of tables that might be updated by the update query when it's a query with multiple tables.
Run tests in parallel. The next important piece to fast tests is being able to run the test suite in parallel, which means we need multiple copies of our databases. This was a relatively straightforward task that took a lot of blood, sweat, and tears to actually make happen. We use mocha to run our tests which supports a --parallel option, so our tests look for MOCHA_WORKER_ID in the environment to decide which database to connect to: test_db_${MOCHA_WORKER_ID}.
Measure what's slow. Like any other optimization problem, the first step is measuring how long things actually take. Having guesses about why tests are slow can lead to a ton of wasted effort that doesn't actually move the needle. We haven't done any fancy profiling here—instead, we hook into our existing instrumentation to generate a report of where time is being spent over the course of our tests. It's not perfect, but it gives us a good enough sense of where time is going to be useful.

The future

We're proud of where our tests are, but there's still a ton of space to improve things over the next 9 years. We're going to keep writing lots of tests, so we need to make sure that those tests continue to be speedy. A few things that are on our minds:

Setting up fixture "scenarios." Currently, we always reset our fixtures back to the same base scenario. That base scenario is difficult to change because it can impact a huge number of tests if we're tweaking fixture data that is used in a lot of spots, so it'd be nice to have better support for temporarily setting up a new "base" state that multiple tests can reference.
Retrying the row-level resets. In theory, these should be faster than truncating and restoring the tables, so we want to try out better profiling to make that happen.
Improving our Redis fixture resets. Redis is fast enough that we've been lazy with our Redis fixture resets—we just flush the db and then restore it, so there's room to improve performance
Run our tests with some more detailed profiling to generate a flame-graph to see if there are any hotspots in our code that we could improve. Funnily enough, we've actually sped up our production app a few times when optimizing our tests—things that are slow in testing are often slow in production too!

Older posts