Red and Blue Function Mistakes in JavaScript

Bob Nystrom's What Color is Your Function does an amazing job of describing why it can be painful when programming languages have different rules for calling synchronous and asynchronous functions. Promises and async/await have simplified things in JavaScript, but it's still a language with "red" (async) and "blue" (sync) functions, and I consistently see a few understandable errors from red vs. blue function confusion. Let's go through some of the most common mistakes – none of these are bad things to get wrong, they're just a symptom of how confusing this can be!

Omitting await from try/catch blocks

The most common mistake I see is omitting await from try/catch blocks with async functions. The code looks reasonable, but the catch block will only be able to catch synchronously thrown errors. To make matters worse, error handling logic is often less well tested than the happy path when everything works, which makes this pattern more likely to sneak its way into production code.

async function throwsError () {
  throw new Error("alas! an error");
}

try {
  // missing `await`: throwsError() returns a rejected promise instead of throwing synchronously
  return throwsError();
} catch (err) {
  console.error("Oh no! This catch block isn't catching anything", err);
}

An async function that throws is the equivalent of a Promise.reject, and when written that way, it's a bit clearer what's going on:

try {
  return Promise.reject(new Error("alas! an error"));
} catch (err) {
  console.error("It's clearer that this `catch` can't catch that `Promise.reject`. This is equivalent to the earlier code");
}

Personally, I'm starting to wonder whether using try and catch blocks at all is a mistake when dealing with async code. They take up space and don't offer the same pattern matching that a library like Bluebird can add to catch when you only want to catch some specific known errors: await tryThing().catch(NotFoundErrorClass, handleErrPattern) feels substantially cleaner to me than the equivalent try/catch block.
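
To make that concrete, here's roughly what the two styles look like side by side. This is only a sketch: it assumes the code runs inside an async function, that tryThing returns a Bluebird promise (so the filtered catch is available), and that NotFoundErrorClass and handleErrPattern are stand-ins for whatever error class and handler you care about.

// Plain try/catch: you have to re-throw anything you don't want to handle
try {
  await tryThing();
} catch (err) {
  if (!(err instanceof NotFoundErrorClass)) {
    throw err;
  }
  handleErrPattern(err);
}

// Bluebird's filtered catch: the error class acts as the pattern match
await tryThing().catch(NotFoundErrorClass, handleErrPattern);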

Array.filter(async () => false)

In recent years, JavaScript has added lots of useful Array methods like filter, map, forEach, and flatMap, and JavaScript programmers often use libraries like lodash to write functional code rather than writing for loops. Sadly, none of those Array methods or lodash helpers work with red async functions, which makes them a common source of coding errors.

const things = [true, false, 1, 0, "", new Date("not a date") - 0];
const filteredThings = things.filter(async (thing) => thing);

How many things do we end up with in filteredThings? Surprisingly, the answer has little to do with JavaScript type coercion: filteredThings will be the same size as things. An async function returns a Promise and even a Promise that resolves to false is still a truthy value: Boolean(Promise.resolve(false)) === true. If we want to do any sort of filtering using an async function, we need to switch out of blue sync mode and into red async mode.

(async function () {
  // You should use a library like Bluebird rather than filtering like this! This is only for illustration.
  const things = [true, false, 1, 0, "", new Date("not a date") - 0];
  const predicateValues = await Promise.all(things.map(async (thing) => thing));
  const filteredThings = things.filter((_thing, i) => predicateValues[i]);
})();

When you see Array.filter(async (thing) => thing) written out like that, the mistake is pretty clear. It can be harder to notice in code like const goodThings = things.filter(isGoodThing), where you need to check whether isGoodThing is red or blue.

Array.forEach(async...

We see a similar problem when people use Array.forEach with an async function:

const fruitStatus = {};
["apple", "tomato", "potato"].forEach(async (food) => {
  fruitStatus[food] = await isFruit(food);
});
return fruitStatus;

In some ways, this is a more dangerous pattern. Depending on when you check, fruitStatus may have some, none, or all of the correct isFruit values. If isFruit is normally fast, problems and bugs might not manifest until isFruit slows down. A bug that only shows up some of the time is much harder to debug than one that's always there.
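
One way to avoid the race is to make the asynchrony explicit and wait for every isFruit call to finish before reading the results. This is just a sketch: it assumes isFruit exists as in the snippet above and that the code runs inside an async function.

const foods = ["apple", "tomato", "potato"];
const fruitStatus = {};
// Wait for all of the isFruit lookups to settle before using fruitStatus
await Promise.all(foods.map(async (food) => {
  fruitStatus[food] = await isFruit(food);
}));
return fruitStatus;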

Await off my shoulders

Despite how easy it is to make mistakes with async/await, I still love it – it feels easier to work with than Promises or callbacks. Dealing with asynchronous code is still one of the harder parts of programming in JavaScript, but tools like bluebird, the TypeScript no-unnecessary-condition rule, and the eslint promise plugin can help surface these easy-to-make red/blue function mistakes early. Hopefully, seeing the mistakes we often make will help you avoid some frustrating minutes debugging.
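
As a rough sketch of what that tooling setup can look like, here's an ESLint config that enables the promise plugin's recommended rules plus two type-aware typescript-eslint rules that catch promises used where plain values are expected. It assumes @typescript-eslint and eslint-plugin-promise are installed and that the project has a tsconfig.json.

// .eslintrc.js (sketch)
module.exports = {
  parser: "@typescript-eslint/parser",
  parserOptions: { project: "./tsconfig.json" }, // type-aware rules need type information
  plugins: ["@typescript-eslint", "promise"],
  extends: ["plugin:promise/recommended"],
  rules: {
    // flags conditions that are always truthy, like testing a Promise value directly
    "@typescript-eslint/no-unnecessary-condition": "error",
    // flags promises that are created but never awaited, returned, or handled
    "@typescript-eslint/no-floating-promises": "error",
  },
};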

    Since the dawn of the internet, software engineers have found ways to effectively collaborate remotely, and over the years we've become quite good at it. If you need convincing, take a look at any large-scale open source project, and you'll find that it was likely created by people from different countries, time zones, and languages, all collaborating on a single project and working toward a shared vision.

    The fact that developers can collaborate successfully no matter where they’re located shows that coders don’t need to be tied to a specific office or time zone. Unfortunately, until recently, when the pandemic forced their hands, many companies (especially large companies) tended to frown on remote work. Full-time remote positions did exist, but they were far less common than on-site positions. And even when you found these positions, they were typically only for contract work.

    Enlightened employers now believe that people should be free to work where they’re most productive. For many of us, that's from home, but it could just as easily be from a coffee shop down the street, on a beach, or even in an RV, nestled among trees deep in the forest.

    I’ve always dreamed of being a nomad, of sailing the ocean and seeing the world, but I always found myself stuck in a particular place. Family, school, work, etc. — all were rooted to a specific geographic location that I didn't venture far from. Once I purchased a home, that became my base of operations. Trips away from home had to be planned in advance, and were usually expensive (flight, hotel, rental car, etc.). I ended up filling my home with stuff, as one does, and I found that the more stuff I kept there, the more tied down I felt ... but I yearned to be free.

    After a lot of research, I convinced myself that a software engineer like me could live and work nomadically while raising my three young children as a single parent and not go crazy (or go crazier?). In the span of a few months, I sold my house and all of the stuff that was previously tying me down, purchased an RV, moved in, and hit the road.

    I’ve been living and working full-time in an RV for a year now. Along the way, I upgraded to a larger RV, purchased a tow vehicle, and have learned many things about life on the road and being a digital nomad.

    My home is a Thor Challenger, which is a large motor coach. I tow a Jeep Wrangler, which I use as my daily driver and for going places where the coach can't. I no longer feel tied down, and I can live and work wherever the road takes me — and let me tell you, it can lead to some pretty amazing places.

    My life now is so different from what it was before, it's hard to compare things directly. Inevitably, my lifestyle has presented unique challenges, but they say each challenge is an opportunity in disguise, and the fact that I'm finally living the adventure I always dreamt of makes each challenge seem like a small bump along the journey.

    Staying connected on the road

    One of the first challenges I faced was figuring out how to have reliable internet while on the road and at campsites, regardless of where I was staying. To stay connected, I first need to know where I plan to be during the work week. I usually don't travel much Monday through Friday, which makes planning easier.

    Before I travel anywhere, I check the cell phone signal strength for an area by looking at coverage maps so that I know what internet options will be available to me once I'm there. OpenSignal.com is the best website I have found so far for checking cellular signal strength in different areas.

    If I'm staying somewhere with a strong cell signal, I use my phone hotspot as my primary internet. It's not terribly fast, but it's reliable and unlimited thanks to my plan through Visible. While this plan is affordable (only $50/month), unfortunately it’s for mobile devices only, and can't be used for dedicated hotspot devices.

    If I'm outside of town or I need a fast connection, I use a Winegard ConnecT 2.0 4G2 system as my primary internet connection. The Winegard system sits on top of my RV and pulls in even the faintest 3G or 4G signal thanks to its high-gain antenna. Once a signal is captured, it's rebroadcast via three Wi-Fi routers spread throughout the coach. A hotspot-specific data plan is required to use this device, and many hotspot plans have 100GB data caps, so I use the Winegard system sparingly.

    If I'm staying at a place with Wi-Fi, the Winegard can use that signal as its source instead of cellular. Since Wi-Fi at RV parks and campgrounds is notoriously spotty, having a device that can rebroadcast a weak signal is essential for staying connected.

    Advice from an experienced nomad

    Nomadic life is increasingly popular. Younger generations are drawn to it as an alternative to the expensive housing and rental markets, and older generations are drawn to it as a way to have a life of adventure while living comfortably within their means during retirement. New products and services are becoming available to support digital nomads. For example, satellite internet (Starlink in particular) is becoming more viable as a mobile internet solution, and may replace cellular-based hotspot devices. For more information on the latest in mobile internet, RVMobileInternet.com is the best resource I have found.

    If you’re thinking about life as a digital nomad, I hope that hearing about my solutions to staying connected will help you to do the same, no matter where the road takes you. And as always, feel free to reach out to me if you have any questions. Happy travels!

      DataOps has always been the vision for the ClassDojo team, even if we didn’t think of it using that term. Our goal has been to give every vertically integrated team ownership over its own metrics, tables, and data dictionaries, and the processes for generating those artifacts. These things should be written mostly in code, with some generated documentation.

      But sometimes vision clashes with reality. Our old system grew organically, and every team had a separate set of metrics and queries for which it was responsible. We used some standard technologies to extract and load data from sources like in-house databases and cloud applications into Amazon Redshift, and a Jenkins job then ran the transformation queries in serial. In theory this should have empowered every team to write its own transformation queries and add its own replication sources.

      But because the system had been built piecemeal and ad hoc, it was a massive mess. By early 2021, the number of queries in the Jenkins job had ballooned to more than 200, providing at least seven different measures of user engagement, while configuration management for the upstream pipelines languished with no ownership or improvement. From an engineering perspective, the data platform contained some of the most unpleasant pieces of the ClassDojo codebase to work on. Though individual queries were straightforward and easy to debug, the platform itself had a number of performance issues that were difficult to understand.

      We knew the old system was unsustainable, but we limped along until a catastrophic outage forced us to plan a major redesign. Though we still hadn’t crystallized around the term DataOps, we knew the system we planned had to fulfill the vision of a self-serve platform that empowered engineers and analysts to make changes without hand-holding.

      Thus, a team of interested parties coalesced into a dedicated data team with engineering resources, and we began our months-long journey towards building the ClassDojo DataOps platform.

      Find a Partner

      To build a platform that actually conformed to our vision, we needed to completely redo the foundation. Unfortunately, the foundation was also the part of the system we had the least expertise with.

      We chose a two-pronged approach to solve this problem. First, we redefined some roles. We split the job of data engineer into two roles — a data infrastructure engineer and an analytics engineer. One was in charge of maintaining the data platform, the other was in charge of understanding and fulfilling the business use cases for data.

      Second, we searched for a partner that was aligned with our vision and had the track record needed to build the platform. We found it in Mutt Data, a small team of data experts that specialize in building out DataOps and ML pipelines for larger companies. Though we don’t take advantage of their ML expertise, we have been able to lean on their vast knowledge of how to build data tooling.

      Together we were able to mold our vision into something actionable.

      Define Requirements

      The Mutt team were the ones who introduced us to the term DataOps. Defining what DataOps meant let us create requirements for what our system should be: a platform that includes standard data technologies with proven records of reliability and performance, where the most common use cases should be written in as little code as possible.

      The outcome of our talks was a roadmap with concrete milestones and tasks.

      Pay Down Technical Debt

      First, though, we had to pay down our technical debt. As a general rule, startups lack the luxury of sitting down and planning to build something “right” from the start. Whether it’s the rush to find product-market fit, the race to get something out to market, or just a shrinking runway, it rarely makes sense for a growing, evolving company like ours to plan for a future that may not exist.

      Unfortunately for us, data was a major debt item. The old system grew to meet needs instead of being built with specific requirements, and was developed only to fulfill the bare minimum of enabling reporting. Yet despite its flaws, and as much as we wanted to rid ourselves of the whole mess, it was our only reporting system, and thus had to function even as we rewrote the platform underneath.

      Thus we spent the first few months of the rebuild dealing with the performance of the old system and picking which pieces of old code to migrate.

      Migrate Workflow Management to Airflow

      As part of the migration, we set a goal of moving the transformation pipeline off of Jenkins and onto Airflow. That Amazon had a hosted Airflow service at the time was a huge bonus. While Jenkins is a competent cron runner with a log, Airflow is considered a data engineering standard. It offers a lot of flexibility, and new data hires can quickly become productive in its ecosystem.

      We marked a number of queries for migration from Jenkins while axing some lesser-used ones to free up time for the mission-critical jobs. Most of these queries were pretty straightforward; others needed more attention.

      Build a Data Lake

      Despite the fact that our stabilization work had caused our transformation pipeline to finish in record time, some long-running queries were still taking longer than two hours to execute. We targeted the top 10 longest-running queries and migrated the input and result tables from Redshift into a data lake consisting of Amazon S3, Amazon Athena, and AWS Glue. The results were dramatic. Two-hour runtimes were cut to five minutes.

      We were then able to take advantage of Glue and Amazon Redshift Spectrum to use the data as though it were in native Redshift tables. Though there was a bit of a performance hit, it was good enough for most of our use cases.

      Create Anomaly Detection

      As with most companies, we have a product event stream that’s used to monitor feature usage and general business health. This event stream is the bedrock for all our major KPIs and downstream tables. For such a mission-critical piece of our business, we had shockingly little validation to be confident in its accuracy.

      To validate our event stream, we added anomaly detection monitors to detect breakages in upstream pipelines. These alarms forecast row counts using a FOSS project created by our partners called SoaM (Son of a Mutt). It’s especially useful for Dojo since our event patterns are very seasonal.

      Once we had confidence in our event streams, we were able to move on to augmenting our downstream processes.

      Add dbt

      Dbt is a popular tool for data analysts. It functions like a souped-up version of the data transformation pipeline that we had in Jenkins in that it allows users to write SQL queries without having to worry about the more technical details underneath. This is really useful for our PMs and analysts who don’t (and shouldn’t) write Python.

      But dbt has a lot of additional benefits for power users, like snapshotting and built-in incremental loads. On top of that, engineers get to take advantage of built-in unit tests, and the organization as a whole gets to take advantage of auto-generated documentation.

      Augment with Great Expectations

      Dbt unit tests are great, but we also wanted the option to add more complex validation where we could write simple assertions that are hard to translate into SQL. We got this with Great Expectations, a tool for validating, documenting, and profiling data. We found that we could hang a Great Expectations operator off of a dbt operator and gain both quick unit tests and more complex assertions. We could then upload the validation results to S3 and view them on a monitoring dashboard.

      Migrate to Airbyte

      We briefly touched on our upstream extraction pipelines. The old setup used data pipelines and some home-rolled technologies to replicate data from production databases into Redshift. Though the solutions worked well enough, they had no maintainers and a bit of stigma surrounding them.

      The 2020 project Airbyte has been making a splash in data engineering circles. It promises easy loading between different data sources with a GUI and easy Airflow integration. Since it’s a newer project, we’ve been having some trouble integrating it with our existing technology stack, but the vision of a world where all upstream pipelines would be in the same place using a well-supported technology was too tantalizing to pass up.

      We’ve tested output from Airbyte and are in the process of migrating existing pipelines off of AWS Data Pipeline and onto Airbyte.

      Throw in Some Easy Rollback

      One of our core values here at ClassDojo is that failure recovery is more important than failure prevention. We hold this value to allow us to move fast without fear of failure. This means that building robust disaster recovery mechanisms for all of our major processes is a requirement for our platforms.

      While we needed to build a few extra disaster prevention tools and processes as is natural with a stateful system, we’ve hewn to this value by building CI/CD tools that allow us to delete entire date ranges of data and backfill.

      Tie It All Together

      While most of these technologies and techniques are standard, each needs to be configured and toggled. To make a self-serve platform for both engineers and non-engineers, there needs to be some connective tissue that covers the most important use cases and allows for them to occur with as little code as possible.

      Our final contribution to our DataOps platform was to build a Python layer that would detect and parse a short YAML configuration file and translate it into an Airflow DAG that has input sensors, a dbt transformation process, and optional tests and expectations. If a user doesn’t want to do anything complicated, they never need to write a line of Python.
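
      As a purely hypothetical illustration (the real schema is internal to ClassDojo), a configuration file in that spirit might look something like this:

      # hypothetical example only, not the actual ClassDojo schema
      dag_name: engagement_daily
      schedule: "0 7 * * *"
      inputs:
        - raw_events              # Airflow sensor waits for this upstream table
      dbt:
        models:
          - engagement_daily      # dbt model to run for the transformation step
      expectations:
        - expect_table_row_count_to_be_between:
            min_value: 1000       # optional Great Expectations check on the output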

      Looking Forward

      We’re proud of our new platform, but world-class data infrastructure means nothing if the data it manipulates isn’t leveraged. To make sure that happens, our data infrastructure engineering team hands off responsibility to our analytics engineering team. Their job is to mold our terabytes of raw data into a properly modeled star schema that gives the business a standard set of tables they can draw from for their reporting needs, which in turn aids us in our mission of creating a world-class educational experience that’s also loved by kids.

      There has never been a more exciting time to be a part of the ClassDojo data organization. The problems are challenging, but there’s a clear path forward and plenty of support along the way. If you find the prospect of building the foundation for a business exciting, then join us by checking our jobs page and applying!
