
Posts By: Andrew Burgess

TypeScript String Theory

TypeScript brings power to even the humble string. Let's take a look at several ways we incorporate better string-based types into our code.

String Constant Types

Let's start simple. You already know about the string type, but did you know you could assign a specific string value as a type?

type Notification = {
  type: "email";
  content: string;
}

In our Notification type, the content can be any string value, but the type must always be "email". What's the point? Why would you do this?

For two reasons. The domain-related reason could be that you only support email notifications right now, but you want to add support for SMS later. You can ensure that, for now, your codebase can only have Notifications that are of type email. You might be tempted to leave the type field off entirely if there's only one possible value, but having a type field makes Notification open to extension and closed for modification. It'll be simple to add other notification types later.

The other reason for literal string types is that you can use them to identify the type of an object via a discriminated union. Let's extend our example above to see how this works:

type EmailNotification = {
  type: "email";
  content: string;
  recipientEmailAddress: string;
}

type SmsNotification = {
  type: "sms";
  content: string;
  recipientPhoneNumber: string;
}

type Notification = EmailNotification | SmsNotification;

We have two specific Notification types, and a union type made from all our more specific types.

Now, let's say we need a send function:

function send(n: Notification) {
  // ...
}

This function takes any Notification, but it needs to do something different depending on what type of notification it is. However, since Notification is the union of our email and sms types, we only have access to the type and content fields, since those are the only fields that are shared.

This is where that string literal type comes in handy. It allows us to discriminate between the types in the union:

function send(n: Notification) {
  switch(n.type) {
    case "email":
      return sendEmail(n);
    case "sms":
      return sendSms(n);
    default:
      unreachable(n);
  }
}

function sendEmail(emailNotif: EmailNotification) {}

function sendSms(smsNotif: SmsNotification) {}

function unreachable(x: never): never {
  throw new Error("unreachable!");
}

There are a number of cool things going on here.

First, the differentiation. As we already established, the argument n is of type Notification, so we can't access the recipientEmailAddress or recipientPhoneNumber fields. However, because the type field is a literal string (and not just the type string) for both, TypeScript can use that to narrow the type of n. That is, we can discriminate between the types in our union (Notification) by comparing n.type. This means that inside the case statements, n is now known to be an EmailNotification or SmsNotification, and we can treat it as such.

Secondly, we're using a pattern called exhaustive switch here: we're using TypeScript to guarantee that our switch statement covers all possible values for n.type. Because of the discrimination behavior, TypeScript knows that if we reach the default case, there's no other possible type for n, and so it will be never. We have a little utility function that takes a never and throws an error. This performs double duty. Most obviously, it will throw an error if we ever hit the default case. But even better, we have a "compile time" error: if we add a new type to our union (say, PigeonNotification with type: "pigeon") and we forget to add a case statement for that, then we'll get an error on our call to unreachable:

Argument of type 'PigeonNotification' is not assignable to parameter of type 'never'.

Of course, with language servers running in the editor, this compile time error becomes a "coding time" error, and we get an inline reminder to update the send function.
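As a minimal sketch of the narrowing and the exhaustive default described above: the types mirror the article's, but the union is renamed AppNotification (an assumption of this sketch, to avoid clashing with the DOM's global Notification when compiled as a plain script), and describeDelivery is a hypothetical stand-in for send that returns a string so the narrowed fields are visibly used.

```typescript
type EmailNotification = {
  type: "email";
  content: string;
  recipientEmailAddress: string;
};

type SmsNotification = {
  type: "sms";
  content: string;
  recipientPhoneNumber: string;
};

// Renamed from Notification for this sketch only (see the note above).
type AppNotification = EmailNotification | SmsNotification;

function unreachable(x: never): never {
  throw new Error(`unreachable: ${JSON.stringify(x)}`);
}

// Hypothetical helper: describes where a notification would be delivered.
function describeDelivery(n: AppNotification): string {
  switch (n.type) {
    case "email":
      // n is narrowed to EmailNotification in this branch
      return `email to ${n.recipientEmailAddress}`;
    case "sms":
      // n is narrowed to SmsNotification in this branch
      return `sms to ${n.recipientPhoneNumber}`;
    default:
      // n is never here; adding a new member to the union makes this call a type error
      return unreachable(n);
  }
}

const summary = describeDelivery({
  type: "sms",
  content: "hi",
  recipientPhoneNumber: "555-0100",
});
console.log(summary); // "sms to 555-0100"
```

Accessing n.recipientPhoneNumber inside the "email" branch would not compile, which is the narrowing doing its job.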

String Unions

We can use literal strings to create their own unions as well. For example:

type NotificationType = "email" | "sms" | "pigeon";

This is actually the same as using the square bracket notation on a type to get the type of a field within it:

type NotificationType = Notification["type"];

A string union is a great way to represent a discrete set of values. Types of notifications? Absolutely! How about the possible states of a job that sends notifications? You bet:

type JobState = "enqueued" | "running" | "failed" | "complete";

Or, what about the list of the names of the databases your app connects to? Yep:

type DatabaseName = "users" | "widgets" | "events";

You can do a couple cool things with these string unions.

First, you can create permutations of multiple unions:

type NotificationType = "email" | "sms" | "pigeon";
type JobState = "enqueued" | "running" | "failed" | "complete";

type NotificationJobState = `${NotificationType}${Capitalize<JobState>}`;

Notice that we can use the template literal syntax to create a type. We can use this exactly like we do in regular JavaScript, where we include both literal characters and variables for replacement.

We're also using one of TypeScript's string manipulation types, Capitalize, which capitalizes the first letter of the string. TypeScript offers four such string manipulation types:

  • Uncapitalize
  • Capitalize
  • Uppercase
  • Lowercase

But what exactly is the result of all this? What's the resulting type NotificationJobState? Well, it's another string union, one with all permutations of the two string unions "inside" it. It's the equivalent of this:

type NotificationJobState = "emailEnqueued" | "emailRunning" | "emailFailed" | "emailComplete" | "smsEnqueued" | ... | "pigeonComplete";

Of course, the benefit of creating one type based on another is that all your types will be kept "in sync" — if you add a new notification type or job state, the new values will be part of the permutations.
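To make the combined type concrete, here is a small sketch that builds values of NotificationJobState at runtime. The capitalize and jobStateKey helpers are hypothetical (they are not from the article); capitalize is assumed to be a faithful runtime counterpart of the Capitalize type, which is why it is cast.

```typescript
type NotificationType = "email" | "sms" | "pigeon";
type JobState = "enqueued" | "running" | "failed" | "complete";
type NotificationJobState = `${NotificationType}${Capitalize<JobState>}`;

// Hypothetical runtime counterpart of the Capitalize<S> type.
function capitalize<S extends string>(s: S): Capitalize<S> {
  return (s.charAt(0).toUpperCase() + s.slice(1)) as Capitalize<S>;
}

// Hypothetical helper: combines the two unions into one key.
// The return type only accepts the twelve strings in NotificationJobState.
function jobStateKey(t: NotificationType, s: JobState): NotificationJobState {
  return `${t}${capitalize(s)}`;
}

console.log(jobStateKey("sms", "failed")); // "smsFailed"
```

Because the return type is derived from the two unions, adding a new notification type or job state widens what jobStateKey can return without any change to the function.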

Mapping String Types

We can use string unions to create more complex types using type mapping. Let's create a DatabaseConfigs type based on that DatabaseName string union we have above:

type DatabaseName = "users" | "widgets" | "events";

type DatabaseConfigs = {
  [key in DatabaseName]: {
    host: string;
    port: number;
    user: string;
    pass: string
  }
};

The key in OtherType syntax is the mapping. This means that an instance of DatabaseConfigs needs to have three properties matching strings in DatabaseName.

Mapped types do save you some keystrokes, but they also improve your development experience. Let’s say we have our DatabaseConfigs instance:

const dbConfig: DatabaseConfigs = {
  users: { ... },
  widgets: { ... },
  events: { ... }
}

If we add a new database to our app (say, orders) and we add its name to the DatabaseName string union, it will automatically become part of the DatabaseConfigs type. We’ll immediately get a TypeScript error at our dbConfig object, saying that it’s missing an orders field, and reminding us to add the connection details.
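Here is one way the mapped type might be used end to end. This is a sketch: every connection value is a placeholder, the inner object type is pulled out under the hypothetical name ConnectionDetails, and connectionString is an illustrative helper rather than anything from the article.

```typescript
type DatabaseName = "users" | "widgets" | "events";

// Hypothetical name for the inner object type.
type ConnectionDetails = {
  host: string;
  port: number;
  user: string;
  pass: string;
};

type DatabaseConfigs = {
  [key in DatabaseName]: ConnectionDetails;
};

// Placeholder values; leaving out any of the three keys is a compile-time error.
const dbConfig: DatabaseConfigs = {
  users: { host: "localhost", port: 5432, user: "app", pass: "secret" },
  widgets: { host: "localhost", port: 5433, user: "app", pass: "secret" },
  events: { host: "localhost", port: 5434, user: "app", pass: "secret" },
};

// Hypothetical helper: only valid database names are accepted as input.
function connectionString(name: DatabaseName): string {
  const { host, port, user } = dbConfig[name];
  return `${user}@${host}:${port}`;
}

console.log(connectionString("widgets")); // "app@localhost:5433"
```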

Unions with keyof

There's another way you can create string unions: using the keyof keyword. In conjunction with another type, keyof will create a union from all the keys on that type.

type User = {
  firstName: string;
  lastName: string;
  age: number;
  verified: boolean;
}

type UserField = keyof User;

// equivalent to "firstName" | "lastName" | "age" | "verified"
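A common use of keyof is constraining a generic parameter so that the return type follows the key that was passed in. The getField helper below is a hypothetical illustration, not part of the article:

```typescript
type User = {
  firstName: string;
  lastName: string;
  age: number;
  verified: boolean;
};

// Hypothetical helper: K must be one of User's keys, and the result has that field's type.
function getField<K extends keyof User>(user: User, field: K): User[K] {
  return user[field];
}

const jane: User = { firstName: "Jane", lastName: "Doe", age: 50, verified: true };

const age = getField(jane, "age"); // typed as number
const first = getField(jane, "firstName"); // typed as string
// getField(jane, "email") would not compile: "email" is not a key of User

console.log(age, first);
```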

We can use this with type mapping and the template literal syntax to do some cool and complex stuff:

type User = {
  firstName: string;
  lastName: string;
  age: number;
  verified: boolean;
}

type UserGetter = {
  [Key in keyof User as `get${Capitalize<Key>}`]: () => User[Key];
}

type UserSetter = {
  [Key in keyof User as `set${Capitalize<Key>}`]: (arg: User[Key]) => void;
}

We're putting keyof User inline here, but we could just as easily create an explicit type for it. We're also using the in-as syntax for mapping here, which allows us to transform the key using a template literal. In our case, we're ensuring that our UserGetter and UserSetter types will use conventional casing for their method names. These two types will make it easy for us to ensure that any time we add new fields to our User type, we'll be reminded to add the correct methods (with the correct types!) to anything implementing UserGetter and UserSetter.
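As a sketch of what "anything implementing UserGetter and UserSetter" could look like, the hypothetical makeAccessors factory below returns an object that satisfies both types. If a field were added to User, the return statement would stop compiling until the matching getter and setter were added.

```typescript
type User = {
  firstName: string;
  lastName: string;
  age: number;
  verified: boolean;
};

type UserGetter = {
  [Key in keyof User as `get${Capitalize<Key>}`]: () => User[Key];
};

type UserSetter = {
  [Key in keyof User as `set${Capitalize<Key>}`]: (arg: User[Key]) => void;
};

// Hypothetical factory: the return type demands one getter and one setter per User field.
function makeAccessors(initial: User): UserGetter & UserSetter {
  const data: User = { ...initial }; // copy, so the caller's object is not mutated
  return {
    getFirstName: () => data.firstName,
    getLastName: () => data.lastName,
    getAge: () => data.age,
    getVerified: () => data.verified,
    setFirstName: (arg) => { data.firstName = arg; },
    setLastName: (arg) => { data.lastName = arg; },
    setAge: (arg) => { data.age = arg; },
    setVerified: (arg) => { data.verified = arg; },
  };
}

const original: User = { firstName: "Jane", lastName: "Doe", age: 50, verified: true };
const accessors = makeAccessors(original);
accessors.setAge(51);
console.log(accessors.getAge()); // 51
```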

Read-Only Strings

Let's wrap up with an interesting example of crossing the compile time and runtime boundary. We know that when TypeScript is transpiled to JavaScript, the types are stripped out. Because of this, we sometimes "know better" than the compiler. Check out this example:

type User = {
  firstName: string;
  lastName: string;
  age: number;
  verified: boolean;
}

const u: User = {
  firstName: "Jane",
  lastName: "Doe",
  age: 50,
  verified: true
};

const keys = ["firstName", "lastName"];

keys.forEach(key => {
  u[key] = u[key].toLowerCase();
});

We have an instance of our User type, and we want to iterate over an explicit subset of the keys, so we put them in an array. We can read this code and know that it works fine.

However, TypeScript complains:

Element implicitly has an 'any' type because expression of type 'string' can't be used to index type 'User'.

The problem is that TypeScript considers keys to have the type Array<string>, which is too "wide" to be used to index into our user (u[key]). The array could include strings that aren't keys of User!

You might think that this is the solution, because it limits the array to including only strings that are keys of User:

const keys: Array<keyof User> = ["firstName", "lastName"];

This will solve that problem, but another one pops up:

Property 'toLowerCase' does not exist on type 'string | number | boolean'.

Now we can index into the object with u[key], but we can't know for sure that we're operating on a string, since User includes non-string values.

The cleanest way to do this is using as const:

const keys = ["firstName", "lastName"] as const;

// equivalent to
const keys: readonly ["firstName", "lastName"] = ["firstName", "lastName"];

You’ve likely used const to create a variable with an unchangeable value, but this is different. When you append as const to a value, you’re telling TypeScript that the type of this object exactly matches the object itself; that is, all the fields in the object (or items in the array) are literal values, just like in the example we started with, where Notification’s type field is a string literal type.

In this case, it will give keys the type of a read-only tuple with exactly those two strings inside it.

Because they're string literals, TypeScript can validate that u[key] is always a string. And because keys is constant, or read-only, trying to do something like keys.push("age") or keys[2] = "verified" would result in TypeScript throwing an error.
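Putting the pieces together, the loop from the start of this section might look like the sketch below once as const is applied. The array is named nameKeys here and the NameKey alias is added for illustration; both are assumptions of this sketch.

```typescript
type User = {
  firstName: string;
  lastName: string;
  age: number;
  verified: boolean;
};

const u: User = {
  firstName: "Jane",
  lastName: "Doe",
  age: 50,
  verified: true,
};

// A read-only tuple of two string literals, not Array<string>.
const nameKeys = ["firstName", "lastName"] as const;

// The union of the tuple's element types: "firstName" | "lastName".
type NameKey = (typeof nameKeys)[number];

nameKeys.forEach((key: NameKey) => {
  // Both possible keys point at string fields, so toLowerCase is allowed.
  u[key] = u[key].toLowerCase();
});

console.log(u.firstName, u.lastName); // "jane doe"
```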

One final note: you don’t need to use as const with primitive values declared with const: if no type annotation is given, TypeScript will infer literal types for them.

type NotificationType = "email" | "sms" | "pigeon";
type JobState = "enqueued" | "running" | "failed" | "complete";

type NotificationJobState = `${NotificationType}_${JobState}`;

function update(s: NotificationJobState) {}

const t = "email";
const s = "running";
update(`${t}_${s}`)

This works because the type of t is "email", not string; same with s. If either had the type string, this would cause a type error.
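The inference above depends on how the variable is declared. As a sketch using the same types (update returns its argument here only so there is a value to print, which is a change from the article's empty function), a const declaration infers the literal type, while a let declaration widens to string unless as const is added:

```typescript
type NotificationType = "email" | "sms" | "pigeon";
type JobState = "enqueued" | "running" | "failed" | "complete";
type NotificationJobState = `${NotificationType}_${JobState}`;

// Returns its argument only so this sketch has something to print.
function update(state: NotificationJobState): NotificationJobState {
  return state;
}

const t = "email"; // inferred as the literal type "email"
let s = "running" as const; // a plain `let s = "running"` would be widened to string

const result = update(`${t}_${s}`);
console.log(result); // "email_running"
```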

Conclusion

TypeScript takes the humble string and makes it a powerful tool for well-typed applications. We use all of these techniques to reduce errors and keep our whole codebase flexible as it grows.

As you might know, ClassDojo offers a paid subscription for parents, to give them extra tools for interacting with both their teachers and their kids. For several years, we built and supported a system for managing those subscriptions. That’s non-trivial work: it involves integrations with several app stores and payment providers and wrangling different data models for each.

In the last year, we’ve switched to a third-party solution: RevenueCat. It’s not easy switching payment systems when you’ve got several years of data in the old system, and need to keep it running until the new integration is ready to go.

I recently sat down with Urjit and Sarah, two of the engineers who led this project, and asked them about the challenges they faced and the lessons they learned as they replaced the wheels of our bus without making a pit stop. You can listen to the conversation below:

Listen to Episode 1, Building Payment Systems

Note: we're still in the process of setting up a podcast feed, so stay tuned for that!

Engineering Dojo Podcast

I recently had to change 189 files in our code base, all in almost the same way. Rather than doing it manually, I decided to brush up on my command-line text manipulation ... and ended up taking it further than I expected.

The Mission

The changes were pretty simple. In our API code, we have TypeScript definitions for every endpoint. They look something like this:

interface API {
  "/api/widget/:widgetId": {
    GET: {
      params: {
        widgetId: MongoId;
      };
      response: WidgetResponse;
    }
  }
}

You'll notice the params are defined twice: once in the URL key string (as :widgetId) and again in the GET attribute (under params). We are moving to a TypeScript template literal string parser to get the type information out of the URL key string itself, so I wanted to remove the params key from these definitions. But with 189 files to change, the usual manual approach wasn't so inviting.
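The article doesn't show the template literal parser itself, so the following is only a guess at the general shape of such a type, not the actual implementation. ParamNames, RouteParams, and matchRoute are all hypothetical names, and every param is assumed to be a plain string.

```typescript
// Hypothetical type: collects the ":name" segments of a route into a union of names.
type ParamNames<Path extends string> =
  Path extends `${string}:${infer Param}/${infer Rest}`
    ? Param | ParamNames<`/${Rest}`>
    : Path extends `${string}:${infer Param}`
      ? Param
      : never;

// Hypothetical type: one string-valued property per param name.
type RouteParams<Path extends string> = { [K in ParamNames<Path>]: string };

// Hypothetical runtime counterpart: matches a concrete URL against a route pattern.
function matchRoute<Path extends string>(pattern: Path, url: string): RouteParams<Path> | null {
  const patternParts = pattern.split("/");
  const urlParts = url.split("/");
  if (patternParts.length !== urlParts.length) return null;

  const params: Record<string, string> = {};
  for (let i = 0; i < patternParts.length; i++) {
    const part = patternParts[i];
    if (part.charAt(0) === ":") {
      params[part.slice(1)] = urlParts[i]; // a ":name" segment captures the URL segment
    } else if (part !== urlParts[i]) {
      return null; // a literal segment must match exactly
    }
  }
  // Unchecked cast: the compiler cannot verify that the runtime parsing matches the type.
  return params as unknown as RouteParams<Path>;
}

const match = matchRoute("/api/widget/:widgetId", "/api/widget/abc123");
console.log(match ? match.widgetId : "no match"); // "abc123"
```

With a type like this, the params object no longer has to be written out next to each route, which is the duplication the cleanup removes.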

So, I set myself the challenge of doing it via the command line.

Step 1: Remove the lines

I'll be honest: when I started, this was the only step I had in mind. I needed to do a multi-line find-and-replace to remove params: { ... }. A quick grep showed me that this pattern was unique to the places I wanted to change, although I could have narrowed the search to just our endpoints in src/resources if necessary. For the replacement, I thought sed might be the right tool, but newlines can be challenging to work with, so I ended up learning my first bit of perl to make this work.

Here's what I ended up doing (I've added line breaks for readability):

grep -r --files-with-matches "params: {" ./src | while read file;
  do
    perl -0777 -pi -e 's/ *params: {[^}]*};\n//igs' "$file";
  done

This one-liner uses grep to recursively search my src directory to find all the files that have the pattern I want to remove. Actually, I usually reach for ag (the silver searcher) or ripgrep, but grep is already available pretty much everywhere. Then, we'll loop over the files and use perl to replace that content.

Like I said, this was my first line of perl, but I'm fairly sure it won't be my last. This technique of using perl for find-and-replace logic is called a perl pie. Here's what it does:

  • -0777 means perl will read in the entire file at once, rather than line by line.
  • -p wraps the one-liner in a loop that reads the input, runs the program on it, and prints the result.
  • -i means that perl will change the file in place; if you aren't making this change in a git repo like I am, you can do something like -i.backup and perl will create a copy of the original file, so you aren't making an irreversible change.
  • -e expects an argument that is your one-line program.

Oh, and the program itself:

s/ *params: {[^}]*};\n//igs

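This is typical 's/find/replace/flags' syntax. The pattern matches any leading spaces, the params: { opener, everything up to the next }, and then the closing }; along with its newline; the replacement is empty. The flags are case-insensitive (i), global (g), and single-line (s, where . will also match newlines).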
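To see what the substitution matches, the same pattern can be tried against a sample string. The sketch below ports it to a JavaScript regular expression in TypeScript instead of perl, so it illustrates the pattern rather than the perl command. The braces are escaped, and the s flag is left out because the pattern contains no dot for it to affect. The sample source is the endpoint definition from earlier.

```typescript
const before = [
  "interface API {",
  '  "/api/widget/:widgetId": {',
  "    GET: {",
  "      params: {",
  "        widgetId: MongoId;",
  "      };",
  "      response: WidgetResponse;",
  "    }",
  "  }",
  "}",
  "",
].join("\n");

// Port of: s/ *params: {[^}]*};\n//igs
// [^}]* also matches newlines, so the whole multi-line params block is consumed.
const after = before.replace(/ *params: \{[^}]*\};\n/gi, "");

console.log(after);
```

The printed result is the same interface with the three params lines gone and the response line left in place.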

So, this changed the 189 files, in exactly the way I wanted. At this point, I was feeling great about my change. I reviewed the changes, committed them, and started the git push.

Step 2: Remove unused imports

Not so fast. Our pre-push hooks caught a TypeScript linting issue:

error TS6133: 'MongoId' is declared but its value is never read.

5 import { MongoId } from "our-types";
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ah, yeah, that makes sense. URL parameters are strings, but we have a MongoId type that's a branded string. I forgot about this step, but that's why we have pre-push checks! We'll need to remove those imports.

How can we do this? Well, let's get a list of the files we changed in our most recent commit:

git show --name-only | grep ^src

We add the grep to only find the files within our top-level src directory (and to remove the commit information).

Then, we need to find all the files that include MongoId only once. If a file references MongoId multiple times, then we don't want to remove the import, because clearly we're still using it. If the file only references MongoId once, we can remove the import ... but we have to consider that it might not be the only thing we're importing on that line. For starters, we can use grep's -c flag to count the number of matching lines per file.

for file in $(git show --name-only | grep ^src)
  do
    grep -c MongoId "$file"
  done

A simple for loop works here, because I know the only whitespace is the linebreaks between the file names. Once we have the count, we can check to see that there's only 1 match:

for file in $(git show --name-only | grep ^src)
  do
    if [ $(grep -c MongoId "$file") = 1 ]; then echo "..."; fi
  done

We're using an if statement here, to check that the occurrence count is 1. If it is, we want to do something. But what? Remember, we might be importing multiple things on that line, so that leaves us with three possible actions:

  1. Remove the whole line when MongoId is the only item imported.
  2. Remove MongoId, when it's the first item imported on that line. Don't miss that following comma!
  3. Remove , MongoId when it's not the first item on that line. Don't miss the preceding comma!
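To make the three cases concrete, here is a sketch of the same decision as a single function. It is written in TypeScript rather than shell and is not the approach taken in this post: it splits the import list instead of matching commas, so cases 2 and 3 collapse into one branch. removeNamedImport and the Widget import are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: removes one named import from a single import line.
// Returns null when the whole line should be deleted (case 1).
function removeNamedImport(line: string, name: string): string | null {
  const match = line.match(/^(\s*import\s*\{)([^}]*)(\}.*)$/);
  if (!match) return line; // not a named-import line; leave it untouched

  const names = match[2]
    .split(",")
    .map((n) => n.trim())
    .filter((n) => n !== "");
  const remaining = names.filter((n) => n !== name);

  if (remaining.length === names.length) return line; // name is not imported on this line
  if (remaining.length === 0) return null; // case 1: it was the only import

  // Cases 2 and 3: rebuild the list, so comma placement takes care of itself.
  return `${match[1]} ${remaining.join(", ")} ${match[3]}`;
}

console.log(removeNamedImport('import { MongoId } from "our-types";', "MongoId"));
console.log(removeNamedImport('import { MongoId, Widget } from "our-types";', "MongoId"));
console.log(removeNamedImport('import { Widget, MongoId } from "our-types";', "MongoId"));
```

The first call returns null, and the other two both return the line with only Widget left in the import list.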

There are many ways we could do this, so let's have some fun with reading input from the command line! To be clear, this isn't the best way to do it. We could easily match our three cases above with perl or sed. But we've already used that pattern in this project, and reading input in a shell script is an incredibly useful tool to have in your toolbox.

At this point, we probably want to move this into an actual shell script, instead of running it like a one-off on the command line:

#!/bin/bash

# Walk through every file under src that was touched by the most recent commit.
for file in $(git show --name-only | grep ^src)
do
  # Only act when MongoId appears on exactly one line (the import itself).
  if [ $(grep -c MongoId "$file") = 1 ]
  then
    echo ""
    echo "====================="
    echo "1 - remove whole line"
    echo "2 - remove first import"
    echo "3 - remove other import"
    echo ""
    echo "file: $file"
    echo "line: $(grep MongoId "$file" | grep -v "^//")"
    echo -n "> "

    read choice

    echo "your choice: $choice"

    case "$choice" in
      1)
        # -i '' is the BSD/macOS sed form; GNU sed takes -i with no separate argument
        sed -i '' "/MongoId/d" "$file";
        ;;
      2)
        perl -i -pe "s/MongoId, ?//" "$file";
        ;;
      3)
        perl -i -pe "s/, ?MongoId//" "$file";
        ;;
      *)
        echo "nothing, skipping line"
        ;;
    esac
  fi
done

Don't be intimidated by this; it's mostly echo statements. But we're doing some pretty cool stuff here.

Inside our if statement, we start by echoing some instructions, as well as the file name and the line that we're about to operate on. Then, we read an input from the command line. At this point, the script will pause and wait for us to type some input. Once we hit <enter> the script will resume and assign the value we entered to our choice variable.

Once we have determined our choice, we can do the correct replacement using the bash equivalent of a switch/case statement. For case 1, we're using sed's delete line command d. For cases 2 and 3, we'll use perl instead of sed, because it will operate only on the matched text, and not on the whole line. Finally, the default case will do nothing.

Running this script, we can now walk through the files, one by one, and review each change. It reduces our work to one keystroke per file, which is way less than opening each file, finding the line, and removing the right stuff.

And that's it! While we don't use command-line editing commands every day, keeping these skills sharp will speed up your workflow when the right task comes along.
