From Google Analytics to Matomo Part 1

ClassDojo is committed to keeping the data from our teachers, parents, and students secure and private. We have a few principles that guide how we handle user data:

  • We minimize the information we collect, and limit it to only what is necessary to provide and improve the ClassDojo products. Data we don’t collect is data we don’t have to secure.
  • We limit sharing data with third parties, sharing only what is necessary for them to provide a service to ClassDojo, and making sure that they abide by ClassDojo and legal requirements for the shared data.
  • We delete data that we collect when it is no longer needed to run the ClassDojo product.

For a long time, we used Google Analytics on our main website, https://www.classdojo.com/. While we avoided including it in the ClassDojo signed-in product itself, we used Google Analytics to understand who was coming to our website and what pages they visited. We recently decided to remove Google Analytics from our main website and replace it with a self-hosted instance of Matomo (https://matomo.org/).

Self-hosted Matomo allows us to improve our data policies in a number of ways:

  • We no longer share user activity and browser information with Google directly
  • We no longer use common Google tracking cookies that allow Google to correlate activity on the ClassDojo website with other websites
  • We can customize data retention, ensuring data collected is deleted after it is no longer necessary for our product quality work

But there were some other requirements that we needed to verify before we migrated:

  • Would Matomo work in our infrastructure stack? We use stateless Docker containers orchestrated by HashiCorp Nomad clients.
  • Could we minimize the public surface area Matomo exposes for increased security?
  • Would Matomo scale for our regular usage patterns, and our occasional large spikes?
  • Could we deploy in a way that allows zero-downtime maintenance?

Matomo was able to meet these needs with some configuration, and now we’re collecting millions of actions per day on our main website. We'll publish Part 2 soon where we’ll talk about how we architected Matomo to do this.

    VPCs are a pretty ubiquitous piece of cloud infrastructure for most tech stacks these days. It is important to get the IP address layout of your Virtual Private Cloud (VPC) right from the start. With far-reaching implications for scaling, fault tolerance, and security, a VPC is a great tool to separate the wild west of the web from your precious servers. However, once you have a VPC, you’ll need the ability to access your VPC-protected servers. This is where VPNs come in.

    Our VPN journey

    ClassDojo used the OpenVPN offering for quite a few years, but we are now running Pritunl as our VPN implementation. Our OpenVPN setup had a pretty standard bastion instance setup - a box that sits on the edge of the private and public networks. Bastion instances have one network interface with an IP address in your private VPC and another exposed to the public internet. This lets clients of the VPN connect via the public IP address and then tunnel their way into the private network space - with the proper auth, of course.

    More recently, we’ve moved to Pritunl. Pritunl itself builds upon the OpenVPN protocol but offers a better UI for managing servers and users. Pritunl uses MongoDB to store user registration and configuration details. Our trial run of Pritunl was pretty successful.

    Here's what we learned from our trial run:

    • Installing and maintaining the Pritunl instance was pretty straightforward
    • Their client support is pretty good
    • Their support for Google auth was a big plus
    • This might be a small thing but being able to point users to the Pritunl public website to self-serve download client profiles made onboarding easier for us

    Scalability

    Pritunl supports scalability out of the box: instances communicate with their replicas via MongoDB. Once we decided to move out of the trial phase, we wanted to set up multiple instances so that if one goes down, our employees can continue having a seamless connectivity experience.

    Our recipe for setting up a scalable replica for Pritunl:

    • Create a MongoDB instance using MongoDB Atlas
      • Note: We tried using the AWS-managed MongoDB-compatible offering, but it doesn’t implement some critical features, like tailable cursors and capped collections, that Pritunl requires.
      • Set up VPC peering between MongoDB Atlas and your local VPC if it doesn’t exist
    • If you are migrating from an existing Pritunl setup like us and don’t want your employees to re-create their client accounts, you can use mongodump and mongorestore to move the existing Pritunl data into the new MongoDB instance. Make sure you pause the Pritunl instance during this time to avoid data changes while you’re migrating.
    • Point your Pritunl instance to the new MongoDB Atlas endpoint and restart it.
    • Once Pritunl is up and running, verify that you’re able to connect and use the VPN functionality as before.

    Now, if you want to scale up your Pritunl cluster, you can simply start a new server instance, install Pritunl, and point it to the same MongoDB endpoint. From within the Pritunl interface, remember to create a new host for the new instance and connect it to the same virtual Pritunl server instance.

    If you want to look under the hood and verify that everything is working correctly, you can inspect the detailed configuration from within your Pritunl client and check that multiple public IP addresses show up - one for each of your Pritunl instances. It'll look something like this:

    
    remote <Public IP address 1> <port> udp
    remote <Public IP address 2> <port> udp
    remote-random
    
    

    That’s it - this creates a scalable, replicated VPN solution based on Pritunl.


    TypeScript brings power to even the humble string. Let's take a look at several ways we incorporate better string-based types into our code.

    String Constant Types

    Let's start simple. You already know about the string type, but did you know you could assign a specific string value as a type?

    type Notification = {
        type: "email";
        content: string;
    }
    

    In our Notification type, the content can be any string value, but the type must always be "email". What's the point? Why would you do this?

    For two reasons. The domain-related reason could be that you only support email notifications right now, but you want to add support for SMS later. You can ensure that, for now, your codebase can only have Notifications that are of type email. You might be tempted to leave the type field off entirely if there's only one possible value, but having a type field makes Notification open for extension and closed for modification. It'll be simple to add other notification types later.

    The other reason for literal string types is that you can use them to identify the type of an object via a discriminated union. Let's extend our example above to see how this works:

    type EmailNotification = {
        type: "email";
        content: string;
        recipientEmailAddress: string;
    }
    
    type SmsNotification = {
        type: "sms";
        content: string;
        recipientPhoneNumber: string;
    }
    
    type Notification = EmailNotification | SmsNotification;
    

    We have two specific Notification types, and a union type made from all our more specific types.

    Now, let's say we need a send function:

    function send(n: Notification) {
        // ...
    }
    

    This function takes any Notification, but it needs to do something different depending on what type of notification it is. However, since Notification is the union of our email and sms types, we only have access to the type and content fields, since those are the only fields that are shared.

    This is where that string literal type comes in handy. It allows us to discriminate between the types in the union:

    function send(n: Notification) {
        switch(n.type) {
            case "email":
                return sendEmail(n);
            case "sms":
                return sendSms(n);
            default:
                unreachable(n);
        }
    }
    
    function sendEmail(emailNotif: EmailNotification) {}
    
    function sendSms(smsNotif: SmsNotification) {}
    
    function unreachable(x: never): never {
        throw new Error("unreachable!");
    }
    

    There are a number of cool things going on here.

    First, the differentiation. As we already established, the argument n is of type Notification, so we can't access the recipientEmailAddress or recipientPhoneNumber fields. However, because the type field is a literal string (and not just the type string) for both, TypeScript can use that to narrow the type of n. That is, we can discriminate between the types in our union (Notification) by comparing n.type. This means that inside the case statements, n is now known to be an EmailNotification or SmsNotification, and we can treat it as such.

    Secondly, we're using a pattern called exhaustive switch here — that is, we're using TypeScript to guarantee that our switch statement covers all possible values for n.type. Because of the discrimination behavior, TypeScript knows that if we reach the default case, there's no other possible type for n, and so it will be never. We have a little utility function that takes a never and throws an error. This performs double duty. Most obviously, it will throw an error if we ever hit the default case. But even better, we have a "compile time" error: if we add a new type to our union — say, PigeonNotification with type: "pigeon" — and we forget to add a case statement for that, then we'll get an error on our call to unreachable:

    Argument of type 'PigeonNotification' is not assignable to parameter of type 'never'.
    

    Of course, with language servers running in the editor, this compile time error becomes a "coding time" error, and we get an inline reminder to update the send function.
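    To see this guarantee in action, here's a sketch that extends the union with PigeonNotification (the coopLocation field is invented for illustration, and the union is named AppNotification here to avoid clashing with the DOM's built-in Notification type); with the new case added, the switch is exhaustive again:

```typescript
type EmailNotification = {
    type: "email";
    content: string;
    recipientEmailAddress: string;
};

type SmsNotification = {
    type: "sms";
    content: string;
    recipientPhoneNumber: string;
};

type PigeonNotification = {
    type: "pigeon";
    content: string;
    coopLocation: string; // hypothetical field, for illustration only
};

type AppNotification = EmailNotification | SmsNotification | PigeonNotification;

function unreachable(x: never): never {
    throw new Error("unreachable!");
}

function formatNotification(n: AppNotification): string {
    switch (n.type) {
        case "email":
            return `email to ${n.recipientEmailAddress}`;
        case "sms":
            return `sms to ${n.recipientPhoneNumber}`;
        case "pigeon":
            // Deleting this case turns the unreachable(n) call below into a
            // compile-time error, because n would no longer narrow to never.
            return `pigeon from ${n.coopLocation}`;
        default:
            return unreachable(n);
    }
}
```

    With all three cases handled, a call like formatNotification({ type: "pigeon", content: "hi", coopLocation: "roof" }) narrows correctly and returns "pigeon from roof".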

    String Unions

    We can use literal strings to create their own unions as well. For example:

    type NotificationType = "email" | "sms" | "pigeon";
    

    This is actually the same as using square bracket notation (an indexed access type) on a type to get the type of a field within it:

    type NotificationType = Notification["type"];
    

    A string union is a great way to represent a discrete set of values. Types of notifications? Absolutely! How about the possible states of a job that sends notifications? You bet:

    type JobState = "enqueued" | "running" | "failed" | "complete";
    

    Or, what about the list of the names of the databases your app connects to? Yep:

    type DatabaseName = "users" | "widgets"  | "events";
    

    You can do a couple cool things with these string unions.

    First, you can create permutations of multiple unions:

    type NotificationType = "email" | "sms" | "pigeon";
    type JobState = "enqueued" | "running" | "failed" | "complete";
    
    type NotificationJobState = `${NotificationType}${Capitalize<JobState>}`;
    

    Notice that we can use the template literal syntax to create a type. We can use this exactly like we do in regular JavaScript, where we include both literal characters and variables for replacement.

    We're also using one of TypeScript's string manipulation types, Capitalize, which capitalizes the first letter of the string. TypeScript offers four such string manipulation types:

    • Uncapitalize
    • Capitalize
    • Uppercase
    • Lowercase

    But what exactly is the result of all this? What's the resulting type NotificationJobState? Well, it's another string union, one with all permutations of the two string unions "inside" it. It's the equivalent of this:

    type NotificationJobState = "emailEnqueued" | "emailRunning" | "emailFailed" | "emailComplete" | "smsEnqueued" | ... | "pigeonComplete";
    

    Of course, the benefit of creating one type based on another is that all your types will be kept "in sync" — if you add a new notification type or job state, the new values will be part of the permutations.
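    One way to see the permutations at work is to use the union as a key type (the counts below are made-up sample data); any typo in a key is a compile-time error:

```typescript
type NotificationType = "email" | "sms" | "pigeon";
type JobState = "enqueued" | "running" | "failed" | "complete";

type NotificationJobState = `${NotificationType}${Capitalize<JobState>}`;

// Partial lets us list only the states seen so far; a misspelled key
// like "emailEnqueud" would fail to compile.
const jobCounts: Partial<Record<NotificationJobState, number>> = {
    emailEnqueued: 3,
    smsRunning: 1,
    pigeonComplete: 7,
};
```

    If a fourth notification type is added to the union later, keys built from it become valid here automatically.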

    Mapping String Types

    We can use string unions to create more complex types using type mapping. Let's create a DatabaseConfigs type based on that DatabaseName string union we have above:

    type DatabaseName = "users" | "widgets"  | "events";
    
    type DatabaseConfigs = {
        [key in DatabaseName]: {
            host: string;
            port: number;
            user: string;
            pass: string
        }
    };
    

    The key in OtherType syntax is the mapping. This means that an instance of DatabaseConfigs needs to have three properties, one for each string in DatabaseName.

    Mapped types do save you some keystrokes, but they also improve your development experience. Let’s say we have our DatabaseConfigs instance:

    const dbConfig: DatabaseConfigs = {
        users: { ... },
        widgets: { ... },
        events: { ... }
    }
    

    If we add a new database to our app (say, orders) and we add its name to the DatabaseName string union, it will automatically become part of the DatabaseConfigs type. We’ll immediately get a TypeScript error at our dbConfig object, saying that it’s missing an orders field, and reminding us to add the connection details.
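    For instance, assuming a hypothetical orders database (the hostnames and credentials below are made up), adding the name to the union forces a matching entry in the config object:

```typescript
type DatabaseName = "users" | "widgets" | "events" | "orders";

type DatabaseConfigs = {
    [key in DatabaseName]: {
        host: string;
        port: number;
        user: string;
        pass: string;
    }
};

// Omitting the orders entry here produces a compile error like:
// "Property 'orders' is missing in type { users: ...; widgets: ...; events: ... }"
const dbConfig: DatabaseConfigs = {
    users:   { host: "users.db.internal",   port: 3306, user: "app", pass: "secret" },
    widgets: { host: "widgets.db.internal", port: 3306, user: "app", pass: "secret" },
    events:  { host: "events.db.internal",  port: 3306, user: "app", pass: "secret" },
    orders:  { host: "orders.db.internal",  port: 3306, user: "app", pass: "secret" },
};
```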

    Unions with keyof

    There's another way you can create string unions: using the keyof keyword. In conjunction with another type, keyof will create a union from all the keys on that type.

    type User = {
        firstName: string;
        lastName: string;
        age: number;
        verified: boolean;
    }
    
    type UserField = keyof User;
    
    // equivalent to "firstName" | "lastName" | "age" | "verified"
    

    We can use this with type mapping and the template literal syntax to do some cool and complex stuff:

    type User = {
        firstName: string;
        lastName: string;
        age: number;
        verified: boolean;
    }
    
    type UserGetter = {
        [Key in keyof User as `get${Capitalize<Key>}`]: () => User[Key];
    }
    
    type UserSetter = {
        [Key in keyof User as `set${Capitalize<Key>}`]: (arg: User[Key]) => void;
    }
    

    We're putting keyof User inline here, but we could just as easily create an explicit type for it. We're also using the in-as syntax for mapping here, which allows us to transform the key using a template literal. In our case, we're ensuring that our UserGetter and UserSetter types will use conventional casing for their method names. These two types will make it easy for us to ensure that any time we add new fields to our User type, we'll be reminded to add the correct methods (with the correct types!) to anything implementing UserGetter and UserSetter.
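    As a sketch of how these types guide an implementation (makeUserGetter is a hypothetical factory, not from the original post), any object claiming to be a UserGetter must supply every method with the right name and return type:

```typescript
type User = {
    firstName: string;
    lastName: string;
    age: number;
    verified: boolean;
};

type UserGetter = {
    [Key in keyof User as `get${Capitalize<Key>}`]: () => User[Key];
};

// Hypothetical factory: wraps a User in conventionally named getters.
// Adding a field to User makes this object literal fail to compile
// until the matching getter is added.
function makeUserGetter(user: User): UserGetter {
    return {
        getFirstName: () => user.firstName,
        getLastName: () => user.lastName,
        getAge: () => user.age,
        getVerified: () => user.verified,
    };
}

const getter = makeUserGetter({ firstName: "Jane", lastName: "Doe", age: 50, verified: true });
```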

    Read-Only Strings

    Let's wrap up with an interesting example of crossing the compile time and runtime boundary. We know that when TypeScript is transpiled to JavaScript, the types are stripped out. Because of this, we sometimes "know better" than the compiler. Check out this example:

    type User = {
        firstName: string;
        lastName: string;
        age: number;
        verified: boolean;
    }
    
    const u: User = {
      firstName: "Jane",
      lastName: "Doe",
      age: 50,
      verified: true
    };
    
    const keys = ["firstName", "lastName"];
    
    keys.forEach(key => {
      u[key] = u[key].toLowerCase();
    });
    

    We have an instance of our User type, and we want to iterate over an explicit subset of the keys, so we put them in an array. We can read this code and know that it works fine.

    However, TypeScript complains:

    Element implicitly has an 'any' type because expression of type 'string' can't be used to index type 'User'.
    

    The problem is that TypeScript considers keys to have the type Array<string>, which is too "wide" to be used to index into our user (u[key]). The array could include strings that aren't keys of User!

    You might think that this is the solution, because it limits the array to including only strings that are keys of User:

    const keys: Array<keyof User> = ["firstName", "lastName"];
    

    This will solve that problem, but another one pops up:

    Property 'toLowerCase' does not exist on type 'string | number | boolean'.
    

    Now we can index into the object with u[key], but we can't know for sure that we're operating on a string, since User includes non-string values.

    The cleanest way to do this is using as const:

    const keys = ["firstName", "lastName"] as const;
    
    // equivalent to
    const keys: readonly ["firstName", "lastName"] = ["firstName", "lastName"];
    

    You’ve likely used const to create a variable with an unchangeable value, but this is different. When you append as const to a value, you’re telling TypeScript that the type of this object exactly matches the object itself; that is, all the fields in the object (or items in the array) are literal values, just like in the example we started with, where Notification’s type field is a string literal type.

    In this case, it will give keys the type of a read-only tuple with exactly those two strings inside it.

    Because they're string literals, TypeScript can validate that u[key] is always a string. And because keys is constant, or read-only, trying to do something like keys.push("age") or keys[2] = "verified" would result in TypeScript throwing an error.
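    Putting it together, the original loop compiles cleanly once the keys array is declared with as const (the sample user data is from the earlier example):

```typescript
type User = {
    firstName: string;
    lastName: string;
    age: number;
    verified: boolean;
};

const u: User = {
    firstName: "Jane",
    lastName: "Doe",
    age: 50,
    verified: true,
};

// Typed as readonly ["firstName", "lastName"], so each key indexes a
// known string field of User and u[key] is known to be a string.
const keys = ["firstName", "lastName"] as const;

keys.forEach(key => {
    u[key] = u[key].toLowerCase();
});
```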

    One final note: you don’t need to use as const with individual primitive values: when you declare one with const and no type annotation, TypeScript infers the literal type.

    type NotificationType = "email" | "sms" | "pigeon";
    type JobState = "enqueued" | "running" | "failed" | "complete";
    
    type NotificationJobState = `${NotificationType}_${JobState}`;
    
    function update(s: NotificationJobState) {}
    
    const t = "email";
    const s = "running";
    update(`${t}_${s}`);
    

    This works because the type of t is "email", not string; same with s. If either had the type string, this would cause a type error.

    Conclusion

    TypeScript takes the humble string and makes it a powerful tool for well-typed applications. We use all of these techniques to reduce errors and keep our whole codebase flexible as it grows.
