From Google Analytics to Matomo Part 2

In Part 1, we discussed why we were moving from Google Analytics to Matomo. Now, in Part 2, we'll talk about how we architected Matomo to handle traffic at ClassDojo's scale.

Architecture

We are using the official Matomo Docker image to run the Matomo PHP application. The application serves several functions:

  • An event ingestion endpoint (matomo.php)
  • An administration and reporting interface
  • Periodic jobs
  • Command line tools

While a single container can perform all of these functions, each has different performance and security characteristics, so we decided to separate them and tune the configuration for each.

Event ingestion

First, we run an auto-scaling, multi-container deployment on our Nomad clients. These containers serve matomo.js (a static JavaScript file) and matomo.php (the main PHP endpoint for event ingestion), and we route to these URLs via HAProxy.
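Concretely, the ingestion tier looks roughly like the Nomad job below. This is a minimal sketch rather than our actual spec: the job and service names, counts, datacenter, and image are illustrative, and the scaling policy evaluated by the Nomad Autoscaler is omitted.

    job "matomo-ingest" {
      datacenters = ["dc1"]        # example value
      type        = "service"

      group "ingest" {
        count = 4                  # adjusted up and down by the autoscaler

        scaling {
          enabled = true
          min     = 4
          max     = 20
          # autoscaler policy omitted in this sketch
        }

        network {
          port "http" { to = 80 }
        }

        service {
          name = "matomo-ingest"   # hypothetical service name; HAProxy discovers instances via this
          port = "http"
        }

        task "matomo" {
          driver = "docker"
          config {
            image = "matomo"       # official Matomo image; pin a specific tag in practice
            ports = ["http"]
          }
        }
      }
    }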

These event ingestion containers are publicly reachable, and the image also contains another PHP script, index.php, which is the admin and reporting interface. We do not want to expose that publicly. Matomo can disable this interface by setting maintenance_mode=1 in the configuration file, and we've enabled that for these containers. Additionally, we rewrite all /ma/*.php script requests to /ma/matomo.php, which forces every request to the event ingestion code instead of the admin code.
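Two small pieces of configuration back this up. Both snippets below are sketches rather than our exact files; the /ma/ prefix mirrors the routing described above, and the exact HAProxy directive syntax can vary by version.

    ; config.ini.php override on the ingestion containers
    ; turns off the index.php admin/reporting interface
    [General]
    maintenance_mode = 1

    # haproxy.cfg fragment: force every PHP request under /ma/ to the tracker endpoint
    http-request set-path /ma/matomo.php if { path_reg ^/ma/.+\.php$ }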

Admin and reporting

Next, we run a separate Nomad job for Matomo administration. It is deployed as a single container, and HAProxy routes to it only from our internal network. Unlike the ingestion containers, this one has the admin interface exposed. Matomo configuration lives in a combination of a PHP configuration file (config.ini.php) and a MySQL database. Changes stored in the database are safe to make because they are shared by all running containers. Changes written to the config file are not: they would only land on the admin container. For this reason, we set multi_server_environment=1 in the config file, which prevents changing any setting that would write to config.ini.php; those changes are deployed via Nomad spec changes instead. We also turn off auto updates with enable_auto_update=0, so Matomo instances aren't updating themselves and separately trying to migrate the MySQL database.
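In config.ini.php terms, that amounts to something like the following (a sketch, not our full config file):

    [General]
    ; reject settings changes that would write to this file
    multi_server_environment = 1
    ; upgrades ship as new images and Nomad spec changes, not self-updates
    enable_auto_update = 0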

Periodic jobs

Out of the box, Matomo does everything on the tail end of user-initiated requests. When someone is using the admin site, Matomo might decide it needs to do some log archiving or report building; when an event comes in, Matomo might process queued events at the end of ingesting it. That isn't ideal for us: the admin site could slow down unexpectedly, or tracking could back up and the queue could grow too large. So we disabled this behavior and run the periodic work (queue processing and archiving) as two more Nomad jobs. One job processes the queue of incoming events, and the other archives our event data.
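Browser-triggered archiving can be switched off in config.ini.php. This is a sketch, and the setting names are worth double-checking against your Matomo version:

    [General]
    ; don't build archives as a side effect of admin page views
    enable_browser_archiving_triggering = 0
    ; enforce that even where Matomo would otherwise allow it (e.g. for segments)
    browser_archiving_disabled_enforce = 1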

Queue Processing

By default, Matomo writes event entries directly to the database. At our scale, we want to write to a fast queue and then batch-process the queue into the database. This lets us ride out database failovers and upgrades, and provides slack when there is a spike in traffic. It also lets us run long queries on the admin site without worrying about impacting incoming events. Matomo's QueuedTracking plugin makes event ingestion write to a Redis queue instead, which is fast and reliable and can be processed out of band so that event ingestion can continue while DB maintenance happens.
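As a rough sketch, the plugin can be pointed at Redis through its settings (or the equivalent [QueuedTracking] section in config.ini.php). The hostname below is a placeholder, and the setting names should be verified against the QueuedTracking plugin's documentation:

    [QueuedTracking]
    backend = "redis"
    ; placeholder hostname
    redisHost = "redis.internal.example"
    redisPort = 6379
    queueEnabled = 1
    ; never drain the queue inside a tracking request
    processDuringTrackingRequest = 0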

At first, we ran the queue processing job every minute as a Nomad periodic job. At our scale, we could not process the full queue each minute, and events backed up in the queue throughout the day. This delayed data showing up in Matomo and also put us at risk of running Redis out of memory. We switched from a periodic job to a long-running job that runs multiple queue workers (eight right now), enabled by the numQueueWorkers setting in the QueuedTracking plugin. It's important to both set numQueueWorkers and run the same number of simultaneous queue worker processes.
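Our worker setup is approximately the Nomad job below: one allocation per queue, each running the plugin's queuedtracking:process console command. The job name, datacenter, image, and the use of NOMAD_ALLOC_INDEX as the queue id are illustrative, and the --queue-id flag is worth confirming against the QueuedTracking plugin docs.

    job "matomo-queue-workers" {
      datacenters = ["dc1"]   # example value
      type        = "service"

      group "workers" {
        count = 8             # must match numQueueWorkers in the QueuedTracking settings

        task "process-queue" {
          driver = "docker"
          config {
            image   = "matomo"   # official Matomo image; pin a tag in practice
            command = "php"
            # each allocation drains one of the eight queues (ids 0-7)
            args = [
              "/var/www/html/console",
              "queuedtracking:process",
              "--queue-id=${NOMAD_ALLOC_INDEX}",
            ]
          }
        }
      }
    }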

Archiving

Matomo stores each event individually, but it also builds aggregate reports (today, this week, this month, last month, this year, and so on). To build those reports, Matomo runs an "archive" process, which we run once a day as a Nomad periodic job.
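The archiving job is a straightforward Nomad periodic batch job wrapping Matomo's core:archive console command. The names, schedule, and URL below are examples rather than our production values, and the container still needs the usual config.ini.php with database credentials mounted in.

    job "matomo-archive" {
      datacenters = ["dc1"]   # example value
      type        = "batch"

      periodic {
        cron             = "0 2 * * *"   # once a day, at a quiet hour
        prohibit_overlap = true
      }

      group "archive" {
        task "archive" {
          driver = "docker"
          config {
            image   = "matomo"   # official Matomo image; pin a tag in practice
            command = "php"
            args    = ["/var/www/html/console", "core:archive", "--url=https://matomo.example.com"]
          }
        }
      }
    }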

We are happy with how we designed the Matomo architecture, but it took some time to get the container configuration working. We'll talk about that in Part 3.

    ClassDojo is committed to keeping the data from our teachers, parents, and students secure and private. We have a few principles that guide how we handle user data:

    • We minimize the information we collect, and limit it to only what is necessary to provide and improve the ClassDojo products. Data we don’t collect is data we don’t have to secure.
    • We limit sharing data with third parties, sharing only what is necessary for them to provide a service to ClassDojo, and making sure that they abide by ClassDojo and legal requirements for the shared data.
    • We delete data that we collect when it is no longer needed to run the ClassDojo product.

    For a long time, we used Google Analytics on our main website, https://www.classdojo.com/. While we avoided including it in the signed-in ClassDojo product itself, we used Google Analytics to understand who was coming to our website and which pages they visited. We recently decided to remove Google Analytics from our main website and replace it with a self-hosted instance of Matomo (https://matomo.org/).

    Self-hosted Matomo allows us to improve our data policies in a number of ways:

    • We no longer share user activity and browser information with Google directly
    • We no longer use common Google tracking cookies that allow Google to correlate activity on the ClassDojo website with activity on other websites
    • We can customize data retention, ensuring data collected is deleted after it is no longer necessary for our product quality work

    But there were some other requirements that we needed to verify before we migrated:

    • Would Matomo work in our infrastructure stack? We use stateless Docker containers orchestrated by HashiCorp Nomad
    • Could we minimize the public surface area Matomo exposes for increased security?
    • Would Matomo scale for our regular usage patterns, and our occasional large spikes?
    • Could we deploy it in a way that allows zero-downtime maintenance?

    Matomo was able to meet these needs with some configuration, and we're now collecting millions of actions per day on our main website. We'll publish Part 2 soon, where we'll talk about how we architected Matomo to do this.

      VPCs are a pretty ubiquitous piece of cloud infrastructure for most tech stacks these days, and it is important to get the IP address layout of your Virtual Private Cloud (VPC) right from the start. With far-reaching implications for scaling, fault tolerance, and security, a VPC is a great tool to separate the wild west of the web from your precious servers. However, once you have a VPC, you'll need a way to access the servers it protects. This is where VPNs come in.

      Our VPN journey

      ClassDojo used OpenVPN for quite a few years, but we are now running Pritunl as our VPN implementation. Our OpenVPN setup had a pretty standard bastion instance setup: a box that sits on the edge of the private and public networks. Bastion instances have one network interface with an IP address in your private VPC and one exposed to the public internet. This lets VPN clients connect via the public IP address and then tunnel their way into the private network space, with the proper auth of course.

      More recently, we've moved to Pritunl. Pritunl builds upon the OpenVPN protocol but offers a better UI for managing servers and users, and it uses MongoDB to store user registration and configuration details. Our trial run of Pritunl was pretty successful.

      Here's what we learned from our trial run:

      • Installing and maintaining the Pritunl instance was pretty straightforward
      • Their client support is pretty good
      • Their support for Google auth was a big plus
      • This might be a small thing, but being able to point users to the Pritunl public website to download client profiles themselves made onboarding easier for us

      Scalability

      Pritunl supports scaling out of the box: instances coordinate with each other through MongoDB. Once we decided to move past the trial phase, we wanted multiple instances so that if one goes down, our employees still have a seamless connectivity experience.

      Our recipe for setting up a scalable, replicated Pritunl deployment:

      • Create a MongoDB instance using MongoDB Atlas.
        • Note: we tried AWS's managed MongoDB-compatible service, but it doesn't implement some features Pritunl relies on, such as tailable cursors and capped collections.
        • Set up VPC peering between MongoDB Atlas and your VPC if it doesn't already exist.
      • If you are migrating from an existing Pritunl setup like us and don't want your employees to recreate their client accounts, you can use mongodump and mongorestore to move the existing Pritunl data into the new MongoDB instance (see the sketch after this list). Make sure you pause the Pritunl instance during the copy so no data changes while you're migrating.
      • Point your Pritunl instance to the new MongoDB Atlas endpoint and restart it.
      • Once Pritunl is up and running, verify that you can connect and use the VPN functionality as before.
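      For reference, the migration and re-pointing steps above boil down to something like the following. The URIs, credentials, and dump path are placeholders, so adjust them for your own cluster:

        # dump data from the old Pritunl database and load it into Atlas (placeholder URIs)
        mongodump --uri="mongodb://OLD_PRITUNL_HOST:27017/pritunl" --out=/tmp/pritunl-dump
        mongorestore --uri="mongodb+srv://USER:PASS@YOUR-CLUSTER.mongodb.net" /tmp/pritunl-dump

        # point a Pritunl instance at the new database, then restart the service
        sudo pritunl set-mongodb "mongodb+srv://USER:PASS@YOUR-CLUSTER.mongodb.net/pritunl"
        sudo systemctl restart pritunl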

      Now, if you want to scale up your Pritunl cluster, you can simply start a new server instance, install Pritunl, and point it to the same MongoDB endpoint. From within the Pritunl interface, remember to create a new host for the new instance and attach it to the same virtual Pritunl server.

      If you want to look under the hood and verify that everything is working correctly, you can inspect the detailed configuration from within your Pritunl client and check that multiple public IP addresses show up, one for each of your Pritunl instances. It'll look something like this:

      
      remote <Public IP address 1> <port> udp
      remote <Public IP address 2> <port> udp
      remote-random
      
      

      That’s it - this creates a scalable, replicated VPN solution based on Pritunl.
