In Part 1, we discussed why we were moving from Google Analytics to Matomo. Now, in Part 2, we’ll talk about how we architected Matomo to handle traffic at ClassDojo’s scale.
We use the official Matomo Docker image to run the Matomo PHP application. The application has a number of functions and provides:
- An event ingestion endpoint (matomo.php)
- An administration and reporting interface
- Periodic jobs
- Command line tools
While a single container can perform all of these functions, they each have different performance and security characteristics and we've decided to separate them to take advantage of different configurations for each of them.
The event ingestion containers are publicly reachable, but the image also contains another PHP script, index.php, which serves the admin and reporting interface. We do not want to expose that publicly. Matomo can disable this interface by setting maintenance_mode=1 in the configuration file, and we've turned that on for these containers. Additionally, we rewrite all /ma/*.php script requests to /ma/matomo.php, which forces every request to the event ingestion code instead of the admin code.
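The rewrite rule is easy to get subtly wrong, so here is a minimal Python sketch of the intended behavior. The post does not say which layer performs the rewrite, so this is only a model of the rule, not the actual proxy configuration; the function name and the decision to look at the path without its query string are assumptions made for illustration.

```python
import re

# Matches a PHP script directly under /ma/ (the prefix used in the text above).
_PHP_SCRIPT = re.compile(r"^/ma/[^/?]*\.php$")
TRACKER_PATH = "/ma/matomo.php"


def rewrite_path(path: str) -> str:
    """Send any /ma/*.php request path to the tracker script.

    `path` is the URL path only; the query string is assumed to be
    carried over unchanged by whatever layer applies the rewrite.
    """
    if _PHP_SCRIPT.match(path):
        return TRACKER_PATH
    return path
```

With this rule, a request for /ma/index.php lands on the tracker, while non-PHP assets such as /ma/matomo.js are left alone.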
Admin and reporting
Next, we create a separate Nomad job for Matomo administration. It is deployed as a single container, and HAProxy routes to this container only on our internal network. Unlike the event ingestion containers, this one has the admin interface exposed.
Matomo configuration lives partly in a PHP configuration file (config.ini.php) and partly in a MySQL database. Settings stored in the database are safe to change because every running container reads the same database. Settings written to the config file are not, since the write would only happen on the admin container. For this reason, we set multi_server_environment=1 in the config file, which prevents changing any setting that would write to the config.ini.php file. Instead, these changes need to be deployed via Nomad spec changes. Additionally, we turn auto updates off with enable_auto_update=0, so that Matomo instances aren't updating themselves and trying to separately migrate the MySQL database.
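Because these settings must be identical on every container, it can help to check the rendered config before a deploy. The following is a hedged sketch of such a check, not something the post describes: the function name is hypothetical, it assumes both settings sit in the [General] section of config.ini.php, and it relies on Python's configparser being tolerant enough for Matomo's INI dialect.

```python
import configparser

# Settings named in the text; assumed to belong to the [General] section.
REQUIRED_GENERAL = {
    "multi_server_environment": "1",
    "enable_auto_update": "0",
}


def missing_settings(config_text: str) -> dict:
    """Return required settings that are absent or set to another value."""
    # strict=False tolerates repeated keys such as Plugins[];
    # interpolation=None keeps literal % characters from raising.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(config_text)
    general = parser["General"] if parser.has_section("General") else {}
    return {
        key: expected
        for key, expected in REQUIRED_GENERAL.items()
        if general.get(key, "").strip('"') != expected
    }
```

An empty result means both settings are present with the expected values; anything else lists what still needs to go into the Nomad spec.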
Out of the box, Matomo does everything on the tail end of user-initiated requests. This means that when a user is using the admin site, Matomo might decide that it needs to do some log archiving or report building. Or, if there are events to process in a queue, Matomo might process them at the end of ingesting an event. This isn't ideal for us, as it could create performance problems: an admin site that slows down unexpectedly, or tracking that backs up until the queue grows too large. So we have disabled these periodic tasks (archiving and queue processing) and run them separately as two more Nomad periodic jobs. One job processes the queue of incoming events, and the second archives our event databases.
By default, Matomo writes event entries directly to the database, but at our scale, we want to write to a fast queue and then batch process the queue into the database. This lets us handle database failovers and upgrades, and it also provides slack for when there is a spike in traffic. It also lets us run long queries on the admin site without worrying about impacting the incoming events. Matomo provides a QueuedTracking plugin that moves event ingestion to write to a Redis queue. This is fast and reliable and can be processed out of band so that event ingestion can continue while DB maintenance happens.
At first, we ran the queue processing job every minute as a Nomad periodic job. At our scale, we were not able to process the full queue within each minute, and events were backing up in the queue throughout the day. This caused delays in data showing up in Matomo, and we were also running out of memory in Redis.
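The backlog problem is simple arithmetic: whenever a run drains fewer events than arrive between runs, the difference accumulates. The sketch below models that with made-up numbers; the function name and the rates are purely illustrative and are not ClassDojo's actual traffic figures.

```python
def backlog_after(minutes: int, arrivals_per_min: int, drained_per_run: int) -> int:
    """Queue length after `minutes` one-minute cycles, starting from empty.

    Each cycle, `arrivals_per_min` events are enqueued and one run removes
    at most `drained_per_run` of them.
    """
    backlog = 0
    for _ in range(minutes):
        backlog += arrivals_per_min
        backlog -= min(backlog, drained_per_run)
    return backlog
```

For example, with a hypothetical 1,000 events arriving per minute and a run that drains 800, the queue grows by 200 events per minute and never recovers until traffic drops, which is also why Redis memory keeps climbing.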
We changed from a periodic job to a long-running job that runs multiple queue workers (8 right now) by setting the numQueueWorkers setting in the QueuedTracking plugin. It’s important to remember both to set numQueueWorkers and to create the same number of simultaneous queue worker jobs.
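One way to keep the two numbers in step is to derive the worker processes from a single constant. The sketch below is an assumption about how that could look, not the actual Nomad job: it assumes workers are started with Matomo's console command queuedtracking:process, that each worker is pinned to a queue with a --queue-id option, and that queue IDs start at zero. Check all three against the QueuedTracking plugin documentation before relying on them.

```python
import subprocess
from typing import List

# Must match numQueueWorkers in the QueuedTracking plugin settings.
NUM_QUEUE_WORKERS = 8


def worker_commands(num_workers: int, console: str = "./console") -> List[List[str]]:
    """Build one queue-processing command per worker (queue IDs assumed 0-based)."""
    return [
        [console, "queuedtracking:process", f"--queue-id={queue_id}"]
        for queue_id in range(num_workers)
    ]


def run_workers(num_workers: int = NUM_QUEUE_WORKERS) -> List[int]:
    """Start all workers in parallel and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in worker_commands(num_workers)]
    return [proc.wait() for proc in procs]
```

Calling run_workers() from inside the Matomo directory would start eight workers at once; a supervising job would then need to restart them when they exit.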
Matomo stores each event individually, but also contains aggregate reports (today, this week, this month, last month, this year, etc). To build those reports, Matomo runs an "archive" process. This job runs once a day as a Nomad periodic job.
We are happy with how we designed the Matomo architecture, but it took some time to get container configuration working. We’ll talk about this in Part 3.
ClassDojo's CTO, Dom, enjoys refactoring, continuous delivery, and long walks on the beach with potential engineering candidates. He previously co-founded Wikispaces.