Node.js `Array.includes` is faster than you'd expect

I've done a few Advent of Code problems, and one thing that's occasionally tripped me up has been my erroneous assumption that accessing the key of an object is fast. It's a fast constant time operation, but whatever hashing node is doing internally to find a key can sometimes be a performance hot-spot. There's a massive difference in speed between Array[index] and Object[key], and that made me start to wonder whether Object[key] was slow enough that there were situations where it'd make more sense to use Array.includes instead.

When I benchmarked it, I was pretty astounded to see that on a sorted array, Array.includes is faster on arrays of 10,000 strings than Object[key]. Internally, I'm guessing Array.includes is using binary-search when it knows that the array is sorted, which is a cool optimization! Here's the gist if you want to test it out. (I'm running these tests on Node 14 on a Mac Mini)

What about unsorted arrays?

When the array is unsorted, Object[key] look-ups start beating Array.includes pretty quickly: once we have ~20 documents, we'd expect to see faster look-ups from keyed object access. This is much more in line with what I expected when setting up this test.

What about Set.has?

Set.has is consistently slower than Object[key]. It can handle more cases than Object[key] can, but from a speed perspective, it's not in the running. I still often use it over Object[key] because I think it does a better job of expressing the intent to someone who's reading the code, and it's still a fast-enough constant time look-up.

What does this all mean?

Absolutely nothing! I still use Set.has for the majority of situations where I need to check membership in a list. For most of the code I write, expressing clear intent is far more important than optimizing performance, and I have yet to profile non-Advent-of-Code performance and find Object[key] or Set.has being a hotspot.

Benchmark Results

110 object access x 25,715,807 ops/sec ±0.70% (94 runs sampled)
2 10 set x 18,422,013 ops/sec ±0.85% (96 runs sampled)
3 10 array x 28,105,533 ops/sec ±0.88% (89 runs sampled)
4Fastest for ordered 10 is array
5 40 object access x 20,175,473 ops/sec ±0.90% (92 runs sampled)
6 40 set x 19,632,409 ops/sec ±0.73% (92 runs sampled)
7 40 array x 27,999,865 ops/sec ±1.09% (94 runs sampled)
8Fastest for ordered 40 is array
9 100 object access x 21,842,706 ops/sec ±0.70% (95 runs sampled)
10 100 set x 18,283,045 ops/sec ±0.78% (95 runs sampled)
11 100 array x 27,860,691 ops/sec ±1.03% (94 runs sampled)
12Fastest for ordered 100 is array
13 1000 object access x 20,026,184 ops/sec ±0.82% (92 runs sampled)
14 1000 set x 14,656,626 ops/sec ±1.37% (89 runs sampled)
15 1000 array x 23,392,774 ops/sec ±1.31% (90 runs sampled)
16Fastest for ordered 1000 is array
17 10000 object access x 14,367,754 ops/sec ±0.58% (93 runs sampled)
18 10000 set x 12,213,161 ops/sec ±0.51% (95 runs sampled)
19 10000 array x 28,067,504 ops/sec ±0.94% (92 runs sampled)
20Fastest for ordered 10000 is array
21 100000 object access x 9,106,145 ops/sec ±1.55% (93 runs sampled)
22 100000 set x 7,918,164 ops/sec ±0.92% (91 runs sampled)
23 100000 array x 78,374 ops/sec ±58.42% (86 runs sampled)
24Fastest for ordered 100000 is object access
25 1000000 object access x 3,296,346 ops/sec ±1.05% (93 runs sampled)
26 1000000 set x 2,076,943 ops/sec ±1.24% (91 runs sampled)
27 1000000 array x 490 ops/sec ±3.50% (40 runs sampled)
28Fastest for ordered 1000000 is object access
29 10 object access x 24,869,178 ops/sec ±0.80% (91 runs sampled)
30 10 set x 19,181,356 ops/sec ±0.98% (92 runs sampled)
31 10 array x 27,691,570 ops/sec ±0.99% (90 runs sampled)
32Fastest for random 10 is array
33 40 object access x 27,450,529 ops/sec ±1.30% (86 runs sampled)
34 40 set x 12,483,804 ops/sec ±1.30% (91 runs sampled)
35 40 array x 16,590,091 ops/sec ±0.92% (92 runs sampled)
36Fastest for random 40 is object access
37 100 object access x 31,040,999 ops/sec ±1.24% (87 runs sampled)
38 100 set x 12,386,070 ops/sec ±1.14% (91 runs sampled)
39 100 array x 16,389,384 ops/sec ±1.22% (91 runs sampled)
40Fastest for random 100 is object access
41 1000 object access x 25,365,248 ops/sec ±1.52% (88 runs sampled)
42 1000 set x 11,275,049 ops/sec ±1.51% (91 runs sampled)
43 1000 array x 16,294,268 ops/sec ±1.23% (88 runs sampled)
44Fastest for random 1000 is object access
45 10000 object access x 16,907,288 ops/sec ±1.47% (89 runs sampled)
46 10000 set x 8,905,038 ops/sec ±1.47% (91 runs sampled)
47 10000 array x 14,777,156 ops/sec ±1.19% (91 runs sampled)
48Fastest for random 10000 is object access

Software developers need to prioritize bugs when they are discovered, and most do this with an ordered priority scheme. Tracking tools like Jira provide a column on each ticket, and many organizations use numbers, such as P1, P2, P3, P4, and P5. Some even adopt a P0 as an indicator that there are some bugs that have an even higher priority than P1. Often there will be criteria for each of them, for example “all issues preventing users from logging in are P1.”

These systems are good at ranking issues against each other, but don’t on their own communicate what to do with the bugs themselves. When should they get worked on? How important are they to my current work? Just like the criteria for getting assigned a priority, companies define the response to the priority: “Drop everything for a P1” vs “We won’t fix a P5.”

At ClassDojo, we’ve found a few problems with systems like this:

  • Despite defined criteria, there’s lots of debate around the categorization: “Is this really a P3 or more of a P4?”
  • Despite defined criteria for work, there’s lots of debate around prioritization: “It’s a P1, but let’s finish the current set of work before we fix it, instead of doing it now.”
  • There’s a never-ending backlog of items that someone periodically goes through and deletes because either because they’ve grown stale, or because we don’t know if they are still valid, or because we decide they aren't important enough to work on.

So how did we address this problem? We came up with a bug categorization system called P-NOW, P-NEXT, P-NEVER. Here’s how it works:

  • P-NOW represents something that is urgent to fix. These are critical issues that are impacting huge numbers of users, or critical features. The whole application being down is obviously a P-NOW, but so is something like not being able to log in on Android, or a critical security vulnerability, or a data privacy problem
  • P-NEXT represents all other bugs that we agree to fix. If this is something that has a real impact on people, it’s a P-NEXT
  • P-NEVER is everything we won’t fix. We’re honest about it up front, there’s no need to pretend we’ll get to the P5s when we’re done with the P4s, because that’s not going to happen before the bug report itself is invalid.

So with those criteria, how do we prioritize this work? It’s right in the name. P-NOW means stop everything and work on this bug. It means wake the team up in the middle of the night, get people in on the weekend, keep asking for help until someone is able to fix it.

It also means once it is fixed we have a postmortem, and as part of the postmortem we find a way to make sure a bug or outage like this never happens again. Nobody likes being woken up for work, and it’s inhumane to keep expecting people on call to do this work. The results of this postmortem are all categorized as P-NEXT issues.

P-NEXT issues are worked on as soon as we’re done with our current task. They go at the top of the prioritized queue, and whenever we’re done with our work, we take on the oldest P-NEXT. In this way we work through all of the bugs that we intend to fix now. This is an extension of our definition of done. The work that we released previously had a defect, so we need to fix that defect before we move on to new work.

Lots of people will be screaming about how you’ll never get anything new done, or how non-pragmatic this is. Remember, P-NEXT doesn’t mean we fix every bug. We are just going to take the ones we don’t intend to fix and categorize them as a P-NEVER.

When do we work on P-NEVER issues? Never! Well, not exactly. We don’t put them on a board, and we don’t track them in a backlog. We don’t want to maintain an inventory of things we don’t intend to work on. That inventory takes away from our focus on higher priorities and it requires someone to periodically look through the list.

But, if a one-off issue starts to pop up again then we rethink our decision. This is fine, and with our P-NEXT policy of cutting to the top of the line, it results in these bugs being fixed earlier than they would have otherwise.

But maybe people are still screaming that they’ve correctly categorized their P-NEXT issues and there are still too many to get anything done. This system is a great way to drive quality improvements in your code and process. When you prioritize all the bugs you are going to fix first, and then work to fix them, you’re working towards a zero-defect policy.

Does this sound like a better way to prioritize? Come join us in ClassDojo Engineering, we’re hiring!

You may have read our post from a few years ago implementing a rolling-window rate limiter, where we talked about the implementation of our sliding log rate limiter. That approach has been working well since then, but with increases in traffic, we recently found ourselves rebuilding the rate limiter to improve performance. With some relatively small changes, we were able to improve our redis CPU usage by 40x!

Our requirements haven't really changed too much since that last post. We still want a rolling window. And we still need a distributed solution, which means we're sticking with redis. The final requirement from last time was having a minimum distance between requests; turns out, this hasn't been important for us. However, if that's something you need, do stay tuned until the end!

The Trouble with Sets

The challenge with our rate limiting approach (you can find the details in our previous post) was mainly caused by the expensive set operations that we were doing in redis: zrange and zadd have a time complexity of O(log(n)), which isn't bad, but zremrangebyscore is O(log(N) + M)! This operation removes M number of items from a sorted set, and in our case, we were using it to remove all request records from greater than one interval ago. So, as a user's request rate gets higher, M will get higher as well ... and instead of protecting us, the rate limiter could actually assist a single IP address in executing a successful DoS attack! As it was, our monitoring alerted us before that, but it still resulted in requests queuing up and timing out in our API and upstream in HAProxy.

The nice thing about the sorted-set-based rate limiting is that it has very "high resolution" rate limiting: it's actually tracking request timestamps, and so we can know exactly how many requests have happened in the given time interval. This means that we can know that, say, it's been exactly one interval since your last request, and let your new request go through. However, due to these performance issues, we decided to go with an approach that optimizes performance, while accepting a lower resolution result.

Sliding Windows

The solution we've switched to is called a sliding window, or sliding counter, algorithm. Instead of storing a sorted set in redis with a rateId key, we're storing simple integers in redis with a rateId-interval key.

The rateId is some way to identify or bucket the actions we're trying to limit. If it's a global rate limit, that might be the client IP address. If it's resource specific, it might be both the client IP address and the path, like for example.

In our new approach, here's what happens when a request occurs:

  1. Based on the request, we can determine two keys: for the previous interval and for the current interval.
  2. We GET the from redis (if it does not exist, we will default to 0).
  3. We INCR the from redis (if it does not exist, redis will default to 0, then increment it, and then return 1).
  4. We sum these two counts together, multiplying the previous count by a weight. For example, if we're 42% of the way through the current interval, we'll multiply the previous count by 100% - 42% = 58%, or 0.58. The result is the estimate of the number of actions taken by that user in the last interval.

Essentially, we're using the previous and current interval counts to create a "synthetic" interval. And you can see how this is lower-resolution: we don't know the exact time of past requests, only the interval. This means that a user may have to wait up to 2 whole intervals to be able to max out in new requests.

For example, let's say your limit is 100 requests per 1 minute. And let's say a user made 100 requests in the first 0.25 minutes of an interval, then stopped, and didn't make another request for 1 minute. At 1.25 minutes (0.25 minutes into the second interval) our previous interval count will be 100, and our current interval count will be 1 (because we just incremented it for the current request). So our "number of actions taken in the last interval" will be 1 + (0.75*100) or 76. We would only allow the user to make 24 more requests (and the current one would succeed, so 25 in total).

If the user waited until the 1.75 minutes, the math would be 1 + (0.25*100) and we would allow 74 (plus the 1 current) requests.

1T0 T1 T2
3 | | |
4 100 reqs 25 reqs allowed 75 reqs allowed

Of course, this low resolution can work against us as well. Let's say those initial 100 requests had come in right at the end of the first interval, say, at 0.99 minutes. At 1.25 minutes (only ~16 seconds later), we'd still allow 25 requests.

Is that acceptable? Well, depends on your infrastructure and application. But, there is a way to tweak this.

Tweaking the Resolution

Notice that this produces the lowest possible resolution for a sliding window approach, because we track one counter per interval; this is why our practical interval is twice as long as we actually configure.

We could improve the resolution here by tracking multiple counters per interval. For example, If your limit is 100 requests per 1 minute, you could track intervals of 30 seconds in redis. Then, for a given request, you'll get the current interval (n) and the previous two intervals (n-1 and n-2), and weight the n-2 count in the same way. Try this math with the first example above, and you'll see that at 1.25 minutes, we would allow 50 requests instead of just 25.

Or, in our second example, if those 100 request came in at 0.99 minutes, we would still be blocking all requests at 1.25 minutes.

Basically, the smaller your interval counter, the higher the resolution of your result. It'd be pretty simple to parameterize that counter size to allow custom resolutions per endpoint, client, and so on.

What about minimum distance?

One feature that our previous rate limiter supported—which we dropped here—is enforcing a minimum amount of time between two individual requests from the same client. Since we're no longer tracking timestamps for individual requests, we can't really support that.

Or can we?

Actually, it's pretty simple: just add a second limit! For example, a given endpoint might have limits of 100 requests per minute, and 2 requests per second. By configuring both of these, we can ensure a particular cadence of requests from a given client.

Performance Wins

The performance improvement that we've seen in our redis usage has been pretty amazing. In our staging environment tests (using jmeter), we consistently see ~1% Engine CPU Utilization with the new rate limiter, where the old one sees ~40% with the same load.

It's not hard to see why. GET and INCR are O(1) operations in redis, and will remain fast no matter how much traffic we're seeing.

Performance improvements almost always come with a compromise: sometimes, that's increased complexity or a reduced feature set. In this case, we were willing to sacrifice some accuracy in order to keep our response times snappy, even under load.

Older posts