Fastest Node.js Fizzbuzz
I recently encountered an interesting thread on CodeGolf that asked for the fastest implementation of everybody's favorite interview question, FizzBuzz. Unsurprisingly, the fastest solutions were written in Assembly and C, carefully optimizing their use of cache lines and syscalls.
In that thread, there were also some people who were curious about how fast other languages can be when pushed to their limits. Surprisingly, nobody had attempted it in JavaScript yet, at least not with a valid solution.
Testing Methodology
All tests were run with the command node main.js | pv > /dev/null on my machine with a Ryzen 2600 and Node.js 18.4.0.
Basic Implementation
This is the basic FizzBuzz that everybody knows and loves.
However, this is not a valid solution for this particular challenge, since the author wanted solutions that can execute up to 2^63-1. In JavaScript, the maximum safe integer value that can be stored in a native number (which is essentially a double-precision float) is 2^53-1. To fix this, we have to use the BigInt datatype introduced in Node.js 10.4.0.
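The original snippet isn't reproduced here, but a minimal sketch of such a baseline, assuming a BigInt counter and one console.log call per line, might look like this:

```js
// Baseline FizzBuzz: a BigInt counter so the loop can (in theory) run past 2^53-1.
const LIMIT = 2n ** 63n - 1n;

for (let i = 1n; i <= LIMIT; i++) {
  if (i % 15n === 0n) console.log("FizzBuzz");
  else if (i % 3n === 0n) console.log("Fizz");
  else if (i % 5n === 0n) console.log("Buzz");
  else console.log(i.toString());
}
```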
Running this baseline implementation yields around 1 MiB/s.
Remove Branches
One property of FizzBuzz is that it is cyclic every 15 integers. The first optimization that we can make is to simply remove all the branches (if-statements) and unroll every 15 iterations.
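A sketch of what one unrolled 15-number cycle could look like, still printing with console.log at this stage:

```js
// One full FizzBuzz cycle, unrolled so the hot loop contains no if-statements.
// (Handling the final partial cycle near LIMIT is omitted for brevity.)
const LIMIT = 2n ** 63n - 1n;

for (let i = 1n; i <= LIMIT; i += 15n) {
  console.log(i.toString());
  console.log((i + 1n).toString());
  console.log("Fizz");
  console.log((i + 3n).toString());
  console.log("Buzz");
  console.log("Fizz");
  console.log((i + 6n).toString());
  console.log((i + 7n).toString());
  console.log("Fizz");
  console.log("Buzz");
  console.log((i + 10n).toString());
  console.log("Fizz");
  console.log((i + 12n).toString());
  console.log((i + 13n).toString());
  console.log("FizzBuzz");
}
```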
Running this implementation yields around 10 MiB/s.
Remove console.log
Next, we can use process.stdout.write to write directly to the output stream and bypass the formatting logic inside console.log.
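A sketch of the same unrolled cycle, emitting each 15-line block as a single string through process.stdout.write:

```js
// Write each 15-number cycle straight to stdout, skipping console.log's formatter.
const LIMIT = 2n ** 63n - 1n;

for (let i = 1n; i <= LIMIT; i += 15n) {
  process.stdout.write(
    `${i}\n${i + 1n}\nFizz\n${i + 3n}\nBuzz\nFizz\n${i + 6n}\n${i + 7n}\n` +
    `Fizz\nBuzz\n${i + 10n}\nFizz\n${i + 12n}\n${i + 13n}\nFizzBuzz\n`
  );
}
```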
Running this implementation yields around 30 MiB/s.
Reduce process.stdout.write Calls
Another caveat of Node.js is that whenever we write to stdout, the main thread gets blocked until the write finishes. Internally, each write is an expensive system call. To minimize these write calls, we can batch what we want to write until our buffer is full.
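A sketch of the batching idea; the 64 KiB flush threshold here is an assumption rather than the value from the original code:

```js
// Accumulate output in a string and flush it to stdout only once it grows large enough.
const LIMIT = 2n ** 63n - 1n;
const FLUSH_THRESHOLD = 1 << 16; // ~64 KiB of characters (assumed batch size)

let batch = "";
for (let i = 1n; i <= LIMIT; i += 15n) {
  batch +=
    `${i}\n${i + 1n}\nFizz\n${i + 3n}\nBuzz\nFizz\n${i + 6n}\n${i + 7n}\n` +
    `Fizz\nBuzz\n${i + 10n}\nFizz\n${i + 12n}\n${i + 13n}\nFizzBuzz\n`;
  if (batch.length >= FLUSH_THRESHOLD) {
    process.stdout.write(batch);
    batch = "";
  }
}
if (batch.length > 0) process.stdout.write(batch);
```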
Running this implementation yields around 100 MiB/s.
Out of Memory Errors
One thing I noticed while testing the previous implementations is that after a few minutes they would run out of memory and crash.
For whatever reason, the garbage collector was not able to free up all the temporary strings. The error log indicates that the garbage collector executes every 4 seconds but fails to free up more than 5 MB each time. As we generate upwards of 100 MiB/s worth of strings, I speculate that the garbage collector simply did not have enough time budgeted to free up all the unused strings.
Memory Safe Implementation
Since this challenge is to implement something that can, in theory, execute on all 2^63-1 integers given enough time, we need a solution that doesn't run out of memory after only a few minutes.
Since strings are too problematic, I decided to use a fixed binary buffer and avoid non-stack allocations as much as possible. This buffer is explicitly allocated once and is reused by resetting the offset counter. The only heap object allocated comes from calling Buffer.subarray, which returns a new Buffer object. Since this new buffer object internally points to the same memory location as the original buffer, the overall heap impact should be marginal and is unlikely to exceed our hypothesized 5 MB per 4-second garbage collector time budget.
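A rough sketch of this approach; the buffer size and helper names are assumptions. Digits are written as ASCII bytes using BigInt division and modulo, and only Buffer.subarray views of the preallocated buffer ever reach the write call:

```js
// Memory-safe sketch: one preallocated Buffer, reused by resetting the offset.
const LIMIT = 2n ** 63n - 1n;
const buf = Buffer.alloc(1 << 20);   // 1 MiB scratch buffer, allocated once
const scratch = new Uint8Array(20);  // room for up to 19 decimal digits
let offset = 0;

// Write the decimal digits of a BigInt as ASCII bytes followed by '\n'.
// The % 10n and / 10n operations here are the BigInt hot path discussed below.
function writeNumber(n) {
  let len = 0;
  do {
    scratch[len++] = Number(n % 10n) + 0x30; // ASCII '0' plus the digit
    n /= 10n;
  } while (n > 0n);
  while (len > 0) buf[offset++] = scratch[--len];
  buf[offset++] = 0x0a; // '\n'
}

function writeWord(word) {
  offset += buf.write(word, offset);
  buf[offset++] = 0x0a;
}

for (let i = 1n; i <= LIMIT; i += 15n) {
  writeNumber(i); writeNumber(i + 1n); writeWord("Fizz");
  writeNumber(i + 3n); writeWord("Buzz"); writeWord("Fizz");
  writeNumber(i + 6n); writeNumber(i + 7n); writeWord("Fizz");
  writeWord("Buzz"); writeNumber(i + 10n); writeWord("Fizz");
  writeNumber(i + 12n); writeNumber(i + 13n); writeWord("FizzBuzz");

  // Flush a subarray view (shares memory, no copy) and rewind the offset.
  // NOTE: this naive version reuses buf without waiting for the write to flush;
  // the backpressure consequences are covered in the update at the end.
  if (offset > buf.length - 1024) {
    process.stdout.write(buf.subarray(0, offset));
    offset = 0;
  }
}
process.stdout.write(buf.subarray(0, offset));
```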
This implementation yields an astounding 8 MiB/s. Yes, this is not a typo. We went from 100 back down to 8!
This is a huge regression but, at the same time, it's not that surprising. Running a profiler on this code reveals that the biggest bottleneck is the BigInt division and modulo operations used to calculate the i-th decimal digit when printing non-FizzBuzz numbers. Since BigInt is designed to work on arbitrarily large numbers, it must use software-based division algorithms instead of hardware 64-bit division instructions.
Worker Threads
Another property of FizzBuzz is that it is trivially parallelizable since the result for one number is independent of every other number. Well... not quite. This challenge also stipulates that we cannot have null bytes in our output. Since different numbers have different numbers of digits, their locations in our buffer depend on every previous number.
We can still make use of the data independence of FizzBuzz by delegating each thread to work on a different batch of numbers. For example, thread 1 works on [0, 3000), thread 2 works on [3000, 6000), thread 3 works on [6000, 9000), etc. Inside each thread, we execute sequentially to pack the thread-specific output buffer. Finally, we round-robin print each thread's buffer in our main thread.
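A rough sketch of this scheme using the worker_threads module; the batch size, thread count, and message-passing details are all assumptions, and each worker only produces a few batches here for brevity:

```js
// fizzbuzz_workers.js - workers fill buffers for interleaved batches of numbers;
// the main thread prints them in round-robin order so the output stays sequential.
const { Worker, isMainThread, parentPort, workerData } = require("worker_threads");

const BATCH = 3000n; // numbers per batch (assumed)
const THREADS = 4;   // worker count (assumed)

if (isMainThread) {
  const workers = [];
  for (let t = 0; t < THREADS; t++) {
    // Worker t handles batches t, t + THREADS, t + 2*THREADS, ...
    workers.push(new Worker(__filename, { workerData: { t } }));
  }

  // Queue finished chunks per worker and print them strictly in batch order.
  const queues = workers.map(() => []);
  let next = 0;
  const flush = () => {
    while (queues[next % THREADS].length > 0) {
      process.stdout.write(queues[next % THREADS].shift());
      next++;
    }
  };
  workers.forEach((w, idx) =>
    w.on("message", (chunk) => {
      queues[idx].push(chunk);
      flush();
    })
  );
} else {
  const { t } = workerData;
  // Each worker packs its own batches sequentially (4 batches each for this demo).
  for (let b = BigInt(t); b < BigInt(t) + 4n * BigInt(THREADS); b += BigInt(THREADS)) {
    const start = b * BATCH + 1n;
    let out = "";
    for (let i = start; i < start + BATCH; i++) {
      if (i % 15n === 0n) out += "FizzBuzz\n";
      else if (i % 3n === 0n) out += "Fizz\n";
      else if (i % 5n === 0n) out += "Buzz\n";
      else out += `${i}\n`;
    }
    parentPort.postMessage(Buffer.from(out));
  }
}
```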
Putting everything that we have learned together, we have arrived at this implementation that yields around 370 MiB/s. Not bad considering our base implementation started at 1 MiB/s!
Final Thoughts
- Node.js' garbage collector sucks.
- Buffer system calls in memory as long as possible.
- BigInt is insanely expensive. If you need to process integers larger than 2^53-1, you should consider using WebAssembly or native modules with access to hardware 64-bit integers. I used pure JavaScript for this challenge because I felt loading modules written in other languages defeats the spirit of the contest for language-specific implementations. Otherwise, every language's fastest implementation would just be a thin wrapper around C modules.
Oct 2022 Update: Fixed the Garbage Collector Issue
I recently revisited this problem and realized that our garbage collector problems were actually caused by backpressure in the write buffer. The functions were not waiting for the stdout stream to fully flush its write buffer when it was full. As a result, the stream had to queue up data from subsequent write calls faster than it could consume them, which in turn overloaded the garbage collector.
To fix this issue, we can simply pass a callback to the stdout.write calls to know when we can safely continue.
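A sketch of what that looks like, assuming the batching loop is restructured so the next batch is only produced after the previous write's callback fires:

```js
// Respect backpressure: only produce the next batch once stdout reports that the
// previous chunk has been flushed.
const LIMIT = 2n ** 63n - 1n;
const FLUSH_THRESHOLD = 1 << 16; // assumed batch size

let i = 1n;
function produceBatch() {
  let batch = "";
  while (batch.length < FLUSH_THRESHOLD && i <= LIMIT) {
    batch +=
      `${i}\n${i + 1n}\nFizz\n${i + 3n}\nBuzz\nFizz\n${i + 6n}\n${i + 7n}\n` +
      `Fizz\nBuzz\n${i + 10n}\nFizz\n${i + 12n}\n${i + 13n}\nFizzBuzz\n`;
    i += 15n;
  }
  return batch;
}

function writeLoop() {
  const batch = produceBatch();
  if (batch.length === 0) return;
  // The callback fires once this chunk has been flushed, so we never queue data
  // faster than stdout can drain it.
  process.stdout.write(batch, writeLoop);
}
writeLoop();
```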
Surprisingly, this change did not affect performance, and this solution still yields an impressive 100 MiB/s.