The problem with large values
After some deep instrumentation and inspection, we determined that the problem in this particular scenario was that some of our menus were almost half a megabyte in size. Our instrumentation showed that repeatedly reading these large values during peak hours was one of the main contributors to our high p99 latency. During peak hours, reads from Redis would sometimes, seemingly at random, take more than 100ms. This was especially true when a restaurant or a chain with a very large menu was running promotions. Why this happens should be a surprise to no one: reading or writing many large payloads over the network during peak hours can cause network congestion and delays.

Compression to the rescue
To fix this issue, we obviously wanted to reduce the amount of traffic between our server nodes and the cache. We were well aware of techniques like LevelDB using Snappy to compress data and decrease its on-disk size. Similarly, our friends at CloudFlare used a comparable technique to squeeze more speed out of Kafka. We wanted to do something similar, i.e. use a compression algorithm with good speed and a decent compression ratio. Like other folks, we ran our own benchmarks and found that LZ4 and Snappy were two nice options. We also considered other popular options like zlib, Zstandard, and Brotli, but found their decompression speeds (and CPU load) were not ideal for our scenario. Given the specific nature of our endpoint, LZ4 and Snappy were more favorable: both libraries sit in the Goldilocks zone of compression/decompression speed, CPU usage, and compression ratio. There are a plethora of benchmarks on the internet already comparing compression speeds and ratios, so without repeating them in detail, here are some examples and a summary of our findings (a short code sketch of the size comparison follows the list):

- 64,220 bytes of Chick-fil-A menu (serialized JSON) was compressed down to 10,199 bytes with LZ4, and 11,414 bytes with Snappy.
- 350,333 bytes of Cheesecake Factory menu (serialized JSON) was compressed down to 67,863 bytes with LZ4, and 77,048 bytes with Snappy.
- On average, LZ4 had a slightly better compression ratio than Snappy: across our serialized payloads, compressed output averaged 38.54% of the original size with LZ4 vs. 39.71% with Snappy.
- Compression speeds of LZ4 and Snappy were almost the same, with LZ4 fractionally slower than Snappy.
- LZ4 was hands down faster than Snappy at decompression; in some cases we found it to be 2x faster.
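To make the comparison concrete, here is a minimal sketch of the kind of size check described above, assuming the python-lz4 and python-snappy packages; the payload below is a hypothetical stand-in for one of our serialized menus, not the actual data.

```python
import json

import lz4.frame   # pip install lz4
import snappy      # pip install python-snappy

# Hypothetical stand-in for a serialized restaurant menu; our real payloads
# ranged from a few KB to several hundred KB of JSON.
menu = {
    "restaurant": "Example Diner",
    "items": [{"id": i, "name": f"Item {i}", "price": 9.99} for i in range(2000)],
}
payload = json.dumps(menu).encode("utf-8")

lz4_blob = lz4.frame.compress(payload)
snappy_blob = snappy.compress(payload)

print(f"original: {len(payload):>8} bytes")
print(f"lz4:      {len(lz4_blob):>8} bytes ({len(lz4_blob) / len(payload):.2%} of original)")
print(f"snappy:   {len(snappy_blob):>8} bytes ({len(snappy_blob) / len(payload):.2%} of original)")

# Round-trip check: decompression must return the exact original bytes.
assert lz4.frame.decompress(lz4_blob) == payload
assert snappy.decompress(snappy_blob) == payload
```

Repeating this over a representative sample of payloads, rather than a single example, is what produced the averages quoted above.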
Connecting the dots
To see things in action before deploying them to production, we set up a sandbox and chose 10K random menus. The sample contained a good mix of menu sizes, ranging from 9.5KB to 709KB when serialized. Getting and setting these entries in Redis without compression, with Snappy, and with LZ4 yielded the following numbers (a sketch of the benchmark follows the table):

| Redis Operation | No Compression (seconds) | Snappy (seconds) | LZ4 (seconds) |
|---|---|---|---|
| Set (10000) | 16.526179 | 12.635553 | 12.802149 |
| Get (10000) | 12.047090 | 7.560119 | 6.434711 |
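Here is a minimal sketch of how such a benchmark can be run, assuming redis-py, python-lz4, and a local Redis instance; the `entries` dict, key names, and payload sizes are hypothetical placeholders rather than our actual sandbox data.

```python
import time

import lz4.frame   # pip install lz4
import redis       # pip install redis

r = redis.Redis(host="localhost", port=6379)  # assumes a local Redis instance


def benchmark(entries, compress=False):
    """Time SET and GET of every entry, optionally LZ4-compressing the values."""
    t0 = time.monotonic()
    for key, value in entries.items():
        r.set(key, lz4.frame.compress(value) if compress else value)
    set_seconds = time.monotonic() - t0

    t0 = time.monotonic()
    for key in entries:
        raw = r.get(key)
        _ = lz4.frame.decompress(raw) if compress else raw
    get_seconds = time.monotonic() - t0
    return set_seconds, get_seconds


# Hypothetical stand-in: keys map to serialized JSON bytes of varying sizes.
entries = {f"menu:{i}": b'{"items": "..."}' * 1000 for i in range(1000)}
print("uncompressed:", benchmark(entries, compress=False))
print("lz4:         ", benchmark(entries, compress=True))
```

The same harness can be pointed at Snappy by swapping the compress/decompress calls, which is how the three columns in the table can be produced from one script.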
Conclusion
After deployment in production, our instrumentation not only confirmed a drop in p99 latency, but also showed reduced Redis memory usage.
Redis memory usage with compression vs without compression

p99 latency showing a spike for uncompressed values (compared against the same time period with compressed values)