Backpressure – slowing down producers – and load shedding – dropping messages – are two of the methods you can use to mitigate queue overload. Backpressure and load shedding are reaction mechanisms your producers and/or consumers automatically take during queue overload.
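To make the two reactions concrete, here is a minimal sketch assuming an in-process bounded queue built on Python's standard `queue.Queue`. The function names, the `timeout_s` parameter and the limit of 3 messages are hypothetical and chosen only for illustration; a real system would apply the same idea to its own broker or datastore.

```python
import queue


def produce_with_backpressure(q, message, timeout_s=0.1):
    """Backpressure: the producer waits (slows down) until there is room."""
    try:
        # Blocks for up to timeout_s seconds while the queue is full.
        q.put(message, block=True, timeout=timeout_s)
        return True
    except queue.Full:
        # Still full after waiting: the caller should slow down and retry later.
        return False


def produce_with_load_shedding(q, message):
    """Load shedding: drop the message immediately if the queue is full."""
    try:
        q.put_nowait(message)
        return True
    except queue.Full:
        # The message is dropped; the queue never grows past its limit.
        return False


if __name__ == "__main__":
    q = queue.Queue(maxsize=3)  # hypothetical enforced limit of 3 messages
    for i in range(5):
        accepted = produce_with_load_shedding(q, f"message-{i}")
        print(i, "accepted" if accepted else "dropped")
```

In both cases the queue never holds more than `maxsize` messages. The difference is who pays for the overload: with backpressure the producer waits, with load shedding the message is lost.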
These methods are useful because they enforce limits on your queues. A limit is valuable because, without one, your systems are at the mercy of the producers, whose produce rate may be outside of your control.
Even if you don’t enforce a limit, all queues have a limit. That limit may be space in your datastore (e.g. Redis, disk, memory). That limit may be the constraints of your downstream systems (e.g. throughput, number of connections). That limit may be how much time you are able to wait for the messages to be consumed (e.g. an hour? a day? a week?).
You want to spend time enforcing limits when designing a system rather than having to enforce them during a service incident.
You may provision enough capacity in the consumers to deal with the produce rate today, but you don’t know what will happen in the future: how and when the produce rate will change.
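A rough way to see how quickly an unenforced limit can be reached is to divide the remaining capacity by the rate at which the queue grows. The sketch below assumes constant produce and consume rates; the function name and all the numbers are hypothetical.

```python
def seconds_until_full(capacity, queued, produce_rate, consume_rate):
    """Estimate the seconds until the queue reaches its capacity.

    Rates are in messages per second. Returns None when the consumers
    keep up, because the queue is not growing.
    """
    growth = produce_rate - consume_rate  # net messages added per second
    if growth <= 0:
        return None
    return (capacity - queued) / growth


if __name__ == "__main__":
    # Today: consumers handle 1,000 msg/s and producers send 900 msg/s.
    print(seconds_until_full(1_000_000, 0, 900, 1_000))    # None
    # Later: the produce rate rises to 1,200 msg/s.
    print(seconds_until_full(1_000_000, 0, 1_200, 1_000))  # 5000.0
```

With these made-up numbers, a 20% surplus over what the consumers can handle fills a one-million-message queue in under an hour and a half.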
Backpressure and load shedding allow the queue to self-heal. This robustness allows you to reduce the number, impact and stress of incidents:
- fewer incidents – the queue no longer overloads, so a whole set of incidents stops occurring.
- reduced impact of incidents – the downstream systems are shielded from the impact of queue overload.
- less stress during incidents – the on-call engineer does not need to rush into mitigating actions because the system can self-heal.
I see using backpressure/load shedding as a bit like using timeouts in HTTP requests. These mechanisms are really useful under unexpected system conditions and incidents.
Failures in distributed systems typically happen at the integration points. Queues are often an integration point between systems, so they are likely to experience queue overload, a type of failure. Load shedding/backpressure measures increase the resilience of your queues.