When dealing with queue overload you effectively have two levers: increasing the consumption rate or reducing the production rate.
How to increase consumption rate?
Add more consumers
This is about adding more machines or containers running consumers. Examples include adding Sidekiq workers or adding Kafka consumers to a consumer group.
This is a common fix because it requires little effort. However, you may not always be able to do it due to constraints on your system: pressure on downstream systems, limits on database connections, message ordering constraints, etc.
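As a minimal illustration of scaling out, here is a sketch using Python's standard-library queue and threads; the sleep is a made-up stand-in for real processing work, and throughput grows with the number of consumer threads as long as downstream systems keep up:

```python
import queue
import threading
import time

def consumer(q: queue.Queue) -> None:
    # Each consumer loops: take a message, process it, repeat.
    while True:
        msg = q.get()
        if msg is None:  # sentinel: shut down this consumer
            q.task_done()
            return
        time.sleep(0.01)  # stand-in for real processing work
        q.task_done()

q: queue.Queue = queue.Queue()
for i in range(100):
    q.put(i)

# Scaling out: starting more consumer threads raises the overall
# consumption rate of the shared queue.
num_consumers = 4
threads = [threading.Thread(target=consumer, args=(q,)) for _ in range(num_consumers)]
for t in threads:
    t.start()
for _ in threads:
    q.put(None)  # one shutdown sentinel per consumer
for t in threads:
    t.join()
```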
Reduce processing latency
Consumers typically pick messages from the queue, do some work with them, and then pick more messages. If each message takes less time to process, consumers can consume at a faster pace.
Typical solutions involve performance optimization work (e.g. optimizing database queries, caching), fetching larger batches from the queue, etc.
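The batching idea can be sketched with Python's standard-library queue; `drain_batch` is a hypothetical helper, and the payload values are made up. The point is that one fetch cycle hands the consumer many messages, amortizing per-fetch overhead (network round trips, transactions) across the batch:

```python
import queue

def drain_batch(q: queue.Queue, max_batch: int) -> list:
    """Pull up to max_batch messages in one go, without blocking
    once the queue runs dry."""
    batch = []
    try:
        batch.append(q.get(timeout=1))  # block for the first message
        while len(batch) < max_batch:
            batch.append(q.get_nowait())  # then take whatever is ready
    except queue.Empty:
        pass
    return batch

q: queue.Queue = queue.Queue()
for i in range(10):
    q.put(i)

batch = drain_batch(q, max_batch=8)  # → [0, 1, 2, 3, 4, 5, 6, 7]
```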
How to reduce production rate?
Change upstream systems to produce fewer messages
This is about reducing the flow of incoming messages at the source.
You may not be able to do this if the incoming messages are driven by user behavior outside of your control (e.g. background jobs enqueued to send e-mails to users whenever they sign up).
This is a viable option whenever the messages are enqueued by periodic (cron) jobs. You may change the frequency of those cron jobs or change their logic so they send fewer messages over the queue.
This solution acknowledges that sometimes the answer to a problem is not to throw more resources at it but instead question if we even need to do the work.
Drop messages (a.k.a. Shed load)
This is about discarding messages from the queue whenever it reaches a certain threshold of overload. The producer still produces messages at a high rate, but the consumer effectively only has to consume a smaller, manageable subset of them.
To achieve this, you can evict messages that are already in the queue whenever a new message is added (drop the oldest), or you can reject the new messages placed by the producer (drop the newest).
This is what TCP does at the network level: it has a bounded listen queue that rejects incoming connection requests whenever it is full.
You can only use this approach if your domain can tolerate not receiving all messages: either your producers are able to retry discarded messages, or your application still provides value when only part of the messages have been processed.
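The two discard strategies above can be sketched with Python's standard library; the queue sizes and payloads are made up for illustration:

```python
import collections
import queue

# Strategy 1: drop the oldest messages when the queue is full.
# collections.deque with maxlen evicts from the opposite end on append.
drop_oldest: collections.deque = collections.deque(maxlen=3)
for i in range(5):
    drop_oldest.append(i)
# The two oldest messages (0 and 1) were shed; [2, 3, 4] remain.

# Strategy 2: reject new messages when the queue is full,
# similar to TCP's bounded listen queue refusing new connections.
drop_newest: queue.Queue = queue.Queue(maxsize=3)
rejected = []
for i in range(5):
    try:
        drop_newest.put_nowait(i)
    except queue.Full:
        rejected.append(i)  # producer must retry or tolerate the loss
# Messages 3 and 4 were rejected; the queue holds [0, 1, 2].
```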
Apply backpressure
This is about automatically slowing down producers whenever the queue is overloaded. One way to achieve this is to have producers block when placing messages on a full queue. Another is to force producers to spread the work over a longer period of time whenever the queue is overloaded.
Queue overload is very often resolved by increasing the consumption rate, but reducing the production rate is equally effective. Reducing the production rate automatically, through load shedding or backpressure, is also the most effective tool at your disposal for dealing with sudden, unexpected queue overload.
You can find other posts in this series about queues here.
I can send you my posts straight to your e-mail inbox if you sign up for my newsletter.