Kafka vs RabbitMQ vs SQS: Message Bus Tradeoffs

Message bus choice depends on what you mean by ‘messaging.’ Streams, queues, and pub-sub each map to different tools.

Kafka: streaming, retention

Kafka is high-throughput streaming with long retention and replay. Ordered partitions, durable storage, consumer-driven offsets; the right call when "I might want to replay this" is part of the requirements.

High throughput. Often hundreds of thousands of messages per second per broker, depending on message size and hardware; scales horizontally by adding partitions and brokers.
Ordered partitions. Ordering is guaranteed within a partition, not across the whole topic; a consumer reads each partition in the order its records were appended.
Long retention. Configurable by time or size: days, weeks, or indefinitely; the broker is the source of truth, not a transient buffer.
Sweet spot. Event streams, analytics pipelines, log aggregation; anything you might want to replay or reprocess.
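
To make per-partition ordering and consumer-driven offsets concrete, here is a minimal in-memory sketch. It is not Kafka client code: PartitionedLog and Consumer are hypothetical names, and the crc32-based partitioner is an illustrative assumption rather than Kafka's own hashing. It shows only the shape of the model: reads do not delete anything, and the consumer decides where to resume.

```python
from collections import defaultdict
from zlib import crc32


class PartitionedLog:
    """Toy stand-in for a topic: append-only partitions, nothing deleted on read."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # The same key always lands in the same partition, so per-key order holds.
        # crc32 is an illustrative choice, not Kafka's actual partitioner.
        return crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> tuple:
        partition = self.partition_for(key)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1  # (partition, offset)

    def read(self, partition: int, offset: int) -> list:
        # Reading leaves the log untouched; the caller chooses the start offset.
        return self.partitions[partition][offset:]


class Consumer:
    """Tracks its own position per partition, as a consumer-driven offset would."""

    def __init__(self, log: PartitionedLog) -> None:
        self.log = log
        self.offsets = defaultdict(int)

    def poll(self, partition: int) -> list:
        records = self.log.read(partition, self.offsets[partition])
        self.offsets[partition] += len(records)
        return records

    def seek(self, partition: int, offset: int) -> None:
        # Replay is just moving the offset back.
        self.offsets[partition] = offset


log = PartitionedLog(num_partitions=3)
for i in range(3):
    log.append("order-42", f"event-{i}")

p = log.partition_for("order-42")
consumer = Consumer(log)
first_pass = consumer.poll(p)   # reads all three events in append order
caught_up = consumer.poll(p)    # nothing new, so this is empty
consumer.seek(p, 0)
replayed = consumer.poll(p)     # the same three events again
```

Because the log keeps the records after they are read, a second consumer with its own offsets could process the same events independently, which is what makes replay and reprocessing cheap in this model.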

RabbitMQ: queues, routing

RabbitMQ is queues with rich routing. Lower throughput than Kafka, more flexible than SQS; the right call when routing rules and selective consumption are first-class needs.

Queues. Classic queue semantics; the broker typically pushes messages to subscribed consumers and tracks acks, redelivering anything left unacknowledged; the canonical message-queue model.
Routing rules. Direct, topic, fanout, and headers exchanges; complex topologies expressed declaratively.
Selective consumption. A queue binds to an exchange with specific routing keys or patterns; the broker filters before delivery, which saves consumer-side complexity.
Sweet spot. Task queues, work distribution, complex routing topologies; anywhere "this message goes to that consumer" is intricate.
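
The topic-exchange rule can be sketched in a few lines: routing keys are dot-separated words, `*` in a binding matches exactly one word, and `#` matches zero or more. The code below is an illustrative reimplementation of that matching rule, not RabbitMQ's own code; topic_matches, route, and the queue names are hypothetical.

```python
def topic_matches(binding_key: str, routing_key: str) -> bool:
    """Return True if routing_key matches an AMQP-style topic binding pattern."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(pi: int, wi: int) -> bool:
        if pi == len(pattern):
            # Pattern exhausted: it matches only if the key is exhausted too.
            return wi == len(words)
        if pattern[pi] == "#":
            # '#' may absorb zero or more words; try every possible split.
            return any(match(pi + 1, k) for k in range(wi, len(words) + 1))
        if wi == len(words):
            return False
        if pattern[pi] == "*" or pattern[pi] == words[wi]:
            return match(pi + 1, wi + 1)
        return False

    return match(0, 0)


def route(bindings: dict, routing_key: str) -> list:
    """Return the sorted names of queues whose binding key matches."""
    return sorted(q for q, key in bindings.items() if topic_matches(key, routing_key))


# Hypothetical queue names mapped to their binding keys.
bindings = {
    "billing": "order.*.paid",
    "audit": "order.#",
    "eu-alerts": "*.eu.*",
}

paid_in_eu = route(bindings, "order.eu.paid")
refunded_in_us = route(bindings, "order.us.refunded")
```

Here paid_in_eu contains all three queues, while refunded_in_us contains only "audit". The consumers never see messages they did not bind for, which is the filtering the broker takes off their hands.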

SQS: managed simplicity

SQS is AWS-managed simplicity: no broker to operate, near-zero ops, standard or FIFO. The right call for AWS-committed teams that want a queue and want to skip the operational tax of running their own.

AWS-managed. No broker, no patches, no failover engineering; the operational story is "submit and pay."
Standard or FIFO. Standard for at-least-once delivery and best-effort ordering; FIFO for strict ordering within a message group, plus deduplication, at lower throughput.
Near-zero ops. Scaling, durability, and redundancy across Availability Zones within a region are all handled by AWS.
Sweet spot. Simple async tasks where the queue is just a queue; you want to focus on producers and consumers, not brokers.
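
At-least-once delivery on a standard queue means a consumer can occasionally see the same message twice, so handlers are usually written to be idempotent. The sketch below shows one common approach, deduplicating on a message ID. It is not SQS client code: IdempotentConsumer is a hypothetical name, the IDs stand in for something like the queue's message ID, and the in-memory set is an assumption that a real service would replace with a durable store.

```python
class IdempotentConsumer:
    """Wraps a handler so a redelivered message is processed only once."""

    def __init__(self, handler) -> None:
        self.handler = handler
        # Assumption: an in-memory set; a real service needs a durable store.
        self.processed_ids = set()

    def handle(self, message_id: str, body: str) -> bool:
        if message_id in self.processed_ids:
            return False  # duplicate delivery, skip the side effect
        self.handler(body)
        # Record the ID only after the handler succeeds, so a failure
        # leaves the message eligible for retry on redelivery.
        self.processed_ids.add(message_id)
        return True


sent_emails = []
consumer = IdempotentConsumer(sent_emails.append)

# The third delivery repeats the first, as at-least-once delivery allows.
deliveries = [
    ("m-1", "welcome alice"),
    ("m-2", "welcome bob"),
    ("m-1", "welcome alice"),
]
results = [consumer.handle(message_id, body) for message_id, body in deliveries]
```

After this run, results is [True, True, False] and sent_emails holds two entries, not three.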

The dual-bus pattern

Many teams ship a dual-bus architecture: Kafka for events that other systems consume, SQS for tasks one service produces and another consumes. Each runs at the scale it suits, and the two do not overlap if you scope each by purpose.

Kafka for events. Domain events that multiple systems consume; the analytics warehouse and the search index both subscribe.
SQS for tasks. Producer-consumer task queues; one service hands a unit of work to another; queue semantics fit.
No overlap. Scope each by purpose; the two patterns rarely compete; each plays to its strengths.
The cost. Two systems to operate; the team needs both literacies; the duplication pays off only if both purposes genuinely exist.
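
"Scope by purpose" can be written down as a rule of thumb that a team applies to each new message type. The function below is a hypothetical sketch of that rule as described above, not a standard: its name, parameters, and thresholds are assumptions for illustration.

```python
def choose_bus(independent_consumers: int, needs_replay: bool) -> str:
    """Pick a bus for a message type under the dual-bus scoping rule."""
    if independent_consumers < 1:
        raise ValueError("a message type needs at least one consumer")
    # Events: several systems read the same record, or someone may replay it.
    if needs_replay or independent_consumers > 1:
        return "kafka"
    # Tasks: one service hands a unit of work to exactly one other.
    return "sqs"


# Hypothetical message types.
order_placed = choose_bus(independent_consumers=2, needs_replay=True)   # warehouse + search index
resize_image = choose_bus(independent_consumers=1, needs_replay=False)  # one worker pool
```

Under these assumptions order_placed goes to Kafka and resize_image goes to SQS. If every message type in a system lands on the same side, the second bus is probably not worth its operating cost.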

Antipatterns

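Kafka for simple async tasks. The operational overhead of brokers, partitions, and offset management exceeds the value when nothing needs replay.
RabbitMQ for analytics streams. Throughput is limited relative to Kafka, and classic queues discard messages once they are acknowledged, so replay is weak.
SQS beyond its throughput quotas. FIFO queues in particular have throughput quotas; a workload that outgrows them is throttled as load grows.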

What to do this week

Three moves. (1) Run a 30-day trial of the candidate against your real workload. (2) Compare total cost of ownership (TCO) and workflow fit, not just feature checklists. (3) Decide and commit; running two candidates for the same purpose in parallel is the most expensive option.