A new company behind Kafka development and a fascinating new high-throughput messaging framework making waves inspired me to revisit the major ideas in Kafka's design. Its overall strategy is to maximize throughput through non-blocking, sequential access combined with decentralized decision making.
- A partition corresponds to a logical log, which is implemented as a sequence of approximately same-size physical segment files.
- New messages are appended to the current segment file.
- Each message is addressed by its global offset in the logical log (see the segment-lookup sketch after this list).
- A consumer reads from a partition file sequentially (pre-fetching chunks in the background).
- No application-level cache: Kafka relies on the file system page cache.
- The highly efficient sendfile Unix API (available in Java via FileChannel::transferTo()) is used to minimize data copying between application and kernel buffers (see the zero-copy sketch below).
- A consumer is responsible for remembering how much it has consumed, which avoids broker-side bookkeeping (see the offset-tracking sketch below).
- Messages are deleted by the broker after a configured retention period (typically a few days) for the same reason (see the retention sketch below).
- A partition is consumed by a single consumer (from each consumer group) to avoid synchronization.
- Brokers, consumers, and partition ownership are registered in ZooKeeper (ZK).
- Broker and consumer membership changes trigger a rebalancing process.
- The rebalancing algorithm is decentralized and relies on typical ZK idioms: each consumer deterministically computes its own partition assignment from the registered membership (see the assignment sketch below).
- Consumers periodically update the ZK registry with the last consumed offset.
- Kafka provides at-least-once delivery semantics. When a consumer crashes, the last consumed offset recorded in ZK may lag behind what was actually processed, so the consumer that takes over will re-deliver the events from that window.
- The event-consuming application itself is expected to apply deduplication logic if necessary (see the deduplication sketch below).
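
To make the offset addressing concrete, here is a minimal sketch of resolving a global offset to a segment file and a position within it, assuming each segment is indexed by the offset of its first entry. `SegmentIndex` and its method names are illustrative, not Kafka's actual classes.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

// Illustrative index over a partition's segment files, keyed by each segment's
// base offset (the logical offset of the first entry it contains).
public class SegmentIndex {
    private final TreeMap<Long, File> segmentsByBaseOffset = new TreeMap<>();

    public void addSegment(long baseOffset, File segmentFile) {
        segmentsByBaseOffset.put(baseOffset, segmentFile);
    }

    // Resolve a global logical offset to (segment file, position within that segment).
    public SegmentPosition locate(long globalOffset) {
        Map.Entry<Long, File> entry = segmentsByBaseOffset.floorEntry(globalOffset);
        if (entry == null) {
            throw new IllegalArgumentException(
                "Offset " + globalOffset + " precedes the earliest retained segment");
        }
        return new SegmentPosition(entry.getValue(), globalOffset - entry.getKey());
    }

    public record SegmentPosition(File segment, long position) {}
}
```

Keeping segments sorted by base offset makes both the lookup and retention-based deletion (dropping the lowest entries) cheap.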
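
The "highly efficient Unix API" above is the sendfile(2) system call, which `FileChannel.transferTo()` maps to on Linux. Below is a minimal sketch of pushing a chunk of a segment file straight to a consumer socket without copying it through application buffers; `sendChunk` and its parameters are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class ZeroCopySend {
    // Stream `count` bytes of a segment file, starting at `position`, directly to the
    // consumer's socket. On Linux, transferTo() delegates to sendfile(2), so bytes move
    // from the page cache to the socket without an intermediate application buffer.
    public static void sendChunk(Path segmentFile, long position, long count,
                                 SocketChannel socket) throws IOException {
        try (FileChannel channel = FileChannel.open(segmentFile, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < count) {
                long n = channel.transferTo(position + sent, count - sent, socket);
                if (n <= 0) {
                    break; // nothing transferred (socket full or past EOF); a real broker would retry
                }
                sent += n;
            }
        }
    }
}
```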
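
Since the broker keeps no per-consumer bookkeeping, the consumer tracks its own position and periodically writes it to ZooKeeper. Here is a sketch using the standard ZooKeeper Java client; `OffsetTracker` and the znode path it writes to are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Illustrative consumer-side offset tracker: the broker keeps no per-consumer state,
// so the consumer remembers its position and periodically flushes it to ZooKeeper.
public class OffsetTracker {
    private final ZooKeeper zk;
    private final String offsetPath;      // assumed layout, e.g. /consumers/<group>/offsets/<topic>/<partition>
    private volatile long lastConsumedOffset;

    public OffsetTracker(ZooKeeper zk, String offsetPath) {
        this.zk = zk;
        this.offsetPath = offsetPath;
    }

    // Called by the consumer loop after a message (or batch) has been processed.
    public void markConsumed(long offset) {
        lastConsumedOffset = offset;
    }

    // Called from a periodic timer; writes the in-memory offset to the ZK registry.
    public void commit() throws KeeperException, InterruptedException {
        byte[] data = Long.toString(lastConsumedOffset).getBytes(StandardCharsets.UTF_8);
        if (zk.exists(offsetPath, false) == null) {
            zk.create(offsetPath, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
            zk.setData(offsetPath, data, -1); // -1 = no version check
        }
    }
}
```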
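
Broker-side deletion can then be a simple periodic sweep that drops segment files older than the configured retention window, assuming a partition maps to a directory of segment files. `RetentionCleaner` is illustrative, not the broker's actual cleanup code.

```java
import java.io.File;
import java.time.Duration;

// Illustrative retention sweep: the broker never tracks what consumers have read;
// it simply deletes segment files whose last modification is older than the
// configured retention window (typically a few days).
public final class RetentionCleaner {
    public static void deleteExpiredSegments(File partitionDir, Duration retention) {
        long cutoff = System.currentTimeMillis() - retention.toMillis();
        File[] segments = partitionDir.listFiles(File::isFile);
        if (segments == null) {
            return; // directory missing or unreadable
        }
        for (File segment : segments) {
            if (segment.lastModified() < cutoff) {
                segment.delete(); // ignoring the return value keeps the sketch short
            }
        }
    }
}
```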
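
The rebalancing works without a coordinator because every consumer in a group runs the same deterministic assignment over the same sorted views of consumers and partitions read from ZK, so all members converge on identical ownership decisions. Below is a sketch of such a range-style assignment; `RangeAssignment` and its signature are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Deterministic range-style assignment: every consumer runs this locally over the
// same sorted views read from ZooKeeper, so all group members agree on ownership
// without talking to each other. Names here are illustrative, not Kafka's classes.
public final class RangeAssignment {
    public static List<String> partitionsFor(String consumerId,
                                             List<String> allConsumers,
                                             List<String> allPartitions) {
        List<String> consumers = new ArrayList<>(allConsumers);
        List<String> partitions = new ArrayList<>(allPartitions);
        Collections.sort(consumers);
        Collections.sort(partitions);

        int myIndex = consumers.indexOf(consumerId);
        int perConsumer = partitions.size() / consumers.size();
        int extra = partitions.size() % consumers.size();

        // The first `extra` consumers take one extra partition each.
        int start = myIndex * perConsumer + Math.min(myIndex, extra);
        int count = perConsumer + (myIndex < extra ? 1 : 0);
        return partitions.subList(start, start + count);
    }
}
```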
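
Finally, deduplication on the consuming side can be as simple as a bounded window of recently seen message identifiers, assuming each message can be given a stable id (for example, partition plus offset). This is an illustrative sketch, not a prescribed Kafka pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative consumer-side deduplication for at-least-once delivery: after a crash
// the offset in ZK may lag behind what was already processed, so a small window of
// messages can be redelivered. A bounded set of recently seen ids filters them out.
public class DedupingHandler {
    private static final int WINDOW = 100_000;

    // LRU-style set of recently processed message identifiers.
    private final Map<String, Boolean> seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
            return size() > WINDOW;
        }
    };

    public void handle(String messageId, byte[] payload) {
        if (seen.putIfAbsent(messageId, Boolean.TRUE) != null) {
            return; // duplicate from the redelivery window; drop it
        }
        process(payload);
    }

    private void process(byte[] payload) {
        // application-specific logic
    }
}
```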