A new company behind Kafka development and a fascinating new high-throughput messaging framework making waves inspired me to revisit the major ideas in Kafka's design. Its overall strategy is to maximize throughput through non-blocking, sequential access combined with decentralized decision making.
- A partition corresponds to a logical log, which is implemented as a sequence of approximately same-size physical segment files.
- New messages are appended to the current segment file.
- Each message is addressed by its global offset in the logical log (see the segment-lookup sketch after this list).
- A consumer reads from a partition file sequentially (pre-fetching chunks in the background).
- No application-level cache: Kafka relies on the file system page cache.
- The highly efficient sendfile Unix API (available in Java via FileChannel::transferTo()) is used to minimize data copying between application and kernel buffers (see the zero-copy sketch below).
- A consumer is responsible for remembering how much it has consumed, which avoids broker-side bookkeeping (see the offset-tracking sketch below).
- Messages are deleted by the broker after a configured retention period (typically a few days) for the same reason (see the retention sketch below).
- A partition is consumed by a single consumer (from each consumer group) to avoid synchronization.
- Brokers, consumers, and partition ownership are registered in ZooKeeper (ZK).
- Broker and consumer membership changes trigger a rebalancing process.
- The rebalancing algorithm is decentralized and relies on typical ZK idioms: each consumer deterministically computes its own partition assignment from the registered membership (see the assignment sketch below).
- Consumers periodically update the ZK registry with the last consumed offset.
- Kafka provides at-least-once delivery semantics. When a consumer crashes, the last consumed offset recorded in ZK may lag behind what was actually processed, so the consumer that takes over will re-deliver the events from that window.
- The event-consuming application itself is expected to apply deduplication logic if necessary (see the deduplication sketch below).
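
To make the offset addressing concrete, here is a minimal sketch of resolving a global offset to a segment file and a position within it, assuming each segment is indexed by the offset of its first entry. `SegmentIndex` and its method names are illustrative, not Kafka's actual classes.

```java
import java.io.File;
import java.util.Map;
import java.util.TreeMap;

// Illustrative index over a partition's segment files, keyed by each segment's
// base offset (the logical offset of the first entry it contains).
public class SegmentIndex {
    private final TreeMap<Long, File> segmentsByBaseOffset = new TreeMap<>();

    public void addSegment(long baseOffset, File segmentFile) {
        segmentsByBaseOffset.put(baseOffset, segmentFile);
    }

    // Resolve a global logical offset to (segment file, position within that segment).
    public SegmentPosition locate(long globalOffset) {
        Map.Entry<Long, File> entry = segmentsByBaseOffset.floorEntry(globalOffset);
        if (entry == null) {
            throw new IllegalArgumentException(
                "Offset " + globalOffset + " precedes the earliest retained segment");
        }
        return new SegmentPosition(entry.getValue(), globalOffset - entry.getKey());
    }

    public record SegmentPosition(File segment, long position) {}
}
```

Keeping segments sorted by base offset makes both the lookup and retention-based deletion (dropping the lowest entries) cheap.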
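
The "highly efficient Unix API" above is the sendfile(2) system call, which `FileChannel.transferTo()` maps to on Linux. Below is a minimal sketch of pushing a chunk of a segment file straight to a consumer socket without copying it through application buffers; `sendChunk` and its parameters are hypothetical.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public final class ZeroCopySend {
    // Stream `count` bytes of a segment file, starting at `position`, directly to the
    // consumer's socket. On Linux, transferTo() delegates to sendfile(2), so bytes move
    // from the page cache to the socket without an intermediate application buffer.
    public static void sendChunk(Path segmentFile, long position, long count,
                                 SocketChannel socket) throws IOException {
        try (FileChannel channel = FileChannel.open(segmentFile, StandardOpenOption.READ)) {
            long sent = 0;
            while (sent < count) {
                long n = channel.transferTo(position + sent, count - sent, socket);
                if (n <= 0) {
                    break; // nothing transferred (socket full or past EOF); a real broker would retry
                }
                sent += n;
            }
        }
    }
}
```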
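
Since the broker keeps no per-consumer bookkeeping, the consumer tracks its own position and periodically writes it to ZooKeeper. Here is a sketch using the standard ZooKeeper Java client; `OffsetTracker` and the znode path it writes to are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Illustrative consumer-side offset tracker: the broker keeps no per-consumer state,
// so the consumer remembers its position and periodically flushes it to ZooKeeper.
public class OffsetTracker {
    private final ZooKeeper zk;
    private final String offsetPath;      // assumed layout, e.g. /consumers/<group>/offsets/<topic>/<partition>
    private volatile long lastConsumedOffset;

    public OffsetTracker(ZooKeeper zk, String offsetPath) {
        this.zk = zk;
        this.offsetPath = offsetPath;
    }

    // Called by the consumer loop after a message (or batch) has been processed.
    public void markConsumed(long offset) {
        lastConsumedOffset = offset;
    }

    // Called from a periodic timer; writes the in-memory offset to the ZK registry.
    public void commit() throws KeeperException, InterruptedException {
        byte[] data = Long.toString(lastConsumedOffset).getBytes(StandardCharsets.UTF_8);
        if (zk.exists(offsetPath, false) == null) {
            zk.create(offsetPath, data, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        } else {
            zk.setData(offsetPath, data, -1); // -1 = no version check
        }
    }
}
```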
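
Broker-side deletion can then be a simple periodic sweep that drops segment files older than the configured retention window, assuming a partition maps to a directory of segment files. `RetentionCleaner` is illustrative, not the broker's actual cleanup code.

```java
import java.io.File;
import java.time.Duration;

// Illustrative retention sweep: the broker never tracks what consumers have read;
// it simply deletes segment files whose last modification is older than the
// configured retention window (typically a few days).
public final class RetentionCleaner {
    public static void deleteExpiredSegments(File partitionDir, Duration retention) {
        long cutoff = System.currentTimeMillis() - retention.toMillis();
        File[] segments = partitionDir.listFiles(File::isFile);
        if (segments == null) {
            return; // directory missing or unreadable
        }
        for (File segment : segments) {
            if (segment.lastModified() < cutoff) {
                segment.delete(); // ignoring the return value keeps the sketch short
            }
        }
    }
}
```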
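
The rebalancing works without a coordinator because every consumer in a group runs the same deterministic assignment over the same sorted views of consumers and partitions read from ZK, so all members converge on identical ownership decisions. Below is a sketch of such a range-style assignment; `RangeAssignment` and its signature are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Deterministic range-style assignment: every consumer runs this locally over the
// same sorted views read from ZooKeeper, so all group members agree on ownership
// without talking to each other. Names here are illustrative, not Kafka's classes.
public final class RangeAssignment {
    public static List<String> partitionsFor(String consumerId,
                                             List<String> allConsumers,
                                             List<String> allPartitions) {
        List<String> consumers = new ArrayList<>(allConsumers);
        List<String> partitions = new ArrayList<>(allPartitions);
        Collections.sort(consumers);
        Collections.sort(partitions);

        int myIndex = consumers.indexOf(consumerId);
        int perConsumer = partitions.size() / consumers.size();
        int extra = partitions.size() % consumers.size();

        // The first `extra` consumers take one extra partition each.
        int start = myIndex * perConsumer + Math.min(myIndex, extra);
        int count = perConsumer + (myIndex < extra ? 1 : 0);
        return partitions.subList(start, start + count);
    }
}
```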
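
Finally, deduplication on the consuming side can be as simple as a bounded window of recently seen message identifiers, assuming each message can be given a stable id (for example, partition plus offset). This is an illustrative sketch, not a prescribed Kafka pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative consumer-side deduplication for at-least-once delivery: after a crash
// the offset in ZK may lag behind what was already processed, so a small window of
// messages can be redelivered. A bounded set of recently seen ids filters them out.
public class DedupingHandler {
    private static final int WINDOW = 100_000;

    // LRU-style set of recently processed message identifiers.
    private final Map<String, Boolean> seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
            return size() > WINDOW;
        }
    };

    public void handle(String messageId, byte[] payload) {
        if (seen.putIfAbsent(messageId, Boolean.TRUE) != null) {
            return; // duplicate from the redelivery window; drop it
        }
        process(payload);
    }

    private void process(byte[] payload) {
        // application-specific logic
    }
}
```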