Nikita Dolgov's technical blog: Druid and Pinot low latency OLAP design ideas

0. Introduction

Imagine a data schema that can be described as (M, D1, .., Dn, {(t,v)}), where every entry has:

M - a measure (e.g. Site Visits)
Di - a few dimensions (e.g. Country, Traffic Source)
{(t,v)} - a time series with a measure value for each timestamp
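
As a rough illustration, one entry of such a schema could be modelled as below. This is only a sketch: the class name `Entry`, its field names, the `total` helper and the sample values are assumptions made for illustration and do not come from Druid or Pinot.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entry:
    """One (M, D1, .., Dn, {(t, v)}) entry; all names here are hypothetical."""
    measure: str                    # M, e.g. "Site Visits"
    dimensions: Dict[str, str]      # D1..Dn as name -> value
    series: List[Tuple[int, float]] = field(default_factory=list)  # {(t, v)}

    def total(self, start: int, end: int) -> float:
        """Sum the measure over the half-open time range [start, end)."""
        return sum(v for t, v in self.series if start <= t < end)


entry = Entry(
    measure="Site Visits",
    dimensions={"Country": "US", "Traffic Source": "Search"},
    series=[(0, 10.0), (60, 12.0), (120, 7.0)],  # timestamps in seconds
)
print(entry.total(0, 120))  # sums the first two points only
```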

Calling it time series OLAP could be a misnomer: in a typical OLAP schema time is just another dimension, while "time series" usually implies a bare metric without dimensions (e.g. server CPU usage as a function of time). Nevertheless, real-life systems that deal with such data are quite common, and they face similar challenges. Chief among them is the need for low-latency, interactive queries, so traditional Hadoop scalability is not enough. A couple of systems, less well known than Spark or Flink, seem to be good at solving the time series OLAP puzzle.

1. Druid ideas

For event streams, pre-aggregate small batches (at some minimum granularity e.g. 1 min)
Time is a dimension to be treated differently because all the queries use it
Partition on time into batches (+ versions), each to become a file
Each batch to have (IR-style) a dictionary of terms and a compressed bitmap for each term, so query filters compile into efficient binary AND/OR operations on large bitmaps
Column-oriented format for partitions
A separate service to decide which partitions are stored/cached by which historical data server
Metadata is stored as BLOBs in a relational DB
ZooKeeper (ZK) for service discovery and segment mappings
Historical and real-time nodes, queried separately by brokers, which merge the partial results into the final result
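
Two of the ideas above, pre-aggregation at a minimum granularity and per-term bitmaps, can be sketched in a few lines. This is a toy model under stated assumptions, not Druid's actual code or file format: the function names `rollup`, `build_bitmaps` and `matching_rows` are hypothetical, and a plain Python integer stands in for a compressed bitmap.

```python
from collections import defaultdict

GRANULARITY = 60  # seconds; the "1 min" minimum granularity mentioned above


def rollup(events):
    """Pre-aggregate raw events into one row per (minute, country, source)."""
    totals = defaultdict(int)
    for ts, country, source, visits in events:
        bucket = ts - ts % GRANULARITY  # truncate timestamp to the minute
        totals[(bucket, country, source)] += visits
    return [key + (value,) for key, value in sorted(totals.items())]


def build_bitmaps(rows, column):
    """Map each distinct term of a column to a bitmap of matching row ids.

    Bit i of the integer is set when row i contains the term. Real systems
    compress these bitmaps; this sketch leaves them uncompressed.
    """
    bitmaps = defaultdict(int)
    for row_id, row in enumerate(rows):
        bitmaps[row[column]] |= 1 << row_id
    return dict(bitmaps)


def matching_rows(bitmap):
    """Decode a bitmap back into a list of row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


events = [  # (timestamp, country, traffic source, visits)
    (5, "US", "Search", 1),
    (42, "US", "Search", 1),
    (50, "DE", "Email", 1),
    (61, "US", "Email", 1),
    (119, "DE", "Search", 1),
]
rows = rollup(events)            # five events collapse into four rows
country = build_bitmaps(rows, 1)
source = build_bitmaps(rows, 2)

# WHERE country = 'US' AND source = 'Search'
us_search = country["US"] & source["Search"]
# WHERE country = 'DE' OR source = 'Email'
de_or_email = country["DE"] | source["Email"]

print(matching_rows(us_search), matching_rows(de_or_email))
```

The filter never touches the rows themselves until the final bitmap is decoded, which is why this technique suits interactive queries.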

2. Pinot ideas

Based on Kafka and Hadoop MapReduce
Historical and real-time nodes, queried by brokers
Data segment: a fixed-size block of records stored as a single file
Time is special. For example, brokers know how to query realtime and historic nodes for the same query by sending each of them the query with a different time filter. Realtime is slower because it aggregates at a different granularity ("min/hour range instead of day")
ZK for service discovery
Realtime nodes keep segments in memory, flush them to disk periodically, and query both the in-memory and the flushed segments
Multiple segments (with the same schema) constitute a table
Column-oriented format for segments
A few indices per segment (e.g. forward, single-value sorted)
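
The "time is special" point can be illustrated with a toy broker that splits one query at a time boundary. This is a sketch under assumptions rather than Pinot's implementation: `query_node`, `broker_query` and the `boundary` parameter are hypothetical names, and each server is reduced to a list of (timestamp, value) pairs.

```python
def query_node(rows, start, end):
    """Stand-in for a server: sum values whose timestamp is in [start, end)."""
    return sum(v for t, v in rows if start <= t < end)


def broker_query(historic_rows, realtime_rows, start, end, boundary):
    """Split one query at `boundary` and merge the two partial sums.

    Historic nodes answer for [start, boundary) and realtime nodes for
    [boundary, end), so rows present on both sides are not counted twice.
    """
    historic_part = query_node(historic_rows, start, min(end, boundary))
    realtime_part = query_node(realtime_rows, max(start, boundary), end)
    return historic_part + realtime_part


# Timestamps are day numbers; day 2 is held by both kinds of node.
historic = [(0, 100), (1, 120), (2, 90)]
realtime = [(2, 90), (3, 40), (4, 15)]

total = broker_query(historic, realtime, start=1, end=5, boundary=3)
print(total)
```

Sending different time filters to the two kinds of node is what prevents day 2, which both sides hold, from being double counted.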

3. Common threads

Column-oriented storage formats
Partition on time
Information Retrieval/Search Engine techniques for data storage (inverted index)
ZK is a popular choice of discovery service
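
Partitioning on time pays off at query time because whole segments can be skipped without reading them. The sketch below shows that pruning step only; the `Segment` class and the `prune` and `query_sum` functions are hypothetical names chosen for illustration, and neither system's real segment metadata looks like this.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: int                   # inclusive lower bound of the time interval
    end: int                     # exclusive upper bound
    rows: List[Tuple[int, int]]  # (timestamp, value)


def prune(segments, start, end):
    """Keep only segments whose interval overlaps the query range [start, end)."""
    return [s for s in segments if s.start < end and start < s.end]


def query_sum(segments, start, end):
    """Return (sum of values in range, number of segments actually scanned)."""
    scanned = prune(segments, start, end)
    total = sum(v for s in scanned for t, v in s.rows if start <= t < end)
    return total, len(scanned)


segments = [  # one segment per hour, timestamps in seconds
    Segment(0, 3600, [(10, 1), (3599, 2)]),
    Segment(3600, 7200, [(3600, 4), (7000, 8)]),
    Segment(7200, 10800, [(7200, 16)]),
]
total, scanned = query_sum(segments, 3000, 4000)
print(total, scanned)  # the third segment is never scanned
```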

4. In other news

There are also smaller-scale products that use search engine technology for OLAP data storage.

Jun 12, 2016
