Apr 28, 2011

Mesos and inspiration for next generation Hadoop

The notion of Next Generation Hadoop (NGH) is somewhat blurred at this point. Recent announcements by YHOO and Facebook suggest at least two independent branches, each referred to as NGH. So even now that YHOO is merging all its development into the main Apache codeline, I am not sure how many NGHs are being developed right now or how convergent that process is.

The only technical description of the NGH I am aware of was written by YHOO engineers. Even though that post does not mention Mesos, I am pretty sure it is no coincidence that the NGH shares so much with it. It is also noteworthy that Mesos itself is now an Apache project.

A summary of "Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center":

Goal: when running multiple frameworks in the same cluster
  • improve utilization through statistical multiplexing
  • share datasets that are too expensive to replicate
Two-stage scheduling with resource offers
  • Mesos decides how many resources to offer each framework
  • Frameworks decide which resources to accept and which computations to run on them
Architecture
  • A fault-tolerant ZooKeeper-based master process manages slave daemons running on each cluster node
  • Slaves report to the master which vacant resources they have (e.g. {2CPUs,16GB})
  • A framework scheduler registers with the master
  • A framework scheduler is offered resources, decides which ones to use, and describes the tasks to launch on those resources
  • A framework executor is launched on slave nodes to execute tasks
  • Each resource offer is a list of free resources on multiple slaves
  • A pluggable master strategy decides how many resources to offer to each framework
  • Supports long tasks by allowing a set of resources on a slave to be designated for use by long tasks
  • Linux containers are used to isolate frameworks
Dominant Resource Fairness
  • Equalize each framework's fractional share of its dominant resource (i.e. the resource that it has the largest fractional share of)
  • Example: make F1's share of CPU equal F2's share of RAM if F1 is CPU-bound and F2 needs mostly memory
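
To make the DRF idea concrete, here is a minimal sketch of one way to get there: keep launching one task at a time for whichever framework currently has the lowest dominant share. The cluster size (9 CPUs, 18 GB), the per-task demands, and the name drf_allocate are all made up for illustration, and dropping a framework as soon as its next task does not fit is a simplification of mine, not something taken from the paper summary above.

```python
from fractions import Fraction


def drf_allocate(capacity, demands):
    """Toy DRF: repeatedly give one task to the framework with the lowest
    dominant share until no framework's next task fits.

    capacity: {resource: total units}           (illustrative numbers)
    demands:  {framework: {resource: units per task}}
    """
    used = {r: 0 for r in capacity}
    alloc = {f: {r: 0 for r in capacity} for f in demands}
    tasks = {f: 0 for f in demands}

    def dominant_share(f):
        # Largest fractional share this framework holds of any resource.
        return max(Fraction(alloc[f][r], capacity[r]) for r in capacity)

    active = set(demands)
    while active:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        f = min(sorted(active), key=dominant_share)
        need = demands[f]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
                alloc[f][r] += need[r]
            tasks[f] += 1
        else:
            # Simplification: stop considering a framework once a task no longer fits.
            active.remove(f)
    return tasks, {f: dominant_share(f) for f in demands}


# F1 is CPU-bound, F2 needs mostly memory.
tasks, shares = drf_allocate(
    {"cpu": 9, "mem_gb": 18},
    {"F1": {"cpu": 3, "mem_gb": 1}, "F2": {"cpu": 1, "mem_gb": 4}},
)
```

With these numbers F1 ends up with 2 tasks (6 of 9 CPUs) and F2 with 3 tasks (12 of 18 GB), so F1's share of CPU and F2's share of RAM are both 2/3.
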
Optimizations
  • Frameworks register filters with the master to short-circuit the rejection process (e.g. accept offers only from nodes on a given list, or only offers with at least a given amount of free resources)
  • For the purpose of allocation, count offered resources as used by the framework, which encourages the framework to respond quickly
  • Rescind an offer if the framework does not respond for too long
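
The two-stage scheduling and the filter optimization can be sketched in a few lines. This is a toy model, not the Mesos API: Offer, Framework, make_offers, passes_filter and schedule are hypothetical names, the master here simply offers every free slave (the real allocation policy is pluggable), and offer rescinding is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    slave: str
    cpus: int
    mem_gb: int


@dataclass
class Framework:
    name: str
    task_cpus: int
    task_mem_gb: int
    allowed_slaves: Optional[frozenset] = None  # hypothetical node-list filter

    def passes_filter(self, offer):
        # Filter registered with the master: node list and minimum free resources.
        if self.allowed_slaves is not None and offer.slave not in self.allowed_slaves:
            return False
        return offer.cpus >= self.task_cpus and offer.mem_gb >= self.task_mem_gb

    def schedule(self, offers):
        # Stage two: the framework decides what to run on the offered resources.
        launches = []
        for o in offers:
            n = min(o.cpus // self.task_cpus, o.mem_gb // self.task_mem_gb)
            launches.extend((o.slave, self.task_cpus, self.task_mem_gb) for _ in range(n))
        return launches


def make_offers(free, framework):
    # Stage one: the master decides what to offer. Applying the filter here
    # avoids sending offers that the framework would reject anyway.
    offers = [Offer(slave, cpus, mem) for slave, (cpus, mem) in sorted(free.items())]
    return [o for o in offers if framework.passes_filter(o)]


# Free resources reported by slaves, as (CPUs, GB) pairs.
free = {"s1": (2, 16), "s2": (1, 2), "s3": (4, 8)}
fw = Framework("F1", task_cpus=2, task_mem_gb=4)
offers = make_offers(free, fw)
launches = fw.schedule(offers)
```

Here s2 is never offered because it is too small for a single task, and the framework launches one task on s1 and two on s3.
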
