Feb 18, 2015

GridGain is now an Apache project

It's not exactly news, but it is still quite a recent stage in its evolution. Years ago, say in 2009, it was a rather unique product. There were a few popular data grids/distributed caches to choose from, but there was no alternative open source computational grid. Moreover, GG was easy to use and well documented. Its code base was well structured as a consequence of a highly pluggable architecture. For a small team with roots in a peculiar location, it was a very impressive product.

I am not sure why GG is not more popular. It's not exactly obscure, but it certainly feels like a niche system. From the core API it's clear that the original MapReduce had some influence. Given how generic most of the GG core is, that influence probably became a limiting factor later. One really important piece missing from their architecture was an explicit scheduler. Without one, GG could not be repurposed as a truly generic foundation for something akin to "sql-on-hadoop".

This also raises a question. Did they fail to pivot in that direction? They had a lot of experience with middleware. They were smart enough to appreciate Scala extremely early. Is it conceivable that they might have come up with something like Spark? Did they lack database internals expertise? Did they lack the resources to compete with Internet companies? There is very little reuse in this space. Netty and ANTLR are probably the only libraries shared by the Prestos/Hives/Drills of this world. So apparently nobody needs common middleware. The Impalas of this world try to bypass even most of Hadoop.

I am curious about GG's future. They went back and forth on their open source/community edition strategy, and at times it looked like desperation. I believe they initially released the community edition simultaneously with the enterprise one, then separated the two releases by almost a year. Their GitHub repository was just a backup for development done somewhere else. And now it's all an Apache project. Could it be a new beginning?

It would be interesting to know how people use GG in 2015, when the market is so oversaturated with "grid/cluster" frameworks. For example, the analytics and Big Data space is big, but it is owned by Hadoop & Co. and is being seriously attacked by Spark. Event-oriented/stream processing is a contested space now, but I have never heard GG mentioned in that context. If you squint hard enough, distributed actors in Akka look similar to GG jobs. I hardly ever see GG in job ads or on people's resumes. Is there a market segment large enough left for GG, and if so, what is it?
