Sep 24, 2018

Generating akka-http endpoints from protobuf declarations

When I started working with akka-http last winter I was surprised that there was no better alternative to hand-crafting RESTful endpoints. The gRPC Gateway prototype inspired me to do some experimenting. Back then I was still hoping to convince people at work to avoid REST. Recently I wanted to share what I had learned with a colleague, and that pushed me to finish the first actually working version.

There are a couple of interesting tidbits for protobuf connoisseurs in general. I picked them up while digging into the gateway project. To begin with, the protobuf specification has a description written in protobuf itself. It matters because the protoc compiler supports plugins, which in turn operate in terms of those meta-level protobufs. So you don't have to start your protobuf transpiler from scratch. The ScalaPB project did all the hard work of integrating with the protoc toolchain. I am still afraid of my SBT file, which I had to mostly borrow.
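
To give a feel for the mechanics, here is a minimal sketch of the raw plugin contract using the plain protobuf-java classes (ScalaPB's tooling hides most of this handshake; the generated file content below is just a dummy):

    import com.google.protobuf.compiler.PluginProtos.{CodeGeneratorRequest, CodeGeneratorResponse}
    import scala.jdk.CollectionConverters._

    object ProtocPluginSketch {
      def main(args: Array[String]): Unit = {
        // protoc sends a CodeGeneratorRequest with the meta-level descriptors on stdin...
        val request = CodeGeneratorRequest.parseFrom(System.in)

        val response = CodeGeneratorResponse.newBuilder()
        for {
          file    <- request.getProtoFileList.asScala  // one FileDescriptorProto per .proto file
          service <- file.getServiceList.asScala       // the service declarations we care about
        } {
          response.addFileBuilder()
            .setName(s"${service.getName}Endpoint.scala")
            .setContent(s"// a generated skeleton for service ${service.getName} would go here\n")
        }

        // ...and expects a CodeGeneratorResponse with the generated files on stdout
        response.build().writeTo(System.out)
      }
    }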

Secondly, the protobuf specification defines a somewhat obscure feature known as extensions. GOOG itself used an extension to define a mapping of RESTful API elements onto protobuf declarations. It was a pleasant surprise to finally find a standard. You take a protobuf service declaration, augment it with an option clause following clear guidelines, and all of a sudden your still valid protobuf file can be a player in the RESTful world too.

All that remains is to walk an AST comprised of protobuf descriptor classes and generate some code. On the one hand, it is easier than writing a query parser. On the other hand, dealing mostly with Strings feels less structured. The current approach was borrowed from the gateway project. I am unhappy with its structure but don't have a good enough alternative yet. Nevertheless the code is straightforward and can be extended.

In our case the generated code will be a skeleton of an akka-http service. It starts with some boilerplate Akka initialization and then defines HTTP request processing directives. To illustrate the use of the code generator there is a small sample service project. It's pretty self-explanatory once you look at the protobuf file.
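
For a rough idea of the kind of skeleton the generator aims for, here is a simplified, hand-written approximation; the path, payload handling, and names are stand-ins rather than actual generated output:

    import akka.actor.ActorSystem
    import akka.http.scaladsl.Http
    import akka.http.scaladsl.server.Directives._
    import akka.http.scaladsl.server.Route
    import scala.concurrent.Future

    object GeneratedEndpointSketch {
      // an HTTP request processing directive delegating to the service logic
      def route(handler: String => Future[String]): Route =
        path("v1" / "echo") {
          post {
            entity(as[String]) { body =>
              complete(handler(body))
            }
          }
        }

      // the boilerplate Akka initialization part
      def main(args: Array[String]): Unit = {
        implicit val system: ActorSystem = ActorSystem("generated-endpoint")
        Http().newServerAt("localhost", 8080).bind(route(body => Future.successful(body)))
      }
    }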

I learned a few things from this PoC:

  • the protoc toolchain is not particularly nice to JVM-based code; thank God for ScalaPB but your SBT file will be rather cryptic anyway
  • now that we know how to write a protoc plugin, the actual transpiler logic is quite easy to write
  • the protobuf-to-REST mapping is very reasonable but has edge cases such as nested messages
One point of contention is streaming. It's well known that gRPC streaming is not synonymous with bulk data transfer. The former operates on individual messages, the latter deals with sending a large file without buffering the entire file in a byte array. The former is represented by some ScalaPB-generated case classes, the latter typically looks like an akka-http Source. So there is no standard way to express in protobuf the need to generate anything akka-streaming-related.
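
To make the contrast concrete, here are the two shapes side by side; the message types are made up for illustration:

    import akka.NotUsed
    import akka.stream.scaladsl.Source
    import akka.util.ByteString
    import io.grpc.stub.StreamObserver

    final case class ListRequest(prefix: String)
    final case class Item(name: String)

    trait StreamingShapes {
      // gRPC server streaming, roughly the shape ScalaPB generates: a stream of protobuf messages
      def listItems(request: ListRequest, responseObserver: StreamObserver[Item]): Unit

      // bulk transfer in akka-http terms: a stream of raw bytes, no per-message structure
      def downloadFile(name: String): Source[ByteString, NotUsed]
    }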

Aug 28, 2018

Transactional saga in Scala IV

Part V : Mapping future

.. in which Japanese researchers and a mad Russian scientist continue

With a reasonable Scala implementation and its Java approximation, the original problem was solved. What kept nagging me was a feeling that something "monadic" could be possible. Sagas are not new, and a lot of functional idioms are about sequences. Surely some prior art must exist.

As luck would have it, I found an interesting paper almost immediately. It was both very relevant (see "2.1 Atomic Action and Workflow") and accompanied by Scala code. The paper goes on to consider some advanced features, but it starts from the key idea of applying the Continuation Monad to chaining saga transactions. The bad news is that the original code relies on scalaz and does not work with Futures.

At first I stared at the code in despair. Then I remembered that I happen to sit nine feet from a legit mad scientist who is really into FP. He got interested in my use case and graciously adapted the code to the problem at hand. Behold the monadic version:
If you are like me, it'll take you a while to meditate on the simplified code above. First, continuation machinery is defined; then a means of converting an action into a continuation is provided (notice the "a: => Future[A]" syntax); and finally our favorite saga is actually composed.
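
A rough approximation of that structure, with made-up transaction names rather than the ones from the gist, might look like this:

    import scala.concurrent.{ExecutionContext, Future}

    // Cont is a Future-based continuation: given "the rest of the saga" (a callback
    // A => Future[R]), it produces the overall Future[R]
    final case class Cont[R, A](run: (A => Future[R]) => Future[R]) {
      def flatMap[B](f: A => Cont[R, B]): Cont[R, B] =
        Cont(k => run(a => f(a).run(k)))
      def map[B](f: A => B): Cont[R, B] =
        flatMap(a => Cont(k => k(f(a))))
    }

    object Cont {
      // lift a lazy transaction (note the "a: => Future[A]" by-name parameter) into a
      // continuation; if the rest of the saga fails, the compensation runs and the
      // original failure is re-thrown
      def liftTx[R, A](a: => Future[A], compensate: A => Future[Unit])
                      (implicit ec: ExecutionContext): Cont[R, A] =
        Cont { k =>
          a.flatMap { value =>
            k(value).recoverWith { case e =>
              compensate(value).flatMap(_ => Future.failed(e))
            }
          }
        }
    }

    object MonadicSagaSketch {
      // hypothetical transactions, stubbed for illustration
      def uploadTemp(): Future[String]                      = Future.successful("/tmp/file")
      def deleteTemp(path: String): Future[Unit]            = Future.unit
      def registerInDb(path: String): Future[Long]          = Future.successful(42L)
      def unregister(id: Long): Future[Unit]                = Future.unit
      def moveToFinal(path: String, id: Long): Future[Unit] = Future.unit

      // the saga is composed by chaining continuations; compensations fire in
      // reverse order if a later step fails
      def saga()(implicit ec: ExecutionContext): Future[Unit] = {
        val chain: Cont[Unit, Unit] =
          Cont.liftTx[Unit, String](uploadTemp(), deleteTemp).flatMap { path =>
            Cont.liftTx[Unit, Long](registerInDb(path), unregister).flatMap { id =>
              Cont.liftTx[Unit, Unit](moveToFinal(path, id), _ => Future.unit)
            }
          }
        chain.run(_ => Future.unit)
      }
    }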

Now about that continuation monad. Apparently it is classical but not really discussed even in the red book. Enough people at work were confused about it that the very same scientist came up with a lengthy explanation. I can't recommend it enough. Also, now I know that implementing a monad in Scala requires three methods that satisfy the monadic laws.

This post is unusual because I don't fully understand the stuff I am writing about. I am still trying to wrap my head around it. My lame excuse is that very few people at work are comfortable with this code. It has nice type safety and composability, but writing the monadic wiring from scratch would be a stretch for me. Lots of intellectual stimulation, but whether to merge something like that into a real codebase is a very open question for me.

What is worse is that for those who are unprepared (empirically, the vast majority), the core of this code is unmaintainable, pretty much "write-only" incantations. Concurrency can be hard, but one can always go read JCiP one more time. For this stuff there is no bible. As a matter of fact, I know of only one book that successfully tries to put FP to practical use, but it's too basic anyway.

P.S.
Speaking of monads and scientists, if you are into akka-http go look at the pull request making Directives monadic. Personally I expected the Akka guys to be much more impressed.

Aug 14, 2018

Transactional saga in Java

Part IV : Intermezzo

.. in which we try and fail exceptionally to have Futures more complete

Having developed the third Scala version, I got curious about doing the same in Java. I more or less expected CompletableFuture to be a more verbose flavor of the Scala Future. It had also been a year since I stopped writing Java, and I wanted to check how I felt about it. If you remember the last Scala iteration, the following should look familiar:


At first glance "thenCompose" is like "flatMap" and "exceptionally" resembles "recoverWith". Close but no cigar. You cannot return a CompletableFuture from a function taken by CF::exceptionally. It does not cancel the rest of the saga execution either. Not even in JDK 10. So I had to throw an exception and manually roll back transactions the way it worked in the second Scala version. It's tolerable but not exactly elegant.

One lesson here is that Scala Futures are at least a little more powerful than the Java ones. Once you get used to monadic composition, two dozen Java signatures do not look all that readable anymore. While typing Java code I also noticed how tedious it is in comparison with Scala. All the trivial bits such as constructors, semicolons, and the lack of local type inference result in constant friction. I am afraid I don't feel like writing Java anymore.

Aug 10, 2018

Transactional saga in Scala III

Part III : Back to the Future

.. in which we return to a Futures-based design but this time compose them idiomatically

Now that a naive Future-based version is out, it's time to make it look idiomatic in the third iteration. I was thinking about composing Futures with flatMap. I did not know how to chain multiple transaction rollbacks though. What I learned in the code review was unexpected.

A colleague caught me by surprise by showing that in a chain of Futures a recoverWith clause fires not just when the Future it is attached to fails but also when any of the preceding Futures in the chain does.
Another small difference is the nested exception class. The problem it solves is logging only the transaction that actually failed; without it, every rolled-back transaction would be logged as well. This cosmetic improvement doubles the size of the recoverWith body though; without the wrapper this version looks really slick.
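
Here is a small sketch of how that property can drive the rollbacks; the transaction names and signatures are made up, this is not the code from the pull request:

    import scala.concurrent.{ExecutionContext, Future}

    // a marker so that handlers further up the chain know the original failure
    // has already been dealt with (and logged) downstream
    final case class SagaFailed(original: Throwable) extends RuntimeException(original)

    def saga(upload: () => Future[String],
             deleteTemp: String => Future[Unit],
             register: String => Future[Long],
             unregister: Long => Future[Unit],
             move: (String, Long) => Future[Unit])
            (implicit ec: ExecutionContext): Future[Long] =
      upload().flatMap { tmpPath =>
        register(tmpPath).flatMap { id =>
          move(tmpPath, id)
            .map(_ => id)
            .recoverWith { case e =>            // move failed: undo the DB row
              unregister(id).flatMap(_ => Future.failed(SagaFailed(e)))
            }
        }.recoverWith { case e =>               // register or anything after it failed: undo the upload
          val cause = e match {
            case wrapped: SagaFailed => wrapped // already wrapped downstream
            case other               => SagaFailed(other)
          }
          deleteTemp(tmpPath).flatMap(_ => Future.failed(cause))
        }
      }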

At this point I was happy with my pull request. But one thought spoiled the pleasure. We are dealing with a sequence of monadic abstractions. We use flatMap. Could there be some truly monadic approach to composing sagas? Yes, we are finally leaving common sense territory and going to some controversial places.

Aug 9, 2018

Transactional saga in Scala II

Part II : Future Imperfect

.. in which Future-based signatures are introduced but the Futures are orchestrated naively 

Now that the problem is understood, it's time to consider the second iteration. This time all transactions return Futures. It is very convenient from the perspective of building a reactive service. Compensation actions still return Tries, partly for the sake of simplicity, partly to reduce the potential for additional errors in the error handling path. The whole saga is wrapped into a Promise to make it possible for the client to wait for completion.
As you can see, the conversion to Futures was done rather literally. On the positive side, it's so imperative one can easily see how it works. On the other hand, it looks more like Java. Stay tuned for what I learned during the code review.
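
For reference, a trimmed-down sketch of that naive shape (hypothetical names, two transactions only):

    import scala.concurrent.{ExecutionContext, Future, Promise}
    import scala.util.{Failure, Success, Try}

    // transactions return Futures, the compensation returns a Try,
    // and a Promise exposes overall completion to the client
    def runSaga(tx1: () => Future[String],
                tx2: String => Future[Long],
                undo1: String => Try[Unit])
               (implicit ec: ExecutionContext): Future[Long] = {
      val done = Promise[Long]()
      tx1().onComplete {
        case Failure(e) => done.failure(e)
        case Success(a) =>
          tx2(a).onComplete {
            case Success(b) => done.success(b)
            case Failure(e) =>
              undo1(a)                          // compensation result ignored for brevity
              done.failure(e)
          }
      }
      done.future
    }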

Aug 8, 2018

Transactional saga in Scala I

Part I : Tried But Not True

.. in which we come across a real-life problem, establish its context, and try a naive approach to set the stage for the following posts

Recently I faced a practical problem at work which resulted in four iterations of a promising idea. I found the process instructive enough to share some details with the world. Individual iterations may not be particularly interesting, but it's relatively unusual to see a head-to-head comparison of this kind. In this and the following posts I am going to embed only skeleton implementations from a gist. If you are interested in the details, please see the real code with tests.

The problem I was working on was making an operation spanning HDFS and an RDBMS transactional. Imagine a (micro)service dealing with file uploads. In a typical flow you first put a file into a temporary location, then register it in the DB, and finally move the file to its real location based on a version-like value returned by the DB. We've got three atomic transactions. Either they all succeed, or the ones that have been executed by the time of a failure must be rolled back.

My first and, frankly, only idea was to use sagas. It sounds epic :) but it was not supposed to be about fancy CQRS frameworks. I was just looking for a clean way to compose a few basic Scala types such as Try or Future. Also notice that the entire operation is stateful - each transaction returns some values and usually uses the values produced by previously executed transactions. We'll refer to that state as the transaction context. We can represent the basic abstractions like this


  • TxContext represents shared state with type-safe access
  • ObjectStore represents operations with HDFS files; some of them return a Future because input could be an akka-stream Source, others take an array and so return a Try
  • ObjectCatalog represents operations with RDBMS rows


My first iteration was about sketching the general flow of a saga. To begin with there were individual transactions doing actual work. Then there was some looping machinery to execute them and support automatic rollback. 
The ObjectStoreTx trait represents a transaction that can be executed with a context or rolled back. The TryListSaga object implements a loop that executes transactions from a list sequentially and can trigger rollback in case of error.
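
Roughly, the shape is the following; the signatures are my reconstruction rather than the exact ones from the gist:

    import scala.util.{Failure, Success, Try}

    // a minimal stand-in for the shared, type-safe transaction context
    final class TxContext {
      private val values = scala.collection.mutable.Map.empty[String, Any]
      def put[A](key: String, value: A): Unit = values.update(key, value)
      def get[A](key: String): Option[A]      = values.get(key).map(_.asInstanceOf[A])
    }

    trait ObjectStoreTx {
      def execute(ctx: TxContext): Try[Unit]  // do the work, record results in the context
      def rollback(ctx: TxContext): Unit      // best-effort compensation
    }

    object TryListSaga {
      // run transactions in order; on the first failure roll back the ones
      // that already succeeded, most recent first
      def run(txs: List[ObjectStoreTx], ctx: TxContext): Try[Unit] = {
        val (executed, outcome) =
          txs.foldLeft((List.empty[ObjectStoreTx], Try(()))) {
            case ((done, Success(_)), tx) =>
              tx.execute(ctx) match {
                case Success(_)     => (tx :: done, Success(()))
                case f @ Failure(_) => (done, f)
              }
            case (stopped, _) => stopped      // a previous step failed, skip the rest
          }
        outcome match {
          case f @ Failure(_) => executed.foreach(_.rollback(ctx)); f
          case ok             => ok
        }
      }
    }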

One obvious problem that the code surfaced is very common in Scala if you mix APIs based on Futures and Tries. Not only do they not compose easily, but "waiting on a Future" should also happen only at the top level. I had to find a way to rely on Futures everywhere.

Apr 16, 2018

Spark Data Source V2 - core interactions

With the general idea of the Spark Data Source API v2 understood, it's time to take another step towards working code. Quite some time ago I was experimenting with the original Data Source API. It's only natural to see how that idea would look after the major API overhaul. I make the same assumptions as before, so it's more of a prototype illustrating key interactions.

Write path


In the write path a new Dataset is created from a few data rows generated by the client. An application-specific Lucene document schema is used to mark which columns to index, store or both. The Dataset is repartitioned and each partition is stored as an independent Lucene index directory.



  • Client - creates a new Dataset from a set of rows; configures data source-specific properties (Lucene document schema and the root directory for all index partitions)
  • LuceneDataSourceV2 - plugs LuceneDataSourceV2Writer into the standard API
  • LuceneDataSourceV2Writer - creates a new LuceneDataWriterFactory; on commit saves the Lucene schema into a file common for all partitions; on rollback would be responsible for deleting created files
  • LuceneDataWriterFactory - is sent to worker nodes where it creates a LuceneDataWriter for every partition and configures it to use a dedicated directory
  • LuceneDataWriter - creates a new Lucene index in given directory and writes every partition Row to the index
  • LuceneIndexWriter - is responsible for building a Lucene index represented by a directory in the local file system; it's basically the same class as used in my old experiment

Read path


In the read path a previously created directory structure is parsed to find the Lucene schema and then load a Dataset partition from every subdirectory. Optionally, (A) the data schema is pruned to retrieve only requested columns and (B) supported filters are pushed to the Lucene engine to retrieve only matching rows.


  • Client - loads a previously written Dataset from a given root directory
  • LuceneDataSourceV2 - plugs LuceneDataSourceV2Reader into the standard API
  • LuceneDataSourceV2Reader - loads Lucene schema from the root directory, prunes it to align with the schema requested by Spark; in case of predicate pushdown, recognizes supported filter types and lets the Spark engine know; scans the root directory for partition subdirectories, creates LuceneReadTask for each
  • LuceneReadTask - is sent to worker nodes where it uses a LuceneIndexReader to read data Rows from the corresponding Lucene index
  • LuceneIndexReader - wraps Lucene index access; pushes the predicates chosen by the LuceneDataSourceV2Reader to the Lucene engine
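
For completeness, this is roughly how a client would exercise both paths; the format name and option keys are placeholders, not the ones from the actual repo:

    import org.apache.spark.sql.SparkSession

    object LuceneDataSourceExample {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("lucene-dsv2").getOrCreate()
        import spark.implicits._

        val rows = Seq(("alice", 42), ("bob", 7)).toDF("name", "value")

        // write path: repartition and let each partition become a Lucene index directory
        rows.repartition(2)
          .write
          .format("com.example.lucene")        // placeholder data source name
          .option("path", "/tmp/lucene-index") // placeholder option key
          .save()

        // read path: column pruning and (supported) filter pushdown happen in the reader
        spark.read
          .format("com.example.lucene")
          .option("path", "/tmp/lucene-index")
          .load()
          .select("name")
          .where($"value" > 10)
          .show()

        spark.stop()
      }
    }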

The first impression is that the new API is actually an API. It has a clear structure and can be extended with additional functionality. We can already see exciting work happening in 2.4 master. 

Mar 28, 2018

Integrating AWS SDK v2 with Akka streams

It's been a while since I last used AWS S3 a lot. In the meantime AWS came up with a new SDK and I started working with Scala and Akka full time. The SDK is still in beta but it's hard to imagine it'll take them more than a few more months to finish. Akka is still convenient but chock-full of APIs at different levels of complexity. Among other things I am researching how some of them could play together nicely. While digging, invariably a nugget or two can be found.

I wanted to plug S3 file download functionality from the SDK into a code path based on Akka streams. The former is all about CompletableFutures, the latter is about Source/Sink abstractions wired together with mostly built-in stages. Both are relatively large and self-sufficient frameworks with their own thread pools and coding idioms. I found one easy way to make them work together.

The trick is to notice that the AsyncResponseHandler returns a reactive streams Publisher. Yes, a third famous API in a file of less than a hundred lines. Akka can create a Source from a Publisher. It was also an unusual opportunity to use a Scala Promise (a quick reminder - in Scala a Promise provides write access to the state that a Future exposes for reading). The idea, sketched after the list, is
  • as is custom in Scala, a time consuming operation is wrapped into a Future
  • a Promise is created before a download operation is started
  • when the download operation is started, the Promise is used to create a Future that is returned to the client
  • when the download operation finishes, the Promise is completed successfully
  • there is also the error case where the Promise is used to convey the error to the client
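
Here is the promised sketch of the bridging part only, with the AWS-specific handler wiring left out; the callbacks stand in for whatever the real response handler invokes:

    import java.nio.ByteBuffer
    import org.reactivestreams.Publisher
    import akka.NotUsed
    import akka.stream.scaladsl.Source
    import akka.util.ByteString
    import scala.concurrent.{Future, Promise}

    final class DownloadBridge {
      private val done = Promise[Long]()        // created before the download is started

      // the Future handed to the client as soon as the download starts
      def result: Future[Long] = done.future

      // called when the SDK exposes the response body as a reactive-streams Publisher
      def bytesSource(publisher: Publisher[ByteBuffer]): Source[ByteString, NotUsed] =
        Source.fromPublisher(publisher).map(ByteString(_))

      // called when the transfer finishes with the byte count, or fails
      def onFinished(bytes: Long): Unit   = done.success(bytes)
      def onError(error: Throwable): Unit = done.failure(error)
    }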

A unit test illustrates how the created Akka Source can be used to actually transfer some data using the built-in Akka file Sink. I use a small S3 file made publicly available by some kind strangers.

My first impression from this exercise is that it is reasonable but feels a little unnatural. Yes, the code itself is quite concise and the performance penalty must be negligible. But the j.u.c ecosystem is mostly alien to idiomatic Scala.  

Jan 10, 2018

Adapting a gRPC service to REST with ScalaPB Gateway

Now that we have seen a Scala gRPC service and a Scala RESTful service, it's only natural to wonder if there is any middle ground. You want to have a nice and performant backend, but you could also have a lot of RESTful legacy code or be really into curl. Your wish is my command.

The first piece of the puzzle is the fact that GOOG has a standard for mapping REST APIs onto protobuf declarations. With no hacks or any hard thinking you can augment your existing proto files to be REST-friendly.

The second piece is making use of such a mapping. Apparently there is a Go implementation of what is known as the gRPC Gateway. It can work with any gRPC service and so could be used with the ones implemented in Scala. But there is another, arguably more appealing option. The ScalaPB library we met before can generate a Scala RESTful gateway too. I guess it's not as well tested or supported, but then again there is not that much code behind it to stand in the way of serious use.

So I went ahead and updated my two-service gRPC example to support ScalaPB REST Gateway generation. It intentionally borrows a lot from the pure gRPC example so that the difference can be easily seen by comparing files. Technically, you still need exactly the same gRPC service implementation as before. The proto files are almost the same.

There are two new objects. To begin with we need a new abstraction to wire together the generated Gateway code. It's essentially a gRPC client with hooks to plug the RESTful stuff into it. The other is just a typical HTTP client. I followed the path of least resistance and used Jackson for JSON marshaling and commons-http for sending requests. There is a little wiring but in general the code speaks for itself.
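
The HTTP client half might look roughly like this, assuming the Apache HttpClient flavor of "commons-http" and a made-up request payload:

    import com.fasterxml.jackson.databind.ObjectMapper
    import com.fasterxml.jackson.module.scala.DefaultScalaModule
    import org.apache.http.client.methods.HttpPost
    import org.apache.http.entity.{ContentType, StringEntity}
    import org.apache.http.impl.client.HttpClients
    import org.apache.http.util.EntityUtils

    object RestGatewayClient {
      private val mapper = new ObjectMapper().registerModule(DefaultScalaModule)

      final case class EchoRequest(message: String) // placeholder payload

      def post(url: String, request: EchoRequest): String = {
        val http = HttpClients.createDefault()
        try {
          val httpPost = new HttpPost(url)
          httpPost.setEntity(
            new StringEntity(mapper.writeValueAsString(request), ContentType.APPLICATION_JSON))
          val response = http.execute(httpPost)
          try EntityUtils.toString(response.getEntity)
          finally response.close()
        } finally http.close()
      }
    }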

What I learned from this exercise is that making gRPC co-exist with REST is easy. I am not sure if running a Gateway locally for each service instance is a better idea than creating a single API Gateway service to front the rest of the backend.

As a parting note, I am surprised there is no other IDL for describing RESTful services for the purpose of code generation. With a standard AST and parser library it would be quite straightforward to generate REST endpoints for any popular service framework. It's equally surprising that the Akka team does not have a generator of akka-http endpoints from protobuf declarations. It looks like such a natural step for them and the ScalaPB project even has the foundation in place.

Jan 9, 2018

Asynchronous RPC server with Akka HTTP

At work my team maintains a number of RESTful services. The good news is that they are implemented with akka-http which makes things tolerable. The bad news is that it's all about REST and JSON. So we are having a lively debate about a possible migration to something more like gRPC.

Personally, I am not a fan of non-binary protocols and data formats. At least partially it is related to my experience in building analytics systems where any query routinely requires a lot of data. Yes, for the time being you have to use REST/JSON to communicate between the browser and your API Gateway. But using the same approach to make your backend services talk to each other never made sense to me. I am really looking forward to the day when WASM/HTTP2&Co finally make front end development great again.

In the meantime I would like to see if I can gather enough evidence in favor of my position. So in the spirit of my recent gRPC experiment I am going to implement basically the same API with REST. As a bonus, I want to mention a couple of observations I made using akka-http. As usual, the code is on github. I continue the tradition of implementing a service with two endpoints. I even mostly follow the same code structure as before with gRPC.


The picture above sketches high-level interactions. There are a few noteworthy Akka-related details inside but nobody will be surprised by another RESTful service implementation.
  • TestServiceA - is where the actual logic would be. It also hosts a trait and case classes which would be generated from a protobuf file in case of gRPC
  • HttpEndpointA - an actual HTTP endpoint that plugs service logic into HTTP network endpoint. A good example of Akka HTTP expressiveness.
  • AkkaHttpServer - the service implementation entry point. In typical Scala fashion it directly implements the main method and performs basic Akka dependency wiring. It is also responsible for binding request handlers to the underlying HTTP server.
  • Http - technically an object in the high-level Akka API; it represents the usual functionality of request/response-style interactions over HTTP
  • AkkaHttpClient - not shown; wraps akka-http-based client logic. I spent some time getting it right because client-side examples are harder to find. In particular it illustrates how to use Spray JSON marshaller. But it could be any HTTP client capable of sending a POST request with a JSON body. 
If you are curious enough to look at the code, I would like to highlight a few details useful in real life. Not all of them are prominent enough in usual examples. The HttpEndpointA class would be a good illustration of most of them; a condensed sketch follows the list below.
  • ExceptionHandler - you can configure a handler for exceptions uncaught otherwise. It is the last line of defense against unexpected problems. 
  • RejectionHandler - such a handler is very useful if you want to explicitly deal with invalid requests (e.g. garbled JSON or more likely a JSON body representing a request to some other service)
  • request timeout - even though Akka has a default global property ("akka.http.server.request-timeout") you could configure it explicitly for a service that is expected to take an unusually long time 
  • implicit JSON (de)serialization with Spray - it's very convenient to use but rather puzzling initially to configure. In addition to the marshaller object holding implicit vals notice how the "import ServiceAJsonMarshaller" statement makes it usable by "as" and "complete" directives
  • composing multiple Routes - different styles are possible here; if there were another URL to support in this endpoint we could declare another "def process2: Route" and then compose the two with "process ~ process2" using another somewhat cryptic Akka directive
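
Here is the promised condensed sketch pulling those points together; the names, path, and payloads are illustrative, not copied from HttpEndpointA:

    import akka.http.scaladsl.marshallers.sprayjson.SprayJsonSupport._
    import akka.http.scaladsl.model.StatusCodes
    import akka.http.scaladsl.server.Directives._
    import akka.http.scaladsl.server.{ExceptionHandler, RejectionHandler, Route}
    import spray.json.DefaultJsonProtocol._
    import spray.json.RootJsonFormat
    import scala.concurrent.Future
    import scala.concurrent.duration._

    object SketchEndpoint {
      final case class Ping(message: String)
      final case class Pong(message: String)
      implicit val pingFormat: RootJsonFormat[Ping] = jsonFormat1(Ping)
      implicit val pongFormat: RootJsonFormat[Pong] = jsonFormat1(Pong)

      // the last line of defense against unexpected problems
      private val exceptionHandler = ExceptionHandler {
        case e: Exception => complete(StatusCodes.InternalServerError -> e.getMessage)
      }
      // a custom handler would go here to deal with garbled or foreign JSON explicitly
      private val rejectionHandler = RejectionHandler.default

      def route(logic: Ping => Future[Pong]): Route =
        handleExceptions(exceptionHandler) {
          handleRejections(rejectionHandler) {
            path("ping") {
              post {
                withRequestTimeout(10.seconds) {   // explicit per-route timeout
                  entity(as[Ping]) { ping =>       // implicit Spray JSON unmarshalling via "as"
                    complete(logic(ping))          // and marshalling back via "complete"
                  }
                }
              }
            }
          }
        }
        // a second URL would be another Route composed with the ~ directive
    }
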
On the client side there are a few more details to mention. My first reaction was that Akka HTTP client code looks somehow less straightforward than what one would see with, say, commons-http. Any HTTP client would work for this example but I wanted to see one with akka-http to better grok the API.
  • JSON marshalling with Spray was a puzzle again. It was not clear to me from the official documentation but googling helped.
  • Somewhat unusually, the client requires the same ActorSystem and Materializer infrastructure as the server. 
My conclusion is that if circumstances force me to implement anything RESTful, I will reach for Akka HTTP first. In comparison with real RPC, though, it's more work to produce less safe code. But that's a topic for another day.

Jan 1, 2018

gRPC in Scala with ScalaPB

Curiously enough my old post on gRPC still gets disproportionately many hits. Now that I work with Scala full time I was curious to see how my old Java example would compare with a Scala version. Spoiler alert: I don't have that much new to report. Other people covered it better before. This post probably makes more sense as a head-to-head comparison between Java and Scala versions.

From what I see, Scala people tend to avoid Java libraries. I am skeptical it's a good idea, especially around networking and concurrency where the Java ecosystem is at its best. But I am still learning and have no strong opinions yet. After all, the last time I used Java, j.u.c Futures, Guava Futures, and CompletableFutures were all still in the game.

In case of gRPC the unofficial Scala code generator is known as ScalaPB. Once you configure it in your SBT project it feels as natural as the official Java version. The good news is that the generated code operates in terms of Scala Futures and so feels idiomatic. But you still need to initialize the gRPC foundation and that part looks almost the same as in Java.

As before, the goal is to support two different service endpoints on the same server and to have fully asynchronous client and server implementations. As we concentrate on the request/response approach we completely ignore the question of streaming. gRPC bootstrap code is so simple that I could not find enough of it to extract anything truly generic. So even though I notionally have a server and a client they just wrap initialization logic and might be unnecessary.

There are a couple technicalities I would like to highlight:
  • I had to google and experiment to find out how to configure ScalaPB for a multi-module project and a non-default proto file directory
  • even though your actual logic is all about Futures and so requires ExecutionContexts, it's easier to start with an Executor to name threads and configure the underlying thread pools and then wrap it into a context (see the sketch after this list)
  • I can think of a few slightly different ways to propagate ExecutionContexts; implicit vals are idiomatic but seem to be less readable despite saving a little typing
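
Here is the sketch for the second bullet: start with a plain Executor so the threads can be named and the pool sized explicitly, then wrap it into an ExecutionContext for the Future-based generated code:

    import java.util.concurrent.atomic.AtomicInteger
    import java.util.concurrent.{Executors, ThreadFactory}
    import scala.concurrent.ExecutionContext

    object GrpcExecutionContexts {
      private def namedThreads(prefix: String): ThreadFactory = new ThreadFactory {
        private val counter = new AtomicInteger(0)
        override def newThread(r: Runnable): Thread = {
          val t = new Thread(r, s"$prefix-${counter.incrementAndGet()}")
          t.setDaemon(true)
          t
        }
      }

      // e.g. implicit val serviceEc: ExecutionContext = forService("service-a", 4)
      def forService(name: String, threads: Int): ExecutionContext =
        ExecutionContext.fromExecutor(Executors.newFixedThreadPool(threads, namedThreads(name)))
    }
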
All in all, using ScalaPB feels a little more idiomatic but not that different from how it's done in Java.