Nov 24, 2008

FP and LISP for the rest of us

As a developer with a traditional C++/Java background and a non-CS degree, I find the recent surge in popularity of functional programming pretty confusing. It turns out that there is a whole parallel universe of FP where most hard-won imperative experience is of little help in understanding new concepts. And those concepts are frequently hidden behind bizarre syntax.

I have stumbled upon a couple of very well-written articles on FP and LISP. The author starts from something as trivial as Java and Ant and ends up explaining, say, the power of LISP. Very approachable, with quite a few insights.

Nov 6, 2008

Date and time in Java

Today I was fixing a bug which revealed a problem in how we incremented dates (yes, by adding 86400000 milliseconds). Naturally, it led to corrupted data after the recent switch back to Pacific Standard Time from Pacific Daylight Time. I had to go review a very good overview of time-related Java issues and play with a date calculator.

Frankly, it's still rather counter-intuitive to me how much complexity is behind seemingly innocent long timestamps. What could be simpler than milliseconds passed since 1970? ;)
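
Just to make the trap concrete, here is a minimal sketch (not our actual code; I simply picked this year's switch date) of why adding 86400000 is not the same as adding a day:

    import java.util.Calendar;
    import java.util.TimeZone;

    public class DstDemo {
        public static void main(String[] args) {
            Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("America/Los_Angeles"));
            cal.clear();
            cal.set(2008, Calendar.NOVEMBER, 2, 0, 0, 0); // midnight before the PDT -> PST switch

            // Naive approach: a "day" is exactly 86400000 milliseconds.
            long naive = cal.getTimeInMillis() + 86400000L;

            // Calendar knows that Nov 2, 2008 is 25 hours long in Pacific time.
            cal.add(Calendar.DAY_OF_MONTH, 1);

            // The naive timestamp lands on Nov 2, 23:00 PST, not midnight Nov 3.
            System.out.println(naive == cal.getTimeInMillis()); // prints false
        }
    }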

Nov 2, 2008

Event Stream Processing

I have just stumbled upon this new approach to event-based design. A Wikipedia example says it all in 7 lines. The idea is to use an SQL-like language to create filters (capable of event correlation, typically limited to a certain time window) and run a continuous stream of events through them. There is even an open source implementation. At first glance, most currently available products are used in algorithmic trading (whatever was left of it after the crisis :) ).
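
To show the inversion at the heart of it - queries stand still while data moves - here is a sketch in Java. The EventEngine API and the query dialect below are invented purely for illustration; no real product looks exactly like this:

    // Everything below is hypothetical, invented for illustration only.
    interface Listener { void onMatch(Object row); }

    interface EventEngine {
        void register(String continuousQuery, Listener listener); // standing query
        void send(Object event);   // push one event through all standing queries
    }

    class EspSketch {
        static void wireUp(EventEngine engine) {
            // The query correlates events inside a sliding time window,
            // instead of selecting from stored rows.
            engine.register(
                "select symbol, avg(price) from StockTick within 30 seconds group by symbol",
                new Listener() {
                    public void onMatch(Object row) { System.out.println(row); }
                });
            // From now on, every incoming event flows through the query as it arrives.
            engine.send(new Object() /* e.g. a StockTick */);
        }
    }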

To me this concept seems to be a kind of rule engine, so potentially there are multiple domains where it might be applied. On the other hand, it makes me wonder whether it is indeed just a re-packaged rule engine and, in that case, what is so different about it.

Performance is the name of the game because ESP is positioned as a soft real-time technology. It must be pretty interesting to read how they manage to achieve their declared ability to process a huge number of events per second on commodity hardware.

Oct 15, 2008

Bay Area Scala Enthusiasts

There are a few user group-like communities I am aware of: Bay Area Scala Enthusiasts, Silicon Valley Java Users Group, and Bay Area Functional Programmers. Unfortunately, they frequently meet somewhere on the Peninsula, which is a decisive no-go for people working in SF and commuting on BART.

Luckily, the latest Bay Area Scala Enthusiasts meeting was held at Twitter HQ in downtown SF yesterday. Naturally, I could not resist the temptation of walking 0.8 miles. The people were very nice and it felt so good to be in a true geek crowd. It would be great to attend such gatherings on a regular basis, so I am crossing my fingers hoping that the organizers will pay more attention to San Francisco.

The official presentation was a rather lame discussion of a nascent coding guideline. I would say that this is a non-issue (at least in the scope they mentioned), usually addressed by each team (e.g. questions such as "2-space vs. 4-space vs. tab formatting"). The fact that some people come to Scala from fairly uncommon scripting backgrounds complicated the discussion. Later the conversation shifted to real-life experiences and favorite tools, and that part was more instructive.

It was mentioned that, in contrast to Java, the question of Scala style is open because different approaches are used in the Scala standard library. Many people are disappointed to discover that behind the functional API there is a lot of imperative code. In other words, although Scala seems to be the front runner in the Java 2.0 race, there is very little infrastructure (both conceptual, such as books/idioms/style guidelines, and concrete, such as DI containers) in place or even under active development. And it looks like there is still enough confusion about whether to re-use/wrap Java components or re-implement them from scratch.

Actually, I am surprised to find out that non-Java folks are interested in Scala. Judging from blogs, web-heads are into dynamic/scripting languages such as Ruby. I believe Scala should be more appealing to people who moved to Java from C++ or who find Erlang too weird because of its OOP-unfriendly nature.

Sep 14, 2008

Improved Scala support in Idea

The other day I noticed that IntelliJ had issued an updated Scala plugin. Unfortunately, it works in Idea 8 only (with a good justification), so I had to install its beta as well. The good news is that it was worth it. I remember the original plugin, which did not qualify even as a toy. This time they have made real progress.

On the positive side, there are now decent syntax highlighting and Ctrl-click navigation, and other basic Idea facilities actually work. Clearly there is a lot of room for improvement, from better code correctness checks to Scala javadoc support in Ctrl-Q, but at least now it is possible to play with the language in a meaningful way.

This year I have seen so many blogs telling of Java's decline and the rise of dynamic and/or functional languages that I decided it was time to extend my horizons. I chose to look at Scala (as a "better C++ than Java") and Erlang (because of its telco roots and the availability of a very good book). So far I have been struggling mightily with the weird syntax of functional languages.

What I really find puzzling is that despite all the buzz about the superiority of dynamic/functional languages I have not seen a single book on design with them. Standard OOA&D cannot be fully applicable (e.g. there are no classes in Erlang and no interfaces in a typical dynamic language), and even UML is likely to be more convenient for pure OO. Taking into account how long and hard it was for most people to master true OOP, I am surprised by the lack of such discussions.

Sep 2, 2008

Development documentation and Wiki

For me, Wiki is a relatively new development documentation medium in comparison with Word documents. Although I understand its advantages, I am also familiar with its inconveniences. To me, Wiki seems to be a compromise where we essentially trade order for universal accessibility.

Recently I had an opportunity to compare the two approaches. Having implemented a new component, I wanted to describe it for posterity at the end of the sprint. For simplicity's sake, I drafted a Word version of the design specification on the basis of a template I came up with a few years ago.

It should be noted that our startup would hardly win any award for the quality of its development documentation. In our line of business even established companies struggle with it, and fashionable Agile methodologies have provided well-intentioned justifications for getting rid of it altogether. So my second goal was to use the opportunity to give a good example of what decent documentation looks like.

I was not surprised to learn that although our Director of Engineering appreciated the content, he immediately asked me to convert it into Wiki pages. I should admit not all of his objections sound convincing to me. I would even argue many of them are nothing more than a lame excuse for developers who just do not care. Among them were:
  • It is hard to update, especially in a collaborative fashion. This means that it might easily become out of date as the system changes.
  • It is hard to index and reference. A multi-page wiki document can be easily bookmarked, watched for changes, and referenced from bugs/tasks in the future.
  • There is a level of formalism (naming and numbering conventions) that might discourage contributions from people other than the original author.
Just to be balanced, here is my Wiki hate list:
  • It's impossible to have a document template (analogous to ".dot" files in Word)
  • It's impossible to baseline a document together with code in VCS
  • It's impossible to version a document as a whole
  • Splitting a long document into Wiki pages is painful (you need a page naming convention, it's significantly less convenient to format, there is no support for automatic section numbering at different levels, there is no way to generate a TOC)
  • It's difficult to print the whole document
In other words, Wiki loses a lot of power in comparison with real documents in exchange for, basically, WWW-like look and feel. From a more practical perspective, there are a few questions to answer before you write a Wiki document or migrate from a Word one.
  • A dedicated wiki section. You will need a root for the hierarchy of development documents. Virtually all the wikis I have seen were structured by departments at the top level (e.g. Development, Operations, QA). Under Development, pages are typically grouped by type rather than by subsystem. So although theoretically there might already be a sub-tree for each component in your system, I would expect to find or start a new tree somewhere under Development (e.g. Development/Development documentation/Component X).
  • A standardized tree structure for every component. A conceptual document will need to be split into multiple physical Wiki pages, if only to keep them short enough. It is important to keep a uniform tree structure for all the components.
  • A page naming convention. There are a few different page types, such as a component root (e.g. "Component X"), a document root (e.g. "Component X/Design specification") and a document chapter (e.g. "Component X/Design specification/Static view"). It would be even messier if a chapter were made up of multiple pages. Now those were just page titles; real page names would be like "componentx", "componentx_sds" and "componentx_sds_static".
  • Page versioning. To have access to multiple versions of the same document, pages should be versioned. Although Wiki keeps track of page changes, that is of little convenience if you just want to read a particular final version (as opposed to searching through multiple drafts with highlighted changes). So I would expect all the pages to carry component version numbers as well. Consequently, page names are likely to resemble "componentx_1_0_0", "componentx_1_0_0_sds" and "componentx_1_0_0_sds_static".

Aug 12, 2008

Pair programming in real life

I am not much of an Agile fan. With a degree in engineering and a background in telco software, I came to appreciate the architecture-centered mindset, development methodologies, and documentation. Nevertheless, the Agile movement has been very important for marketing the best practices which we all know and love. Just think of TDD and how development felt before xUnit came to our rescue.

Arguably, XP is the most controversial approach and Pair Programming the most contentious practice it introduced. Personally, I never felt good about it. There are very few people I would tolerate that close and I do not like people touching my keyboard and mouse for sanitary reasons anyway.

There seems to be a more benign variant of this practice though. I have noticed it in more than one company, so it seems to be quite common. Reading a good overview of Agile practices recently, I found that there is a name for it in FDD: a feature team. The idea is that a feature or component is assigned to a small team, not a single developer. In my experience it usually takes a 2-developer team to be really productive, so the parallel with Pair Programming is self-evident.

The benefit is clear. You can bounce ideas off each other, you are likely to have different areas of expertise or at least different skill levels, you can complete the assignment almost twice as fast, and there will be two people knowledgeable about that particular component.

Aug 6, 2008

Promoting Maven2

As a relatively small company, we have a lot of rather messy code to maintain and not much spare capacity to significantly refactor it immediately. All kinds of bad smells are present - from having just two huge binaries shared by drastically different components to having compiled unit tests in those two production JAR files. It makes me cringe every time I think about it.

Curiously enough, once upon a time there was a pretty good (at least according to the founder, who is a competent software developer) reason for putting everything into one binary. It was easy to update multiple servers, and it kept things simple in general, at least from the operational perspective.

For more than three months I have been pushing for better software engineering practices in a few different areas. My first win was persuading our Director of Engineering to allow me to use Maven2 as the build tool for a new component we designed in the last sprint. To me, M2 feels pretty much the way the Spring Framework does - once you try it, you cannot imagine how you lived without it.

On my current team I turned out to be the only one with previous M2 experience, so I expected trouble. From what I observed, people with no previous knowledge of dependency management systems tend to be less than excited about keeping their dependencies explicit. In a sense, it's like TDD - it takes certain changes in your mindset to realize how valuable it can be.
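
For those unfamiliar with it, keeping a dependency explicit amounts to a few lines of pom.xml per library instead of a jar silently dropped into an Ant lib directory (the artifact and version below are just an example):

    <!-- declares the library and version; M2 fetches it from a repository -->
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring</artifactId>
      <version>2.5.5</version>
    </dependency>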

The good news is that the other engineer I teamed up with in the last sprint was easily convinced once he saw how easy it was to add new classes and unit tests without messing with Ant-style classpaths/directories/proprietary targets. Luckily, our Bamboo CI server integrates with M2 as well, so barring minor integration difficulties (such as the need to install a few proprietary libraries built with the old Ant-based approach into the M2 repository) we are all set.

I really like M2. It's simple but flexible. It has tons of plugins - it took me just ten XML lines or so to add Cobertura to our build. And the really good news is that we finally have two extremely good free sources of documentation. I remember struggling mightily with M2 a couple of years ago when their infamous site was pretty much all there was. Nowadays, you can just go and download either a more introductory book or a more reference-like volume. Even search in public repositories could hardly be made easier.
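
The Cobertura wiring really is about that size; a sketch of what it looks like in pom.xml (the plugin version below is illustrative):

    <!-- wires the Cobertura plugin into the build -->
    <build>
      <plugins>
        <plugin>
          <groupId>org.codehaus.mojo</groupId>
          <artifactId>cobertura-maven-plugin</artifactId>
          <version>2.2</version>
        </plugin>
      </plugins>
    </build>

After that, running mvn cobertura:cobertura produces the coverage report.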

May 8, 2008

Terracotta at JavaOne

Today I went to see what was going on at JavaOne, which is held a couple of blocks from my office. The Pavilion was not much different from last year's, so I ended up talking mostly to the Terracotta guys, and brainy Taylor in particular. I was asking all sorts of nasty questions about their approach and he was kind enough to talk to me for at least half an hour. Right after that, he talked to Brian Goetz himself.

My opening salvo, quite naturally, was about their messaging framework. I was essentially told that problems with JGroups clusters of more than 4 servers were widely known, that Apache Tribes did not fit exactly, and that bug-fixing was insufficiently quick. From what I know, the whole JBoss stack (at least in its JBoss AS 5 reincarnation) is supposed to depend on JGroups, so I am at a loss to reconcile such contradictory statements. Probably the key is the number of servers in a cluster.

At face value it means there is no open source reliable messaging framework capable of scaling beyond a couple of servers. Taking into account that pretty much anything which is not yet "in the cloud" is clustered nowadays, that sounds odd. And it drives home the idea that reliable messaging is a truly challenging thing to build to production quality.

I learned more about their positioning as well. They are after the middle market of, roughly speaking, up to 50 servers in domains such as web applications. Which, I guess, implies that JBoss is more of a competitor than Coherence/Gigaspaces, which go after larger clusters in finance.

An active-active L2 server configuration is expected by the end of the year, although the common belief is that the 10 seconds currently required to switch to a backup server are tolerable. From what I understood, they are planning to send separate updates to both L2 servers instead of using multicast or replication between the two. I might have misconstrued something though.

We talked about their paradigm a little bit. I admit to being rather uncomfortable with it because they are the only company I know literally exploiting the conceptual similarity of concurrent and distributed systems (i.e. CPUs sitting on the same bus differ from servers in a cluster only by communication delays, which are much more pronounced in the case of a LAN). It is so different from pretty much any other product (which would expose a real API in terms of actual interfaces, say JCache, as opposed to delimiting transactions with monitorenter/monitorexit pairs) that either they have invented the best thing since sliced bread or they are likely to fail as mavericks. They might as well be the next "the network is the computer" after all.
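
To make the contrast concrete, here is a minimal sketch of the Terracotta style (the class and field below are hypothetical, and in the real product the root would also have to be declared in tc-config.xml): no clustering API in sight, just plain Java synchronization.

    import java.util.HashMap;
    import java.util.Map;

    public class SharedCounter {
        // Hypothetical clustered root: under DSO the same instance
        // is visible to every L1 node in the cluster.
        private final Map<String, Integer> counters = new HashMap<String, Integer>();

        public void increment(String key) {
            // A plain monitorenter/monitorexit pair; Terracotta treats it
            // as a distributed lock and replicates the changes made inside it.
            synchronized (counters) {
                Integer v = counters.get(key);
                counters.put(key, v == null ? 1 : v + 1);
            }
        }
    }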

The foundational paradigm of Terracotta as a distributed JVM (complete with a DGC) evokes the same kind of argument as the JVM itself did ten years ago. Back then the idea of Java performance comparable with C++ was ridiculous, although it was said from the very beginning that JIT-style dynamic optimizations would do the trick one day. It looks like the JVM guys have pulled off the trick after all, so this lesson may have significant implications for Terracotta.

As an example, it can detect that a particular instance is used exclusively by one L1 server and transfer lock ownership to that L1 server from the central L2 host, effectively avoiding distributed locking. As a result we get a sort of buddy replication (between L1 and L2). Like a JVM silently eliminating synchronization in a sequential program. Theoretically neat :)

One thing I can say safely is that, in contrast to many companies in this field, Terracotta is not afraid to share its source code. They do not pretend, like the Coherences of this world do, that someone could steal something from the code base and ruin their empire (anybody heard of a new M$ after the Windows source code was leaked on the net?). As a developer I believe that code quality says a lot about the corresponding system (not to mention the things one can learn from a large successful system), and I applaud Terracotta for their bravery.

Apr 23, 2008

NIH and reliable messaging

We are planning to use Terracotta as our distributed caching infrastructure. I always thought this framework was rather odd - pretty much everyone else follows the usual "JGroups-like" approach of providing an API built around such abstractions as Channel/Group/Member, or the recent Map/Reduce-inspired analogues.
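
For reference, the conventional style looks roughly like this (a sketch from memory of the JGroups API; details may be off):

    import org.jgroups.JChannel;
    import org.jgroups.Message;
    import org.jgroups.ReceiverAdapter;

    public class ClusterNode {
        public static void main(String[] args) throws Exception {
            JChannel channel = new JChannel();       // default protocol stack
            channel.setReceiver(new ReceiverAdapter() {
                public void receive(Message msg) {   // every group message lands here
                    System.out.println("received: " + msg.getObject());
                }
            });
            channel.connect("demo-cluster");         // join (or create) the group
            channel.send(new Message(null, null, "hello")); // null destination = broadcast
            channel.close();
        }
    }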

For historical reasons I am curious about different transport-level frameworks. Last year I was looking closely at JBoss and was amazed by their indiscriminate use of a few overlapping technologies such as JGroups and JBoss Remoting (which they were planning to migrate to MINA anyway). Coherence has its own TCMP protocol, and so does Gigaspaces. And a new messaging standard will likely influence the field. I certainly understand why developers are so inclined to re-implement the same functionality. Although there are well-known patterns for designing such a component, there is a lot of hard-core fun left in all the intricate details of harnessing NIO and multithreading.

The flip side is that this kind of software is extremely difficult to get right. Even years later there can be elusive bugs (I heard JGroups still fails when there are a few dozen servers). From my experience, hunting them even in a 4-server configuration can be nightmarish, to put it mildly. So I guess, for those who pull it off, implementing such a framework is a major professional self-esteem booster.

Naturally, one of the first things I looked at in the Terracotta code base was their clustering. Curiously enough, they borrowed their transport layer - I guess they were having enough fun with higher-level state clustering. But the layer itself proved my point again - it's the same framework used in Tomcat 6. So ASF seems to be following the same route of developing multiple overlapping technologies as JBoss.

Mar 19, 2008

Consistent Hashing

An interesting and conceptually simple approach to choosing what to put on a new cluster member or where to move the state of a failed one.
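
The whole trick fits in a few lines. A toy sketch (a real implementation would use a stronger hash function and virtual nodes for better balance):

    import java.util.SortedMap;
    import java.util.TreeMap;

    // Nodes are hashed onto a circle; a key belongs to the first node at or
    // after the key's own hash. Adding or removing one node only remaps the
    // keys in that node's arc, not the whole key space.
    public class ConsistentHash {
        private final SortedMap<Integer, String> ring = new TreeMap<Integer, String>();

        public void addNode(String node)    { ring.put(hash(node), node); }
        public void removeNode(String node) { ring.remove(hash(node)); }

        public String nodeFor(String key) {
            if (ring.isEmpty()) return null;
            SortedMap<Integer, String> tail = ring.tailMap(hash(key));
            int h = tail.isEmpty() ? ring.firstKey() : tail.firstKey(); // wrap around
            return ring.get(h);
        }

        private static int hash(String s) { return s.hashCode() & 0x7fffffff; }
    }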

Mar 10, 2008

Interview-quality concurrent primitives

The exciting thing about interviews is that you hardly ever know how any given one will be conducted. Take concurrency as an example. It could be ignored entirely; some people may ask you about plain vanilla wait/notify semantics, others about using primitives from the util.concurrent library, and then someone is likely to ask you about the arguably more intellectual algorithmic side. In the latter case, a typical task is to implement something like a thread-safe bounded buffer or a read-write lock.
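
For the bounded buffer, the classic wait/notify answer goes roughly like this (deliberately simple - an interview sketch, not a production queue):

    import java.util.LinkedList;
    import java.util.Queue;

    public class BoundedBuffer<T> {
        private final Queue<T> items = new LinkedList<T>();
        private final int capacity;

        public BoundedBuffer(int capacity) { this.capacity = capacity; }

        public synchronized void put(T item) throws InterruptedException {
            while (items.size() == capacity) wait();  // full: block producers
            items.add(item);
            notifyAll();                              // wake blocked consumers
        }

        public synchronized T take() throws InterruptedException {
            while (items.isEmpty()) wait();           // empty: block consumers
            T item = items.remove();
            notifyAll();                              // wake blocked producers
            return item;
        }
    }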

Well, I guess that except for Messrs. Lea and Goetz few would attempt to roll out a production-quality CAS-based design with just a whiteboard and half an hour. Which means that knowing simpler approaches makes sense, at least in the context of interviews. Here is where a simplistic book can help.

Half of it is dedicated to the RTSJ, which seems to be more of a research toy. The other half provides a rather messy introduction to concurrency. I used to consider this book mostly useless (too thin theoretically, insufficiently util.concurrent-oriented for real-life development). But a few basic implementations of primitives such as semaphores and locks can certainly be used for interview purposes.