
Apr 20, 2017

Rules of three in software development

Most people agree that software development is a craft. As such, it has accumulated heuristics and rules of thumb that are hard to prove in any remotely scientific way. But after seeing enough code bases over a few years, certain patterns start to look plausible. I would like to ponder a couple of representative examples that struck a chord with me.

Make it work, make it right, make it fast


I believe this saying originated in the heyday of OO. You could probably interpret it as "building a vertical slice of the system" in RUP or as "YAGNI and iterations" in Agile. If you squint hard enough, even the MVP-centered thinking common in startups could be another reincarnation of the same principle.

In practical terms it boils down to:
  • find a small use case, choose OSS libraries that can help implement it quickly, prepare the project infrastructure (e.g. build system, code tree structure), write the code, and produce something small but runnable; call it a prototype
  • once you have acquired basic experience with the new technologies and have a prototype to share, you are ready for a meaningful technical discussion with other people; if all goes well, extend the prototype into an MVP that can be deployed to production
  • once you have a basic version running in production, start working on a second major release that improves scalability and stability

My most recent experience with this flow was with Spark and Parquet. In mid-2015 I prototyped a few basic ideas, such as exporting data from our system as Parquet files and running Spark locally to work with those files. I made a few observations about small details not discussed in the documentation. So when ideas for a new ETL pipeline were discussed, I could show actual code examples closely related to our particular needs.
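
For illustration, here is a minimal sketch of what such a prototype could look like. It assumes Spark's Java API with the modern SparkSession entry point (the actual 2015 prototype would have used the older SQLContext); the path and column name are invented.

    // Hypothetical sketch: read exported Parquet files with Spark running
    // locally and run one representative query end to end.
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ParquetPrototype {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("parquet-prototype")
                    .master("local[*]")   // local mode is enough for a prototype
                    .getOrCreate();

            // Read the Parquet files exported from our system
            Dataset<Row> events = spark.read().parquet("/tmp/export/events.parquet");

            // A small but representative query to validate the export
            events.groupBy("event_type").count().show();

            spark.stop();
        }
    }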

Once we could show some numbers from running the prototype in spring 2016, other people were comfortable making an architectural decision. By fall 2016 we had an MVP that could do most of what the legacy code was capable of. While migration to the new pipeline started in production in late fall, we could switch our attention to the second major release, which addressed stability issues and applied data storage optimizations.

OSS/3rd party, low-level API, less generic in-house implementation


Another evolution I see frequently goes like this:
  • Come up with big ideas for a new service
  • Quickly develop a prototype using an OSS product that implements some of the big ideas
  • If performance test results are encouraging, finish the MVP and go to production
  • Once the initial performance becomes insufficient, start digging into the OSS code and documentation in search of obscure lower-level APIs; replace usage of the simple and easy APIs with more efficient but cumbersome ones
  • Once you understand your system's trade-offs and data access patterns, consider building your own replacement for the initially used OSS component: less generic, but better optimized for your circumstances

A recent example would be solving the challenge of data retrieval for interactive analytics. Postgres was not fast enough anymore. The big idea was to use search engine-style approaches such as fast bitmap indices. The obvious OSS candidate was Lucene. We used a less popular but still high-level API, with additional optimizations added later.

Once we had it in production, the urgency subsided and we had time to recognize some context-specific assumptions we could make about our data. The second major release dived much deeper into Lucene internals and resulted in a few customizations built to take advantage of what we found. At some point in the future, the next step could be to go the same way Druid did and replace Lucene altogether with an in-house implementation of the inverted index.
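
To give a flavor of what diving into Lucene internals means in practice, here is a minimal sketch of the lower-level route: iterating the postings of a single term directly instead of going through IndexSearcher. The index path, field name, and term are invented; this is not our actual customization.

    // Hypothetical sketch: count documents matching one term by walking
    // its postings list segment by segment, bypassing IndexSearcher.
    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.LeafReaderContext;
    import org.apache.lucene.index.PostingsEnum;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.search.DocIdSetIterator;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.util.BytesRef;

    public class TermScan {
        public static void main(String[] args) throws Exception {
            long hits = 0;
            try (DirectoryReader reader =
                     DirectoryReader.open(FSDirectory.open(Paths.get("/tmp/index")))) {
                for (LeafReaderContext leaf : reader.leaves()) {
                    Terms terms = leaf.reader().terms("country");
                    if (terms == null) {
                        continue; // this segment has no such field
                    }
                    TermsEnum termsEnum = terms.iterator();
                    if (termsEnum.seekExact(new BytesRef("US"))) {
                        PostingsEnum postings = termsEnum.postings(null, PostingsEnum.NONE);
                        int doc;
                        while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
                            hits++; // doc is segment-local; add leaf.docBase for a global id
                        }
                    }
                }
            }
            System.out.println("matching docs: " + hits);
        }
    }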

Three kinds of developers


By virtue of being a human issue and not just a technical question, this example is harder to discuss. As a matter of fact, there is a beautiful post on pioneers, settlers, and town planners that you should read even if you ignore the rest of that highly recommended blog.

There is no question that both individual psychology and level of experience play a huge role here. Unfortunately, all I can suggest as actionable advice is to try to be self-aware enough to recognize these patterns in yourself and your teammates.

It might also be the case that if you are a settler you will not like A-round companies because of all the chaos, poor engineering, and one-man components. Conversely, a pioneer might feel out of place after a B round, when a larger team starts building real technology and the importance of team communication and documentation grows.

Probably because of the time I spent in B-round startups, I believe I have seen more pioneers and settlers than town planners. The latter are probably those mythical "enterprise developers" frequently mentioned on Hacker News.

Sep 3, 2010

Lies, damned lies, and bare metal

A former colleague wrote about something I occasionally think about. In my view the post touches on at least three areas that could each start a holy war ("The need for bare metal programming skills", "The need for strong concurrent programming skills", "The right approach to concurrent programming"). What fascinates me is that I am not aware of a coherent narrative for at least some of them.

To begin with, the notion of bare metal is a blurred and moving target. From Cliff Click's detailed discussions to Joshua Bloch's higher-level recommendations, there seems to be a widespread belief that contemporary hardware is virtually incomprehensible to most programmers. There are too many moving parts and indirection levels to have a mental model good enough for reliable predictions. And it's not just about high-level Java-like platforms.

As the history of computing demonstrates, hardware progress enables programming in terms of higher-level abstractions, and that pushes the border with bare metal higher over time. It is true for programming languages (consider how VM-based or interpreted/dynamic languages became practical in the last ten years) and for particular sub-fields such as concurrent programming (think about the recent surge of interest in Actors and STM).

Concurrent programming is a similarly gray area, also because of the strong influence of hardware. On the one hand, shared memory and actors are just models and can be used to solve the same problems. On the other, software runs on real hardware, and commodity hardware nowadays means you start from the XCHG instruction and layer levels of abstraction on top of it. Be it Clojure or Scala, you are looking at java.util.concurrent (and, ultimately, the very same CPU-level support for CAS) in disguise.
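
To make that concrete, here is a minimal sketch of a lock-free counter written directly against the CAS primitive exposed by java.util.concurrent's atomic classes (AtomicLong.incrementAndGet performs the same loop internally):

    import java.util.concurrent.atomic.AtomicLong;

    // The classic CAS loop: read the current value, attempt an atomic
    // compare-and-set, and retry if another thread got there first.
    public final class CasCounter {
        private final AtomicLong value = new AtomicLong();

        public long increment() {
            while (true) {
                long current = value.get();
                long next = current + 1;
                if (value.compareAndSet(current, next)) { // one CPU-level CAS
                    return next;
                }
                // another thread won the race; loop and try again
            }
        }
    }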

So with time, some indirection levels become so low-level that only a very few people have the time, need, or desire to look at them. Today java.util.concurrent is the cornerstone, and very few people bother even to look inside (for example, compare the number of people intimately familiar with JCiP with those fluent in AoMP). In a few years it might as well be Actors, if not STM/HTM, and then knowledge of j.u.c will count as low-level black magic. It is not easy to see when such a transition happens and adjust one's definition of "bare" or "low-level".

In addition, there are entire domains such as big data. People in map-reduce land think in terms of [dozens of] servers, not threads. Actually, even in mainstream software not many are lucky enough to work with j.u.c-level abstractions, and some explicitly prefer even higher-level ones.

But in general, my experience confirms that serious usage of any technology implies a comprehensive understanding of its design and some of its implementation details. As an example, you do not need to know about the GC to program in Java, but if you actually do not, you probably work on something trivial.

For complicated technologies this can be hard and take time, so one is necessarily limited and cannot be familiar with everything even within a particular language universe (just think about Java: from J2ME to Hadoop, "and still counting"). At least in this day and age, most software comes with source code, so you can always dig deeper to learn the details.

Feb 27, 2009

Programmer Competency Matrix

It is not the first such matrix, but it made me think a little. In a week I will be celebrating my first ten years in professional software development. Looking back, I can only laugh at my former self. It has been a long way from almost "Unable to find the average of numbers in an array", but "log(n) (Level 3)"-like requirements show that the next decade will probably make me feel something similar :)

Sep 2, 2008

Development documentation and Wiki

For me, Wiki is a relatively new medium for development documentation in comparison with Word documents. Although I understand its advantages, I am also familiar with its inconveniences. To me, Wiki seems to be a compromise where we essentially trade order for universal accessibility.

Recently I had an opportunity to compare the two approaches. Having implemented a new component, I wanted to describe it for posterity at the end of the sprint. For simplicity's sake, I drafted a Word version of the design specification based on a template I came up with a few years ago.

It should be noted that our startup would hardly win any award for the quality of its development documentation. In our line of business even established companies struggle with it, and fashionable Agile methodologies have provided well-intentioned justifications for getting rid of it altogether. So my second goal was to use the opportunity to give a good example of what decent documentation looks like.

I was not surprised to learn that although our Director of Engineering appreciated the content, he immediately asked me to convert it into Wiki pages. I should admit that not all of his objections to the Word format sound convincing to me; I would even argue some of them are nothing more than a lame excuse for developers who just do not care. Among them were:
  • It is hard to update, especially in a collaborative fashion. This means that it might easily become out of date as the system changes.
  • It is hard to index and reference. A multi-page wiki document can be easily bookmarked, watched for changes, and referenced from bugs/tasks in the future.
  • There is a level of formalism (naming and numbering conventions) that might discourage contributions from people other than the original author.
Just to be balanced, here is my Wiki hate list:
  • It's impossible to have a document template (analogous to ".dot" files in Word)
  • It's impossible to baseline a document together with code in VCS
  • It's impossible to version a document as a whole
  • Splitting a long document into Wiki pages is painful (you need a page naming convention, it's significantly less convenient to format, there is no support for automatic section numbering at different levels, there is no way to generate a TOC)
  • It's difficult to print the whole document
In other words, Wiki loses a lot of power in comparison with real documents in exchange for, basically, a WWW-like look and feel. From a more practical perspective, there are a few questions to answer before you write a Wiki document or migrate from a Word one.
  • A dedicated wiki section. You will need a root for the hierarchy of development documents. Virtually all the wikis I have seen were structured by department at the top level (e.g. Development, Operations, QA), and under Development pages are typically grouped by type rather than by subsystem. So although in theory there might already be a sub-tree for each component in your system, I would expect to find or start a new tree somewhere under Development (e.g. Development/Development documentation/Component X)
  • A standardized tree structure for every component. A conceptual document will need to be split into multiple physical Wiki pages, if only to keep them short enough. It is important to keep a uniform tree structure for all components.
  • A page naming convention. There are a few different page types, such as a component root (e.g. "Component X"), a document root (e.g. "Component X/Design specification") and a document chapter (e.g. "Component X/Design specification/Static view"). It would be even messier if a chapter were comprised of multiple pages. And those were just page titles; real page names would be like "componentx", "componentx_sds" and "componentx_sds_static".
  • Document versioning. To have access to multiple versions of the same document, pages should have versions. Although Wiki keeps track of page changes, that is of little convenience if you just want to read a particular final version (as opposed to searching through multiple drafts with highlighted changes). So I would expect all the pages to carry component version numbers as well. Consequently, page names are likely to resemble "componentx_1_0_0", "componentx_1_0_0_sds" and "componentx_1_0_0_sds_static".

Aug 12, 2008

Pair programming in real life

I am not much of an Agile fan. With a degree in engineering and a background in telco software, I came to appreciate an architecture-centered mindset, development methodologies, and documentation. Nevertheless, the Agile movement has been very important for marketing the multiple best practices which we all know and love. Just think of TDD and how development felt before xUnit came to our rescue.

Arguably, XP is the most controversial approach and Pair Programming the most contentious practice it introduced. Personally, I never felt good about it. There are very few people I would tolerate that close and I do not like people touching my keyboard and mouse for sanitary reasons anyway.

There seems to be a more benign variant of this practice, though. I have noticed it in more than one company, so it seems to be quite common. Reading a good overview of Agile practices recently, I found that there is a name for it in FDD: the feature team. The idea is that a feature or component is assigned to a small team, not a single developer. In my experience it usually takes a two-developer team to be really productive, so the parallel with Pair Programming is self-evident.

The benefit is clear. You can bounce ideas off each other, you are likely to have different areas of expertise or at least different skill levels, you can complete the assignment almost twice as fast, and there will be two people knowledgeable about that particular component.

Aug 6, 2008

Promoting Maven2

As a relatively small company, we have a lot of rather messy code to maintain and not much spare capacity to significantly refactor it immediately. All kinds of bad smells are present, from having just two huge binaries shared by drastically different components to having compiled unit tests inside those two production JAR files. It makes me cringe every time I think about it.

Curiously enough, once upon a time there was a pretty good reason (at least according to the founder, who is a competent software developer) for putting everything into one binary. It made it easy to update multiple servers and kept things simple in general, at least from the operational perspective.

For more than three months I have been pushing for better software engineering practices in a few different areas. My first win was persuading our Director of Engineering to let me use Maven2 as the build tool for a new component we designed in the last sprint. To me, M2 feels pretty much the way the Spring Framework does: once you try it, you cannot imagine how you ever lived without it.

On my current team I turned out to be the only one with previous M2 experience, so I expected trouble. From what I have observed, people with no previous exposure to dependency management systems tend to be less than excited about keeping their dependencies explicit. In a sense, it's like TDD: it takes a certain change in mindset to realize how valuable it can be.

The good news is that the other engineer I teamed up with in the last sprint was easily convinced once he saw how easy it was to add new classes and unit tests without messing with Ant-style classpaths, directories, and proprietary targets. Luckily, our Bamboo CI server integrates with M2 as well, so barring minor integration difficulties (such as the need to install a few proprietary libraries built with the old Ant-based approach into the M2 repository) we are all set.

I really like M2. It's simple but flexible. It has tons of plugins; it took me just ten or so lines of XML to add Cobertura to our build. And the really good news is that we finally have two extremely good free sources of documentation. I remember struggling mightily with M2 a couple of years ago, when their infamous site was pretty much all there was. Nowadays you can just go and download either a more introductory book or a more reference-like volume. Even searching public repositories could hardly be made easier.
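
For reference, the wiring really is about that small. A sketch of the relevant POM fragment; the plugin coordinates are the real cobertura-maven-plugin, but the version shown is illustrative:

    <!-- Enable Cobertura coverage reports in a Maven2 POM -->
    <reporting>
      <plugins>
        <plugin>
          <groupId>org.codehaus.mojo</groupId>
          <artifactId>cobertura-maven-plugin</artifactId>
          <version>2.2</version>
        </plugin>
      </plugins>
    </reporting>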

Oct 31, 2007

TDD

In my opinion, test-driven development and refactoring are the most important contributions of the Agile movement. Personally, I find the movement itself rather controversial and oriented toward less complex systems. I believe that school of thought brings in a few valuable ideas from cognitive and industrial psychology. On the flip side, I am highly suspicious of its lack of engineering rigor.

Throughout my career I have seen different development organizations. Some took unit testing for granted (sadly, TDD was not explicitly encouraged, but even sufficiently extensive unit tests go a long way). Others only briefly dabbled with JUnit. The latter invariably turned out to work on profoundly boring applications with little to brag about in the quality attributes department. Naturally, this experience taught me to screen potential employers by their attitude toward unit testing and test coverage tools.

Most of the time, the refactoring part of the equation was largely a matter of individual developers' taste. It is easy to justify but certainly much more challenging to amend. I think refactoring is so intrinsically intertwined with artsy questions of style, readability, and proper OOD that it is destined to be a way of thinking and not merely a cookbook. If you remember the founding book on the subject, virtually every refactoring there is accompanied by a counterpart reversing the action; you choose which one to apply in a particular situation.

In my experience, things that are difficult to formalize are easier to learn by watching. Probably this is the only way, and that is why it takes so long for a developer to mature despite the relatively narrow skill set typically required. Such learning tends to evolve over time and requires good reference materials to learn from.

I am reading a couple of books on the subject, namely Test Driven and xUnit Test Patterns. Clearly, they are not comparable: the former is an also-ran, the latter is essentially the bible from a world-renowned series.

From Test Driven I would recommend the interesting chapter on testing concurrent code (thread safety, blocking operations, starting and stopping threads, asynchronous execution). I would be hard pressed to name another book addressing it.
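
To give a flavor of the style of test that chapter covers, here is a minimal sketch, assuming JUnit 4: verify that a blocking take() completes once a second thread supplies an element. The names are invented for illustration.

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertTrue;

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;
    import org.junit.Test;

    public class BlockingTakeTest {
        @Test
        public void takeBlocksUntilAnElementIsOffered() throws Exception {
            final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(1);
            final CountDownLatch done = new CountDownLatch(1);
            final String[] taken = new String[1];

            Thread consumer = new Thread(new Runnable() {
                public void run() {
                    try {
                        taken[0] = queue.take(); // blocks on the empty queue
                        done.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            consumer.start();

            queue.offer("hello");               // unblock the consumer
            assertTrue("consumer never finished", done.await(1, TimeUnit.SECONDS));
            assertEquals("hello", taken[0]);
        }
    }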

Strangely enough, I found xUnit Test Patterns to be rather boring. Probably because of its 950 pages :) It reads more like a catalog of test idioms ("five ways to create a test fixture: transient, persistent, shared..") and I was overwhelmed in my attempts to find and quickly learn a few new useful tricks. Sure enough, the book is valuable, especially as a conceptual guide, but do not expect much fun.