Tuesday, September 27, 2011

Client Oriented Architecture

For no particular reason, I have been thinking a fair amount recently about the CAP theorem and how contemporary, and even ancient, systems work around the basic problem it presents.

I remember that years ago, as a freshly minted SOA zealot, I was confused by the pushback I got from mainframe developers who insisted that client applications needed more control over how services were activated and how they worked.  I always thought that good, clean service API design and "separation of concerns," along with developer education and evangelism, would make this resistance go away.  I was wrong.

I still think the basic idea of SOA (encapsulation and loose coupling) is correct, but once you shatter the illusion of the always-available, always-consistent central data store, you need to let the client do what it needs to do.  The whole system has to be a little more "client-oriented."

The Dynamo paper provides a great example of what I am talking about here.  I am not sure it is still an accurate description of how Amazon's applications work, but the practical issues and approaches described in the paper are really instructive.  According to the paper, Dynamo is a key-value store designed to deliver very high availability but only "eventual consistency" (i.e., at any given time there may be multiple, inconsistent versions of an object in circulation, and the system provides mechanisms to resolve the conflicts over time).  For applications that require it, Dynamo lets clients decide how to resolve version conflicts.  To do that, services maintain vector clocks of version information and surface to the client what, in a "pure" SOA implementation, would be "service-side" concerns.  To the further horror of SOA purists, the paper also reports that in some cases, applications with very stringent performance demands can bypass the normal service location and binding infrastructure, again letting clients make their own decisions.  Finally, the authors even mention the ability of clients to tune the "sloppy quorum" parameters (N, R and W in the paper) that determine the effective durability of writes, the availability of reads and the incidence of version conflicts.
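To make the vector clock idea concrete, here is a minimal sketch in Python of how a client might detect and resolve a version conflict. It is loosely modelled on the example in the Dynamo paper. The function names, the dictionary representation of a clock, the node names and the shopping-cart "union" merge policy are all illustrative assumptions, not Amazon's actual API.

```python
from typing import Dict, FrozenSet, List, Tuple

# Illustrative representation: a vector clock maps a node id to that node's update counter.
VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter advanced, as a coordinator would on a write."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every update recorded in `b` (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither version descends from the other, so the store cannot pick a winner."""
    return not descends(a, b) and not descends(b, a)


def merge_clocks(a: VectorClock, b: VectorClock) -> VectorClock:
    """Component-wise maximum: the merged clock has seen both histories."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}


def reconcile_carts(
    siblings: List[Tuple[FrozenSet[str], VectorClock]]
) -> Tuple[FrozenSet[str], VectorClock]:
    """Hypothetical client-side policy: union the items of all conflicting carts.

    A union never loses an added item, but it can resurrect deleted ones;
    that trade-off is the client's call, not the store's.
    """
    items: FrozenSet[str] = frozenset()
    clock: VectorClock = {}
    for cart, version in siblings:
        items = items | cart
        clock = merge_clocks(clock, version)
    return items, clock


if __name__ == "__main__":
    d1 = increment({}, "Sx")   # first write, coordinated by node Sx
    d2 = increment(d1, "Sx")   # second write, also via Sx
    d3 = increment(d2, "Sy")   # one client updates via Sy...
    d4 = increment(d2, "Sz")   # ...while another updates via Sz
    print(concurrent(d3, d4))  # True: a conflict the client must resolve

    items, clock = reconcile_carts([(frozenset({"book", "pen"}), d3),
                                    (frozenset({"book", "cup"}), d4)])
    d5 = increment(clock, "Sx")  # the client writes the reconciled cart back
    print(sorted(items), sorted(d5.items()))
```

In the terms of this post, `reconcile_carts` is the piece that lives in the client. The store can only report that `d3` and `d4` are concurrent and hand both versions back.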

Despite the catchy title of this post, I don't mean to suggest that SOA was a bad idea or that we should all go back to point-to-point interfaces and tight coupling everywhere.  What I am suggesting is that just having clean service APIs at the semantic, or "model," level and counting on the infrastructure to make all decisions on behalf of the client doesn't cut it in the post-CAP world.  Clients need to be allowed to be intelligent and engaged in managing their own QoS.  The examples above illustrate some of the ways that can happen; I am sure there are many others.  An interesting question is how much of this it makes sense to standardize and how much ends up as part of service API definitions.  Dynamo's context (the opaque version information, including the vector clock, that get returns and put takes back) is a concrete example.  It looks like it just rides along in the service payloads, so it is effectively standardized into the infrastructure.
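Here is a minimal sketch of what "riding along in the payloads" might look like from the client's side. It assumes a toy single-process store whose `get` returns all sibling values plus an opaque context, and whose `put` takes that context back. This mirrors the general shape of the get/put interface described in the Dynamo paper, but the `ToyStore` class, its method signatures and the merge policy are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Clock = Dict[str, int]  # node id -> counter; treated as opaque by clients in this sketch


def _dominates(a: Clock, b: Clock) -> bool:
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


@dataclass
class ToyStore:
    """Hypothetical in-memory stand-in for a Dynamo-style key-value store."""

    versions: Dict[str, List[Tuple[str, Clock]]] = field(default_factory=dict)

    def get(self, key: str) -> Tuple[List[str], Clock]:
        """Return all sibling values plus a context summarising their clocks."""
        siblings = self.versions.get(key, [])
        context: Clock = {}
        for _, clock in siblings:
            for node, count in clock.items():
                context[node] = max(context.get(node, 0), count)
        return [value for value, _ in siblings], context

    def put(self, key: str, value: str, context: Clock, coordinator: str) -> None:
        """Store `value` as a successor of everything summarised by `context`."""
        new_clock = dict(context)
        new_clock[coordinator] = new_clock.get(coordinator, 0) + 1
        # Siblings the new clock has seen are superseded; the others remain as conflicts.
        survivors = [(v, c) for v, c in self.versions.get(key, [])
                     if not _dominates(new_clock, c)]
        survivors.append((value, new_clock))
        self.versions[key] = survivors


if __name__ == "__main__":
    store = ToyStore()
    _, ctx = store.get("cart")
    store.put("cart", "book", ctx, coordinator="A")

    _, ctx = store.get("cart")                            # two clients read the same context...
    store.put("cart", "book,pen", ctx, coordinator="A")
    store.put("cart", "book,cup", ctx, coordinator="B")  # ...and write concurrently

    values, ctx = store.get("cart")
    print(values)                                         # ['book,pen', 'book,cup']
    store.put("cart", "book,cup,pen", ctx, coordinator="A")  # client-side merge
    print(store.get("cart")[0])                           # ['book,cup,pen']
```

The client never inspects `ctx`; it only hands it back. One caveat of this toy is that the conflict is detected only because the two stale writes go through different coordinators. Two stale writes via the same coordinator would receive identical clocks, and the second would silently replace the first.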

Sunday, September 4, 2011

Scaling DevOps

In a great InfoQ talk, John Allspaw (Flickr, Etsy) presents a compelling argument for aggressively breaking down the barriers between IT development and operations.  The talk presents lots of simple examples that anyone who has ever managed development and/or operations can relate to.  It also shows a great tech leadership attitude, IMO.

One thing that Allspaw mentions offhand is that the organizational scalability of the model is still being proved out.  It is ironic that the secret behind manageable massive scalability in some of the largest web sites may be small team size.  When the whole tech team consists of 20 people, it is not hard to get them all in a room and develop and maintain a completely shared vision.

There are two questions that I have been thinking about related to this.  First, how do you scale it organizationally, i.e., how do you make it work in large organizations with multiple applications?  Second, while I think the basic principles of DevOps are great, how do you manage the collision with ITIL and other traditional management processes and structures?  I find myself more than a little amused by this post, which basically points out that no blueprint or methodology has been worked out beyond "break down the silos" and doing the obvious with config management, continuous integration and "infrastructure as software."

The scaling problem here is the same one faced by "new management" principles such as the transition from bureaucracy to dynamic linking advocated by Steve Denning.  One part of what is needed is the management equivalent of inversion of control: some kind of "execution framework" that enables small, collaborating but decentralized teams to achieve their goals without either a static master plan or detailed knowledge of everything everyone else is doing [1].  The other is a scalable approach to prioritization, planning and performance measurement.  Again, without a static top-down process, we need a way to develop plans and measures that maintain a strong, direct connection between each small team and the end customer and product.

The second question is even harder.  Allspaw acknowledges, for example, that throwing out traditional change management altogether is a bad idea, even with the greatest, most well-coordinated team.  For some changes, as he puts it, you need to "get out the stack of ITIL books" and follow a more traditional process.  The problem here is deciding both what to aim for and how to engineer the transformation, especially when the point of departure is a large group with traditional processes and detached, bureaucratic management.

[1] For another way to see the analogy here, have a look at Robert C. Martin's Dependency Inversion Principle.  Think of the small, self-directed teams as the "lower-level modules," senior management as the "higher-level modules," and the "abstractions" as the higher-level goals and values of the enterprise.