Big Ball of Mud

A while back I somehow stumbled upon a little paper about the Big Ball of Mud pattern.

This was an interesting and amusing read. In this blog post I’m adding some thoughts of my own on things I found to be missing, misleadingly explained, or that I disagree with altogether. To be fair to the original authors, the paper is 17 years old. I only noticed this after finishing reading it, though it does explain why certain buzzwords such as “agile” and “scrum” are not used, even though the authors are clearly describing the same concepts.

Don’t know what the Big Ball of Mud pattern is about? You can check the Wikipedia article or simply read the introductory quotes from the paper:

While much attention has been focused on high-level software architectural patterns, what is, in effect, the de-facto standard software architecture is seldom discussed. This paper examines the most frequently deployed architecture: the BIG BALL OF MUD. A BIG BALL OF MUD is a casually, even haphazardly, structured system. Its organization, if one can call it that, is dictated more by expediency than design. Yet, its enduring popularity cannot merely be indicative of a general disregard for architecture.

A Big Ball of Mud is a haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire, spaghetti-code jungle. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems.

If you are able to distinguish good code from spaghetti code and have been in the software development field for a while, you will undoubtedly also come to the conclusion that this is indeed the most pervasive “architecture pattern”. The paper outlines some interesting factors and contains several analogies on both the cause and the effect of this. My main interest as a software craftsmanship advocate is better understanding these, so they can be dealt with better.

Architecture is often seen as a luxury or a frill, or the indulgent pursuit of lily-gilding compulsives who have no concern for the bottom line.

This resonates strongly with what I have observed over my modest career so far. In fact, I had this attitude towards software design and architecture for many years, and I got out of it almost by accident. The pervasiveness of this attitude makes it very hard to realize it is short-sighted. Education is likely to blame as well, to some extent. At least the one I got completely failed to stress the importance of good design.

It seems to me that many people think that when one needs to put mental effort into designing a system, the same effort will be required to understand it later on. While it is certainly possible to end up there through inexperience, or intentionally when one tries to obfuscate a system, the opposite is true when an experienced person pursues good design. The effort put into the design is to make it simple, not complicated. A well thought out design will be simpler than a pile of code that was written without thought about its organization.

Indeed, an immature architecture can be an advantage in a growing system because data and functionality can migrate to their natural places in the system unencumbered by artificial architectural constraints. Premature architecture can be more dangerous than none at all, as unproved architectural hypotheses turn into straightjackets that discourage evolution and experimentation.

I certainly do see the point being made here and agree that architectural constraints can be very dangerous to a project. What I absolutely object to in this paragraph is the notion that a mature architecture contains these constraints while an immature one does not. One of the main tasks of architecture is to not make choices and delay them as long as possible.

Example: An architectural decision is to abstract the storage mechanism of an application. Now it can be developed without creating a full MySQL implementation or whatnot. As the storage is abstracted, you can develop most of the app using some simple in-memory data. Then later on, when you have much more experience with the project and the actual needs, you can decide what technology to use for the real implementation. And you will be able to use different implementations in different contexts. Contrast this to the Big Ball of Mud approach, where no abstraction is used. In this case the code binds directly to the implementation, and you not only make the decision to go with a particular technology, you also end up binding to it in such a way that you greatly constrain the evolution options of the project.
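To make the storage example a bit more concrete, here is a minimal sketch in Python. All names (User, UserStore, InMemoryUserStore, rename_user) are made up for illustration, and a typing.Protocol stands in for what other languages call an interface. It is one possible shape of the idea, not a prescription.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class User:
    user_id: int
    name: str


class UserStore(Protocol):
    """The abstraction the application depends on (hypothetical name)."""

    def save(self, user: User) -> None: ...

    def find(self, user_id: int) -> Optional[User]: ...


class InMemoryUserStore:
    """Trivial implementation, good enough for early development and tests."""

    def __init__(self) -> None:
        self._users = {}

    def save(self, user: User) -> None:
        self._users[user.user_id] = user

    def find(self, user_id: int) -> Optional[User]:
        return self._users.get(user_id)


def rename_user(store: UserStore, user_id: int, new_name: str) -> bool:
    """Application logic that knows only UserStore, not any database."""
    user = store.find(user_id)
    if user is None:
        return False
    store.save(User(user.user_id, new_name))
    return True
```

A MySQL-backed store could later be added as one more class with the same two methods, without touching rename_user. Which technology that ends up being is exactly the decision that gets to be postponed.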

Architecture is a hypothesis about the future that holds that subsequent change will be confined to that part of the design space encompassed by that architecture

This is much in line with the previous quote. Architecture is definitely a hypothesis about the future. Doing architecture well involves balancing many forces and probabilities. And as already mentioned, one of the main goals is to keep one’s options open. While one can guess at the future and distinguish between the likely and the unlikely, no one can predict it. What a good architect does not do is put all the money on a specific bet, unless that cannot reasonably be avoided.

BIG BALL OF MUD might be thought of as an anti-pattern, since our intention is to show how passivity in the face of forces that undermine architecture can lead to a quagmire. However, its undeniable popularity leads to the inexorable conclusion that it is a pattern in its own right. It is certainly a pervasive, recurring solution to the problem of producing a working system in the context of software development. It would seem to be the path of least resistance when one confronts the sorts of forces discussed above.

The path of least resistance.

The paper somewhere mentions that creating good architecture requires effort as is the case with all things that decrease entropy. This effort is well worth it in many cases.

It is however not the quickest path. The resistance in writing the initial code is very low, though you will quickly pay for this with technical debt, debugging, and trying to figure out what the code you wrote a while back actually intends to accomplish. It seems that many developers do not realize these pains are mostly caused by the “shortcuts” they are taking, and can largely be avoided.

When it comes to software architecture, form follows function. The distinct identities of the system’s architectural elements often don’t start to emerge until after the code is working.

Domain experience is an essential ingredient in any framework design effort. It is hard to try to follow a front-loaded, top-down design process under the best of circumstances. Without knowing the architectural demands of the domain, such an attempt is premature, if not foolhardy. Often, the only way to get domain experience early in the lifecycle is to hire someone who has worked in a domain before from someone else.

Domain Driven Design advocates iterative refinement of the model via the process of knowledge crunching in which the domain designers consult the domain experts.

Indeed some engineers are particularly skilled at learning to navigate these quagmires, and guiding others through them. Over time, this symbiosis between architecture and skills can change the character of the organization itself, as swamp guides become more valuable than architects. As per CONWAY’S LAW [Coplien 1995], architects depart in futility, while engineers who have mastered the muddy details of the system they have built in their images prevail. [Foote & Yoder 1997] went so far as to observe that inscrutable code might, in fact, have a survival advantage over good code, by virtue of being difficult to comprehend and change. This advantage can extend to those programmers who can find their ways around such code.

This definitely amuses me, and goes a long way in explaining why the most “senior” technical people at some organizations don’t have a clue about the basics of software design, and are then actually revered by their fellow developers for the mess they created.

During the PROTOTYPE and EXPANSIONARY PHASES of a systems evolution, expedient, white-box inheritance-based code borrowing, and a relaxed approach to encapsulation are common. Later, as experience with the system accrues, the grain of the architectural domain becomes discernible, and more durable black-box components begin to emerge. In other words, it’s okay if the system looks at first like a BIG BALL OF MUD, at least until you know better.

I definitely disagree with this. It is never OK for your system to look like a big ball of mud, unless you are creating a prototype (that will not go into production) or something similar. Should everything be shiny and perfect from the start? No, of course not. That is not possible. And it is often fine to create a bit of a mess in places, as long as it is under control. (Managed technical debt.) By the time your code qualifies as a big ball of mud, the technical debt will no longer be under control, and you will have a serious problem. Going from big ball of mud to sane design is VERY difficult. Thus advising it is fine to start out with a Big Ball of Mud is rubbish.

They also can emerge as gradual maintenance and PIECEMEAL GROWTH impinges upon the structure of a mature system. Once a system is working, a good way to encourage its growth is to KEEP IT WORKING. One must take care that this gradual process of repair doesn’t erode its structure, or the result can be a BIG BALL OF MUD.

Yes, one should be vigilant and not let code rot go so far that a system turns into a big ball of mud. Continuous refactoring and following the so-called “boy scout rule” are a big part of the answer to this.

The PROTOTYPE PHASE and EXPANSION PHASE patterns in [Foote & Opdyke 1995] both emphasize that a period of exploration and experimentation is often beneficial before making enduring architectural commitments.

I agree with this, though one should keep in mind that the goal of architecture is not to pin things down and make decisions that are hard to change later.

Time, or a lack thereof, is frequently the decisive force that drives programmers to write THROWAWAY CODE. Taking the time to write a proper, well thought out, well documented program might take more time that is available to solve a problem, or more time that the problem merits.

THROWAWAY code is often written as an alternative to reusing someone else’s more complex code. When the deadline looms, the certainty that you can produce a sloppy program that works yourself can outweigh the unknown cost of learning and mastering someone else’s library or framework.

There is nothing wrong with writing prototype code just to see how something works, or to try out if a particular approach is effective.

Prototypes should be treated as prototypes though, and not be deployed as a non-prototype. This might often be an easy thing to do for developers, though it is also clearly irresponsible. You are introducing a huge liability into your company or handing it over to your client. This liability is likely to cost them a lot of money as maintenance costs go through the roof, further development takes ages and as customers switch to less buggy software. This is compounded by the inability of your customer to know the real state of the code and realize what is bound to happen down the road.

Other forms of throwaway code include katas, a structured form of deliberate practice for developers, and “spikes”, little pieces of code you typically write to explore an API before writing the real deal.

Keeping them on the air takes far less energy than rewriting them.

Here I would like to add that rewriting a Big Ball of Mud is likely to be a bad idea. Unless the system in question is very small, doing so is probably not practical. An iterative process of improving the system by breaking things apart, bringing things under test, etc., is the recommended approach when dealing with such legacy systems.

Master plans are often rigid, misguided and out of date. Users’ needs change with time.

If “master plans” refers to big design upfront, then yeah, I agree. As noted before, it is critical that one thinks about design and architecture in order to prevent prematurely committing to decisions. Clients don’t know what they really need at the start, so an iterative approach is the logical thing to go with.

Successful software attracts a wider audience, which can, in turn, place a broader range of requirements on it. These new requirements can run against the grain of the original design. Nonetheless, they can frequently be addressed, but at the cost of cutting across the grain of existing architectural assumptions.

This again portrays architecture as a source of rigidity. It’s true that sometimes one finds out that a specific design in place is not going to work, and that it needs to be evolved or removed. To make this unavoidable process as easy as possible, one should write clean, well-designed code. If you have a tangled spaghetti mess, then it will be more rigid, and much less able to deal with changing requirements. Indeed, a good domain model tends to become more and more powerful over time, opening up many valuable opportunities for the business.

When designers are faced with a choice between building something elegant from the ground up, or undermining the architecture of the existing system to quickly address a problem, architecture usually loses. Indeed, this is a natural phase in a system’s evolution [Foote & Opdyke 1995]. This might be thought of as messy kitchen phase, during which pieces of the system are scattered across the counter, awaiting an eventual cleanup. The danger is that the clean up is never done. With real kitchens, the board of health will eventually intervene. With software, alas, there is seldom any corresponding agency to police such squalor. Uncontrolled growth can ultimately be a malignant force.

There certainly are moments in which parts of a system get a little messy. Sometimes it just makes pragmatic sense to copy-paste something, or to add a method where it does not really belong. Sometimes – not most of the time. And if you have a well thought out system with high cohesion and low coupling, this will be the case less often than when it lacks perceivable design. One should be careful about where this is done. If it is not behind an abstraction, the rot can easily spread to the rest of the system. It is also wise to take into account who is working on the codebase, in particular what their experience and attitude is. Inexperienced or badly disciplined people can easily not see or not care that they are creating a binding to something that should be cleaned up first.

When constant cleaning and refactoring is applied, messes are kept small and under control. With less vigilance they can easily cause serious problems in the entire codebase.

Maintenance needs have accumulated, but an overhaul is unwise, since you might break the system. There may be times where taking a system down for a major overhaul can be justified, but usually, doing so is fraught with peril. Once the system is brought back up, it is difficult to tell which from among a large collection of modifications might have caused a new problem. Therefore, do what it takes to maintain the software and keep it going. Keep it working.

As already mentioned, I am of the opinion that big rewrites are indeed a bad idea. One should not let a system deteriorate into a Big Ball of Mud in any case. When dealing with a legacy system is required, a careful step-by-step approach is advised. First bring the relevant part under test, then slowly refactor it. A great book on the subject is Working Effectively with Legacy Code by Michael Feathers.
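As a rough illustration of “first bring it under test, then refactor”, here is a small Python sketch. The pricing function and its rules are entirely invented for the example. The point is only the order of the steps: pin down what the legacy code currently does, then restructure while checking that the behaviour stays the same.

```python
def legacy_price(qty, unit, cust):
    """Hypothetical legacy function whose intent is no longer obvious."""
    t = qty * unit
    if cust == "gold":
        t = t - t * 0.1
    if qty > 10:
        t = t - 5
    return round(t, 2)


def refactored_price(quantity: int, unit_price: float, customer_tier: str) -> float:
    """Same behaviour, with readable names, written once tests were in place."""
    total = quantity * unit_price
    if customer_tier == "gold":
        total -= total * 0.1  # gold customers get 10% off
    if quantity > 10:
        total -= 5  # flat discount on larger orders
    return round(total, 2)


# Characterization cases: inputs chosen so that every branch is exercised.
CASES = [
    (1, 10.0, "basic"),
    (1, 10.0, "gold"),
    (11, 10.0, "basic"),
    (11, 10.0, "gold"),
]


def behaviour_is_preserved() -> bool:
    """Compare the refactored code against what the legacy code returns."""
    return all(legacy_price(*case) == refactored_price(*case) for case in CASES)
```

In a real codebase the comparison would live in the test suite, and the legacy function would be removed once nothing calls it anymore.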

If you can’t easily make a mess go away, at least cordon it off. This restricts the disorder to a fixed area, keeps it out of sight, and can set the stage for additional refactoring. One frequently constructs a FAÇADE [Gamma et. al. 1995] to put a congenial “pretty face” on the unpleasantness that is SWEPT UNDER THE RUG.

This is a very effective and pragmatic approach. To break dependencies and be able to work with clean interfaces I often create new interfaces (the language constructs) and then make trivial implementations that act as adapters for the old code.

While using this technique, I have gotten some complaints from people along the lines of “you are not really fixing the problem”. And indeed, the mess is not gone. This is merely the first step in doing so, which both decreases the damage done by the mess, and enables further clean-up.
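A small Python sketch of what such an adapter can look like. The legacy function, its global state and the Notifier interface are all hypothetical stand-ins, and typing.Protocol plays the role of the interface language construct here.

```python
from typing import Protocol

# --- Legacy code, left untouched (hypothetical stand-in for the mess) ---
LEGACY_GLOBAL_CONFIG = {"mail_prefix": "[app] "}
SENT_MAIL_LOG = []


def legacy_send_mail(to, subj, body, html=0, prio=3, cfg=None):
    """Messy old function: positional flags, global state, odd defaults."""
    cfg = cfg or LEGACY_GLOBAL_CONFIG
    SENT_MAIL_LOG.append((to, cfg["mail_prefix"] + subj, body, html, prio))
    return 1  # legacy convention: 1 means success


# --- New, clean interface that the rest of the code depends on ---
class Notifier(Protocol):
    def notify(self, recipient: str, subject: str, message: str) -> bool: ...


class LegacyMailNotifier:
    """Trivial adapter: translates the clean interface into the legacy call."""

    def notify(self, recipient: str, subject: str, message: str) -> bool:
        return legacy_send_mail(recipient, subject, message) == 1


def welcome(notifier: Notifier, email: str) -> bool:
    """New code sees only Notifier and never touches the legacy function."""
    return notifier.notify(email, "Welcome", "Thanks for signing up.")
```

The mess inside legacy_send_mail is still there, but it now has exactly one caller. Replacing it later means writing another Notifier implementation, not hunting through the whole codebase.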

The second was that the stadium’s attempt to provide a cheap, general solution to the problem of providing a forum for both baseball and football audiences compromised the needs of both. Might there be lessons for us about unexpected requirements and designing general components here?

There is definitely a lesson here, yes. The problem however does not lie with generality. This example strikes me much more as a violation of the Single Responsibility Principle, as well as bad interface segregation. Code on every level should do one thing, and do it well. This applies to functions, classes, components and systems.

This also makes me think of YAGNI. I often see people make things on a function level more generic than they need to be, by adding support for things “that might be needed”. There are of course cases where this makes sense, as some things will be very hard to change later on. However, in many others there just is no need for handling some arbitrary whatever, or taking in an optional argument which is never passed. This then essentially ends up being hard-to-spot dead code, which people often do not realize is dead code, and then end up doing crazy stunts to support it.
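A tiny, made-up Python example of the kind of thing I mean. Assume that no caller anywhere ever passes middle or separator.

```python
from typing import Optional


def full_name_speculative(first: str, last: str,
                          middle: Optional[str] = None,
                          separator: str = " ") -> str:
    """Over-general version: supports options that nobody uses."""
    parts = [first]
    if middle is not None:  # never true in practice, so effectively dead code
        parts.append(middle)
    parts.append(last)
    return separator.join(parts)


def full_name(first: str, last: str) -> str:
    """What the callers actually need."""
    return f"{first} {last}"
```

Both functions return the same result for every call that actually happens, but only the first one makes the next maintainer wonder which middle names and separators need to keep working.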

Change: Even though software is a highly malleable medium, like Fulton County Stadium, new demands can, at times, cut across a system’s architectural assumptions in such a ways as to make accommodating them next to impossible. In such cases, a total rewrite might be the only answer.

Only if your architecture sucks balls. Changing requirements might cause things to be obsoleted, new things to be written and old things to be rearranged. If a total rewrite is needed, you probably want to think about firing your “architect”.

That concludes my remarks on this paper. Let’s go clean up some code 🙂