OOP file_get_contents

I’m happy to announce the immediate availability of FileFetcher 4.0.0.

FileFetcher is a small PHP library that provides an OO way to retrieve the contents of files.

What’s OO about such an interface? You can inject an implementation of it into a class, so that the class does not know about the details of the implementation, and so that you can choose which implementation to provide. Calling file_get_contents directly does not allow swapping the implementation, as it is a procedural/static call that makes use of global state.
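As a minimal sketch of what this buys you (the class and the fetchFile method name are illustrative here; check the library’s actual interface):

    class PageTitleExtractor {

        private $fileFetcher;

        public function __construct( FileFetcher $fileFetcher ) {
            $this->fileFetcher = $fileFetcher;
        }

        public function extractTitle( string $url ): string {
            // Depends on the interface, not on file_get_contents, so tests can
            // inject an InMemoryFileFetcher instead of doing real file access.
            $html = $this->fileFetcher->fetchFile( $url );
            preg_match( '|<title>(.*?)</title>|', $html, $matches );
            return $matches[1] ?? '';
        }
    }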

Library number 8234803417 that does this exact thing? Probably not. The philosophy behind this library is to provide a very basic interface (FileFetcher) that, while insufficient for plenty of use cases, is ideal for a great many others, in particular replacing procedural file_get_contents calls. The provided implementations are there to facilitate testing and common generic tasks around the actual file fetching. You are encouraged to create your own core file fetching implementation in your codebase, presumably an adapter to a library that focuses on this task, such as Guzzle.

So what is in it then? The library provides two trivial implementations of the FileFetcher interface at its heart:

  • SimpleFileFetcher: Adapter around file_get_contents
  • InMemoryFileFetcher: Adapter around an array provided to its constructor (construct with [] for a “throwing fetcher”)

It also provides a number of generic decorators.

Version 4.0.0 brings PHP7 features (scalar type hints \o/) and adds a few extra handy implementations. You can add the library to your composer.json (jeroen/file-fetcher) or look at the documentation on GitHub. You can also read about its inception in 2013.

PHP development with Docker

I’m the kind of dev that dreads configuring webservers and would rather not have to put up with random ops stuff before being able to get work done. Docker is one of those things I’d never looked into, because clearly it’s evil annoying boring evil confusing evil ops stuff. Two of my colleagues just introduced me to a one-line Docker command that kind of blew my mind.

Want to run tests for a project but don’t have PHP7 installed? Want to execute a custom Composer script that runs both these tests and the linters without having Composer installed? Don’t want to execute code you are not that familiar with on your machine that contains your private keys, etc? Assuming you have Docker installed, this command is all you need:
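Something along these lines (the exact flags are my assumption; the point is mounting the project into the official Composer image):

    docker run --rm --interactive --tty --volume $PWD:/app composer composer ci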

This command uses the Composer Docker image, as indicated by the first of the two composer words at the end of the command. After that you can specify whatever you want to execute, in this case composer ci, where ci is a custom Composer script. (If you want to know what the Docker image is doing behind the scenes, check its entry point file.)

This works without having PHP or Composer installed, and is very fast once the initial dependencies have been pulled. And each time you execute the command, the environment is destroyed, avoiding state leakage. You can create a composer alias in your .bash_aliases as follows, and then execute composer on your host just as you would if it were actually installed (and running) there.
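For instance (again, treat the exact flags as an assumption):

    alias composer='docker run --rm --interactive --tty --volume $PWD:/app composer composer'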

Of course you are not limited to running Composer commands; you can also invoke PHPUnit
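For instance (same assumptions as above, with PHPUnit installed via Composer in the project):

    docker run --rm --interactive --tty --volume $PWD:/app composer vendor/bin/phpunit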

or indeed any PHP code.

This one-liner is not sufficient if you require additional dependencies, such as PHP extensions, databases or webservers. In those cases you probably want to create your own Dockerfile. Though to run the tests of most PHP libraries, you should be good. I’ve now uninstalled my local Composer and PHP.

Why Every Single Argument of Dan North is Wrong

Alternative title: Dan North, the Straw Man That Put His Head in His Ass.

This blog post is a reply to Dan’s presentation Why Every Element of SOLID is Wrong. It is crammed full of straw man argumentation in which he misinterprets what the SOLID principles are about. After refuting each principle he proposes an alternative, typically a well-accepted non-SOLID principle that does not contradict SOLID. If you are not that familiar with the SOLID principles and cannot spot the bullshit in his presentation, this blog post is for you. The same goes if you enjoy bullshit being pointed out and broken down.

What follows are screenshots of select slides with comments on them underneath.

Dan starts by asking “What is a single responsibility anyway?”. Perhaps he should have figured that out before giving a presentation about how it is wrong.

A short (non-comprehensive) description of the principle: systems change for various reasons. Perhaps a database expert changes the database schema for performance reasons, perhaps a User Interface person reorganizes the layout of a web page, perhaps a developer changes business logic. What the Single Responsibility Principle says is that ideally, changes for such disparate reasons do not affect the same code. If they did, different people would get in each other’s way. Possibly worse still, if the concerns are mixed together and you want to change some UI code, suddenly you need to deal with, and thus understand, the business logic and database code.

How can we predict what is going to change? Clearly you can’t, and this is simply not needed to follow the Single Responsibility Principle or to get value out of it.

Write simple code… no shit. One of the best ways to write simple code is to separate concerns. You can be needlessly vague about it and simply state “write simple code”. I’m going to label this Dan North’s Pointlessly Vague Principle. Congratulations sir.

The idea behind the Open Closed Principle is not that complicated. To partially quote the first line on the Wikipedia Page (my emphasis):

… such an entity can allow its behaviour to be extended without modifying its source code.

In other words, when you ADD behavior, you should not have to change existing code. This is very nice, since you can add new functionality without having to rewrite old code. Contrast this to shotgun surgery, where to make an addition, you need to modify existing code at various places in the codebase.
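A hypothetical PHP example (mine, not Dan’s): adding a new payment method means adding a class that implements an interface, not editing an ever-growing switch statement.

    interface PaymentMethod {
        public function pay( float $amount ): void;
    }

    class CreditCardPayment implements PaymentMethod {
        public function pay( float $amount ): void {
            // talk to the credit card provider
        }
    }

    // Adding PayPal support is an addition: a new class, no changes to existing code.
    class PayPalPayment implements PaymentMethod {
        public function pay( float $amount ): void {
            // talk to the PayPal API
        }
    }

    class PaymentProcessor {
        public function process( PaymentMethod $method, float $amount ): void {
            $method->pay( $amount );
        }
    }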

In practice, you cannot gain full adherence to this principle, and you will have places where you need to modify existing code. Full adherence is not the point. As with all engineering principles, these are guidelines which live in a complex world of trade-offs. Knowing these guidelines is very useful.

Clearly it’s a bad idea to leave code in place that is wrong after a requirement change. That’s not what this principle is about.

Another very informative “simple code is a good thing” slide.

To be honest, I’m not entirely sure what Dan is getting at with his “is-a, has-a” vs “acts-like-a, can-be-used-as-a”. It does make me think of the Interface Segregation Principle, which, coincidentally, is the next principle he misinterprets.

The remainder of this slide is about the “favor composition over inheritance” principle. This is really good advice, which has been well accepted in professional circles for a long time. This principle is about code sharing, which is generally better done via composition than via inheritance (the latter creates very strong coupling). The last big application I wrote has several hundred classes, of which less than a handful inherit concrete code. Inheritance has a use which is completely different from code reuse: subtyping and polymorphism. I won’t go into detail about those here, and will just say that this is at the core of what Object Orientation is about, and that even in the application I mentioned, this is used all over, making the Liskov Substitution Principle very relevant.

Here Dan is slamming the principle for being too obvious? Really?

“Design small, role-based classes”. Here Dan changed “interfaces” into “classes”, which results in a line that makes me think of the Single Responsibility Principle. More importantly, there is a misunderstanding about the meaning of the word “interface” here. This principle is about the abstract concept of an interface, not the language construct that you find in some programming languages such as Java and PHP. A class forms an interface. This principle applies to OO languages that do not have an interface keyword, such as Python, and even to those that do not have a class keyword, such as Lua.

If you follow the Interface Segregation Principle and create interfaces designed for specific clients, it becomes much easier to construct or invoke those clients. You won’t have to provide additional dependencies that your client does not actually care about. In addition, if you are doing something with those extra dependencies, you know this client will not be affected.

This is a bit bizarre. The definition Dan provides is good enough, even though it is incomplete, which can be excused by it being a slide. From the slide it’s clear that the Dependency Inversion Principle is about dependencies (who would have guessed) and coupling. The next slide is about how reuse is overrated. As we’ve already established, this is not what the principle is about.

As to the DIP leading to DI frameworks that you then depend on: this is like saying that if you eat food, you might eat non-nutritious food, which is not healthy. The fix is to not eat non-nutritious food, not to reject food altogether. Remember the application I mentioned? It uses dependency injection all the way, without using any framework or magic. In fact, 95% of the code does not bind to the web framework used, due to adherence to the Dependency Inversion Principle. (Read more about this application)
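A sketch of what framework-free dependency injection can look like (the names here are made up for illustration):

    class Article {
    }

    interface ArticleRepository {
        public function save( Article $article );
    }

    // High level policy code depends only on the abstraction.
    class PublishArticleUseCase {

        private $repository;

        public function __construct( ArticleRepository $repository ) {
            $this->repository = $repository;
        }

        public function publish( Article $article ) {
            $this->repository->save( $article );
        }
    }

    // Low level detail implements the abstraction and binds to MySQL, the filesystem, whatever.
    class MySqlArticleRepository implements ArticleRepository {
        public function save( Article $article ) {
            // persistence code goes here
        }
    }

    // Wiring done by hand at the top level. No DI framework, no magic.
    $useCase = new PublishArticleUseCase( new MySqlArticleRepository() );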

That attitude explains a lot about the preceding slides.

Yeah, please do write simple code. The SOLID principles and many others can help you with this difficult task. There is a lot of hard-won knowledge in our industry and many problems are well understood. Frivolously rejecting that knowledge with “I know better” is an act of supreme arrogance and ignorance.

I do hope this is the category Dan falls into, because the alternative of purposefully misleading people for personal profit (attention via controversy) rustles my jimmies.

If you’re not familiar with the SOLID principles, I recommend you start by reading their associated Wikipedia pages. If you are like me, it will take you practice to truly understand the principles and their implications and to find out where they break down or should be superseded. Knowing about them and keeping an open mind is already a good start, which will likely lead you to many other interesting principles and practices.

My year in books

This is a short summary of my 2016 reading experience, following my 2015 Year In Books.

Such Stats

I’ve read 38 books, most of them novels, up from last year’s “44”, which included at least a dozen short stories. These totaled 10779 pages, up from 6xxx in both previous years.

Favorites FTW

Peter Watts

My favorites for 2016 are without a doubt Blindsight and Echopraxia by Peter Watts. I got to know Watts, who is now on my short list of favorite authors, through the short story Malek, which I mentioned in my 2015 year in books. I can’t describe the books in a way that does them sufficient justice, but suffice to say that they explore a good number of interesting questions and concepts. Exactly what you (or at least I) want from Hard (character) Science Fiction.

You can haz a video with Peter Watts reading a short excerpt from Echopraxia at the Canada Privacy Symposium. You can also get a feel for these books based on their quotes: Blindsight quotes, Echopraxia quotes. Or simply read Malek 🙂 Though Malek does not have space vampires that are seriously OP.

My favorite non-fiction book of the year is Thinking Fast and Slow, which is crammed full of information useful to anyone who wants to better understand how their (or others’) mind works, where it tends to go off the rails, and what can be done about those cases. The number two slot goes to Domain-Driven Design Distilled, which unlike the red and blue books is actually somewhat readable.

My favorite short story was Crystal Nights by Greg Egan. It explores the creation of something akin to strong AI via simulated evolution, including various ethical and moral questions this raises. It does not start with a tetravalent graph more like diamond than graphite, but you can’t have everything now can you?

Dat Distribution

All books I read, except for 2 short stories, were published in the year 2000 or later.

Lying Lydia <3

Lydia after audiobook hax

For 2016 I set a reading goal of 21 books, to beat Lydia‘s 20. She ended up reading 44, so she beat me. However, she used OP audiobook hax. I attempted this as well but did not manage to catch up, even though I racked up 11.5 days worth of listening time. This year I will win of course. (Plan B)

Series, Seriously

As you might be able to deduce from the read in vs published in chart, I went through a number of series. I read all The Expanse novels after watching the first season of the television series based on them. Similarly I read the Old Man’s War novels, except the last one, which I have not finished yet. Both series are fun, though they might not live up to the expectations of fellow Hard Science Fiction master race members, as they are geared more towards plain SF plebs. Finally I started with the Revelation Space books by Alastair Reynolds after reading House of Suns, which had been on my reading list since forever. This was also high time, since clearly I need to have read all the books from authors that have been in a Google hangout with Special Circumstances agent Iain M. Banks.

Simple is not easy

Simplicity is possibly the single most important thing on the technical side of software development. It is crucial to keep development costs down and external quality high. This blog post is about why simplicity is not the same thing as easiness, and common misconceptions around these terms.

Simple is not easy

Simple is the opposite of complex. Both are a measure of complexity, which arises from intertwining things such as concepts and responsibilities. Complexity is objective, and certain aspects of it, such as Cyclomatic Complexity, can be measured with many code quality tools.

Easy is the opposite of hard. Both are a measure of effort, which unlike complexity, is subjective and highly dependent on the context. For instance, it can be quite hard to rename a method in a large codebase if you do not have a tool that allows doing so safely. Similarly, it can be quite hard to understand an OO project if you are not familiar with OO.

Achieving simplicity is hard

I’m sorry I wrote you such a long letter; I didn’t have time to write a short one.

Blaise Pascal

Finding simple solutions, or brief ways to express something clearly, is harder than finding something that works but is more complex. In other words, achieving simplicity is hard. This is unfortunate, since dealing with complexity is so hard.

Since in recent decades the cost of software maintenance has become much greater than the cost of its creation, it makes sense to make maintenance as easy as we can. This means avoiding as much complexity as we can during the creation of the software, which is a hard task. The cost of the complexity does not suddenly appear once the software goes into an official maintenance phase; it is there on day 2, when you need to deal with code from day 1.

Good design requires thought

Questions about whether design is necessary or affordable are quite beside the point: design is inevitable. The alternative to good design is bad design, not no design at all.

— Vaughn Vernon in Domain-Driven Design Distilled

Some people in the field conflate simple and easy in a particularly unfortunate manner. They reason that if you need to think a lot about how to create a design, it will be hard to understand the design. Clearly, thinking a lot about a design does not guarantee that it is good and minimizes complexity. You can do a good job and create something simple or you can overengineer. There is however one guarantee that can be made based on the effort spent: for non-trivial problems, if little effort was spent (by going for the easy approach), the solution is going to be more complex than it could have been.

One high-profile case of such conflation can be found in the principles behind the Agile Manifesto. While I don’t fully agree with some of the other principles, this is the only one I strongly disagree with (unless you remove the middle part). Yay Software Craftsmanship manifesto.

Simplicity–the art of maximizing the amount of work not done–is essential

Principles behind the Agile Manifesto

Similarly we should be careful to not confuse the ease of understanding a system with the ease of understanding how or why it was created the way it was. The latter, while still easier than the actual task of creating a simple solution, is still going to be harder than working with said simple solution, especially for those that lack the skills used in its creation.

Again, I found a relatively high-profile example of such confusion:

If the implementation is hard to explain, it’s a bad idea. If the implementation is easy to explain, it may be a good idea.

The Zen of Python

I think this is just wrong.

You can throw all books in a library onto a big pile and then claim it’s easy to explain where a particular book is – in the pile – though actually finding the book is a bigger challenge. It’s true that you need more skills to use a well-organized library effectively than you need to go through a pile of books randomly. You need to know the alphabet, be familiar with the concept of genres, etc. Clearly an organized library is easier to deal with than our pile of books for anyone that has those skills.

It is also true that sometimes it does not make sense to invest in the skill that allows working more effectively, and that sometimes you simply cannot find people with the desired skills. This is where the real bottleneck is: learning. Most of the time these investments are worth it, as they allow you to work both faster and better from that point on.

See also

In my reply to the Big Ball of Mud paper I also talk about how achieving simplicity requires effort.

The main source of inspiration that led me to this blog post is Rich Hickey’s 2012 Rails Conf keynote, in which he starts by differentiating simple and easy. If you don’t know who Rich Hickey is (he created Clojure), go watch all his talks on YouTube now; they are well worth the time. (I don’t agree with everything he says but it tends to be interesting regardless.) You can start with this keynote, which goes into more detail than this blog post and adds a bunch of extra goodies on top. <3 Rich

Following the reasoning in this blog post, you cannot trade software quality for lower cost. You can read more about this in the Tradable Quality Hypothesis and Design Stamina Hypothesis articles.

There is another blog post titled Simple is not easy, which as far as I can tell, differentiates the terms without regard to software development.

PHP 7.1 is awesome

PHP 7.1 has been released, bringing some features I was eagerly anticipating and some surprises that had gone under my radar.

New iterable pseudo-type

This is the feature I’m most excited about, perhaps because I had no clue it was in the works. In short, iterable allows type hinting in functions that just loop over their parameter’s value, without restricting the type to either array or Traversable, or having no type hint at all. This partially solves one of the points I raised in my Missing in PHP 7 series post Collections.
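A quick illustration of the new pseudo-type:

    function printAll( iterable $items ) {
        foreach ( $items as $item ) {
            echo $item, PHP_EOL;
        }
    }

    printAll( [ 'a', 'b' ] );                      // works with an array
    printAll( new ArrayIterator( [ 'c', 'd' ] ) ); // and with any Traversable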

Nullable types

This feature I also already addressed in Missing in PHP 7, in the post on nullable return types. What somehow escaped my attention is that PHP 7.1 comes not just with nullable return types, but also with the same syntax for nullable parameters.
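The ? prefix works in both positions:

    function findNickname( int $userId ): ?string {
        // ?string return type: a string or null
        return $userId === 1 ? 'jeroen' : null;
    }

    function greet( ?string $nickname ): string {
        // ?string parameter: a string or an explicit null is accepted
        return 'Hello ' . ( $nickname ?? 'anonymous' );
    }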

Intent revealing

Other new features that I’m excited about are the Void Return Type and Class Constant Visibility Modifiers. Both of these help with revealing the author’s intent, reduce the need for comments and make it easier to catch bugs.
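Both in one small example:

    class Counter {

        private const START_VALUE = 0; // class constants can now have visibility modifiers

        private $count = self::START_VALUE;

        public function increment(): void { // void makes explicit that nothing is returned
            $this->count++;
        }
    }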

A big thank you to the PHP contributors that made these things possible and keep pushing the language forwards.

For a full list of new features, see the PHP 7.1 release announcement.

Final Rush Pro 5

I’m happy to announce the immediate availability of the Final Rush Pro 5 map for Supreme Commander Forged Alliance Forever. During the past few weeks I’ve been reworking version 4 of the map, and have added many new features, fixed some bugs and improved balance in the team vs team modes.

Version 4 has a Game Mode setting with Paragon Wars, Survival Versus, Normal and 4 different difficulties of Survival Classic. Version 5 has a dedicated Survival Difficulty setting, so it’s now possible to change the difficulty for Survival Versus as well. Furthermore, a ton of new lobby options have been added that allow changing the delay of the various tech level waves, their frequency, how quickly the units should gain health, how often random events should happen, etc. This gives you much greater control over the difficulty in all survival modes, and adds a lot of replayability by enabling alteration of the nature of the challenge the map provides.

The team vs team modes, Survival Versus and Paragon Wars, both had some serious balance issues. In Survival Versus, random events and bounty hunters would attack a random player. While that works fine in Survival Classic, in the modes with two teams it makes things unfairly harder for a single team. I’ve observed this several times, where suddenly one team gets whacked by a few random events even though they were doing better than the other team. In this new version, random events and bounty hunters target a random player from each team.

In Paragon Wars, the issue was that the civilian base protecting the Paragon Activator would be randomly constructed. Within a certain bounding box, the Paragon Activator and a bunch of defensive structures would spawn. This means the Activator could be at the far side of the bounding box, and the defenses mostly on the other side, making it a lot easier for one team to approach the Activator than for the other. Now the base is entirely symmetrical (using a circular layout) and spawns at the exact center of the map.

Version 4 had 6 working lobby options, while version 5 has 23. Besides the new difficulty related options, it is now possible to turn off aspects of the game. For instance, you can now completely disable random events, MMLs and “aggression tracking” (punishing of fast tech, high eco and aggressive ACU placement).

Some significant bugs were fixed, most notably Paragon Wars not working correctly when playing with fewer than 8 people. You can now play it 2v2, 4v1 or however else you see fit. Another thing that was fixed is the Auto Reclaim option, so there is no more need for the Vampire mod. Unlike the mod, this option allows you to specify how much resources you should get, all the way from none to over 9000% (yes, that is an actual value you can select).

Another mod that is no longer needed is FinalRushPro3 itself. You now just need the map, and it will function properly without any mods. Due to the removal of the FinalRushPro3 integration, the special UI is no longer present. In a lot of cases it did not work properly anyway and just took up space, and I’ve not gotten around to making a better replacement yet.

You can download the map as a zip. Unfortunately the FAF map vault infrastructure is rather broken, so I’ve not been able to upload the map to the vault. Hopefully this gets resolved soon.

For a full list of changes, see the readme. I got a number of ideas for future enhancements, which might become part of a version 5.1, 5.2, etc. Feel free to submit your own feature requests!

If you’re interested in how I went about creating version 5 from a technical point of view, see my post on Refactoring horrible Lua code.

Implementing the Clean Architecture

Both Domain Driven Design and architectures such as the Clean Architecture and Hexagonal are often talked about. It’s hard to go to a conference on software development and not run into one of these topics. However, it can be challenging to find good real-world examples. In this blog post I’ll introduce you to an application following the Clean Architecture and incorporating a lot of DDD patterns. The focus is on the key concepts of the Clean Architecture, and the most important lessons we learned implementing it.

The application

The real-world application we’ll be looking at is the Wikimedia Deutschland fundraising software. It is a PHP application written in 2016, replacing an older legacy system. While the application is written in PHP, the patterns followed are by and large language agnostic, and are thus relevant for anyone writing object oriented software.

I’ve outlined what the application is and why we replaced the legacy system in a blog post titled Rewriting the Wikimedia Deutschland fundraising. I recommend you have a look at least at its “The application” section, as it will give you a rough idea of the domain we’re dealing with.

A family of architectures

Architectures such as Hexagonal and the Clean Architecture are very similar. At their core, they are about separation of concerns. They decouple from mechanisms such as persistence and frameworks, and instead focus on the domain and high level policies. A nice short read on this topic is Uncle Bob’s blog post on the Clean Architecture. Another recommended post is Hexagonal != Layers, which explains how just creating a bunch of layers misses the point.

The Clean Architecture

The Clean Architecture diagram

The arrows crossing the circle boundaries represent the allowed direction of dependencies. At the core is the domain. “Entities” here means Entities as in Domain Driven Design, not to be confused with ORM entities. The domain is surrounded by a layer containing use cases (sometimes called interactors) that form an API that the outside world, such as a controller, can use to interact with the domain. The use cases themselves only bind to the domain and certain cross cutting concerns such as logging, and are devoid of binding to the web, the database and the framework.

In this example you can see how the UC for canceling a donation gets a request object, does some stuff, and then returns a response object. Both the request and response objects are specific to this UC and lack both domain and presentation mechanism binding. The stuff that is actually done is mainly interaction with the domain through Entities, Aggregates and Repositories.
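A rough sketch of what such a use case looks like (the names and methods here are illustrative, not copied from the real codebase):

    class CancelDonationUseCase {

        private $repository;

        public function __construct( DonationRepository $repository ) {
            $this->repository = $repository;
        }

        public function cancelDonation( CancelDonationRequest $request ): CancelDonationResponse {
            $donation = $this->repository->getDonationById( $request->getDonationId() );

            if ( $donation === null || !$donation->canBeCancelled() ) {
                return CancelDonationResponse::newFailureResponse( $request->getDonationId() );
            }

            $donation->cancel();
            $this->repository->storeDonation( $donation );

            return CancelDonationResponse::newSuccessResponse( $request->getDonationId() );
        }
    }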

This is a typical way of invoking a UC. The framework we’re using is Silex, which calls the function we provided when the route matches. Inside this function we construct our framework agnostic request model and invoke the UC with it. Then we hand over the response model to a presenter to create the appropriate HTML or other such format. This is all the framework bound code we have for canceling donations. Even the presenter does not bind to the framework, though it does depend on Twig.
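Sketched out, such a route handler looks roughly like this ($ffFactory being the top level factory discussed next; the method names are assumptions):

    use Symfony\Component\HttpFoundation\Request;

    $app->post(
        '/cancel-donation',
        function ( Request $httpRequest ) use ( $ffFactory ) {
            $useCase = $ffFactory->newCancelDonationUseCase();

            $responseModel = $useCase->cancelDonation(
                new CancelDonationRequest( (int)$httpRequest->get( 'donation_id' ) )
            );

            // The presenter turns the framework agnostic response model into HTML.
            return $ffFactory->newCancelDonationHtmlPresenter()->present( $responseModel );
        }
    );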

If you are familiar with Silex, you might already have noticed that we’re constructing our UC differently than you might expect. We decided to go with our own top level factory, rather than using the dependency injection mechanism provided by Silex: Pimple. Our factory internally actually uses Pimple, though this is not visible from the outside. With this approach we gain nicer access to service construction, since we can have a getLogger() method with a LoggerInterface return type hint, rather than accessing $app['logger'] or some such, which forces us to bind to a string and leaves us without a type hint.
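In essence (an illustrative sketch, not the actual factory):

    class TopLevelFactory {

        private $pimple;

        public function __construct() {
            $this->pimple = new \Pimple\Container();

            $this->pimple['logger'] = function () {
                return new \Monolog\Logger( 'fundraising' );
            };
        }

        public function getLogger(): \Psr\Log\LoggerInterface {
            // Typed access instead of $app['logger']
            return $this->pimple['logger'];
        }
    }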

The list of use cases

This use case based approach makes it very easy to see what our system is capable of at a glance.

The contents of a use case directory

And it makes it very easy to find where certain behavior is located, or to figure out where new behavior should be put.

All code in our src/ directory is framework independent, and all code binding to specific persistence mechanisms resides in src/DataAccess. The only framework bound code we have are our very slim “route handlers” (kinda like controllers), the web entry point and the Silex bootstrap.

For more information on the Clean Architecture I can recommend Robert C. Martin’s NDC 2013 talk. If you watch it, you will hopefully notice how we slightly deviated from the UseCase structure as he presented it. This is due to PHP being an interpreted language, which does not need certain interfaces that are beneficial in compiled languages.

Lesson learned: bounded contexts

By and large we started with the donation related use cases and then moved on to the membership application related ones. At some point, we had a Donation entity/aggregate in our domain, and a bunch of value objects that it contained.

One of those value objects was PersonalInfo. Then we needed to add an entity for membership applications. Like donations, membership applications require a name, a physical address and an email address. Hence it was tempting to reuse our existing PersonalInfo class.

Luckily a complication made us realize that going down this path was not a good idea. This complication was that membership applications also have a phone number and an optional date of birth. We could have forced code sharing by doing something hacky like adding new optional fields to PersonalInfo, or by creating a MorePersonalInfo derivative.

Approaches such as these, while resulting in some code sharing, also result in coupling between Donation and MembershipApplication. That’s not good, as those two entities don’t have anything to do with each other. Sharing what happens to be the same at present is simply not a good idea. Just imagine that we did not have the phone number and date of birth in our first version, and then needed to add them. We’d either end up with one of those hacky solutions, or need to refactor code that has nothing to do (apart from the bad coupling) with what we want to modify.

What we did was rename PersonalInfo to Donor and introduce a new Applicant class.

These names are better since they are about the domain (see ubiquitous language) rather than some technical terms we needed to come up with.

Amongst other things, this rename made us realize that we were missing some explicit boundaries in our application. The donation related code and the membership application related code were mostly independent from each other, and we agreed this was a good thing. To make it more clear that this is the case, and to highlight violations of that rule, we decided to reorganize our code to follow the strategic DDD pattern of Bounded Contexts.

The bounded context directory structure

This mainly consisted of reorganizing our directory and namespace structure, and a few instances of splitting code that should not have been bound together.

Based on this we created a new diagram to reflect the high level structure of our application. This diagram, and a version with just one context, are available for use under CC-0.

Clean Architecture + Bounded Contexts

Lesson learned: validation

A big question we had near the start of our project was where to put validation code. Do we put it in the UCs, or in the controller-like code that calls the UCs?

One of the first UCs we added was the one for adding donations. This one has a request model that contains a lot of information, including the donor’s name, their email, their address, the payment method, payment amount, payment interval, etc. In our domain we had several value objects for representing parts of donations, such as the donor or the payment information.

Since we did not want to have one object with two dozen fields, and did not want to duplicate code, we used the value objects from our domain in the request model.

If you’ve been paying attention, you’ll have realized that this approach violates one of the rules outlined earlier: nothing outside the UC layer is supposed to access anything from the domain. If value objects from the domain are exposed to whatever constructs the request model, i.e. a controller, this rule is violated. Quite apart from this abstract objection, we got into real trouble by doing this.

Since we did validation in our UCs, this usage of objects from the domain in the request model forced those objects to allow invalid values. For instance, if we’re validating an email address in the UC (or a service used by the UC), then the request model cannot use an EmailAddress class that does sanity checks in its constructor.

We thus refactored our code to avoid using any of our domain objects in the request models (and response models), so that those objects could contain basic safeguards.
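To make that concrete (illustrative classes, not the real ones):

    class EmailAddress {

        private $address;

        public function __construct( string $address ) {
            // Basic safeguard: the domain object refuses obviously invalid values.
            if ( !filter_var( $address, FILTER_VALIDATE_EMAIL ) ) {
                throw new \InvalidArgumentException( 'Not a valid email address' );
            }

            $this->address = $address;
        }
    }

    class AddDonationRequest {

        // The request model just carries the raw string. The use case validates it
        // and only then constructs domain objects such as EmailAddress.
        private $emailAddress = '';

        public function setEmailAddress( string $emailAddress ) {
            $this->emailAddress = $emailAddress;
        }

        public function getEmailAddress(): string {
            return $this->emailAddress;
        }
    }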

We made a similar change by altering which objects get validated. At the start of our project we created a number of validators that worked on objects from the domain, for instance a DonationValidator working with the Donation Entity. This DonationValidator would then be used by the AddDonationUseCase. This is not a good idea, since the validation that needs to happen depends on the context. In the AddDonationUseCase certain restrictions apply that don’t always hold for donations. Hence having a general looking DonationValidator is misleading. What we ended up doing instead is having validation code specific to the UCs, be it as part of the UC, or, when too complex, as a separate validation service in the same namespace. In either case the validation code works on the request model, i.e. AddDonationRequest, and does not bind to the domain.

After learning these two lessons, we had a nice approach for policy-based validation. That’s not all the validation that needs to be done though. For instance, if you get a number via a web request, the framework will typically give it to you as a string, which might thus not be an actual number. As the request model is supposed to be presentation mechanism agnostic, certain validation, conversion and error handling needs to happen before constructing the request model and invoking the UC. This means that often you will have validation in two places: policy based validation in the UC, and presentation specific validation in your controllers or equivalent code. If you have string to integer conversion, number parsing or something internationalization specific in your UC, you almost certainly messed up.

Closing notes

You can find the Wikimedia Deutschland fundraising application on GitHub and see it running in production. Unfortunately the code of the old application is not available for comparison, as it is not public. If you have questions, you can leave a comment, or contact me. If you find an issue or want to contribute, you can create a pull request. If you are looking for my presentation on this topic, view the slides.

As a team we learned a lot during this project, and we set a number of firsts at Wikimedia Deutschland, or the wider Wikimedia movement for that matter. The new codebase is the cleanest non-trivial application we have, or that I know of in PHP world. It is fully tested, contains less than 5% framework bound code, has strong strategic separation between both contexts and layers, has roughly 5% data access specific code and has tests that can be run without any real setup. (I might write another blog post on how we designed our tests and testing environment.)

Many thanks to my colleagues Kai Nissen and Gabriel Birke for being pretty awesome during our rewrite project.

Rewriting the Wikimedia Deutschland fundraising

Last year we rewrote the Wikimedia Deutschland fundraising software. In this blog post I’ll give you an idea of what this software does, why we rewrote it and the outcome of this rewrite.

The application

Our fundraising software is a homegrown PHP application. Its primary functions are donations and membership applications. It supports multiple payment methods, needs to interact with payment providers, supports submission and listing of comments and exchanges data with another homegrown PHP application that does analysis, reporting and moderation.


The codebase was originally written in a procedural style, with most code residing directly in files (i.e., not even in a global function). There was very little design, and completely separate concerns such as presentation and data access were mixed together. As you can probably imagine, this code was highly complex and very hard to understand or change. There was unused code, broken code, features that might not be needed anymore, and mysterious parts of which even the guru who maintained the codebase during the last few years did not know what they did. This mess, combined with the complete lack of a specification and unit tests, made development of new features extremely slow and error prone.


Why we rewrote

During the last year of the old application’s lifetime, we did refactor some parts and tried adding tests. In doing so, we figured that rewriting from scratch would be easier than trying to make incremental changes. We could start with a fresh design, add only the features we really need, and perhaps borrow some reusable code from the less horrible parts of the old application.

They did it by making the single worst strategic mistake that any software company can make: […] rewrite the code from scratch. —Joel Spolsky

We were aware of the risks involved with doing a rewrite of this nature and that often such rewrites fail. One big reason we did not decide against rewriting is that we had a time period of 9 months during which no new features needed to be developed. This meant we could freeze the old application and avoid parallel development and the feature race it would result in. Additionally, we set some constraints: we would only rewrite this application and leave the analysis and moderation application alone, and we would do a pure rewrite, avoiding the addition of new features until the rewrite was done.

How we got started

Since we had no specification, we tried visualizing the conceptual components of the old application, and then identified the “commands” they received from the outside world.

Conceptual components of the old application

Creating the new software

After some consideration, we decided to try out The Clean Architecture as a high level structure for the new application. For technical details on what we did and the lessons we learned, see Implementing the Clean Architecture.

The result

With a team of 3 people, we took about 8 months to finish the rewrite successfully. Our codebase is now clean and much, much easier to understand and work with. It took us over two man years to do this clean up, and presumably an even greater amount of time was wasted in dealing with the old application in the first place. This goes to show that the cost of not working towards technical excellence is very high.

We’re very happy with the result. For us, the team that wrote it, it’s easy to understand, and the same seems to be true for other people based on feedback we got from our colleagues in other teams. We have tests for pretty much all functionality, so can refactor and add new functionality with confidence. So far we’ve encountered very few bugs, with most issues arising from us forgetting to add minor but important features to the new application, or misunderstanding what the behavior should be and then correctly implementing the wrong thing. This of course has more to do with the old codebase than with the new one. We now have a solid platform upon which we can quickly build new functionality or improve what we already have.

The new application is the first Wikimedia (Deutschland) application deployed on, and written in, PHP 7. Even though it was not an explicit goal of the rewrite, the new application has ended up with better performance than the old one, in part due to the PHP 7 usage.

Near the end of the rewrite we got an external review performed by thePHPcc, during which Sebastian Bergmann, of PHPUnit fame, looked for code quality issues in the new codebase. The general result of that was a thumbs up, which we took the creative license to translate into a totally non-Sebastian-approved image.

You can see our new application in action in production. I recommend you try it out by donating 🙂

Technical statistics

These are some statistics, for fun. They were compiled after we did our rewrite, and were not used during development at all. As with most software metrics, they should be taken with a grain of salt.

In this visualization, each dot represents a single file. The size represents the Cyclomatic complexity while the color represents the Maintainability Index. The complexity is scored relative to the highest complexity in the project, which in the old application was 266 and in the new one is 30. This means that the red on the right (the new application) is a lot less problematic than the red on the left. (This visualization was created with PhpMetrics.)

Complexity and maintainability per file: old application (left) vs new application (right)

Global access in various Wikimedia codebases (lower is better). The rightmost is the old version of the fundraising application, and the one next to it is the new one. The new one has no global access whatsoever. LLOC stands for Logical Lines of Code. You can see the numbers in this public spreadsheet.

Global access statistics per codebase

Static method calls, often a big source of global state access, were omitted, since the tools used count many false positives (i.e. alternative constructors).

The differences between the projects can be made more apparent by visualizing them in another way. Here you have the number of lines per global access, represented on a logarithmic scale.

Logical lines of code per global access (logarithmic scale)

The following stats have been obtained using phploc, which counts namespace declarations and imports as LLOC. This means that for the new application some of the numbers are very slightly inflated.

  • Average class LLOC: 31 => 21
  • Average method LLOC: 4 => 3
  • Cyclomatic Complexity / LLOC: 0.39 => 0.10
  • Cyclomatic Complexity / Number of Methods: 2.67 => 1.32
  • Global functions: 58 => 0
  • Total LLOC: 5517 => 10187
  • Test LLOC: 979 => 5516
  • Production LLOC: 4538 => 4671
  • Classes: 105 => 366
  • Namespaces: 14 => 105

This is another visualization created with PhpMetrics, showing the dependencies between classes. Dependencies are static calls (including to the constructor), implementation, extension and type hints. The application’s top-level factory can be seen at the top right of the visualization.

Dependencies between classes in the new application

Maps 4.0.0-RC1 released!

I’m happy to announce the first release candidate for Maps 4.0. Maps is a MediaWiki extension to work with and visualize geographical information. Maps 4.0 is the first major release of the extension since January 2014, and it brings a ton of “new” functionality.

First off, this blog post is about a release candidate, meant to gather feedback and not suitable for usage in production. The 4.0 release itself will be made one week from now if no issues are found.

Almost all features from the Semantic Maps extension got merged into Maps, with the notable omission of the form input, which now resides in Yaron Koren’s Page Forms extension. I realized that spreading the functionality over both Maps and Semantic Maps was hindering development and making things more difficult for users than needed. Hence Semantic Maps is now discontinued, with Maps containing the coordinate datatype, the map result formats for each mapping service, the KML export format and distance query support. All these features will automatically enable themselves when you have Semantic MediaWiki installed, and can be explicitly turned off with the new egMapsDisableSmwIntegration setting.

The other big change is that, after 7 years, the default mapping service was changed from Google Maps to Leaflet. The reason for this change is that Google recently started requiring an API key to be obtained and specified for its maps to work on new websites. This would leave some users confused when they first installed the Maps extension and got a non-functioning map, even though the API key is mentioned in the installation instructions. Google Maps is of course still supported, and you can make it the default again on your wiki via the egMapsDefaultService setting.

Another noteworthy change is the addition of the egMapsDisableExtension setting, which allows for disabling the extension via configuration, even when it is installed. This has often been requested by those running wiki farms.

For a full list of changes, see the release notes. Also check out the new features in Maps 3.8, Maps 3.7 and Maps 3.6 if you have not done so yet.

Upgrading

Since this is a major release, please beware of the breaking changes, and that you might need to change configuration or things inside of your wiki. Update your mediawiki/maps version in composer.json to ~4.0@rc (or ~4.0 once the real release has happened) and run composer update.

Beware that as of Maps 3.6, you need MediaWiki 1.23 or later, and PHP 5.5 or later. If you choose to remain with an older version of PHP or MediaWiki, use Maps 3.5. Maps works with the latest stable versions of both MediaWiki and PHP, which are the versions I recommend you use.