Implementing the Clean Architecture

Both Domain Driven Design and architectures such as the Clean Architecture and Hexagonal are often talked about. It’s hard to go to a conference on software development and not run into one of these topics. However, it can be challenging to find good real-world examples. In this blog post I’ll introduce you to an application following the Clean Architecture and incorporating a lot of DDD patterns. The focus is on the key concepts of the Clean Architecture, and the most important lessons we learned implementing it.

The application

The real-world application we’ll be looking at is the Wikimedia Deutschland fundraising software. It is a PHP application written in 2016, replacing an older legacy system. While the application is written in PHP, the patterns followed are by and large language-agnostic, and are thus relevant for anyone writing object-oriented software.

I’ve outlined what the application is and why we replaced the legacy system in a blog post titled Rewriting the Wikimedia Deutschland fundraising. I recommend you at least have a look at its “The application” section, as it will give you a rough idea of the domain we’re dealing with.

A family of architectures

Architectures such as Hexagonal and the Clean Architecture are very similar. At their core, they are about separation of concerns. They decouple from mechanisms such as persistence and the frameworks used, and instead focus on the domain and high-level policies. A nice short read on this topic is Uncle Bob’s blog post on the Clean Architecture. Another recommended post is Hexagonal != Layers, which explains how just creating a bunch of layers misses the point.

The Clean Architecture

[Figure: the Clean Architecture diagram]

The arrows crossing the circle boundaries represent the allowed direction of dependencies. At the core is the domain. “Entities” here means Entities as in Domain Driven Design, not to be confused with ORM entities. The domain is surrounded by a layer containing use cases (sometimes called interactors) that form an API that the outside world, such as a controller, can use to interact with the domain. The use cases themselves only bind to the domain and certain cross-cutting concerns such as logging, and are devoid of binding to the web, the database and the framework.

Take the UC for canceling a donation: it gets a request object, does some stuff, and then returns a response object. Both the request and response objects are specific to this UC and bind to neither the domain nor the presentation mechanism. The stuff that is actually done is mainly interaction with the domain through Entities, Aggregates and Repositories.
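A minimal sketch of what such a UC could look like. All class and method names here are hypothetical stand-ins chosen for illustration, and the in-memory repository only exists to keep the sketch self-contained; this is not the application’s actual code.

```php
<?php
declare(strict_types=1);

// Request model: plain data, specific to this UC, no domain or presentation binding.
final class CancelDonationRequest {
    private $donationId;
    public function __construct(int $donationId) { $this->donationId = $donationId; }
    public function getDonationId(): int { return $this->donationId; }
}

// Response model: equally plain and equally UC specific.
final class CancelDonationResponse {
    private $donationId;
    private $succeeded;
    public function __construct(int $donationId, bool $succeeded) {
        $this->donationId = $donationId;
        $this->succeeded = $succeeded;
    }
    public function getDonationId(): int { return $this->donationId; }
    public function cancellationSucceeded(): bool { return $this->succeeded; }
}

// Domain entity (heavily simplified).
final class Donation {
    private $id;
    private $canceled = false;
    public function __construct(int $id) { $this->id = $id; }
    public function getId(): int { return $this->id; }
    public function isCanceled(): bool { return $this->canceled; }
    public function cancel(): void {
        if ($this->canceled) {
            throw new RuntimeException('Donation is already canceled');
        }
        $this->canceled = true;
    }
}

// The UC only knows this interface, not the persistence mechanism behind it.
interface DonationRepository {
    public function getDonationById(int $id): ?Donation;
    public function storeDonation(Donation $donation): void;
}

// In-memory implementation, assumed here purely so the sketch can run.
final class InMemoryDonationRepository implements DonationRepository {
    private $donations = [];
    public function getDonationById(int $id): ?Donation { return $this->donations[$id] ?? null; }
    public function storeDonation(Donation $donation): void { $this->donations[$donation->getId()] = $donation; }
}

final class CancelDonationUseCase {
    private $repository;
    public function __construct(DonationRepository $repository) { $this->repository = $repository; }

    public function cancelDonation(CancelDonationRequest $request): CancelDonationResponse {
        $donation = $this->repository->getDonationById($request->getDonationId());

        // Unknown or already canceled donations result in a failure response.
        if ($donation === null || $donation->isCanceled()) {
            return new CancelDonationResponse($request->getDonationId(), false);
        }

        $donation->cancel();
        $this->repository->storeDonation($donation);
        return new CancelDonationResponse($request->getDonationId(), true);
    }
}
```

Note that nothing in this sketch knows about HTTP, HTML or SQL, which is what allows the UC to be tested with nothing but an in-memory repository.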

A UC is typically invoked from a route handler. The framework we’re using is Silex, which calls the function we provided when the route matches. Inside this function we construct our framework-agnostic request model and invoke the UC with it. Then we hand over the response model to a presenter to create the appropriate HTML or other such format. This is all the framework-bound code we have for canceling donations. Even the presenter does not bind to the framework, though it does depend on Twig.
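The flow can be sketched without the framework. In the sketch below a plain closure stands in for the function registered with Silex, the presenter builds a string rather than rendering a Twig template, and the UC is a stub with a made-up rule. All names, including the donation_id parameter, are assumptions for illustration.

```php
<?php
declare(strict_types=1);

final class CancelDonationRequest {
    private $donationId;
    public function __construct(int $donationId) { $this->donationId = $donationId; }
    public function getDonationId(): int { return $this->donationId; }
}

final class CancelDonationResponse {
    private $succeeded;
    public function __construct(bool $succeeded) { $this->succeeded = $succeeded; }
    public function cancellationSucceeded(): bool { return $this->succeeded; }
}

// Stub UC: a made-up rule (positive IDs can be canceled) replaces the real domain logic.
final class CancelDonationUseCase {
    public function cancelDonation(CancelDonationRequest $request): CancelDonationResponse {
        return new CancelDonationResponse($request->getDonationId() > 0);
    }
}

// Presenter: turns the response model into output. The real one would use a template.
final class CancelDonationHtmlPresenter {
    public function present(CancelDonationResponse $response): string {
        return $response->cancellationSucceeded()
            ? '<p>Your donation was canceled.</p>'
            : '<p>This donation could not be canceled.</p>';
    }
}

// Stand-in for the route handler: build request model, invoke UC, present response.
$cancelDonationHandler = function (array $postData): string {
    $request = new CancelDonationRequest((int) ($postData['donation_id'] ?? 0));
    $response = (new CancelDonationUseCase())->cancelDonation($request);
    return (new CancelDonationHtmlPresenter())->present($response);
};

echo $cancelDonationHandler(['donation_id' => '42']), PHP_EOL;
```

The handler stays this thin because it only translates between the web and the request and response models; every decision is made inside the UC.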

If you are familiar with Silex, you might already have noticed that we’re constructing our UC differently than you might expect. We decided to go with our own top-level factory, rather than using the dependency injection mechanism provided by Silex: Pimple. Our factory internally actually uses Pimple, though this is not visible from the outside. With this approach we gain nicer access to service construction, since we can have a getLogger() method with a LoggerInterface return type hint, rather than accessing $app['logger'] or some such, which forces us to bind to a string and leaves us without a type hint.
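A rough sketch of the idea follows. To keep it dependency free, a plain array of closures stands in for Pimple and a tiny Logger interface stands in for LoggerInterface; the class name TopLevelFactory and its method names other than getLogger() are assumptions, not the application’s actual ones.

```php
<?php
declare(strict_types=1);

// Stand-in for a PSR-3 style logger interface, so the sketch has no dependencies.
interface Logger {
    public function log(string $message): void;
}

final class InMemoryLogger implements Logger {
    private $messages = [];
    public function log(string $message): void { $this->messages[] = $message; }
    public function getMessages(): array { return $this->messages; }
}

final class CancelDonationUseCase {
    private $logger;
    public function __construct(Logger $logger) { $this->logger = $logger; }
    public function cancelDonation(int $donationId): void {
        $this->logger->log("Canceled donation $donationId");
    }
}

final class TopLevelFactory {
    // String-keyed service definitions: an array standing in for the container.
    private $definitions = [];
    private $instances = [];

    public function __construct() {
        $this->definitions['logger'] = function (): Logger {
            return new InMemoryLogger();
        };
    }

    // Typed accessor: callers get a return type instead of a string key lookup.
    public function getLogger(): Logger {
        return $this->getShared('logger');
    }

    public function newCancelDonationUseCase(): CancelDonationUseCase {
        return new CancelDonationUseCase($this->getLogger());
    }

    // The string keys stay private to the factory; each service is built once.
    private function getShared(string $key) {
        if (!array_key_exists($key, $this->instances)) {
            $this->instances[$key] = ($this->definitions[$key])();
        }
        return $this->instances[$key];
    }
}

$factory = new TopLevelFactory();
$factory->newCancelDonationUseCase()->cancelDonation(42);
```

Because the string keys never leave the factory, a typo in a service name can only happen in one file, and IDEs and static analysis see real types everywhere else.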

[Figure: list of use cases]

This use case based approach makes it very easy to see what our system is capable of at a glance.

[Figure: use case directory structure]

And it makes it very easy to find where certain behavior is located, or to figure out where new behavior should be put.

All code in our src/ directory is framework independent, and all code binding to specific persistence mechanisms resides in src/DataAccess. The only framework-bound code we have is our very slim “route handlers” (kinda like controllers), the web entry point and the Silex bootstrap.

For more information on the Clean Architecture I can recommend Robert C. Martin’s NDC 2013 talk. If you watch it, you will hopefully notice how we slightly deviated from the UseCase structure as he presented it. This is because PHP is an interpreted language and thus does not need certain interfaces that are beneficial in compiled languages.

Lesson learned: bounded contexts

By and large we started with the donation related use cases and then moved on to the membership application related ones. At some point, we had a Donation entity/aggregate in our domain, and a bunch of value objects that it contained.

One of those value objects was PersonalInfo. Then we needed to add an entity for membership applications. Like donations, membership applications require a name, a physical address and an email address. Hence it was tempting to reuse our existing PersonalInfo class.

Luckily a complication made us realize that going down this path was not a good idea. This complication was that membership applications also have a phone number and an optional date of birth. We could have forced code sharing by doing something hacky like adding new optional fields to PersonalInfo, or by creating a MorePersonalInfo derivative.

Approaches such as these, while resulting in some code sharing, also result in creating binding between Donation and MembershipApplication. That’s not good, as those two entities don’t have anything to do with each other. Sharing what happens to be the same at present is simply not a good idea. Just imagine that we did not have the phone number and date of birth in our first version, and then needed to add them. We’d either end up with one of those hacky solutions, or need to refactor code that has nothing to do (apart from the bad coupling) with what we want to modify.

What we did instead was rename PersonalInfo to Donor and introduce a new Applicant class.

These names are better since they are about the domain (see ubiquitous language) rather than some technical terms we needed to come up with.
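As a simplified sketch, the two value objects could look roughly like this. The fields follow the description above, but plain strings are used where the real code would likely have dedicated value objects, and the constructor signatures are assumptions.

```php
<?php
declare(strict_types=1);

// Belongs to the donation side of the domain.
final class Donor {
    private $name;
    private $postalAddress;
    private $emailAddress;

    public function __construct(string $name, string $postalAddress, string $emailAddress) {
        $this->name = $name;
        $this->postalAddress = $postalAddress;
        $this->emailAddress = $emailAddress;
    }

    public function getName(): string { return $this->name; }
    public function getPostalAddress(): string { return $this->postalAddress; }
    public function getEmailAddress(): string { return $this->emailAddress; }
}

// Belongs to the membership side. It overlaps with Donor today, but shares no code
// with it, so either class can change without touching the other.
final class Applicant {
    private $name;
    private $postalAddress;
    private $emailAddress;
    private $phoneNumber;
    private $dateOfBirth;

    public function __construct(
        string $name,
        string $postalAddress,
        string $emailAddress,
        string $phoneNumber,
        ?DateTimeImmutable $dateOfBirth = null
    ) {
        $this->name = $name;
        $this->postalAddress = $postalAddress;
        $this->emailAddress = $emailAddress;
        $this->phoneNumber = $phoneNumber;
        $this->dateOfBirth = $dateOfBirth;
    }

    public function getName(): string { return $this->name; }
    public function getPostalAddress(): string { return $this->postalAddress; }
    public function getEmailAddress(): string { return $this->emailAddress; }
    public function getPhoneNumber(): string { return $this->phoneNumber; }
    public function getDateOfBirth(): ?DateTimeImmutable { return $this->dateOfBirth; }
}
```

The duplicated fields are deliberate: the cost is a few repeated lines, while the benefit is that donations and membership applications stay decoupled.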

Amongst other things, this rename made us realize that we were missing some explicit boundaries in our application. The donation related code and the membership application related code were mostly independent from each other, and we agreed this was a good thing. To make it clearer that this is the case and to highlight violations of that rule, we decided to reorganize our code to follow the strategic DDD pattern of Bounded Contexts.

[Figure: contexts directory structure]

This mainly consisted of reorganizing our directory and namespace structure, and a few instances of splitting some code that should not have been bound together.

Based on this we created a new diagram to reflect the high level structure of our application. This diagram, and a version with just one context, are available for use under CC-0.

[Figure: Clean Architecture + Bounded Contexts]

Lesson learned: validation

A big question we had near the start of our project was where to put validation code. Do we put it in the UCs, or in the controller-like code that calls the UCs?

One of the first UCs we added was the one for adding donations. This one has a request model that contains a lot of information, including the donor’s name, their email, their address, the payment method, payment amount, payment interval, etc. In our domain we had several value objects for representing parts of donations, such as the donor or the payment information.

Since we did not want to have one object with two dozen fields, and did not want to duplicate code, we used the value objects from our domain in the request model.

If you’ve been paying attention, you’ll have realized that this approach violates one of the earlier outlined rules: nothing outside the UC layer is supposed to access anything from the domain. If value objects from the domain are exposed to whatever constructs the request model, i.e. a controller, this rule is violated. Apart from this abstract objection, we got into real trouble by doing this.

Since we started doing validation in our UCs, this usage of objects from the domain in the request necessarily forced those objects to allow invalid values. For instance, if we’re checking the validity of an email address in the UC (or a service used by the UC), then the request model cannot use an EmailAddress which does sanity checks in its constructor.

We thus refactored our code to avoid using any of our domain objects in the request models (and response models), so that those objects could contain basic safeguards.
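The difference can be sketched as follows. The EmailAddress class below uses PHP’s filter_var() as its safeguard, which is an assumption made for this sketch (the actual check in the application may differ), and the accessor names on AddDonationRequest are hypothetical.

```php
<?php
declare(strict_types=1);

// Domain value object: refuses to exist in an invalid state.
final class EmailAddress {
    private $address;

    public function __construct(string $address) {
        if (filter_var($address, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException('Not a valid email address');
        }
        $this->address = $address;
    }

    public function __toString(): string { return $this->address; }
}

// Request model: carries whatever the user entered as a plain string, valid or not.
// Deciding what to do with invalid input is left to the UC.
final class AddDonationRequest {
    private $donorEmailAddress = '';

    public function setDonorEmailAddress(string $emailAddress): void {
        $this->donorEmailAddress = $emailAddress;
    }

    public function getDonorEmailAddress(): string { return $this->donorEmailAddress; }
}

$request = new AddDonationRequest();
$request->setDonorEmailAddress('not-an-email'); // fine: the request model makes no promises

try {
    new EmailAddress($request->getDonorEmailAddress());
} catch (InvalidArgumentException $exception) {
    // The domain object cannot be constructed from invalid input.
}
```

With this split, the UC can report invalid input as a normal validation failure, while any EmailAddress that exists in the domain is known to have passed the check.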

We made a similar change by altering which objects get validated. At the start of our project we created a number of validators that worked on objects from the domain. For instance, a DonationValidator working with the Donation Entity. This DonationValidator would then be used by the AddDonationUseCase. This is not a good idea, since the validation that needs to happen depends on the context. In the AddDonationUseCase certain restrictions apply that don’t always hold for donations. Hence having a general-looking DonationValidator is misleading. What we ended up doing instead is having validation code specific to the UCs, either as part of the UC or, when too complex, as a separate validation service in the same namespace. In both cases the validation code would work on the request model, i.e. AddDonationRequest, and not bind to the domain.
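A sketch of such a UC specific validation service is shown below. The two rules (a minimum amount and a well-formed email address) and the violation identifiers are invented for illustration; only the shape, a validator that takes the request model and knows nothing about the domain, reflects the approach described above.

```php
<?php
declare(strict_types=1);

// Request model for the UC (reduced to two fields for the sketch).
final class AddDonationRequest {
    private $amountInCents;
    private $donorEmailAddress;

    public function __construct(int $amountInCents, string $donorEmailAddress) {
        $this->amountInCents = $amountInCents;
        $this->donorEmailAddress = $donorEmailAddress;
    }

    public function getAmountInCents(): int { return $this->amountInCents; }
    public function getDonorEmailAddress(): string { return $this->donorEmailAddress; }
}

// Lives next to the UC and encodes only the rules that apply when ADDING a donation.
final class AddDonationValidator {
    const MINIMUM_AMOUNT_IN_CENTS = 100; // hypothetical policy

    /**
     * @return string[] identifiers of the violated rules, empty when the request is valid
     */
    public function validate(AddDonationRequest $request): array {
        $violations = [];

        if ($request->getAmountInCents() < self::MINIMUM_AMOUNT_IN_CENTS) {
            $violations[] = 'amount_too_low';
        }

        if (filter_var($request->getDonorEmailAddress(), FILTER_VALIDATE_EMAIL) === false) {
            $violations[] = 'email_invalid';
        }

        return $violations;
    }
}

$validator = new AddDonationValidator();
$violations = $validator->validate(new AddDonationRequest(50, 'nope'));
```

Returning identifiers instead of messages keeps the validator free of presentation concerns; turning them into user-facing text is the presenter’s job.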

After learning these two lessons, we had a nice approach for policy-based validation. That’s not all validation that needs to be done though. For instance, if you get a number via a web request, the framework will typically give it to you as a string, which might thus not be an actual number. As the request model is supposed to be presentation mechanism agnostic, certain validation, conversion and error handling needs to happen before constructing the request model and invoking the UC. This means that often you will have validation in two places: policy-based validation in the UC, and presentation-specific validation in your controllers or equivalent code. If you have a string to integer conversion, number parsing or something internationalization-specific in your UC, you almost certainly messed up.
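The presentation-specific half can be sketched like this: the string to integer conversion happens in controller-level code, so the request model only ever sees a real integer. The function name, the amount_in_cents parameter and the choice to signal failure with null are assumptions made for this sketch.

```php
<?php
declare(strict_types=1);

// Request model: already typed, knows nothing about strings from a web form.
final class AddDonationRequest {
    private $amountInCents;
    public function __construct(int $amountInCents) { $this->amountInCents = $amountInCents; }
    public function getAmountInCents(): int { return $this->amountInCents; }
}

/**
 * Controller-level code: converts raw web input into a request model.
 * Returns null when the input is not an integer, so the caller can show a
 * presentation-level error without ever invoking the UC.
 */
function buildAddDonationRequest(array $postData): ?AddDonationRequest {
    $amount = filter_var($postData['amount_in_cents'] ?? '', FILTER_VALIDATE_INT);

    if ($amount === false) {
        return null;
    }

    return new AddDonationRequest($amount);
}

$request = buildAddDonationRequest(['amount_in_cents' => '500']);
```

Whether 500 cents is an acceptable donation is then a policy question for the UC, while whether the string "five" is a number never reaches it.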

Closing notes

You can find the Wikimedia Deutschland fundraising application on GitHub and see it running in production. Unfortunately the code of the old application is not available for comparison, as it is not public. If you have questions, you can leave a comment, or contact me. If you find an issue or want to contribute, you can create a pull request. If you are looking for my presentation on this topic, view the slides.

As a team we learned a lot during this project, and we set a number of firsts at Wikimedia Deutschland, or the wider Wikimedia movement for that matter. The new codebase is the cleanest non-trivial application we have, or that I know of in the PHP world. It is fully tested, contains less than 5% framework-bound code, has strong strategic separation between both contexts and layers, has roughly 5% data access specific code and has tests that can be run without any real setup. (I might write another blog post on how we designed our tests and testing environment.)

Many thanks to my colleagues Kai Nissen and Gabriel Birke for being pretty awesome during our rewrite project.

18 Comments on “Implementing the Clean Architecture”

    • Hey Dmitriy, that’s a good question.

      We did not move the presenters to src, but moved the contexts out of it. Still, the question of why we did not put the presenters together with the associated context remains valid.

      We decided to limit our contexts to things bound to the domain model. That means the outermost “layer” of the contexts are the usecases, and that they remain fully framework independent.

      From the literature I figured that what you suggest is the more common approach. Indeed we discussed reorganizing our codebase as such, but no compelling arguments were found that justify the effort.

      I raised this question on the DDD CQRS list when we were talking about this in our team. Unfortunately we did not get any reply. https://groups.google.com/forum/#!topic/dddcqrs/tWc6iZGhvUU

  4. This is a fantastic writeup. It’s very nice to see that the Silex actions are almost point-for-point examples of Action-Domain-Responder. (Some of the actions might do with trivial refactorings to bring them even more in line with the pattern, but overall it’s very well done.)

  6. A longer reply here: http://paul-m-jones.com/archives/6535

    An excerpt:

    “The only place where Jeroen’s implementation deviates from ADR is that the Action code builds the presentation itself, instead of handing off to a Responder. (This may be a result of adhering to the idioms and expectations specific to Silex.) Because the rest of the implementation is so well done, refactoring to a separated presentation in the form of a Responder is a straightforward exercise. Let’s see what that might look like.”
