Implementing the Clean Architecture

Both Domain Driven Design and architectures such as the Clean Architecture and Hexagonal are often talked about. It’s hard to go to a conference on software development and not run into one of these topics. However it can be challenging to find good real-world examples. In this blog post I’ll introduce you to an application following the Clean Architecture and incorporating a lot of DDD patterns. The focus is on the key concepts of the Clean Architecture, and the most important lessons we learned implementing it.

The application

The real-world application we’ll be looking at is the Wikimedia Deutschland fundraising software. It is a PHP application written in 2016, replacing an older legacy system. While the application is written in PHP, the patterns followed are by and large language agnostic, and are thus relevant for anyone writing object orientated software.

I’ve outlined what the application is and why we replaced the legacy system in a blog post titled Rewriting the Wikimedia Deutschland fundraising. I recommend you have a look at least at its “The application” section, as it will give you a rough idea of the domain we’re dealing with.

A family of architectures

Architectures such as Hexagonal and the Clean Architecture are very similar. At their core, they are about separation of concerns. They decouple from mechanisms such as persistence and the frameworks used, and instead focus on the domain and high level policies. A nice short read on this topic is Uncle Bob’s blog post on the Clean Architecture. Another recommended post is Hexagonal != Layers, which explains how just creating a bunch of layers misses the point.

The Clean Architecture

[Diagram: The Clean Architecture]

The arrows crossing the circle boundaries represent the allowed direction of dependencies. At the core is the domain. “Entities” here means Entities such as in Domain Driven Design, not to be confused with ORM entities. The domain is surrounded by a layer containing use cases (sometimes called interactors) that form an API that the outside world, such as a controller, can use to interact with the domain. The use cases themselves only bind to the domain and certain cross-cutting concerns such as logging, and are devoid of binding to the web, the database and the framework.

Take the use case (UC) for canceling a donation: it gets a request object, does some stuff, and then returns a response object. Both the request and response objects are specific to this UC and lack both domain and presentation mechanism binding. The stuff that is actually done is mainly interaction with the domain through Entities, Aggregates and Repositories.
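The shape of such a UC can be sketched in a few lines. This is a minimal illustration written in Python rather than PHP to keep it short and self-contained. All names (Donation, DonationRepository, CancelDonationRequest, CancelDonationResponse, CancelDonationUseCase, InMemoryDonationRepository) and the cancellation rule are assumptions made for this example, not the application's actual code.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Donation:
    """Hypothetical, heavily simplified domain entity."""
    donation_id: int
    cancelled: bool = False

    def cancel(self) -> None:
        if self.cancelled:
            raise ValueError("donation is already cancelled")
        self.cancelled = True


class DonationRepository(Protocol):
    """Interface the UC depends on; implementations live in the data access code."""
    def get_by_id(self, donation_id: int) -> Optional[Donation]: ...
    def store(self, donation: Donation) -> None: ...


@dataclass(frozen=True)
class CancelDonationRequest:
    """Request model: plain data, no domain or presentation types."""
    donation_id: int


@dataclass(frozen=True)
class CancelDonationResponse:
    """Response model: plain data, no domain or presentation types."""
    donation_id: int
    success: bool


class CancelDonationUseCase:
    def __init__(self, repository: DonationRepository) -> None:
        self._repository = repository

    def cancel_donation(self, request: CancelDonationRequest) -> CancelDonationResponse:
        donation = self._repository.get_by_id(request.donation_id)
        if donation is None or donation.cancelled:
            return CancelDonationResponse(request.donation_id, success=False)
        donation.cancel()
        self._repository.store(donation)
        return CancelDonationResponse(request.donation_id, success=True)


class InMemoryDonationRepository:
    """Test double, so the sketch runs without a database."""
    def __init__(self, *donations: Donation) -> None:
        self._donations = {d.donation_id: d for d in donations}

    def get_by_id(self, donation_id: int) -> Optional[Donation]:
        return self._donations.get(donation_id)

    def store(self, donation: Donation) -> None:
        self._donations[donation.donation_id] = donation
```

Note that the UC only knows the repository interface, so it can be exercised with the in-memory double and no framework or database at all.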

A typical way of invoking a UC is from a route handler. The framework we’re using is Silex, which calls the function we provided when the route matches. Inside this function we construct our framework agnostic request model and invoke the UC with it. Then we hand over the response model to a presenter to create the appropriate HTML or other such format. This is all the framework bound code we have for canceling donations. Even the presenter does not bind to the framework, though it does depend on Twig.
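The flow from route handler to UC to presenter might look roughly like the sketch below. It is framework-free Python, not Silex code: the handler receives the route parameters as a plain dict, and the presenter returns a string instead of rendering a Twig template. The names make_cancel_route_handler and present_as_html, and the request and response models, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CancelDonationRequest:
    donation_id: int


@dataclass(frozen=True)
class CancelDonationResponse:
    donation_id: int
    success: bool


def present_as_html(response: CancelDonationResponse) -> str:
    """Stand-in presenter: turns the response model into output for the user."""
    if response.success:
        return f"<p>Donation {response.donation_id} was cancelled.</p>"
    return f"<p>Donation {response.donation_id} could not be cancelled.</p>"


def make_cancel_route_handler(
    use_case: Callable[[CancelDonationRequest], CancelDonationResponse],
) -> Callable[[Dict[str, str]], str]:
    """Build the function the framework would call when the route matches."""
    def handle(route_params: Dict[str, str]) -> str:
        # Framework-bound part: turn raw route input into the request model.
        # (Error handling for non-numeric input is omitted in this sketch.)
        request = CancelDonationRequest(donation_id=int(route_params["donation_id"]))
        # Framework-agnostic part: invoke the UC, then present its response.
        return present_as_html(use_case(request))
    return handle
```

In a real application the use_case argument would be the UC's public method, obtained from the factory; here any callable with the same signature will do, which also makes the handler trivial to test.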

If you are familiar with Silex, note that we’re constructing our UCs differently than you might expect. We decided to go with our own top level factory, rather than using the dependency injection mechanism provided by Silex: Pimple. Our factory internally actually uses Pimple, though this is not visible from the outside. With this approach we gain nicer access to service construction, since we can have a getLogger() method with a LoggerInterface return type hint, rather than accessing $app['logger'] or some such, which forces us to bind to a string and leaves us without a type hint.
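The difference between string-keyed container access and a typed factory method can be sketched as follows. This is Python, with a tiny hand-written Container class standing in for Pimple. Container, TopLevelFactory and the "fundraising" logger name are assumptions for illustration only.

```python
import logging
from typing import Any, Callable, Dict


class Container:
    """Tiny string-keyed service container, standing in for Pimple."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}
        self._services: Dict[str, Any] = {}

    def register(self, key: str, factory: Callable[[], Any]) -> None:
        self._factories[key] = factory

    def get(self, key: str) -> Any:
        # Services are constructed lazily and then shared.
        if key not in self._services:
            self._services[key] = self._factories[key]()
        return self._services[key]


class TopLevelFactory:
    """Hides the container; callers only see typed accessor methods."""

    def __init__(self) -> None:
        self._container = Container()
        self._container.register("logger", lambda: logging.getLogger("fundraising"))

    def get_logger(self) -> logging.Logger:
        # The string key lives in exactly one place, and callers get a
        # return type that tools can check.
        return self._container.get("logger")
```

Callers write factory.get_logger() and never see the "logger" key, so a typo in the key cannot spread through the codebase.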

[Image: list of use cases]

This use case based approach makes it very easy to see what our system is capable of at a glance.

[Image: use case directory structure]

And it makes it very easy to find where certain behavior is located, or to figure out where new behavior should be put.

All code in our src/ directory is framework independent, and all code binding to specific persistence mechanisms resides in src/DataAccess. The only framework bound code we have are our very slim “route handlers” (kinda like controllers), the web entry point and the Silex bootstrap.

For more information on The Clean Architecture I can recommend Robert C. Martin’s NDC 2013 talk. If you watch it, you will hopefully notice how we slightly deviated from the UseCase structure as he presented it. This is because PHP is an interpreted language, and thus does not need certain interfaces that are beneficial in compiled languages.

Lesson learned: bounded contexts

By and large we started with the donation related use cases and then moved on to the membership application related ones. At some point, we had a Donation entity/aggregate in our domain, and a bunch of value objects that it contained.

One of those value objects was PersonalInfo. Then we needed to add an entity for membership applications. Like donations, membership applications require a name, a physical address and an email address. Hence it was tempting to reuse our existing PersonalInfo class.

Luckily a complication made us realize that going down this path was not a good idea. This complication was that membership applications also have a phone number and an optional date of birth. We could have forced code sharing by doing something hacky like adding new optional fields to PersonalInfo, or by creating a MorePersonalInfo derivative.

Approaches such as these, while resulting in some code sharing, also result in creating binding between Donation and MembershipApplication. That’s not good, as those two entities don’t have anything to do with each other. Sharing what happens to be the same at present is simply not a good idea. Just imagine that we did not have the phone number and date of birth in our first version, and then needed to add them. We’d either end up with one of those hacky solutions, or need to refactor code that has nothing to do (apart from the bad coupling) with what we want to modify.

What we did instead was rename PersonalInfo to Donor and introduce a new Applicant class.

These names are better since they are about the domain (see ubiquitous language) rather than some technical terms we needed to come up with.
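In code, the end result is two small, deliberately unrelated value objects. The sketch below is Python with field names derived from the description above; the exact fields and types of the real Donor and Applicant classes are not shown in this post, so treat them as assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Donor:
    """Belongs to the donation code only."""
    name: str
    physical_address: str
    email_address: str


@dataclass(frozen=True)
class Applicant:
    """Belongs to the membership application code only.

    It repeats three fields of Donor on purpose: the two classes share no
    base class, so either one can change without touching the other.
    """
    name: str
    physical_address: str
    email_address: str
    phone_number: str
    date_of_birth: Optional[date] = None
```

The duplication of three fields is the price for being able to add, say, a field to Applicant without having to think about donations at all.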

Amongst other things, this rename made us realize that we were missing some explicit boundaries in our application. The donation related code and the membership application related code were mostly independent from each other, and we agreed this was a good thing. To make it more clear that this is the case and highlight violations of that rule, we decided to reorganize our code to follow the strategic DDD pattern of Bounded Contexts.

[Image: directory structure organized by bounded context]

This mainly consisted of reorganizing our directory and namespace structure, and a few instances of splitting some code that should not have been bound together.

Based on this we created a new diagram to reflect the high level structure of our application. This diagram, and a version with just one context, are available for use under CC-0.

[Diagram: Clean Architecture + Bounded Contexts]

Lesson learned: validation

A big question we had near the start of our project was where to put validation code. Do we put it in the UCs, or in the controller-like code that calls the UCs?

One of the first UCs we added was the one for adding donations. This one has a request model that contains a lot of information, including the donor’s name, their email, their address, the payment method, payment amount, payment interval, etc. In our domain we had several value objects for representing parts of donations, such as the donor or the payment information.

Since we did not want to have one object with two dozen fields, and did not want to duplicate code, we used the value objects from our domain in the request model.

If you’ve been paying attention, you’ll have realized that this approach violates one of the earlier outlined rules: nothing outside the UC layer is supposed to access anything from the domain. If value objects from the domain are exposed to whatever constructs the request model, i.e. a controller, this rule is violated. Apart from this abstract objection, we got into real trouble by doing this.

Since we started doing validation in our UCs, this usage of objects from the domain in the request necessarily forced those objects to allow invalid values. For instance, if we’re checking the validity of an email address in the UC (or a service used by the UC), then the request model cannot use an EmailAddress which does sanity checks in its constructor.

We thus refactored our code to avoid using any of our domain objects in the request models (and response models), so that those objects could contain basic safeguards.
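The resulting split can be sketched like this: the request model carries plain strings and numbers and may therefore hold invalid input, while the domain value object refuses to exist in an invalid state. This is Python for illustration. The fields of AddDonationRequest and the very basic check in EmailAddress are assumptions, not the application's real rules.

```python
from dataclasses import dataclass


class EmailAddress:
    """Domain value object: cannot be constructed from obviously broken input."""

    def __init__(self, address: str) -> None:
        user, at, domain = address.partition("@")
        # Basic safeguard only; this is a sanity check, not full validation.
        if not (at and user and domain):
            raise ValueError(f"not an email address: {address!r}")
        self._address = address

    def __str__(self) -> str:
        return self._address


@dataclass
class AddDonationRequest:
    """Request model: primitives only, so it can carry unvalidated user input."""
    donor_name: str = ""
    donor_email_address: str = ""
    amount_in_euro_cents: int = 0
```

A controller can now build an AddDonationRequest from whatever the user typed, and the UC decides what to do with invalid values before any domain object is created.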

We made a similar change by altering which objects get validated. At the start of our project we created a number of validators that worked on objects from the domain. For instance a DonationValidator working with the Donation Entity. This DonationValidator would then be used by the AddDonationUseCase. This is not a good idea, since the validation that needs to happen depends on the context. In the AddDonationUseCase certain restrictions apply that don’t always hold for donations. Hence having a general looking DonationValidator is misleading. What we ended up doing instead is having validation code specific to the UCs, be it as part of the UC, or when too complex, a separate validation service in the same namespace. In both cases the validation code would work on the request model, i.e. AddDonationRequest, and not bind to the domain.
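A UC-specific validator that works on the request model might look like the following Python sketch. The class name AddDonationValidator, the request fields, the violation messages and the minimum amount are all made up for this example. The point is only that the validator sees AddDonationRequest and never a Donation entity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AddDonationRequest:
    donor_name: str
    donor_email_address: str
    amount_in_euro_cents: int


class AddDonationValidator:
    """Policy that applies when adding a donation, not to donations in general."""

    MINIMUM_AMOUNT_IN_EURO_CENTS = 100  # hypothetical policy value

    def validate(self, request: AddDonationRequest) -> List[str]:
        """Return a list of violations; an empty list means the request is fine."""
        violations: List[str] = []
        if not request.donor_name.strip():
            violations.append("donor_name: missing")
        if "@" not in request.donor_email_address:
            violations.append("donor_email_address: invalid")
        if request.amount_in_euro_cents < self.MINIMUM_AMOUNT_IN_EURO_CENTS:
            violations.append("amount_in_euro_cents: too low")
        return violations
```

Another UC that touches donations could have entirely different rules without either validator pretending to describe what a valid donation is in general.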

After learning these two lessons, we had a nice approach for policy-based validation. That’s not all validation that needs to be done though. For instance, if you get a number via a web request, the framework will typically give it to you as a string, which might thus not be an actual number. As the request model is supposed to be presentation mechanism agnostic, certain validation, conversion and error handling needs to happen before constructing the request model and invoking the UC. This means that often you will have validation in two places: policy based validation in the UC, and presentation specific validation in your controllers or equivalent code. If you have a string to integer conversion, number parsing or something internationalization specific, in your UC, you almost certainly messed up.
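As an illustration of the presentation-specific half, the sketch below converts a raw amount string into an integer number of cents before any request model is built. It is Python, the function name parse_amount_in_euro_cents is hypothetical, and accepting a comma as decimal separator is just one example of a locale-related decision that belongs on this side of the boundary.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount_in_euro_cents(raw: str) -> Optional[int]:
    """Convert user input such as "12,50" or "12.50" to cents.

    Returns None when the input is not a non-negative finite number, so the
    controller can show an error instead of invoking the UC.
    """
    normalized = raw.strip().replace(",", ".")
    try:
        amount = Decimal(normalized)
    except InvalidOperation:
        return None
    # Decimal accepts "NaN" and "Infinity"; neither is a usable amount.
    if not amount.is_finite() or amount < 0:
        return None
    return int((amount * 100).to_integral_value())
```

Only when this returns an integer would the controller construct the request model. Whether the amount is acceptable as a donation is then a policy question for the UC.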

Closing notes

You can find the Wikimedia Deutschland fundraising application on GitHub and see it running in production. Unfortunately the code of the old application is not available for comparison, as it is not public. If you have questions, you can leave a comment, or contact me. If you find an issue or want to contribute, you can create a pull request. If you are looking for my presentation on this topic, view the slides.

As a team we learned a lot during this project, and we set a number of firsts at Wikimedia Deutschland, or the wider Wikimedia movement for that matter. The new codebase is the cleanest non-trivial application we have, or that I know of in the PHP world. It is fully tested, contains less than 5% framework bound code, has strong strategic separation between both contexts and layers, has roughly 5% data access specific code and has tests that can be run without any real setup. (I might write another blog post on how we designed our tests and testing environment.)

Many thanks to my colleagues Kai Nissen and Gabriel Birke for being pretty awesome during our rewrite project.

Rewriting the Wikimedia Deutschland fundraising

Last year we rewrote the Wikimedia Deutschland fundraising software. In this blog post I’ll give you an idea of what this software does, why we rewrote it and the outcome of this rewrite.

The application

Our fundraising software is a homegrown PHP application. Its primary functions are donations and membership applications. It supports multiple payment methods, needs to interact with payment providers, supports submission and listing of comments and exchanges data with another homegrown PHP application that does analysis, reporting and moderation.

[Image: the fundraising application]

The codebase was originally written in a procedural style, with most code residing directly in files (i.e., not even in a global function). There was very little design and completely separate concerns such as presentation and data access were mixed together. As you can probably imagine, this code was highly complex and very hard to understand or change. There was unused code, broken code, features that might not be needed anymore, and mysterious parts whose purpose was unknown even to our guru who maintained the codebase during the last few years. This mess, combined with the complete lack of a specification and unit tests, made development of new features extremely slow and error prone.

Why we rewrote

During the last year of the old application’s lifetime, we did refactor some parts and tried adding tests. In doing so, we figured that rewriting from scratch would be easier than trying to make incremental changes. We could start with a fresh design, add only the features we really need, and perhaps borrow some reusable code from the less horrible parts of the old application.

They did it by making the single worst strategic mistake that any software company can make: […] rewrite the code from scratch. —Joel Spolsky

We were aware of the risks involved with doing a rewrite of this nature and that often such rewrites fail. One big reason we did not decide against rewriting is that we had a time period of 9 months during which no new features needed to be developed. This meant we could freeze the old application and avoid parallel development, which would have resulted in some kind of feature race. Additionally, we set some constraints: we would only rewrite this application and leave the analysis and moderation application alone, and we would do a pure rewrite, avoiding the addition of new features into the new application until the rewrite was done.

How we got started

Since we had no specification, we tried visualizing the conceptual components of the old application, and then identified the “commands” they received from the outside world.

[Diagram: conceptual components of the old application and the commands they receive]

Creating the new software

After some consideration, we decided to try out The Clean Architecture as a high level structure for the new application. For technical details on what we did and the lessons we learned, see Implementing the Clean Architecture.

The result

With a team of 3 people, we took about 8 months to finish the rewrite successfully. Our codebase is now clean and much, much easier to understand and work with. It took us over two man years to do this clean up, and presumably an even greater amount of time was wasted in dealing with the old application in the first place. This goes to show that the cost of not working towards technical excellence is very high.

We’re very happy with the result. For us, the team that wrote it, it’s easy to understand, and the same seems to be true for other people based on feedback we got from our colleagues in other teams. We have tests for pretty much all functionality, so can refactor and add new functionality with confidence. So far we’ve encountered very few bugs, with most issues arising from us forgetting to add minor but important features to the new application, or misunderstanding what the behavior should be and then correctly implementing the wrong thing. This of course has more to do with the old codebase than with the new one. We now have a solid platform upon which we can quickly build new functionality or improve what we already have.

The new application is the first Wikimedia (Deutschland) application deployed on, and written in, PHP7. Even though this was not an explicit goal of the rewrite, the new application has ended up with better performance than the old one, in part due to the PHP7 usage.

Near the end of the rewrite we got an external review performed by thePHPcc, during which Sebastian Bergmann, who you might know from PHPUnit fame, looked for code quality issues in the new codebase. The general result of that was a thumbs up, which we took the creative license to translate into this totally non-Sebastian approved image:

You can see our new application in action in production. I recommend you try it out by donating 🙂

Technical statistics

These are some statistics for fun. They have been compiled after we did our rewrite, and were not used during development at all. As with most software metrics, they should be taken with a grain of salt.

In this visualization, each dot represents a single file. The size represents the Cyclomatic complexity while the color represents the Maintainability Index. The complexity is scored relative to the highest complexity in the project, which in the old application was 266 and in the new one is 30. This means that the red on the right (the new application) is a lot less problematic than the red on the left. (This visualization was created with PhpMetrics.)

[Image: complexity and maintainability visualization, old application (left) versus new application (right)]

Global access in various Wikimedia codebases (lower is better). The rightmost is the old version of the fundraising application, and the one next to it is the new one. The new one has no global access whatsoever. LLOC stands for Logical Lines of Code. You can see the numbers in this public spreadsheet.

[Chart: global access in various Wikimedia codebases]

Static method calls, often a big source of global state access, were omitted, since the tools used count many false positives (i.e. alternative constructors).

The differences between the projects can be made more apparent by visualizing them in another way. Here you have the number of lines per global access, represented on a logarithmic scale.

[Chart: LLOC per global access, logarithmic scale]

The following stats have been obtained using phploc, which counts namespace declarations and imports as LLOC. This means that for the new application some of the numbers are very slightly inflated.

  • Average class LLOC: 31 => 21
  • Average method LLOC: 4 => 3
  • Cyclomatic Complexity / LLOC : 0.39 => 0.10
  • Cyclomatic Complexity / Number of Methods: 2.67 => 1.32
  • Global functions: 58 => 0
  • Total LLOC: 5517 => 10187
  • Test LLOC: 979 => 5516
  • Production LLOC: 4538 => 4671
  • Classes: 105 => 366
  • Namespaces: 14 => 105

This is another visualization created with PhpMetrics that shows the dependencies between classes. Dependencies are static calls (including to the constructor), implementation, extension and type hinting. The application’s top-level factory can be seen at the top right of the visualization.

[Image: class dependency visualization]

Maps 4.0.0-RC1 released!

I’m happy to announce the first release candidate for Maps 4.0. Maps is a MediaWiki extension to work with and visualize geographical information. Maps 4.0 is the first major release of the extension since January 2014, and it brings a ton of “new” functionality.

First off, this blog post is about a release candidate, meant to gather feedback and not suitable for usage in production. The 4.0 release itself will be made one week from now if no issues are found.

Almost all features from the Semantic Maps extension got merged into Maps, with the notable omission of the form input, which now resides in Yaron Koren’s Page Forms extension. I realized that spreading out the functionality over both Maps and Semantic Maps was hindering development and making things more difficult for the users than needed. Hence Semantic Maps is now discontinued, with Maps containing the coordinate datatype, the map result formats for each mapping service, the KML export format and distance query support. All these features will automatically enable themselves when you have Semantic MediaWiki installed, and can be explicitly turned off with a new egMapsDisableSmwIntegration setting.

The other big change is that, after 7 years of no change, the default mapping service was changed from Google Maps to Leaflet. The reason for this alteration is that Google recently required obtaining and specifying an API key for its maps to work on new websites. This would leave some users confused when they first installed the Maps extension and got a non functioning map, even though the API key is mentioned in the installation instructions. Google Maps is of course still supported, and you can make it the default again on your wiki via the egMapsDefaultService setting.

Another noteworthy change is the addition of the egMapsDisableExtension setting, which allows for disabling the extension via configuration, even when it is installed. This has often been requested by those running wiki farms.

For a full list of changes, see the release notes. Also check out the new features in Maps 3.8, Maps 3.7 and Maps 3.6 if you have not done so yet.

Upgrading

Since this is a major release, please beware of the breaking changes, and that you might need to change configuration or things inside of your wiki. Update your mediawiki/maps version in composer.json to ~4.0@rc (or ~4.0 once the real release has happened) and run composer update.

Beware that as of Maps 3.6, you need MediaWiki 1.23 or later, and PHP 5.5 or later. If you choose to remain with an older version of PHP or MediaWiki, use Maps 3.5. Maps works with the latest stable versions of both MediaWiki and PHP, which are the versions I recommend you use.

Object Orientated Lua code

During the last few weeks I’ve been refactoring some horrible Lua code. This has been a ton of fun so far, and I learned many new things about Lua that I’d like to share.

Such Horrible Code

The code in question is that of a scripted Supreme Commander Forged Alliance Forever map called Final Rush Pro v4. Essentially all the code resides in a single Lua file slightly over 2500 lines long. It is entirely procedural, uses global state all over, contains plenty of copy pasted code and, unsurprisingly, does not have a single test. What’s more, at least some of the code must have been written by people not even at home with procedural programming, as there are several instances where massive if-else blocks are used rather than loops.

Much Refactoring

The high level approach I took was to identify cohesive sets of code in the huge file and move them out in dedicated files. These dedicated files would then have their dependencies explicitly defined and could be cleaned up one by one. This graph shows the lines of code of the Lua file that acts as entry point over time:

[Graph: lines of code of the entry point Lua file over time]

The first example can be seen in moving the “PrebuildTents” code into its own file. This code, coincidentally, nicely illustrates the copy pasting and insane use of if-else over loops. One huge issue that remains when simply moving the code like that is that it remains in global/static scope. In other words, it’s not possible to use the code in the file with two different sets of local values. I did some searching on how to idiomatically achieve polymorphism in Lua.

One of the first things I read through was the Object Orientated Programming pages of the Programming in Lua book. Following that approach, I created the very first version of a simple wrapper around a list of player armies. As you can see there, I wrote tests for that code (more on those tests below). I was not too happy with that approach as it does not provide nice encapsulation. After looking at the code of some of the more prominent Lua tools I came across, I decided to go with a closure based approach instead. Initially I would define a this local table, which would then get functions bound to it. I switched to returning a map at the end of the closure, which makes it more clear what the public functions are, and leaves one less local variable to worry about. (The closure is assigned to newInstance rather than just returned due to the way the import mechanism of the framework works, which is different than Lua’s native require.)
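The closure based approach translates to most languages with first-class functions. The sketch below shows the idea in Python rather than Lua: a constructor function keeps its state in local variables and returns a map of the public functions. The name new_player_armies and the two public functions are hypothetical; the real wrapper's interface is not shown in this post.

```python
def new_player_armies(army_names):
    """Closure based 'object': returns a dict holding only the public functions."""
    # Private state: reachable only through the functions defined below.
    names = list(army_names)

    def _normalize(name):
        # Private helper: not part of the returned map, so callers cannot use it.
        return name.strip().upper()

    def count():
        return len(names)

    def contains(name):
        return _normalize(name) in names

    # The returned map makes the public interface visible in one place.
    return {"count": count, "contains": contains}


armies = new_player_armies(["ARMY_1", "ARMY_2"])
other_armies = new_player_armies(["ARMY_9"])  # independent second instance
```

Each call to new_player_armies creates a fresh set of local values, which is exactly what code living in global scope cannot offer.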

A downside of how the code in the files is organized is that you essentially need to read backwards when looking at how it is invoked. The public functions are listed at the very end of the file, with their dependencies defined before, and their dependencies defined before that. It would be nice to have the public functions more clearly visible at the top of the file, which is where you need to look for the constructor signature already.

Now that the creation of cohesive sets of code is mostly done, the entry point file is down to 44 Lines of Code. It defaults some options coming from the framework/game, and then invokes a high level module that sets up the various aspects of the game, which totals 70 Lines of Code.

My next steps are further cleanup of individual sets of code, with a focus on minimizing dependencies and separating concerns. For this I’m using practices, principles and patterns which are by and large language agnostic, so I won’t get into them here. You can find the code of the new version of the map in the Final Rush Pro 5 repository on GitHub, including many small refactoring commits in the git history.

Very Environment

My first modifications to the code were made with Notepad++ on Windows. While that editor provides syntax highlighting, there is no static code analysis or any of the essential things that require it, such as navigating to definitions. Hence I switched to my usual development environment, IntelliJ on Linux, using the IntelliJ Lua plugin.

While that switch to Linux made refactoring the code easier, it also prevented me from (manually) testing the code. This code, like many legacy balls of mud, binds very tightly to its framework, in this case the Supreme Commander game that only runs on Windows. While it’s often good to remove such binding, it’s not a trivial task, and not something I’d want to attempt without a fast feedback cycle.

The lack of fast feedback drove me to find a Lua testing tool to use. Several are listed on the lua-users wiki. After checking the project health of several tools, I decided to go with Busted, which I installed via LuaRocks. I then proceeded to create a wrapper for the list of players in the game (to replace code that was not only crappy but also incorrect) using Test Driven Development, resulting in a nice spec for the wrapper.

Unfortunately the same approach would not work for cleaning up most of the other code. The framework binding was just too tight, and in a lot of cases, contrary to the typical scenarios I’m used to (which are not games), such binding is perhaps simply the best that can be done. Hence I switched back to Windows.

On Windows I installed IntelliJ with the Lua plugin, TortoiseGit, and Busted. The latter was quite a hurdle, since my Windows administration skills are not exactly stellar. For Busted I needed to install Lua (ya really), LuaRocks and the MinGW compiler. Being able to run the tests in the IDE’s terminal was worth it though.

Wow Release

Version 5 of the map has now been released, see the release post for details on the new features.

Clean Architecture diagrams

I’m happy to release a few Clean Architecture related diagrams into the public domain (CC0 1.0).

These diagrams were created at Wikimedia Deutschland by Jan Dittrich, Charlie Kritschmar and myself for an upcoming presentation I’m doing on the Clean Architecture. There are plenty of diagrams available already if you include Onion Architecture and Hexagonal, which have essentially the same structure, though none I’ve found so far have a permissive license. Furthermore, I’m not so happy with the wording and structure of a lot of these. In particular, some bite off more than they can chew with the “dependencies pointing inward” rule, glossing over important restrictions which end up not being visualized at all.

These images are SVGs. Click them to go to Wikimedia Commons where you can download them.

[Diagrams: Clean Architecture, Clean Architecture + Bounded Context, Clean Architecture + Bounded Contexts]

Maps 3.8 for MediaWiki released

I’m happy to announce the immediate availability of Maps 3.8. This feature release brings several enhancements and new features.

  • Added Leaflet marker clustering (by Peter Grassberger)
    • markercluster: Enables clustering, multiple markers are merged into one marker.
    • clustermaxzoom: The maximum zoom level where clusters may exist.
    • clusterzoomonclick: Whether clicking on a cluster zooms into it.
    • clustermaxradius: The maximum radius that a cluster will cover.
    • clusterspiderfy: At the lowest zoom level markers are separated so you can see all.
  • Added Leaflet fullscreen control (by Peter Grassberger)
  • Added OSM Nominatim Geocoder (by Peter Grassberger)
  • Upgraded Leaflet library to its latest version (1.0.0-r3) (by Peter Grassberger)
  • Made removal of marker clusters more robust. (by Peter Grassberger)
  • Unified system messages for several services (by Karsten Hoffmeyer)

Leaflet marker clusters

Google Maps API key

Due to changes to Google Maps, an API key now needs to be set. Upgrading to the latest version of Maps will not break the maps on your wiki in any case, as the change really is on Google’s end. If they are still working, you can keep running an older version of Maps. Of course it’s safer to upgrade and set the API key anyway. In case you have a new wiki or the maps broke for some reason, you will need to get Maps 3.8 or later and set the API key. See the installation configuration instructions for more information.

  • Added Google Maps API key egMapsGMaps3ApiKey setting (by Peter Grassberger)
  • Added Google Maps API version number egMapsGMaps3ApiVersion setting (by Peter Grassberger)

Upgrading

Since this is a feature release, there are no breaking changes, and you can simply run composer update, or replace the old files with the new ones.

Beware that as of Maps 3.6, you need MediaWiki 1.23 or later, and PHP 5.5 or later. If you choose to remain with an older version of PHP or MediaWiki, use Maps 3.5. Maps works with the latest stable versions of both MediaWiki and PHP, which are the versions I recommend you use.

Notes: Implementing DDD, chapter 2

Notes from Implementing Domain Driven Design, chapter 2: Domains, Subdomains and Bounded Contexts (p58 and later only)

  • User interface and service orientated endpoints are within the context boundary
  • Domain concepts in the UI form the Smart UI Anti-Pattern
  • A database schema is part of the context if it was created for it and not influenced from the outside
  • Contexts should not be used to divide developer responsibilities; modules are a more suitable tactical approach
  • A bounded context has one team that is responsible for it (while teams can be responsible for multiple bounded contexts)
  • Access and identity is its own context and should not be visible at all in the domain of another context. The application services / use cases in the other context are responsible for interacting with the access and identity generic subdomain
  • Context Maps are supposedly real cool

Maps 3.7 for MediaWiki released

I’m happy to announce the immediate availability of Maps 3.7. This feature release brings some minor enhancements.

  • Added rotate control support for Google Maps (by Peter Grassberger)
  • Changed coordinate display on OpenLayers maps from long-lat to lat-long (by Peter Grassberger)
  • Upgraded Google marker cluster library to its latest version (2.1.2) (by Peter Grassberger)
  • Upgraded Leaflet library to its latest version (0.7.7) (by Peter Grassberger)
  • Added missing system messages (by Karsten Hoffmeyer)
  • Internal code enhancements (by Peter Grassberger)
  • Removed broken custom map layer functionality. You no longer need to run update.php for full installation.
  • Translation updates by TranslateWiki

Upgrading

Since this is a feature release, there are no breaking changes, and you can simply run composer update, or replace the old files with the new ones.

Beware that as of Maps 3.6, you need MediaWiki 1.23 or later, and PHP 5.5 or later. If you choose to remain with an older version of PHP or MediaWiki, use Maps 3.5. Maps works with the latest stable versions of both MediaWiki and PHP, which are the versions I recommend you use.

PHP Unconference Europe 2016

Last week I attended the 2016 edition of the PHP Unconference Europe, taking place in Palma de Mallorca. This post contains my notes from various conference sessions. Be warned, some of them are quite rough.

Overall impression

Before getting to the notes, I’d like to explain the setup of the unconference and my general impression.

The unconference is two days long, not counting associated social events before and afterwards. The first day started with people discussing in small groups which sessions they would like to have, either by leading them themselves, or just wanting to attend. These session ideas were written down and put on papers on the wall. We then went through them one by one, with someone explaining the idea behind each session, and one or more presenters / hosts being chosen. The final step of the process was to vote on the sessions. For this, each person got two “sticky dots” (what are those things called anyway?), which they could either both put onto a single session, or split and vote on two sessions.

On each day we had 4 such sessions, with long breaks in between, to promote interaction between the attendees.

Onto my notes for individual sessions:

How we analyze your code

Analysis and metrics can be used for tracking progress and for analyzing the current state. Talk focuses on current state.

  • Which code is important
  • Probably buggy code
  • Badly tested code
  • Untested code

Finding the core (kore?): code rank (like Google PageRank): importance flows to classes that are depended upon (fan-in). Qafoo Quality Analyzer. Reverse code rank: importance flows to classes that depend on lots of other classes (fan-out).
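
To make the code rank idea concrete, here is a minimal sketch in Python. It is my own illustration of a PageRank-style calculation over a class dependency graph, not how Qafoo Quality Analyzer actually computes its numbers. The function names, the damping factor of 0.85 and the example class names are all assumptions.

```python
def code_rank(dependencies, damping=0.85, iterations=50):
    """PageRank-style score: importance flows from a class to the classes it depends on.

    `dependencies` maps a class name to the list of classes it depends on.
    """
    classes = set(dependencies)
    for targets in dependencies.values():
        classes.update(targets)
    classes = sorted(classes)
    n = len(classes)

    rank = {name: 1.0 / n for name in classes}
    for _ in range(iterations):
        new_rank = {name: (1.0 - damping) / n for name in classes}
        for source in classes:
            targets = dependencies.get(source, [])
            if targets:
                # Split this class's importance over everything it depends on.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A class without dependencies spreads its importance evenly.
                share = damping * rank[source] / n
                for name in classes:
                    new_rank[name] += share
        rank = new_rank
    return rank


def reverse_code_rank(dependencies, damping=0.85, iterations=50):
    """Same calculation on the reversed graph: importance flows to heavy dependers."""
    reversed_graph = {}
    for source, targets in dependencies.items():
        reversed_graph.setdefault(source, [])
        for target in targets:
            reversed_graph.setdefault(target, []).append(source)
    return code_rank(reversed_graph, damping, iterations)


# Hypothetical dependency graph, for illustration only.
example = {
    "Controller": ["UseCase"],
    "UseCase": ["Donation", "Repository"],
    "Repository": ["Donation"],
    "Donation": [],
}
ranks = code_rank(example)
reverse_ranks = reverse_code_rank(example)
```

In this toy graph the class everything ends up depending on (Donation) gets the highest code rank, while the class at the top of the dependency chain (Controller) gets the highest reverse code rank.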

Where do we expect bugs? Typically where code is hard to understand. We can look at method complexity: cyclomatic complexity, NPath complexity. Line Coverage exists, Path Coverage is being worked upon. Parameter Value Coverage. CRAP.
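
The CRAP score combines two of these metrics: cyclomatic complexity and test coverage. The formula was not shown in the session. The sketch below uses the commonly published definition, comp² × (1 − cov/100)³ + comp, and the function name is my own.

```python
def crap_score(complexity, coverage_percent):
    """CRAP score of a method from its cyclomatic complexity and line coverage.

    Uses the commonly published formula: comp^2 * (1 - cov/100)^3 + comp.
    """
    if complexity < 1:
        raise ValueError("cyclomatic complexity is at least 1")
    if not 0 <= coverage_percent <= 100:
        raise ValueError("coverage must be between 0 and 100")
    uncovered = 1.0 - coverage_percent / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity


untested = crap_score(10, 0)       # complex and untested
fully_tested = crap_score(10, 100) # equally complex, fully covered
```

A fully covered method scores exactly its complexity, while an untested method is penalized quadratically, which is why complex untested code ends up at the top of the list.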

Excessive coupling is bad. Incoming and outgoing dependencies. Different from code rank in that only direct dependencies are counted. Things that are depended on a lot should be stable and well tested (essentially the Stable Dependencies Principle).
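
Counting direct dependencies is simple enough to sketch. The Python below is my own illustration, not the tool's implementation. It counts incoming and outgoing direct dependencies per class and derives Robert C. Martin's instability metric, I = outgoing / (incoming + outgoing), which is the number the Stable Dependencies Principle is usually checked against. The class names are made up.

```python
def coupling(dependencies):
    """Return {class: (incoming, outgoing, instability)} using direct dependencies only."""
    fan_in = {}
    fan_out = {}
    for source, targets in dependencies.items():
        fan_in.setdefault(source, set())
        fan_out.setdefault(source, set())
        for target in targets:
            if target == source:
                continue  # a class depending on itself is not coupling
            fan_out[source].add(target)
            fan_in.setdefault(target, set()).add(source)
            fan_out.setdefault(target, set())

    result = {}
    for name in fan_out:
        incoming = len(fan_in[name])
        outgoing = len(fan_out[name])
        total = incoming + outgoing
        # Instability: 0.0 = only depended upon (stable), 1.0 = only depends on others.
        instability = outgoing / total if total else 0.0
        result[name] = (incoming, outgoing, instability)
    return result


# Hypothetical dependency graph, for illustration only.
metrics = coupling({
    "Controller": ["UseCase"],
    "UseCase": ["Donation", "Repository"],
    "Repository": ["Donation"],
    "Donation": [],
})
```

Classes with many incoming dependencies and an instability close to 0 are the ones that should be stable and well tested.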

Qafoo Quality Analyzer can be used to find dependencies across layers when they are in different directories. Very limited at present.

When finding highly complex code, don’t immediately assume it is bad. There are valid reasons for high complexity. Metrics can also be tricked.

The evolution of web application architecture

How systems interact with each other. Starting with simple architecture, looking at problems that arise as more visitors arrive, and then seeing how we can deal with those problems.

Users -> Single web app server -> DB

Next step: Multiple app servers + load balancers (round robin + session caching server)

Launch of shopping system resulted in app going down, as master db got too many writes, due to logging “cache was hit” in it.

Different ways of caching: entities, collections, full pages. Cache invalidation is hard, lots of dependencies even in simple domains.

When too many writes: sharding (split data across multiple nodes), vertical (by columns) or horizontal (by rows). Loss of referential integrity checking.
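
As a rough illustration of horizontal sharding, the sketch below routes each row to a node based on a hash of its key. This is my own minimal example, not something shown in the session. The class and function names are hypothetical, and plain dictionaries stand in for the database nodes.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a row key to a shard index in a stable, evenly spread way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Horizontal sharding: whole rows are distributed over several nodes."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate database node.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key, row):
        self.shards[shard_for(key, len(self.shards))][key] = row

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(4)
store.put("customer:42", {"name": "Alice"})
store.put("customer:43", {"name": "Bob"})
```

The sketch also shows where referential integrity goes: no single node sees all rows, so a foreign key pointing at a row on another shard cannot be checked locally. Changing the number of shards moves most keys with this naive modulo scheme, which is one reason resharding is painful.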

Complexity with relational database systems -> NoSQL: sharding, multi master, cross-shard queries. Usually no SQL or referential integrity, though those features are already lost when using sharding.

Combination of multiple persistence systems: problems with synchronization. Transactions are slow. Embrace eventual consistency. Same updating strategies can be used for caches.

Business people often know SQL, yet not NoSQL query languages.

Queues can be used to pass data asynchronously to multiple consumers. Following data flow of an action can be tricky. Data consistency is still a thing.

Microservices: separation of concerns on service and team level. Can simplify via optimal tech stack per service. Makes things more complicated: need automated deployment, orchestration, eventual consistency, failure handling.

Boring technology often works best, especially at the beginning of a project. Start with the simplest solution that works. Take team skills into account.

How to fuck up projects

Before the project

  • Buzzword first design
  • Mismatching expectations: huge customer expectations, no budget
  • Fuzzy ambitious vocabulary, directly into the contract (including made up words)
  • Meetings, bad mood, no eye contact
  • No decisions (no decision making process -> no managers -> saves money)
  • Customer Driven Development: customer makes decisions
  • Decide on environment: tools, mouse/touchpad, 1 big monitor or 2 small ones, JIRA, etc
  • Estimates: should be done by management

During the project

  • Avoid ALL communication, especially with the customer
  • If communication cannot be avoided: mix channels
  • Responsibility: use group chats and use “you” instead of specific names (cc everyone in mails)
  • Avoid issue trackers, this is what email and Facebook are for
  • If you cannot avoid issue trackers: use multiple or have one ticket with 2000 notes
  • Use ALL the programming languages, including PHP-COBOL
  • Do YOUR job, but nothing more
  • Only pressure makes diamonds: coding on the weekend
  • No breaks so people don’t lose focus
  • Collect metrics: Hours in office, LOC, emails answered, tickets closed

Completing the project

  • 3/4 projects fail: we can’t do anything about it
  • New features? Outsource
  • Ignore the client when they ask about the completed project
  • Change the team often, fire people on a daily basis
  • Rotate the customer’s contact person

Bonus

  • No VCS. FTP works. Live editing on production is even better
  • http://whatthecommit.com/
  • Encoding: emojis in function names, umlauts in file names. Mix encodings, also in MySQL
  • Agile is just guidelines, change goals during sprints often
  • Help others fuck up: release it as open source
  • git blame-someone-else

The future of PHP

This session started with some words from the moderator, who mainly talked about performance, portability and future adoption of, or moving away from, PHP.

  • PHP now fast enough to use many PHP libraries
  • PHP now better for long running tasks (though still no 64 bit for Windows)
  • PHP now has an Abstract Syntax Tree

The discussion that followed was primarily about the future of PHP in terms of adoption. The two languages most mentioned as competitors were JavaScript and Java.

Java because it is very hard to get PHP into big enterprises, where people tend to cling to Java. A point made several times is that such choices have very little to do with technical sensibility, and are instead influenced by the education system, languages already used, newness/hipness and the HiPPO. Most people also don’t have the relevant information to make an informed choice, and do not make the effort to look it up as they already have a preference.

JavaScript is a competitor because web based projects, be it with a backend in PHP or in another language, need more and more JavaScript, with no real alternatives. It was mentioned several times that not having alternatives is bad. Having multiple JS interpreters is cool, JS being the only choice for browser programming is not.

Introduction to sensible load testing

In this talk the speaker explained why it is important to do realistic load testing, and how to avoid common pitfalls. He explained how jMeter can be used to simulate real user behavior during peak load times. Preliminary slides link.

Domain Objects: not just for Domain Driven Design

This session was hard to choose, as it coincided with “What to look for in a developer when hiring, and how to test it”, which I also wanted to attend.

The Domain Objects session introduced what Value Objects are, and why they are better than long parameter lists and passing around values that might be invalid. While sensible enough, it was all very basic, with unfortunately no new information for me whatsoever. I’m thinking it’d have been better to do this as a discussion, partly because the speaker was clearly very inexperienced, and gave most of the talk with his arms crossed in front of him. (Speaker, if you are reading this, please don’t be discouraged, practice makes perfect.)
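
For readers who have not seen the pattern, here is a minimal Value Object sketch. It is my own example rather than code from the session, written in Python for brevity; the same shape works in PHP with a final class, private fields and a validating constructor. The Money class and its rules are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Immutable value object: validated on construction, compared by value."""

    cents: int
    currency: str

    def __post_init__(self):
        if not isinstance(self.cents, int) or self.cents < 0:
            raise ValueError("cents must be a non-negative integer")
        if len(self.currency) != 3 or not self.currency.isalpha() or not self.currency.isupper():
            raise ValueError("currency must be a three letter uppercase code")

    def add(self, other):
        """Return a new Money instead of modifying this one."""
        if other.currency != self.currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.cents + other.cents, self.currency)


donation = Money(500, "EUR")
total = donation.add(Money(250, "EUR"))
```

Code that receives a Money instance never has to re-check whether the amount is negative or the currency is malformed, and a signature taking one Money argument replaces two loosely related scalar parameters.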

Performance monitoring

I was only in the second half of this session, during which two performance monitoring tools were presented: Tideways by Qafoo, and Instana.

Maps 3.6 for MediaWiki released

I’m happy to announce the immediate availability of Maps 3.6. This feature release brings marker clustering enhancements and a number of fixes.

These parameters were added to the display_map parser function, to allow for greater control over marker clustering. They are only supported together with Google Maps.

  • clustergridsize: The grid size of a cluster in pixels
  • clustermaxzoom: The maximum zoom level that a marker can be part of a cluster
  • clusterzoomonclick: If the default behavior of clicking on a cluster is to zoom in on it
  • clusteraveragecenter: If the cluster location should be the average of all its markers
  • clusterminsize: The minimum number of markers required to form a cluster

Bugfixes

  • Fixed missing marker cluster images for Google Maps
  • Fixed duplicate markers in OpenLayers maps
  • Fixed URL support in the icon parameter

Credits

Many thanks to Peter Grassberger, who made the listed fixes and added the new clustering parameters. Thanks also go to Karsten Hoffmeyer for miscellaneous support and to TranslateWiki for providing translations.

Upgrading

Since this is a feature release, there are no breaking changes, and you can simply run composer update, or replace the old files with the new ones.

There are, however, compatibility changes to keep in mind. As of this version, Maps requires PHP 5.5 or later and MediaWiki 1.23 or later. composer update will not give you a version of Maps incompatible with your version of PHP, though it is presently not checking your MediaWiki version. Fun fact: this is the first bump in minimum requirements since the release of Maps 2.0, way back in 2012.