Updated on February 26, 2018
Rewriting the Wikimedia Deutschland fundraising
Last year we rewrote the Wikimedia Deutschland fundraising software. In this blog post I’ll give you an idea of what this software does, why we rewrote it and the outcome of this rewrite.
Our fundraising software is a homegrown PHP application. Its primary functions are donations and membership applications. It supports multiple payment methods, needs to interact with payment providers, supports submission and listing of comments and exchanges data with another homegrown PHP application that does analysis, reporting and moderation.
The codebase was originally written in a procedural style, with most code residing directly in files (i.e., not even in a global function). There was very little design, and completely separate concerns such as presentation and data access were mixed together. As you can probably imagine, this code was highly complex and very hard to understand or change. There was unused code, broken code, features that might not be needed anymore, and mysterious parts whose purpose was unknown even to our guru who had maintained the codebase during the last few years. This mess, combined with the complete lack of a specification and unit tests, made development of new features extremely slow and error-prone.
Why we rewrote
During the last year of the old application’s lifetime, we did refactor some parts and tried adding tests. In doing so, we figured that rewriting from scratch would be easier than trying to make incremental changes. We could start with a fresh design, add only the features we really need, and perhaps borrow some reusable code from the less horrible parts of the old application.
They did it by making the single worst strategic mistake that any software company can make: […] rewrite the code from scratch. —Joel Spolsky
We were aware of the risks involved with doing a rewrite of this nature, and that such rewrites often fail. One big reason we did not decide against rewriting is that we had a time period of 9 months during which no new features needed to be developed. This meant we could freeze the old application and avoid parallel development, which would have resulted in some kind of feature race between the old and new codebases. Additionally, we set some constraints: we would only rewrite this application and leave the analysis and moderation application alone, and we would do a pure rewrite, avoiding the addition of new features into the new application until the rewrite was done.
How we got started
Since we had no specification, we tried visualizing the conceptual components of the old application, and then identified the “commands” they received from the outside world.
Creating the new software
After some consideration, we decided to try out The Clean Architecture as a high level structure for the new application. For technical details on what we did and the lessons we learned, see Implementing the Clean Architecture.
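To give a rough idea of what a "command from the outside world" looks like in this style, here is a minimal sketch of a single use case. It is written in Python only to keep it short (the application itself is PHP), and every name in it (`AddDonationRequest`, `DonationRepository`, the allowed payment methods, and so on) is a hypothetical stand-in rather than our actual code. The point it illustrates is that the use case receives a plain request object, depends only on an interface for persistence, and returns a plain response object.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class AddDonationRequest:
    """Plain data describing the command; no framework types leak in."""
    donor_name: str
    amount_cents: int
    payment_method: str


@dataclass(frozen=True)
class AddDonationResponse:
    """Plain data describing the outcome, for the presentation layer to render."""
    success: bool
    message: str


class DonationRepository(Protocol):
    """Boundary the use case depends on; persistence details live elsewhere."""

    def store(self, request: AddDonationRequest) -> int:
        ...


class InMemoryDonationRepository:
    """Stand-in implementation, the kind of thing one would use in tests."""

    def __init__(self) -> None:
        self.donations = []

    def store(self, request: AddDonationRequest) -> int:
        self.donations.append(request)
        return len(self.donations)  # hypothetical donation id


class AddDonationUseCase:
    # Assumed values, purely for illustration.
    ALLOWED_METHODS = {"bank_transfer", "credit_card"}

    def __init__(self, repository: DonationRepository) -> None:
        self._repository = repository

    def add_donation(self, request: AddDonationRequest) -> AddDonationResponse:
        # Validation happens before anything is persisted.
        if request.amount_cents <= 0:
            return AddDonationResponse(False, "Amount must be positive")
        if request.payment_method not in self.ALLOWED_METHODS:
            return AddDonationResponse(False, "Unknown payment method")
        donation_id = self._repository.store(request)
        return AddDonationResponse(True, f"Stored donation {donation_id}")
```

Because the use case only knows about the repository interface, it can be exercised in a test with the in-memory implementation, without a database or a web framework being involved.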
With a team of 3 people, we took about 8 months to finish the rewrite successfully. Our codebase is now clean and much, much easier to understand and work with. It took us over two man-years to do this cleanup, and presumably an even greater amount of time was wasted in dealing with the old application in the first place. This goes to show that the cost of not working towards technical excellence is very high.
We’re very happy with the result. For us, the team that wrote it, it’s easy to understand, and the same seems to be true for other people, based on feedback we got from our colleagues in other teams. We have tests for pretty much all functionality, so we can refactor and add new functionality with confidence. So far we’ve encountered very few bugs, with most issues arising from us forgetting to add minor but important features to the new application, or misunderstanding what the behavior should be and then correctly implementing the wrong thing. This of course has more to do with the old codebase than with the new one. We now have a solid platform upon which we can quickly build new functionality or improve what we already have.
The new application is the first Wikimedia (Deutschland) application deployed on, and written in, PHP7. Even though it was not an explicit goal of the rewrite, the new application has ended up with better performance than the old one, in part due to the use of PHP7.
Near the end of the rewrite we got an external review performed by thePHPcc, during which Sebastian Bergmann, who you might know from PHPUnit fame, looked for code quality issues in the new codebase. The general result of that was a thumbs up, which we took the creative license to translate into this totally non-Sebastian approved image:
You can see our new application in action in production. I recommend you try it out by donating 🙂
These are some statistics for fun. They were compiled after we did our rewrite, and were not used during development at all. As with most software metrics, they should be taken with a grain of salt.
In this visualization, each dot represents a single file. The size represents the Cyclomatic complexity while the color represents the Maintainability Index. The complexity is scored relative to the highest complexity in the project, which in the old application was 266 and in the new one is 30. This means that the red on the right (the new application) is a lot less problematic than the red on the left. (This visualization was created with PhpMetrics.)
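To make the point about relative scoring concrete, here is a small sketch of the arithmetic. It assumes the score is simply a file's complexity divided by the project's maximum; how PhpMetrics actually maps that onto dot size is not something this sketch claims to reproduce. The function name is made up for the example, and only the two maxima (266 and 30) come from the numbers above.

```python
OLD_MAX_COMPLEXITY = 266  # highest per-file complexity in the old application
NEW_MAX_COMPLEXITY = 30   # highest per-file complexity in the new application


def relative_complexity(file_complexity: int, project_max: int) -> float:
    """Score a file relative to the most complex file in the same project.

    Assumption: a plain linear ratio, used here only for illustration.
    """
    if project_max <= 0:
        raise ValueError("project_max must be positive")
    return file_complexity / project_max


# The most complex file of the new application, scored within its own project.
worst_new_on_own_scale = relative_complexity(NEW_MAX_COMPLEXITY, NEW_MAX_COMPLEXITY)

# The same file, scored as if it lived in the old application.
worst_new_on_old_scale = relative_complexity(NEW_MAX_COMPLEXITY, OLD_MAX_COMPLEXITY)

print(worst_new_on_own_scale)            # 1.0
print(round(worst_new_on_old_scale, 2))  # 0.11
```

In other words, the worst file in the new application would score at roughly a ninth of the maximum if it were measured on the old application's scale.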
Global access in various Wikimedia codebases (lower is better). The rightmost is the old version of the fundraising application, and the one next to it is the new one. The new one has no global access whatsoever. LLOC stands for Logical Lines of Code. You can see the numbers in this public spreadsheet.
Static method calls, often a big source of global state access, were omitted, since the tools used count many false positives (e.g., alternative constructors).
The differences between the projects can be made more apparent by visualizing them in another way. Here you have the number of lines per global access, represented on a logarithmic scale.
The following stats have been obtained using phploc, which counts namespace declarations and imports as LLOC. This means that for the new application some of the numbers are very slightly inflated.
- Average class LLOC: 31 => 21
- Average method LLOC: 4 => 3
- Cyclomatic Complexity / LLOC : 0.39 => 0.10
- Cyclomatic Complexity / Number of Methods: 2.67 => 1.32
- Global functions: 58 => 0
- Total LLOC: 5517 => 10187
- Test LLOC: 979 => 5516
- Production LLOC: 4538 => 4671
- Classes: 105 => 366
- Namespaces: 14 => 105
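A few derived figures make the LLOC numbers above easier to read. The sketch below only does arithmetic on the values from the list; the dictionary layout and function names are invented for the example.

```python
# LLOC figures copied from the list above (old application => new application).
OLD = {"total": 5517, "test": 979, "production": 4538}
NEW = {"total": 10187, "test": 5516, "production": 4671}


def ratio_of_test_to_production(stats: dict) -> float:
    """Lines of test code per line of production code."""
    return stats["test"] / stats["production"]


def growth_percent(old_value: int, new_value: int) -> float:
    """Relative change from old_value to new_value, in percent."""
    return (new_value - old_value) / old_value * 100


# Sanity check: test and production LLOC add up to the reported totals.
for stats in (OLD, NEW):
    assert stats["test"] + stats["production"] == stats["total"]

print(round(ratio_of_test_to_production(OLD), 2))                      # 0.22
print(round(ratio_of_test_to_production(NEW), 2))                      # 1.18
print(round(growth_percent(OLD["production"], NEW["production"]), 1))  # 2.9
print(round(NEW["test"] / OLD["test"], 2))                             # 5.63
```

So the production code is only about 3% larger than before, while the amount of test code grew more than fivefold, going from roughly one line of test code per five lines of production code to slightly more test code than production code.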
This is another visualization created with PhpMetrics that shows the dependencies between classes. Dependencies are static calls (including to the constructor), implementation, extension, and type hinting. The application's top-level factory can be seen at the top right of the visualization.