When I started writing this blog post, I realized some introduction to the query components was first due. You can find it in my last blog post: The Wikidata phase3 software components.
In this post I described how the SQLStore uses a database abstraction layer to not bind itself to a particular relational database. The Wikibase software as a whole is already using a database abstraction layer, namely the one MediaWiki provides. Even though the main Wikibase applications, Wikibase Client and Wikibase Repo, depend on MediaWiki, the SQLStore does not. That rules out direct usage of the MediaWiki database abstraction layer.
Nevertheless, the MediaWiki database abstraction layer has served us reasonably well, and the team is familiar with it. Furthermore, using it does not introduce a new dependency that WMF operations might get mad about. So what we did was create a new set of interfaces for database interaction, much in line with those of the MediaWiki database abstraction later, though without a bunch of design issues the later suffers from. We then created thin implementations of these interfaces that delegated to the MediaWiki database abstraction layer. This inversion of control made it possible to use the MW abstraction layer in SQLStore, without having SQLStore know about MediaWiki.
These interfaces where put in a new component called Wikibase Database. I covered the creation of Wikibase Database in an earlier blog post.
The SQLStore does not have a fully fixed schema. One can configure which types of data are supported, and generally each type of data has its own table. Furthermore one can provide additional types of data that the core SQLStore does not support. Hence having the tables build with manually constructed SQL in sql files as done in MediaWiki does not seem convenient at all. The Semantic MediaWiki deals with this problem by dynamically generating a declarative definition of the schema and then translating that into SQL. We went with the same approach for the SQLStore, though took a lot of care to not repeat the spaghetti implementation approach of SMW.
This means we needed some way to represent tables, columns, indexes, etc in PHP. Furthermore, we’d need to be able to turn this PHP representation into SQL compatible with the used database. Unfortunately the MediaWiki database abstraction later can do neither of those.
What we ended up doing was creating schema representation objects and adding additional interfaces to Wikibase Database. And than creating implementations of those for MySQL and SQLite. Needless to say, writing such SQL generation code is rather tedious and quite hard to test. We knew that before hand, and spend quite some time looking around for existing solutions we’d be able to delegate to. Unfortunately none where found.
Several months later I somehow ended up on the architecture page of Doctrine DBAL. After reading through the basic introduction there, I thought “oh wow, this sound very similar to what we did in Wikibase Database”. So I read through the more detailed docs, and slowly came to the realization that DBAL is exactly what we had been looking for before. Making the facepalm even worse, I actually read through the basic Doctrine docs and quickly looked at its source when doing the initial research. Base on that I had concluded that it came with all this not needed ORM stuff, not realizing Doctrine itself was build on an independent DBAL.
I’ve now migrated the SQLStore to use Doctrine DBAL rather than Wikibase Database. The similarities between the interfaces of both abstraction layers is extremely high. The structure of the schema definition objects is essentially identical.
Of course Doctrine DBAL has many interfaces that Wikibase Database does not have. For instance, it can compare two schemas with each other, and turn the diff into SQL queries. That’s something that we will very likely have use for at a later point. It also has a nice query builder, support for more databases and a much more solid implementation overall. And perhaps most important of all, it does not need to be maintained by us. That’s one big project liability traded for another liability orders of magnitude less significant 🙂
The refactoring in the SQLStore from Wikibase Database killed about 1000 lines of code and obsoleted Wikibase Database (8000 lines of code) itself. Granted, this refactoring also tackled a design problem that caused not needed complexity, which contributed to this 1000 lines. It’s always great when you can remove so much code while retaining all functionality. In fact some TODOs got tackled along the way and some bugs got fixed. Oh and, we can now run the integration tests on an in-memory SQLite database \o/
So far my experience with Doctrine DBAL has been almost entirely positive. Many thanks to the authors of this software.