New database abstraction layer for MediaWiki

Disclaimer: the views and opinions expressed in this blog post are my own, and do not reflect those of Wikimedia Germany or my colleagues.

At Wikimedia Germany, we’ve created a library that acts as a database abstraction layer and is largely built on top of MediaWiki’s own database abstraction layer.

This post explains the reasoning behind doing so, as well as some lessons learned along the way. Technical documentation, as well as installation and usage instructions, can be found in the documentation of this new library, linked at the end of this blog post.

One of the components being developed for the Wikidata project has a table per fundamental type of value we store. Numbers, strings, entity IDs, geo coordinates, and so on all have their own table. The user of this software can decide which of these fundamental value types are supported. This makes the schema dependent on configuration. The MediaWiki mechanism for setting up and (structurally) updating tables does not support this well.
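To make "schema dependent on configuration" concrete, here is a minimal sketch. The value type names, the table names, and the tablesForConfiguration() function are all made up for illustration and are not the actual Wikibase schema or API.

```php
<?php
declare(strict_types=1);

// Hypothetical mapping from value type to the table that stores it.
// These names are illustrative only, not the real Wikibase schema.
const VALUE_TABLES = [
    'number'         => 'value_number',
    'string'         => 'value_string',
    'entity-id'      => 'value_entity_id',
    'geo-coordinate' => 'value_geo',
];

/**
 * Returns the tables needed for the value types enabled in configuration.
 *
 * @param string[] $enabledTypes
 * @return string[]
 */
function tablesForConfiguration(array $enabledTypes): array
{
    $tables = [];
    foreach ($enabledTypes as $type) {
        if (!array_key_exists($type, VALUE_TABLES)) {
            throw new InvalidArgumentException("Unsupported value type: $type");
        }
        $tables[] = VALUE_TABLES[$type];
    }
    return $tables;
}

// Two installations with different configuration need different schemas.
$wikiA = tablesForConfiguration(['number', 'string']);
$wikiB = tablesForConfiguration(['string', 'geo-coordinate']);
```

A static list of table definitions shipped with the software cannot express this, since the set of tables is only known once the configuration has been read.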

The exact same situation has already been encountered by the Semantic MediaWiki extension. The solution employed by SMW is dynamic schema generation. Unfortunately the quality of the code doing this is very low, so even if it were separable (which it is not), we’d not want to use it.

Hence we decided to write a new component providing schema modification functionality. In other words, it’d be able to create tables, modify them, and delete them. And so Wikibase Database was born.

Creating such a new component allowed us to achieve another important goal as well: minimization of binding and dependencies. The component with dynamic schema depends on the Wikibase DataModel and its dependencies, and nothing else. It does not depend on Wikibase Repo, Wikibase Client, or MediaWiki.

The separation from MediaWiki is achieved by introducing a new interface (QueryInterface), which is what the consumers of Wikibase Database depend on. Within Wikibase Database it is implemented by a MediaWiki-specific class that is essentially a thin adaptor around MediaWiki’s DatabaseBase.
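The shape of this arrangement can be sketched as follows. Only the name QueryInterface comes from the library; the methods on it, the LegacyDatabase stand-in (which is not MediaWiki’s DatabaseBase), the adaptor, and the consumer class are assumptions made up for the example.

```php
<?php
declare(strict_types=1);

// Hypothetical narrow interface; the real QueryInterface may declare other methods.
interface QueryInterface
{
    public function tableExists(string $tableName): bool;

    public function insert(string $tableName, array $values): void;
}

// Stand-in for a wide, framework-owned database class (NOT MediaWiki's DatabaseBase).
class LegacyDatabase
{
    private array $rows = [];

    public function tableExists($table)
    {
        return isset($this->rows[$table]);
    }

    public function insert($table, $row)
    {
        $this->rows[$table][] = $row;
        return true;
    }

    public function rowCount($table)
    {
        return count($this->rows[$table] ?? []);
    }

    // A real class of this kind would have dozens of further, unrelated methods.
}

// Thin adaptor: the only class that knows about the legacy database class.
class LegacyQueryAdapter implements QueryInterface
{
    private LegacyDatabase $db;

    public function __construct(LegacyDatabase $db)
    {
        $this->db = $db;
    }

    public function tableExists(string $tableName): bool
    {
        return (bool)$this->db->tableExists($tableName);
    }

    public function insert(string $tableName, array $values): void
    {
        $this->db->insert($tableName, $values);
    }
}

// A consumer depends on the interface only, never on the framework class.
class StringValueStore
{
    private QueryInterface $query;

    public function __construct(QueryInterface $query)
    {
        $this->query = $query;
    }

    public function store(string $value): void
    {
        $this->query->insert('value_string', ['value' => $value]);
    }
}

$db = new LegacyDatabase();
$store = new StringValueStore(new LegacyQueryAdapter($db));
$store->store('Berlin');
```

Because StringValueStore only sees QueryInterface, it can be handed a different implementation (for another framework, or a test double) without being changed.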

On top of breaking the dependence on the whole of MediaWiki, this new interface also isolates consumers of Wikibase Database from several flaws in DatabaseBase.

First of all, DatabaseBase is bloated. It is extremely poorly segregated. The interfaces in Wikibase Database were carefully designed to avoid such a mistake. Secondly, the methods in DatabaseBase tend to inappropriately return error codes on failure, rather than throwing exceptions. Wikibase Database detects these failures and throws nice fine-grained exceptions with appropriate information. Lastly, the MediaWiki interface has inconsistent implementations. The MySQL and SQLite implementations behave differently in certain error conditions, which really ought not to be exposed to the users of this interface.
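As an illustration of the second point, here is a minimal sketch of turning a failure return value into a specific exception. The class names (ErrorCodeConnection, ThrowingInserter, InsertFailedException) are invented for this example and the stand-in connection is not DatabaseBase; the exceptions in Wikibase Database itself may be named and structured differently.

```php
<?php
declare(strict_types=1);

// Hypothetical fine-grained exception carrying the relevant context.
class InsertFailedException extends RuntimeException
{
    private string $tableName;

    public function __construct(string $tableName)
    {
        parent::__construct("Insert into table '$tableName' failed");
        $this->tableName = $tableName;
    }

    public function getTableName(): string
    {
        return $this->tableName;
    }
}

// Stand-in for a legacy connection that signals failure by returning false.
class ErrorCodeConnection
{
    private array $tables = ['value_string' => []];

    public function insert($table, array $row)
    {
        if (!isset($this->tables[$table])) {
            return false; // the failure is easy for callers to overlook
        }
        $this->tables[$table][] = $row;
        return true;
    }
}

// Wrapper that checks the return value once, so that callers do not have to.
class ThrowingInserter
{
    private ErrorCodeConnection $connection;

    public function __construct(ErrorCodeConnection $connection)
    {
        $this->connection = $connection;
    }

    public function insert(string $table, array $row): void
    {
        if ($this->connection->insert($table, $row) === false) {
            throw new InsertFailedException($table);
        }
    }
}

$inserter = new ThrowingInserter(new ErrorCodeConnection());
$inserter->insert('value_string', ['value' => 'Berlin']); // succeeds silently

$caught = null;
try {
    $inserter->insert('no_such_table', ['value' => 'Berlin']);
} catch (InsertFailedException $e) {
    $caught = $e;
}
```

The same wrapper is also a natural place to smooth over differences between backends, so that consumers see one behaviour regardless of the database in use.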

One lesson learned is that everything that can be separated from database interaction should be, if you want it to be testable. Initially I created a “MySQLTableBuilder” that has a createTable( TableDefinition $table ) method. Seems harmless enough, right? Let’s describe the responsibility of this class: build the table creation SQL based on the provided table definition and execute it. Note how this description contains an “and”, which is a good sign of a Single Responsibility Principle violation.

The result of this poorly thought-out design was that testing the meat of this class, the building of the table creation SQL, was not easily possible without database interaction. (Mocking was quite difficult in this scenario due to the nature of the MW DB abstraction layer.)

A better design, which seems quite obvious in retrospect, is to put the table creation SQL building code into its own class, i.e. MySQLTableSqlBuilder. This class implements a TableSqlBuilder interface, which is used as a strategy at the appropriate locations and has an implementation for each supported database type.
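A simplified sketch of that split is shown below. The names TableDefinition, TableSqlBuilder and MySQLTableSqlBuilder are taken from the text; FieldDefinition, the method getCreateTableSql(), the type map, and all constructor signatures are assumptions for the sake of the example, and the library’s own classes are richer.

```php
<?php
declare(strict_types=1);

// Hypothetical, stripped-down definition objects.
class FieldDefinition
{
    private string $name;
    private string $type;

    public function __construct(string $name, string $type)
    {
        $this->name = $name;
        $this->type = $type;
    }

    public function getName(): string { return $this->name; }
    public function getType(): string { return $this->type; }
}

class TableDefinition
{
    private string $name;
    private array $fields;

    /** @param FieldDefinition[] $fields */
    public function __construct(string $name, array $fields)
    {
        $this->name = $name;
        $this->fields = $fields;
    }

    public function getName(): string { return $this->name; }
    public function getFields(): array { return $this->fields; }
}

// Strategy interface: one implementation per supported database type.
interface TableSqlBuilder
{
    public function getCreateTableSql(TableDefinition $table): string;
}

class MySQLTableSqlBuilder implements TableSqlBuilder
{
    // Illustrative type mapping only.
    private const TYPE_MAP = ['int' => 'INT', 'string' => 'VARCHAR(255)', 'float' => 'DOUBLE'];

    public function getCreateTableSql(TableDefinition $table): string
    {
        // Names are assumed to be trusted here; a real builder must escape them.
        $columns = [];
        foreach ($table->getFields() as $field) {
            if (!isset(self::TYPE_MAP[$field->getType()])) {
                throw new InvalidArgumentException('Unknown field type: ' . $field->getType());
            }
            $columns[] = '`' . $field->getName() . '` ' . self::TYPE_MAP[$field->getType()];
        }
        return 'CREATE TABLE `' . $table->getName() . '` (' . implode(', ', $columns) . ')';
    }
}

// The SQL building can now be tested as a plain string, with no database involved.
$table = new TableDefinition('value_string', [
    new FieldDefinition('id', 'int'),
    new FieldDefinition('value', 'string'),
]);
$sql = (new MySQLTableSqlBuilder())->getCreateTableSql($table);
```

Executing the resulting SQL is then the job of a separate class that takes a TableSqlBuilder as a collaborator, which leaves each class with a single responsibility.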

Another lesson learned is that it really pays to have clearly distinct boundaries within the package itself. MySQL-specific and MediaWiki-specific code each live in their own namespace. Schema modification code and “regular db interaction” code are also kept apart. The documentation clearly describes the responsibility of each part of the component, which other parts it depends on, and whether it contains interfaces that can be used from outside the component. This makes understanding the component a lot easier, increases separability, and makes it easier to use correctly.

We just released version 0.1 of this component, as it has reached sufficient stability for third-party usage. I personally hope that it will be of use to people building on top of MediaWiki or, as is indeed possible, to those doing PHP development in a different context.

For download and usage instructions, technical documentation, and release notes, see the README file on GitHub.
