Wikibase DataModel released!

I’m happy to announce the 0.6 release of Wikibase DataModel. This is the first real release of this component.

DataModel?

Wikibase is the software behind Wikidata.org. At its core, this software is about describing entities. Entities are collections of claims, which can have qualifiers, references and values of various types. How this all fits together is described in the DataModel document written by Markus and Denny at the start of the project. The Wikibase DataModel component contains (PHP) domain objects representing entities and their various parts, as well as the associated domain logic.
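To make that nesting a little more concrete, here is a rough sketch of the shape of such domain objects. It is written in Python purely for illustration and is not the PHP API of Wikibase DataModel: the class names (`PropertyValue`, `Reference`, `Claim`, `Entity`), the `claims_for_property` helper and all IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical: "values of various types" reduced to a few Python primitives.
Value = Union[str, int, float]

@dataclass
class PropertyValue:
    """A property paired with a value; used for main values, qualifiers and references."""
    property_id: str
    value: Value

@dataclass
class Reference:
    """A source backing a claim, described by its own property-value pairs."""
    parts: List[PropertyValue] = field(default_factory=list)

@dataclass
class Claim:
    main: PropertyValue
    qualifiers: List[PropertyValue] = field(default_factory=list)
    references: List[Reference] = field(default_factory=list)

@dataclass
class Entity:
    entity_id: str
    claims: List[Claim] = field(default_factory=list)

    def claims_for_property(self, property_id: str) -> List[Claim]:
        """A small piece of domain logic: the claims whose main value uses property_id."""
        return [c for c in self.claims if c.main.property_id == property_id]

# All IDs and values below are made up for illustration.
entity = Entity("Q1", [
    Claim(PropertyValue("P10", "Q2")),
    Claim(
        PropertyValue("P20", 42),
        qualifiers=[PropertyValue("P30", "2013")],
        references=[Reference([PropertyValue("P40", "Q3")])],
    ),
])
print(len(entity.claims_for_property("P20")))  # 1
```

Keeping logic such as `claims_for_property` next to the data it operates on is what "domain objects plus domain logic" amounts to in practice.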

A little history

At the start of the Wikidata project, very little thought was put into creating architectural boundaries and components. As a result, we ended up with all code in one big repo. In it, little distinction was made between application and domain logic, and dependencies went from the domain layer to the data access and presentation layers. This is much like the situation in MediaWiki, though luckily on a smaller scale. It is also known as the “big ball of mud” pattern. Needless to say, this causes many problems, one of which is inseparability. (Inseparability is a code smell meaning that one cannot extract the code doing one particular task from the larger body of code it resides in.)

It quickly became clear there were a lot of places where the DataModel related code would be very useful. People writing tools that deal with the data, such as (web API) bot authors and people analyzing dumps, very often need a way to represent the entities they are dealing with. It’s silly for every user to re-implement this functionality, which ends up being done in many different ways, most of which will be incorrect to at least some extent. And there is not only the initial cost of creating this; one also needs to keep it updated as the Wikibase software evolves. Take for instance deserializing the web API format of the Wikibase software. Each of these authors needs to write some code to interpret this format. If, on the other hand, they can use deserializers from a shared library, they can immediately get to the actual work they want to do, and have their code isolated from changes to the web API format. (Of course they can achieve this isolation without a shared library as well, though I have yet to see a bot author bother to do this.) So, lots of motivation to make this code reusable.
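The isolation argument can be illustrated with a small sketch: a single deserializer is the only place that knows the serialization format, and everything else works with domain objects. This is Python, not the DataModel deserializers, and the JSON layout below is a simplified, assumed format rather than the actual Wikibase web API output; `Item`, `Claim` and `deserialize_item` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Claim:
    property_id: str
    value: str

@dataclass
class Item:
    item_id: str
    labels: Dict[str, str]
    claims: List[Claim]

def deserialize_item(data: dict) -> Item:
    """Turn the assumed, simplified serialization into domain objects.

    Only this function knows the wire format; if the format changes,
    this is the one place that needs updating, not every caller.
    """
    labels = {lang: entry["value"] for lang, entry in data.get("labels", {}).items()}
    claims = [
        Claim(property_id, claim["value"])
        for property_id, claim_list in data.get("claims", {}).items()
        for claim in claim_list
    ]
    return Item(data["id"], labels, claims)

# Made-up input in the assumed format.
raw = '{"id": "Q1", "labels": {"en": {"language": "en", "value": "Example"}}, "claims": {"P2": [{"value": "Q3"}]}}'
item = deserialize_item(json.loads(raw))
print(item.labels["en"], [c.property_id for c in item.claims])  # Example ['P2']
```

A bot author using a shared deserializer like this writes against `Item` and `Claim` only, which is exactly the isolation described above.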

For this and many other architectural reasons (which I will not describe here, as that would make this post way TL;DR), the Wikibase DataModel code was the first to be pulled out of the big Wikibase blob. The first step was to move all the obviously DataModel related, application independent classes into their own component. This happened more than half a year ago. Initially this component was still bound to a lot of things it really should not depend on, such as MediaWiki and Wikibase Repo/Client. This was gradually solved over time, and several months ago a point was reached where it was usable as a library, meaning that it no longer depended on any framework and could thus be reused in non-Wikibase contexts. Some lingering bad dependencies remained though, so certain code could not be used outside a Wikibase context, as it would otherwise cause a fatal error. Only last week the last remaining technical debt in this department got resolved.

This release

So this release is the first one really ready for third party usage \o/

It is also the first one that got tagged at a very specific point. From now on, the intention is to make releases at sensible points, shortly after changes and additions are made. Those changes will now also be documented in the release notes, so users of the library know what to watch out for when upgrading.

Another nice improvement included in this release is full PSR-0 compliance. All Wikibase classes used to be directly in the Wikibase namespace. All classes of the DataModel are now in Wikibase\DataModel or sub-namespaces thereof. Aliases are in place so that existing code referring to the old class names keeps working.
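For readers unfamiliar with PSR-0, the core of it is a mechanical mapping from a fully qualified class name to a file path, which is what lets an autoloader find classes such as those under Wikibase\DataModel. The sketch below expresses that mapping in Python, using "/" in place of PHP's DIRECTORY_SEPARATOR, together with a legacy-name lookup. The alias table is hypothetical; it only shows the idea, and the real alias list (presumably built with something like PHP's class_alias(), though that is an assumption) lives in the library itself.

```python
# Hypothetical legacy-name table; the real list of aliases lives in the PHP library.
LEGACY_ALIASES = {
    "Wikibase\\Item": "Wikibase\\DataModel\\Entity\\Item",
}

def resolve_alias(class_name: str) -> str:
    """Map an old class name to its new namespaced name, if an alias exists."""
    name = class_name.lstrip("\\")
    return LEGACY_ALIASES.get(name, name)

def psr0_path(class_name: str) -> str:
    """PSR-0: namespace separators become directories, and so do
    underscores in the class name (but not in the namespace); append .php."""
    name = class_name.lstrip("\\")
    namespace, sep, short_name = name.rpartition("\\")
    directory = namespace.replace("\\", "/") + "/" if sep else ""
    return directory + short_name.replace("_", "/") + ".php"

print(psr0_path(resolve_alias("Wikibase\\Item")))
# Wikibase/DataModel/Entity/Item.php
```

Because the path follows directly from the name, moving the classes into Wikibase\DataModel and its sub-namespaces is what makes standard PSR-0 autoloaders able to find them.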

Yet another first is that this release now depends only on stable releases of its dependencies. Those dependencies include the Diff library, as well as several of the DataValue components. As a user of the library you do not really have to care much about this, since the dependencies are managed via Composer. The package name is wikibase/data-model, which you can include in your project. And when developing on DataModel itself, you can run composer update in its root directory (this is done automatically if you run the PHPUnit tests).

I hope people building on top of our awesome platform are helped out a lot by this library. I’ll be putting more work into making other application independent parts of our code reusable, so expect more such goodness in the near future. And as always, contributions are definitely welcome. You can create pull requests against the repo on GitHub or submit changes to Gerrit, against the mediawiki-extensions-WikibaseDataModel project. If you have any questions, feel free to poke me or one of my colleagues on IRC at #wikidata on Freenode.
