Wikibase DataModel Services

I’m happy to announce the immediate availability of a new Wikibase library: Wikibase DataModel Services, which I’ll refer to as DMS in this blog post.

Rationale behind the library

The main motivation for introducing this new library is to reduce technical debt and draw more solid architectural boundaries in the Wikibase code.

Near the beginning of the Wikibase project we had all our code in the Wikibase.git repository. This could be subdivided into 3 big parts: Wikibase Repository (MediaWiki extension), Wikibase Client (MediaWiki extension) and Wikibase Lib. The latter got introduced to hold code needed by both Wikibase Client and Wikibase Repository. While on the surface that was a reasonable idea, a lot of the things that could go wrong with it unfortunately did. No real boundaries were created in the code, resulting in a tightly coupled blob that even included global state and circular dependencies on Wikibase Client and Wikibase Repository. There also was no contract for what should or can go into Wikibase Lib. Things needed by only one of the MediaWiki extensions were added, on the grounds of them being potentially useful elsewhere, while similar code was left in one of the extensions.

About a year into the project we realized that things were going in the wrong direction. Even though there was some disagreement on the extent of the problem, a consensus to get rid of Wikibase Lib emerged over time. The first big step there was the extraction of Wikibase DataModel, which resulted in the moved-out code improving in quality significantly more than what was left behind. This set the stage for the creation of additional components, such as Wikibase DataModel Serialization, which has replaced a chunk of Wikibase Lib code. Unfortunately, a lot of the code in Wikibase Lib does not belong to a whole as cohesive as the serialization component. This has led to such code not being moved out and, worse, to many new classes being added to Wikibase Lib over time, all but negating the extraction work.

This is not that hard to understand when considering the dilemma faced by people introducing new code needed by both Wikibase Client and Wikibase Repository. Either it needs to go into Wikibase Lib, or into a component further down the dependency graph such as Wikibase DataModel. I’ve certainly added several classes to Wikibase DataModel since that seemed to be the best place to put them at the time, polluting this component with DataModel-related services that nevertheless are not needed for defining the data model itself. A third place where code inappropriately found its home is the MediaWiki extensions themselves. Code dealing with domain logic or otherwise being application independent is best kept devoid of framework binding.

All of this taken together suggested the creation of a new library sitting between Wikibase DataModel and the MediaWiki extensions. A library to collect the functionality for which no cohesive component can be created, or for which such creation is not justified. One concern comes up right away: won’t this new general library become a dumping ground and ball of mud like Wikibase Lib? To avoid any such fate, we carefully defined the requirements code must satisfy before it is allowed into the component. Such code must…

  • Use Wikibase DataModel
  • Not belong to a more specific component (such as the Serialization component)
  • Not introduce heavy dependencies to this component (database, framework, big libraries, etc.)
  • Not be presentation code
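To make these criteria a bit more concrete, here is a minimal sketch of the kind of code they are meant to admit: a small service interface plus a service that depends on nothing but that interface and the data model objects. It is written in Python purely for illustration and is not the library's actual PHP API. ItemId, Item, InMemoryEntityLookup and LabelLookup are hypothetical names invented for this example, and the EntityLookup protocol is only a loose analogue of the interface of the same name.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class ItemId:
    """Hypothetical stand-in for a data model entity ID value object."""
    serialization: str


@dataclass
class Item:
    """Hypothetical stand-in for a data model entity with labels per language."""
    id: ItemId
    labels: Dict[str, str] = field(default_factory=dict)


class EntityLookup(Protocol):
    """Small service interface: callers do not know where entities come from."""

    def get_entity(self, entity_id: ItemId) -> Optional[Item]:
        ...


class InMemoryEntityLookup:
    """Implementation without database or framework, handy in tests."""

    def __init__(self, *items: Item) -> None:
        self._items = {item.id: item for item in items}

    def get_entity(self, entity_id: ItemId) -> Optional[Item]:
        return self._items.get(entity_id)


class LabelLookup:
    """Service that depends only on the interface and the data model."""

    def __init__(self, lookup: EntityLookup) -> None:
        self._lookup = lookup

    def get_label(self, entity_id: ItemId, language_code: str) -> Optional[str]:
        item = self._lookup.get_entity(entity_id)
        if item is None:
            return None
        return item.labels.get(language_code)
```

A database-backed or framework-bound implementation of the same interface would live in the application, while the interface and the services written against it stay free of heavy dependencies.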

Current state: Wikibase DataModel Services 1.1

This library is of particular interest to third parties as it makes code that used to be bound to the Wikibase Client and Wikibase Repository MediaWiki extensions reusable. The following list contains the newly available classes and interfaces:

  • DataValue\ValuesFinder
  • Entity\EntityPrefetcher
  • Entity\EntityRedirectResolvingDecorator
  • Entity\NullEntityPrefetcher
  • EntityId\EntityIdFormatter
  • EntityId\EntityIdLabelFormatter
  • EntityId\EscapingEntityIdFormatter
  • EntityId\PlainEntityIdFormatter
  • EntityId\SuffixEntityIdParser
  • Lookup\EntityLookup
  • Lookup\EntityRedirectLookup
  • Lookup\EntityRetrievingDataTypeLookup
  • Lookup\EntityRetrievingTermLookup
  • Lookup\LabelDescriptionLookup
  • Lookup\LanguageLabelDescriptionLookup
  • Lookup\TermLookup
  • Statement\StatementGuidValidator
  • Term\PropertyLabelResolver
  • Term\TermBuffer

These have all been moved from Wikibase Lib. DMS also contains code that used to be in Wikibase DataModel, and got moved out in version 4.0:

  • Entity diffing and patching functionality in Services\Diff
  • EntityIdParser and basic implementations in Services\EntityId
  • ItemLookup, PropertyLookup and PropertyDataTypeLookup interfaces
  • Statement GUID parser and generators in Services\Statement
  • ByPropertyIdGrouper
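Several of the names above, such as EscapingEntityIdFormatter and EntityRedirectResolvingDecorator, suggest small services that wrap another service behind the same interface. The following Python sketch shows that composition style under stated assumptions: it is an illustration of the pattern, not the library's PHP code, and the class names, method names and constructor arguments here are hypothetical.

```python
import html
from typing import Callable, Protocol


class EntityIdFormatter(Protocol):
    """Hypothetical analogue of a formatter interface: ID in, string out."""

    def format_entity_id(self, entity_id: str) -> str:
        ...


class PlainFormatter:
    """Returns the ID serialization unchanged."""

    def format_entity_id(self, entity_id: str) -> str:
        return entity_id


class EscapingFormatter:
    """Decorator: delegates to an inner formatter, then escapes its output."""

    def __init__(
        self, inner: EntityIdFormatter, escape: Callable[[str], str]
    ) -> None:
        self._inner = inner
        self._escape = escape

    def format_entity_id(self, entity_id: str) -> str:
        return self._escape(self._inner.format_entity_id(entity_id))


# Compose the two: any formatter can be made HTML-safe without changing it.
html_safe_formatter = EscapingFormatter(PlainFormatter(), html.escape)
```

Because the escaping function is passed in, the same decorator could serve other output formats, and the wrapped formatter needs no knowledge of where its output ends up.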

Into the future!

The 20 or so classes and interfaces we moved from Wikibase Lib are just the start. We’re taking an incremental approach to moving over the code to avoid needing to maintain two copies and synchronize changes from Wikibase Lib to their moved copies. So new releases with additional functionality can be expected in the near future.

As with the other Wikibase libraries, contributions are very welcome, and can be made without much setup work or the need to understand our entire codebase. You can find instructions on how to install the library and run its tests in its README file. Changes relevant to users of the library are always mentioned in the RELEASE-NOTES.

3 thoughts on “Wikibase DataModel Services”

  1. Congrats, it’s pretty neat to have such components available for other software 🙂

    Talking about reuse: I’d love a Lua interface in Wikidata to this 🙂 This would factor out stuff like formatters and would make Wikidata more consistent, and would allow the community to do more tests, formatting and so on …
