Generic Entity handling code

In this blog post I outline my thinking on sharing code that deals with different types of Entities in your domain. We’ll cover what Entities are, code reuse strategies, pitfalls such as Shotgun Surgery and Anemic Domain Models and finally Bounded Contexts.

Why I wrote this post

I work at Wikimedia Deutschland, where, among other things, we work on software called Wikibase, which is what powers the Wikidata project. We have a dedicated team for this software, called the Wikidata team, which I am not part of. As an outsider who is somewhat familiar with the Wikibase codebase, I came across a writeup of a perceived problem in this codebase and a pair of possible solutions. I disagree about what the actual problem is, and consequently about the solutions as well. Since explaining why takes a lot of general (non-Wikibase specific) explanation, I decided to write a blog post.

DDD Entities

Let’s start with defining what an Entity is. Entities are a tactical Domain Driven Design pattern. They are things that can change over time and are compared by identity rather than by value, unlike Value Objects, which do not have an identity.

Wikibase has objects which are conceptually such Entities, though they are implemented … oddly from a DDD perspective. In the excerpt quoted in the next section, the word entity is, confusingly, not referring to the DDD concept. Instead, the Wikibase domain has a concept called Entity, implemented by an abstract class with the same name, which specific types of Entities, e.g. Item and Property, derive from. Those are the objects that are conceptually DDD Entities, yet diverge from what a DDD Entity looks like.

Entities normally contain domain logic (the lack of this is called an Anemic Domain Model), and don’t have setters. The lack of setters does not mean they are immutable, it’s just that actions are performed through methods in the domain language (see Ubiquitous Language). For instance “confirmBooked()” and “cancel()” instead of “setStatus()”.
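To make the difference with setters concrete, here is a minimal sketch of such an Entity in Python. The Donation class, its fields and its status rules are assumptions made up for illustration; they are not taken from Wikibase or from our Fundraising software.

```python
from enum import Enum


class Status(Enum):
    NEW = "new"
    BOOKED = "booked"
    CANCELLED = "cancelled"


class Donation:
    """Hypothetical Entity: compared by identity, changed via domain methods."""

    def __init__(self, donation_id: int, amount_in_cents: int) -> None:
        # Required fields are simply constructor arguments.
        if amount_in_cents <= 0:
            raise ValueError("A donation needs a positive amount")
        self._id = donation_id
        self._amount_in_cents = amount_in_cents
        self._status = Status.NEW

    @property
    def status(self) -> Status:
        return self._status

    def confirm_booked(self) -> None:
        # The (assumed) domain rule lives in the Entity, not in its callers.
        if self._status is not Status.NEW:
            raise RuntimeError("Only new donations can be booked")
        self._status = Status.BOOKED

    def cancel(self) -> None:
        # Assumed rule: a booked donation can no longer be cancelled.
        if self._status is Status.BOOKED:
            raise RuntimeError("Booked donations cannot be cancelled")
        self._status = Status.CANCELLED

    def __eq__(self, other: object) -> bool:
        # Same ID means same Entity, regardless of its current state.
        return isinstance(other, Donation) and other._id == self._id

    def __hash__(self) -> int:
        return hash(self._id)
```

There is no set_status() here, so the object is mutable but can only move between states in ways the domain allows.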

The perceived problem

What follows is an excerpt from a document aimed at figuring out how to best construct entities in Wikibase:

Some entity types have required fields:

  • Properties require a data type
  • Lexemes require a language and a lexical category (both ItemIds)
  • Forms require a grammatical feature (an ItemId)

The ID field is required by all entities. This is less problematic however, since the ID can be constructed and treated the same way for all kinds of entities. Furthermore, the ID can never change, while other required fields could be modified by an edit (even a property’s data type can be changed using a maintenance script).

The fact that Properties require the data type ID to be provided to the constructor is problematic in the current code, as evidenced in EditEntity::clearEntity:

…as well as in EditEntity::modifyEntity():

Such special case handling will not be possible for entity types defined in extensions.

It is very natural for (DDD) Entities to have required fields. That is not a problem in itself. For examples you can look at our Fundraising software.

So what is the problem really?

Generic vs specific entity handling code

Normally when you have a (DDD) Entity, say a Donation, you also have dedicated code that deals with those Donation objects. If you have another entity, say MembershipApplication, you will have other code that deals with it.

If the code handling Donation and the code handling MembershipApplication is very similar, there might be an opportunity to share things via composition. One should be very careful to not do this for things that happen to be the same but are conceptually different, and might thus change differently in the future. It’s very easy to add a lot of complexity and coupling by extracting small bits of what would otherwise be two sets of simple and easy to maintain code. This is a topic worthy of its own blog post, and indeed, I might publish one titled The Fallacy of DRY in the near future.

This sharing via composition is not really visible “from outside” of the involved services, except for the code that constructs them. If you have a DonationRepository and a MembershipRepository interface, they will look the same whether or not their implementations share something. Repositories might share cross-cutting concerns such as logging. Logging is not something you want to do in your repository implementations themselves, but you can easily create simple logging decorators. A LoggingDonationRepository and LoggingMembershipRepository could both depend on the same Logger class (or interface more likely), and thus be sharing code via composition. In the end, the DonationRepository still just deals with Donation objects, the MembershipRepository still just deals with Membership objects, and both remain completely decoupled from each other.
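A logging decorator of this kind could look roughly like the Python sketch below. The store method, the in-memory implementation and the log message are assumptions for illustration; only the repository and decorator names come from the text above.

```python
import logging
from typing import Protocol


class Donation:
    def __init__(self, donation_id: int) -> None:
        self.id = donation_id


class DonationRepository(Protocol):
    def store(self, donation: Donation) -> None: ...


class InMemoryDonationRepository:
    """Stand-in for a real persistence implementation."""

    def __init__(self) -> None:
        self.donations: dict[int, Donation] = {}

    def store(self, donation: Donation) -> None:
        self.donations[donation.id] = donation


class LoggingDonationRepository:
    """Decorator: same interface, adds logging, delegates the real work."""

    def __init__(self, inner: DonationRepository, logger: logging.Logger) -> None:
        self._inner = inner
        self._logger = logger

    def store(self, donation: Donation) -> None:
        try:
            self._inner.store(donation)
        except Exception:
            # Log the failure, then let the caller deal with the exception.
            self._logger.exception("Could not store donation %s", donation.id)
            raise
```

A LoggingMembershipRepository would be written the same way around a MembershipRepository and take the same logger. The two decorators share the logger, but neither repository interface knows about the other.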

In the Wikibase codebase there is an attempt at code reuse by having services that can deal with all types of Entities. Phrased like this it sounds nice. From the perspective of the user of the service, things are great at first glance. Thing is, those services then are forced to actually deal with all types of Entities, which almost guarantees greater complexity than having dedicated services that focus on a single entity.

If your Donation and MembershipApplication entities both implement Foobarable and you have a FoobarExecution service that operates on Foobarable instances, that is entirely fine. Things get dodgy when your Entities don’t always share the things your service needs, and the service ends up getting instances of object, or perhaps some minimal EntityInterface type.
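The unproblematic case can be sketched as follows, assuming a Foobarable contract with a single do_foobar method; the return values are placeholders invented for the example.

```python
from typing import Iterable, Protocol


class Foobarable(Protocol):
    def do_foobar(self) -> str: ...


class Donation:
    def do_foobar(self) -> str:
        return "donation foobarred"


class MembershipApplication:
    def do_foobar(self) -> str:
        return "membership application foobarred"


class FoobarExecution:
    """Depends only on the Foobarable contract, never on concrete Entities."""

    def execute(self, items: Iterable[Foobarable]) -> list[str]:
        return [item.do_foobar() for item in items]
```

The service never asks what kind of Entity it received, so adding another Foobarable type requires no change to it.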

In those cases the service can add a bunch of “if has method doFoobar, call it with these arguments” logic. Or perhaps you’re checking against an interface instead of a method, though this is by and large the same. This approach leads to Shotgun Surgery. It is particularly bad if you have a general service. If your service is really only about the doFoobar method, then at least you won’t need to poke at it when a new Entity is added to the system that has nothing to do with the Foobar concept. If the service on the other hand needs to fully save something or send an email with a summary of the data, each new Entity type will force you to change your service.
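A sketch of what such a general service tends to look like, with a hypothetical SummaryMailer and made-up fields on the two Entities:

```python
class Donation:
    def __init__(self, amount_in_cents: int) -> None:
        self.amount_in_cents = amount_in_cents


class MembershipApplication:
    def __init__(self, applicant_name: str) -> None:
        self.applicant_name = applicant_name


class SummaryMailer:
    """Hypothetical general service that must know every Entity type."""

    def summarize(self, entity: object) -> str:
        # One branch per Entity type: adding a type means editing this method,
        # and every other general service written the same way.
        if isinstance(entity, Donation):
            return f"Donation of {entity.amount_in_cents} cents"
        if isinstance(entity, MembershipApplication):
            return f"Membership application by {entity.applicant_name}"
        raise TypeError(f"Unsupported entity type: {type(entity).__name__}")
```

Any Entity type the service was not written for ends up in the TypeError branch until someone edits the service, which is exactly the Shotgun Surgery problem.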

The “if doFoobar exists” approach does not work if you want plugins to your system to be able to use your generic services with their own types of Entities. To enable that, and avoid the Shotgun Surgery, your general service can delegate to specific ones. For instance, you can have an EntityRepository service with a save method that takes an EntityInterface. In its constructor it would take an array of specific repositories, e.g. a DonationRepository and a MembershipRepository. In its save method it would loop through these specific repositories and somehow determine which one to use. Perhaps they would have a canHandle method that takes an EntityInterface, or perhaps EntityInterface has a getType method that returns a string that is also used as keys in the array of specific repositories. Once the right one is found, the EntityInterface instance is handed over to its save method.
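The getType variant of this delegation could be sketched like this in Python. The type strings, the in-memory repository and the error handling are assumptions for the example.

```python
from typing import Protocol


class EntityInterface(Protocol):
    def get_type(self) -> str: ...


class SpecificRepository(Protocol):
    def save(self, entity: EntityInterface) -> None: ...


class EntityRepository:
    """General repository that only dispatches; it knows no Entity type itself."""

    def __init__(self, repositories: dict[str, SpecificRepository]) -> None:
        # Keys are the strings returned by get_type().
        self._repositories = repositories

    def save(self, entity: EntityInterface) -> None:
        entity_type = entity.get_type()
        if entity_type not in self._repositories:
            raise LookupError(f"No repository registered for type {entity_type!r}")
        self._repositories[entity_type].save(entity)


class Donation:
    def __init__(self, donation_id: int) -> None:
        self.id = donation_id

    def get_type(self) -> str:
        return "donation"


class InMemoryDonationRepository:
    def __init__(self) -> None:
        self.saved: list[EntityInterface] = []

    def save(self, entity: EntityInterface) -> None:
        # A real implementation would verify it actually received a Donation.
        self.saved.append(entity)
```

A plugin can support its own Entity type by adding one more entry to the dictionary passed to the constructor, without touching EntityRepository itself.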

This delegation approach is sane enough from an OO perspective. It does however involve specific repositories, which raises the question of why you are creating a general one in the first place. If there is no compelling reason to create the general one, just stick to specific ones and save yourself all this unneeded complexity and vagueness.

In Wikibase there is a generic web API endpoint for creating new entities. The users provide a pile of information via JSON or a bunch of parameters, which includes the type of Entity they are trying to create. If you have this type of functionality, you are forced to deal with this in some way, and probably want to go with the delegation approach. To me having such an API endpoint is very questionable, with dedicated endpoints being the simpler solution for everyone involved.

To wrap this up: dedicated entity handling code is much simpler than generic code, making it easier to write, use, understand and modify. Code reuse, where warranted, is possible via composition inside of implementations without changing the interfaces of services. Generic entity handling code is almost always a bad choice.

On top of what I already outlined, there is another big issue you can run into when creating generic entity handling code as is done in Wikibase.

Bounded Contexts

Bounded Contexts are a key strategic concept from Domain Driven Design. They are key in the sense that if you don’t apply them in your project, you cannot effectively apply tactical patterns such as Entities and Value Objects, and are not really doing DDD at all.

“Strategy without tactics is the slowest route to victory. Tactics without strategy are the noise before defeat.” — Sun Tzu

Bounded Contexts allow you to segregate your domain models, ideally having a Bounded Context per subdomain. A detailed explanation and motivation of this pattern is out of scope for this post, but suffice it to say that Bounded Contexts allow for simplification and thus make it easier to write and maintain code. For more information, I can recommend Domain-Driven Design Distilled.

In case of Wikibase there are likely a dozen or so relevant subdomains. While I did not do the analysis to create a comprehensive picture of which subdomains there are, which types they have, and which Bounded Contexts would make sense, a few easily stand out.

There is the so-called core Wikibase software, which was created for Wikidata.org, and deals with structured data for Wikipedia. It has two types of Entities (both in the Wikibase and in the DDD sense): Item and Property. Then there is (planned) functionality for Wiktionary, which will be structured dictionary data, and for Wikimedia Commons, which will be structured media data. These are two separate subdomains, and thus each deserves its own Bounded Context. This means having no code dependencies and no conceptual dependencies on each other or on the existing Big Ball of Mud type “Bounded Context” in the Wikibase core software.

Conclusion

When standard approaches are followed, Entities can easily have required fields and optional fields. Creating generic code that deals with different types of entities is very suspect and can easily lead to great complexity and brittle code, as seen in Wikibase. It is also a road to not separating concepts properly, which is particularly bad when crossing subdomain boundaries.
