My adventures with autoloading in PHP

This post is aimed at developers and offers some insights on how to cleanly autoload classes in PHP.

For a long, long time, I’ve been one of those MediaWiki developers who just added classes and file names to $wgAutoloadClasses, without really knowing how this made class loading work magically. Sure, I was aware there is this autoloader thing somewhere inside of MediaWiki core, that somehow registers this information with a native PHP function. I however had no clue about how this is typically done, what the upsides and downsides of the MediaWiki approach are, and what alternatives exist.

Luckily for me, my shell of ignorance in this area was cracked a little over a year ago, as I started working on standalone code. Writing such standalone code was a much bigger awakening from ignorance, though that is definitely a topic of its own. By creating code independent of MediaWiki, I was forced to find a replacement for this $wgAutoloadClasses thing I had become so used to. I went through several steps in how to do this before arriving at my current approach. I’ll briefly outline them and explain what made me switch to a better alternative for each.

The first step was having something quite equivalent to $wgAutoloadClasses: an associative array mapping fully qualified class names to file paths. This is quite a maintenance hassle. Every time you add, move or remove a file, the array needs to be updated. As you can imagine, this is merge conflict hell.
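A minimal sketch of such a class map could look like the following. The class names and paths are made up for illustration and are not taken from any real library.

```php
<?php
// Hypothetical class names and file paths, for illustration only.
$classMap = [
    'Example\Foo' => __DIR__ . '/src/Foo.php',
    'Example\Bar\Baz' => __DIR__ . '/src/Bar/Baz.php',
];

// Register a callback that PHP invokes whenever an unknown class is used.
spl_autoload_register(function (string $className) use ($classMap): void {
    // Only load classes that are explicitly listed;
    // anything else is left to other registered autoloaders.
    if (isset($classMap[$className])) {
        require_once $classMap[$className];
    }
});
```

Every new, moved or removed class means touching the array, which is exactly where the merge conflicts come from.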

The second approach decreased the maintenance hassle by introducing the assumption that the file name could be derived from the class name. The nice thing about this is that it enforces consistency between file and class names, and also forces all classes that are referenced from other files to reside in their own file. Both these things were in the coding conventions of my projects already, though without these rules actually being enforced, nothing would scream at you when you violated them, and thus inconsistency gradually grew over time. One still needed to maintain a list though, and merge conflict hell remained.

And then I learned of PSR-0. If you are not yet familiar with this standard as a PHP developer, read it; you are not being serious with yourself if you don’t know it by now. It is the now generally accepted standard on how to autoload PHP code. My second step was already much in the direction of PSR-0, and the main difference in the third one was that I replaced the lists of classes with some small boilerplate code that mapped all classes in a namespace to their PSR-0 derived file paths. This is an example of such code from Diff version 0.9:

There are some things to note about this particular example.

First of all, I had something like this in all libraries I maintained. If you look closely here, you will see that I forgot to update the name of a variable, which still mentions “QueryEngine”, which I’m now assuming is the component where I copied this code from when I started WikibaseDatabase.

Secondly, it has a bunch of code specific to the namespace used. If there were multiple top level namespaces used, this code would become quite a bit more verbose. Another factor adding to the verbosity is the exclusion of the test namespace.

A final issue with this code is that it will load any class in the Wikibase\Database namespace from the location this code computes. If it is not there, you get an error. This happened quite a few times when I misspelled class names in test cases, which then caused a “file not found” error coming from this autoloading code, leaving it up to you to find where things actually went wrong. On top of that annoyance, the code as is essentially prevents any extension to this component from defining classes in Wikibase\Database, as WikibaseDatabase would try to load them from its own src/ folder.
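To make the points above concrete, here is a rough sketch of this kind of namespace-specific PSR-0 boilerplate. It is an illustration and not the original listing: the helper name psr0PathFor is hypothetical, and the is_file check is an addition that sidesteps the “file not found” problem by letting other autoloaders have a go when the file is missing.

```php
<?php
// Hypothetical helper: map a class in one namespace to its PSR-0 path under $baseDir.
// Returns null when the class is outside that namespace.
function psr0PathFor(string $className, string $namespace, string $baseDir): ?string {
    $className = ltrim($className, '\\');

    if (strpos($className, $namespace . '\\') !== 0) {
        return null;
    }

    $lastSeparator = strrpos($className, '\\');
    $namespacePart = substr($className, 0, $lastSeparator);
    $classPart = substr($className, $lastSeparator + 1);

    // PSR-0: namespace separators become directory separators, and
    // underscores in the class name (not the namespace) do as well.
    return $baseDir . '/'
        . str_replace('\\', '/', $namespacePart) . '/'
        . str_replace('_', '/', $classPart) . '.php';
}

spl_autoload_register(function (string $className): void {
    $path = psr0PathFor($className, 'Wikibase\Database', __DIR__ . '/src');

    // Only require files that exist, so misspelled or externally
    // defined classes fall through to the next autoloader.
    if ($path !== null && is_file($path)) {
        require_once $path;
    }
});
```

Even with that check in place, the namespace and base directory are still hard-coded, so the copy-and-paste problem remains.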

For some time I’ve been thinking of just ditching these piles of ugly boilerplate autoloading code, and just requiring the consumer of my packages to autoload things as specified in the package definition (i.e. composer.json). I got this idea after seeing a lot of different libraries do this exact same thing.

What made me hesitant to do so is that binding to any tool, like binding to any framework, is not particularly nice. I want my libraries to be usable without forcing people to also use some particular tool, in this case Composer. Though of course they are not really forced to use Composer. They are only forced to make use of the information in the package definition, which happens to be in Composer format. This can easily be interpreted without using Composer though, as it is a simple JSON format. Alternatively, the consumer can simply copy the information into whatever autoloading system they use. This does mean the autoloading system used by the consumer needs to support the kind of loading the library uses. For instance, MediaWiki would be in trouble for most libraries, as it only has classmap autoloading.
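As a sketch of how little is needed to interpret such a package definition by hand, the following reads the psr-0 section of a composer.json and resolves class names against it. The JSON content and the function names readPsr0Map and resolveClass are hypothetical, and the sketch assumes each psr-0 entry maps a prefix to a single non-empty directory string; it also ignores the underscore rule of PSR-0.

```php
<?php
// Hypothetical package definition, inlined here to keep the example self-contained.
$composerJson = <<<'JSON'
{
    "autoload": {
        "psr-0": {
            "DataValues\\": "src"
        }
    }
}
JSON;

// Extract the prefix => directory map from the package definition.
function readPsr0Map(string $json): array {
    $definition = json_decode($json, true);
    return $definition['autoload']['psr-0'] ?? [];
}

// Resolve a class name to a file path, or null if no prefix matches.
function resolveClass(string $className, array $psr0Map, string $packageDir): ?string {
    foreach ($psr0Map as $prefix => $dir) {
        if (strpos($className, $prefix) === 0) {
            return $packageDir . '/' . trim($dir, '/') . '/'
                . str_replace('\\', '/', $className) . '.php';
        }
    }
    return null;
}

$psr0Map = readPsr0Map($composerJson);

spl_autoload_register(function (string $className) use ($psr0Map): void {
    $path = resolveClass($className, $psr0Map, __DIR__);
    if ($path !== null && is_file($path)) {
        require_once $path;
    }
});
```

In a real setup the JSON would come from the library's own composer.json via file_get_contents, and $packageDir would be the directory that file lives in.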

A second concern that kept me from doing this is that some kind of action further than including the library entry point is needed to have its autoloading work. Since I mostly work with MediaWiki extensions, and have several libraries that pretend to be MediaWiki extensions, this is problematic, due to the lack of modern autoloading facilities in MediaWiki. Recently I added support for installing extensions via Composer into MediaWiki, though this does require a change in workflow for users. So most people are still using the old workflow, in which the Composer support is of no help to them. I got over this concern by creating some libraries where I simply decided to not care about MediaWiki, and decided that if MediaWiki extensions wanted to use them, then they’d have to deal with any restrictions forced upon them by MediaWiki, rather than my libraries themselves.

Here you have an example of the autoload section of the first release of DataValues Geo, which is using my now preferred autoloading approach:

As you can see, there still is an entry point file, which will always be included. It however no longer contains an autoloader, it just defines a version constant.

The psr-0 mapping causes the autoloader to look for classes in the DataValues namespace in the src/DataValues folder. One great thing is that it will not cause any issues if a class in this namespace is not in there, as indeed is the case for most such classes. And if you are wondering about performance: Composer has an optimization feature, and there are quite a few psr-0 loaders out there that do as well.

And that is really all you need. No need for lists that are maintenance nightmares and cause pointless merge conflicts. No need for inventing your own boilerplate code that you copy and paste between projects. All the work can be handled by great tools provided by the PHP community, which you can use in your project today, if you make the small effort to specify in your package definition where your classes can be found.

I’m looking forward to Composer supporting a “namespace prefix” option as defined by the proposed PSR-4 standard. When using the PSR-0 loading support of Composer and you have all classes of your library in Foo\Bar\Baz\Bah, then you have to put ClassName into src/Foo/Bar/Baz/Bah/ClassName.php, while you likely want this to be mapped onto src/ directly. In such cases I’m currently still using a custom autoloader in the entry point file, or using the automatic classmap support Composer provides.
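A custom autoloader of that kind, in the spirit of the proposed namespace prefix mapping, might be sketched as follows. The function name prefixPathFor is hypothetical, and the Foo\Bar\Baz\Bah prefix is the placeholder used above.

```php
<?php
// Hypothetical helper: strip a namespace prefix and map the remainder onto $baseDir.
// Returns null when the class does not live under the prefix.
function prefixPathFor(string $className, string $prefix, string $baseDir): ?string {
    // Normalise the prefix so that it always ends in exactly one separator.
    $prefix = trim($prefix, '\\') . '\\';

    if (strncmp($className, $prefix, strlen($prefix)) !== 0) {
        return null;
    }

    // Only the part after the prefix contributes to the file path.
    $relativeClass = substr($className, strlen($prefix));

    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relativeClass) . '.php';
}

spl_autoload_register(function (string $className): void {
    $path = prefixPathFor($className, 'Foo\Bar\Baz\Bah', __DIR__ . '/src');
    if ($path !== null && is_file($path)) {
        require_once $path;
    }
});
```

With this mapping, Foo\Bar\Baz\Bah\ClassName is looked up in src/ClassName.php rather than in src/Foo/Bar/Baz/Bah/ClassName.php.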

To end this blog post, a tip to PHPUnit users: create a bootstrapping file with something like the following:

This allows people to download your project, change into its directory, type “phpunit”, and have the tests run. This also makes setting up a Continuous Integration system one step more trivial, and prevents you from having headaches if your code uses a classmap that needs to be regenerated after adding classes.
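A bootstrap file along these lines might look like the sketch below. It assumes the project is installed with Composer, which generates vendor/autoload.php; the file location tests/bootstrap.php and the helper name findAutoloader are hypothetical. PHPUnit can be pointed at such a file through the bootstrap attribute of its XML configuration file.

```php
<?php
// tests/bootstrap.php (hypothetical location)

// Return the first existing file from a list of candidates, or null if none exists.
function findAutoloader(array $candidates): ?string {
    foreach ($candidates as $candidate) {
        if (is_file($candidate)) {
            return $candidate;
        }
    }
    return null;
}

$autoloader = findAutoloader([
    // The project was checked out on its own and "composer install" was run in it.
    __DIR__ . '/../vendor/autoload.php',
    // The project was installed as a dependency under vendor/<vendor>/<package>.
    __DIR__ . '/../../../autoload.php',
]);

if ($autoloader !== null) {
    require_once $autoloader;
} else {
    // Assumes a CLI context, which is where PHPUnit runs.
    fwrite(STDERR, "No Composer autoloader found; run 'composer install' first.\n");
}
```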
