Introduction to Iterators and Generators in PHP

In this post I demonstrate an effective way to create iterators and generators in PHP and provide an example of a scenario in which using them makes sense.

Generators have been around since PHP 5.5, and iterators have been around since the Planck epoch. Even so, a lot of PHP developers do not know how to use them well and cannot recognize situations in which they are helpful. In this blog post I share insights I have gained over the years, that when sharing, always got an interested response from colleague developers. The post goes beyond the basics, provides a real world example, and includes a few tips and tricks. To not leave out those unfamiliar with Iterators the post starts with the “What are Iterators” section, which you can safely skip if you can already answer that question.

What are Iterators

PHP has an Iterator interface that you can implement to represent a collection. You can loop over an instance of an Iterator just like you can loop over an array:

Why would you bother implementing an Iterator subclass rather than just using an array? Let’s look at an example.

Imagine you have a directory with a bunch of text files. One of the files contains an ASCII NyanCat (~=[,,_,,]:3). It is the task of our code to find which file the NyanCat is hiding in.

We can get all the files by doing a glob( $path . '*.txt' ) and we can get the contents for a file with a file_get_contents. We could just have a foreach going over the glob result that does the file_get_contents. Luckily we realize this would violate separation of concerns and make the “does this file contain NyanCat” logic hard to test since it will be bound to the filesystem access code. Hence we create a function that gets the contents of the files, and ones with our logic in it:

While this approach is decoupled, a big drawback is that now we need to fetch the contents of all files and keep all of that in memory before we even start executing any of our logic. If NyanCat is hiding in the first file, we’ll have fetched the contents of all others for nothing. We can avoid this by using an Iterator, as they can fetch their values on demand: they are lazy.

Our TextFileIterator gives us a nice place to put all the filesystem code, while to the outside just looking like a collection of texts. The function housing our logic, findTextWithNyanCat, does not know that the text comes from the filesystem. This means that if you decide to get texts from the database, you could just create a new DatabaseTextBlobIterator and pass it to the logic function without making any changes to the latter. Similarly, when testing the logic function, you can give it an ArrayIterator.

I wrote more about basic Iterator functionality in Lazy iterators in PHP and Python and Some fun with iterators. I also blogged about a library that provides some (Wikidata specific) iterators and a CLI tool build around an Iterator. For more on how generators work, see the off-site post Generators in PHP.

PHP’s collection type hierarchy

Let’s start by looking at PHP’s type hierarchy for collections as of PHP 7.1. These are the core types that I think are most important:

  •  iterable
    • array
    • Traversable
      • Iterator
        • Generator
      • IteratorAggregate

At the very top we have iterable, the supertype of both array and Traversable. If you are not familiar with this type or are using a version of PHP older than 7.1, don’t worry, we don’t need it for the rest of this blog post.

Iterator is the subtype of Traversable, and the same goes for IteratorAggregate. The standard library iterator_ functions such as iterator_to_array all take a Traversable. This is important since it means you can give them an IteratorAggregate, even though it is not an Iterator. Later on in this post we’ll get back to what exactly an IteratorAggregate is and why it is useful.

Finally we have Generator, which is a subtype of Iterator. That means all functions that accept an Iterator can be given a Generator, and, by extension, that you can use generators in combination with the Iterator classes in the Standard PHP Library such as LimitIterator and CachingIterator.

IteratorAggregate + Generator = <3

Generators are a nice and easy way to create iterators. Often you’ll only loop over them once, and not have any problem. However beware that generators create iterators that are not rewindable, which means that if you loop over them more than once, you’ll get an exception.

Imagine the scenario where you pass in a generator to a service that accepts an instance of Traversable:

The service class in which doStuff resides does not know it is getting a Generator, it just knows it is getting a Traversable. When working on this class, it is entirely reasonable to iterate though $things a second time.

This blows up if the provided $things is a Generator, because generators are non-rewindable. Note that it does not matter how you iterate through the value. Calling iterator_to_array with $things has the exact same result as using it in a foreach loop. Most, if not all, generators I have written, do not use resources or state that inherently prevents them from being rewindable. So the double-iteration issue can be unexpected and seemingly silly.

There is a simple and easy way to get around it though. This is where IteratorAggregate comes in. Classes implementing IteratorAggregate must implement the getIterator() method, which returns a Traversable. Creating one of these is extremely trivial:

If you call getIterator, you’ll get a Generator instance, just like you’d expect. However, normally you never call this method. Instead you use the IteratorAggregate just as if it was an Iterator, by passing it to code that expects a Traversable. (This is also why usually you want to accept Traversable and not just Iterator.) We can now call our service that loops over the $things twice without any problem:

By using IteratorAggregate we did not just solve the non-rewindable problem, we also found a good way to share our code. Sometimes it makes sense to use the code of a Generator in multiple classes, and sometimes it makes sense to have dedicated tests for the Generator. In both cases having a dedicated class and file to put it in is very helpful, and a lot nicer than exposing the generator via some public static function.

For cases where it does not make sense to share a Generator and you want to keep it entirely private, you might need to deal with the non-rewindable problem. For those cases you can use my Rewindable Generator library, which allows making your generators rewindable by wrapping their creation function:

A real-world example

A few months ago I refactored some code part of the Wikimedia Deutschland fundraising codebase. This code gets the filesystem paths of email templates by looking in a set of specified directories.

This code made the class bound to the filesystem, which made it hard to test. In fact, this code was not tested. Furthermore, this code irked me, since I like code to be on the functional side. The array_walk mutates its by-reference variable and the assignment at the end of the loop mutates the return variable.

This was refactored using the awesome IteratorAggregate + Generator combo:

Much easier to read/understand code, no state mutation whatsoever, good separation of concerns, easier testing and reusability of this collection building code elsewhere.

See also: Use cases for PHP generators (off-site post).

Tips and Tricks

Generators can yield key value pairs:

You can use yield in PHPUnit data providers.

You can yield from an iterable.

Thanks for Leszek Manicki and Jan Dittrich for reviewing this blog post.

Leave a Reply