nrposner

Affordances in Library API Design

I was reading David Nicholson's 'Domain-driven software design is a good idea, still' a couple of days ago, and it struck a few thoughts loose in my head.

By way of summary, though you should read the original, Nicholson introduces the concept of 'Domain-Driven Design' (DDD), coined by Eric Evans in his 2003 book of the same name, though the general concept is older.

In Nicholson's rendition, DDD involves software engineers and domain experts collaborating, not just to produce software, but to produce a pidgin language that enables closer collaboration and the instantiation of domain-level concepts into the code.

... you have a feeling that he is almost an anthropologist, going into this unfamiliar tribe of electrical engineers so he can learn their culture. I think this is a familiar feeling for anyone who has tried to translate some real-world domain into software, even if it’s part of a culture they feel like they belong to. Second, you really get a feel for his process. If you have ever gone through the process of designing software for some real-world domain, I bet the story really resonates with you.

The post goes on to explore why 'write code in terms of your domain' isn't quite as simple or straightforward as it seems—it can't quite be boiled down to existing frameworks like eXtreme Programming/Agile, nor is it intrinsically tied to styles like Object-Oriented Programming (OOP), though Evans was writing and providing examples in that style.

There's a lot to like here, but I also want to dig a little deeper into the discussion of library API design, brought up in this article and in some of the articles it references.


The section that stuck in my head addresses an objection from Alex Chabot-Leclerc, who, in giving feedback to Nicholson, asked why DDD isn't just a constrained flavor of OOP. I can see how one would come by that perception based on Nicholson's description, since 'write code in terms of your domain' does rhyme with rendering real-world concepts and objects as distinct classes with associated methods. But I think there's more to it here, even beyond Nicholson's response.

First, let me note that I'm quite firmly in the anti-OOP camp as far as programming styles go; I've never met an abstract base class I didn't want to refactor out of existence, I totally bounced off the style in my college Python coursework, and to this day I avoid using classes like the plague. I'm more partial to Data-Oriented Design (DOD), which conceives of programming chiefly as the process of performing transformations upon data (with the odd side effect thrown in as a treat).

Nevertheless, there are reasons OOP caught on as a style, and I think part of the reason is that it provides easier 'hooks' into something like DDD. If you're starting from a procedural or imperative style, you're much more likely to search for computer-shaped solutions to a given problem (which bleeds into the APIs you design), whereas if you're starting from an OOP style, you start by asking 'what objects?' which naturally should lead you to a conversation with your domain experts.

That is also, however, one of the chief strikes against OOP. When you start by trying to render the structure of your program itself in the likeness of some real world set of concepts—whether it's CS 101 pap about a Garage class which is a container for a Vehicle class that superclasses Cars, Trucks and Motorbikes which all share a 'wheels' property, or something more sophisticated—you've already placed constraints on your concrete implementation which lead you away from the most straightforward and efficient solution to your problem. The 'computer-shaped' solution is bound to be more efficient, even in seemingly trivial cases, than the 'anthropomorphized' solution—that's the natural outcome of your code running on a computer!
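To make the contrast concrete, here's a toy sketch of that CS-101 hierarchy next to a data-oriented equivalent. The code is hypothetical, mine rather than from any source mentioned here, and deliberately trivial:

```python
# The 'anthropomorphized' hierarchy: real-world concepts rendered as classes.
class Vehicle:
    def __init__(self, wheels):
        self.wheels = wheels

class Car(Vehicle):
    def __init__(self):
        super().__init__(wheels=4)

class Motorbike(Vehicle):
    def __init__(self):
        super().__init__(wheels=2)

garage = [Car(), Motorbike(), Car()]
total_oop = sum(v.wheels for v in garage)

# The data-oriented equivalent: the same information as plain data,
# transformed directly, with no hierarchy to navigate.
wheel_counts = [4, 2, 4]
total_dod = sum(wheel_counts)

assert total_oop == total_dod == 10
```

Both compute the same thing, but the second version carries no structural commitments at all, which is the point: the hierarchy is a modeling choice, not a necessity of the problem.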

If it sounds like I'm setting up a dichotomy here, it's only to show, over the next couple of sections, how much this isn't a dichotomy—it would be extremely silly to conclude that performant code is somehow opposed to intuitively-structured code. Isn't that the whole benefit of encapsulation and implementation hiding?

Rather, I think this should guide our thinking with regard to designing APIs. Nicholson links Ben Hoyt's article on Designing Pythonic Library APIs. The advice there applies for library design in general, not just in Python, but I think it leaves some more transcendental principles on the table.

As a classic example of bad API design, Hoyt brings up the urllib module from the Python standard library:

import urllib.request

manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
manager.add_password(None, 'https://httpbin.org/', 'usr', 'pwd')
handler = urllib.request.HTTPBasicAuthHandler(manager)
opener = urllib.request.build_opener(handler)
response = opener.open('https://httpbin.org/basic-auth/usr/pwd')

In contrast, here's the much more user-friendly requests library:

import requests

response = requests.get(
    'https://httpbin.org/basic-auth/usr/pwd',
    auth=('usr', 'pwd')
)

Shorter? Check. Easier to read and understand? Double check. And given my personal tastes, I'm happy not constructing multiple layers of classes just to make an http request.

But I think there's a more transcendental principle here, which we can elucidate by reference to affordances.

This term comes to us from psychology (where it was coined by James J. Gibson) by way of UX design (where Don Norman popularized it), and refers to the perceived uses of an object, derived from its shape. From physical objects like hammers and teapots to the visual language of web interfaces, affordances imply desired behavior. When visible affordances and actual functionality line up, a tool is easy to use; when they're at cross-purposes, use feels perverse.

It should be possible to extend the same notion to library APIs, though perhaps not immediately obvious how.

Take the example above: what do the affordances of the urllib and requests libraries look like, and which is better? As far as the example above shows, urllib has more classes and methods—is that necessarily worse, or better? How do we evaluate them in these terms?

I might be regurgitating settled concepts from actual UX/psychology, or perhaps I'm totally butchering it, but it seems natural to me to think about these things in terms of semantically complete operations, which is a term I just made up.

A semantically complete operation is a transformation upon data that begins in a conceptually coherent form, and ends in a conceptually coherent form.

Instead of just straightforwardly explaining what this means and why I think it's relevant, let's go on another tangent. You've read my blog, don't pretend you didn't expect this.

Semantically Complete Operations

By Way of a Lengthy Digression

There's an anecdote (I can't for the life of me find the original source, so let's treat this as a fable) about an MIT professor's introductory CS course. On the first day, before any lectures, the professor gave the students a pop quiz. The contents of the quiz themselves were very easy, just basic algebra and logic, but the professor could nevertheless use it to predict, with great confidence, which students would be at the top of the class and which would be at the bottom by the end of the semester.

Not with the grade on the quiz, mind—it was so easy that everyone would get perfect scores—but with the student's time to completion. The students who finished quickly would reliably be at the top of the class by the final, and vice versa, even before a single lecture had been given.

The reasoning here is that speed of completion served as a proxy for the student's fluency and comfort with the underlying concepts. Even though the slow finishers could eventually get to the end by taking things one step at a time, advancing requires more than that. If you put me on ice skates, I could eventually get from one end of the rink to the other, moving carefully and taking things one step at a time. But I'd never be able to skate competently so long as I was doing that: the individual steps need to become second nature so that they can be chained together into more sophisticated movements, and those must become second nature so that they can chain into even more complex motions. A champion figure skater like Alyssa Liu does not expend mental effort or space on the little things. They've long since been sublimated into instinct, and she can instead focus on far more complex movements.

The same applies to all aspects of human endeavor, from athletics to the arts to mathematics, and the same goes for programming.

To take another example: a student in their first week of linear algebra has to expend mental effort performing a dot product. Once that's practiced and understood, the same student can perform it as part of an orthogonal projection, and they will not only be able to perform the operation fluently, but they'll understand what the operation means and how the individual components contribute to it. The orthogonal projection has become a semantically complete operation, which can be performed in a single mental 'pass', the same way we can take a step without thinking about it.
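As a sketch of that example (pure Python, function names my own), the practiced 'single step' composes directly into the larger one:

```python
# The 'single step': a dot product, fluent and unremarkable once practiced.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# The composed step: projecting u onto the line spanned by v.
# Once dot() is second nature, this reads as one coherent operation.
def project(u, v):
    """Project u onto v (assumes v is nonzero)."""
    scale = dot(u, v) / dot(v, v)
    return [scale * b for b in v]

# Projecting (3, 4) onto the x-axis keeps only the x-component.
assert project([3, 4], [1, 0]) == [3.0, 0.0]
```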

This digression is necessary to highlight that what constitutes a single step is not a property of the domain alone, but of the domain and the user together. DDD is necessarily a collaborative process performed with other, concrete people with expertise in that domain. It does not exist in abstract isolation from actual practitioners.

Back to it

What makes the requests library's API superior to urllib in the previous example?

manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
manager.add_password(None, 'https://httpbin.org/', 'usr', 'pwd')
handler = urllib.request.HTTPBasicAuthHandler(manager)
opener = urllib.request.build_opener(handler)
response = opener.open('https://httpbin.org/basic-auth/usr/pwd')

response = requests.get(
    'https://httpbin.org/basic-auth/usr/pwd', 
    auth=('usr', 'pwd')
)

It's not just that requests is shorter and more readable, though those are certainly advantages. It performs a coherent, semantically complete operation with a single method and set of inputs. We start from a form of data (url, auth) which the user understands as being conceptually unified, and end with another form of data that the user also understands as being conceptually unified (an http response), via a coherent transformation (.get()).

In contrast, the urllib API forces the user to fragment their conceptual landscape and contort it around implementation details unique to this API.

The vast, vast majority of users who want to get an HTTP response do not internally model this process the way urllib does, with a Manager that manually configures authentication which is passed to a Handler which is passed to an Opener that must then .open() the response. Perhaps a network engineer thinks in these terms, and maybe a well-intentioned network engineer designed this API to provide their users with lots of control and flexibility. But experts always overestimate how much fine control their median user actually wants or needs, and this comes at a cost to usability.

Also, at what point exactly did a signal actually leave my computer and come back with external information? Which verb corresponds to that step? Is it .open()? Because that implies something is already on my computer and I'm just accessing it. Where is the verb???

In contrast to all this confusion, there's requests.get(). I have a request. I get a response. Done.

For the typical user, 'pass in a url and maybe some auth then get the response' is the level they think about http. It's what they need in the moment. Even if they do have deeper understanding of networking, thinking in those more granular terms is only slowing them down. .get() is a single command corresponding to what feels, intuitively, like a single step with a coherent beginning and end point, which can itself be composed into more complex operations. Forcing them to decompose that unified step into granular operations (and in so doing, navigate more of your specific vocabulary and classes) is like forcing them to breathe manually in the middle of a song, or move each joint separately when dancing.
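To make this concrete, here's roughly what it looks like to wrap urllib's granular steps behind one semantically complete verb. This is a hedged sketch of the idea, not the actual implementation of requests; the function name `get` is my own:

```python
import urllib.request

def get(url, auth=None):
    """Fetch a URL, optionally with basic auth, in one conceptual step."""
    if auth is None:
        return urllib.request.urlopen(url)
    user, password = auth
    # All the Manager/Handler/Opener choreography lives here, out of sight.
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, url, user, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    opener = urllib.request.build_opener(handler)
    return opener.open(url)

# Usage now mirrors the requests example:
# response = get('https://httpbin.org/basic-auth/usr/pwd', auth=('usr', 'pwd'))
```

The granular steps haven't disappeared; they've just been chained into a unit whose beginning and end match the user's mental model.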


If we accept affordances as a valid construct in API design, and accept that some notion of semantic completeness/coherence is useful for bridging these concepts, what does this imply?

Going back to Hoyt's article, he's right to stress the Pythonic habit of defining classes as Nouns and methods as .verbs(). I think this quite neatly matches the intentions a user can come to your API with. They Have a Thing, and they want to Do Something with it—often to make another Thing. I find that this is also quite friendly to notions of Data-Oriented Design, perhaps more so than to OOP. There's data, transformations, and occasionally side effects (printing the data to terminal, etc). Transformations produce more data, and transformations are valid on some data but not others.

Each Noun is a possible entry point into your API, and each constructor verb tells your user what kind of data can be used to enter it. Each verb from that point may mutate the Noun, produce a side effect, or create a new Noun with its own options.

Though just because it can, doesn't mean it should. The mutating-method pattern has a place, but is that place inside your API? If your users are in a position to be consistently modifying individual fields of a Noun, then sure. Otherwise, let them define all fields in up-front construction and leave it be. A given method could mix these operations, transforming data and then displaying it, but should you design it that way? Maybe yes, maybe no. It's going to vary by design needs.
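As a small, hypothetical illustration of that choice, here is up-front construction contrasted with a non-mutating update style (all names are mine):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Request:
    url: str
    timeout: float = 30.0

# Up-front construction: all fields defined at once, then left alone.
req = Request(url='https://example.com', timeout=5.0)

# If users genuinely do need per-field updates, a non-mutating style keeps
# each change a visible transformation rather than hidden mutation.
faster = replace(req, timeout=1.0)

assert req.timeout == 5.0 and faster.timeout == 1.0
```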

The art of good API design, pursued through a process of Domain-Driven Design, lies in developing an understanding of which Nouns and verbs the domain-expert users will actually need: what data do they generally start with (informing what kinds of constructors you need to provide), and what do they need to do with it once it's constructed?

None of this precludes performant 'computer-shaped' implementations within your Nouns and verbs. It just means that those efficient, potentially verbose or obscure operations should be chained together and wrapped into a top-level API that operates in semantically complete steps.
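A minimal sketch of that wrapping, with entirely hypothetical names: the internals can be as 'computer-shaped' as you like, so long as the public verb is one coherent step.

```python
def _normalize(values):
    # Internal step: scale values to sum to one.
    total = sum(values)
    return [v / total for v in values]

def _accumulate(values):
    # Internal step: running totals.
    out, acc = [], 0.0
    for v in values:
        acc += v
        out.append(acc)
    return out

def cumulative_share(values):
    """Public verb: raw numbers in, cumulative shares out, in one step."""
    return _accumulate(_normalize(values))

assert cumulative_share([1, 1, 2]) == [0.25, 0.5, 1.0]
```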

There may also be some virtue in making those sub-steps part of your public API, for the users who do actually want to read through your concrete implementation to get lower-level control. But those are power-user tools. Your median user will benefit from clear and distinct Nouns whose utility and purpose are clear based on their domain knowledge, and associated methods that directly relate to the typical actions they want to perform using that Noun. This is a good place to actually use nested namespaces. The top-level, user-facing API should be re-exported from the project root, while the internals, even if you do make them public, should remain at the module level, since your power user knows what they're doing and this avoids cluttering up the root namespace.
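In Python terms, the layout described above might look something like this (all names hypothetical):

```
mylib/
  __init__.py      re-exports the user-facing Nouns and verbs
  internals.py     lower-level tools stay at module level

# In mylib/__init__.py:
#     from .internals import fetch    # root namespace stays clean: mylib.fetch(...)
#
# Power users can still reach mylib.internals directly for fine control,
# without those names cluttering the root.
```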

The Inevitable Discussion of Typing

I think this manner of thinking naturally complements strong, pervasive typing.

This may be a matter of personal bias; I am a partisan of strong, static type systems, and consider Python's dynamic-ish typing to be one of its greatest weaknesses and the reason I avoid using it when possible. Virtually no function is a valid transformation over all data types, and any sensible API is going to be quite opinionated about what kind of data it accepts and returns.

Dynamic typing doesn't change this, it just passes the buck to manual, verbose type-checking and hand-written documentation. If I had a nickel for every time a refactor failed because the documentation was wrong and a function was actually accepting both scalar and vector types instead of just vector types in production, I'd have two nickels. Which isn't a lot, but it shouldn't have happened once.
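A small sketch of the point, with hypothetical names: an explicit annotation makes the scalar-versus-vector contract checkable by a tool, instead of documented and hoped for.

```python
from typing import Sequence

def magnitude(v: Sequence[float]) -> float:
    """Accepts vectors only; a bare scalar is a type error, not a surprise."""
    return sum(x * x for x in v) ** 0.5

assert magnitude([3.0, 4.0]) == 5.0
# A static checker (mypy, pyright) flags magnitude(3.0) before production does.
```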

I conceive of an API's Nouns as a walled garden: each constructor is a gateway into that walled garden, by way of primitive types, and from there, its associated verbs can transport you around that walled garden (to other custom Nouns) or out of it (transforming custom data types back into primitives).

Crucially, you don't actually need all that many user-facing Nouns: let's not replicate urllib's mistakes by forcing the user to memorize an entire forest of new Nouns just to accomplish basic tasks. Unless your API is truly wide-ranging, a small handful will be enough for 99% of user needs. Consider pandas or polars, which users primarily manipulate on the levels of dataframes and series. Construct a series or dataframe from primitive types to enter the walled garden, do your useful, nicely-typed work, and then export final results (or use them as inputs for another API).
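A toy version of the walled garden, with all names hypothetical: one gateway in from primitives, typed verbs inside, one gateway back out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Series:
    values: tuple

    @classmethod
    def from_list(cls, xs):
        # Gateway in: primitives -> Noun.
        return cls(tuple(float(x) for x in xs))

    def scale(self, k):
        # Verb within the garden: Noun -> Noun.
        return Series(tuple(k * v for v in self.values))

    def to_list(self):
        # Gateway out: Noun -> primitives.
        return list(self.values)

s = Series.from_list([1, 2, 3]).scale(2)
assert s.to_list() == [2.0, 4.0, 6.0]
```

One constructor and a handful of verbs cover most needs, and everything between the gateways stays well-typed.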


Let's bring this all back together.

We start from a discussion of Domain-Driven Design, in which software engineers and domain experts (sometimes, but not always, the same people) collaborate to produce a mutual pidgin language that allows the domain to be modeled in code. We claim that this language can be evaluated in terms of affordances, with types and the transformations performed on them being understood as useful, purposeful items whose shape informs their intended use. This has implications for the design of library APIs, which should prioritize developing public Nouns and verbs that lend themselves to their users' anticipated needs by ensuring that the purpose of each Noun and verb is legible and semantically complete.

I think this all mostly holds together. I worry that the concept of a 'semantically complete step' is still underdefined here, and I've probably just blundered about replicating existing concepts I'm unaware of. But that's what blog posts are for.