Much has been, and will be, written on how to document computer programs. What follows is my own feeble contribution to the mix.
Maybe we’ve all had the experience of coming back to some bag-nasty forgotten code we wrote six months ago. Maybe you cursed the anonymous author until you realized who it was. Maybe you flailed about in flabbergasted disbelief. Maybe you hung your head in shame.
Maybe you even resolved to do something about it, but never felt like you had time?
There’s a case to be made that new graduates won’t actually have had this experience just yet. Why? Well, they work on a project for a semester and then it’s history. Next semester; next project. Very little motivation to look back at old code.
Later, after you’ve stayed with the same project for a while, you eventually hit the milestone of identifying with the proverbial “maintainer”.
In point of fact, documentation is an investment. Like our retirement account, it works best if we:
Can we leverage that investment to reap rewards of speed, quality, reliability, and maintainability? Yes, we can.
The sections that follow are organized from low-level to high-level. You’ll find low-level documentation easier to work on in very small bites. The higher-level stuff will require greater collaboration with team-members but will probably have a proportionally higher pay-off for progress made.
Most languages these days have some sort of convention for API documentation extracted directly from the source code. That’s a good place for it, because you don’t have to switch tools and anyway the commentary is right there next to the code. Let’s call this the doc-comment, regardless of how it’s physically done in your favorite programming contraption.
The effort to compose documentation at this level as you write the code is rather akin to “developing proof and program in tandem,” as Dijkstra advocated. In my eyes, the “correctness criteria” for a subroutine are given in the doc-comment, and the body code is then required to behave as advertised.
I’m asking that you write a quality doc-comment in synchrony with writing any new function. (You’ll often find it’s easier and more informative than writing a unit test.) For modifications to existing functions, it’s sufficient to leave things even slightly better than you found them: maybe improve the aspect of the doc-comment that most pertains to the change you made. And for bug-hunts? Well, if you learn something along the way, this is a good time to write it down.
You could raise the familiar objection that code and comments drift out of sync: the “trust only the code” school. That gets it backwards: the code is the code. The doc-comment is the requirement, stated in human language, which the code is meant to implement. The requirement is law; the code is behavior. If they disagree, the fault most probably lies in the code. Still, you should trace your requirements and testing before making changes willy-nilly.
It should probably have a structure like this:
| Element | Examples/Cases/Details |
|---|---|
| Plain English Command Form | Tell what the function does from the caller’s perspective. “Fetch a pail of water.” or “Return twice the argument.” |
| Optional Hyperlink | If the function implements something published in an academic paper or book, link to a reference. If it’s an xUnit test case related to a bug or feature request, link to the corresponding ticket in your tracking system. |
| Signature Clarifications | Semantics of parameters and return values/exceptions. |
| Contract | Preconditions, postconditions, or performance guarantees. |
| Fine Print | Any further nuance that must be kept in mind when calling this function. (e.g. Remember to later call XYZ.) |
| Policy * | If the function implements a defined policy, then explain what that policy is, and why that policy applies (or link to a canonical reference). |
| Background | Any help to understand why this function exists or why you might want to call it. (e.g. it was found to reflect a common pattern of control and factored out.) |
Where it makes sense, doctest-style examples may be interspersed into the above structure to illustrate points.
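To make this concrete, here is a minimal sketch of such a doc-comment in Python. The function and all its details are invented for illustration; note how the doctest lines double as examples and as executable checks.

```python
def scaled(value, factor=2):
    """Return `value` multiplied by `factor`.

    Signature: `value` is the number to scale; `factor` defaults to 2.
    Contract: no side effects; runs in constant time.
    Fine print: with the default factor, this simply doubles the argument.

    >>> scaled(21)
    42
    >>> scaled(1.5, factor=4)
    6.0
    """
    return value * factor
```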
Two things are important to leave out:
Now you have a clear specification. The corresponding code should follow naturally.
The real meaning of “policy” in the doc-comment is for when the behavior of a function is specifically dictated by some system requirement. For instance, if you’re never supposed to have more than six people in the fitting room because the corporate loss prevention department said so in circular A-12-26-B, then somewhere between [event:let-me-in] and [event:right-this-way] is the number 6, and that number implements policy so the surrounding function (or module, etc…) had best link back to circular A-12-26-B!
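Sketched in Python, with the circular ID borrowed from the example above (all names here are hypothetical):

```python
# Policy: loss-prevention circular A-12-26-B caps fitting-room occupancy.
FITTING_ROOM_LIMIT = 6  # the number 6 implements circular A-12-26-B

def may_admit(current_occupancy):
    """Decide whether one more shopper may enter the fitting room.

    Policy: implements corporate loss-prevention circular A-12-26-B
    (maximum of six occupants).
    """
    return current_occupancy < FITTING_ROOM_LIMIT
```

The point is the link back to the circular: a maintainer who finds a bare `6` has no way to know it is load-bearing.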
Policy should be distinguished from the arbitrary or design-driven selection of mechanism.
For instance: Suppose you’re writing a function to assign players to teams. Do you:
Sometimes it’s up to you. The caller doesn’t care; she just needs teams and the mechanism is up to you. In this case, the callee decides mechanism. As such, you’ve got some thinking to do: Which way of allocating teams is best in this context? Why? Is there a source or standard behind your reasoning? Document it – but where? If you consider this a private implementation detail subject to possible change, then put it as a normal comment, and explain as much. If you consider this to be a done decision which clients may potentially rely on, then it’s part of the postcondition. And if there’s a local rule that says you have to do this a certain way, then mention that rule in the policy section of the doc-comment.
There are going to be cases where the caller expects you to specifically follow one of these strategies. If so, then that expectation forms a part of the API and choice of mechanism comes from the caller, so you are implementing mechanism (not policy). Again, include the choice as part of the postcondition in the contract. Incidentally, you may also want to name the function in some way that reflects the expectation.
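For example, if random assignment is the mechanism the caller relies on, both the name and the contract can say so. A hypothetical sketch:

```python
import random

def assign_teams_randomly(players, n_teams):
    """Split `players` into `n_teams` teams.

    Postcondition (part of the contract): assignment order is uniformly
    random, and team sizes differ by at most one.
    """
    shuffled = list(players)
    random.shuffle(shuffled)
    # Deal the shuffled players round-robin into n_teams lists.
    return [shuffled[i::n_teams] for i in range(n_teams)]
```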
Assuming your doc-comment is written decently, remaining code comments have basically two purposes:
Clarify what (some section of code) is doing, explained at a higher level.
In this case, stop and see if you can extract a function. It doesn’t matter how small, or whether it’s only called once ever. You’ve identified a meaningful unit of abstract computation. Stop worrying and learn to love functional abstraction. It’s our primary means to conquer complexity.
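A sketch of the transformation: the comment that explained a stanza becomes the name of a small function (example invented).

```python
# Before: an inline stanza that needed a "what" comment.
#     # Normalize the name for comparison.
#     key = name.strip().lower().replace("-", " ")

# After: the comment becomes a function name, and the doc-comment
# carries the explanation.
def normalized(name):
    """Return `name` trimmed, lower-cased, and with hyphens as spaces."""
    return name.strip().lower().replace("-", " ")
```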
Clarify why (some section of code) is designed this particular way, either in contrast to something (apparently) simpler or just as pure educational content.
First, decide if the added complexity is worthwhile. If the answer is yes, then GREAT! Leave this in. Don’t go overboard, but this is the kind of commentary that makes maintainers happy.
There is no how. The how is in the code, and the code only – although you may be confusing this with case #1, above.
Things like these should go in a class’s docstring:
Additionally, just below the docstring you should probably make note of any internal class-invariants which your implementation relies on for correctness, because these are not always obvious just from looking at method body code.
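For instance, in a hypothetical class, the invariant comment sits just below the docstring:

```python
class Stack:
    """A last-in-first-out container of items.

    Supports push and pop in constant time.
    """
    # Class invariant: self._items holds the contents bottom-to-top,
    # so the top of the stack is always the last element of the list.

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()
```

Callers never need the invariant; maintainers editing the method bodies absolutely do.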
In modern functional programming, you hear of “type-classes”. This concept is more like an abstract data type, but expressed in terms of the operations any concrete manifestation of that type must support. You can think of them as roughly equivalent to what Java calls an “interface”. To that end, they appear in this section.
Most of the documentation needs will be similar between the abstraction and the concrete implementation, because in any event the caller treats either as an abstraction. However, these overt abstractions have an additional audience: those who are implementing a derived concrete class.
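In Python, that second audience can be addressed directly in an abstract base class. A hypothetical sketch:

```python
from abc import ABC, abstractmethod

class Ordering(ABC):
    """Abstraction: things that can be compared for order.

    For callers: use `compare_to` to rank two instances.
    For implementers: a concrete subclass must return a negative, zero,
    or positive integer, and the result must be consistent with equality.
    """

    @abstractmethod
    def compare_to(self, other):
        """Return <0, 0, or >0 as self sorts before, with, or after other."""
```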
Therefore, please include (if not obvious from context):
The key to understanding a module’s role in a larger system is to understand the API it presents, for that API is an abstraction boundary.
A module’s API is best understood as a collection of abstract data-types (ADTs) and the operations between them, which collectively the module implements and exports. (A concise, coherent, orthogonal set of such abstractions reflects a high-quality API.)
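One way to reflect that view in a module docstring (module name and all exports invented for illustration):

```python
"""inventory: track stock levels across warehouses.

Exported abstractions (the module's ADTs):
    Item      -- a stock-keeping unit with a quantity on hand.
    Warehouse -- a keyed collection of Items.

Exported operations:
    transfer(item, source, dest, qty) -- move stock between Warehouses.
    audit(warehouse)                  -- yield Items below reorder level.
"""
```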
Therefore, module-level documentation should:
I can’t resist throwing in a little advice:
The seminal primary source on modularity is probably On the Criteria To Be Used in Decomposing Systems into Modules (David L. Parnas, 1971, Carnegie Mellon University, Pittsburgh, PA). It’s a fine read as an academic paper, not too long, and the points are every bit as valid 50 years later despite the archaic notation in the examples. (One comment: In 1971 there were still people worried about the overhead of a subroutine call, and Parnas spent a paragraph or two addressing that fear. In modern systems, that overhead is the least of your worries.)
TL;DR: divide module boundaries according to separation of concerns, not jobs, and you’ll have a flexible, sustainable system. Violations of this principle will prevent you from having nice things. This concept applies at all levels of modularity: between individual statements, stanzas, functions, module files, packages, services, and even between applications.
The function and purpose of project-level documentation is to take the reader on a journey of exposition, from ignorance to awareness to understanding to evaluation, of project requirements, issues, design/architecture, deployment, maintenance, and shortcomings.
Recall that the right way to divide modules is separation of concerns (not tasks). I believe the same separation inspires the smart way to organize documentation: distinct concerns and the connections (or gaps) between them.
What is a concern? A concern is anything that adds complexity to solving a problem, whether high-level or low, whether essence or accident.
You can think of any particular atomic fragment of documentation as falling along a horizontal axis representing the subject matter (or concern) it deals with, and a vertical axis of how sophisticated the reader is (so far) with that subject.
Most probably, a process of progressive refinement will yield a general outline, or hierarchy of separated concerns similar to the Dewey decimal system. Within each subheading, the body text should elaborate the journey-of-exposition for that particular concern, with sub-sub-headings and cross-references as appropriate.
Because well-factored code will reflect just such a hierarchy, you should be able to embody much of your outline through the structuring conventions of your source language. Thus, the sections above about doc-comments. However, some things transcend code. Also, there are bound to be tricky interfaces between concerns which merit special care and commentary. Many of those topics are listed below:
What’s the overall mission, and our relationship to it?
What principles and values are core to project success?
How do we think about, and balance:
Interpersonal: Free clue: Meet these people.
What are the major “system metaphors”?
Note any confusing vocabulary or words overloaded with many definitions. Do you have jargon specific to your company, task force, project, or team?
Describe the system boundary: What systems and actors provide and consume the inputs and outputs?
In what environment must the system work? Linux? The surface of another planet?
A project might reasonably be broken into sub-projects based on scope, technology, or other factors. At this level, pretty pictures (e.g. functional block diagram) are nice but not absolutely necessary. Note how the team talks about these distinctions.
What subprojects are there? What are they for, and how do they communicate?
For each (sub-)box in the functional-block diagram, ask and answer the following questions:
For each intelligible subprocess:
Between subprocesses:
What schemas, structures, and file formats are particularly relevant? How do they fit into the system?
For relational databases, you probably want three views:
For non-relational and/or document-oriented data stores, you will have some concept of an accessor key for a chunk of related data. That key might look like a pathname, or it may have no particular structure, but one way or another you have a mental model of navigating from zero to data. Effective documentation for a given data store must answer these questions:
Also, you’ll want to note:
Dependency inversion strategy: We’d like to be able to “mount a scratch monkey” in many ways, so how are the real and mock components constructed and composed, for each kind of sub-process?
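A minimal sketch of the idea in Python, with hypothetical names: callers depend on an abstraction, so a mock can be mounted in place of the real thing.

```python
class Notifier:
    """Abstraction: something that can deliver a message."""
    def send(self, message):
        raise NotImplementedError

class EmailNotifier(Notifier):
    """The real thing (actual delivery elided in this sketch)."""
    def send(self, message):
        print("emailing:", message)

class MockNotifier(Notifier):
    """A scratch monkey: records messages for inspection in tests."""
    def __init__(self):
        self.sent = []
    def send(self, message):
        self.sent.append(message)

def alert_on_low_stock(notifier, item):
    # Depends only on the abstraction, never on a concrete class.
    notifier.send(f"low stock: {item}")
```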
What about process management: creating, monitoring, signaling, and collecting results of child processes? Is there a utility API for this?
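If there is such a utility API, its doc-comment is the natural home for those answers. A hypothetical wrapper, built on the standard library, might look like:

```python
import subprocess

def run_child(cmd, timeout=30):
    """Run `cmd` as a child process and return the completed result.

    Contract: captures stdout/stderr as text, enforces `timeout` seconds,
    and raises CalledProcessError on a nonzero exit status.
    """
    return subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout, check=True)
```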
This is the kind of thing you’ll have in a slide deck. Most teams might not think to write it down, but the culture of a team is worth recording. Here are some questions to ask:
Overall group dynamics:
Specific to development teams:
Also, an interesting interview would be to ask each member of the team