The Art of Unix Programming

https://arp242.net/the-art-of-unix-programming/

More of the Unix philosophy was implied not by what these elders said but by what they did and the example Unix itself set. Looking at the whole, we can abstract the following ideas:

Rule of Modularity: Write simple parts connected by clean interfaces. Rule of Clarity: Clarity is better than cleverness. Rule of Composition: Design programs to be connected to other programs. Rule of Separation: Separate policy from mechanism; separate interfaces from engines. Rule of Simplicity: Design for simplicity; add complexity only where you must. Rule of Parsimony: Write a big program only when it is clear by demonstration that nothing else will do. Rule of Transparency: Design for visibility to make inspection and debugging easier. Rule of Robustness: Robustness is the child of transparency and simplicity. Rule of Representation: Fold knowledge into data so program logic can be stupid and robust. Rule of Least Surprise: In interface design, always do the least surprising thing. Rule of Silence: When a program has nothing surprising to say, it should say nothing. Rule of Repair: When you must fail, fail noisily and as soon as possible. Rule of Economy: Programmer time is expensive; conserve it in preference to machine time. Rule of Generation: Avoid hand-hacking; write programs to write programs when you can. Rule of Optimization: Prototype before polishing. Get it working before you optimize it. Rule of Diversity: Distrust all claims for “one true way”. Rule of Extensibility: Design for the future, because it will be here sooner than you think.

Rule of Clarity: Clarity is better than cleverness. Because maintenance is so important and so expensive, write programs as if the most important communication they do is not to the computer that executes them but to the human beings who will read and maintain the source code in the future (including yourself).

In the Unix tradition, the implications of this advice go beyond just commenting your code. Good Unix practice also embraces choosing your algorithms and implementations for future maintainability. Buying a small increase in performance with a large increase in the complexity and obscurity of your technique is a bad trade — not merely because complex code is more likely to harbor bugs, but also because complex code will be harder to read for future maintainers.

Code that is graceful and clear, on the other hand, is less likely to break — and more likely to be instantly comprehended by the next person to have to change it. This is important, especially when that next person might be yourself some years down the road.

Never struggle to decipher subtle code three times. Once might be a one-shot fluke, but if you find yourself having to figure it out a second time — because the first was too long ago and you've forgotten details — it is time to comment the code so that the third time will be relatively painless. – Henry Spencer

Rule of Separation: Separate policy from mechanism; separate interfaces from engines. In our discussion of what Unix gets wrong, we observed that the designers of X made a basic decision to implement “mechanism, not policy” —to make X a generic graphics engine and leave decisions about user-interface style to toolkits and other levels of the system. We justified this by pointing out that policy and mechanism tend to mutate on different timescales, with policy changing much faster than mechanism. Fashions in the look and feel of GUI toolkits may come and go, but raster operations and compositing are forever.

Thus, hardwiring policy and mechanism together has two bad effects: It makes policy rigid and harder to change in response to user requirements, and it means that trying to change policy has a strong tendency to destabilize the mechanisms.

On the other hand, by separating the two we make it possible to experiment with new policy without breaking mechanisms. We also make it much easier to write good tests for the mechanism (policy, because it ages so quickly, often does not justify the investment).

This design rule has wide application outside the GUI context. In general, it implies that we should look for ways to separate interfaces from engines.

One way to effect that separation is, for example, to write your application as a library of C service routines that are driven by an embedded scripting language, with the application flow of control written in the scripting language rather than C. A classic example of this pattern is the Emacs editor, which uses an embedded Lisp interpreter to control editing primitives written in C. We discuss this style of design in Chapter 11.

Another way is to separate your application into cooperating front-end and back-end processes communicating through a specialized application protocol over sockets; we discuss this kind of design in Chapter 5 and Chapter 7. The front end implements policy; the back end, mechanism. The global complexity of the pair will often be far lower than that of a single-process monolith implementing the same functions, reducing your vulnerability to bugs and lowering life-cycle costs.

Rule of Robustness: Robustness is the child of transparency and simplicity. Software is said to be robust when it performs well under unexpected conditions which stress the designer's assumptions, as well as under normal conditions.

Most software is fragile and buggy because most programs are too complicated for a human brain to understand all at once. When you can't reason correctly about the guts of a program, you can't be sure it's correct, and you can't fix it if it's broken.

It follows that the way to make robust programs is to make their internals easy for human beings to reason about. There are two main ways to do that: transparency and simplicity.

For robustness, designing in tolerance for unusual or extremely bulky inputs is also important. Bearing in mind the Rule of Composition helps; input generated by other programs is notorious for stress-testing software (e.g., the original Unix C compiler reportedly needed small upgrades to cope well with Yacc output). The forms involved often seem useless to humans. For example, accepting empty lists/strings/etc., even in places where a human would seldom or never supply an empty string, avoids having to special-case such situations when generating the input mechanically. – Henry Spencer

One very important tactic for being robust under odd inputs is to avoid having special cases in your code. Bugs often lurk in the code for handling special cases, and in the interactions among parts of the code intended to handle different special cases.

Rule of Representation: Fold knowledge into data, so program logic can be stupid and robust. Even the simplest procedural logic is hard for humans to verify, but quite complex data structures are fairly easy to model and reason about. To see this, compare the expressiveness and explanatory power of a diagram of (say) a fifty-node pointer tree with a flowchart of a fifty-line program. Or, compare an array initializer expressing a conversion table with an equivalent switch statement. The difference in transparency and clarity is dramatic. See Rob Pike's Rule 5.

Data is more tractable than program logic. It follows that where you see a choice between complexity in data structures and complexity in code, choose the former. More: in evolving a design, you should actively seek ways to shift complexity from code to data.

Rule of Least Surprise: In interface design, always do the least surprising thing. (This is also widely known as the Principle of Least Astonishment.)

The easiest programs to use are those that demand the least new learning from the user — or, to put it another way, the easiest programs to use are those that most effectively connect to the user's pre-existing knowledge.

Therefore, avoid gratuitous novelty and excessive cleverness in interface design. If you're writing a calculator program, ‘+’ should always mean addition! When designing an interface, model it on the interfaces of functionally similar or analogous programs with which your users are likely to be familiar.

Pay attention to your expected audience. They may be end users, they may be other programmers, or they may be system administrators. What is least surprising can differ among these groups.

Pay attention to tradition. The Unix world has rather well-developed conventions about things like the format of configuration and run-control files, command-line switches, and the like. These traditions exist for a good reason: to tame the learning curve. Learn and use them.

(We'll cover many of these traditions in Chapter 5 and Chapter 10.)

The flip side of the Rule of Least Surprise is to avoid making things superficially similar but really a little bit different. This is extremely treacherous because the seeming familiarity raises false expectations. It's often better to make things distinctly different than to make them almost the same.

– Henry Spencer

Rule of Silence: When a program has nothing surprising to say, it should say nothing. One of Unix's oldest and most persistent design rules is that when a program has nothing interesting or surprising to say, it should shut up. Well-behaved Unix programs do their jobs unobtrusively, with a minimum of fuss and bother. Silence is golden.

It can be useful to have logs/progress bars to know how a long-running program is chugging along.

Compactness Compactness is the property that a design can fit inside a human being's head. A good practical test for compactness is this: Does an experienced user normally need a manual? If not, then the design (or at least the subset of it that covers normal use) is compact.

Compact software tools have all the virtues of physical tools that fit well in the hand. They feel pleasant to use, they don't obtrude themselves between your mind and your work, they make you more productive — and they are much less likely than unwieldy tools to turn in your hand and injure you.

Compact is not equivalent to ‘weak’. A design can have a great deal of power and flexibility and still be compact if it is built on abstractions that are easy to think about and fit together well. Nor is compact equivalent to ‘easily learned’; some compact designs are quite difficult to understand until you have mastered an underlying conceptual model that is tricky, at which point your view of the world changes and compact becomes simple. For a lot of people, the Lisp language is a classic example of this.

Nor does compact mean ‘small’. If a well-designed system is predictable and ‘obvious’ to the experienced user, it might have quite a few pieces. – Ken Arnold Very few software designs are compact in an absolute sense, but many are compact in a slightly looser sense of the term. They have a compact working set, a subset of capabilities that suffices for 80% or more of what expert users normally do with them. Practically speaking, such designs normally need a reference card or cheat sheet but not a manual. We'll call such designs semi-compact, as opposed to strictly compact.

The concept is perhaps best illustrated by examples. The Unix system call API is semi-compact, but the standard C library is not compact in any sense. While Unix programmers easily keep a subset of the system calls sufficient for most applications programming (file system operations, signals, and process control) in their heads, the C library on modern Unixes includes many hundreds of entry points, e.g., mathematical functions, that won't all fit inside a single programmer's cranium.

The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information[Miller] is one of the foundation papers in cognitive psychology (and, incidentally, the specific reason that U.S. local telephone numbers have seven digits). It showed that the number of discrete items of information human beings can hold in short-term memory is seven, plus or minus two. This gives us a good rule of thumb for evaluating the compactness of APIs: Does a programmer have to remember more than seven entry points? Anything larger than this is unlikely to be strictly compact.

Among Unix tools, make(1) is compact; autoconf(1) and automake(1) are not. Among markup languages, HTML is semi-compact, but DocBook (a documentation markup language we shall discuss in Chapter 18) is not. The man(7) macros are compact, but troff(1) markup is not.

Among general-purpose programming languages, C and Python are semi-compact; Perl, Java, Emacs Lisp, and shell are not (especially since serious shell programming requires you to know half-a-dozen other tools like sed(1) and awk(1)). C++ is anti-compact — the language's designer has admitted that he doesn't expect any one programmer to ever understand it all.

Some designs that are not compact have enough internal redundancy of features that individual programmers end up carving out compact dialects sufficient for that 80% of common tasks by choosing a working subset of the language. Perl has this kind of pseudo-compactness, for example. Such designs have a built-in trap; when two programmers try to communicate about a project, they may find that differences in their working subsets are a significant barrier to understanding and modifying the code.

Noncompact designs are not automatically doomed or bad, however. Some problem domains are simply too complex for a compact design to span them. Sometimes it's necessary to trade away compactness for some other virtue, like raw power and range. Troff markup is a good example of this. So is the BSD sockets API. The purpose of emphasizing compactness as a virtue is not to condition you to treat compactness as an absolute requirement, but to teach you to do what Unix programmers do: value compactness properly, design for it whenever possible, and not throw it away casually.

dtoma/taoup.md

Select an option

No results found

Select an option

No results found