"The Boolean Trap" "Boolean parameters are wrong"
TODO(uucidl): add examples for every definition.
boolean type: a type carrying the minimum amount of information (one bit): { 0, 1 }, usually represented in text as { false, true }.
Consider an interface: T1 proc(T0)
Usage code:
// Context S0
T0 a;
T1 b0 = proc(a); // with effect: { e0 }
effect: an observable change not tracked by manifest variables, such as changing a global value, sending data to the network, touching a file on disk, or displaying graphics.
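To make the definition concrete, here is a minimal sketch in C. The names `g_position` and `move_forward` and the function body are made up for illustration: the return value plays the role of b0, and the write to the global plays the role of e0.

```c
#include <assert.h>

// Hypothetical global state, standing in for "the world" outside the call.
static int g_position = 0;

// Returns the number of steps taken (the value b0).
// As an effect (e0), it also moves the global position.
int move_forward(int steps)
{
    g_position += steps;  // effect: invisible in the signature and at the call site
    return steps;         // return value: tracked by a variable in the usage code
}

// Usage code: only the return value is visible to the reader.
int example_usage(void)
{
    int step_count = move_forward(40);
    assert(g_position == 40);  // the effect has to be observed separately
    return step_count;
}
```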
Let's add a boolean parameter to it: T1 proc(T0,bool)
// In same context S0
T0 a;
bool c;
T1 b1 = proc(a, c); // with effect on S0: e1
Let's assume that c == false is the fixed point, where the API is stable, and c == true is the new behavior:
// In same context S0
T0 a;
bool c; // c == false
T1 b1 = proc(a, c); // with effect on S0: e1
// stable case: e1 == e0 && b1 == b0
First, the effect of this change is to break the usage code.
Unless the host language supports default values or keyword arguments, every call to proc has to be changed. Even where the call itself does not change, the usage code may still need to be recompiled or relinked, or have its bindings regenerated.
If c == true is the new behavior, the boolean can change the return value of proc as well as its effect. Let's review the possible combinations:
// In same context S0
T0 a;
bool c; // c == true
T1 b1 = proc(a, c); // with effect on S0: e1
// 3 different cases:
// 1. e0 == e1 && b1 != b0
// 2. e0 != e1 && b1 == b0
// 3. e0 != e1 && b1 != b0
This highlights the legibility impact of a boolean parameter. The call site says only true or false, without giving more details about how that value affects the return value or side-effects or both.
Consider if we went from:
step_count = move_forward(40);
To:
step_count = move_forward(40, true); // this could mean anything
step_count = move_forward(40, false);
What would you as a reader understand now?
do(false); // inverse?
do_this_and_that(false); // which one is affected?
Let's see what we could do instead of adding the boolean parameter.
1. (easy case) A new entry point that takes b0 and turns it into b1:
// state S0
T0 a;
T1 b;
b = proc(a); // b == b0, effect e0
b = to_b1(b); // b == b1
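As an illustration of the easy case, the sketch below keeps the existing entry point untouched and adds a separate conversion in the role of to_b1. Everything here is hypothetical: the global `g_position`, the function `steps_to_cm`, and the stride of 75 cm are assumptions chosen only to give the shapes above some content.

```c
#include <assert.h>

static int g_position = 0;  // hypothetical state touched by effect e0

// Existing entry point, unchanged: returns steps taken (b0), with effect e0.
int move_forward(int steps)
{
    g_position += steps;
    return steps;
}

// New entry point in the role of to_b1: turns a step count (b0) into a
// distance in centimeters (b1), assuming a made-up stride of 75 cm.
// It has no effect of its own, so e1 == e0.
int steps_to_cm(int step_count)
{
    return step_count * 75;
}

// Usage code: the old call is untouched, the new need is met afterwards.
int example_usage(void)
{
    int b = move_forward(40);  // b == b0, effect e0
    b = steps_to_cm(b);        // b == b1
    return b;
}
```

Existing callers of `move_forward` are not affected, and the call site names the conversion it performs.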
2 + 3. (harder cases) Here it depends on whether e0 is allowed to be observed or not.
If e0 is allowed to be observed then a new entry point may produce e1 from the state left by e0(S0) and usage code becomes:
// in state S0
T0 a;
T1 b;
b = proc(a); // b = b0, with effect on S0: e0
b = proc_e1(a); // effect e1, b = b1
// state S1
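A possible shape for this, sketched in C under invented assumptions: e0 moves a global position, and e1 additionally records a trail. The names `g_position`, `g_trail_start`, `g_trail_end` and `mark_trail` are hypothetical. `mark_trail` plays the role of proc_e1 and relies on the state left behind by e0.

```c
#include <assert.h>

static int g_position = 0;     // changed by e0
static int g_trail_start = 0;  // hypothetical extra state, changed only by e1
static int g_trail_end = 0;

// Existing entry point, unchanged: returns b0, with effect e0.
int move_forward(int steps)
{
    g_position += steps;
    return steps;
}

// New entry point in the role of proc_e1. It assumes move_forward(steps)
// has just run, and derives the trail from the position e0 left behind.
// Returns the length of the recorded trail (b1).
int mark_trail(int steps)
{
    g_trail_start = g_position - steps;
    g_trail_end = g_position;
    return g_trail_end - g_trail_start;
}

int example_usage(void)
{
    int b = move_forward(40);  // b == b0, effect e0
    b = mark_trail(40);        // b == b1, effect e1 built on top of e0
    return b;
}
```

The cost is the implicit ordering: `mark_trail` is only meaningful directly after the matching `move_forward` call.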
If however e0 should not occur at all, which is the usual case where the boolean is added, we have some options.
If you can split proc into an effectful part and a non-effectful part, then creating finer-grained entry points can allow users to implement the desired behavior themselves:
// equivalent to b = proc(a);
b = proc_v(a);
proc_e0(a);
// new behavior:
b = proc_v(a);
proc_e1(a);
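A hedged sketch of such a split in C follows. All names (`plan_steps`, `apply_move`, `apply_move_logged`, the globals) and the clamping to 100 steps are assumptions for illustration. Unlike the pseudocode above, the effect functions here take the already computed value rather than the original argument, which is one possible design among several.

```c
#include <assert.h>

static int g_position = 0;   // hypothetical state changed by both effects
static int g_log_count = 0;  // hypothetical state changed only by e1

// proc_v: the non-effectful part. Clamps the request to a made-up
// range of 0..100 steps and returns the result.
int plan_steps(int requested)
{
    if (requested < 0)   return 0;
    if (requested > 100) return 100;
    return requested;
}

// proc_e0: the original effect.
void apply_move(int steps)
{
    g_position += steps;
}

// proc_e1: the new effect (move and count the move in a log).
void apply_move_logged(int steps)
{
    g_position += steps;
    g_log_count += 1;
}

// The existing entry point survives as a thin composition of the parts,
// so existing call sites keep compiling and behaving as before.
int move_forward(int requested)
{
    int steps = plan_steps(requested);
    apply_move(steps);
    return steps;
}
```

A caller who wants the new behavior writes `int s = plan_steps(40); apply_move_logged(s);` and the call site states which effect was chosen.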
Alternatively, if the effects are more complicated to decompose, and if the boolean starts to creep into more effectful parts, a state for the API could be added instead:
// in state S0
want_effect_e1(true);
b = proc(a1);
c = proc(a2);
d = proc(a3);
want_effect_e1(false);
This reduces the legibility of the interface, making it less context-free due to the implicit dependencies between calls. It does preserve the stability of the interface, at the expense of more state management pushed onto the new case. That may be fine if the new case is specific or rare.
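A minimal sketch of the stateful variant in C, assuming an invented logging effect as e1. `want_logging` stands in for want_effect_e1. It and the globals are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

static int g_position = 0;
static int g_log_count = 0;           // hypothetical state changed only by e1
static bool g_want_logging = false;   // API state; defaults to the old behavior

// Stand-in for want_effect_e1: switches the behavior of later calls.
void want_logging(bool enabled)
{
    g_want_logging = enabled;
}

// Signature unchanged, so existing call sites are not broken.
int move_forward(int steps)
{
    g_position += steps;
    if (g_want_logging) {
        g_log_count += 1;  // effect e1 happens only while the state is set
    }
    return steps;
}

int example_usage(void)
{
    move_forward(10);      // old behavior
    want_logging(true);
    move_forward(10);      // new behavior
    move_forward(10);      // new behavior
    want_logging(false);
    move_forward(10);      // old behavior again
    return g_log_count;
}
```

Reading any single `move_forward` call in isolation no longer tells you which behavior it has, which is the implicit dependency described above.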
The entry point is being changed because a new usage has been discovered.
The question now is: what is the most common usage? We may consider the default case (the fixed point) to still be the most common one, with the boolean representing a rare divergence from it. This is likely the case when the boolean is added late. Or do we have two equally common cases? If so, what makes us think that there are only two cases?
We're trying to keep the number of entry points stable, but in doing so we make the existing, most common usages suffer by changing the existing entry point. What are we trying to save? Entry points and documentation.
Why would you want to avoid creating more entry points? To preserve the accessibility of the interface by keeping the API small.
So what do we know about this new case? That it is a rare case. It should be documented as such. It should be put into a special, separate part of the documentation, to keep the common interface clear.
There remains the case where our idea of the most common case was mistaken, and the old and new behaviors are actually equally frequent. In this case it would seem justified to deprecate the old entry point and add a new one.
Additionally, booleans themselves encode only one bit of information. Are we sure we won't need a third case later? Shouldn't we immediately go for a bitset flag argument, able to represent a larger set of values? That would keep the entry point count low by introducing parameters whose types are more open to later additions.
Going for a boolean case now would introduce yet another breakage later on as the new cases get discovered.
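For comparison, here is a sketch of the bitset flag alternative in C. The flag names, `move_forward_ex`, and the two extra behaviors (logging and reversing) are invented for illustration. The point is only that a later case adds a flag value without changing the signature.

```c
#include <assert.h>

static int g_position = 0;
static int g_log_count = 0;

// Hypothetical flag set. New cases get a new bit; the signature stays put.
enum move_flags {
    MOVE_DEFAULT = 0,       // old behavior, the fixed point
    MOVE_LOGGED  = 1 << 0,  // made-up second behavior
    MOVE_REVERSE = 1 << 1   // made-up third behavior, added later
};

// Entry point taking a flag set instead of a bool.
// Returns the number of steps taken, regardless of direction.
int move_forward_ex(int steps, unsigned flags)
{
    int delta = (flags & MOVE_REVERSE) ? -steps : steps;
    g_position += delta;
    if (flags & MOVE_LOGGED) {
        g_log_count += 1;
    }
    return steps;
}
```

A call such as `move_forward_ex(40, MOVE_LOGGED | MOVE_REVERSE)` names what it asks for at the call site, where `move_forward(40, true)` does not.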