Types for Nonempty Lists

This is a writeup of the issues I found changing the type signatures of max and min to require at least one input. The idiom of applying these functions to "rest" args means a number of other type signatures need to be enhanced to propagate assurances that list are nonempty.

Discussion

The discussion (last active 2013) linked from GitHub issue 496 includes the question of whether people should be asked to change from the idiom of applying max to a list, and also the question of to what extent Typed Racket can deduce that the applications are safe.

I looked at this question some and this report includes some conclusions and enough details for another person to pick up the project.

There is one recommendation which does not fit elsewhere: standardize a type abbreviation for a nonempty list. Perhaps (Nonempty-Listof A).

Branches

Conclusions

To summarize situations in which Typed Racket does not deduce a list is nonempty:

Early exit with an error: (when (empty? xs) (signal-error ...) This would particularly an issue if signal-error were not in the same compilation unit.
Lists known to be the same length, with only one explicitly checked nonempty: this situation occurs in statistics (counts and weights) and graphics (coordinates x/y/z).
Lists are derived from sequences, for which there is no type-level way of specifying they are nonempty.

Details

The utility function cumulative-sum in the math/statistics package takes in a list and produces a nonempty list.
The type signature for reverse does not indicate that it preserves nonempty lists.
The helper function for foldl must have the correct nonempty-preserving signature.

The changes required to propagate information about nonempty lists seem extensive. I have some notes below.

Partial List

I read Section 4.10, and I listed the functions that I think would need updated signatures if one wanted to track non-emptiness of lists. The list is (to me) suprisingly short. I am leaving this list here as a map for someone who wanted to be thorough.

map -- completed in PR
reverse -- completed in PR
sort: easy, two more lines
cons: one of the type signatures is (-> a b (Pairof a b)). Is TR now smart enough to substitute b=(Listof a) and deduce that cons produces nonempty lists from this signature alone? If so the other is redundant.
build-list: easy
make-list: easy
add-between: easy
append: type signature already preserves nonempty lists (in first argument)
append*: this may not be realistic to preserve nonempty lists unless every item in the input list is required to be nonempty?
flatten: type system probably cannot encode a general condition that gives a nonempty output here. Look for use before doing.
remove-duplicates: easy
cartesian-product: easy, but look for use before doing. Any empty list in the input should produce an empty output, so the strongest signature would say all inputs nonempty lists produces a nonempty list.

In addition, forms of for, especially for/list, would need annotation by the user - so a convenient way to describe the output type would be good.

Map and Reverse

The type of map preserves nonempty lists of 1, but not nonempty lists when more than one input is given! This is fixed in a PR.

Welcome to Racket v7.7.0.9.
> map
- : (All (c a b ...)
      (case->
       (-> (-> a c) (Pairof a (Listof a)) (Pairof c (Listof c)))
       (-> (-> a b ... b c) (Listof a) (Listof b) ... b (Listof c))))
#<procedure:map>

Less strangely, reverse does not have a type signature that preserves nonempty lists. (Added in PR.)

Early Exit

Typed Racket is able to infer that the list xs is nonempty in the else clause after a test:

(if (empty? xs)
    (raise-argument-error ...)
    (do-work-on-nonempty xs))

Typed Racket is not able to infer that the list xs is nonempty after a when test that terminates the function.

(when (empty? xs) (raise-argument-error ...))
(do-work-on-nonempty xs)

When I encountered this problem, I transformed the early exit into one branch of a cond, so the types were known to be nonempty in the a later branch. I used cond instead of if because it was easier to bracket the existing multi-line code with cond.

The early exit idiom is sometimes combined with the issue that multiple lists are known to be the same length, so only one of them is tested to be nonempty.

Multiple lists of the same length

This section explains two instances where multiple lists are known to be of the same length, so the reader can see what would be required for Typed Racket to deduce or use this information automatically.

Plotting package (plot-lib)

vector-field-render-fun in plot2d/point.rkt needs to know that all of the lists involved are nonempty. The key step is a for*/lists that creates 7 lists of the same length. After checking to see if one of them is empty, the remainder are used without checking them.
vector-field-render-fun, more: for the compiler to prove that the lists are all nonempty, it needs to know that linear-seq is returning a nonempty list. (The samples parameter is a positive integer here, so this is deducible from the existing linear-seq with sufficient interprocedural analysis, but...)
linear-seq in common/sample.rkt has a signature that permits the number of samples to be 0, in which case the empty list is returned. On one hand I understand the reasoning --- mathematically, this makes sense---, but that is going to make it quite a bit harder for the compiler to understand that in all other cases, the empty list is not produced. Solving the problem in this particular instance is "easy": another type signature can cover the case when the num parameter is known to be a Positive-Integer.

Statistics package (math/statistics)

This section shows a detail where the compiler would in theory be able to deduce that two lists have the same length, and hence are either both empty or nonempty, but does not.

Examine the statistics-utils.rkt file from the math/statistics package. There are two lists of values returned from sequences->weighted-samples. They are the same length, but the compiler does not know this.

The information to make a more detailed analysis is available to the compiler. The penultimate step in sequences->weighted-samples is to call a function that produces an error unless the lengths are the same.

This becomes an issue in the function sequences->normalized-weighted-samples, where the code tests to see if the first returned value is nonempty and then assumes (knows) the second is also nonempty. Applying max (with a signature requiring at least one element in the list) to the second value (needlessly) results in a type error.

Corrections

Multi-input map now preserves nonempty lists.
Reverse now preserves nonempty lists.
plot-lib/plot/math.rkt: rect-join and rect-meet now work correctly given no inputs, producing (empty-rect 0) and (unknown-rect 0) respectively.

Other Fixed Issues

This section documents the other fixes needed before Racket would build with raco setup without errors.

plot-lib/plot/plot2d/line.rkt: (the density function) inserted magic 1e-308 as default h when xs is empty (should be impossible)
plot-lib/plot/common/kde.rkt: (the kde function) in the else clause, calling sort-samples with a nonempty xs produces two nonempty lists of the same length (see notes below), so the application of max to the weights will always be ok. Fix: I inserted a 0.0 as the default for max-dist to appease the compiler.
plot-lib/plot/common/plot-device.rkt: The max-label-x-size function needs to know (somehow) that the labels input is nonempty. Fix: the max size is 0 in the empty case.
plot-lib/plot/plot3d/flv3-center: Should require at least one vector. I think this is a mistake/carelessness in the code. Fix: require a nonempty input list.
math/math-lib/.../matrix/matrix-operator-norm.rkt: I think the issues with matrices stem from (1) are 0 dimensional matrices allowed? (2) row or column-wise application of the L_1 norm: if you allow 0 dimensional matrices, you need to say what L_1 of that matrix is (presumably -inf.0 if that's allowed).

(matrix-op-inf-norm, matrix-op-1norm): Fix: I added a default max of -inf.0 if there are no columns. This should cause the assertion in the code to fail. (Is that desirable?)

maueroats/commentary-typed-racket-issue-496.md