This is a writeup of the issues I found changing the type signatures
of max
and min
to require at least one input. The idiom of
applying these functions to "rest" args means a number of other type
signatures need to be enhanced to propagate assurances that list are
nonempty.
The discussion (last active 2013) linked from GitHub issue 496 includes the question of whether people should be asked to change from the idiom of applying max to a list, and also the question of to what extent Typed Racket can deduce that the applications are safe.
I looked at this question some and this report includes some conclusions and enough details for another person to pick up the project.
There is one recommendation which does not fit elsewhere: standardize
a type abbreviation for a nonempty list. Perhaps (Nonempty-Listof A)
.
To summarize situations in which Typed Racket does not deduce a list is nonempty:
- Early exit with an error:
(when (empty? xs) (signal-error ...)
This would particularly an issue ifsignal-error
were not in the same compilation unit. - Lists known to be the same length, with only one explicitly checked nonempty: this situation occurs in statistics (counts and weights) and graphics (coordinates x/y/z).
- Lists are derived from sequences, for which there is no type-level way of specifying they are nonempty.
- The utility function
cumulative-sum
in themath/statistics
package takes in a list and produces a nonempty list. - The type signature for
reverse
does not indicate that it preserves nonempty lists. - The helper function for
foldl
must have the correct nonempty-preserving signature.
The changes required to propagate information about nonempty lists seem extensive. I have some notes below.
I read Section 4.10, and I listed the functions that I think would need updated signatures if one wanted to track non-emptiness of lists. The list is (to me) suprisingly short. I am leaving this list here as a map for someone who wanted to be thorough.
- map -- completed in PR
- reverse -- completed in PR
- sort: easy, two more lines
- cons: one of the type signatures is
(-> a b (Pairof a b))
. Is TR now smart enough to substituteb=(Listof a)
and deduce thatcons
produces nonempty lists from this signature alone? If so the other is redundant. - build-list: easy
- make-list: easy
- add-between: easy
- append: type signature already preserves nonempty lists (in first argument)
- append*: this may not be realistic to preserve nonempty lists unless every item in the input list is required to be nonempty?
- flatten: type system probably cannot encode a general condition that gives a nonempty output here. Look for use before doing.
- remove-duplicates: easy
- cartesian-product: easy, but look for use before doing. Any empty list in the input should produce an empty output, so the strongest signature would say all inputs nonempty lists produces a nonempty list.
In addition, forms of for
, especially for/list
, would need
annotation by the user - so a convenient way to describe the output
type would be good.
The type of map
preserves nonempty lists of 1, but not nonempty
lists when more than one input is given! This is fixed in a PR.
Welcome to Racket v7.7.0.9.
> map
- : (All (c a b ...)
(case->
(-> (-> a c) (Pairof a (Listof a)) (Pairof c (Listof c)))
(-> (-> a b ... b c) (Listof a) (Listof b) ... b (Listof c))))
#<procedure:map>
Less strangely, reverse
does not have a type signature that
preserves nonempty lists. (Added in PR.)
The plot-lib
package contains a number of type errors once max
is
changed to require at least one input argument. The first obstacle is
the cumulative-sum
function, which needs to have a type signature
indicating that it will always output a nonempty list.
(: cumulative-sum (-> (Listof Real) (List* Real (Listof Real))))
As mentioned above, there are two issues. The first issue is that the
original foldl
accumulator function has a signature which does not
assert it always produces a nonempty list (it does) - blame the standard cons
signature? The second issue
is that reverse
does not have a signature asserting it takes
nonempty lists to nonempty lists.
Typed Racket is able to infer that the list xs is nonempty in the else clause after a test:
(if (empty? xs)
(raise-argument-error ...)
(do-work-on-nonempty xs))
Typed Racket is not able to infer that the list xs is nonempty after a
when
test that terminates the function.
(when (empty? xs) (raise-argument-error ...))
(do-work-on-nonempty xs)
When I encountered this problem, I transformed the early exit into one
branch of a cond
, so the types were known to be nonempty in the
a later branch. I used cond
instead of if
because it was easier to
bracket the existing multi-line code with cond
.
The early exit idiom is sometimes combined with the issue that multiple lists are known to be the same length, so only one of them is tested to be nonempty.
This section explains two instances where multiple lists are known to be of the same length, so the reader can see what would be required for Typed Racket to deduce or use this information automatically.
vector-field-render-fun
inplot2d/point.rkt
needs to know that all of the lists involved are nonempty. The key step is afor*/lists
that creates 7 lists of the same length. After checking to see if one of them is empty, the remainder are used without checking them.vector-field-render-fun
, more: for the compiler to prove that the lists are all nonempty, it needs to know thatlinear-seq
is returning a nonempty list. (Thesamples
parameter is a positive integer here, so this is deducible from the existinglinear-seq
with sufficient interprocedural analysis, but...)linear-seq
incommon/sample.rkt
has a signature that permits the number of samples to be 0, in which case the empty list is returned. On one hand I understand the reasoning --- mathematically, this makes sense---, but that is going to make it quite a bit harder for the compiler to understand that in all other cases, the empty list is not produced. Solving the problem in this particular instance is "easy": another type signature can cover the case when thenum
parameter is known to be aPositive-Integer
.
This section shows a detail where the compiler would in theory be able to deduce that two lists have the same length, and hence are either both empty or nonempty, but does not.
Examine the statistics-utils.rkt
file from the math/statistics
package.
There are two lists of values returned from
sequences->weighted-samples
. They are the same length, but the
compiler does not know this.
The information to make a more detailed analysis is available to the
compiler. The penultimate step in sequences->weighted-samples
is to
call a function that produces an error unless the lengths are the
same.
This becomes an issue in the function
sequences->normalized-weighted-samples
, where the code tests to see
if the first returned value is nonempty and then assumes (knows) the
second is also nonempty. Applying max (with a signature requiring at
least one element in the list) to the second value (needlessly)
results in a type error.
- Multi-input map now preserves nonempty lists.
- Reverse now preserves nonempty lists.
plot-lib/plot/math.rkt
:rect-join
andrect-meet
now work correctly given no inputs, producing(empty-rect 0)
and(unknown-rect 0)
respectively.
This section documents the other fixes needed before Racket would
build with raco setup
without errors.
-
plot-lib/plot/plot2d/line.rkt
: (thedensity
function) inserted magic 1e-308 as defaulth
whenxs
is empty (should be impossible) -
plot-lib/plot/common/kde.rkt
: (thekde
function) in the else clause, callingsort-samples
with a nonemptyxs
produces two nonempty lists of the same length (see notes below), so the application ofmax
to the weights will always be ok. Fix: I inserted a 0.0 as the default formax-dist
to appease the compiler. -
plot-lib/plot/common/plot-device.rkt
: Themax-label-x-size
function needs to know (somehow) that thelabels
input is nonempty. Fix: the max size is 0 in the empty case. -
plot-lib/plot/plot3d/flv3-center
: Should require at least one vector. I think this is a mistake/carelessness in the code. Fix: require a nonempty input list. -
math/math-lib/.../matrix/matrix-operator-norm.rkt
: I think the issues with matrices stem from (1) are 0 dimensional matrices allowed? (2) row or column-wise application of the L_1 norm: if you allow 0 dimensional matrices, you need to say what L_1 of that matrix is (presumably -inf.0 if that's allowed).(
matrix-op-inf-norm
,matrix-op-1norm
): Fix: I added a default max of -inf.0 if there are no columns. This should cause the assertion in the code to fail. (Is that desirable?)