phoe/text.md

Last active February 7, 2020 13:22

Star () You must be signed in to star a gist
Fork () You must be signed in to fork a gist

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/phoe/335fecfdc195bddd47ab0928b0e62e52.js"></script>
Save phoe/335fecfdc195bddd47ab0928b0e62e52 to your computer and use it in GitHub Desktop.

Download ZIP

loop return value

Raw

text.md

So recently a ~~religious debate~~ intense discussion happened on #lisp about whether the following form:

;; Form 1
(loop for i from 1 to 5 finally (return i))

should return 5 or 6 (or, in other words, (1+ 5) - this notation is important as it will be used later).

I argue that it is invalid for it to return 6 and 5 must be returned instead.

Let us assume that (declaim (optimize (safety 3))) is in effect. In other words, we are working on safe code.

An of-type declaration is allowed in for-as-arithmetic variable bindings in LOOP and it is meant to be used for optimization. Adding a type declaration should not have any change on the meaning of the program, other than the fact that it must signal a type error in safe code. Therefore, form 1 is equivalent to

;; Form 2
(loop for i of-type fixnum from 1 to 5 finally (return i))

This is true, since, according to the standard, The type fixnum is required to be a supertype of (signed-byte 16). Numbers from 1 to 5, and even 6, are therefore all fixnums.

The intent of the above form is that the user wants to do something five times - for integers from 1 to 5. Let us modify this loop - for instance, the user might want to do something for all fixnums. This gives us:

;; Form 3
(loop for i of-type fixnum from most-negative-fixnum to most-positive-fixnum finally (return i))

This LOOP form is logically consistent, since the value of i is between most-negative-fixnum and most-positive-fixnum, and therefore falls within the range of fixnum. (The intent is also clear - the user wants to iterate over all fixnums.)

In the LOOP epilogue, we have the form (return i). Therefore the variable i is accessed which is of type fixnum. Its value therefore must be of type fixnum, due to the OF-TYPE declaration.

This means that it is impossible for the value of i to be (1+ most-positive-fixnum), since this value is not of type fixnum. It would also mean that this value was assigned to the variable i accessible in the loop epilogue, which is a type violation.

A TYPE declaration is not allowed to change the semantics of the program since CLHS 3.3.1 states:

In general, an implementation is free to ignore declaration specifiers except for the declaration, notinline, safety, and special declaration specifiers.

Accessing i in the loop prologue is valid, as the forms inside the LOOP form (in particular, the epilogue) are in scope of the bindings, so they have access to the variables. According to CLHS 6.1.1.4:

A loop macro form expands into a form containing one or more binding forms (that establish bindings of loop variables) and a block and a tagbody (that express a looping control structure). The variables established in loop are bound as if by let or lambda.

CLHS Declaration TYPE states:

During the execution of any reference to the declared variable within the scope of the declaration, the consequences are undefined if the value of the declared variable is not of the declared type.

During the execution of any setq of the declared variable within the scope of the declaration, the consequences are undefined if the newly assigned value of the declared variable is not of the declared type.

CLHS 1.5.2 states:

Conforming code shall not depend on the consequences of undefined or unspecified situations.

The implies that that LOOP with OF-TYPE invokes undefined behaviour if the OF-TYPE is incorrect, even in safe code. Therefore the question is whether the code from form 3 is conforming or not. I argue that it is conforming, since the values that the variable is permitted to take and meant to be taken inside the iteration are fully within the fixnum type; the implementation is not required to assign a non-fixnum value to the variable i in order to successfully perform the iteration. (This agrees with the likely user intent of this LOOP call - that the user wants to evaluate some code for every fixnum.)

If this code conforms, then the value of i must not exceed the fixnum range, which requires that (1+ most-positive-fixnum) must NOT be assigned to the variable i, which means that finally (return i) must NOT return (1+ most-positive-fixnum), which in turn implies that most-positive-fixnum must be returned, which in turn implies that the LOOP variable must not be assigned an out-of-range value.

This behaviour in turn translates to looping from 1 to 5, at which point 5 must be returned, which completes the proof.

white-flame commented Nov 3, 2019 •

edited

Loading

There's a real logic problem in the last few paragraphs. Your assumption of conformance is that the value remains a fixnum "inside the iteration" (presumably meaning inside the loop body clauses), but the finally clause occurs outside the iteration (specifically, in the epilogue) and thus outside the scope of that assumption. The scope of the iteration variable extends past the iteration and into the prologue and epilogue. The only way the code is conforming is if the type declaration properly describes its value in all of the prologue, iterations, and epilogue. Since no statement about the epilogue's value is made in your assumption, yet the problem is about defining the bounds inclusive of the epilogue value, no logical implication seems to occur.

Also, the situation should be addressed where the to value is overshot instead of exactly reached. This current example can be processed to your desired implication by equality testing before stepping (which is defined in the glossary as mutating the iteration variable), while the first example in 6.1.2.1.1 cannot. It must perform a > test on the post-stepped value, which affects whether the (effectively) to 10 by 2 iterated variable should contain 9 or 11 in the epilogue. Both the overshot and exactly-reached situations must be necessarily handled by the same code, as it's generally unpredictable how it will terminate.

Also also, since so many implementations return 6 instead of 5, there might be different rationale from the spec that justifies the existing behavior. That should be explored here as well, if this document is a basis for changing behavior. If returning 6 in this case is also defensible according to the spec, and the arguments of the gist hold, it would render the CLHS inconsistent for this matter: Whether the final stepping occurs would be reasonably undefined, instead of defined as being unstepped in the epilogue (e.g. 5 in this example).

To my reading, I don't think the CLHS specifically demands whether a final stepping beyond the to limit shall occur or must not occur, although there's much to support the final stepping (returning 6) to the point where I would consider that behavior standard:

The final list in 6.1.1.6 implies unconditionally performing the stepping, although "generally" may render it non-binding.
The term "reaches" in 6.1.2.1 could be read differently between the exactly-reached and overshot situations, and "can be stepped" seems to be an either/or choice in unconditional behavior.
6.1.2.1.1 uses "succeeding iteration" and "end value" in ways that aren't 100% clear regarding this distinction, but the former again seems like unconditional stepping.

I would prefer that the main focus be on what the spec says instead of general programmer expectations. But if we are including the latter, there also exists a common and general programming precedent in ensuring the iteration variable can be stepped to hold an exit condition value. If it is bounded to contain only non-exit values, it will generally loop infinitely. This often happens to beginners when trying to iterate to the maximum value of a fixed-size integer by looping while the value is <= the max. All values will continue to loop, and there is no available range to step the value into an exit condition.

A common way in many programming languages to avoid the final stepping such that there need be no exit condition value in the top-of-loop processing is to break out before stepping past any problematic fenceposts:

(loop for x of-type fixnum from most-negative-fixnum
  ;; ...loop body...
  until (= x most-positive-fixnum))

This also manages the post-iteration value of the iteration variable, as the default iteration mechanisms usually step past the limit into exit condition territory. Within the loop body, of course, the iteration variable's value always stays within the expected bounds as agreeably presented in the gist, regardless of the post-iteration handling.

CL's do also completes a mutating stepping before looping back and testing the stepped values, as another precedent for returning 6.

bhaible commented Nov 3, 2019

It is not convincing to make an argument about program A by saying "let me modify the program to program B, then look at what the standard says about B, then from that draw conclusions about program A". That's because the standard is not free of contradictions; there are some known mistakes in it.

Especially, citing a sentence that starts with "In general ..." is not going to provide a convincing argument.

If you were to build an argument about what the standard says about program A directly, that would be more convincing.

bhaible commented Nov 4, 2019

It is well-known since at least 2004 that accessing FOR-AS variables in the FINALLY body leads to unexpected results. See https://www.cliki.net/Issue%20LOOP-FINALLY-VARIABLES .

Author

phoe commented Nov 4, 2019

Copying the full comment from @bhaible from the CLISP bug tracker.

There are two more related cases:

> (loop for i from 1 to 5 finally (return i))
6
> (loop for i from 1 upto 5 finally (return i))
6
> (loop for i from 1 below 6 finally (return i))
6

ANSI CL 6.1.2.1.1 http://www.ai.mit.edu/projects/iiip/doc/CommonLISP/HyperSpec/Body/sec_6-1-2-1-1.html says: "The variable var is bound to the value of form1 in the first iteration and is stepped[1] by the value of form3 in each succeeding iteration, or by 1 if form3 is not provided." Therefore here, the variable I is set to the values 1, 2, 3, ... consecutively.

The question is whether the last value attained is 5 or 6.

The text regarding TO and UPTO in http://www.ai.mit.edu/projects/iiip/doc/CommonLISP/HyperSpec/Body/sec_6-1-2-1-1.html is not precise: For UPTO it says "the loop terminates when the variable var passes the value of form2" which - to a certain degree - indicates that FOR I FROM 1 UPTO 5 will set I to 6.

The text regarding BELOW in http://www.ai.mit.edu/projects/iiip/doc/CommonLISP/HyperSpec/Body/sec_6-1-2-1-1.html, on the other hand, is clearly contradictory: "These keywords stop iteration just before the value of the variable var reaches the value supplied by form2" but also "iteration continues until the value var is stepped to the exclusive or inclusive limit supplied by form2".

I think all three cases should be handled in the same way - either use a temporary variable for the stepping in all three cases, or use an assignment to var during the stepping, in all three cases.

Since SBCL produces the the same results as CLISP for the three cases above, I vote for doing nothing.

Users need to be aware that accessing the loop variables in the FINALLY clause is - and always has been - hairy. This is also true for the other iteration controls, such as FOR I ACROSS or FOR I BEING EACH HASH-KEY and so on.

white-flame commented Nov 5, 2019 •

edited

Loading

I think all three cases should be handled in the same way - either use a temporary variable for the stepping in all three cases, or use an assignment to var during the stepping, in all three cases.

While this is an imported comment from elsewhere, it bears repeating that the glossary defines stepping as mutating the iteration variable:

v.t. (an iteration variable) to assign the variable a new value at the end of an iteration, in preparation for a new iteration.

However, this definition matches DO and the notion of "end of an iteration" breaks down for LOOP. Iteration clauses appear only at the beginning of a loop, and an iteration with regards to that clause itself is intended to perform the test at the end of its execution, and the step before (6.1.1.6). But as far as mutating the variable versus using a temporary variable, it clearly denotes the former.