
@shelby3
Last active June 24, 2016 10:08

So Rust has global reference ownership rules, which allow only one (borrowed) mutable reference to the same object to be active in the current block scope. The objective is to prevent race conditions on the object. But from the discussion we had on the forum, a DAG tree of disjointness is not a general model of imperative programming, and afaics the general case requires dependent typing (which is known to exclude Turing-completeness). Thus I abandoned Rust's global borrowing rules, because they seem like a pita of false promises that is rarely (20% of the time?) applicable.
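For readers unfamiliar with the rule under discussion, here is a minimal Rust sketch of the exclusivity restriction; the commented-out line is the one the borrow checker rejects:

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    let a = &mut v;         // first mutable borrow is fine
    // let b = &mut v;      // ERROR: cannot borrow `v` as mutable more than once
    a.push(4);              // `a` is usable because it is the only live borrow
    assert_eq!(v.len(), 4); // the borrow has ended, so `v` is usable again
    println!("{:?}", v);
}
```
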

But I proposed a more localized use of borrow tracking, which I also referred to in later posts{1}, that afaics can help free temporary objects without relying on GC in many cases.

I don't intend to use my proposal to prevent race conditions due to two or more borrowed mutable references existing in the same block scope (which, as I stated above, Rust prevents and which I think is a PL design error). In the case where all these borrows are "outer-only" immutable, my proposal can guarantee the owner has an exclusive mutable copy of the reference w.r.t. modifying the polymorphic type of the reference. I think this can be useful for the case where we want to assume the owner has the only copy of a reference to the collection in its block scope, so that my first-class unions idea can be used to add new element types to a collection without requiring the collection to be an immutable data structure (since immutable data structures are generally at least O(log(n)) slower). In other words, we can force immutable access on a mutable data type everywhere except in the owner's block scope, which gives us the best of both mutable data structures and immutability for preventing race conditions.
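Rust has no first-class unions, so the widening idea can only be approximated there with an enum; the point is that widening the element type of a collection is only sound for the holder of exclusive ownership, which in today's Rust forces an O(n) rebuild that the proposal would avoid by mutating the type in place. A hedged sketch (the `IntOrStr` enum and `widen` function are illustrative inventions, not the proposal's actual mechanism):

```rust
#[derive(Debug, PartialEq)]
enum IntOrStr {
    I(u32),
    S(String),
}

// Widening Vec<u32> into Vec<IntOrStr> requires taking the vector by
// value (exclusive ownership) -- the same exclusivity the proposal's
// owner-only mutation rule demands, but without the rebuild.
fn widen(v: Vec<u32>) -> Vec<IntOrStr> {
    v.into_iter().map(IntOrStr::I).collect()
}

fn main() {
    let mut v = widen(vec![1, 2]);
    v.push(IntOrStr::S("three".into())); // new element type now admitted
    assert_eq!(v.len(), 3);
}
```
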

I was thinking about how asynchronous programming (e.g. yield) impacts my proposal. For example, in JavaScript, yielding a Promise allows the current thread to run another stack which was waiting on a Promise event that has been fulfilled and is at the top of the event queue. In this way, JavaScript is able to achieve concurrency without the slower overhead and bugs of synchronization primitives such as mutex guards. But this form of asynchronous concurrency can't create race conditions in my proposal, because the stack is not made reentrant, i.e. each asynchronous DAG of execution is on a separate stack. And my proposal will not pass borrowed references between threads of execution (in either form, neither OS threads nor asynchronous DAGs of execution).

So I conclude my proposal is robust.

{1} [quote=https://users.rust-lang.org/t/inversion-of-control-resource-management/5737/41] 3: GC everywhere except the ability to use borrow checking to track lifetimes of temporary objects, to free them without burdening the GC in many cases. And copying some of v8's techniques for improving the robustness of handling temporary objects for those cases that fall through to the GC. [/quote]

[quote=https://users.rust-lang.org/t/inversion-of-control-resource-management/5737/51] Remember upthread I mentioned the idea of possibly employing borrow checking in a very limited way (that would be silent, i.e. no annotations, in most cases) which didn't extend its tentacles into a total ordering, yet afaics might enable declaring some references to be unique (non-aliased) mutable, so that these could be compile-time deallocated (not given to the GC), and afaics these guarantees would also eliminate the need for the immutable restriction on adding elements to collections under my proposed complete solution to the expression problem.

Thus I think the GC would be entirely out of the picture when interfacing to FFI in that case, but I need to dig into the details to be sure. [/quote]

[quote=https://users.rust-lang.org/t/inversion-of-control-resource-management/5737/37] I do roughly envision a possibly very significant advantage of compile-time checking the borrowing of memory lifetimes local to the function body, if that can help offload from GC the destruction of temporary objects and if it won't require lifetime parameters. And that is because thrashing the GC with temporary objects appears to be one of the most significant weaknesses of GC. That is only enforcing a partial order, and not attempting to compile-time check a global consistency which can't exist. In short, I don't want a type system that silently lies to me very often. [/quote]

[quote=https://users.rust-lang.org/t/inversion-of-control-resource-management/5737/28] That is why I am pondering about localizing the borrow checker to just function bodies, so that we can get some of its benefits without having lifetime checking grow tentacles into everything where it can't even model general semantics. [/quote]


shelby3 commented May 22, 2016

@keean wrote:

I think that is somewhat backwards. To get scalability you don't want local impurity to escape. You can allow multiple mutable references locally, because you can reason about small code fragments better.

Apologies I hadn't completely stated my proposal. You will be able to declare borrows as mutable or immutable.

I just mean that for the case I documented of needing to enforce one exclusive mutable owner (never borrowed), it would only work with functions that take immutable borrows. Perhaps it would be more flexible if we could distinguish between inner and outer mutability, where inner means anything that doesn't change the polymorphic type of the reference, e.g. adding elements of the same type as the collection, mutating elements, etc. In fact, I think that is really essential, so I need to think about how to add it to my proposal. But now I am adding complexity, so I need to make sure it is an 80/20 priority.

I can write an in-place quick-sort which uses multiple mutable references to a collection internally, and I can easily reason about this and test correctness.

Couldn't it be just as performant if it builds a new copy of the collection instead of requiring the collection to be mutable? What cases require not building a new copy?
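For concreteness, keean's in-place quick-sort point can be sketched in Rust, where `split_at_mut` safely hands out two disjoint mutable references into the same collection with no new allocation (a minimal sketch using a Hoare-style partition, not a production sort):

```rust
// In-place quick-sort: Hoare partition with the first element as pivot.
// `split_at_mut` yields two disjoint &mut borrows of one slice, which is
// the "multiple mutable references locally" case being discussed.
fn quicksort(s: &mut [i32]) {
    if s.len() <= 1 {
        return;
    }
    let pivot = s[0];
    let (mut i, mut j) = (0usize, s.len() - 1);
    loop {
        while s[i] < pivot { i += 1; }
        while s[j] > pivot { j -= 1; }
        if i >= j { break; }
        s.swap(i, j);
        i += 1;
        j -= 1;
    }
    // Two simultaneous, disjoint mutable borrows of the same slice.
    let (left, right) = s.split_at_mut(j + 1);
    quicksort(left);
    quicksort(right);
}

fn main() {
    let mut v = vec![5, 3, 8, 1, 9, 2];
    quicksort(&mut v);
    assert_eq!(v, vec![1, 2, 3, 5, 8, 9]);
}
```

No copy of the collection is made; the only extra memory is the recursion stack.
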

What messes things up is when I quick-sort a collection that is part of a priority queue (for example) and some other unconnected function is messed up because it holds a mutable reference.

So what you need is localised chaos, where you can reason about code without having to worry about spooky action at a distance.

Agreed localized is separation-of-concerns.

As such I am happy to use raw pointers locally, but I need a way to be sure no other code holds any references. Thinking like this, Rust's rules make more sense.

Afaics, this should be the caller's decision. If the caller wants to give only your library (function) an exclusive "inner" mutable reference borrow, then the caller should be able to indicate it and have the compiler enforce it. I believe that is what my proposal accomplishes.


shelby3 commented May 22, 2016

@shelby3 wrote:

I hadn't considered moving exclusive access to mutability. If that turns out to be a needed case, then I will need to augment my proposal with it.

If, as in my proposal, we are deciding to use GC pretty much everywhere it isn't simple (localized, i.e. no lifetime parameters) for us to track lifetimes, then it seems (virtually) all moves are subsumed by GC references, except where we need to move a mutable reference to another OS thread. It seems in my proposal we only need to track moves for that case of giving a mutable reference to another thread. And these mutable references could, I suppose, be either GC or lifetime tracked.

@keean wrote:

@shelby3 wrote:

Couldn't it be just as performant if it builds a new copy of the collection instead of requiring the collection to be mutable? What cases require not building a new copy?

No, that uses double the memory, which means less stuff can fit in the cache. When cache dominates performance, having everything compact makes sense, and cache line hits are more likely. Also we do not have to allocate any new memory. Allocating memory with malloc is slow; allocating in a GC language is fast, but lots of temporary objects are not good for GC either. In general, for performance we should prefer to do everything in-place except where we want a copy of the data as it was before the operation. Even then, storing the inverse operation in an undo queue can result in better performance.

With your feedback I see yet another confirmation that immutability is generally slower. So immutability everywhere is not desirable.

However, there is still a performance benefit to tracking immutability. For example, if we are doing some partitionable operation on a collection such as fold, map, or even iterators, and if we know we don't modify any existing elements of the collection, then we can parallelize the computation. We can't safely parallelize if we modify the elements, because we don't know whether the elements have a circular reference to the collection that contains them, and thus whether mutating them modifies the collection we want to partition over.
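The partitionable case can be shown with std's scoped threads (Rust 1.63+): each disjoint chunk is its own `&mut` borrow, and each worker mutates only its own elements, so the parallelism is safe by construction. A minimal sketch (the `parallel_square` function and its tiny chunk size are illustrative):

```rust
use std::thread;

// Parallel map over disjoint partitions of a slice. Safe precisely
// because each chunk is a disjoint &mut borrow and each closure
// touches only its own chunk -- the "partitionable" case above.
fn parallel_square(data: &mut [u64]) {
    thread::scope(|scope| {
        for chunk in data.chunks_mut(2) {
            scope.spawn(move || {
                for x in chunk.iter_mut() {
                    *x *= *x;
                }
            });
        }
    });
}

fn main() {
    let mut v = vec![1, 2, 3, 4, 5];
    parallel_square(&mut v);
    assert_eq!(v, vec![1, 4, 9, 16, 25]);
}
```

If an element could reach back into the containing collection, this decomposition would no longer be sound, which is exactly the circular-reference worry in the paragraph above.
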

Rust tracks every reference and thus can guarantee that elements can't have a mutable reference to their container, but this requires global lifetime tracking with lifetime parameters, which I've already criticized.

The classic problem is the vector, which reallocates when the vector needs to expand. This means that sometimes writing to the vector will invalidate any iterators to the vector. We would ideally like to make it an error to call any method on the vector that is going to invalidate a held iterator without slowing the runtime code down. It would be better if only dangerous functions were excluded...
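The vector-invalidation problem keean describes is exactly what Rust's borrow checker rejects at compile time; a small sketch of the rejected pattern and the index-based workaround (with its extra bounds checks):

```rust
fn main() {
    let mut v = vec![1, 2, 3];

    // Rejected at compile time: `push` may reallocate and invalidate
    // the iterator's internal pointer.
    // for x in &v { if *x == 2 { v.push(99); } }

    // Index-based traversal over a snapshot of the length is accepted,
    // at the cost of a bounds check on each access.
    let n = v.len();
    for i in 0..n {
        if v[i] == 2 {
            v.push(99);
        }
    }
    assert_eq!(v, vec![1, 2, 3, 99]);
}
```
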

Yeah, it seems to me we need finer-grained immutability tracking:

  • Mutating the polymorphic type, which must be exclusive since we don't have the subclassing anti-pattern, e.g. adding a new element type to the first-class union type parameter.
  • Mutating any instance of the polymorphic type, e.g. mutating elements of the collection.
  • Permission to invoke certain interfaces (traits).

Thus if your iterator library restricts permission to invoke certain trait bounds, then no other borrowed reference in the same (or derivative) block scope could use those trait bounds. I think that would entirely solve the problem, without Rust's conflation of immutability with global consistency (no partial orders allowed) and Rust's conflation of the different aspects of an object on which programmers want to enforce immutability.

The key distinction is that instead of trying to track allocation lifetimes everywhere they go (with lifetime parameters), we only track allocation lifetimes for borrows that don't leak into return values, and everything else gets delegated to GC. And for mutability, we only track borrows that don't leak into return values, and we track 3 different aspects of mutability, so that we can have more than one mutable reference to the same collection, for example. For mutability that leaks into return values, we punt and require these to be fully mutable, so the programmer won't be able to enforce immutability in that case and needs to restructure the code if necessary to enforce immutability. I would need to look at examples to see if this is 80/20. But it can't be worse than a language that doesn't track immutability at all, e.g. most every popular language (other than Rust and Haskell, but these aren't popular).

So I am NOT proposing that allocation lifetime and mutability checking are separate concerns, because a GC reference can be copied anywhere, so it leaks; thus only a borrowed function argument can have mutability restriction annotations.

Thus I modify my proposed syntax as follows.

Note that normally, adding a new element type to the first-class union type parameter can't be type checked in a programming language unless the data type is a singly-linked list or equivalent immutable data structure, or the data type is copied. Otherwise, the only way to achieve adding a new element type to the first-class union type parameter is to have an annotation for a method that declares it can change the polymorphic type parameter of the data type it is mutating, e.g. for arrays, vectors, doubly-linked lists, etc. So we need a modifies prologue on the function declarations stating the new type of the type parameter(s) of self that results. These need to be methods on the struct, not trait interfaces, so they can be enforced (see mention of modules below).

  1. Instead of let, val for entirely immutable (rare) and var for partially or entirely mutable.
  2. Function arguments are by default var. Prepend with val to make them entirely immutable (rare).
  3. Partial mutability for the first of the aforementioned bulleted cases can be inferred if the data type implementation has any modifies methods. I suppose we might want to be able to track mutability of each type parameter separately, but for now I am going to simplify and assume we lump all together so we can infer it. Note this inference has to be transitive to all trait implementations which call these modifies methods.
  4. To declare partial mutability for the second of the aforementioned bulleted cases, annotate with the type and with each immutable type parameter prepended with #.
  5. To declare partial mutability for the third of the aforementioned bulleted cases, place in the where clause an x: !TraitBound1|TraitBound2|....
  6. A borrowed reference remains the non-annotated Type and a GC reference (hopefully rarer) will be ::Type. So when we write these as function definitions (not just a signature declaration in a trait), we write x: Type and x ::Type respectively. This terse syntax is chosen to cut down on noisy verbosity, as well as to enforce a consistency of spacing style to improve readability. Having the compiler check the spacing and the single or double colon is a double consistency check against typos. When writing a local GC variable and we want to infer the type, we write for example var x ::= ....

In order to give the collection designer control to ensure that others can't create new trait implementations that modify the data type (or otherwise directly modify it) so as to subvert the above mutability protections, we must have modules, so we can restrict which traits have access to which capabilities.
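The numbered rules above can be summarized in a sketch of the hypothetical syntax (this is not a real language; the trait names Push and Resize and the function names are invented for illustration):

```
// Hypothetical syntax per the numbered rules above:
val xs: List<Int>            // 1. entirely immutable binding (rare)
var ys: List<Int>            // 1. partially or entirely mutable

fn f(a: Tree<#Elem>)         // 4. borrowed; type parameter Elem immutable (#)
fn g(b ::Tree<Elem>)         // 6. GC reference, marked by double colon
  where b: !Push|Resize      // 5. caller may not invoke Push or Resize on b

var x ::= makeTree()         // 6. local GC variable with inferred type
```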


shelby3 commented May 22, 2016

Okay, so it is easy to see how my proposal will work with fold and map. It doesn't even need the third bulleted variant of immutability, because there is no way for the callback function to get a mutable reference to the collection unless it mutates the elements of the collection and/or closes over the collection reference (so we need a way to declare that a function doesn't close over any variables it mutates). But for iterators there is a complication: we normally call a factory method to construct an iterator, so we'd need to track lifetimes from input to output, which I am trying to avoid.

One solution is to use the paradigm I described on the Rust forum for inverting control over iterators, where we yield to the caller to iterate each iterator{1}. So then we are borrowing iterator references instead of constructing them, and thus we only need the same protection mechanism as for fold and map. You had criticized this as being slow, but I think a smart compiler could inline the yields so that performance would be the same. Then we could eliminate the third bulleted variant of immutability (a simplification!).
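The inversion-of-control idea, where the collection drives iteration and the consumer's code is passed in so no iterator object ever escapes, can be sketched in Rust as internal iteration (the `Bag` type and `each` method are illustrative, not the proposal's actual API):

```rust
struct Bag {
    items: Vec<i32>,
}

impl Bag {
    // Internal (inversion-of-control) iteration: the caller's code is
    // passed in as a closure, and the iteration state never leaves this
    // method, so no factory-constructed iterator can outlive or alias
    // the collection.
    fn each<F: FnMut(i32)>(&self, mut f: F) {
        for &x in &self.items {
            f(x);
        }
    }
}

fn main() {
    let bag = Bag { items: vec![1, 2, 3] };
    let mut sum = 0;
    bag.each(|x| sum += x);
    assert_eq!(sum, 6);
}
```

A compiler that inlines `each` and the closure produces the same loop as a hand-written external iterator, which is the performance argument made above.
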

Afair, the other reason for wanting iterator factories is so we can store indices into a collection which are already known to be bounds checked. But if we allow this, then we totally break localization of immutability checking anyway, so we are back to Rust's gridlock paradigm of global enforcement and breaking out of it with unsafe pointers. So I would like to disallow iterator factories (the programmer can always create a library for them, but they will be unsafe, so better to use our recommended paradigm of inversion-of-control).

I think what we could do instead is store indices and a maximum value, so we only have to check the maximum once when we reuse the indices in a fold, map, or "filtered iteration".

Thus I am proposing we use my inversion-of-control paradigm for iterators:

  1. https://users.rust-lang.org/t/inversion-of-control-resource-management/5737


shelby3 commented May 22, 2016

For fold, map, and the inversion-of-control iterators, we would still need the third of the aforementioned bulleted variants of mutability in the case where we don't just want to replace elements (e.g. sort them or map in place), but to mutate the elements while safely checking that we won't mutate the collection in an incompatible way. We want to distinguish the second of the aforementioned bulleted variants of mutability from the act of replacing instances of a type parameter.

The benefit of inversion-of-control iterators compared to factory-constructed iterators is that, since the iterators are borrowed, they can't leak out of the encapsulation. With factory-constructed iterators, we need to track returned lifetimes.

But inversion-of-control iterators will not only require complex inlining to be efficient; they also have more complicated composability.

I am thinking instead we could keep iterator factories and make the assignment of self to the returned instance a special case, so we don't need generalized lifetime parameters. We only need the compiler to check that self is an argument of a constructor whose instance is the result value. We could annotate the result type with #s and/or suffix with fn(...):Type:!TraitBound1|TraitBound2|... to indicate to the compiler that we wish the result to be a borrow on self. So then the caller is forced to borrow it. Note the relevant edit I made:

@shelby3 wrote:

When writing a local GC variable and we want to infer the type, we write for example var x ::= ...

Note that the # and/or !TraitBound1|TraitBound2|... restrictions don't prevent our code from invoking some code that would interfere with, for example, a sort by instancing new iterators. And we can't have our iterator factories restrict the caller from calling iterator factories. The algorithm (e.g. sort) will need to require that the function signatures it accepts as input callbacks include the iterator factory trait on the !TraitBound1|TraitBound2|... list. This is what makes this solution work, whereas Rust can't enforce such a restriction if there are overlapping mutable slices, and thus Rust is unsafe in that case. Perhaps all algorithms can be implemented most efficiently without overlapping mutable slices? I doubt it! General imperative programming doesn't conform to a DAG tree of disjointness. Thus my model appears to be superior to Rust's. I think I should post about this on the forum.

Edit: thinking more about this, the # restriction is absolutely required, because otherwise, without Rust's global mutability tracking, there is no way to stop an element instance from having a reference to the collection and mutating the collection, bypassing our !TraitBound1|TraitBound2|... restriction. So I can't think of any safe way to mutate the elements of a collection in place. Even cloning would not be safe; thus # is not sufficient unless it requires pure functions turtles all the way down!


Tangentially, I think it is also important to note that result values will normally be fn(...): Type when the result value was not explicitly constructed as a GC instance (which isn't necessary unless the body of the function stores a copy of the reference as a side-effect). Yet the caller can assign to a GC instance, meaning the caller will call one of two versions of the same function: one which inputs the stack location to build the instance, and another where the function allocates on the GC heap.


shelby3 commented May 22, 2016

Thus, thinking about this more, the only way to opt out of Rust's global lifetime checking and still ensure safety of iteration and enumeration is that, when applying functions that input the collection or any element of the collection, those functions must be pure; it is not sufficient to just require an immutable reference to the specific input. Trait bound exclusions won't prevent hidden access to a reference to the collection which mutates it.

You could, I suppose, allow the code employing the iterators to replace elements of the collection as long as that doesn't have any other side-effects. We'd probably need the trait exclusions idea to type check this. But unless your collection is using unboxed elements turtles all the way down (which thus can't be referenced, only cloned, so your pure function replacing them could be allowed to mutate the element), this won't help cache locality.

It seems safety requires choosing a tradeoff. With Rust's global lifetime checking, we give up overlapping mutable slices and we incur the tsuris of lifetime checking everywhere, so nothing leaks out anywhere. Otherwise, we must employ pure functions, which (except for some very special cases of unboxed data) requires us to duplicate allocations of objects, which is less efficient for cache locality.

I need to rush out to the gym at 6am before traffic. I may have more insights later. Thoughts?



shelby3 commented May 23, 2016

So I am thinking fn or pure for pure functions and proc for impure subroutines. Do you have any better idea how to label pure versus impure? I think the use of fn for impure subroutines, as Rust does, is actually an error in nomenclature.

Probably pure is the better choice, to be explicit, since both Rust and JavaScript bastardized the meaning of "function". I like forcing programmers to prefix proc to remind them they are not writing a pure function. Both being 4 characters in length, they will align horizontally in traits.

Please note I had described a way to mutate in place with pure functions:

@shelby3 wrote:

You could, I suppose, allow the code employing the iterators to replace elements of the collection as long as that doesn't have any other side-effects. ... But unless your collection is using unboxed elements turtles all the way down (which thus can't be referenced, only cloned, so your pure function replacing them could be allowed to mutate the element), this won't help cache locality.

@shelby3 wrote:

Otherwise, we must employ pure functions, which (except for some very special cases of unboxed data) requires us to duplicate allocations of objects, which is less efficient for cache locality.


shelby3 commented May 23, 2016

@keean wrote:

@shelby3 wrote:

It is pure in the sense that given the same inputs, it always produces the same output(s).

Yes, except 'pure functional' implies no mutation, so he uses the term Regular Procedure instead.

There is also the concept of a Regular Type, which is one that behaves like a value (like a builtin Int, for example). That means you can assign it, copy it, and compare it for equality, like you can with any of the builtin types in C/C++.

An iterator is a common irregular type, as it cannot be copied.
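keean's regular/irregular distinction maps directly onto Rust's derived value semantics; a small illustration (the `Point` type is invented for the example):

```rust
// A regular type: copyable, assignable, comparable for equality --
// it behaves like a built-in value such as i32.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let a = Point { x: 1, y: 2 };
    let b = a;        // a copy, not a move: `a` remains usable
    assert_eq!(a, b); // equality compares by value

    // By contrast, an iterator is typically irregular: advancing it is
    // observable state, so it does not have value semantics.
    let mut it = [1, 2].iter();
    it.next();
    assert_eq!(it.next(), Some(&2));
}
```
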

Agreed, pure functional implies that even the inputs are not mutated, i.e. no side-effects. But this is a total order, because it restricts the caller, which turtles all the way up, imposing design constraints such as the inability to mutate in place.

Regular procedures are really inner purity, but I can't think of a good four-letter keyword that implies that. The best I have thus far is wipe.

Edit: my alternative idea instead of wipe is pour, since we are pouring the outputs into the constrained mold of the inputs, which conveys that it produces something in a constrained way. I wanted a four-letter word meaning 'overwrite' or 'replace'.

All could begin with 'p': pour, proc, and pure.


shelby3 commented May 24, 2016

@shelby3 wrote:

Thus I modify my proposed syntax as follows.

Note that normally, adding a new element type to the first-class union type parameter can't be type checked in a programming language unless the data type is a singly-linked list or equivalent immutable data structure, or the data type is copied. Otherwise, the only way to achieve adding a new element type to the first-class union type parameter is to have an annotation for a method that declares it can change the polymorphic type parameter of the data type it is mutating, e.g. for arrays, vectors, doubly-linked lists, etc. So we need a modifies prologue on the function declarations stating the new type of the type parameter(s) of self that results. These need to be methods on the struct, not trait interfaces, so they can be enforced (see mention of modules below).

  1. Instead of let, val for entirely immutable (rare) and var for partially or entirely mutable.
  2. Function arguments are by default var. Prepend with val to make them entirely immutable (rare).
  3. Partial mutability for the first of the aforementioned bulleted cases can be inferred if the data type implementation has any modifies methods. I suppose we might want to be able to track mutability of each type parameter separately, but for now I am going to simplify and assume we lump all together so we can infer it. Note this inference has to be transitive to all trait implementations which call these modifies methods.
  4. To declare partial mutability for the second of the aforementioned bulleted cases, annotate with the type and with each immutable type parameter prepended with #.
  5. To declare partial mutability for the third of the aforementioned bulleted cases, place in the where clause an x: !TraitBound1|TraitBound2|....
  6. A borrowed reference remains the non-annotated Type and a GC reference (hopefully rarer) will be ::Type. So when we write these as function definitions (not just a signature declaration in a trait), we write x: Type and x ::Type respectively. This terse syntax is chosen to cut down on noisy verbosity, as well as to enforce a consistency of spacing style to improve readability. Having the compiler check the spacing and the single or double colon is a double consistency check against typos. When writing a local GC variable and we want to infer the type, we write for example var x ::= ....

In order to give the collection designer control to ensure that others can't create new trait implementations that modify the data type (or otherwise directly modify it) so as to subvert the above mutability protections, we must have modules, so we can restrict which traits have access to which capabilities.

So the above changes to:

  1. When we add a new data type to the first-class union of the self type of a collection not using an immutable data structure such as a linked list (where, for example, we can return a newly constructed head with a new type), then the compiler needs to enforce that this operation is only allowed when the caller holds the exclusive (non-aliased) reference to the collection. Instead of a modifies keyword as previously proposed, the method which adds elements (allowing a new data type to be added to the union type) should return the new quantified collection type, and self should be the return value. The compiler can note that the self type is being cast to the new type, allow the modification, and enforce that the result value is assigned by the caller to the var that holds the exclusive reference to the collection.

  2. Contrary to what I wrote previously for #3, we can't infer that assignment of a newly constructed instance of a collection requires exclusive ownership of the reference, because a) we don't know whether the intent is to ever call the method mentioned in #1, and b) such a method may be a trait and we don't know all the trait implementations in advance of using them. Do we need a syntax to indicate that a reference should be an exclusive mutable reference, which will exclude it being a GC reference? Such a reference can be borrowed (just like all non-GC and GC references can), but it is distinguished from other non-GC references in that it is not allowed to be assigned to another variable in the same scope. This is analogous to the C restrict keyword, so it is what we'd probably prefer to be the default. So we want a syntax to indicate that a mutable non-GC reference can be aliased. Note that this syntax would never be used with val, because vals are immutable, so aliasing them is harmless (if we never allow an immutable reference to be moved to a mutable reference). Thus I propose alias instead of var when aliasing is allowed.

    Note the compiler should check to enforce no aliasing for var. Thus C-style pointer addition and subtraction will only be supported for a borrowed alias, and only when the referenced data structure is a non-resizeable array.

  3. A borrowed reference remains the non-annotated Type and a GC reference (hopefully rarer) will be ::Type. So when we write these as function definitions (not just a signature declaration in a trait), we write x: Type and x ::Type respectively. This terse syntax is chosen to cut down on noisy verbosity, as well as to enforce a consistency of spacing style to improve readability. Having the compiler check the spacing and the single or double colon is a double consistency check against typos. When writing a local GC variable and we want to infer the type, we write for example alias x ::= .... Note that GC instances are always either alias or val, never var, because var implies we can track the lifetime, so it would be useless to make it GC. Note from #2 that borrowing is orthogonal to aliasing, in that borrowed references can alias if, for example, all of them (including the source borrowed from) are alias (or val). Other scenarios include: the source is var and the borrowed references in a nested block scope are alias. Thus the way to construct multiple mutable iterators from a var reference to a collection is to create a nested block scope (or a function, which is also a nested block scope) to alias the reference.

  4. We've added three types of subroutines: pour, proc, and pure. Input arguments are always val for pure functions and may not be prefixed with var or alias. Input borrowed arguments for pour and proc are var by default and may be optionally prefixed with alias or val. Input GC arguments for pour and proc are alias by default and may be optionally prefixed with val but never var.
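The revised rules can be summarized in a sketch of the hypothetical syntax (again, not a real language; the Tree type and method names are invented for illustration):

```
// Hypothetical syntax for the revised rules:
var   xs: Tree<Int>          // 2. exclusive mutable reference (like C restrict)
alias ys: Tree<Int>          // 2. mutable reference that may be aliased
val   zs: Tree<Int>          //    immutable; aliasing is harmless

// 1. widening the union returns the newly-typed self; only the holder of
//    the exclusive `var` may perform it, and must reassign the result:
var t: Tree<Int> = makeTree()
t = t.add("hi")              // t is now Tree<Int|String>

alias g ::= makeTree()       // 3. GC local; GC is always alias or val, never var

pure sum(xs: Tree<Int>): Int // 4. pure: arguments always val
pour sortInPlace(xs: Tree<Int>) // 4. pour: borrowed args var by default
proc log(msg: String)        // 4. proc: impure subroutine
```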


shelby3 commented May 24, 2016

Slamdunk on why Rust's global lifetimes are a design error:

https://users.rust-lang.org/t/inversion-of-control-resource-management/5737/64


shelby3 commented May 24, 2016

Note that abandoning Rust's lifetime parameters, i.e. tracking lifetimes through return values, means the lifetime parameters example would have to be handled with GC in my proposal.

But I think that is an example of an incidental data structure that Sean Parent warns against. For example, if a pour procedure wanted to input a collection of items and return which one is chosen by the algorithm, it could input an iterator and mutate that input as its output (note I am assuming that mutation of elements is selected by the type of iterator, e.g. ForwardIterator and MutableForwardIterator). So it would still be possible to implement it without GC in my proposal.

I forgot to mention in my proposal: we can't allow borrowed (non-GC) inputs to leak into return values. But we can borrow input mutables and mutate them as outputs in pour or proc procedures.

@shelby3
Author

shelby3 commented May 24, 2016

I am thinking most of the functionality of C macros can be provided by default arguments, function overloading, and inline functions. For example, #define H (state->h) could be implemented with a closure inline proc H() = state->h, and then the use site remains H[0] for as long as we adopt Scala's rule of application of zero arguments with optional parentheses.
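As a concrete sketch (in TypeScript, since the proposed language isn't implemented; the names here are illustrative), the macro simply becomes a zero-argument closure that re-reads the state at each use site:

```typescript
// Hypothetical sketch: replacing C's `#define H (state->h)` with a
// zero-argument closure over mutable state.
const state = { h: 42 };

// The "macro" is just a closure; each call re-reads the field.
const H = () => state.h;

console.log(H()); // 42
state.h = 7;
console.log(H()); // 7 -- same behaviour as the textual macro
```

An inlining compiler can erase the call overhead, so nothing is lost versus the preprocessor.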

@shelby3
Author

shelby3 commented May 24, 2016

@keean wrote:

@shelby3 wrote:

I forgot to mention in my proposal: we can't allow borrowed (non-GC) inputs to leak into return values. But we can borrow input mutables and mutate them as outputs in pour or proc procedures.

This is an important realisation. As long as you pass objects by reference and return by mutable reference you don't need lifetimes. You have almost designed my system :) The next step is to realise you can convert a return value into a pass-by-reference argument by a mechanical rewrite of the abstract syntax tree.

The final step is to generalise this into an extended function call semantics where every function call can include an additional piece of metadata where the caller pre-allocates the space for the return value before calling, and passes a reference to that space as an additional argument.
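The rewrite keean describes can be sketched (in TypeScript, purely as an illustration; `Slot` is a hypothetical name) by turning a returned value into a caller-allocated out-parameter:

```typescript
// Original form: the callee allocates and returns the result.
function makePair(a: number, b: number): [number, number] {
  return [a, b];
}

// Mechanically rewritten form: the caller pre-allocates the result
// slot and passes a reference to it as an extra argument.
interface Slot<T> { value: T }
function makePairInto(a: number, b: number, out: Slot<[number, number]>): void {
  out.value = [a, b];
}

const direct = makePair(1, 2);               // callee-allocated result
const slot: Slot<[number, number]> = { value: [0, 0] };
makePairInto(1, 2, slot);                    // caller-allocated result
// slot.value now holds [1, 2]; no callee-allocated value escapes upward.
```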

Remember I had stated that your design proposal, which you described as inspired by Ada, seemed similar to what I was thinking about for a more localized (optional) borrowing without lifetime parameters, for the cases where we don't need GC. One of my points of distinction was that we also need optional GC (hopefully rarely used, so most temporary objects are freed without GC).

I still have a problem with my proposal in that there is no way to construct an iterator with a factory method by borrowing a reference to the collection, without using something like a lifetime parameter to signify that the output shares the lifetime and mutability alias with the input collection. I can invert the control and put the code that wants to operate on the iterator inside the iterator factory's nested block, by passing that code in as a callback function so that the constructed iterator is never returned. But this would be less composable, because it would conflate the traits for the different types of iterators the caller might want to employ together in the same algorithm.

Your proposed idea of allowing the caller to specify the preallocated reference for the return value doesn't solve the lifetime and mutability aliasing tracking. Rather it only solves them if everything is unboxed+cloned and is an optimization where the return value doesn't have to be cloned when assigned by the caller.

Thus afaics, I must support some way to track lifetime and mutability aliasing from inputs to outputs. But afaics there is no need for lifetime parameters, only an annotation on the borrowed input argument (GC-only input arguments ::Type are never tracked) of the procedure (pour and proc) signature indicating the input tracks to the output for lifetime and mutability aliasing. I propose <-Type, so for the implementation declaration (i.e. not in traits) that is x <-Type.

A question remains should pure functions allow tracking an input to an output? Is the aliasing of the lifetime and mutability a side-effect that impacts the relevant purity invariance?

Note, as I wrote before, GC instances are always alias or immutable val (because if they were var, their lifetime would be tracked via borrowing and there'd be no need to make them GC), so the lifetime tracking (which also tracks the mutability aliasing) for an iterator factory is applicable only when the caller supplies a non-GC input reference for the collection.

One of the big distinctions compared to Rust is that I am proposing there are no moves (except for moving to a different OS thread), thus there is no need to track lifetimes from input to output for the case of moving, e.g. when adding an element to a collection. In my proposal, the add method takes a GC-only input argument for the element to be added, or it takes an unboxed *Type which is cloned (copied) into the collection and is a borrowed argument:

trait AddBoxed<*ElementType>
  pour add(Self, *ElementType): Self|()

impl Array<*ElementType> : AddBoxed<*ElementType>
  pour add(self, x: *ElementType): Self|()
    if (self.length >= self.max_length) ()
    else
      self[self.length++] = x // this invokes Clone trait
      self

Note that the implementation doesn't need to be type annotated because the types can be inferred from the trait declaration:

trait AddBoxed<*ElementType>
  pour add(Self, *ElementType): Self|()

impl Array<*ElementType> : AddBoxed<*ElementType>
  add(self, x)
    if (self.length >= self.max_length) ()
    else
      self[self.length++] = x // this invokes Clone trait
      self

For non-boxed case, we can add new element types to the first-class union:

trait Add<ElementType>
  // If `A` is not already in the `ElementType`, `Self` must be `var`
  // and return assigned to that `var` because an element type is
  // added to the first-class union
  pour add<A>(Self<ElementType>, ::A): Self<ElementType|A>|()

impl Array<ElementType> : Add<ElementType>
  add(self, x)
    if (self.length >= self.max_length) ()
    else
      self[self.length++] = x
      self
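A rough analogue (in TypeScript, whose unions are first-class; note that TypeScript copies the array here, whereas the proposal mutates the var owner's collection in place) of widening the element union on add:

```typescript
// Hypothetical mapping of
//   add<A>(Self<ElementType>, ::A): Self<ElementType|A>|()
// onto TypeScript's first-class union types.
function add<E, A>(xs: E[], x: A): (E | A)[] {
  return [...xs, x]; // copies; the proposal would mutate in place
}

const ints: number[] = [1, 2];
const widened: (number | string)[] = add(ints, "three");
// The element type widens to `number | string` at compile time, without
// a nominal wrapper type or an immutable data structure.
```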

@shelby3
Author

shelby3 commented May 24, 2016

@shelby3 wrote:

@keean wrote:

Not macros! Sorry I have issues with macros :) if the language and type system is good enough, macros are not necessary.

I have been thinking the same way. Macros (especially powerful ones as Rust has) make code unreadable. So I explained that the language can do the main valid reasons we need C macros.

AST macros make code more difficult to read (especially for open source), except in the case of a DSL that is used frequently enough that learning it is worth the reader's effort because of the clarity of expression it enables, e.g. a compile-time typed SQL or HTML parser integrated with expressions from our language. The ability to create internal DSLs is more powerful:

http://www.paulgraham.com/avg.html

But Lisp's syntax is a fully generalized tree so that any desired structure for DSL code can be manipulated in a macro which is written in the same Lisp syntax:

What makes lisp macros so special

A compromise is a friendlier syntax which is sufficiently flexible to write many embedded DSLs but isn't as grotesque as Lisp. I should look into supporting Scala's syntax of allowing method names to be declared as infix operators and symbolic operators to be overloaded, with both right-to-left and left-to-right precedence variants, along with optional lazy evaluation:

http://hedleyproctor.com/2014/11/writing-a-dsl-with-scala-parser-combinators/

http://www.codecommit.com/blog/scala/the-magic-behind-parser-combinators

Scala also supports consuming a block as an input argument of the () (aka Unit) type in order to write DSL such as:

loop {
  i = i + 1
  v[i] = 0
}

Beyond that are AST macros which enable some of the full power of Lisp's macros:

https://news.ycombinator.com/item?id=3216202

http://www.scala-lang.org/old/node/12494.html#comment-56437

https://www.reddit.com/r/haskell/comments/1929xn/are_lispstyle_macros_a_code_smell/c8k7wl8

http://debasishg.blogspot.com/2009/07/macros-preprocessors-and-dsl.html

But these more powerful AST macros create the opportunity for "code smell":

http://www.scala-lang.org/old/node/12494.html#comment-56529

https://news.ycombinator.com/item?id=3216387

@shelby3
Author

shelby3 commented May 25, 2016

@keean wrote:

My advice would be to stick to functions :) You don't want to have to look to the definition to see how arguments are going to be handled.

Which of my annotations on procedure signatures is this referring to? Are you suggesting I should not allow regular and irregular procedures and only allow pure functions? Or are you commenting about whether to allow AST macros?

@keean wrote:

My own system was going to borrow from smalltalk I think, so you could use brackets to change how code is evaluated, with '{' '}' effectively declaring a zero argument closure. This allows functions to have macro like behaviour (in that they take their arguments unevaluated). In this way you can simulate the behaviour of things like 'if':

if(bool, {x = x + 1}, {x = x - 1});

Scala has => on function declarations to declare that an argument is lazily (by-name) evaluated. It also has the Unit type so code blocks can be function inputs, which afaik are evaluated in the scope where the function block is expressed as a statement. These features can be useful for creating DSLs in Scala, as I mentioned in my prior post.
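The block/by-name idea can be sketched in TypeScript with explicit thunks (an approximation only: TypeScript has no by-name parameters, so `() => T` stands in for Scala's `=> T` and smalltalk-style `{ ... }` blocks):

```typescript
// A user-defined `if`: the branches are passed unevaluated as
// zero-argument closures, so only the chosen branch ever runs.
function myIf<T>(cond: boolean, then_: () => T, else_: () => T): T {
  return cond ? then_() : else_();
}

let x = 10;
myIf(x > 5, () => { x = x + 1; }, () => { x = x - 1; });
// x is now 11; the `x - 1` branch was never evaluated
```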

@keean wrote:

Would not evaluate the arguments until they are used within the function, rather than pre-evaluating. The conclusion is first class functions and closures have all the power of 'C' style macros.

Agreed, I implied that earlier with my example for H. So you must be referring to my comments about macros and DSLs.

@keean wrote:

I would allow operator overloading like C++ allows function call

So you mean being able to apply a function call operator to any object instance. Scala has this by defining the apply method of an object.
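TypeScript has no overloadable call operator, but the effect of Scala's apply can be approximated by making the object itself a function with attached members (illustrative only; `makeCounter` and `peek` are invented names):

```typescript
// A "callable object": calling it plays the role of Scala's `apply`.
function makeCounter(start: number) {
  let n = start;
  return Object.assign(
    () => n++,            // the `apply`-like behaviour
    { peek: () => n }     // an ordinary method alongside it
  );
}

const c = makeCounter(5);
c(); // 5
c(); // 6
console.log(c.peek()); // 7
```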

@keean wrote:

array lookup and other operators to be overloaded, but the uses must be consistent with the operator. So for example overloading '+' should only be allowed for commutative operations. This means '+' for string concatenation is not allowed.

Scala reuses the function call operator (...) for array lookup because it uses [...] for type parameters. I'd prefer to use <...> for type parameters and use [...] for array lookup.

Scala allows operator overloading. Afair it doesn't enforce commutativity on + or *. It does allow any symbol soup for operators and even allows names to be operators. Operators whose names end in : are right-associative. The symbol-soup operators tend to make Scala look like gobbledygook, as in the Scalaz library. Would you agree that only certain symbols should be overloadable?

@shelby3
Author

shelby3 commented May 25, 2016

If ever allowing AST macros, it could be forced that each file that employs them has to have a special import to signify it. And this would be required to link to a GitHub repo or RFC for the macro, so that the open source community would frown on using macros that had not gained considerable community support, documentation, and adoption.

@shelby3
Author

shelby3 commented May 25, 2016

One major safety issue with Nim is that sending garbage collected pointers between threads will segfault, last I checked (unless you use the Boehm GC). Moreover, Nim is essentially entirely GC'd if you want (thread-local) safety; there is no safety when not using the GC. So Nim is in a completely different category from Rust; Rust's "core strength" is memory safety without garbage collection, which Nim (or any other industry language, save perhaps ATS) does not provide at all.

I think that summarizes one thing my (our) proposed PL intends to do better than both Rust and Nim.

See this:

https://gradha.github.io/articles/2015/02/goodbye-nim-and-good-luck.html

Remember I said we need moves for passing between threads.

Note that apparently Nim has added a shared mutable heap since that linked post.

https://news.ycombinator.com/item?id=9051800

I've been going through the academic papers, forerunners, and source-code for Rust's static memory routines and borrowing semantics. My hope and suspicion is that it can be added to Nim without core changes to the language like lifetimes. It's definitely not a guarantee, but with lots of experience in both languages now I feel very strongly that adding region-based-memory-management to Nim is possible while adding Nim's clarity, abstractions, and efficiency to Rust feels impossible.

I'm not so sure. The trickiest part of getting memory safety without garbage collection working is not the lifetimes but the borrow check, which relies on inherited mutability and, most importantly, the lack of aliasable mutable data. The APIs and libraries of garbage collected imperative languages invariably depend on aliasable, mutable memory. Consider something as simple as a tree or graph data structure with mutable nodes. Or consider taking two mutable references to different indices of an array, or splitting an array into mutable slices with dynamically computed indices. These are all things you (presumably) can do today in Nim, and a borrow checker would break them. The likelihood that the library APIs depend on being able to do it is very high. I never say never: you could implement multiple types of references, some GC'd and some not, and copy the Rust borrowing semantics. But they would be incompatible with most existing APIs and libraries. I don't think it can be realistically retrofitted onto a language without breaking most APIs: aliasable, mutable data is just too common.

@shelby3
Author

shelby3 commented May 25, 2016

@keean wrote:

Have you looked at Nim? Looks like it can do what you want:

proc onlyIntOrString[T: int|string](x, y: T) = discard

onlyIntOrString(450, 616) # valid
onlyIntOrString(5.0, 0.0) # type mismatch
onlyIntOrString("xy", 50) # invalid as 'T' cannot be both at the same time

That doesn't allow T to be a union. It forces quantification to a single type at the call (use) site:

By default, during overload resolution each named type class will bind to exactly one concrete type.
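The contrast can be seen in TypeScript, where a parameter type can genuinely be a union, so mixed arguments typecheck, unlike Nim's type classes, which bind T to exactly one concrete type per call:

```typescript
// Each argument is independently checked against the union.
function onlyIntOrString(x: number | string, y: number | string): void {}

onlyIntOrString(450, 616); // valid
onlyIntOrString("xy", 50); // also valid here; rejected by Nim's type class
// (TypeScript has a single `number` type, so the float case from the Nim
// example isn't distinguishable.)
```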

@shelby3
Author

shelby3 commented May 26, 2016

@keean wrote:

@shelby3 wrote:

They didn't get the if-else correct, because : is lost in the soup of verbiage when used as the replacement for ternary:

http://nim-lang.org/docs/tut1.html#control-flow-statements-if-statement

I chose instead if (...) ... else ....

The ':' syntax for block statements like 'if' comes directly from Python, which is the most widely used language that uses significant whitespace, so it's not that surprising they decided to go with what people might be familiar with.

I changed that aspect of my grammar to the same as Nim/Python, except no colon : when followed by scoping indentation:

statements:
  scope
  statement EOS statements    // EOS (end of statement) is issued before an INDENT or OUTDENT, else before a '\n' plus whitespace to the same column of the current scope. Larger indents are not an EOS.

scope:
  INDENT statements OUTDENT   // INDENT and OUTDENT are a '\n' plus whitespace to the 2 space indented or any multiple of 2 spaces outdented column.

condition-expression:
  'if' expression expr-or-scope { elseif } [ 'else' expr-or-scope ]
  expression

elseif:
  'elseif' expression expr-or-scope

expr-or-scope:
  ':' expression              // Scoping indentation required for nested `if` and assignment, same as for Nim. http://nim-lang.org/docs/tut1.html#statements-and-indentation
  EOS scope                   // Difference from Nim and Python is colon `:` not required when followed by scoping indentation.

@shelby3
Author

shelby3 commented May 30, 2016

@keean wrote:

Try to avoid the dangling else problem:

http://www.parsifalsoft.com/ifelse.html

I did:

@shelby3 wrote:

expr-or-scope:
  ':' expression              // Scoping indentation required for nested `if` and assignment, same as for Nim. http://nim-lang.org/docs/tut1.html#statements-and-indentation

@shelby3
Author

shelby3 commented May 30, 2016

@keean wrote:

There was a comment on LTU today that has made me see the union type thing in a different light.

You can consider multi-method dispatch to be a form of first class pattern matching, and I think it will always result in a union type (not an intersection type) if you force all instances of the same method to have the same return type so:

fn size(x : String) -> i32 
fn size(x : i32) -> i32

Is modelled as something like

fn size args -> i32 {
    match args {
        (x : String) => ... ,
        (y : i32) => ...
    }
}

And the type would be:

size : String \/ i32 -> i32

Of course this is really a type class without the class:

trait Sizable {
    fn size(self) -> i32;
}

impl Sizable for String {
    fn size(self) -> i32 {...}
}

impl Sizable for i32 {
    fn size(self) -> i32 {...}
}

I think there is a reasonable argument that ad-hoc multi-methods are like untyped languages, and that type-classes bring some discipline and order to overloading.

In other words, as you write larger programs, ad-hoc multi-methods overloading will limit overall program size and complexity. Naming your type-classes is basically the same argument as naming your types. It seems to me someone who favours named types will eventually conclude that named type-classes are the right thing. If you prefer ad-hoc multi-method overloading you will probably prefer an untyped language?

@shelby3
Author

shelby3 commented May 30, 2016

@keean wrote:

There was a comment on LTU today that has made me see the union type thing in a different light.

There are some smart folks over at LtU, but I find the comments there are often not well developed and often incorrect. Trying to compete with dozens of minds simultaneously doesn't really refine the discussions well; it is just all over the place, in part because the PL design space is so vast.

I am thinking I probably prefer a design environment of talking with a fewer number of experts with mutual dedication to analyzing all the details with pedantic detail and commitment to not allow incorrect or poorly developed thoughts to linger.

@keean wrote:

You can consider multi-method dispatch to be a form of first class pattern matching, and I think it will always result in a union type (not an intersection type) if you force all instances of the same method to have the same return type

For performance, I would prefer those overloads to be resolved at compile-time. The compiler should mangle the function names to create two different functions.

I've read in the past that Scala folks think function overloading was a bad idea, but I need to locate and study the reasons they think that.

@keean wrote:

I think there is a reasonable argument that ad-hoc multi-methods are like untyped languages, and that type-classes bring some discipline and order to overloading.

Disagree. I view union types as a way to delay, at compile time, the point in the program where the type class is applied. Without first-class anonymous (or nominal/named) union types, we are forced to prematurely erase the data types to a trait object in order to build a collection (or any data structure) whose type parameter is a union of data types. Nominal unions force much additional boilerplate compared to first-class, anonymous unions.

So unions aren't throwing away typing, nor are they replacing typeclasses. Rather they delay the point at which we apply the typeclass to the function bounds. This enables us to add new data types and new functions orthogonally, making it (afaik) the first complete solution to Wadler's Expression Problem.

@keean wrote:

If you prefer ad-hoc multi-method overloading you will probably prefer an untyped language?

Disagree with the premise.

Untyped (i.e. unityped) languages allow the freedom to do the typing in your head rather than have it enforced by the compiler, so the expression of semantics is entirely uninhibited, but this comes at the cost of human error and a lack of communication of the semantics to other readers of open source.

Delaying the coercion of data types to typeclasses (or, shudder, to supertypes in an OOP subtyping anti-pattern) by retaining the union of explicit data types is still compile-time typing. It is not about throwing away types, but rather about opening extension in both dimensions of Wadler's stated Expression Problem (which, as he wrote at the turn of the century, was already an old problem in computer science).
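A small TypeScript sketch (names are illustrative) of keeping the data types separate in a first-class union and applying the operations, keean's "typeclass" role, as late as possible:

```typescript
// Data types stay separate; the union composes them without erasure
// to a trait object or supertype.
type Lit = { tag: "lit"; n: number };
type Add = { tag: "add"; l: Expr; r: Expr };
type Expr = Lit | Add;

// New operations are added orthogonally, without touching the data types.
function evaluate(e: Expr): number {
  switch (e.tag) {
    case "lit": return e.n;
    case "add": return evaluate(e.l) + evaluate(e.r);
  }
}

function show(e: Expr): string {
  return e.tag === "lit" ? String(e.n) : `(${show(e.l)} + ${show(e.r)})`;
}

const e: Expr = { tag: "add", l: { tag: "lit", n: 1 }, r: { tag: "lit", n: 2 } };
console.log(evaluate(e), show(e)); // 3 (1 + 2)
```

The proposal's add method goes further than this named alias by widening the union per call site, so new data types need not be pre-declared in one closed type.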

P.S. I am working on another software project for a few months, so I will set this PL work aside in the interim, but will remain interested in discussions. I may not reply as quickly though. I am still very serious about this PL project and want to come back to it asap. Hope we maintain contact. Anyway, we are connected on LinkedIn, so I could message you there if ever you were not available here.

@shelby3
Author

shelby3 commented Jun 24, 2016

http://btcbase.org/log/2016-06-23#1487987

mircea_popescu: nothing actionable or useful that i can discern. just the simple observation that a) code the size of the body of work of respected past intellectuals is not maintainable by teams or individuals, while the writings were maintainable by the author solo ; b) the correct solution to scale is not increase in rigidity, but flexibility. which is how skyscrapers work or japan defends from earthquakes.
asciilifeform: or how about a lift that maybe-goes down or maybe-up
mircea_popescu: so it seems to me that perhaps we could say "obviously" the more strictly defined a language gets, the more catastrophic the necessary collapse of its large scale systems.
