Skip to content

Instantly share code, notes, and snippets.

@shelby3
Last active June 24, 2016 10:08
Show Gist options
  • Save shelby3/3829dbad408c9c043695e006a1581d19 to your computer and use it in GitHub Desktop.
Save shelby3/3829dbad408c9c043695e006a1581d19 to your computer and use it in GitHub Desktop.

So Rust has global references ownership rules, which only allow one (borrowed) mutable reference to the same object to be active in the current block scope. The objective is to prevent race conditions on the object. But from the discussion we had on the forum, a DAG tree of disjointness is not a general model of imperative programming and afaics in the general case requires dependent typing (which is known to exclude Turing-completeness). Thus I abandoned Rust's global borrowing rules, because it seems like a pita of false promises that is rarely (20% of the time?) applicable.

But I proposed a more localized use of borrow tracking which I also referred to in later posts{1} that afaics can help free temporary objects without relying on GC in many cases.

I don't intend to use my proposal to prevent race conditions due two or more borrowed mutable references existing in the same block scope (which as I stated above Rust prevents and which I think is a PL design error). In the case where all these borrows are "outer-only" immutable, my proposal can guarantee the owner has an exclusive mutable copy of the reference w.r.t. modifying the polymorphic type of the reference. This I think can be useful for the case where we want to assume the owner has the only copy of a reference to the collection in its block scope, so that my first-class unions idea can be used to add new element types to a collection without needing to require that the collection is an immutable data structure (since immutable data structures are generally at least O(log(n)) slower). In other words, we can force immutable access on a mutable data type every where except for the owner's block scope, so this gives us the best of both mutable data structures and immutability for prevent race conditions.

I was thinking about how asynchronous programming (e.g. yield) impacts my proposal. For example, in JavaScript yield of a Promise will allow the current thread to run another stack which was waiting on a Promise event which has been fulfilled and is at the top of the event queue. In this way, JavaScript is able to achieve concurrency without the slower overhead and bugs of synchronization primitives such as mutex guards. But this form of asynchronous concurrency can't create race conditions in my proposal, because the stack is not made reentrant, i.e. each asynchronous DAG of execution is on a separate stack. And my proposal will not pass borrowed references between threads of execution (in either form, neither OS threads nor asynchronous DAG of execution).

So I conclude my proposal is robust.

{1} [quote=https://users.rust-lang.org/t/inversion-of-control-resource-management/5737/41] 3: GC every where except ability to use borrow checking to track lifetimes of temporary objects to free them without burdening the GC in many cases. And copying some of v8's techniques for improving the robustness of handling temporary objects for those cases that fall through to the GC. [/quote]

[quote=https://users.rust-lang.org/t/inversion-of-control-resource-management/5737/51] Remember upthread I mentioned the idea of possibly employing borrow checking in a very limited way (that would be silent no annotations in most cases) which didn't extend its tentacles into a total ordering, yet afaics might enable to declare some references to be unique (non-aliased) mutable, so that these could be compile-time deallocated (not given to the GC) and afaics these guarantees would also eliminate the need for the immutable restriction on adding elements to collections under my proposed complete solution to the expression problem.

Thus I think the GC would be entirely out of the picture when interface to FFI in that case, but I need to dig into the details to be sure. [/quote]

[quote=https://users.rust-lang.org/t/inversion-of-control-resource-management/5737/37] I do roughly envision a possibly very significant advantage of compile-time checking the borrowing of memory lifetimes local to the function body, if that can help offload from GC the destruction of temporary objects and if it won't require lifetime parameters. And that is because thrashing the GC with temporary objects appears to be one of the most significant weaknesses of GC. That is only enforcing a partial order, and not attempting to compile-time check a global consistency which can't exist. In short, I don't want a type system that silently lies to me very often. [/quote]

[quote=https://users.rust-lang.org/t/inversion-of-control-resource-management/5737/28] That is why I am pondering about localizing the borrow checker to just function bodies, so that we can get some of its benefits without having lifetime checking grow tentacles into everything where it can't even model general semantics. [/quote]

@shelby3
Copy link
Author

shelby3 commented May 24, 2016

@keean wrote:

@shelby3 wrote:

I forget to mention in my proposal, can't allow borrowed (non-GC) inputs to leak into return values. But can borrow input mutables and mutate them as outputs in pour or proc procedures.

This is an important realisation. As long as you pass objects by reference and return by mutable reference you don't need lifetimes. You have almost designed my system :slight_smile: The next step is to realise you can convert a return value into a pass by reference argument by mechanical rewrite of the abstract syntax tree.

The final step is to generalise this into an extended function call semantics where every function call can include an additional piece of metadata where the caller pre-allocates the space for the return value before calling, and passes a reference to that space as an additional argument.

Remember I had stated that your design proposal which you described as inspired by Ada, seemed to be similar to what I was a thinking about for a more localized (optional) borrowing without lifetime parameters for the cases where we don't need GC. One of my points of distinction was we also need optional GC (hopefully rarely so most temporary objects are freed without GC).

I still have a problem with my proposal in that there is no way to construct an iterator with a factory method by borrowing a reference to the collection without using something like a lifetime parameter to signify that the output shares the lifetime and mutability alias with the input collection. I can invert-the-control and put the code that wants to operate on the iterator inside of the iterator factory's nested block by inputting that code in as a callback function so that the constructed iterator is never returned. But this would be less composable because it would conflate the traits for different types of iterators the caller might want to employ together in the same algorithm.

Your proposed idea of allowing the caller to specify the preallocated reference for the return value doesn't solve the lifetime and mutability aliasing tracking. Rather it only solves them if everything is unboxed+cloned and is an optimization where the return value doesn't have to be cloned when assigned by the caller.

Thus afaics, I must support some way to track lifetime and mutability aliasing from inputs to outputs. But afaics there is no need for lifetime parameters, only an annotation on the borrowed input argument (GC-only input arguments ::Type are never tracked) of the procedure (pour and proc) signature indicating the input tracks to the output for lifetime and mutability aliasing. I propose <-Type, so for the implementation declaration (i.e. not in traits) that is x <-Type.

A question remains should pure functions allow tracking an input to an output? Is the aliasing of the lifetime and mutability a side-effect that impacts the relevant purity invariance?

Note as I wrote before GC instances are always alias or immutable val (because if they were var their lifetime is tracked via borrowing thus no need to make them GC), so the lifetime tracking (which also tracks the mutability aliasing) for an iterator factory would be applicable only when the caller is supplying a non-GC input reference for the collection.

One of the big distinctions compared to Rust is I am proposing there are no moves (except for moving to a different OS thread), thus there is no need to track lifetimes from input to output for the case of moving, e.g. when adding an element to a collection. In my proposal, the add method takes a GC-only input argument for the element to be added or it takes an unboxed *Type which cloned (copied) to the collection and is a borrowed argument:

trait AddBoxed<*ElementType>
  pour add(Self, *ElementType): Self|()

impl Array<*ElementType> : AddBoxed<*ElementType>
  pour add(self, x: *ElementType): Self|()
    if (self.length >= self.max_length) ()
    else
      self[self.length++] = x // this invokes Clone trait
      self

Note that the implementation doesn't need to be type annotated because the types can be inferred from the trait declaration:

trait AddBoxed<*ElementType>
  pour add(Self, *ElementType): Self|()

impl Array<*ElementType> : AddBoxed<*ElementType>
  add(self, x)
    if (self.length >= self.max_length) ()
    else
      self[self.length++] = x // this invokes Clone trait
      self

For non-boxed case, we can add new element types to the first-class union:

trait Add<ElementType>
  // If `A` is not already in the `ElementType`, `Self` must be `var`
  // and return assigned to that `var` because an element type is
  // added to the first-class union
  pour add<A>(Self<ElementType>, ::A): Self<ElementType|A>|()

impl Array<ElementType> : Add<ElementType>
  add(self, x)
    if (self.length >= self.max_length) ()
    else
      self[self.length++] = x
      self

@shelby3
Copy link
Author

shelby3 commented May 24, 2016

@shelby3 wrote:

@keean wrote:

Not macros! Sorry I have issues with macros :) if the language and type system is good enough, macros are not necessary.

I have been thinking the same way. Macros (especially powerful ones as Rust has) make code unreadable. So I explained that the language can do the main valid reasons we need C macros.

AST macros make code more difficult to read (especially for open source), except in the case of a DSL that is frequently used such that learning the DSL is worth the effort of the reader because of clarity of expression enabled by the DSL, e.g. a compile-time type SQL or HTML parser integrated with expressions from our language. The ability to create internal DSLs is more powerful:

http://www.paulgraham.com/avg.html

But Lisp's syntax is a fully generalized tree so that any desired structure for DSL code can be manipulated in a macro which is written in the same Lisp syntax:

What makes lisp macros so special

A compromise is a friendlier syntax which is sufficiently flexible to write many embedded DSLs but isn't as grotesque as Lisp. I should look into supporting Scala's syntax of allowing method names to be declared as infix operators and symbolic operators to be overloaded, with both right-to-left and left-to-right precedence variants, along with optional lazy evaluation:

http://hedleyproctor.com/2014/11/writing-a-dsl-with-scala-parser-combinators/

http://www.codecommit.com/blog/scala/the-magic-behind-parser-combinators

Scala also supports consuming a block as an input argument of the () (aka Unit) type in order to write DSL such as:

loop {
  i = i + 1
  v[i] = 0
}

Beyond that are AST macros which enable some of the full power of Lisp's macros:

https://news.ycombinator.com/item?id=3216202

http://www.scala-lang.org/old/node/12494.html#comment-56437

https://www.reddit.com/r/haskell/comments/1929xn/are_lispstyle_macros_a_code_smell/c8k7wl8

http://debasishg.blogspot.com/2009/07/macros-preprocessors-and-dsl.html

But these more powerful AST macros create the opportunity for "code smell":

http://www.scala-lang.org/old/node/12494.html#comment-56529

https://news.ycombinator.com/item?id=3216387

@shelby3
Copy link
Author

shelby3 commented May 25, 2016

@keean wrote:

My advice would be to stick to functions :) You don't want to have to look to the definition to see how arguments are going to be handled.

Which of my annotations on procedure signatures is this referring to? Are you suggesting I should not allow regular an irregular procedures and only allow pure functions? Or are you commenting about whether to allow AST macros?

@keean wrote:

My own system was going to borrow from smalltalk I think, so you could use brackets to change how code is evaluated, with '{' '}' effectively declaring a zero argument closure. This allows functions to have macro like behaviour (in that they take their arguments unevaluated). In this way you can simulate the behaviour of things like 'if':

if(bool, {x = x + 1}, {x = x - 1});[quote="keean, post:298, topic:5751"

Scala has => on function declarations to declare that an argument is lazy evaluated. It also has the Unit type so code blocks can be function inputs, which afaik is evaluated in the scope where the function block is expressed as a statement. These features can be useful for creating DSLs in Scala as I mentioned in my prior post.

@keean wrote:

Would not evaluate the arguments until they are used within the function, rather than pre-evaluating. The conclusion is first class functions and closures have all the power of 'C' style macros.

Agreed, I implied that earlier with my example for H. So you must be referring to my comments about macros and DSLs.

@keean wrote:

I would allow operator overloading like C++ allows function call

So you mean being able to apply a function call operator to any object instance. Scala has this by defining the apply method of an object.

@keean wrote:

array lookup and other operators to be overloaded, but the uses must be consistent with the operator. So for example overloading '+' should only be allowed for commutative operations. This means '+' for string concatenation is not allowed.

Scala reuses the function call operator (...) for array lookup because it uses [...] for type parameters. I'd prefer to use <...> for type parameters and use [...] for array lookup.

Scala allows operator overloading. Afair it doesn't enforces commutivity on + or *. It does allow any symbol soup for operators and even allows names to be operators. Prefixing with : gives the operator right-to-left precedence grouping. The symbol soup operators tend to make Scala look like gobblyedgook, such as the Scalaz library. Would you agree that only certain symbols can be overloaded?

@shelby3
Copy link
Author

shelby3 commented May 25, 2016

If ever allowing AST macros, it could be forced that each file that employes them has to have a special import to signify it. And this would be required to link to a Github or RFC for the macro, so that the open source community would frown on using macros that had not gained considerable community support, documentation, and adoption.

@shelby3
Copy link
Author

shelby3 commented May 25, 2016

One major safety issue with Nim is that sending garbage collected pointers between threads will segfault, last I checked (unless you use the Boehm GC). Moreover, Nim is essentially entirely GC'd if you want (thread-local) safety; there is no safety when not using the GC. So Nim is in a completely different category from Rust; Rust's "core strength" is memory safety without garbage collection, which Nim (or any other industry language, save perhaps ATS) does not provide at all.

I think that summarizes one thing my (our) proposed PL intends to do better than both Rust and NIm.

See this:

https://gradha.github.io/articles/2015/02/goodbye-nim-and-good-luck.html

Remember I said we need moves for passing between threads.

Note that apparently Nim has added a shared mutable heap since that linked post.

https://news.ycombinator.com/item?id=9051800

I've been going through the academic papers, forerunners, and source-code for Rust's static memory routines and borrowing semantics. My hope and suspicion is that it can be added to Nim without core changes to the language like lifetimes. It's definitely not a guarantee, but with lots of experience in both languages now I feel very strongly that adding region-based-memory-management to Nim is possible while adding Nim's clarity, abstractions, and efficiency to Rust feels impossible.

I'm not so sure. The trickiest part of getting memory safety without garbage collection working is not the lifetimes but the borrow check, which relies on inherited mutability and, most importantly, the lack of aliasable mutable data. The APIs and libraries of garbage collected imperative languages invariably depend on aliasable, mutable memory. Consider something as simple as a tree or graph data structure with mutable nodes. Or consider taking two mutable references to different indices of an array, or splitting an array into mutable slices with dynamically computed indices. These are all things you (presumably) can do today in Nim, and a borrow checker would break them. The likelihood that the library APIs depend on being able to do it is very high.I never say never: you could implement multiple types of references, some GC'd and some not, and copy the Rust borrowing semantics. But they would be incompatible with most existing APIs and libraries. I don't think it can be realistically retrofitted onto a language without breaking most APIs: aliasable, mutable data is just too common.

@shelby3
Copy link
Author

shelby3 commented May 25, 2016

@keean wrote:

Have you looked at Nim? Looks like it can do what you want:

proc onlyIntOrString[T: int|string](x, y: T) = discard

onlyIntOrString(450, 616) # valid
onlyIntOrString(5.0, 0.0) # type mismatch
onlyIntOrString("xy", 50) # invalid as 'T' cannot be both at the same time

That doesn't allow T to be a union. It forces quantification to a single type at the call (use) site:

By default, during overload resolution each named type class will bind to exactly one concrete type.

@shelby3
Copy link
Author

shelby3 commented May 26, 2016

@keean wrote:

@shelby3 wrote:

They didn't get the if-else correct, because : is lost in the soup of verbiage when used as the replacement for ternary:

http://nim-lang.org/docs/tut1.html#control-flow-statements-if-statement

I chose instead if (...) ... else ....

The ':' syntax for block statements like 'if' comes directly from Python, which is the most widely used language that uses significant whitespace, so its not that surprising they decided to go with what people might be familiar with.

I changed that aspect of my grammar to the same as Nim/Python, except no colon : when followed by scoping indentation:

statements:
  scope
  statement EOS statements    // EOS (end of statement) is issued before an INDENT or OUTDENT, else before a '\n' plus whitespace to the same column of current scope. Larger indents are not an EOS.

scope:
  INDENT statements OUTDENT   // INDENT and OUTDENT are a '\n' plus whitespace to the 2 space indented or any multple of 2 spaces outdented column.

condition-expression:
  'if' expression expr-or-scope { elseif } [ 'else' expr-or-scope ]
  expression

elseif:
  'elseif' expression expr-or-scope

expr-or-scope:
  ':' expression              // Scoping indentation required for nested `if` and assignment, same as for Nim. http://nim-lang.org/docs/tut1.html#statements-and-indentation
  EOS scope                   // Difference from Nim and Python is colon `:` not required when followed by scoping indentation.

@shelby3
Copy link
Author

shelby3 commented May 30, 2016

@keean wrote:

Try to avoid the dangling else problem:

http://www.parsifalsoft.com/ifelse.html

I did:

@shelby3 wrote:

expr-or-scope:
  ':' expression              // Scoping indentation required for nested `if` and assignment, same as for Nim. http://nim-lang.org> /docs/tut1.html#statements-and-indentation

@shelby3
Copy link
Author

shelby3 commented May 30, 2016

@keean wrote:

There was a comment on LTU today that has made me see the union type thing in a different light.

You can consider multi-method dispatch to be a form of first class pattern matching, and I think it will always result in a union type (not an intersection type) if you force all instances of the same method to have the same return type so:

fn size(x : String) -> i32 
fn size(x : i32) -> i32

Is modelled as something like

fn size args -> i32 {
    match args {
        (x : String) => ... ,
        (y : i32) => ...
    }
}

And the type would be:

size : String \/ i32 -> i32

Of course this is really a type class without the class:

trait Sizable {
    fn size(self) -> i32;
}

impl Sizable for String {
    fn size(self) -> i32 {...}
}

impl Sizable for i32 {
    fn size(self) -> i32 {...}
}

I think there is a reasonable argument that ad-hoc multi-methods are like untyped languages, and that type-classes bring some discipline and order to overloading.

In other words, as you write larger programs, ad-hoc multi-methods overloading will limit overall program size and complexity. Naming your type-classes is basically the same argument as naming your types. It seems to me someone who favours named types will eventually conclude that named type-classes are the right thing. If you prefer ad-hoc multi-method overloading you will probably prefer an untyped language?

@shelby3
Copy link
Author

shelby3 commented May 30, 2016

@keean wrote:

There was a comment on LTU today that has made me see the union type thing in a different light.

Some smart folks over at LtU, but I find often the comments there are not well developed and often incorrect. Or I mean trying to compete with dozens of minds simultaneously seems to not really refine the discussions well. It is just all over the place. Also because the PL design space is so vast.

I am thinking I probably prefer a design environment of talking with a fewer number of experts with mutual dedication to analyzing all the details with pedantic detail and commitment to not allow incorrect or poorly developed thoughts to linger.

@keean wrote:

You can consider multi-method dispatch to be a form of first class pattern matching, and I think it will always result in a union type (not an intersection type) if you force all instances of the same method to have the same return type

For performance, I would prefer those overloads to be resolved at compile-time. The compiler should mangle the function names to create two different functions.

I've read in the past that Scala folks think function overloading was a bad idea, but I need to locate and study the reasons they think that.

@keean wrote:

I think there is a reasonable argument that ad-hoc multi-methods are like untyped languages, and that type-classes bring some discipline and order to overloading.

Disagree. I view union types as a way to delay at compile-time the point in the program where the type class is applied. Without first-class anonymous (or nominal/named) union types, then we are forced to prematurely erase the data types to a trait object in order to build a collection (or any data structure) with the type parameter being a union of data types. Nominal unions force much additional boilerplate as compared to first-class, anonymous unions.

So unions aren't throwing away typing, nor are they replacing typeclasses. Rather they delay the point at which we apply the typeclass to the function bounds. This enables us to add new data types and new functions orthogonally, being a (afaik the first) complete solution to Wadler's Expression Problem.

@keean wrote:

If you prefer ad-hoc multi-method overloading you will probably prefer an untyped language?

Disagree with the premise.

Untyped (i.e. unityped) languages allow the freedom to do the typing in your head and not enforced by the compiler, thus expression of semantics is entirely unihibited, but this comes at a cost of human error and lack of communication of the semantics in open source to other readers.

Delaying the coercion of data types to typeclasses (or shudder to super types in an OOP subtyping anti-pattern), by retaining the union of explicit data types is compile-time typing. It is not about throwing away types, but rather about opening the extension in both dimensions of Wadler's stated Expression Problem (which as he wrote at the turn of the century was an old problem in computer science).

P.S. I am working on another software project for a few months, so I will set this PL work aside interim, but will remain interested in discussions. I may not reply as quickly though. I am still very serious about this PL project. Want to come back to it asap. Hope we maintain contact. Any way, we are connected on LinkedIn, so I could message you there if ever you were not available here.

@shelby3
Copy link
Author

shelby3 commented Jun 24, 2016

http://btcbase.org/log/2016-06-23#1487987

mircea_popescu: nothing actionable or useful that i can discern. just the simple observation that a) code the size of the body of work of respected past intellectuals is not maintainable by teams or individuals, while the writings were maintainable by the author solo ; b) the correct solution to scale is not increase in rigidity, but flexibility. which is how skyscrapers work or japan defends from earthquakes.
asciilifeform: or how about a lift that maybe-goes down or maybe-up
mircea_popescu: so it seems to me that perhaps we could say "obviously" the more strictly defined a language gets, the more catastrophic the necessary collapse of its large scale systems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment