A Big Advance

You were failed

Disclaimer

None of the information in this draft came from my wild dreams. What I did was dig through research sitting on public university domains. It is amazing what a high-speed unmetered internet connection and a human with a brain can do together. In short, I just assembled some existing material to satisfy what I needed: a language for ontology software creation, and a picture of how it ought to look in the 21st century. You may see the papers yourself here

Intro

Let me be frank with you. All popular languages in use are bad. They all share a set of common design failures, plaguing some, and sometimes all, aspects of what the programming discipline is meant to be: making models that can be used by someone else to build more useful things. Most languages that have big repositories of useful frameworks, utilities, and so on constrain the developer, and their own evolution, in one way or another.

Evidence for claims

All the problems can be separated into 4 major categories: adequacy of software artifact distribution, scale of unjustified language complexity and redundancy, restrictions of the type system, and variety of libraries.

  • Python: venv can be just acceptable, but can also be a nightmare. Conda is no better. No static checking = more problems if your program consists of more than 20 LOC. Unityped = hard to reason about things. Decorators are bad.
  • C++: started as a low-level Swiss Army knife with objects; now it's a fiendish amalgamation of ad hoc features (many more are being added; there will probably be 3 new ones by the time you finish reading this). Sadly, it has the biggest library of tools for almost anything. But who cares, because you'd probably drop it after your n-th encounter with CMake.
  • JS: one outstanding feature - each object is a record. INSANELY BLOATED, with no focus put on what is actually powerful. npm has too much trash. No static checks. None of the derivatives (TypeScript, CoffeeScript, etc.) fixed these problems.
  • Rust: half of the language just doesn't work. .into() is just freaking dumb. Rust macros are C++'s preprocessor plus Python's decorators, meaning they are bad, because implicit code transformations are bad. Rust looks like somebody's side project.

What happened?

A lot has happened since the second half of the past century: a big pile of research presented a multitude of new approaches to old and new problems, yet surprisingly few of them were put to good use. Among all of them, I want to focus your attention on a small set that can be considered to do the most evil.

It is quick to elaborate, and stems from a number of requirements that were not foreseen at the time those languages were conceived.

  1. Knowledge construction. People often - in the open source community, in particular - do not know all the relevant information that influences the design of a system. This implies that new relations and properties must be added to a system during its life and, to complicate this a bit, to frameworks with closed sources as well. One famous attempt to solve this problem is the OO paradigm, but as many have said: "OOP has failed us all!". Although this claim was made a relatively long time ago, most of the popular languages out there have picked the worst form of OO, which is object inheritance. The reason behind the hatred is that OO languages lock you into code designed under a closed-world modeling assumption; obviously this kills reuse. Newer languages like Go or Rust relax this strictness, but they still have their own sets of problems and deficiencies.
  2. Concurrent processes. Each complex system must have laws that govern everything observable that happens in it. Unfortunately, most mainstream languages do not even have explicit mutability control. There are only ugly workarounds.
  3. Artifact distribution. There was no open source back then. Frameworks were developed by companies to be used in companies.

In general, if a system fails to correctly model (at least part of) mathematical discourse, it is useless.

Guiding philosophy

I have looked at and tried a number of programming languages - from the past and the present - and spotted good and bad sides in them. Among the bad traits they possess, the most gnarly is a lack of unification. Instead of producing a core set from which other constructions can be generated, languages are often designed as a pile of ad hoc features that were never meant to be decomposed. This leads to pointless distractions in development and in fluency acquisition, caused by an overwhelming excess of unnecessary features. Some other relevant points:

  • A language ought to be a small set of meaningful, mutually complementary features to be used to create domain-specific tools.

  • No redundancy is allowed.

  • Anything important must be explicit, and never irritating.

  • Nothing should be implicit.

  • Complexity should not be viewed as a problematic property if it is justified. The point is making a really good tool for solving problems without creating new ones along the way.

  • While not simple, the language should be no harder than necessary, and not troublesome at all.

  • It is about reuse and about guarantees.

  • Reuse > speed. When reuse can only be achieved at a loss of performance, performance is to be sacrificed if no other options are present.

  • The language must not be slow. The acceptable margin is 30% deviation from C. If it is on par with Java, we can deem it a failure and go home.

  • No more than 1.5 ways to solve a problem are allowed. If there are more, something very wrong is happening.

  • Everything should be expected. No surprises!

  • Visual appearance matters. Your eyes are being burnt by monitor light anyway, so at least you can feel a little good about it!

  • Not everything should be put into the language. Let supporting tools, like IDEs and the like, do the things they can do better.

The idea

There are plenty of findings, if you would just look up at the bookshelf!

My attempt is named Guise. GuisePL has features from 𝛌Pω, and because it extends it with subtyping, modality, and other useful constructions, the resulting expressive power is high, as is the usability. I believe that.

Any project is placed in a .gscc container. It has all the information required to produce binary files from sources, whether these are executables or libraries. It should be possible to prune redundant dependencies. There are two types of distributed artifacts: libraries and executables. When the compiler resolves dependencies, it looks at the feature image (the things used in the sources), calculates the feature diff between versions, and chooses appropriate ones. Any executable can optionally be compiled by embedding all dependencies into the binary file.
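A minimal Rust sketch of what such feature-image-based resolution could look like; LibVersion, choose_version, and the rest are hypothetical illustrations, not part of any real Guise toolchain:

use std::collections::HashSet;

// Hypothetical model of one published version of a library.
struct LibVersion {
    version: String,
    features: HashSet<String>, // everything this version exports
}

/// Pick the first version whose feature set covers the "feature
/// image" - the set of features the sources actually use.
fn choose_version<'a>(
    feature_image: &HashSet<String>,
    versions: &'a [LibVersion],
) -> Option<&'a LibVersion> {
    versions.iter().find(|v| feature_image.is_subset(&v.features))
}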

Artifact distribution may also look like this:

  • NLP-driven tooling, even for docgen
  • each package may get a dedicated forum
  • an optional paid model of development, via a maintenance collective whose work will be shaped on a forum provided by the package hosting platform. (A collective is a finite set of people. It is created by the project initiator, and later new people can be designated as part of the collective. All earnings are split according to the amount of contribution.)

Overview

Type System

The type system is named the Eclectic Type Framework. It is composed of various features; among them are 8 sorts that denote objects: actor types, record types, variant types, function types, closure types, mutability type markers, a predicative sort of predicates for data objects that includes 2 modalities describing what can happen and what must happen, plus 2 modalities describing when that is the case (now or later), and all valid permutations thereof; and a predicative sort of predicates for functional objects, which includes the may and must modalities plus 2 predicate-specific modalities, precondition and postcondition; naturally, all valid permutations thereof exist as well. There are also many relations and axioms, but they deserve another, much longer and more thorough written explanation.

The system has polymorphism, type operators (with curried application), dependent types, and subtyping for elements of all sorts (with some exceptions).

Because this framework has arbitrary values that can be equated only by evaluation, type inference is in general undecidable. It also cannot be trivially resolved at module boundaries; for that, automated proving based on game semantics will be used. It is expected to work in most cases.

module A { fn 'make string': Nothing -> °String = { "a" } }
module B {
  use A
  fn 'id of #': (val: &String) -> &String where {
    must forall val : { precondition '('(val) contains ("a")') is equal to (.True)' }
  } = { val in val }
  fn main = Environment.'show ('id of ('make string'())') if possible'
}

Looking at the example above, it is easy to see that the compiler could potentially resolve this case, because all values involved in forming the type signature are present at compile time. It can infer that the value produced by 'make string' will indeed satisfy the requirement of 'id of #', which implies that this particular piece of code would compile under a hypothetical reference semantics. Of course, when values are not inferable, either runtime checks must be present to ensure invariant violations do not sneak into the program, or another approach is taken, which is to automatically search for a contradiction.

Records & Variants

The two most important schemes of data representation - rec and enum - are found everywhere in math and philosophy. Both nominal and anonymous versions are present.

rec Person { name: String }
enum 'Compass Direction' { North, South, West, East }
enum 'Life path' { Loner, Social: rec {peers: Array{Element=Person}} }

-- enum eliminator
? <# expr #> | Option1 => <# expr #> | Option2 if <# cond #> => <# expr #> | _ = <# expr #>

-- nested elimination
? <# expr #>
	| Option1 => <# expr #> 
	| Option2 if <# cond #> => <# expr #> 
	| _ = (? <# expr #> | Option1 => <# expr #> | Option2 if <# cond #> => <# expr #> | _ = <# expr #>)

Object Algebra

The category of these objects is a lattice-like structure with 4 binary operations. They are left-associative and noncommutative. They can be used on types and on instances. They do not collide with binary plus for numbers, because the latter does not exist; only methods and non-mixfix free functions are present in the language.

  • + that is used in absence of name collision
  • |+ that is used to transfer value/type from left operand
  • +| that is used to transfer value/type from right operand
  • - that is used to destroy values in the left operand

Instances of products can be concatenated; instances of sums cannot be.
// here presence of * means instance, absence means type or kind
rec* + rec*; rec + rec; rec* |+ rec*; rec* +| rec*;
enum + enum; enum |+ enum; enum +| enum; 
enum* + enum;

rec - rec; rec* - rec;
enum - enum;

Any variation of the following is an error:

enum* + enum*; rec* + rec; rec + enum; enum + enum*;
enum* - enum;

When applied to certain types, the operations of this object algebra have exceptions:

T - rec {...} // When T is not a subtype of rec {...}, - is idempotent
Nothing - rec {...} // Is idempotent
T + rec {...} // Is idempotent when lhs <: rhs

Dependent types are concatenated as follows:

rec that A + rec that B = rec that {>0} + rec that {>100}
//When B <: A, B is picked as the defining constraint, because B is enough to imply A

  • they can be concatenated
rec Aged {age: Integer}; rec Named {name: String}
type 'Somebody that Can Participate in Group Activities' = Aged '+ Named
enum 'South American Parrots' {...}
enum 'African Parrots' {...}
type 'Parrots Available for Sale' = 'South American Parrots' '+ 'African Parrots'

a := enum {a, b: String} {.b="!"}
b := enum {c, b: Int} {.b=1}
c : enum {a, b: String, c} = a |+ b //should be error
'correct c' : a |+ b = a //now we're ok

name := "Thomas", age := 19
person : rec {name:String, age:Integer} = {name} + {age}
  • they can be contracted
type 'Some Parrots' = 'Parrots Available for Sale' - 'African Parrots'
'Some Parrots' == 'South American Parrots'

The language has two entities that can store types, and a type extraction operator.

type T = ... //stores saturated type
kind E = ... //stores type with all variables erased from it

p := ""
type String = #p // # is type extraction operator

with L { rec Poly {a: _ L} }
l := L{a=""}
kind U = #l // rec Poly {a: _ *}

A subtyping relation exists:

  • for records
rec A { a: `i T } >: rec B { a: `k >: `i T, b: K }
// B can be used where A is expected
  • for variants
enum A { P } <: enum B { P, L }
// A can be used where B is expected
  • for mutability
type 'Imutable String' = String
type 'Mutable String' = &String
type 'Sendable String' = ^String
type 'MutSend String' = 'Mutable String' '+ 'Sendable String'
type 'New Born String' = °String
// rhs can be used where lhs is expected
T <: °T, &T <: °T, ^&T <: °T, ^T <: °T
^T <: ^&T, &T <: ^&T, T <: ^&T

The clone operation can duplicate any data object:
a := 0 // a is immutable
b := (clone a) as & // copy 0 and make it mutable
  • for types
kind L ?<: kind P
kind H !<: type M
type T ?<: type F
  • for predicates

Since predicates in general do not have a subsumption relation, the rule for subtyping is not trivial. If predicate A is stricter than B, A is always picked. If predicates A and B are unrelated, their conjunction is used for refinement (see the sketch below).
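A minimal Rust sketch of that refinement rule; Pred, implies, and refine are hypothetical stand-ins, and a real checker would decide strictness semantically rather than syntactically:

#[derive(Clone, PartialEq)]
enum Pred {
    Atom(String),
    And(Box<Pred>, Box<Pred>),
}

/// True when `a` is known to be at least as strict as `b`.
/// Only the trivial syntactic cases are handled here.
fn implies(a: &Pred, b: &Pred) -> bool {
    if a == b {
        return true;
    }
    match a {
        Pred::And(l, r) => implies(l, b) || implies(r, b),
        _ => false,
    }
}

/// Combine two predicates per the subtyping rule: the stricter
/// one wins; unrelated predicates are conjoined.
fn refine(a: Pred, b: Pred) -> Pred {
    if implies(&a, &b) {
        a
    } else if implies(&b, &a) {
        b
    } else {
        Pred::And(Box::new(a), Box::new(b))
    }
}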

Context introduction

It is possible to define a context within whose scope certain properties hold. Among them can be: subtyping rules, mutability orders, and active cases.

with T <: rec {} { ... } // T can be any type that is subtype of empty record
with `i <: °, `p >: `i { ... } // `i T can have any mutability that is a sub of 'newborn',
                               // and `p can be any supertype of the concrete mutability `i

Actors

They are great for building correct, concurrently executing code. They also lead to better decoupling of functionality.

with Request, `i <: °, `k >: `i {
  actor 'HTTP Server' {
	request : `i Request
	'respond to request' : (request: `k Request) -> Nothing = { ... }  
  }
}

Type Operators

These are the type binders from the lambda cube.

with Type {
  rec A { a: _ Type }
}
type B = A{Type: _ Collection{Element=Character}}
a := B{a="Hello, World!"}

The general syntax is:

with L { rec K {a: _ L } }
type P = K {L=String} //Explicit substitution, only recs and enums are acceptable

kind I = K {L <: Int} //Boundary specification. Types and concepts are acceptable

Dependent types

rec Person {name: String}
fn '# contains only Mandarin hieroglyphs': (name: String) -> °Boolean = { ... }
type 'Chinese Person' = Person that {
	must forall $.name : { '('(name) contains only Mandarin hieroglyphs') is equal to (.True)'}
}

Polymorphism

with 'Anything Named' <: rec {name: _ String}, `i >: & {
  fn 'perform some operation on #': (object: `i 'Anything Named') -> Nothing = { ... }
  rec 'My awesome record' { field: `i 'Anything Named' }
}

Modules

Maybe modules could be unified with records.

module 'Most Useful Things in the World' {
	use Flashlight
	type World
	fn 'illuminate the world #': (world: World) -> Nothing
}
module 'Routine Exterminating Pack' {
	use 'Most Useful Things in the World'{
		World = 'Planet Earth'
		'illuminate the world' = { ... }
	}
}

Modeling

In Guise, Concepts define models.

with `mb >: &, T <: rec {'underlying storage': `mb 'Memory Buffer', 'number of elements': Integer}, 
Element, 'Insertion Result' >: enum {'insertion has completed successfully', 'insertion has failed'}  {
	concept Set for T {
		with `i <: `mb {
			fn 'put new element inside': 
			(self: `i Self) -> (element: `i Element) -> (res: °'Insertion Result')
			where {
				must forall self : { 
          precondition '('(self.'underlying storage') is containing (element)') is equal to (.False)' 
          postcondition '('(self) is containing (element)') is equal to (.True)'
          postcondition match res | 'insertion has completed successfully' = True | _ = False
        }
}}}}

This is a condensed example, though it is enough to demonstrate that, in order to abstract over a collection, any implementation must behave according to the specification. This allows you to use any valid data type where a Set is expected. The rules here are that the data object must not already contain the element being inserted, and that insertion must store exactly one element inside. As a result, a higher degree of reuse is possible.

It is reasonable to object that this does not let you say 'the object before mutation should not contain the element being inserted, and the object after must contain it'. But that would require somehow accessing the object from the past and from the future at the same time, which is possible only in the world of reversible computation, and von Neumann machines are not known to do such a thing. So a witness is required if you want this kind of enforcement.

Cases

Cases bind methods to objects. This feature allows decoupling the implementation of methods from the objects themselves.

type 'NASA Standard Hash Set' = rec { ... } + ...
case 'Linear Insertion' of 'NASA Standard Hash Set' {
	fn 'insert # at # in #': Element -> Index -> Self -> Nothing = { ... }
}
case 'Linear Search' of 'NASA Standard Hash Set' {
	fn 'find star by name # in #': ...
}

Then it ought to be used like this:

'star set' := 'NASA Standard Hash Set' . 'load from file (...)'
with 'Linear Search'  {
	star := 'find star by name ("Alpha Centauri") in ('star set')'
}
with 'Linear Insertion' {
	 'insert (star) at index (966) in ('another star set')'
}

There is no implicit conversion happening. Each case can access what is already present in the specified object. Cases do not destroy implementations the way |+ and +| would.

Type checking

Computation might not terminate, but this is ok. Also, checking equality of types that depend on terms containing arbitrary expressions is undecidable. Yet I believe the language should be capable of describing complex interactions and properties; it may not be necessary to do so with the rigor expected from proof assistants, since the main goal of this language is software creation. A good thing, though, is that there exists a decidable fragment called liquid types, and it is expected that most uses of dependent types will be covered by it. The modal fragment, unfortunately, is undecidable in general; still, a subset of it is decidable.

Runtime

There are known techniques of making a decent runtime model. Swift and Rust are sources of insights and inspiration.

Data types

Each datatype instance is manipulated through a handle containing 3 pointer-sized words. The first one is the address of the object, or the object itself (if it is POD); the second one is a normalized accessor record, which is a vector containing null-separated, pattern-terminated 64-bit values whose upper 16 bits function as a tag. The remaining 48 bits represent a hash of a mangled name. Mangled names are part of a global namespace of the program.
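A minimal Rust sketch of how such a handle could be laid out on a 64-bit target; the field names are illustrative, and since the draft leaves the third word unspecified, it is modeled as an opaque word:

// Hypothetical 3-word handle; not a specified ABI.
#[repr(C)]
struct Handle {
    /// Address of the object, or the object itself if it is POD.
    object: usize,
    /// Normalized accessor record: null-separated,
    /// pattern-terminated 64-bit entries.
    accessors: *const u64,
    /// Third word (purpose not specified in the draft).
    extra: usize,
}

/// Split one accessor entry into its 16-bit tag (upper bits)
/// and the 48-bit hash of a mangled name (lower bits).
fn split_entry(entry: u64) -> (u16, u64) {
    ((entry >> 48) as u16, entry & 0x0000_FFFF_FFFF_FFFF)
}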

When a datatype is generic, it is referred to indirectly and manipulated through its associated accessor record.

Every datatype header has a stable address during its lifetime.

Records

Monomorphic records containing only POD data are like C structs. When compiled without dynamic features enabled, the compiler finds all usages of subtyping and reorders the fields of records so that lookup is performed without an extra step of indirection. Each field of an instance is either a POD value or an address.
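An illustration of the idea in Rust, with hypothetical types A and B: if B is a subtype of A and the compiler lays out A's fields first, code expecting an A can read a B directly, with no indirection:

// B <: A; keeping A's fields as a prefix of B means a reference
// to B can be reinterpreted as a reference to A's layout.
#[repr(C)]
struct A {
    a: u64,
}

#[repr(C)]
struct B {
    a: u64, // A's field kept first
    b: u64, // B's own extra field
}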

Variants

Enums without payloads are just numbers. If they carry a payload, there is an additional pointer to the location of that payload.
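Sketched in Rust under the same assumptions (names are illustrative):

// A payload-free variant is a bare number; a payload-carrying
// one adds a pointer to the payload's location.
#[repr(C)]
struct PlainVariant {
    tag: u64,
}

#[repr(C)]
struct PayloadVariant {
    tag: u64,
    payload: *const u8,
}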

Actors

At runtime, each actor will be represented as a struct of 4 components:

  • u5 write head: same as the read head, but for writing
  • u5 read head: a 16-bit number with the first 11 bits pointing to the current ring buffer, and the remaining 5 pointing to an element inside it
  • [(u48, u16);32] ops: a ring buffer containing 32 slots for pointer-sized words with the upper 16 bits reserved. If no space is available to put an invocation on the actor's queue, the calling site waits until the actor handles an existing job and frees a slot.
  • [u64;n<=64] surfuncs: a fixed array with up to 64 pointers to functions. It can be stripped down to match the actual number of surface functions.

Obviously, dense packing is possible. If all the space in the buffers is occupied, then invocation of any surface function will be paused (it will go into an atomic compare-wait loop). Each prelude thunk modifies the actor's numbers atomically.

The message queue will be organized as a ring buffer containing pointers (u48) to arguments. The upper 16 bits of each pointer are reserved for tags. Among them, 7 are used for actor-specific purposes: 1 is a flag indicating that the address was visited, meaning the slot in the buffer is free, and the remaining 6 specify which surface function of the related actor should be invoked. Because of this design, actors cannot have more than 64 surface functions.
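A small Rust sketch of this per-actor state and slot encoding; the struct shape follows the component list above, while the exact bit positions of the flag and the function index are assumptions, since the draft fixes only their meaning:

// Hypothetical per-actor runtime state (the u5 heads are widened
// to u16, matching the 16-bit layout described above).
struct ActorState {
    write_head: u16,        // 11 bits: ring buffer id, 5 bits: slot
    read_head: u16,         // same layout, used for reading
    ops: [u64; 32],         // ring buffer of (u48 ptr, u16 tag) slots
    surfuncs: Vec<fn(u64)>, // up to 64 surface-function pointers
}

const PTR_MASK: u64 = 0x0000_FFFF_FFFF_FFFF;
const FREE_FLAG: u64 = 1 << 48; // address was visited; slot is free

/// Pack an argument pointer and a surface-function index into a slot.
fn pack(arg_ptr: u64, surface_fn: u8) -> u64 {
    debug_assert!(surface_fn < 64); // only 6 tag bits are available
    (arg_ptr & PTR_MASK) | ((surface_fn as u64) << 49)
}

/// Recover the pointer, the function index, and the free flag.
fn unpack(slot: u64) -> (u64, u8, bool) {
    let ptr = slot & PTR_MASK;
    let func = ((slot >> 49) & 0x3F) as u8;
    let free = slot & FREE_FLAG != 0;
    (ptr, func, free)
}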

The runtime is going to schedule work using a work-stealing algorithm. Each invocation of an actor's surface function puts a new pointer to the argument on the queue, which later invokes the associated function on that argument.

In no-resilience mode, the compiler could analyze the topology of the actor network and put prefetch instructions where necessary.

Is it possible to implement?

In short, it is extremely hard. The decision problem isn't going to solve itself, but there is a promising potential solution, which is concolic execution paired with neural networks. Since information about the system is highly coherent, something like a proof assistant based on game semantics could search for contradictions and make reasoning at least possible to a certain degree. Still, the search time would not be small. In the current state of affairs it might take dozens of hours to find spec violations, if any; this procedure may also fail. Still, one could easily opt out of guarantees for faster compilation, or reuse certain proofs about code from a common database.
