Here is a timeline of development for a programming language I worked on 2020-2023
This is a POC for first ~1/3 of compiler for new language I'm working on... This demo is just a shell for part of compiler that handles:
- Lexing
- Parsing (weird bc lang is postfix)
- Some optimizations
- Majority of type-checking and validation
I wrote some prose while planning out the type system. This example is smilar to using named-tuple in Python but methods/getters are defined as overloaded operators.
(typo: should be seq .value and seq .count, not $seq )
Here's a more complex example with 'inheritance'/parametric types. Not sure if everything here is doable, but definitely a goal. Also i don't like having to do _t suffix... feel like there should be a way to do it as prefix instead, maybe capital letters?
Forgetting to escape/unescape identifiers is going to be a running theme for me with this language. It's too similar to my previous language -- YodaScript which almost always uses escaped identifiers. Line 25 should be $sequence_t.
I made some changes to my thinking for the typesystem. I know that I've definitely added some complexity but I think this design improves the consistency and adds some more useful tools instead of having types feeling out of place.
So I guess there are two distinct concepts
Types: Let us a way to check what data is/can be held by the value Classes: let us distinguish data of the same type but associated with a different purpose/functionality
So with this in mind, classes would be replaced with 'traits' which would just be numerical flags that you can add to a value to assign functionality implemented by checking to see if the argument has the given 'trait'. Some downsides but simplifies language a bit
So here's what that would look like, although I think this gives the user a more power and does a better job of exposing what's actually going on, there's some lost safety and way more boilerplate code so probably going to use plan from first post in thread
I made a demo over the weekend which adds types to the language.
Most of original plan implemented.
Logical provided by standard library. To clarify, the compiler does provide logic out of the box but the operators are user-level. Also again, sorry the language isn't easy to read... really picked the worst of both worlds for that by making it stack-based and functional in addition of the other weird stuff
Recursion was a big painpoint... I came up with a way to implement it while taking a university exam but implementation was not so straightforward.
The following demo shows the language getting used within a javascript program. It compiles a webassembly module from the PH code given in a string template literal.
Using short circuit logic without good optimizations led to some ugly WebAssembly being generated.
Recursion type inference algorithm:
- try to compile function as if non-recursive
- detect it's recursive, throw and start over
- trace to infer func type
- trace again, handling bindings for recursive call
and because there are no type annotations, we cannot recycle the inferred types. So every single recursive function invocation requires 3 attempts to compile it in order to determine all the context-sensitive datatypes and stuff
ocaml has a "rec" specifier for functions, if I were to adopt something this it would have made things slightly better (only 2 traces and less complexity), but I've already kinda done it the hard way so I prob should just finish it as is without doing weird stuff with syntax
This only supported tail recursion.
Some thinking was done to plan out the optimal way to pack the data for string literals into linear memory in order to reduce duplication. Strings are implemented as a pointer + an length as opposed to being null-terminated like in C.
Removed bug workaround:
I added namespaces. Plan is to make the import
keyword return a namespace macro. So unfortunately this ugly syntax is probably gonna get used a lot
- use: promotes everything within namespace
- use_some:
use
all that match first regex except those that match second - include: load a file as a namespace (later changed to require)
- rec: actually does stuff now and compiler trusts it (img 3 is when forgotten) :D
Images: demo mostly related
Support was added for non-tail-recursive macro recursion
So I haven't posted on lang here in a while. Previously it only supported tail-recursive functions but that's obviously pretty useless so I reverted that and now it only sometimes does TCO. It works by moving the recursive parts into helper functions.
Image 1: Notice that the compiler passes lexically scoped variables to the recursive helper function as parameters so that they stay in scope
Image 2: it now recognizes and evaluates recursive functions where all the arguments are constants at compile time. Also a more realistic example without using binaryen's optimizer which removes some excess variables
How it works:
- it moves relevant lexically scoped variables/parameters to parameters of the helper function (bad memory scaling but probably best option)
- Compiler recognizes and evaluates recursive constant expressions at compile time
I also added the defer
keyword to get around the constant propagation of part 2 in previous post in case someone was doing a really big calculation or something and intentionally wanted it to happen at runtime with the faster speeds of WASM... kinda niche.
I moved a bunch of internal compiler features to the standard library by adding tools for inline assembly.
Unfortunately these changes dramatically increased compile-time. I managed to determine that most of the problem was caused by the == operator which was overloaded to operate on a variety of datatypes as well as types themselves... This polymorphic behavior resulted in slower builds. Also writing the compiler in TypeScript/JS doesn't help.
Directly reading from and writing to linear memory isn't usually considered best practice by functional programmers but I wanted to make some demos where this would be useful.
So I wanted a really niche compiler macro that gave me some relevant data for debugging so I just added one which runs provided js code using eval().
2021.9.17 - Breakout/single player pong - https://github.com/dvtate/breakout
2021.10.2 - Flappy bird clone demo - https://github.com/dvtate/phlappy-bird
Planning a major syntax change in order to accomodate type signatures.
One issue is that not all values in the language have a type for example, type(I32)
is undefined but it's convenient being able to pass these to macros for metaprogramming (and standard library relies on this). Also only way to give case/fun types is w/ wrappers (ie - (()(I32): act )
).
Another example which actually shows how it can reduce boilerplate and even eliminate some of the usage of the metaprogramming features described above
(syntactic sugar for multiple assignment was the culprit)
This replaced the older pack
operator
Plan is to improve compile-times (by making identifiers not capture their scope) but will have to refactor how namespaces and modules work (pretty sure I was drunk when I designed the first one) but anyway I think it should be better
Plan: Before -> After
To clarify what I'm changing
10 $a =
5 { $a = { $a a } } @ @
# a - both say 5
:data
# $a - live: 5, with changes: 10
unescape :data
For whatever reason I thought that it would be intuitive to make escaped ids use the scope they were made in... but it's espensive and not useful
Added some syntactic sugar and a bunch of breaking changes. Also managed to reduce compile time by around 60%. I'm now updating all my demos so that they work again
I've just finished implementing the syntax change and updating the standard library and most of my demos. Compile time somehow decreased by half again (so now it's ~80% faster than before syntax change).
So I guess it's time for me to start moving towards a GC... sigh, this brings me so much pain to do... Currently the language is limited such that there's no need to have a GC (ie - everything statically allocated), but that's gonnna have to change it seems :/ Idk, just not super happy with this... Even tho it's basically just a more explicit version of what haskell does
Currently trying to think about all the different things I'm gonna need to store on the heap since planning out implementation for a moving GC was hurting my head
The issue with doing a moving GC in WASM is that part of the process is updating all the references to the objects stored in the nursery but WASM doesn't let you access values currently on the stack so the only real solution is to use a separate stack for refs or custom virt mem
Ok so I'm choosing to do the 'separate stack' approach. I now have like 75% of a plan for how to implement the runtime for the GC
I think there's enough work involved with implementing all the associated features that the compiler could double in size https://github.com/dvtate/postfix-haskell/blob/master/planning/implementation/lm.md
Wrote the GC runtime in WASM by hand (see link) now I just have to test it and then I'll have to figure out how to implement recursive types https://github.com/dvtate/postfix-haskell/blob/master/lib/rt.wat
Everything's working :D
Some general categories of bugs encountered:
- endianness
- forgetting to * sizeof(i32)
- doing math for i64 instead of i32
- forgettting to i32.load
- general typos
- etc.
I wrote a paper regarding this language for a class on programming language theory. https://github.com/dvtate/PLT-coursework/blob/master/project/CS595_final_project.pdf
The type-system is now expanded with types for the compile-time only constructs in image 2 albiet with limited contexts
With these changes I've removed 3 keywords and improved compile-time ~30%
Minor quality of life improvement for debugging types and stuff
Working on enums now, using the rust version of the word since it's shorter than the more accurate "tagged-union" and the language already uses a lot of words incorrectly altho maybe will rename them to "variant" eventually.
The code on lines 17-26 is hard so for now I'm just adding match
operator.
There are 3 parts to this:
- Convert Source to IR (ie: type system): not done but also not problematic
- Convert IR to WASM: Wow this is gonna have to be a pretty major refactor wtf
- Runtime (ie: GC): mostly planned, tested, and documented
This change would eliminate the system with two deteached stacks at the cost of some increased compiler complexity and potentially minor performance hits. It would definitely be more intutive at least and probably better memory efficiency.
Thinking about making this syntax change. It would allow recursion without binding to an identifier makes it easier to parse recursive enum types and improve compile time. Currently for classes and enums it generates the type every time it gets used. So lots of room for improvement.
Made a better demo and by that I of course mean I stole the idea from Haskell's understanding monads wikibook (image 1).
Also went on a bit of a tangent implementing math functions in the standard library (images 2-4).
Recursive datatypes working at compile-time. Might have runtime demo later tonight.
The weird macro syntax on line 12 will probably be removed/optional eventually
Still unsure about making a dedicated list syntax to hide this ugliness like haskell does
Also a runtime demo:
The generated code has a lot of room for optimizations as a result of how I've built the compiler. Thankfully there's a lot of cases where [A] negates [B] so eventually could probably add another compilation step
Also TCO didn't understand the match
keyword at the time.
Another idea I had was to add a form of scoping for locals. This way we can recycle variables thus reducing the amount of memory used and importantly for references it would allow objects to be free'd as soon as they're no longer needed.
I could emulate Haskell's String type by doing:
I32 $Char =
I32 List $String =
But below I explain why I probably won't make that the default string implementation.
Because in my lang, each node in a linked list has 128 bits of overhead (in addition to whatever actual data is being stored, chars would be represented as i32s), I'm thinking that the default string implementation should not be a linked list
Haskell has similar memory usage here (potentially worse, as they didn't include GC metadata)... But seems it's common to use different string implementations in Haskell due to this so plan is to just make equivalent to Data.Text the default
Made the list demo more interesting, also I started towards improving the memory efficiency by making some temporary locals generated by the compiler automatically removed so that they don't waste as much space since I noticed that wasm optimizers aren't able to do this by themselves
I don't forsee this language fulfilling my original goals (ie - bad fit for target use case) but it's so far along at this point that I kinda want to just keep working on it so I can do cooler demos. But would be nice to have a new long-term side-project.
I started on a big demo in this language -- a raytracer. But I never was able to get it working and I honestly doubt that I ever will. This project was very fulfilling for me to work on. I learned a lot and wish I had a similar project in my life now that would challenge me to think and innovate. I leave this devlog archive as a way for people to look back at how the language came to be.
Please read this related blog post about the language: https://blog.dvtt.net/posts/ph.2022.5.6