Skip to content

Instantly share code, notes, and snippets.

@dvtate
Last active August 1, 2023 21:37
Show Gist options
  • Save dvtate/095fff401517bc52f5718a67f39e520d to your computer and use it in GitHub Desktop.
Save dvtate/095fff401517bc52f5718a67f39e520d to your computer and use it in GitHub Desktop.
Postfix Haskell Devlog Archive

Postfix Haskell Devlog

Here is a timeline of development for a programming language I worked on 2020-2023

2020.10.7 - Idea and PoC

This is a POC for first ~1/3 of compiler for new language I'm working on... This demo is just a shell for part of compiler that handles:

  • Lexing
  • Parsing (weird bc lang is postfix)
  • Some optimizations
  • Majority of type-checking and validation

image

2020.10.19 - Types Prose

I wrote some prose while planning out the type system. This example is smilar to using named-tuple in Python but methods/getters are defined as overloaded operators.

image

(typo: should be seq .value and seq .count, not $seq )

Here's a more complex example with 'inheritance'/parametric types. Not sure if everything here is doable, but definitely a goal. Also i don't like having to do _t suffix... feel like there should be a way to do it as prefix instead, maybe capital letters?

image

Forgetting to escape/unescape identifiers is going to be a running theme for me with this language. It's too similar to my previous language -- YodaScript which almost always uses escaped identifiers. Line 25 should be $sequence_t.

2020.11.9 - More Types Prose

I made some changes to my thinking for the typesystem. I know that I've definitely added some complexity but I think this design improves the consistency and adds some more useful tools instead of having types feeling out of place.

image image So I guess there are two distinct concepts

Types: Let us a way to check what data is/can be held by the value Classes: let us distinguish data of the same type but associated with a different purpose/functionality

So with this in mind, classes would be replaced with 'traits' which would just be numerical flags that you can add to a value to assign functionality implemented by checking to see if the argument has the given 'trait'. Some downsides but simplifies language a bit

So here's what that would look like, although I think this gives the user a more power and does a better job of exposing what's actually going on, there's some lost safety and way more boilerplate code so probably going to use plan from first post in thread image

2020.11.16 - TypeSystem Prototype

I made a demo over the weekend which adds types to the language.

image

2020.11.22 - More TypeSystem Implementation

Most of original plan implemented.

image

2020.11.30 - Starting on Standard library

Logical provided by standard library. To clarify, the compiler does provide logic out of the box but the operators are user-level. Also again, sorry the language isn't easy to read... really picked the worst of both worlds for that by making it stack-based and functional in addition of the other weird stuff image image

2020.12.8 - VSCode Syntax Highlighting Plugin

image

2020.12.13 - TypeSystem Demo

image image

2020.12.7 - Recursion

Recursion was a big painpoint... I came up with a way to implement it while taking a university exam but implementation was not so straightforward. image image

2020.12.22 - Terrible Namespace Prose

image image

2020.12.28 - Copiling to WebAssembly

image image

2021.1.25 - Using binaryen to clean up WebAssembly

image

2021.1.29 - Inline PH within JS/TS

The following demo shows the language getting used within a javascript program. It compiles a webassembly module from the PH code given in a string template literal.

image

Using short circuit logic without good optimizations led to some ugly WebAssembly being generated.

image

2021.2.7 - Binaryen optimizer

image

2021.3.13 - RECURSION

Recursion type inference algorithm:

  1. try to compile function as if non-recursive
  2. detect it's recursive, throw and start over
  3. trace to infer func type
  4. trace again, handling bindings for recursive call

and because there are no type annotations, we cannot recycle the inferred types. So every single recursive function invocation requires 3 attempts to compile it in order to determine all the context-sensitive datatypes and stuff

ocaml has a "rec" specifier for functions, if I were to adopt something this it would have made things slightly better (only 2 traces and less complexity), but I've already kinda done it the hard way so I prob should just finish it as is without doing weird stuff with syntax

image image

This only supported tail recursion.

2021.5.2 - String Literals

Some thinking was done to plan out the optimal way to pack the data for string literals into linear memory in order to reduce duplication. Strings are implemented as a pointer + an length as opposed to being null-terminated like in C.

image image

2021.5.5 - Host environment bindings

image

2021.5.7 - Hello World demo

image

2021.5.9 - Fizzbuzz demo

image image

Removed bug workaround:

image

2021.7 - Namespaces, modules

image I added namespaces. Plan is to make the import keyword return a namespace macro. So unfortunately this ugly syntax is probably gonna get used a lot

  • use: promotes everything within namespace
  • use_some: use all that match first regex except those that match second
  • include: load a file as a namespace (later changed to require)
  • rec: actually does stuff now and compiler trusts it (img 3 is when forgotten) :D

Images: demo mostly related

image image image image

2021.7.22 - Polymorphic JS imports

image image

2021.8.26 - Non-TCO recursion

Support was added for non-tail-recursive macro recursion

image

So I haven't posted on lang here in a while. Previously it only supported tail-recursive functions but that's obviously pretty useless so I reverted that and now it only sometimes does TCO. It works by moving the recursive parts into helper functions.

gnome-shell-screenshot-0X8N80

Image 1: Notice that the compiler passes lexically scoped variables to the recursive helper function as parameters so that they stay in scope

gnome-shell-screenshot-M7HO80

Image 2: it now recognizes and evaluates recursive functions where all the arguments are constants at compile time. Also a more realistic example without using binaryen's optimizer which removes some excess variables

How it works:

  1. it moves relevant lexically scoped variables/parameters to parameters of the helper function (bad memory scaling but probably best option)
  2. Compiler recognizes and evaluates recursive constant expressions at compile time

I also added the defer keyword to get around the constant propagation of part 2 in previous post in case someone was doing a really big calculation or something and intentionally wanted it to happen at runtime with the faster speeds of WASM... kinda niche.

image image

2021.9.7 - Moving Features to Standard Library using Inline Assembly

I moved a bunch of internal compiler features to the standard library by adding tools for inline assembly.

image image

Unfortunately these changes dramatically increased compile-time. I managed to determine that most of the problem was caused by the == operator which was overloaded to operate on a variety of datatypes as well as types themselves... This polymorphic behavior resulted in slower builds. Also writing the compiler in TypeScript/JS doesn't help.

2021.9.13 - Hacking direct memory access

Directly reading from and writing to linear memory isn't usually considered best practice by functional programmers but I wanted to make some demos where this would be useful.

image

2021.9.16 - Custom Compiler Macros

So I wanted a really niche compiler macro that gave me some relevant data for debugging so I just added one which runs provided js code using eval().

image

2021.9.23 - Using direct memory access to make art

image image image image image image

2021.9.17 - Breakout/single player pong - https://github.com/dvtate/breakout

Screenshot from 2023-08-01 15-20-30

2021.10.2 - Flappy bird clone demo - https://github.com/dvtate/phlappy-bird

image image image

2021.10.14 - Significant Syntax Change Planning

Planning a major syntax change in order to accomodate type signatures.

image

One issue is that not all values in the language have a type for example, type(I32) is undefined but it's convenient being able to pass these to macros for metaprogramming (and standard library relies on this). Also only way to give case/fun types is w/ wrappers (ie - (()(I32): act )).

Another example which actually shows how it can reduce boilerplate and even eliminate some of the usage of the metaprogramming features described above

image

2021.10.18 - Identified source of slow compile time

(syntactic sugar for multiple assignment was the culprit)

2021.10.19 - Added dedicated tuple syntax (...)

This replaced the older pack operator

image

2021.10.19 - Improving Compile Times

Plan is to improve compile-times (by making identifiers not capture their scope) but will have to refactor how namespaces and modules work (pretty sure I was drunk when I designed the first one) but anyway I think it should be better

Plan: Before -> After

image image

To clarify what I'm changing

10 $a =
5  { $a = { $a a } } @ @
# a - both say 5
:data
# $a - live: 5, with changes: 10
unescape :data

For whatever reason I thought that it would be intuitive to make escaped ids use the scope they were made in... but it's espensive and not useful

Added some syntactic sugar and a bunch of breaking changes. Also managed to reduce compile time by around 60%. I'm now updating all my demos so that they work again

image

2021.10.23 - Syntax Change done

I've just finished implementing the syntax change and updating the standard library and most of my demos. Compile time somehow decreased by half again (so now it's ~80% faster than before syntax change).

image image

2021.11.8 - GC

So I guess it's time for me to start moving towards a GC... sigh, this brings me so much pain to do... Currently the language is limited such that there's no need to have a GC (ie - everything statically allocated), but that's gonnna have to change it seems :/ Idk, just not super happy with this... Even tho it's basically just a more explicit version of what haskell does

Currently trying to think about all the different things I'm gonna need to store on the heap since planning out implementation for a moving GC was hurting my head

image

The issue with doing a moving GC in WASM is that part of the process is updating all the references to the objects stored in the nursery but WASM doesn't let you access values currently on the stack so the only real solution is to use a separate stack for refs or custom virt mem

Ok so I'm choosing to do the 'separate stack' approach. I now have like 75% of a plan for how to implement the runtime for the GC

I think there's enough work involved with implementing all the associated features that the compiler could double in size https://github.com/dvtate/postfix-haskell/blob/master/planning/implementation/lm.md

Wrote the GC runtime in WASM by hand (see link) now I just have to test it and then I'll have to figure out how to implement recursive types https://github.com/dvtate/postfix-haskell/blob/master/lib/rt.wat

Everything's working :D

Some general categories of bugs encountered:

  • endianness
  • forgetting to * sizeof(i32)
  • doing math for i64 instead of i32
  • forgettting to i32.load
  • general typos
  • etc.

image image

2021.12.13 - Mechanization

I wrote a paper regarding this language for a class on programming language theory. https://github.com/dvtate/PLT-coursework/blob/master/project/CS595_final_project.pdf

2022.6.20 - Compile-time only types

The type-system is now expanded with types for the compile-time only constructs in image 2 albiet with limited contexts

With these changes I've removed 3 keywords and improved compile-time ~30%

image image

2022.6.22 - Pretty printing type

Minor quality of life improvement for debugging types and stuff image

2022.6.23 - Enums

Working on enums now, using the rust version of the word since it's shorter than the more accurate "tagged-union" and the language already uses a lot of words incorrectly altho maybe will rename them to "variant" eventually.

The code on lines 17-26 is hard so for now I'm just adding match operator.

image

2022.6.25 - Big refactor

There are 3 parts to this:

  • Convert Source to IR (ie: type system): not done but also not problematic
  • Convert IR to WASM: Wow this is gonna have to be a pretty major refactor wtf
  • Runtime (ie: GC): mostly planned, tested, and documented

image

2022.6.26 - Thinking about improved memory usage

This change would eliminate the system with two deteached stacks at the cost of some increased compiler complexity and potentially minor performance hits. It would definitely be more intutive at least and probably better memory efficiency.

image

2022.6.26 - Fixed-point combinator macro/type syntax

Thinking about making this syntax change. It would allow recursion without binding to an identifier makes it easier to parse recursive enum types and improve compile time. Currently for classes and enums it generates the type every time it gets used. So lots of room for improvement.

image

2022.7.19 - Enums Demo

Made a better demo and by that I of course mean I stole the idea from Haskell's understanding monads wikibook (image 1).

Also went on a bit of a tangent implementing math functions in the standard library (images 2-4).

image image image image

2022.7.3 - Made my day

image

2022.7.28 - Recursive DataTypes

Recursive datatypes working at compile-time. Might have runtime demo later tonight.

The weird macro syntax on line 12 will probably be removed/optional eventually

Still unsure about making a dedicated list syntax to hide this ugliness like haskell does

image

Also a runtime demo:

image

2022.7.29 - Potential for optimizations

The generated code has a lot of room for optimizations as a result of how I've built the compiler. Thankfully there's a lot of cases where [A] negates [B] so eventually could probably add another compilation step

image image

Also TCO didn't understand the match keyword at the time.

Another idea I had was to add a form of scoping for locals. This way we can recycle variables thus reducing the amount of memory used and importantly for references it would allow objects to be free'd as soon as they're no longer needed.

image

2022.8.5 - String/list planning

I could emulate Haskell's String type by doing:

I32 $Char =
I32 List $String =

But below I explain why I probably won't make that the default string implementation.

Because in my lang, each node in a linked list has 128 bits of overhead (in addition to whatever actual data is being stored, chars would be represented as i32s), I'm thinking that the default string implementation should not be a linked list

image

Haskell has similar memory usage here (potentially worse, as they didn't include GC metadata)... But seems it's common to use different string implementations in Haskell due to this so plan is to just make equivalent to Data.Text the default

image image

2022.10.26 - List demo improvements

Made the list demo more interesting, also I started towards improving the memory efficiency by making some temporary locals generated by the compiler automatically removed so that they don't waste as much space since I noticed that wasm optimizers aren't able to do this by themselves image image image

2022.10.26 - Lost of optimism for project

I don't forsee this language fulfilling my original goals (ie - bad fit for target use case) but it's so far along at this point that I kinda want to just keep working on it so I can do cooler demos. But would be nice to have a new long-term side-project.

Screenshot from 2023-08-01 16-32-33

2023.8.1 - About this devlog archive

I started on a big demo in this language -- a raytracer. But I never was able to get it working and I honestly doubt that I ever will. This project was very fulfilling for me to work on. I learned a lot and wish I had a similar project in my life now that would challenge me to think and innovate. I leave this devlog archive as a way for people to look back at how the language came to be.

Please read this related blog post about the language: https://blog.dvtt.net/posts/ph.2022.5.6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment