Skip to content

Instantly share code, notes, and snippets.

@Kroc
Last active August 7, 2023 21:19
Show Gist options
  • Save Kroc/62fd60dda68f667e0e4d94c9e08bf2af to your computer and use it in GitHub Desktop.
Save Kroc/62fd60dda68f667e0e4d94c9e08bf2af to your computer and use it in GitHub Desktop.
Pling!

Pling!

A nice looking functional, concatenative programming language for minimal systems.
By Kroc Camen.

  • Simple left-to-right push-down parser. No look-ahead!
    Intended for assembling & execution on very constrained systems
  • Easy to tokenise: anything separated by spaces is a "function"

Comments:

# a comment begins with a hash mark and a space
# -- the space is required because the hash mark is a
# function that reads the rest of the line and discards it

Constants:

A constant is a function that always returns the same value.

let true    1
let false   0

Variables:

A variable is a function that returns its current value.
It must be defined ahead of time and must have a default value.

var is_the_queen_dead false

A variable's value is changed with the set function.

set is_the_queen_dead true

Numbers:

let DECIMAL     0
let HEXADECIMAL $01
let BINARY      %00000001
let FLOAT       1.0

Arithmetic:

Arithmetic is done purely left-to right, there is no operator precedence. The result of an infix calculation (e.g. 4 + 5) is totalled before proceeding to the next operator (e.g. * 3). This behaviour is intentional for simplicity of parsing, particularly by 8-bit CPUs and almost entirely does away with the need to nest parentheses.

Since there is no look-ahead, the brackets are required to indicate an expression that must be evaluated to produce the result -- an expression can be thought of a small inline function that produces a value.

set number ( 4 + 5 * 3 )
set number ( number + 1 )

An expression is the only place operators may be used.

Operators:

let ADD             ( 1 + 1 )
let SUBTRACT        ( 1 - 1 )
let MULTIPLY        ( 1 * 1 )
let DIVIDE          ( 1 / 1 )
let EXPONENT        ( 2 ^ 1 )
let MODULO          ( 10 % 2 )
let LOGICAL_OR      ( 0 or 1 )
let LOGICAL_AND     ( 1 and 1 )
let BINARY_OR       ( 0 | 0 )
let BINARY_AND      ( 0 & 0 )
let BINARY_XOR      ( 0 ~ 0 )
let EQUAL           ( 0 = 0 )
let NOT_EQUAL       ( 0 != 0 )
let LESS            ( 0 < 0 )
let GREATER         ( 0 > 0 )
let LESS_EQUAL      ( 0 <= 0 )
let GREATER_EQUAL   ( 0 >= 0 )

Strings:

let greeting "Hello, World!"

Functions:

A lambda is a fixed, immutable list of values. A "value" is a number, a string, an expression, a function, other lambdas, and any other types.

Lambdas begin with : and end with ;.

: cat sit mat … ;

Any functions calls or expressions in the lambda are not evaluated until execution; the expressions are stored in the lambda in a frozen, uncalculated state.

A function is a lambda with a name.
A function is defined by a name and a lambda of instructions:

fn three :
    ( 1 + 2 )
;

Functions do not define a parameter list up-front, instead they take their parameters from the instruction stream when desired using the get function. Therefore a function could read a different number of parameters depending on what's read!

fn add :
    get first
    get second

    ( first + second )
;
echo add 1 2                # prints "3"

Note how functions return values by evaluation. So long as a value is not being used as a parameter to a function call, it is returned from the function.

Local variables (and constants) can be defined within functions and exist only within the function scope. The get function acts the same as var, defining the variable, but also retrieving the parameter at the same time.

fn add :
    get first
    get second
    var third ( first + second )
    
    third
;
echo add 1 2                # prints "3"

Conditionals:

An if block takes any value, including an expression, and a lambda to execute if the value resolves to true.

fn max :
    get first
    get second
    if ( first > second ) :
        first
        exit                # exits a function early
    ;
    second
;

For if-then-else constructs, the function if-else takes a value and two lambdas, the first is executed if the value resolves to true and the second is executed otherwise.

fn max :
    get first
    get second
    if-else ( first > second ) :
        first
    ; :
        second
    ;
;

The true & false parameters do not need to be lambdas, they can be function calls or even values to return:

fn min :
    get first
    get second
    if-else ( first > second ) first second
;

TODO: switch / match

Loops:

while ( expression ) :
    ⋮
;

do :
    ⋮
    exit
;

TODO: for loops

Lists:

Lists are dynamically generated and managed lists of values.
If lambdas are functions as constants then lists are functions as variables.

A list is defined by square brackets, either closed for an empty list, or containing a number of default values.

var empty_list []
var three_list [ 1 2 3 ]

Unlike lambdas, expressions and function calls will be evaluated when defining the list. A list can be thought of as a function that allocates memory for a list and then begins populating the list with each value it comes across.

Functions for manipulating lists exist, but these are library functions rather than intrinsic syntax so I won't got into detail here.

count list                  # return number of values in list
first list                  # return first value in list
last list                   # return last value in list
push list value             # add value to end of list
pop list                    # remove (+return) last value in list
prepend list value          # add value to start of list
insert list index value     # insert value at index
replace list index value    # replace value at index
remove list index           # remove value at index
join list list              # join two lists together as one
slice list index length     # slice list starting at index

Indexing:

Accessing an index of a list is done with the @ operator.

var numbers [ 1 2 3 ];
echo ( numbers @ 2 )        # prints "2"

The Data Stack:

Up to this point we've been avoiding an important implementation detail that makes Pling! different; it has an implicit data stack.

This means that, as well as parameters, functions can work on data that is pushed to and popped from a data stack. Unlike parameters, this data persists as we move across functions. This allows Pling! to work with both static and dynamic data types.

The data stack is always separate from the function return stack and any other implementation-specific stacks.

Values returned by functions are being pushed on to the data stack, ergo a function can return more than one value:

fn potatoes :
    1
    2
;

When we call a function with a parameter, such as echo, what we are really saying is that echo will print the result on top of the stack of what the following value / function evaluates to.

echo sir_lancelots_favourite_colour

! + ? + .

Pling! is so named because an exclamation mark (also known as a "bang" or "pling") is a function that pops the top item off the stack instead of pushing something new on. It can be used as a replacement for parameters!

1                       # push the value "1" on to the data stack
echo !                  # pop a value off the data stack and print it

Each value on the stack is opaque. It's important to understand that if you push a list on to the data stack, you will pop the entire list, not each item one-by-one:

[ 1 2 3 ]               # push a list on to the stack
echo !                  # prints "[ 1 2 3 ]"!

You can temporarily move the data pointer into the list using a with block:

4                       # note how the stack is first-in, last-out
[ 1 2 3 ]               # this will be on top of the stack
with ! :
    echo !              # prints "1"
    echo !              # prints "2"
;
echo !                  # prints "4"

You can also take a list and iterate over it. The each function takes a list as a parameter (or, with !, the stack) and calls a function / lambda for each value in the list, automatically pointing the data parameter at the popped value.

[ 1 2 3 ]
each ! : echo ! ;           # prints "1", "2", "3"

If a list is nested however, we don't automatically get recursion:

[ 1 2 [ 3 4 ]]
each ! : echo ! ;           # prints "1", "2", "[ 3 4 ]"

The map function calls a function for each value in a list and will handle the recursion for us. Note how we can also do away with the lambda since the map function takes a function name as a 2nd parameter.

[ 1 2 [ 3 4 ]]
map ! echo                  # prints "1", "2", "3", "4"

The ? function 'peeks' the stack value, but does not pop it. You can use this when you want to get the value atop the stack, but don't want to remove it.

[ 1 2 3 4 ]
map ? echo                  # prints "1", "2", "3", "4"
echo count ?                # prints 4

The . function throws away (or "drops") the value atop the stack.
Use this when you need to level the stack for parameters you don't use.

1 2 3 4                     # 4 items on stack, not a list
. . .                       # drop three items
echo !                      # prints "1"

Data Types:

In Forth it's easy to make mistakes where you put one value on the stack but you accidentally read it back and treat it as something it's not. Forth's lack of a type system exposes its unforgiving nature for beginners, or just feeling your way through a problem. Most modern Forth-like languages therefore include a type system.

Everything in Pling! is a list of values.

Values can be of any type:

  • A number
  • A string
  • An expression, a kind of list specific to operators
  • A lambda -- a statically assembled list
  • A list -- a dynamically allocated list
  • A function name
  • A structure

We've covered all but the last type in some way or another thus far.

A data type can be thought of as a class in other programming languages. Each data type has to have methods for printing and for pushing and popping from the stack.

In Pling!, data types are the lowest-level primitives that are typically implemented in machine code. How the data is stored and retrieved is highly machine-specific, however Pling! programs don't ever deal with the implementation details directly.

The type of a value is bound to it. If you push a number to the data-stack you can not read it back as a function name. Whatever is pushed will always pop as the same type as it was before.

Structures:

(in progress...)

Pling! Implementation Considerations

Here we will discuss an approach to implementing the Pling! language on a Z80 / eZ80 processor. Other processors would probably require vastly different approaches, particularly the 6502, but this document should give an idea on how the language is well designed for constrained systems.

The Z80 has a 16-bit address bus and is therefore limited to 64KB of RAM / address space. Z80 systems with > 64KB RAM have to use memory banking and these are not standardised in any way. The eZ80 is a modern Z80 with an optional extended addressing mode that allows for 24-bit addresses. Where implementation concerns differ for the two, this document will provide commentary on the different approaches expected.

This document is speculative and doesn't represent working code / design.

Source:

Pling! source code is represented as a list of Values.
Values are Pling!'s variant data type.

At the syntax level (regardless of implementation),
these are the Value Types provided by Pling!:

  • A function call echo / lambda : ... ;
  • A number, 0, $00, %00000000
  • A string " ... "
  • An expression ( ... )
  • A list [ ... ]
  • A struct { ... } (TODO)

Pling! source code is designed to be assembled into a compact, binary form with a single-pass, forward-only assembler without look-ahead; that is, any token can be identified without the need to know what the next token is.

It should be possible to assemble a file by reading one byte at a time from disk and without reading the entire file into RAM.

Like Forth, every token is separated by whitespace.
Any non-whitespace character is a valid character in a token.

: each word is a token 1 2 3 4 ;

There are two exceptions to this made in the name of programmer comfort and the fact that we are assembling code to be interpreted later rather than interpreting as we parse (i.e. Forth): Comments and Strings

  • Comments:

    Comments begin with a hash mark # and a space. The space is required after the hash mark so long as the end of the line doesn't immediately follow. Once the comment marker occurs, the rest of the line can be read in and skipped

  • Strings:

    Strings in Pling! look like strings in any other language rather than Forth's oddly disconnected strings ( " Space after opening quote mark!"), a concession to fit its simple design.

    When a token begins with a quote-mark, keep reading bytes until a closing quote-mark, this includes line-breaks! The whole string is taken as one token

    TODO: escape codes, cannot yet include quote-marks in strings.

Therefore, the tokeniser must skip leading whitespace until it comes upon a character and handle the special cases: comments and strings, before resorting to a standard token.

This EBNF grammar presents the tokeniser's view of the incoming bytes, but doesn't enforce program structure, for example pairing : with ;, that happens during parsing, although tokenising and parsing may be happening at the same time depending on implementation.

eol         = [ "\r" + ] , "\n" ;
spaces      = "\s" + ;                  (* space + tab, no eol      *)
whitespace  = ( spaces | eol ) + ;      (* spaces and newlines      *)
letter      = "\S" ;                    (* non-whitespace character *)

(* only comments distinguish end-of-line so Pling! source code
   is simply a stream of tokens and whitespace *)
source      = { [ whitespace ]          (* leading whitespace       *)
              , { token , whitespace }  (* separated by whitespace  *)
              , [ comment ]             (* optional comment         *)
              } ;

(* comments begin with # and run to the end of the line *)
comment     = "#" , spaces , { ? not-eol ? } , eol
            | "#" , eol ;

(* tokens are made from a group of any non-whitespace characters
   or the special case handling of numbers & strings *)
token       = ( number | string | letter + ) ;

(* TODO: we won't include floats initially *)
number      = integer | hexadecimal | binary ;
integer     = first-digit , { digits } ;
hexadecimal = "$" , hexit + ;
binary      = "%" , bit + ;

first-digit = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digits      = "0" | first-digit ;
hexit       = digits | "a" | "b" | "c" | "d" | "e" | "f"
                     | "A" | "B" | "C" | "D" | "E" | "F" ;
bit         = "0" | "1" ;

(* TODO: strings should support escape codes,
   no way to embed a quote mark in a string a.t.m. *)
string      = '"' , { ? any-character ? } , '"' ;

The Problem of Order

As each token is ingested from the source file, it is categorised according to its type and assembled into a binary representation.

As we assemble one list, we will often need to start another list and then return to where we were. Consider an if function-call that is followed by an expression and a lambda to execute when true; the lambda is not inline as there would have to be a means of skipping over it.

                       +---- - - -
                .----> | ...
                |      +---- - - -
                |      
+----+------+---+--+---- - - -
| if | expr | true |...
+----+--+---+------+---- - - -
        |
        |      +---- - - -
        '----> | ...
               +---- - - -

Therefore the assembler must be able to manage multiple on-going lists of unknown length.

You cannot simply begin a new list a little ways off in memory hoping that when you return to where you were, there's enough headroom to continue; you cannot know ahead of time how long a list will be and if you have to keep shifting lists in memory to make additional room the process will not only be slow but may run out of memory.

Every byte read must be handled and put somewhere as there is no way to go 'back' to it (the whole source file is not in RAM), therefore you cannot defer handling of a token until later.

Intermediate Representation

Pling!'s solution to indeterminate order is an intermediate representation using a linked-list of Value-types on a simple heap (i.e. no allocator).

A heap is a "pile" of data much like a stack where data is added to the top (or end, if we consider it horizontal) and only the top-most data can be removed.

           top of heap ->|
----+------+------+------+
... | data | data | data |  (free space)
----+------+------+------+----------------->

A heap is useful because it has no "gaps" (unused space) but there are severe restrictions -- only the top-most item can be expanded or contracted, and data can only be added or removed in-order. You cannot place two items on the heap and then remove the older (underneath) item first; all data is bound by scope.

The Tokeniser

As a token is read in, it is written to the top of the heap as a length-prefixed string.

----+------------------------+
... | str-len ¦ token-string |
----+------------------------+

The first character of the token is then looked at to determine the token-type:

Numbers

If the token is a number type, marked by a $, % or numerical digit then it is converted from a string into a number literal and the token string on the heap is overwritten with a 1 byte type field and then the number literal in however many bytes are required as given by the type:

----+------------------------+
... | str-len ¦ token-string |
----+------------------------+
    |<------ discarded ----- ^
    v
----+-------------------+
... | num-type ¦ number |
----+-------------------+

The datum of the Value has been captured but it is not the Value itself that can be executed. We now add a Value structure to the heap.

    |<---   datum   --->|<---          value struct          --->|
----+-------------------+----------------------------------------+
... | num-type ¦ number | type ¦ data-ptr ¦ next-ptr ¦ row ¦ col |
----+-------------------+-----------+----------------------------+
    ^                               |
    '-------------------------------'

The Value structure includes a 1-byte type that indicates that this Value is a number. The data-pointer contains the address in the heap of the Value's data. The next-pointer will point to the next Value in the current list (when reached), and the row and column fields contain the line-number and column of text in the source file where the original Value occurs (for printing of error messages).

Strings

As a string is read in it is appended to the end of the heap like so; again with the 1-byte reserved at the head. The opening quote-mark is included as the first character of every token is always included, however the final quote-mark is excluded.

----+----------------------------------------------------------+
... | reserved ¦ " ¦ H ¦ e ¦ l ¦ l ¦ o ¦   ¦ w ¦ o ¦ r ¦ l ¦ d |
----+----------------------------------------------------------+

Strings can be much, much longer than other tokens, so once the entire string has been read in, the first two bytes of the string are replaced with the string-length. Note how this overwrites the opening quote-mark.

----+-----------------------------------------------------------+
... | string-length ¦ H ¦ e ¦ l ¦ l ¦ o ¦   ¦ w ¦ o ¦ r ¦ l ¦ d |
----+-----------------------------------------------------------+

Now that the string has been captured on the heap, the Value structure is appended.

    |<---  string  --->|<---          value struct          --->|
----+------------------+----------------------------------------+
... | str-len ¦ string | type ¦ data-ptr ¦ next-ptr ¦ row ¦ col |
----+------------------+------------+---------------------------+
    ^                               |
    '-------------------------------'

Lambdas

Since a lambda is a list within a list, a Value-structure is placed on the heap that will point to the next Value where the lambda will be assembled (be aware that a number / string literal might be added to the heap, before the next Value-struct).

    |<---          value struct          --->|
----+----------------------------------------+---------------
... | type ¦ data-ptr ¦ next-ptr ¦ row ¦ col | ... lambda ...
----+------------+---------------------------+---------------
    (lambda)     |                              ^
                 '------------------------------'

The location [in the heap] of the parent list is remembered for later and the lambda is assembled, Value by Value, as any other. When that list terminates, the location of the parent list is recalled and the next-pointer is updated to point to the top of the heap where the parent list may continue.

                            .------------------------------.
                            |                              v
----+-----------------------+----------------+----------+----
... | type ¦ data-ptr ¦ next-ptr ¦ row ¦ col | (lambda) | ...
----+------------+---------------------------+----------+----
                 |                             ^
                 '-----------------------------'

It is the chaining of the data and next fields that allows lists to be built out of order and for lists-within-lists of any length and any depth of nesting, memory permitting, to be handled.

Function Calls

A function call is simply a link to another list (Lambda) that has already been defined previously.

                ----+----------------------------------------+----
                ... | type ¦ data-ptr ¦ next-ptr ¦ row ¦ col | ...
                ----+------------+----------+----------------+----
                                 |          |                   ^
    .----------------------------'          '-------------------'
    v
----+----------------------------------------+----
... | type ¦ data-ptr ¦ next-ptr ¦ row ¦ col | ...
----+-----------------------+----------------+----
                            |                   ^
                            '-------------------'

The language grammar is designed to not require forward-references, that is, calling a function before it's been defined. You must define a function before it can be called. This avoids having to do a second pass on the code to fix up forward-references and to minimise the amount of meta-data that needs to be held during assembly.

The heading below describes the process of mapping function names in string form from the source code to the existing assembled code.

The Dictionary

A dictionary is a lookup of function names to their position within the assembled code. It is named after Forth's dictionary (because it calls tokens "words").

The Pling! keyword fn defines a new function. fn is a special keyword that only the assembler understands and does not exist in the assembled code when running.

When the parser comes across the fn keyword, it reads the next token as the name of the function to use. A dictionary Value is created on top of the heap, pointing to the function name just captured.

    |< function-name >|<---        dictionary-entry        --->|
----+-----------------+----------------------------------------+
... | len ¦ f | o | o | type ¦ data-ptr ¦ next-ptr ¦ row ¦ col |
----+-----------------+------------+---------------------------+
    ^                              |
    '------------------------------'

The type field is used for private flags. The data field points to the beginning of the function-name string. The next field will point to the next dictionary entry (when it occurs), not the beginning of the function lambda; the function lambda is assumed to immediately follow the dictionary entry:

    |<---        dictionary-entry        --->|< lambda >|
----+----------------------------------------+----------+
... | type ¦ data-ptr ¦ next-ptr ¦ row ¦ col |    ...   |
----+------------+----------+----------------+----------+
                 |          |
  (name) <-------'          '----> (next dictionary-entry)

The next-ptr field of the previous Value is not updated until after the function is defined and a non function-definition occurs. That is, the previous Value skips 'over' the function definition.

    .-----------------------------.
    |                             |
+---+---+---------------------+---v---+----
| value | dict-entry ¦ lambda | value | ...
+-------+---^----+------------+-------+----
            |    |
- - - ------'    '------> (dict-entry)

When a function name token is encountered, the chain of dictionary entries is followed to find the function.

A more efficient method of building the dictionary and searching it might be used in the future, however the current method allows building directly on the heap during tokenisation without having to place a dictionary structure somewhere in memory that may grow too large, or the heap might run into.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment