Last active
December 11, 2020 16:21
6 years later and I still need to sit down and make this programming language
# Self-contained example for writing to stdout and to a file.
#
# Relies only on the standard assembler module (std.asm), whose
# contents cannot be stubbed out here (as they are intrinsics). std.asm
# allows direct, architecture-specific emission of machine code
# instructions and access to registers.
#
# Also relies on the @platform, @arch, @effect, and @force_inline
# intrinsics, which cannot be (reasonably) stubbed out here.
#
#   - @platform:     conditionally enable the immediate element
#                    based on the target platform (some identifiers
#                    can match multiple platforms, e.g. `posix` is
#                    always enabled whenever `linux` is).
#
#   - @arch:         conditionally enable the immediate element
#                    based on CPU architecture.
#
#   - @effect:       specifies that the immediate element produces
#                    one or more side effects; the special `unknown`
#                    effect specifies that the caller must declare
#                    the effect at the call site (meant for e.g. syscall
#                    wrappers, as demonstrated below).
#
#   - @force_inline: force-"pastes" the body of the immediate block
#                    at the call site instead of invoking a
#                    function.
#
# Lastly, it relies on the amorphous `str` (string) class, which
# has yet to be specified.
#
# Apart from these intrinsics, no other compiler feature or
# piece of the standard library is used.
#
# A few notes about syntax to better understand the following
# code:
#
# - @foo is a directive, which will either attach information
#   to the element before which it is placed, or will modify
#   the AST directly at compile time (including, but not limited
#   to, removing the AST node altogether). All directives must
#   be immediately and statically solvable, and (at least for now)
#   directive declarations cannot themselves have directives applied
#   to them.
#
#   Directives may take parameters - each being passed to the directive
#   as an AST node (which, in turn, includes a set of tokens). This allows
#   arbitrary identifier passing to the directive instead of having to declare
#   values beforehand - thus solving the problem of "new platform X" not otherwise
#   being supported by the compiler until the compiler is modified.
#
# - `error FOO` raises an error, where `FOO` is an arbitrary constant value,
#   scoped directly to the function that emits the error. Further, an error constant
#   can be pre-declared for use among many methods using `use error FOO`.
#
#   Error constants are referred to by name; the underlying numeric value associated
#   with them is not meant to be exposed to the programmer. A list of all possible
#   error codes can be enumerated for each function via `some_function.errors`, which
#   is of type `error[]` (`error` is a rare case of a keyword that contextually behaves
#   as a typename or an operator, depending on its usage - e.g. `use error` is treated
#   as a special case of `use`).
#
# - To handle errors, there is the oneshot version using `<statement> else <statement|block>`
#   and the block version using `try <block> else <block>`.
#
#   Within an error handler (`else <statement|block>`), using the keyword `error` as a statement,
#   by itself with no argument, re-throws the existing error. This means that the union of all
#   error constants that could be re-thrown is added to the current function's error set, too.
#
#   The language would be able to determine frontiers where specific (or all) error codes are
#   handled and would not propagate upward, even if another codepath re-throws the error.
#   Such errors are NOT included in the function's error set.
#
#   Finally, `else assert` can be used to terminate the program (see notes about `assert`
#   below).
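#
#   A rough sketch of how these forms might combine. The names `first`,
#   `first_or_zero`, `first_checked` and the constant `EEMPTY` are hypothetical
#   and used only for illustration; it also assumes that indexing a `str`
#   yields a `u8`, which has not been specified:
#
#       use error EEMPTY
#
#       fn first(s str&) u8
#           if s.length == 0
#               error EEMPTY            # raise the pre-declared constant
#           ret s[0]
#
#       fn first_or_zero(s str&) u8
#           c = first(s) else ret 0     # oneshot form: handled here, so EEMPTY
#           ret c                       # should not appear in first_or_zero.errors
#
#       fn first_checked(s str&) u8
#           try
#               ret first(s)
#           else
#               error                   # block form: re-throw, so EEMPTY would be
#                                       # part of first_checked.errors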
#
# - `assert` is a keyword. Used by itself (with no argument), it is the equivalent of C's
#   `abort()`. With either one or two arguments, it checks a `bool` condition and either
#   continues (if true) or aborts (if false). The optional second argument is a
#   string to be emitted when the assertion fails.
#
#   `assert` has the added benefit of being selectively static, meaning that it can be
#   used to assert traits about types (especially when type variables (see below) are used).
#   The example uses this feature to specify valid types of generic functions; similar to
#   C++ templates, in cases where two overloads of a generic function have the same signature
#   and one of the types is a type variable, assertions can be used to specify traits about
#   the type that the overload allows. At resolve time, the single remaining overload (the
#   one whose assertions all hold) is used; zero remaining overloads yields a compiler error
#   about no valid overloads for the specified types, and more than one remaining overload
#   yields a compiler error about ambiguous overloads for the specified types.
#
# - `$T` denotes a type variable called `T`. All usages of type variables aside from
#   their declarations must retain the `$` prefix. Type variables can be used multiple
#   times in a function declaration to imply that the types must match.
#
#   As mentioned previously, `assert` can be used to further specify the characteristics
#   of the type. For example, if a method should take two parameters, where the first
#   parameter can be of any sized signed integer, and the second parameter must be the
#   unsigned equivalent of the first (same size, just unsigned), then it would be
#   expressed using an `assert` like so:
#
#       fn foo(a $A, b $B)
#           assert $A.is_signed
#           assert $B == $A.unsigned
#
#   More trivial examples include making sure the integer is of sufficient size
#   (`assert $A.width >= 32`) or enforcing other compositions of the type -
#   for example, checking that $B is an array of $A (`assert $B == $A[]`).
#
# - `switch` statements have no `case` keyword. Further, cases do NOT fall through
#   unless:
#
#     - The case comes directly after another case (the prior case having no body)
#     - The `continue` statement is used, which jumps to the next case statement.
#
#   Further, `else` is used in lieu of a "default" case.
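#
#   A rough sketch of these rules. The function `classify` and its string
#   values are made up for illustration and are not part of the example below:
#
#       fn classify(n u8) str
#           switch n
#               0:                      # no body: shares the body of the `1` case
#               1: ret `small`
#               2:
#                   sys.io.out.print(`two\n`)
#                   continue            # explicitly jump into the next case (`3`)
#               3: ret `medium`
#               else
#                   ret `large`         # stands in for a "default" case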
#
# - By default, all declarations are module-private. Export them for visibility
#   outside the module using the `pub` qualifier.
#
# - `for` is the only looping mechanism. It can be broken out of using `break`
#   and forcibly iterated using `continue`. It accepts a number of syntaxes
#   for both iteration and control flow (e.g. what a `while`
#   loop is classically for). These syntaxes have not been solidified yet, so
#   a bit of creativity was used in this example.
#
# - `as` performs type conversion; there is no implicit coercion in this language.
#   If an expression is not already of type $T, but you want to treat it as such,
#   `<expr> as $T` must be used. Further, a conversion must exist for the type;
#   many conversions are intrinsic, and conversions to an aliased type are
#   automatic.
#
#   There is an example of a conversion function below for the custom `Ostream`
#   type and `str`. It isn't particularly useful for this example,
#   but I wanted to explore how it'd be achieved.
#
#   `as` also has a special meaning when used with `use`, which creates a symbol
#   alias (similar to C++'s `using`, though without the need to forward type
#   variables).
#
#       use str as my_string_class
#
# - Variable declarations (including parameters) are reversed relative to C; the identifier
#   comes first, followed by the type. For example, `foo str` specifies a variable
#   "foo" with type "str".
#
# - Variables are immutable by default. Their types can be marked mutable using a
#   bang postfix - e.g. `foo str!`. For implicit declarations, the bang postfix comes
#   directly after the identifier - e.g. `foo! = some_string`.
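#
#   For illustration (a sketch with arbitrary identifiers; how literals are
#   typed has not been specified):
#
#       name str = `a`          # immutable; re-assigning `name` would presumably be rejected
#       count usize! = 0        # explicitly typed and mutable
#       total! = count          # implicitly typed and mutable
#       count = count + 1       # allowed, since `count` is mutable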
#
# - Values are passed by-value by default. References to values can be made using
#   the ampersand postfix - e.g. `foo str&`.
#
# - Arrays are declared as either fixed size or lazily sized using the `[n]` and `[]`
#   type postfixes, respectively. For example, a variable that is an array of 5
#   32-bit unsigned integers would be declared as `foo u32[5]`. An array that
#   takes the size of its immediate initialization value omits the size specifier -
#   e.g. `foo u32[] = [ 1, 2, 3, 4, 5 ]`.
#
# - Combinations of arrays, references and bangs (mutable specifiers) can be used,
#   though note that, as in most languages, the order matters.
#
#     - `str!&` is a reference to a mutable string; the reference cannot be re-assigned.
#     - `str&!` is a reference to an immutable string; the reference can be re-assigned.
#     - `str&` is a reference to an immutable string; the reference cannot be re-assigned.
#     - `str!&!` is a reference to a mutable string; the reference can be re-assigned.
#
#   The postfix notation was chosen deliberately, as it is the most readable given the
#   terseness of the language and the possible composed types - there are no prefix tokens
#   to confuse the order of reading, and each successive token further specifies the
#   existing type.
#
#   As a case study:
#
#     `str![]&!&` is a non-reassignable reference (&) to a re-assignable reference (&!)
#     to an immutable array ([]) of mutable strings (str!).
#
#     To allow the individual array elements to be re-assigned (e.g. `foo[2] = new_string`),
#     the array itself would need to be marked as mutable: `str![]!&!&`.
#
#     To disallow modification of the individual strings, remove the bang from the `str`: `str[]&!&`.
#
#     To re-assign the second-level reference (since the first-level reference is immutable),
#     lower the reference and assign it: `&foo = some_array_of_strings`.
#
# - Mutable types can be demoted to immutable types; immutable types cannot be promoted to mutable
#   types. This is one of the few cases of implicit type coercion.
#
# - (Not demonstrated in this example) blocks can be marked `@pure`, indicating that they are not
#   allowed to emit side effects. Many functions are automatically detected as pure and are optimized
#   as such, but the directive can be used to enforce this.
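#
#   A sketch of what this might look like (hypothetical; `square` is not part of
#   the example below):
#
#       @pure
#       fn square(n u32) u32
#           ret n * n
#
#   Calling an effectful function (such as `write` below) from inside `square`
#   would then presumably be a compile-time error.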
#
# - `self` is a special identifier reserved for the first argument of a function (optionally) -
#   in which case, any expressions of `type(self)` can be dot-accessed to retrieve that function,
#   automatically binding the L-value as the `self` parameter. This works similarly to Python's
#   class methods, but the functions can be declared anywhere.
#
#       type Foo
#       fn do_bar(self Foo)
#           sys.io.out.print(`hello bar\n`)
#       fn main()
#           f = Foo{}
#           f.do_bar()   # prints "hello bar"
#           do_bar(f)    # prints "hello bar" (equivalent)
#
# - `...` is a special token that can be used as the LAST parameter in a function
#   declaration. It denotes a parameter pack - a statically determined pack of
#   zero or more parameters of arbitrary types.
#
#   The pack token can be used in many places, including in invocation call sites
#   as parameters (seen in this example in the `print` function to forward parameters
#   to `try_print`).
#
#   Certain syntaxes involving `...` as a postfix operator are planned but not yet
#   solidified. Such cases will behave similarly to C++'s parameter pack postfix
#   operator within templates.
#
# - Along with `str`, a few other type aliases are automatically included into the root
#   module scope: `isize`, `usize`, and `bool`.
#
#   The first two are aliases to signed and unsigned integral types matching
#   the size of the CPU's general purpose registers or address size (whatever
#   is fitting and most performant for the platform/architecture in question).
#   They behave similarly to C's `int`.
#
#   `bool` is its own type equivalent to the statement `type bool u1`. The
#   language constants `false` and `true` are defined as `false bool = 0u1`
#   and `true bool = 1u1`. Since it is its own type, functions can be
#   overloaded on unsigned integral types and boolean types separately.
#
# - Overloads of similarly classed integral types (e.g. `u16` and `u32`)
#   do NOT conflict, and nothing is widened or narrowed implicitly: passing a
#   value of type `u64` when only those two overloads exist would
#   result in a compilation error. Using `as` to widen or narrow an integer
#   is acceptable here, or a type variable + assertion can be used in cases
#   where variable integral widths are allowed (especially useful in cases
#   of serializers):
#
#       fn write(n $T)
#           assert $T.width <= 16
#           n16 = n as u16
#       fn write(n $T)
#           assert $T.width > 16 and $T.width <= 32
#           n32 = n as u32

@platform(posix)
fn errno(r $T)
    assert $T.is_unsigned
    switch r
        1: error EPERM
        2: error ENOENT
        3: error ESRCH
        4: error EINTR
        # ... ad nauseam.
        else
            assert

@arch(amd64)
@platform(linux)
@effect(unknown) # force caller to specify effect
pub fn syscall(nr usize, ...) usize
    use std.asm.amd64 as X
    X.util.set_args(...)
    X.syscall(nr)
    if (X.rax as i64) < 0
        errno(-(X.rax as i64) as usize)
    ret X.rax

@arch(x86)
@platform(linux)
@effect(unknown) # force caller to specify effect
pub fn syscall(nr usize, ...) usize
    use std.asm.x86 as X
    X.util.set_args(nr, ...)
    X.int(0x80)
    if (X.eax as i32) < 0
        errno(-(X.eax as i32) as usize)
    ret X.eax

@platform(linux)
pub fn exit(status isize)
    @arch(x86) nr = 1
    @arch(amd64) nr = 60
    # pass the exit status through to the kernel
    @effect(terminate) syscall(nr, status)
    # tell compiler we'll never reach here.
    assert

@platform(linux)
pub fn open(filename str&, flags usize, mode usize) isize
    @arch(x86) nr = 5
    @arch(amd64) nr = 2
    r = @effect(file_open) syscall(nr, filename.c_str(), flags, mode)
    ret r as isize

@platform(linux)
pub fn write(fd isize, buf u8[]&, count usize) usize
    @arch(x86) nr = 4
    @arch(amd64) nr = 1
    r = @effect(file_write) syscall(nr, fd, buf.addr, count)
    ret r

@platform(linux)
pub fn close(fd isize)
    @arch(x86) nr = 6
    @arch(amd64) nr = 3
    @effect(file_close) syscall(nr, fd)

type Ostream
    @platform(linux)
    fd isize

    @platform(linux)
    enum Flags
        WRITE = 1
        CREATE = 64

@platform(linux)
pub fn Ostream.open(filename str&, flags Ostream.Flags, mode usize) Ostream
    # always open for writing (1 is the value of Flags.WRITE)
    fd = open(filename, 1 | flags, mode) else error
    ret Ostream { fd }

@platform(linux)
pub fn try_print(self Ostream&, v $T) Ostream&
    buf = to_string(v)
    write(self.fd, buf, buf.length)
    ret self

@force_inline
pub fn print(...) $T
    # Given the use-case here that we should assume
    # the success path, especially in CLI applications,
    # we perform an assert on the success of try_print
    # to keep code cleaner when using .print().
    #
    # Applications that want to actually test for, and
    # handle/recover from, a failure to write to the stream
    # should call .try_print().
    ret try_print(...) else assert

pub fn to_string(v $T) str
    assert $T.is_signed
    ret [
        v < 0 then `-` else ``,
        to_string(abs(v) as $T.unsigned)
    ].join()

pub fn to_string(v $T!) str
    assert $T.is_unsigned
    if v == 0
        ret `0`
    # a positive integer has floor(log10(v)) + 1 decimal digits
    # (assumes std.math.log10 yields the floored integer logarithm)
    digits = std.math.log10(v) + 1
    # mutable, since the digits are written in place below
    result str! = str.with_capacity(digits)
    for (digits-1)...0 as i
        result[i] = 0x30 + (v % 10)
        v //= 10
    assert v == 0
    ret result

@force_inline
pub fn to_string(v str&) str&
    ret v

pub fn main() u8
    out = Ostream{ 1 }
    file = Ostream.open(`/tmp/foo`, 0, 8x755) else
        out.print(`failed to open file: {error}\n`)
        ret 1
    defer file.close() else
        out.print(`warning: failed to close file: {error}\n`)
    out.print(`opened file /tmp/foo for writing\n`) else assert
    for 0..10 as i
        file.print(`loop iteration {i}\n`) else
            out.print(`failed to write to file: {error}\n`)
            ret 1
    out.print(`done!\n`)
    ret 0