Instead of things like var ErrFoo = errors.New("foo")
or return fmt.Errorf("foo: %d", n)
I would like a shorthand
syntax that allows defining a new error type.
type ErrFoo error{"foo"}
desugars to:
type ErrFoo struct {}
func (ErrFoo) Error() string { return "foo" }
and can be used as return ErrFoo{}
type ErrFoo error{
"foo (n=%d): %w"
n int
err error
}
desugars to:
type ErrFoo struct {
n int
err error
}
func (e ErrFoo) Error() string {
return fmt.Sprintf("foo (n=%d): %v", e.n, e.err)
}
func (e ErrFoo) Unwrap() error {
return e.err
}
and can be used as return ErrFoo{42, err}.
It does not happen very frequently but when it does being able to reach out to a well integrated, readable, composable and efficient futures/promises package would be invaluable. This is especially true when you are building services with high fanouts in which your code is orchestrating a large number of subrequests, and some subrequests depend on the result of other subrequests.
This could even just take the shape of a broadcast channel that is implicitly closed after the first value is written to it. Receivers would block until this happens, and would then all be unblocked and receive that value.
The current compiler heavily favors compilation speed over runtime performance of the generated code. This is often an acceptable tradeoff, but not always. When you are running large services having longer compiles in exchange for better efficiency of the generated code is often desirable.
When escape analysis cannot prove that a value does not escape, it may still be possible to prove that unique ownership of the value can be explicitly handed over (e.g. if in function A we know we have unique ownership of value V and we hand it over to a closure C to be executed in a new goroutine, we can move V directly to the stack of the goroutine that will run C).
This could also be used to stack-allocate, in the caller's stack frame, objects that the callee would normally allocate on the heap and return to the caller.
Update:
This was also discussed in https://mdempsky.notion.site/Dynamic-escape-analysis-76bbeecd3ac4440c88d0cb2f722aaf75. Some notes:
An unfortunate limitation though is that any pointers stored through another pointer must be retained. And similarly, any pointers loaded through another pointer must be borrowed. But escape analysis has similar limitations around pointer indirections, so maybe it's still net positive.
Maybe this could be partially worked around, at least coarsely, by using more than 1 bit per pointer (e.g. a tristate not-owned/owned/owned-transitively, or a quadstate for not-owned/owned/owned-transitively-one-layer/owned-transitively; or maybe even one bool per pointer/reference field in the struct).
Note that in Go, struct fields and array indices are addressable, so Perceus-style reference counting code would need to call runtime.findObject to find the reference count for an arbitrary pointer. I expect this would be too slow for the GC savings to be a win, but it could still be worth experimenting with and quantifying.
This slowdown could possibly be alleviated by specializing and inlining findObject. Another thing that could help is using one more bool to signal whether the pointer already points to the start of the allocation (in which case we should be able to skip the call to findObject) or not.
- Aggressive inlining of hot functions
- Partial function inlining (hot path)
- Move cold code away from hot code
- Speculative devirtualization
- Merge identical tails of machine basic blocks (ending in unconditional jumps/returns)
- If the caller guarantees that there is enough stack space for the callee, the call target should be directly the instruction following the callee function prologue
For things like
var s []*T
for i := range x {
s = append(s, &T{ /* ... */ })
}
The compiler could notice that in this case:
- The length of s will be len(x), and therefore it could replace var s []*T with var s = make([]*T, 0, len(x))
- The loop will allocate len(x) T values, so it could perform a batch allocation of len(x) individual T values (note: not a "slice of Ts", as that would prevent individual T values from being GCed individually) and then use those batch allocations for the &T{ /* ... */ }.
This would reduce the number of allocations from log(len(x))+len(x) to 2.
Similarly to what is done for goroutine initial stack sizes, which are chosen dynamically depending on workload to minimize the number of stack-growth operations and memory usage, the runtime could do the same in more cases, e.g. when maps or slices are first allocated or are growing: if slices/maps allocated (or grown) at a certain code location are often grown again before being collected, then it would be preferable to overallocate the maps/slices allocated at that code location (e.g. instead of doubling in size, allocate directly the most likely final capacity of that map/slice).
If the compiler itself were usable as a library (e.g. as in the case of LLVM) this would open up the way to new tooling, including potentially the ability to run it as a JIT (e.g. to make use of CPU features detected at runtime, to perform PGO at runtime, and/or to avoid interpretation overhead).
If the compiler supported pure AST->AST macros, explicitly imported using the import/go.mod machinery, it would be possible for users to safely extend the language and potentially remove a lot of repetition.
gopls could be extended to allow users to visualize/debug what each macro does.
Currently most file I/O blocks an OS thread. Moving file I/O to io_uring or other non-blocking mechanisms would avoid high thread counts when performing lots of file/disk I/O.