Currently, the std.Io interface isn't too optimizable:

  1. All forms of concurrent execution go through io.async/io.asyncConcurrent(f) and io.await, which can't statically know when .await will be called, so the implementation must dynamically allocate the context needed to run f.
  2. The main API for blocking/unblocking on an arbitrary state goes through Mutex & Condvar, which require locking a mutex to wait (& wake correctly) and restrict implementors to a single usize with a biased representation for each state (see the sketch after this list).
  3. It ties cancellation to the task/concurrency model instead of to blocking operations (which are really the ones getting cancelled); to cancel a set of operations, they must be wrapped in a newly spawned Future. It's also unclear, due to the racy nature of io.cancel, whether a blocking operation consumes the stored cancellation request, or whether it persists & causes all future blocking ops in that Future to return Cancelled.
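
To make (2) concrete, here's a minimal sketch of the classic condvar wait pattern, written against std.Thread.Mutex & std.Thread.Condition (I'm assuming the Io-interface versions share this shape): the waiter has to take a lock just to observe one bit of state.

const std = @import("std");

var mutex: std.Thread.Mutex = .{};
var cond: std.Thread.Condition = .{};
var ready: bool = false;

fn waitForReady() void {
    mutex.lock();
    defer mutex.unlock();
    // The predicate check & the wait must happen under the same mutex;
    // checking outside it could miss a wakeup between the check & the wait.
    while (!ready) cond.wait(&mutex);
}

fn signalReady() void {
    mutex.lock();
    ready = true;
    mutex.unlock();
    cond.signal();
}

pub fn main() !void {
    const waiter = try std.Thread.spawn(.{}, waitForReady, .{});
    signalReady();
    waiter.join();
}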

I've thought of some ideas on how to address these + the all-encompassing nature of the std.Io interface:

  1. Every atomic object has a timeline (TL) of writes:

    • A write is either a store or a read-modify-write (RMW): an RMW reads the latest write & pushes a new one.
    • A write is either tagged Relaxed, Release, or SeqCst.
    • A read observes some write on the timeline:
      • On the same thread, future reads can't go backwards on the timeline.
      • A read is either tagged Relaxed, Acquire, or SeqCst.
      • RMWs can also be tagged Acquire (or AcqRel). If so, the Acquire refers to the "read" portion of "RMW".
  2. Each thread has its own view of the world:

    • Write timelines are shared, but each thread could be reading at a different point on each (see the sketch below).
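
A minimal sketch of the Release/Acquire part of this model, using std.atomic.Value from recent Zig (note: Zig spells Relaxed as .monotonic). Once the consumer's Acquire read observes the producer's Release write on flag's timeline, it's guaranteed to also see every write the producer made before it:

const std = @import("std");

var flag = std.atomic.Value(bool).init(false);
var data: u32 = 0;

fn producer() void {
    data = 42; // plain write, published by the Release store below
    flag.store(true, .release); // pushes a Release-tagged write onto flag's timeline
}

fn consumer() void {
    // Spin until this thread's view advances to the Release write.
    while (!flag.load(.acquire)) std.atomic.spinLoopHint();
    std.debug.assert(data == 42); // guaranteed by the Acquire/Release pairing
}

pub fn main() !void {
    const p = try std.Thread.spawn(.{}, producer, .{});
    const c = try std.Thread.spawn(.{}, consumer, .{});
    p.join();
    c.join();
}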

Rexicon226 / benchmark.zig: optimized memeql benchmark

// zig build-exe benchmark.zig -OReleaseFast -lc
// ./benchmark 4096
const std = @import("std");
const allocator = std.heap.c_allocator;
const iterations_per_byte = 1000;
const warmup_iterations = 10;

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    // deinit() reports whether any allocations leaked; assert a clean shutdown.
    defer std.debug.assert(gpa.deinit() == .ok);
    const allocator = gpa.allocator();

    // Use half the logical CPUs for worker threads, but at least one.
    const num_threads = @max(1, (std.Thread.getCpuCount() catch 1) / 2);

    // One epoll instance per worker thread.
    const epolls = try allocator.alloc(std.posix.fd_t, num_threads);
    defer allocator.free(epolls);
}