@AshyIsMe
Last active May 18, 2024 08:42
ideal computing environment

What kind of computing environment do I want?

Dataframe Oriented Programming: https://csvbase.com/blog/1

Accessing up-to-date data quickly and easily (even on a phone) and constantly pulling decision-making information out of it should be effortless.

Something like the stories of the old APL mainframe environment.

  • Named tables that are always up to date.
  • Not needing to worry about RAM or compute location relative to data etc.
  • Hierarchical but also tagged organisation of datasets.
  • Easy job scheduling (like systemd-timers)

What kind of datasets?

  • market data
  • commodities
  • currencies
  • commodity shipments
  • country productions
  • country investments (all the SAI global database stuff)
  • energy data: opennem.org.au, global
  • tables from Wikipedia
  • US bureau of stats, every country

It should be almost effortless to add new dataset scrapers: a pandas pd.read_html(...) call and a simple scraper schedule. Datasets should keep version history so that monitoring change over time is easy and useful.
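
A minimal sketch of the "keep version history" idea, assuming each dataset is stored as timestamped CSV snapshots in a local directory. The `snapshot` function, the file naming scheme, and the directory layout are all hypothetical choices made for illustration, and it uses only the standard library so the scraping step is passed in as a callable.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional


def snapshot(name: str, fetch: Callable[[], str], root: Path) -> Optional[Path]:
    """Fetch a dataset as CSV text and store it only if it changed.

    Returns the path of the new version, or None when the content is
    identical to the most recently stored version.
    """
    dataset_dir = root / name
    dataset_dir.mkdir(parents=True, exist_ok=True)

    text = fetch()
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]

    # Files are named <UTC timestamp>-<content hash>.csv (an assumed scheme),
    # so a plain lexicographic sort gives chronological order.
    versions = sorted(dataset_dir.glob("*.csv"))
    if versions and versions[-1].stem.endswith(digest):
        return None  # nothing changed since the last scrape

    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = dataset_dir / f"{stamp}-{digest}.csv"
    path.write_text(text, encoding="utf-8")
    return path
```

With pandas installed, the `fetch` argument could be something like `lambda: pd.read_html(url)[0].to_csv(index=False)`, where `url` is whatever page holds the table. Swapping the local directory for an S3 bucket and CSV for parquet would follow the same shape.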

What kind of interface?

  • Ripgrep/fzf full text searching of everything.
  • Splunk style web interface for event streaming.
  • Csvbase style interface for tables.
  • Ag-grid dataframe viewing at a button click.
  • Web-based code editing and REPL. Jupyter is terrible but hints at the possibility.
  • I think kdb+ has a "something studio" app that might have some useful ideas.
  • github.dev with the VS Code web version is not it.
  • Docs should also be effortless and live with the code. Jupyter notebooks aren't quite it but hint at it.

How do we implement this?

What functionality is required for the above?

  • Long-lived state (independent of compute nodes): filesystem or S3-compatible object storage?
    • Thinking mainly of mostly-read Parquet files here.
  • authn/authz to protect the long-lived state.
  • Job scheduling. Cron/systemd/etc but needs to work with transient nodes.
    • DAG job dependencies (Bank Python's Dagger)
  • Webserver nodes. Or can we get away with static bucket hosting?
  • Transient compute nodes. Mix of home/office desktops and cloud VMs. Lots of (most?) datasets are small enough that a home PC could run scheduled tasks like scrape-table-from-site.py and push to S3 or similar.
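
The DAG job dependency requirement above can be sketched with Python's standard-library `graphlib`. This is only an illustration of dependency ordering, not how Bank Python's Dagger works. The job names and the `run_jobs` helper are hypothetical, and a real scheduler would also have to place jobs on transient nodes and handle retries.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_jobs(jobs: Dict[str, Callable[[], None]],
             depends_on: Dict[str, Set[str]]) -> List[str]:
    """Run every job after all of its dependencies; return the run order."""
    graph = {name: depends_on.get(name, set()) for name in jobs}

    # Reject dependencies on jobs that were never defined.
    unknown = {dep for deps in graph.values() for dep in deps} - set(jobs)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    # static_order() raises graphlib.CycleError if the graph has a cycle.
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        jobs[name]()
    return order


# Hypothetical jobs: two scrapers feed a join, which feeds a report.
log: List[str] = []
jobs = {
    "scrape_prices": lambda: log.append("scrape_prices"),
    "scrape_fx": lambda: log.append("scrape_fx"),
    "join_tables": lambda: log.append("join_tables"),
    "publish_report": lambda: log.append("publish_report"),
}
depends_on = {
    "join_tables": {"scrape_prices", "scrape_fx"},
    "publish_report": {"join_tables"},
}
order = run_jobs(jobs, depends_on)
```

The two scrapers have no dependency on each other, so their relative order is not fixed. That is also the part a cluster scheduler could run in parallel on different nodes.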

K.I.S.S.

  • S3 (or R2) full of parquet files
  • guix for language tooling (or nix, but guix is nicer ime)
  • tailscale (or headscale)

It should be extremely cost effective.

Home PCs should participate in scheduled jobs transparently.

Use spot nodes temporarily if needed, transparently.

It should also be multi-user capable, at least for a single org.

Publishing "notebooks" should be as easy as gist.github.com.

How do we NOT implement?

  • kubernetes - I'm convinced this is way overcomplicated for the vast majority of organisations
  • docker - just zip up binaries into a blob, what could go wrong?
  • serverless - ridiculous vendor lock-in doesn't seem like the sort of thing we'll build the Starship Enterprise with. Maybe fly.io is worth digging into though to test out the ideas?
@AshyIsMe

Bank Python is fascinating and full of amazing and also horrific ideas.

  • distributed KV store (with namespaces and union "mounting" like plan9)
  • dag job/event dependencies
  • Job scheduler (systemd, koobies, nomad, etc)
  • OLTP vs OLAP (separate solution for both)
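
A rough illustration of the namespaces-plus-union-mounting idea from the list above, using the standard library's `collections.ChainMap`. This is only loosely analogous to plan9 union directories and makes no claim about how Bank Python implements it. The layer names and keys are made up, and plain in-process dicts stand in for a distributed store.

```python
from collections import ChainMap

# Each namespace is an ordinary mapping here; a real system would back
# these with a distributed store instead of in-process dicts.
firm_wide = {"/config/region": "global", "/config/currency": "USD"}
desk = {"/config/region": "apac"}
user = {}

# "Union mount": lookups try `user`, then `desk`, then `firm_wide`.
view = ChainMap(user, desk, firm_wide)

region = view["/config/region"]      # found in `desk`, shadowing `firm_wide`
currency = view["/config/currency"]  # falls through to `firm_wide`

# Writes land in the first (topmost) mapping only; lower layers are untouched.
view["/config/currency"] = "AUD"
```

After the write, `view["/config/currency"]` resolves to the user's override while `firm_wide` still holds the shared default, which is the layering behaviour that makes per-user or per-desk overrides cheap.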

How do we take the best and simplest ideas from smalltalk, plan9, erlang, Bank Python, APL and mainframes, serverless, and meet the core features without it turning into ludicrous techno dweeb Rube Goldberg machines?

@AshyIsMe

AshyIsMe commented Sep 19, 2023

Possibly useful:

Mine for ideas:

@AshyIsMe

Distributed Shell Syntax

Just like bash except each process can run on a separate node in the cluster.

Pipelines should be network transparent.
Need some form of shared filesystem for the binary PATH and also to redirect from/to.

Design a syntax without any implementation to start with.

Interactive cluster compute, old school unix style.
What is a cluster other than an SMP machine with higher latency between cores?

@tejr mentioned that Rob Pike used to talk about something similar as "host independence".
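
One way to start on "a syntax without any implementation" is to write down a candidate syntax and only parse it. The sketch below assumes a purely hypothetical convention where a trailing `@host` token pins a pipeline stage to a node and stages without one are left for a scheduler to place. Nothing here runs any process, and the `Stage` and `parse_pipeline` names are invented for this example.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    argv: List[str]
    host: Optional[str]  # None means "let the scheduler pick a node"


def parse_pipeline(line: str) -> List[Stage]:
    """Parse 'cmd args @host | cmd args' into stages (hypothetical syntax)."""
    stages = []
    # Naive split: a quoted '|' inside an argument is not handled.
    for part in line.split("|"):
        tokens = shlex.split(part)
        if not tokens:
            raise ValueError("empty pipeline stage")
        host = None
        # A trailing '@name' token is treated as the placement hint.
        if len(tokens) > 1 and tokens[-1].startswith("@"):
            host = tokens.pop()[1:]
        stages.append(Stage(argv=tokens, host=host))
    return stages


stages = parse_pipeline("cat prices.csv @storage1 | sort -k2 | head -n 5 @laptop")
```

Here the first stage is pinned to `storage1` (next to the data), the last to `laptop`, and `sort` is unpinned. Whether placement hints belong in the syntax at all, or should be entirely transparent, is exactly the design question the note above raises.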

@AshyIsMe

Elixir Explorer and Livebook look very interesting: https://github.com/elixir-nx/

@AshyIsMe

AshyIsMe commented May 18, 2024

@AshyIsMe

Possibly self-hosted Gitpod or code-server for the web IDE.
