Dataframe Oriented Programming: https://csvbase.com/blog/1
Accessing up-to-date data quickly and easily (even on a phone), and constantly pulling decision-making information out of it, should be effortless.
Something like the stories of the old APL mainframe environment.
- Named tables that are always up to date.
- Not needing to worry about RAM, or where compute sits relative to the data, etc.
- Hierarchical but also tagged organisation of datasets.
- Easy job scheduling (like systemd-timers)
Example datasets to pull in:
- market data
- commodities
- currencies
- commodity shipments
- country productions
- country investments (all the SAI global database stuff)
- energy data: opennem.org.au, plus global equivalents
- tables from Wikipedia
- national statistics bureaus: the US ones, and equivalents for every country
It should be almost effortless to add a new dataset scraper: a pandas pd.read_html(...) call and a simple schedule entry. Datasets should keep version history, so that monitoring change over time is easy and useful.
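A minimal sketch of what such a scraper might look like, assuming pandas, lxml, pyarrow, and s3fs are installed; the URL, bucket, and table names are hypothetical:

```python
# scrape-table-from-site.py -- sketch of a scheduled dataset scraper.
from datetime import datetime, timezone

import pandas as pd

SOURCE_URL = "https://en.wikipedia.org/wiki/List_of_countries_by_oil_production"
BUCKET = "s3://my-data-lake"            # any S3-compatible store (S3, R2, ...)
TABLE = "commodities/oil-production"    # hierarchical dataset name

def main() -> None:
    # pd.read_html returns every <table> on the page; take the first.
    df = pd.read_html(SOURCE_URL)[0]
    # Version history: write each run to a timestamped file instead of
    # overwriting, so change over time can be diffed later.
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H%M%SZ")
    df.to_parquet(f"{BUCKET}/{TABLE}/{stamp}.parquet")

if __name__ == "__main__":
    main()
```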
- Ripgrep/fzf-style full-text searching of everything.
- Splunk-style web interface for event streaming.
- csvbase-style interface for tables.
- AG Grid dataframe viewing at the click of a button.
- Web-based code editing and a REPL; Jupyter is terrible but hints at the possibility.
- I think kdb+ has a "something Studio" app that might have some useful ideas.
- github.dev (the VS Code web version) is not it.
- Docs should also be effortless and live with the code; Jupyter notebooks aren't quite it, but they hint at it.
What functionality is required for the above?
- Long-lived state (independent of compute nodes): a filesystem, or S3-compatible object storage?
- Mostly thinking about read-mostly Parquet files here.
- Authn/authz to protect the long-lived state.
- Job scheduling: cron/systemd/etc., but it needs to work with transient nodes.
- DAG job dependencies (like Bank Python's Dagger); a sketch follows this list.
- Webserver nodes. Or can we get away with static bucket hosting?
- Transient compute nodes: a mix of home/office desktops and cloud VMs. Lots of (most?) datasets are small enough that a home PC could run a scheduled task like scrape-table-from-site.py and push the result to S3 or similar.
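A toy sketch of the DAG-dependency idea using only the standard library; this is not Dagger's actual API, and the job names are made up:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Hypothetical job graph: each job maps to the set of jobs it depends on.
jobs = {
    "scrape_oil_production": set(),
    "scrape_currencies": set(),
    "build_commodity_report": {"scrape_oil_production", "scrape_currencies"},
}

def run(job: str) -> None:
    print(f"running {job}")  # placeholder for dispatching to a compute node

# static_order() yields jobs with all dependencies first; independent
# jobs could be fanned out to different transient nodes in parallel.
for job in TopologicalSorter(jobs).static_order():
    run(job)
```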
K.I.S.S.
- S3 (or R2) full of Parquet files (see the sketch after this list)
- Guix for language tooling (or Nix, but Guix is nicer in my experience)
- Tailscale (or Headscale)
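With the storage layer being nothing but Parquet in a bucket, "querying the database" could be a one-liner. A sketch, with a hypothetical bucket layout and assuming s3fs/pyarrow:

```python
import pandas as pd

# Read a dataset straight from object storage; pyarrow handles a
# directory of Parquet files, and s3fs works against any S3-compatible
# endpoint (including R2).
df = pd.read_parquet("s3://my-data-lake/commodities/oil-production/")
print(df.tail())
```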
It should be extremely cost effective.
Home PCs should participate in scheduled jobs transparently (a naive worker sketch follows).
Spot cloud nodes should be usable temporarily when needed, just as transparently.
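One pull-based way to get that transparency: every node, home PC or spot VM, runs the same polling worker. A naive sketch with an invented queue layout; a real version would need to handle claim races (e.g. with conditional writes) rather than the blind move below:

```python
import time

import s3fs  # pip install s3fs

fs = s3fs.S3FileSystem()
PENDING = "my-data-lake/jobs/pending"   # hypothetical queue prefixes
CLAIMED = "my-data-lake/jobs/claimed"

def run_job(path: str) -> None:
    print(f"would run {path}")  # placeholder: fetch the script, run it, push results

while True:
    for path in fs.ls(PENDING):
        # Naive "claim": move the job file out of the pending prefix.
        # Two workers can race here; fine for a sketch only.
        claimed = f"{CLAIMED}/{path.rsplit('/', 1)[-1]}"
        fs.mv(path, claimed)
        run_job(claimed)
    time.sleep(60)  # poll once a minute
```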
It should also be multi-user capable, at least for a single org.
Publishing "notebooks" should be as easy as gist.github.com.
Things I'm avoiding:
- Kubernetes: I'm convinced this is way overcomplicated for the vast majority of organisations.
- Docker: just zip up binaries into a blob, what could go wrong?
- Serverless: ridiculous vendor lock-in doesn't seem like the sort of thing we'll build the Starship Enterprise with. Maybe fly.io is worth digging into to test out the ideas, though?
Very interesting: https://calpaterson.com/bank-python.html