Original revision: Sep 6, 2018
Most developers are familiar with, and proponents of, the Unix philosophy (Unix philosophy - Wikipedia), particularly: "Write programs that do one thing and do it well." In practice though, the tooling just doesn't exist to build useful network services which follow this approach.
Let's take a lightweight WebSocket service as an example. In 2018 we have no shortage of languages and frameworks to create the service - however, they are largely incompatible with each other.
I'm most familiar with the Python world, so I can break out the different frameworks there that you could use: twisted, eventlet, gevent, tornado, asyncio, sanic - and even though these share the same base language, libraries designed for one of these frameworks are likely difficult to use with another. And then there are also a myriad of options in Java, Golang, Erlang, and Rust.
I think it's telling that when an interesting innovation happens in one of these ecosystems, people who prefer a different one begin porting it (or requesting ports).
I've been messing a bunch recently with https://pptr.dev, a Node.js library to easily automate Chrome (it's great). It'd be nice to process data scraped with Puppeteer seamlessly using libraries I'm familiar with in Python. As near as I can tell, Puppeteer was first made publicly available around Aug 18, 2017. Come Aug 28, 2017: "Are there any plans to port puppeteer to Python?" (Issue #575 · GoogleChrome/puppeteer · GitHub).
There is now what looks to be a useful port: miyakogi/pyppeteer on GitHub, a headless Chrome/Chromium automation library (an unofficial port of Puppeteer).
I run (badly) a few open source projects and I feel exhausted just thinking about the ongoing effort that'll be needed to maintain this unofficial port.
Bootstrapping an entirely new method of development, a new language, or a new approach to concurrency is even more daunting. Unless you are able to attract sufficient volunteers to flesh out your ecosystem with the essential batteries included, then realistically, even if your approach has significant novel advantages, it won't be usable for real work.
A quick brain dump of batteries an ecosystem could really use:
- Protocols: json / msgpack / thrift / grpc
- Ability to read / write document formats: csv, xls, pdf
- Network: TCP, HTTP, HTTP/2, WebSockets
- Bindings for AWS, Kafka, Redis, MySQL, SQLite, Mongo
- Rich date handling
- DNS resolution
- Sane primitives to coordinate async
- Template rendering
- Package management
- Cryptography, TLS, SSH
- Science and math libraries
- Heck: even just slugify-ing a URL using industry best practices
Pony Lang is a new language that has a lot of interesting qualities. The project lists "batteries required" under reasons not to use it yet (Discover - Pony).
So what would it look like if we constructed our systems with small tools that can communicate easily with each other?
The first thing, I think(?), is that this is largely not possible currently. The suite of small tools needed doesn't exist.
This is a shot at an HTTP server that takes a JSON payload with two keys, a and b, and returns their sum.
$ s6-tcpserver 127.0.0.1 8080 sh -c '
http2json | \
jq -r .body | jq "{\"res\": (.a + .b)}" | \
json2http'
$ jo a=3 b=4 | curl -d @- localhost:8080
{"res": 7}
Some more thoughts looking at this snippet:
- Bash quoting is prohibitive to building complex systems on the command line (see the quoting sketch after this list).
- s6's use of "Bernstein chaining" (Chain loading - Wikipedia) has a lot of advantages but isn't as natural (more on this below).
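To make the quoting point concrete, here is a sketch of my own (not from the snippet above) showing the jq stage at increasing levels of shell nesting; host is a placeholder, and these are filter fragments rather than complete commands, but the escapes multiply at each layer:

# The same jq filter, quoted for one, two, and three shell layers
# (fragments only - each would still need its stdin wired up):
$ jq "{\"res\": (.a + .b)}"
$ sh -c 'jq "{\"res\": (.a + .b)}"'
$ ssh host "sh -c 'jq \"{\\\"res\\\": (.a + .b)}\"'"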
s6-tcpserver is a TCP socket server: it binds to a port, spawns a process for each connection, and maps the connection socket's reads to the process's stdin and the process's stdout to the socket's writes.
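As a self-contained way to see that model (a sketch of my own; s6-tcpserver and s6-applyuidgid are real tools from the s6 suite, though some nc variants need a flag like -q 0 to exit on EOF), running cat as the per-connection program gives a TCP echo server, and inserting s6-applyuidgid shows the Bernstein-chaining style where each program consumes its own options and then execs into the rest of its argv:

# Echo server: each accepted connection gets its own cat process whose
# stdin/stdout are wired to that connection's socket.
$ s6-tcpserver 127.0.0.1 7007 cat &
$ echo hello | nc 127.0.0.1 7007    # some nc variants want -q 0 here
hello

# Bernstein chaining: s6-applyuidgid drops privileges, then execs into
# the remainder of its command line (cat) - no shell involved.
$ s6-tcpserver 127.0.0.1 7007 s6-applyuidgid -u 65534 -g 65534 cat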
In this case a small shell script is spawned that uses an imaginary binary, http2json, that parses HTTP requests from its stdin and translates them to a JSON document, perhaps of the form:
{
  "method": "POST",
  "path": "/",
  "headers": {...},
  "body": "{\"a\":3,\"b\":4}"
}
This is then piped to an instance of jq to extract the body of the request (the -r flag emits it as raw text so it can be re-parsed), then to a second instance of jq, which parses the body as JSON and sums fields a and b, and finally this result is piped to an imaginary binary, json2http, that would take the JSON payload and turn it into an HTTP response.
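The middle of that pipeline is real and can be exercised today by standing in for http2json's output with an echo (a sketch assuming the document format shown above):

# Stand-in for http2json's output; jq -r extracts the body as raw text
# so the second jq can parse it as JSON and sum the fields.
$ echo '{"method":"POST","path":"/","body":"{\"a\":3,\"b\":4}"}' \
    | jq -r .body | jq '{res: (.a + .b)}'
{
  "res": 7
}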
Sorry for the delay getting some of these thoughts out. I was away this weekend, and I've been trying to cobble together a few projects that I've had on the back burner. I'm kind of just dumping a few ideas, and I'll post more thoughts as they come.
I think what you are touching on is really about the ability to compose small, well-written units. The UNIX philosophy really shines because of the ability to compose these units together. You can write things that do one thing well because they don't have to be concerned too much with what goes on either side (stdin and stdout). This restriction is really the catch-22, though. By being so restricted, you give up considering how you might improve throughput, because there is this unavoidable barrier. But that very barrier is such a powerful unit of composability. As soon as you ask the question of performance, you have to either accept the limitations or entirely break the model.
But it is fun to write simple tools that utilize the standard POSIX APIs. It's like they were designed for that or something. :)
It does beg the question of a hypothetical system that doesn't suffer the limitation while retaining the flexibility, but I'm not sure if that is really feasible. Microsoft tried with PowerShell, if I recall correctly, but that was still an attempt at improving management tools, not really a service composition system.