- Naming
  - `L1`: an L1 machine
  - `l1`: an individual l1 worker process
- The overview graph below lists all involved components. In later graphs, irrelevant components are omitted for readability; this doesn't mean they have been removed.
- L2->l1 routing
- l1->L2 routing
```mermaid
flowchart TB
  subgraph L1
    master
    subgraph workers
      l1
      l1'
    end
  end
  subgraph L2s
    L2
    L2'
  end
  L2s --> |register|workers
  workers --> |request CID|L2s
  L2s --> |send CAR|workers
  client --> |request CID|workers
  workers --> |send CAR|client
```
```mermaid
sequenceDiagram
  participant Client
  participant l1s
  participant L2s
  L2s->>l1s: Register
  Client->>l1s: Request CID
  l1s->>L2s: Request CID
  L2s->>l1s: Send CAR
  l1s->>Client: Send CAR
```
The l1 worker that receives the client request needs to be the same one that receives the CAR file from the L2. But if an L2 makes a new HTTP request to deliver its response, the Node.js cluster module assigns that connection to a random l1 worker process.

We run into this problem because L2s, likely living in home networks, aren't directly reachable from the outside and can't act as servers. Therefore the L2 is the only side that can open new connections to the L1. This means the L1 can't act as an HTTP client, for example requesting data from the L2. The L1 can request data, but unless we figure out how the L2 can reply directly, the reply will hit the L1's load balancer and get assigned to a random l1, which didn't receive the original HTTP request from the client.
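A minimal sketch of the cluster behavior (port 3000 is a placeholder): every new connection to the shared port is distributed round-robin, so two requests from the same peer usually land on different workers.

```js
import cluster from 'node:cluster'
import http from 'node:http'
import os from 'node:os'

if (cluster.isPrimary) {
  // One worker per CPU core, all sharing port 3000.
  for (let i = 0; i < os.cpus().length; i++) cluster.fork()
} else {
  http.createServer((req, res) => {
    // Repeated requests print different worker ids: no affinity.
    res.end(`handled by worker ${cluster.worker.id}\n`)
  }).listen(3000)
}
```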
```mermaid
flowchart TB
  subgraph L1
    l1
    l1'
  end
  subgraph L2s
    L2
  end
  L2 --> |register|l1
  l1 --> |request CID|L2
  L2 --> |send CAR|l1'
  client --> |request CID|l1
  l1 --> |send CAR|client
  linkStyle 2 stroke:red
  linkStyle 4 stroke:red
```
Workers get their own port, so that the L2 can reach the right worker directly, not through the load balancer (sketch after the diagram).
```mermaid
flowchart TB
  subgraph L1
    l1
    l1'
  end
  subgraph L2s
    L2
  end
  L2 --> |register|l1
  l1 --> |request CID with port|L2
  L2 --> |send CAR to worker port|l1
  client --> |request CID|l1
  l1 --> |send CAR|client
  linkStyle 2 stroke:yellow
  linkStyle 1 stroke:yellow
  linkStyle 4 stroke:green
```
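A sketch of the per-worker port idea; the `BASE_PORT + worker id` scheme and the ports are assumptions for illustration:

```js
import cluster from 'node:cluster'
import http from 'node:http'

const BASE_PORT = 4000 // hypothetical

if (cluster.isPrimary) {
  for (let i = 0; i < 4; i++) cluster.fork()
} else {
  // Shared, load-balanced port for client requests.
  http.createServer((req, res) => {
    res.end(`client handled by worker ${cluster.worker.id}\n`)
  }).listen(3000)
  // Private per-worker port: the l1 would include it in its CID request,
  // so the L2 can POST the CAR straight back, bypassing load balancing.
  http.createServer((req, res) => {
    res.end(`CAR received by worker ${cluster.worker.id}\n`)
  }).listen(BASE_PORT + cluster.worker.id)
}
```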
Make nginx use sticky routing
```mermaid
flowchart TB
  subgraph L1
    l1
    l1'
  end
  subgraph L2s
    L2
  end
  L2 -->|register|l1
  l1 --> |request CID|L2
  L2 --> |send CAR|l1
  client --> |request CID|l1
  l1 --> |send CAR|client
```
If nginx uses the `hash` routing strategy, then we can match l1s and L2s by CID URL path segment. If there is a single TCP connection between l1 and L2, then there's no load balancing between the two, and the l1 that asks the L2 is also the one that gets the response.
If we replace the round-robin routing in the Node.js cluster module with our own logic, we can ensure that an L2 response always arrives at the right l1 by inspecting the request URL. A rough sketch follows.
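This is the classic sticky-routing pattern: the primary process peeks at the first bytes of each connection, extracts the CID from the request line, and hands the socket to the worker chosen by a hash of that CID. The CID extraction and the plain modulo hash are simplifying assumptions; real code would use consistent hashing and a proper parser.

```js
import cluster from 'node:cluster'
import net from 'node:net'
import http from 'node:http'
import { createHash } from 'node:crypto'

if (cluster.isPrimary) {
  const workers = [cluster.fork(), cluster.fork()]
  // Stand-in for consistent hashing.
  const workerForCid = cid =>
    workers[createHash('sha256').update(cid).digest()[0] % workers.length]

  net.createServer({ pauseOnConnect: true }, socket => {
    socket.once('data', head => {
      socket.pause()
      // e.g. "GET /ipfs/CID HTTP/1.1" or "POST /data/CID HTTP/1.1"
      const cid = head.toString().split(' ')[1]?.split('/')[2] ?? ''
      // Hand over the socket plus the bytes we already consumed.
      workerForCid(cid).send({ head: head.toString('latin1') }, socket)
    })
    socket.resume()
  }).listen(3000)
} else {
  const server = http.createServer((req, res) => {
    res.end(`handled by worker ${cluster.worker.id}\n`)
  })
  process.on('message', ({ head }, socket) => {
    // Re-inject the consumed bytes, then let the HTTP parser take over.
    socket.unshift(Buffer.from(head, 'latin1'))
    server.emit('connection', socket)
    socket.resume()
  })
}
```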
Use binary WebSockets, which allow for a persistent connection between L2 and l1, bypassing any load balancing.
The message framing:

- Registration (L2 -> l1): `0{L2ID}`, i.e. `0` plus an arbitrary-length L2 ID (UTF-8)
- CID request (l1 -> L2): `{requestId[36]}{cid}`, i.e. a 36-byte request ID (UTF-8) plus the CID (UTF-8)
- CAR chunk (L2 -> l1): `1{requestId[36]}{chunk?}`, i.e. `1` plus a 36-byte request ID (UTF-8) plus an optional chunk

If the chunk is empty, that means the CAR file has been served completely.
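A sketch of encoding and decoding these frames, assuming the type markers are the ASCII characters `0` and `1` and request IDs are 36-character UUID strings:

```js
const encodeRegister = l2Id =>
  Buffer.concat([Buffer.from('0'), Buffer.from(l2Id, 'utf-8')])

const encodeRequest = (requestId, cid) =>
  Buffer.concat([Buffer.from(requestId, 'utf-8'), Buffer.from(cid, 'utf-8')])

const encodeChunk = (requestId, chunk = Buffer.alloc(0)) =>
  Buffer.concat([Buffer.from('1'), Buffer.from(requestId, 'utf-8'), chunk])

const decodeChunk = buf => ({
  requestId: buf.subarray(1, 37).toString('utf-8'),
  chunk: buf.subarray(37), // empty: CAR served completely
})

// Example round trip:
const frame = encodeChunk('00000000-0000-0000-0000-000000000000')
console.log(decodeChunk(frame).chunk.length) // 0 -> transfer complete
```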
Unfortunately WebSockets are significantly slower than raw HTTP:
```
$ time node bench.mjs ws
node bench.mjs ws  5.96s user 0.66s system 104% cpu 6.329 total
$ time node bench.mjs http
node bench.mjs http  0.33s user 0.47s system 64% cpu 1.251 total
```
```js
// bench.mjs: stream ~1 GiB in 1 MiB chunks over a WebSocket or a plain
// HTTP response, depending on the first CLI argument ('ws' or 'http').
import WebSocket, { WebSocketServer } from 'ws'
import http from 'node:http'

const chunk1MB = Buffer.alloc(1024 * 1024).fill(1)

if (process.argv[2] === 'ws') {
  const wss = new WebSocketServer({ port: 3000 })
  wss.on('connection', ws => {
    ws.on('message', d => {
      // An empty message marks the end of the transfer.
      if (d.length === 0) {
        process.exit()
      }
    })
  })
  const ws = new WebSocket('ws://127.0.0.1:3000')
  ws.on('open', () => {
    let todo = 1024
    const send = () => {
      if (--todo > 0) {
        ws.send(chunk1MB)
        setTimeout(send, 0)
      } else {
        ws.send(Buffer.alloc(0))
      }
    }
    send()
  })
} else {
  const server = http.createServer((req, res) => {
    let todo = 1024
    const send = () => {
      if (--todo > 0) {
        res.write(chunk1MB)
        setTimeout(send, 0)
      } else {
        res.end()
      }
    }
    send()
  })
  server.listen(3000)
  http.get('http://127.0.0.1:3000', res => {
    res.on('data', () => {})
    res.on('end', () => {
      process.exit()
    })
  })
}
```
For reference, my attempt at a reverse HTTP implementation (ptth):
```js
// http.mjs: both roles in one process. The server plays the l1; once the
// L2's request arrives, it tries to reuse the same TCP socket to act as
// an HTTP client towards the L2.
import http from 'node:http'
import stream from 'node:stream'

const server = http.createServer((req, res) => {
  console.log('l1 received request')
  console.log('l1 tries turning into the client now')
  console.log('l1 request cid')
  req.socket.on('data', d => console.log('l1 receive', d.toString()))
  const cidReq = http.request({
    // Reuse the already-open socket instead of dialing a new connection.
    createConnection: () => req.socket,
    // createConnection: () => stream.Duplex.from({ writable: res, readable: req }),
    path: '/cid/CID'
  }, cidRes => {
    console.log('l1 received cid response')
  })
  cidReq.end()
})
server.listen(3000, () => {
  console.log('l1 make request')
  const req = http.request('http://localhost:3000', {
    method: 'POST'
  }, res => {
    console.log('l2 received response')
  })
  req.on('connect', () => {
    console.log('l2 established connection')
    console.log('l2 tries turning into the server now')
  })
  req.end()
})
```
```
$ node http.mjs
l1 make request
l1 received request
l1 tries turning into the client now
l1 request cid
node:events:505
      throw er; // Unhandled 'error' event
      ^

Error: Parse Error: Expected HTTP/
    at Socket.socketOnData (node:_http_client:494:22)
    at Socket.emit (node:events:527:28)
    at addChunk (node:internal/streams/readable:315:12)
    at readableAddChunk (node:internal/streams/readable:289:9)
    at Socket.Readable.push (node:internal/streams/readable:228:10)
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23)
Emitted 'error' event on ClientRequest instance at:
    at Socket.socketOnData (node:_http_client:503:9)
    at Socket.emit (node:events:527:28)
    [... lines matching original stack trace ...]
    at TCP.onStreamRead (node:internal/stream_base_commons:190:23) {
  bytesParsed: 0,
  code: 'HPE_INVALID_CONSTANT',
  reason: 'Expected HTTP/',
  rawPacket: Buffer(64) [Uint8Array] [
     71,  69,  84,  32,  47,  99, 105, 100,  47,  67,  73,  68,
     32,  72,  84,  84,  80,  47,  49,  46,  49,  13,  10,  72,
    111, 115, 116,  58,  32, 108, 111,  99,  97, 108, 104, 111,
    115, 116,  58,  56,  48,  13,  10,  67, 111, 110, 110, 101,
     99, 116, 105, 111, 110,  58,  32,  99, 108, 111, 115, 101,
     13,  10,  13,  10
  ]
}
```
Suggestions by Miro:

Regarding the reverse HTTP code snippet - I think the client (L2) needs to start with an HTTP/1.1 Upgrade request. The server (L1) responds with HTTP status 101 to confirm the upgrade. Only after this initial exchange are you in control of the TCP connection and can start setting up reverse HTTP (see the sketch after the links below).

- https://developer.mozilla.org/en-US/docs/Web/HTTP/Protocol_upgrade_mechanism
- Protocol upgrade spec: https://datatracker.ietf.org/doc/html/rfc7230#section-6.7
- Registry of protocol names: https://datatracker.ietf.org/doc/html/rfc7230#section-8.6 and https://www.iana.org/assignments/http-upgrade-tokens/http-upgrade-tokens.xhtml
- WebSocket's use of protocol upgrade - for inspiration: https://datatracker.ietf.org/doc/html/rfc6455#section-1.2

The list of registered upgrade protocols is currently very short (basically HTTP, HTTPS, WebSocket), so it's a question whether nginx supports custom protocols.
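A sketch of this upgrade dance in Node.js, assuming a made-up `ptth` upgrade token: the L2 (client) asks for the upgrade, the l1 (server) confirms with 101, and from then on the raw socket can carry HTTP in the reverse direction.

```js
import http from 'node:http'

const server = http.createServer()
server.on('upgrade', (req, socket) => {
  // l1 confirms the upgrade; the socket is now ours.
  socket.write(
    'HTTP/1.1 101 Switching Protocols\r\n' +
    'Upgrade: ptth\r\n' +
    'Connection: Upgrade\r\n\r\n'
  )
  // The l1 now acts as the HTTP client on this connection.
  socket.write('GET /cid/CID HTTP/1.1\r\nHost: l2\r\n\r\n')
})
server.listen(3000, () => {
  const req = http.request('http://127.0.0.1:3000', {
    headers: { Connection: 'Upgrade', Upgrade: 'ptth' },
  })
  req.on('upgrade', (res, socket) => {
    // The L2 now acts as the server on this connection.
    socket.on('data', d => console.log('l2 received:\n' + d.toString()))
  })
  req.end()
})
```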
Now that we have solved the load balancing aspect of L2->l1 routing, another problem arises: each L2 is connected to exactly one l1. This means that an l1 doesn't always have a working connection to the L2 that the consistent hashing algorithm would select, and has no choice but to request the CID from the wrong L2, which will then also hydrate itself.
```mermaid
flowchart TB
  subgraph L1
    l1
    l1'
  end
  subgraph L2s
    L2
    L2'
  end
  L2 --> |register|l1
  L2' --> |register|l1'
  l1 --> |request CID|L2
  client --> |request CID|l1
  l1 --> |send CAR|client
  L2 --> |send CAR|l1
  linkStyle 2 stroke:red
  linkStyle 4 stroke:red
  linkStyle 5 stroke:red
```
Add a single proxy process that sits between all l1s and their connected L2s. This proxy can forward requests, selecting the right L2s to contact, and also forward responses to the right l1 (rough sketch after the diagram).
```mermaid
flowchart TB
  subgraph L1
    l1
    l1'
    proxy
  end
  subgraph L2s
    L2
    L2'
  end
  L2 --> |register|proxy
  L2' --> |register|proxy
  l1 --> |request CID|proxy
  proxy --> |request CID|L2
  client --> |request CID|l1
  l1 --> |send CAR|client
  L2 --> |send CAR|proxy
  proxy --> |send CAR|l1
```
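A rough sketch of such a proxy; all routes, the port, and the modulo hash are assumptions for illustration:

```js
import http from 'node:http'
import { createHash } from 'node:crypto'

const l2Streams = new Map()  // L2_ID -> long-lived registration response
const waitingL1s = new Map() // CID -> l1 response awaiting the CAR

// Stand-in for consistent hashing over the registered L2 IDs.
const pickL2 = cid => {
  const ids = [...l2Streams.keys()].sort()
  return ids[createHash('sha256').update(cid).digest()[0] % ids.length]
}

http.createServer((req, res) => {
  const [, route, id] = req.url.split('/')
  if (route === 'register') {
    // L2 -> proxy: keep the response open, push CID requests down it.
    l2Streams.set(id, res)
    res.on('close', () => l2Streams.delete(id))
  } else if (route === 'request') {
    // l1 -> proxy: remember who asked, forward to the right L2.
    waitingL1s.set(id, res)
    l2Streams.get(pickL2(id))?.write(JSON.stringify({ cid: id }) + '\n')
  } else if (route === 'data') {
    // L2 -> proxy: pipe the CAR upload back to the waiting l1.
    const l1 = waitingL1s.get(id)
    if (l1) req.pipe(l1)
    req.on('end', () => waitingL1s.delete(id))
  }
}).listen(4000)
```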
Route L1/L2 requests by CID path segment (`GET /ipfs/CID` & `POST /data/CID`), this way connecting the right l1s with L2s. This requires us to ditch the Node.js `cluster` approach and let nginx load balance between the l1s.

Downside: l1s need to synchronize with each other, so that all of them know of all connected L2s, and the l1 with the right connection can be asked to request a CID from the right L2.
```mermaid
flowchart TB
  subgraph L1
    l1
    l1'
  end
  subgraph L2s
    L2
    L2'
  end
  L2 --> |register|l1
  L2' --> |register|l1
  l1 --> |request CID|L2
  client --> |request CID|l1
  l1 --> |send CAR|client
  L2 --> |send CAR|l1
```
For reference, an nginx config that consistently hashes on a URL path segment:

```nginx
map $uri $myvar {
    default $uri;
    # pattern "Method1" + "/" + GUID + "/" + target parameter + "/" + HASH;
    "~*/Method1/(.*)/(.*)/(.*)$" $2;
    # pattern "Method2" + "/" + GUID + "/" + target parameter;
    "~*/Method2/(.*)/(.*)$" $2;
}

upstream backend {
    hash $myvar consistent;
    server s1:80;
    server s2:80;
}
```
If we switch from one Node.js process per CPU core to a single Node.js HTTP process, then we don't need a routing layer for l1s at all. Incoming HTTP requests don't need to be routed to the correct l1, and L2 responses don't need to be routed to the correct l1 either. This also removes the need for persistent connections between l1s and L2s, so we can go back to the NDJSON + HTTP POST protocol.

Downside: if we need to distribute workload across multiple CPU cores, we need to look into strategies like worker pools with shared array buffers, on top of which streaming data verification (and more work) is implemented. For now, this shouldn't be a concern though (sketch below).
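A minimal worker-pool sketch with a `SharedArrayBuffer`; the inline worker and the byte-sum "verification" are placeholders for real streaming CAR verification:

```js
import { Worker } from 'node:worker_threads'

// Inline worker (eval: true) to keep the sketch self-contained.
const worker = new Worker(`
  const { parentPort } = require('node:worker_threads')
  parentPort.on('message', ({ shared, length }) => {
    const view = new Uint8Array(shared, 0, length)
    let sum = 0 // placeholder "verification": sum the bytes
    for (const b of view) sum += b
    parentPort.postMessage({ sum })
  })
`, { eval: true })

// The HTTP process writes chunks into shared memory; no copies are posted.
const shared = new SharedArrayBuffer(1024 * 1024)
new Uint8Array(shared).fill(1)
worker.postMessage({ shared, length: 1024 })
worker.on('message', ({ sum }) => {
  console.log('worker checked chunk, sum =', sum) // 1024
  worker.terminate()
})
```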
The registration stream then looks like this; the L1 pushes one JSON line per CID request, plus empty-line heartbeats:

```
GET https://L1_URL/register/L2_ID

{"requestId":"REQUEST_ID","cid":"CID"}\n
{"requestId":"REQUEST_ID","cid":"CID"}\n
{"requestId":"REQUEST_ID","cid":"CID"}\n
\n <- heartbeat
...
```

The `requestId` is created for the client connection to the L1 and passed down to the L2. It is attached by L1 and L2 to the log lines created as part of the request/response cycle, for ease of debugging.
The L2 then delivers the CAR with:

```
POST https://L1_URL/data/CID
```
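A sketch of the L2 side of this protocol; `L1_URL`, `L2_ID`, and `carStream` are placeholders:

```js
import http from 'node:http'
import readline from 'node:readline'
import { Readable } from 'node:stream'

const L1_URL = 'http://127.0.0.1:3000' // placeholder
const L2_ID = 'my-l2-id' // placeholder

// Stand-in for however the L2 actually produces a CAR stream.
const carStream = cid => Readable.from([Buffer.from(`CAR for ${cid}`)])

http.get(`${L1_URL}/register/${L2_ID}`, res => {
  const lines = readline.createInterface({ input: res })
  lines.on('line', line => {
    if (line === '') return // heartbeat, ignore
    const { requestId, cid } = JSON.parse(line)
    console.log(`[${requestId}] uploading ${cid}`)
    const post = http.request(`${L1_URL}/data/${cid}`, { method: 'POST' })
    carStream(cid).pipe(post)
  })
})
```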
```mermaid
flowchart TB
  subgraph L1
    l1
  end
  subgraph L2s
    L2
    L2'
  end
  L2 --> |register|l1
  L2' --> |register|l1
  l1 --> |request CID|L2
  client --> |request CID|l1
  l1 --> |send CAR|client
  L2 --> |send CAR|l1
```
Registration can also be a POST that streams in both directions: the L2 sends health heartbeats up in the request body (`>`), while the L1 sends CID requests down in the response body (`<`):

```
POST https://L1_URL/register/L2_ID

> {"ok":true,"cpu":0.8,"memory":0.2}\n
< {"requestId":"REQUEST_ID","cid":"CID"}\n
< {"requestId":"REQUEST_ID","cid":"CID"}\n
< {"requestId":"REQUEST_ID","cid":"CID"}\n
```
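A sketch of the L2 side of this variant; the URL, interval, and load numbers are placeholders:

```js
import http from 'node:http'
import readline from 'node:readline'

const req = http.request('http://127.0.0.1:3000/register/my-l2-id', {
  method: 'POST',
  headers: { 'content-type': 'application/x-ndjson' },
}, res => {
  // CID requests stream down in the response body.
  const lines = readline.createInterface({ input: res })
  lines.on('line', line => {
    const { requestId, cid } = JSON.parse(line)
    console.log(`[${requestId}] L1 wants ${cid}`)
  })
})
// Health heartbeats stream up in the request body.
setInterval(() => {
  req.write(JSON.stringify({ ok: true, cpu: 0.8, memory: 0.2 }) + '\n')
}, 5000)
```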