@wapiflapi
Last active August 29, 2015 14:15
IPC architecture for binglide V2

OUTDATED

This document is now outdated; thanks for the feedback, everyone. This is basically what I will be trying out next:

The rest of this document is available for reference.

Goals

We want a system where a client can issue a request and receive multiple answers, because a first quick answer might come from the cache while a more detailed answer is being computed.

Each request can be computed in parallel because it can be divided into chunks; the results need to be aggregated before they can be sent back to the client.

A client should be able to cancel a request before it is finished.
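The multi-answer lifecycle above can be sketched as a generator that first yields a quick (possibly stale) answer from the cache, then the freshly computed one. This is a minimal stdlib sketch under assumed names; `handle_request` and the cache shape are illustrative, not binglide code:

```python
# Sketch of the multi-answer lifecycle: a quick (possibly stale) answer
# from cache first, then the freshly computed one. Names are illustrative.

def handle_request(key, cache, compute):
    """Yield each answer for one request as it becomes available."""
    if key in cache:
        yield ("cached", cache[key])      # quick answer, maybe stale
    result = compute(key)                 # detailed answer (chunked and
    cache[key] = result                   # parallel in the real system)
    yield ("computed", result)

cache = {"entropy:0-4096": "stale"}
answers = list(handle_request("entropy:0-4096", cache, lambda k: "fresh"))
# the client sees two answers for one request: cached first, computed second
```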

Architecture

  • pipeline for parallel task distribution: clients request work to be computed by workers.

  • pub-sub for out of band messaging:

    • Disconnection
    • Cancel requests
    • Maybe validation of requests, which has the same effect as canceling but is issued by the sink when it receives a chunk, so that other workers can stop working on it if they were. The trade-off is extra chatter vs. avoiding double work. Double work isn't supposed to happen and should be rare anyway.
  • routers for requests / replies: (similar to Majordomo V2)

    • The ventilator receives requests and forwards them to the sink.
    • Once the sink has complete responses it sends them back to the ventilator, which forwards them back to the client.
```
                        client
                          |
                          | .;==============================;.
                          | ||                     .        ||
.----------------------+-ROUTER---------+         /|\       ||
|                      |                |      <,--|        ||
|                      |   ventilator   |       |--'>       ||
|                      |                |      \|/ counter  ||
|                      +-PUB--PUSH------+       'clockwise  ||
|                         |    ||                data-flow  ||
| .----+------------------+----||------------.              ||
| |    |    .;============|====++============|====;.        ||
| |    |    ||            |    ||            |    ||        ||
| | +-SUB--PULL------+ +-SUB--PULL------+ +-SUB--PULL-----+ ||
| | |                | |                | |               | ||
| | |     worker     | |     worker     | |     worker    | ||
| | |                | |                | |               | ||
| | +------PUSH------+ +------PUSH------+ +------PUSH-----+ ||
| |         ||                 ||                 ||        ||
| |         ``=================++=================''        ||
| '-----------------------,    ||                           ||
|                         |    ||                           ||
|                      +-SUB--PULL------+                   ||
|                      |                |                   ||
|                      |      sink      |                   ||
|                      |                |                   ||
'----------------------+-ROUTER---------+                   ||
                          ||                                ||
                          ``================================''
```
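The PUSH/PULL legs in the diagram fan chunks out round-robin to whichever workers are connected. A minimal pyzmq sketch of that leg, using an in-process transport; the endpoint names are illustrative, not binglide's actual wiring:

```python
import zmq  # pyzmq; endpoint names below are illustrative

ctx = zmq.Context.instance()

push = ctx.socket(zmq.PUSH)              # the ventilator's task outlet
push.bind("inproc://chunks")

workers = []
for _ in range(2):
    pull = ctx.socket(zmq.PULL)          # a worker's task inlet
    pull.setsockopt(zmq.RCVTIMEO, 1000)  # avoid blocking forever in a demo
    pull.connect("inproc://chunks")
    workers.append(pull)

for i in range(4):                       # four chunks, fanned out
    push.send_string(f"chunk-{i}")       # round-robin across the two PULLs

got = [[w.recv_string() for _ in range(2)] for w in workers]

for s in [push] + workers:
    s.close()
```

Because PUSH load-balances round-robin among connected peers, each worker ends up with exactly two of the four chunks.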

Reliability

  • pub-sub: If a client dies we don't care; it doesn't need the notifications anymore. If the server dies we have a bigger problem.

  • pipeline: If the collector (client/sink) doesn't get an answer in a reasonable amount of time, it can re-issue the request.

    • If a client re-issues a request, the ventilator will see the chunks as already being processed. In that case there should be a timestamp on the in-process mark, and if it has been too long the ventilator can restart those tasks.
    • This job could also be done by the sink when it has been waiting too long for some requests to complete. The question is whether they should both do it.

Types of failure we aim to handle: worker crashes and restarts, worker busy looping, worker overload, queue crashes and restarts, and network disconnects.
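The timestamp-based restart described above can be sketched as a small in-process table. This is a stdlib sketch under assumed names; `should_dispatch` and the timeout value are illustrative:

```python
import time

PROCESSING_TIMEOUT = 5.0  # seconds; illustrative value

in_process = {}  # chunk id -> time the chunk was handed to a worker

def should_dispatch(chunk_id, now=None):
    """Return True if the chunk should be (re)sent to a worker."""
    now = time.monotonic() if now is None else now
    started = in_process.get(chunk_id)
    if started is not None and now - started <= PROCESSING_TIMEOUT:
        return False              # a worker is presumably still on it
    in_process[chunk_id] = now    # dispatch, or restart a stale chunk
    return True

assert should_dispatch("c1", now=0.0)      # unseen chunk: dispatch it
assert not should_dispatch("c1", now=2.0)  # still in flight: skip
assert should_dispatch("c1", now=10.0)     # stale: restart it
```

The same helper could serve both the ventilator and the sink; running it in both places only costs the occasional duplicated restart.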

Caching

The broker divides requests into chunks of the desired granularity, checks those against the cache, and then marks them as being processed. The broker doesn't forward new requests for chunks marked as done or being processed. The sink will mark chunks it receives as done.
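The broker's bookkeeping might look like this; a stdlib sketch under assumed names, where the chunk size and state labels are illustrative:

```python
CHUNK_SIZE = 4096  # granularity; illustrative value

state = {}  # (offset, size) -> "processing" | "done"

def chunks(offset, length, size=CHUNK_SIZE):
    """Split a byte range into aligned (offset, size) chunk keys."""
    start = offset - offset % size
    return [(o, size) for o in range(start, offset + length, size)]

def to_forward(offset, length):
    """Return the chunks that still need a worker, marking them."""
    new = [c for c in chunks(offset, length) if c not in state]
    for c in new:
        state[c] = "processing"   # mark before forwarding to workers
    return new

def mark_done(chunk):
    state[chunk] = "done"         # the sink calls this on receipt

first = to_forward(0, 8192)       # two fresh chunks -> forwarded
again = to_forward(0, 8192)       # same range again -> nothing to do
```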

Issues

  1. Not sure whether the sink should have SUB and/or PUB (but also see issue 2).

    • Some reasons for SUB:
      • Know about cancellations, so it can stop aggregating.
      • Know about disconnects?
    • Some reasons for PUB:
      • Send validations to workers.
  2. In the current design the sink and ventilator can probably be merged.

  3. It is possible we want multiple sinks, because the sink will do the aggregation and that might be hard work, especially when the whole file is cached in chunks and the workers don't do anything anymore.

    Then we would need another broker between 'workers' and 'sinks'? Is this like map-reduce?

    (image: mapreduce nodes)
