@thinkerbot
Created May 18, 2012 03:55

Utilities

# fork
zmqc -w PUSH ADDRESS...

# merge 
zmqc -r PULL ADDRESS...
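The fork/merge pair depends on how ZeroMQ distributes messages. A PUSH socket deals outgoing messages round-robin across its connected peers, and a PULL socket fair-queues incoming messages from all of its peers. As a rough illustration only, the in-memory Python sketch below imitates those two strategies with plain lists. It ignores timing, high-water marks, and peers that connect late, all of which affect real sockets. `fork` and `merge` here are hypothetical stand-ins named after the commands above, not part of zmqc.

```python
from itertools import cycle


def fork(records, n_workers):
    """Deal records round-robin into n_workers lists, imitating a PUSH
    socket with n_workers connected PULL peers (idealized: no timing)."""
    buckets = [[] for _ in range(n_workers)]
    for bucket, record in zip(cycle(buckets), records):
        bucket.append(record)
    return buckets


def merge(buckets):
    """Take one record from each non-empty bucket in turn, imitating a
    PULL socket fair-queuing messages from several PUSH peers."""
    iterators = [iter(b) for b in buckets]
    merged = []
    while iterators:
        still_open = []
        for it in iterators:
            try:
                merged.append(next(it))
            except StopIteration:
                continue  # this peer has nothing left; stop polling it
            still_open.append(it)
        iterators = still_open
    return merged


if __name__ == "__main__":
    dealt = fork(["a", "b", "c", "d", "e"], 2)
    print(dealt)         # [['a', 'c', 'e'], ['b', 'd']]
    print(merge(dealt))  # ['a', 'b', 'c', 'd', 'e']
```

In the real pipeline the workers run at different speeds, so the merged order will generally not match the input order.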

Wrapper

zmqn COMMAND...

Start COMMAND in n subprocesses, each like:

merge A ipc | COMMAND | fork B ipc

Start a sink like:

merge B ipc

Start a source like:

fork A ipc
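To show how the source, the n workers, and the sink fit together, here is a hedged, single-process Python sketch of the zmqn topology. Threads stand in for the worker subprocesses, a Python function stands in for COMMAND, and two `queue.Queue` objects stand in for addresses A and B. A shared queue hands each record to whichever worker is free, which is load balancing rather than strict round-robin. The end-of-stream marker is an assumption of this sketch (the notes don't say how workers learn the input is finished), and `run_pipeline` is a hypothetical name.

```python
import queue
import threading

_DONE = object()  # hypothetical end-of-stream marker


def run_pipeline(records, command, n):
    """Source -> n workers -> sink, using in-process queues."""
    a = queue.Queue()  # stands in for address A (source pushes, workers pull)
    b = queue.Queue()  # stands in for address B (workers push, sink pulls)

    def worker():
        # Equivalent of: merge A ipc | COMMAND | fork B ipc
        while True:
            record = a.get()
            if record is _DONE:
                b.put(_DONE)  # tell the sink this worker has finished
                return
            b.put(command(record))

    workers = [threading.Thread(target=worker) for _ in range(n)]
    for t in workers:
        t.start()

    # Source (fork A ipc): emit every record, then one marker per worker.
    for record in records:
        a.put(record)
    for _ in range(n):
        a.put(_DONE)

    # Sink (merge B ipc): collect until every worker has signalled completion.
    results, finished = [], 0
    while finished < n:
        item = b.get()
        if item is _DONE:
            finished += 1
        else:
            results.append(item)

    for t in workers:
        t.join()
    return results


if __name__ == "__main__":
    out = run_pipeline(range(10), lambda x: x * x, n=3)
    print(sorted(out))  # arrival order varies between runs, so sort it
```

If `command` raised inside a worker thread, that worker would never send its marker and the sink would wait forever, which is a small-scale version of the failure problem listed under Issues.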

Issues

  • Signals must be propagated to all subprocesses.
  • This only handles stdout. There may need to be a separate sink for stderr, which would ideally also identify the subprocess that caused the problem.
  • If a worker fails, records in flight (received by that worker but not yet passed on) can be lost. There must be a mechanism ensuring that all records are processed (e.g. all workers must exit 0).
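All three issues are about supervising the worker processes. A minimal Python sketch, assuming a POSIX system and that each worker is an ordinary subprocess: it forwards SIGINT and SIGTERM to every child that is still running, prefixes each child's stderr lines with its index, and returns the exit codes so the caller can refuse to trust the output unless every worker exited 0. `supervise` is a hypothetical helper, not part of zmqc. Note that an exit status of 0 only shows a worker didn't crash; it does not by itself prove that no record was dropped.

```python
import signal
import subprocess
import sys
import threading


def _tag_stderr(index, stream, write):
    """Copy one child's stderr through `write`, prefixing each line with its index."""
    for line in stream:
        write(f"[worker {index}] {line}")
    stream.close()


def supervise(commands, write=None):
    """Run each argv list in `commands` as a child process and wait for all.

    SIGINT/SIGTERM received by this process are forwarded to every child
    still running. Returns the exit codes in order; on POSIX a negative
    code -N means the child was killed by signal N. Must be called from
    the main thread, because signal.signal requires it.
    """
    write = write if write is not None else sys.stderr.write
    procs = [subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
             for cmd in commands]
    readers = [threading.Thread(target=_tag_stderr, args=(i, p.stderr, write))
               for i, p in enumerate(procs)]
    for reader in readers:
        reader.start()

    def forward(signum, _frame):
        for p in procs:
            if p.poll() is None:  # only signal children that are still running
                p.send_signal(signum)

    watched = (signal.SIGINT, signal.SIGTERM)
    previous = {sig: signal.signal(sig, forward) for sig in watched}
    try:
        codes = [p.wait() for p in procs]
    finally:
        for sig, handler in previous.items():
            # Restore the original handler (SIG_DFL if it wasn't set from Python).
            signal.signal(sig, handler if handler is not None else signal.SIG_DFL)
        for reader in readers:
            reader.join()
    return codes


if __name__ == "__main__":
    codes = supervise([[sys.executable, "-c", "print('worker ok')"]] * 2)
    print("all workers exited 0" if all(c == 0 for c in codes) else f"failures: {codes}")
```

Treating any non-zero code as "the run is suspect" is deliberately conservative: it cannot say which records were lost, only that some may have been.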

Wish List

  • Sources can split on arbitrary record separators (both the source feeding the workers and the source feeding the sink).
  • The sink can print results in order. This is probably stupid: it would require the sink to poll the workers in the order the source dispatched records to them (i.e. source-sink communication), and processing would block on the slowest record.
  • Allow workers to be created on other machines. This could be a separate command that just starts the source and sink, pushing to and pulling from specified addresses.
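The first two wish-list items can be sketched in a few lines of Python. `split_records` cuts a stream of chunks on an arbitrary separator, carrying any partial record (or partial separator) over to the next chunk. `in_order` restores input order at the sink, but it uses a different mechanism from the per-source polling order described above: it assumes the source tags each record with a sequence number and that workers pass the tag through untouched, which the current design does not do. Both function names are hypothetical.

```python
import heapq


def split_records(chunks, sep):
    """Yield records from an iterable of string chunks, split on `sep`.
    A separator that straddles two chunks is still recognised, because the
    unfinished tail stays in `buf` until more data arrives."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        *complete, buf = buf.split(sep)
        yield from complete
    if buf:  # final record without a trailing separator
        yield buf


def in_order(tagged):
    """Yield payloads from (seq, payload) pairs in seq order, starting at 0.
    Early arrivals wait in a heap until every earlier seq has been seen."""
    pending, expected = [], 0
    for seq, payload in tagged:
        heapq.heappush(pending, (seq, payload))
        while pending and pending[0][0] == expected:
            yield heapq.heappop(pending)[1]
            expected += 1


if __name__ == "__main__":
    print(list(split_records(["one||tw", "o|", "|three"], "||")))  # ['one', 'two', 'three']
    print(list(in_order([(2, "c"), (0, "a"), (1, "b")])))          # ['a', 'b', 'c']
```

The heap in `in_order` holds every result that arrives early, which is exactly the "block on the slowest record" cost: nothing after a slow record is printed until that record shows up.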

Thoughts

Perhaps stderr can be put to real use here, for example by having the source and sink report progress.
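One way to act on this, sketched in Python with a hypothetical `with_progress` wrapper: the source or sink passes records through unchanged while reporting a running count on stderr every `every` records, plus a final total at the end. Because the data itself never goes to stderr, progress messages can't corrupt the record stream. Comparing the source's final total with the sink's would also be a cheap way to notice records lost to a failed worker.

```python
import sys


def with_progress(records, label, every=1000, write=None):
    """Yield each record unchanged, reporting progress via `write`
    (stderr by default) every `every` records and once at the end."""
    write = write if write is not None else sys.stderr.write
    count = 0
    for count, record in enumerate(records, start=1):
        if count % every == 0:
            write(f"{label}: {count} records\n")
        yield record
    write(f"{label}: done, {count} records\n")


if __name__ == "__main__":
    # In a real source the records would come from stdin and go to the socket.
    total = sum(with_progress(range(2500), "demo", every=1000))
```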
