@sogaiu
Last active November 15, 2024 15:27
mrepl, nrepl notes

thinking to have

  • network-based
    • easier to detach / reattach compared to stdio-based? (doesn't tmux manage somehow?)
    • debugging of messages might be easier? at least it doesn't require code changes, because Wireshark or a similar tool can be used.
  • sessions
    • not having sessions (assuming single connection) has drawbacks?
      • long-evaluation blocking communication issue (cf. tooling session vs eval session)
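
To make the long-evaluation concern concrete, here is a minimal Python sketch under assumed names (`SessionServer`, a `session` key on each message, `handle` and `respond` callbacks); it is not how nREPL implements sessions. Each session gets its own worker thread, so a slow evaluation queued in an "eval" session does not hold up a completion request in a "tooling" session.

```python
import queue
import threading


class SessionServer:
    """Hypothetical sketch: one worker thread per session, so a slow
    request in one session does not block requests in another."""

    def __init__(self, handle, respond):
        self.handle = handle    # assumed callback: msg -> result
        self.respond = respond  # assumed callback: (session_id, result) -> None
        self.queues = {}
        self.lock = threading.Lock()

    def submit(self, msg):
        sid = msg["session"]
        with self.lock:
            q = self.queues.get(sid)
            if q is None:
                # First message for this session: create its queue and worker.
                q = self.queues[sid] = queue.Queue()
                threading.Thread(target=self._worker, args=(sid, q),
                                 daemon=True).start()
        q.put(msg)

    def _worker(self, sid, q):
        # Messages within one session are still handled one at a time, in order.
        while True:
            msg = q.get()
            self.respond(sid, self.handle(msg))


def demo():
    release = threading.Event()
    responses = queue.Queue()

    def handle(msg):
        if msg["op"] == "eval":
            release.wait()  # stands in for a long-running evaluation
        return msg["op"]

    server = SessionServer(handle, lambda sid, r: responses.put((sid, r)))
    server.submit({"session": "eval", "op": "eval"})
    server.submit({"session": "tooling", "op": "complete"})
    first = responses.get(timeout=2)   # arrives while the eval is still running
    release.set()
    second = responses.get(timeout=2)
    return first, second
```

With a single connection and no sessions, the same two requests would be served strictly in order, and the completion request would wait for the evaluation to finish.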

uncertain

  • use of bencode, netstrings, or similar
    • the length prefixes may help with boundary detection and memory usage?
    • though could magic bytes do something similar? what about arbitrary output via stdout? couldn't that contain the magic bytes and mess things up?
    • a fixed-width prefix that contains the message size (or remaining message size) might be better (see comments / code by hiredman and bakpakin (spork's msg.janet))
  • how to make available to arbitrary program
    • clojure provided a socket repl which can be enabled without changing existing code
    • unrepl "upgraded" an existing socket repl
  • capability querying instead of versioning because of brittleness?
    • other projects seem to have adopted this approach
      • emacs
      • lsp
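
The length-prefix idea can be illustrated with the netstring format (decimal byte length, `:`, payload, `,`). This is a minimal Python sketch for illustration only. It shows that payload bytes which look like delimiters (newlines, colons, commas from arbitrary stdout) cannot confuse the reader, because the prefix says how many bytes to take. The prefix itself is variable width, though, so the reader still has to scan for the `:` before it knows the length.

```python
def netstring_encode(payload: bytes) -> bytes:
    # <decimal byte length> ":" <payload> ","
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def netstring_decode(buf: bytes):
    """Return (payload, remaining_bytes), or None if buf does not yet
    contain a complete netstring."""
    colon = buf.find(b":")
    if colon == -1:
        return None  # still inside the variable-width length prefix
    length = int(buf[:colon])
    end = colon + 1 + length
    if len(buf) < end + 1:
        return None  # payload or trailing comma has not arrived yet
    if buf[end:end + 1] != b",":
        raise ValueError("malformed netstring")
    return buf[colon + 1:end], buf[end + 1:]


tricky = b"out: 1,2\n3"  # stdout-like payload containing ':', ',' and '\n'
encoded = netstring_encode(tricky)
```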

potential modifications vs nrepl or other existing options

  • response messages being tagged appropriately seems like a better design compared to nrepl's
    • response message type having to be guessed from keys and contextual info?
      • nrepl does this and thus the client is burdened with more complexity?
      • theoretical problems with response messages being indistinguishable if middleware design is not coordinated across all entities?
  • if a middleware mechanism or other extension mechanism is considered, better to have "namespaced" op names?
  • on the whole, is it better for the server to timestamp its responses?
    • is there some weird edge case of time on the server being modified in inappropriate ways?
  • actually, do bencode / netstrings really address the buffer allocation issue well?
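
The difference between guessing a response's type and reading an explicit tag can be sketched in Python. The key names `value`, `out`, `err` and `status` follow nREPL's conventions; the `tag` field is a hypothetical design, not an existing protocol. The untagged classifier has to choose a precedence among keys on its own, so a response that carries more than one of them (for example because some middleware added a key) silently loses information.

```python
def classify_untagged(resp: dict) -> str:
    # Infer the response type from which keys are present. The order of
    # these checks is a client-side guess, not something the server states.
    for key in ("value", "out", "err", "status"):
        if key in resp:
            return key
    return "unknown"


def classify_tagged(resp: dict) -> str:
    # Hypothetical protocol: every response states its own type.
    return resp["tag"]
```

With an explicit tag, a client can dispatch on one field and ignore keys it does not recognize, instead of every client having to agree on the same guessing rules.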

what stuff about nrepl is good to keep?

  • sessions

things to leave off of an initial version

  • multiple transports
  • middleware

things that might be nice but seem potentially impractical to do well

naming fun

  • arepl
  • enrepl
  • moreplay - what we need is less work and...

https://groups.google.com/g/clojure/c/iyqFHXkO0Mw/m/grrvy75RGc0J

hiredman:

nrepl's protocol is also very line reader centric, which is a drag,
and the "integer" that prefixes messages is really just a variable
length string and is not useful for allocating buffers to receive data
in a client because it is lines / 2 instead of a byte count. this
makes writing a client that uses anything but a BufferedReader
challenging. I am not advocating the slime protocol, but at least the
slime protocol prefixes message with a fixed width representation of
the count of bytes in the rest of the message.

I have been toying with a slime<->nrepl and using nio and polling to
make it all run in single thread, the slime protocol is very easy to
process like this, while I haven't been able to figure out a good way
to parse the nrepl protocol without having a thread continuously loop
around reading lines from the socket.
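
A fixed-width byte-count header of the kind hiredman describes can be sketched in Python. The six-hex-digit width is an assumption here (it matches the `(format "%06x" 10)` example later in the thread), not a statement of SLIME's exact wire format. The useful property is `bytes_needed`: from the bytes received so far, a single-threaded poller knows exactly how many more bytes to ask for.

```python
HEADER_LEN = 6  # assumed: six hex digits giving the body's byte count


def frame(body: str) -> bytes:
    data = body.encode("utf-8")
    if len(data) >= 16 ** HEADER_LEN:
        raise ValueError("message too large for header")
    return f"{len(data):0{HEADER_LEN}x}".encode("ascii") + data


def bytes_needed(buf: bytes) -> int:
    """How many more bytes must arrive before buf holds one full message."""
    if len(buf) < HEADER_LEN:
        return HEADER_LEN - len(buf)
    body_len = int(buf[:HEADER_LEN], 16)
    return max(0, HEADER_LEN + body_len - len(buf))
```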

cemerick:

My apologies for not getting back to you privately about this earlier.
It's been a hell of a week.

When I was designing the protocol, my aim was to make it simple enough
to implement from, e.g., Python, and able to be hoisted up onto something
like STOMP with relative ease. Thus, line orientation and all-strings
made a lot of sense.

I would not pretend to be an expert at designing network protocols. My
question would be: in what context is allocating response buffers
efficiently a must-have?

hiredman:

I am no python programmer, but if you look at
http://docs.python.org/library/socket.html you see it passes in the
number of bytes you wish to receive on a call to the receiv method on
a socket. With that in mind parsing nrepl messages becomes a huge
pain. At no time when parsing a nrepl message do you know how many
bytes you need. The messages start with a string representation of a
number (variable width) and the rest of the message is some set of new
delimited strings.

I thought you were just advocating for ditching slime because it's not
clojure centric enough, how does python fit into this?
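
To see why the line-oriented format is awkward for that style of reading, here is a Python sketch of a simplified model of the message shape hiredman describes: a decimal entry count on its own line, followed by 2 * count newline-terminated lines alternating keys and values. Any quoting or escaping in the real early nREPL protocol is ignored, so values are assumed not to contain newlines. The count never tells the reader how many bytes to receive; completeness can only be discovered by scanning for newlines.

```python
def try_parse_line_counted(buf: bytes):
    """Simplified model: "<count>\\n" then 2 * count lines of key/value.
    Returns (message_dict, remaining_bytes), or None if incomplete."""
    nl = buf.find(b"\n")
    if nl == -1:
        return None  # the count line itself is not complete yet
    count = int(buf[:nl])
    start = nl + 1
    lines = []
    for _ in range(2 * count):
        nl = buf.find(b"\n", start)  # must scan byte by byte for the end
        if nl == -1:
            return None
        lines.append(buf[start:nl].decode("utf-8"))
        start = nl + 1
    return dict(zip(lines[0::2], lines[1::2])), buf[start:]


msg = b"2\nop\neval\ncode\n(+ 1 2)\n"
```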

stuartsierra:

Yes, as a heavy Emacs/SLIME user who does not work with Common Lisp any
more, I'd rather have a Clojure-specific Emacs environment, especially
something that can do more introspection of the JVM, e.g., look up
JavaDocs and examine classes through reflection.

In my not-terribly-well-informed opinion, string-oriented protocols are
easier to parse in string-oriented languages like Perl and Emacs Lisp,
whereas byte-oriented protocols help avoid mixups with character encoding
and line-endings.  Pick your poison.

hiredman:

My objection has nothing to do with string vs. byte.

Messages used in wire protocols exist on a continuum between fixed
width and variable width. The happy medium there, which almost all
protocols follow is a fixed width header that also provides the byte
count of the following variable width body. The nrepl protocol appears
to have aimed at that, but missed.

Fixed width messages have the advantage of being extractable from the
byte stream without having to do any parsing, while with a variable
width message format you must parse and extract messages from the
stream in the same step.

After trying to write a nrepl client it is fairly obvious to me that
in the creation of the protocol similar protocols were examined and
because they were prefixed by numbers, the nrepl messages were,
without a clear understanding of why and what the purpose of the
number is. As implemented the number may as well be replaced with some
kind of START and END tag.
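
Hiredman's point that fixed-width framing lets messages be pulled out of the byte stream without parsing can be sketched in Python: bytes are fed in whatever chunk sizes the socket happens to deliver, and complete messages fall out by slicing. The six-hex-digit header is again an assumption for illustration, and `FrameBuffer` is a hypothetical name.

```python
class FrameBuffer:
    """Accumulate bytes from arbitrary-size reads and pop complete
    messages. Assumes a six-hex-digit byte-count header."""
    HEADER_LEN = 6

    def __init__(self):
        self.buf = bytearray()

    def feed(self, chunk: bytes) -> list:
        self.buf += chunk
        out = []
        while len(self.buf) >= self.HEADER_LEN:
            n = int(self.buf[:self.HEADER_LEN], 16)
            end = self.HEADER_LEN + n
            if len(self.buf) < end:
                break  # body incomplete; wait for more bytes
            out.append(bytes(self.buf[self.HEADER_LEN:end]))
            del self.buf[:end]
        return out
```

No character decoding or delimiter search is involved: the header is read, the body is sliced, and any leftover bytes stay buffered for the next read.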

cemerick:

FWIW, a hello-world-level interaction with an nREPL server from python:

https://gist.github.com/de3b8d0ecdccf6655a63

You don't need to know how many bytes you need when parsing an nrepl
message, which would seem to be an advantage to me.

> I thought you were just advocating for ditching slime because it's not
> clojure centric enough, how does python fit into this?

I've never advocated "ditching slime", I've been advocating for a network
REPL suitable for use by any and all Clojure tooling. My mentioning python
was just an example of a non-Clojure runtime from which one might want to
talk to a Clojure/JVM process running a network REPL server.

...

hiredman:

I didn't mean to imply that I wanted to replace the number with tags,
what I meant to imply is that the number of lines is not any better
than START and BEGIN tags, while a fixed width count of bytes (even a
number string padded to a constant number of bytes with zeroes) is
better.

cemerick:
  hiredman: Is your only objection the wire protocol?

hiredman:
  cemerick: that was as far as I got implementing it, if the
            wire protocol stops being a blocker, I may find other things
            but given that the wire protocol opens the door to evaling
            things, seems like anything else could be worked around
            in fact I worked around the wire protocol by sticking
            bytebuffers in a blocking queue and extending IOFactory to
            lbq's and having a thread that just read from there, but I
            really wasn't happy with it

cemerick:
  hiredman: so you're doing this from Java or Clojure already?

hiredman:
  clojure

  why would I do this from java?

cemerick:
  why not use the nREPL client that's already there?

hiredman:
  because I was trying to write a single threaded polling
  translator between nrepl and slime, and currently reading
  from nrepl requires blocking, complicated parsing, or a
  separate thread

  which if you are trying to forward stuff back and forth
  between sockets in a single thread is bad

cemerick:
  hiredman: OK. Help me fully understand your objection --
  so this is not just a matter of protocol design hygiene?

hiredman:
  you cannot know how many bytes you need to read from the
  wire in order to have read the entire message

cemerick:
  I got that, I'm just being dense about why that matters. The
  number of lines (2 * entries) is known, so why isn't that
  effectively the same thing?

hiredman:
  no!

  byte count and number of lines are not the same thing

cemerick:
  surely not

  *functionally* the same thing

hiredman:
  with line count, in order to determine the number of bytes to
  read you have to scan through the byte stream as you are
  reading, looking for newlines

cemerick:
  sure

  why is byte count important?

  * cemerick is trying to understand the use case, not be difficult

hiredman:
  you mean my exact use case?

cemerick:
  sure

hiredman:
  with nio you don't use io streams

  you hand a byte buffer to a channel and say "fill this up with
  bytes"
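
Python has a rough analogue of handing a buffer to a channel: `socket.recv_into`, which fills a caller-supplied buffer. This sketch assumes the six-hex-digit byte-count header discussed in the thread; once the header is read, the body buffer can be allocated at exactly the right size. It uses blocking sockets for brevity, whereas NIO code would typically be non-blocking and keep the partially filled buffer between readiness events.

```python
import socket

HEADER_LEN = 6  # assumed six-hex-digit byte-count header


def recv_exactly(sock: socket.socket, n: int) -> bytearray:
    """Fill a preallocated buffer of exactly n bytes."""
    buf = bytearray(n)
    view = memoryview(buf)
    got = 0
    while got < n:
        r = sock.recv_into(view[got:], n - got)
        if r == 0:
            raise ConnectionError("peer closed mid-message")
        got += r
    return buf


def recv_message(sock: socket.socket) -> bytes:
    header = recv_exactly(sock, HEADER_LEN)
    body_len = int(header, 16)  # the header says exactly how much to allocate
    return bytes(recv_exactly(sock, body_len))
```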

cemerick:
  oh, NIO!

  See, I missed that in your message from earlier.

  OK, now I see your objection.

hiredman:
  newline separation and line count are only useful if you only
  plan on implementing on top of line reader type things

cemerick:
  which is what I was aiming for. *shrug*

  I've never had any use for NIO.

  hiredman: so you're working on a SLIME/swank <-> nREPL bridge?

hiredman:
  I was, just fun, not something to release, standard
  disclaimers, etc

cemerick:
  OK; in that case, do you believe implementing such a thing
  would require a nonblocking implementation in any case?

hiredman:
  cemerick: doesn't require, but I was enjoying writing one
  until I ran into this issue with nrepl

18:59
cemerick:
  The fact that there are no CharChannels is unfortunate

  I *really* don't want to start dropping byte counts into a
  stream that's just UTF-8 all the time, period.

hiredman:
  cemerick: a fixed number of utf-8 characters has a fixed byte
  count

cemerick:
  hiredman: not above \u0255

hiredman:
  (format "%06x" 10)

  ,(format "%06x" 10)

clojurebot:
  "00000a"

hiredman:
  numerals are not above that

  the entire thing doesn't need to be fixed, you just need a fixed
  header that gives you the byte count of the rest
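
Hiredman's observation that the header digits stay in ASCII, so a six-character hex header is always six bytes even when the body contains multi-byte UTF-8 characters, can be checked in Python. `format(n, "06x")` produces the same string as the Clojure `(format "%06x" n)` call above; `make_header` and the example body are made up for illustration.

```python
def make_header(body: str) -> str:
    # Six hex digits giving the UTF-8 byte count of the body.
    return format(len(body.encode("utf-8")), "06x")


body = '(println "λ")'              # 13 characters
body_bytes = body.encode("utf-8")  # 14 bytes: "λ" (U+03BB) takes two
header = make_header(body)          # "00000e"
message = header.encode("ascii") + body_bytes
```

The byte count and the character count of the body differ, but the header's own size never does.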

cemerick:
  I'm not saying that the numerals are going to be outside the
  UTF-8 range or byte ordering, I'm saying I don't like mixed-
  mode representations.

hiredman:
  mixed mode in what way?

cemerick:
  it's a stream of chars that has byte counts in the middle of
  it

  it's not a functional objection, only a hygienic one.

  reminds me of PDF in an unfortunate way, actually

hiredman:
  not functional?

cemerick:
  I mean, what you're suggesting will work just fine, I would
  just consider it warty.

hiredman:
  in what sense?

cemerick:
  it's no longer a pure text protocol

  anyway, the bigger problem is this would make reader-
  oriented clients impossible

hiredman:
  not true

  oh, maybe true

cemerick:
  you'd have to go along counting bytes of each char or string you read

  yack

  I'll have to think about things a bit more.

  hiredman: thank you for the discussion. 
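
Cemerick's concern about reader-oriented clients can be made concrete: a client that only sees characters must re-encode each one to track how many of the announced bytes it has consumed. A minimal Python sketch, using `io.StringIO` to stand in for a character reader and assuming the same six-hex-digit header; `read_message` is a hypothetical name.

```python
import io

HEADER_LEN = 6  # assumed six-hex-digit byte-count header


def read_message(reader) -> str:
    """Read one header-framed message from a character-oriented reader."""
    byte_count = int(reader.read(HEADER_LEN), 16)
    chars = []
    consumed = 0
    while consumed < byte_count:
        ch = reader.read(1)
        if not ch:
            raise EOFError("stream ended mid-message")
        # The reader hands back characters, so each one has to be
        # re-encoded to learn how many bytes it accounted for on the wire.
        consumed += len(ch.encode("utf-8"))
        chars.append(ch)
    if consumed != byte_count:
        raise ValueError("byte count does not fall on a character boundary")
    return "".join(chars)


reader = io.StringIO('00000e(println "λ")next')
```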