A few months ago I started working on the mass modernization project to learn about Arvo and Vere and to contribute to core development. Since then I have had some success working on a WebAssembly interpreter, and I decided that it would be wise to focus my attention on that project instead.
I managed to achieve the first milestone outlined in the grant proposal. To ensure my efforts were not in vain, I will describe what I achieved and learned while working on that milestone.
I don't have a CS background, and working on the Urbit core was my first serious programming experience. I think this post will be helpful for someone who is trying to figure out how Urbit works, because I will describe my contribution in roughly the same order in which I learned and implemented the solution to the task.
The goal of the project was to replace the current |mass Hood generator with a -mass thread that would return the memory report as a noun to the user, allowing the user to manage the report: storing it, or sending it to other ships or hosting providers for analysis. To solve this task we need to understand:
- How |mass works right now;
- Why a "Hood generator" approach would not be feasible for the upgrade, and how threads can be used to achieve our goal;
- How Arvo and Vere communicate to inject data into Arvo from the outside world (the memory report in our case), and how to use the duct system to route data to some process;
- How Vere processes (namely serf and lord) interact with each other.
When you type |mass in Dojo, you get a memory report printout in the terminal. Let's break down the mechanics of |mass in steps.
First, |mass is a poke of the %hood app with the product of the /hood/mass generator. The following commands are all equivalent:
|mass
:hood|mass
:hood &helm-mass ~
When %hood receives the poke, it emits a card to Dill with a [%flog %heft ~] task:
++ poke-mass
|= ~ =< abet
(emit %pass /heft %arvo %d %flog %heft ~)
:: (... some lines skipped)
%helm-mass =;(f (f !<(_+<.f vase)) poke-mass)
The %flog task gets unwrapped, and on the %heft task Dill passes a %whey note to Arvo:
++ call :: receive input
|= kyz=task
^+ +>
?+ -.kyz ~& [%strange-kiss -.kyz] +>
:: (... some lines skipped)
%heft (pass /whey %$ whey/~)
And on the %whey note Arvo passes an ovum [//arvo mass/whey] to the runtime, where whey is a nested data structure with annotated cores. In other words, it has the type mass:
+$ mass $~ $+|+~
(pair cord (each * (list mass)))
This structure is similar to what you get in the first half of the |mass printout, before the "total userspace: ..." line, except it contains nouns instead of sizes in bytes. Those nouns happen to be cores; see the ++whey definition in arvo.hoon.
The rest of the |mass process happens in Vere, in the serf process (serf.c). First, when the c3__mass literal (equivalent to Hoon's %mass term) is detected in the ovum, the serf saves the whey noun in sef_u->sac (in _serf_sure_feck), and then this noun gets measured in _serf_grab. The measurement results are then simply printed to stderr along with some runtime memory usage information (the "total arvo/jet/noun/road stuff" lines).
A generator is a Dojo utility that allows you to run code against user input. Generators are located in the gen directory of a given desk as .hoon files. A "naked" generator is defined as a gate that takes user input as an argument; it has no knowledge of the current time, the identity of the ship or entropy. A %say generator is defined as a cell of the %say term and a gate which takes user input along with the time, entropy and the path to the generator's desk, which includes the ship's @p. Dojo provides all these arguments when the generator is called from the command line.
As you can see from the structure of a generator, it cannot return any information that lies outside of the gate's subject. Since the information about memory usage lies outside of a generator's subject, we would have to use something else for the |mass upgrade.
Right now the memory report request is implemented as a poke to the %hood agent, and pokes do not return any nouns either: they are a one-way road to interact with an app.
An alternative would be to use a thread to pass a task to Arvo the way |mass does it now, and then to receive a response from the runtime as a gift from a vane. Note that the choice of vane does not matter here; vanes are simply used to handle I/O between a thread and Arvo.
As an illustration of what we want to build, consider the -time thread:
> -time ~s3
~s3..0082
To understand an important difference between generators and threads that return values, let's remind ourselves what Arvo is. The formal interface of Arvo is defined at the bottom of the arvo.hoon file:
:: Arvo formal interface
::
:: this lifecycle wrapper makes the arvo door (multi-armed core)
:: look like a gate (function or single-armed core), to fit
:: urbit's formal lifecycle function (see aeon:eden:part).
:: a practical interpreter can and will ignore it.
::
|= [now=@da ovo=ovum]
^- *
.(+> +:(poke now ovo))
In other words, Arvo is a gate that takes an event and returns a new version of itself. The tail of (poke now ovo) replaces Arvo's context, while the head is the list of effects. Simply put, Arvo is a function that:
(Arvo event) -> [Arvo' (list effect)]
The effects can be handled by the runtime: Vere can set up timers, send packets, etc. Notice that in its formal definition Arvo does not care about the effects at all: the interpreter could in theory ignore all the effects, and the deterministic nature of Arvo would still be preserved.
Think about it this way: suppose you tried to send a message from your Urbit to some other ship, but you never heard anything back. Did it happen because of some issues in the network between the ships, or did it happen because the interpreter ignored the effect "send this packet to ~sampel-palnet"? From your ship's perspective those options are equivalent: both happened due to the nondeterministic nature of the world that lies outside of Arvo.
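To make the "effects can be ignored" point concrete, here is a minimal sketch in C rather than Hoon. Everything in it (the `toy_` names, a counter as the state, a single effect per event) is a made-up stand-in for illustration and not how Arvo or Vere are actually structured. It only shows that a pure transition function reaches the same state whether or not anyone acts on its effects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy "kernel": the state is a running total, an event adds to it. */
typedef struct { uint64_t total; } toy_state;
typedef struct { uint64_t amount; } toy_event;
typedef struct { uint64_t announced; } toy_effect;

/* Product of one step: the next state plus one effect. */
typedef struct {
  toy_state  next;
  toy_effect effect;
} toy_step;

/* Pure transition: no I/O, no clock, no globals. Same inputs, same outputs. */
toy_step toy_poke(toy_state now, toy_event ev)
{
  toy_step out;
  out.next.total = now.total + ev.amount;
  out.effect.announced = out.next.total;
  return out;
}

/* The handler stands in for the runtime. NULL means "drop every effect". */
typedef void (*toy_effect_handler)(toy_effect eff, void *ctx);

/* Replay an event log from a starting state and return the final state. */
toy_state toy_replay(toy_state start, const toy_event *log, size_t len,
                     toy_effect_handler handle, void *ctx)
{
  assert(log != NULL || len == 0);
  toy_state cur = start;
  for (size_t i = 0; i < len; i++) {
    toy_step s = toy_poke(cur, log[i]);
    if (handle != NULL) {
      handle(s.effect, ctx);   /* the runtime chose to act on the effect */
    }
    cur = s.next;              /* the state never depends on that choice */
  }
  return cur;
}

/* Example handler: just counts how many effects were delivered. */
void toy_count_effects(toy_effect eff, void *ctx)
{
  (void)eff;
  (*(size_t *)ctx)++;
}
```

Replaying the same log once with `toy_count_effects` and once with a NULL handler yields the same final state; only the outside world differs.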
When you run a generator, you have one event and one effect:
> +hello %world
'hello, world'
Event: type "+hello %world\n" -> Effect: print "'hello, world'"
This effect is deterministic: you will always get the same output for a given state of your Arvo.
With threads, you can have more complex chains of I/O:
> -time ~s3
~s3..0082
Event: type "-time ~s3\n" -> Effect: set timer for 3 seconds
(... 3 seconds later)
Event: timer elapsed after ~s3..0082 -> Effect: print "~s3..0082"
Here, the last effect is not strictly determined by the first event, since it relies on nondeterministic information from the runtime, which sends the second event. Running this thread multiple times gives you different results even if the Arvo state stays the same:
> -time ~s3
~s3..007b
> -time ~s3
~s3..00a6
> -time ~s3
~s3..008d
To upgrade |mass I wrote a -mass thread that works in a similar fashion to -time: it sends a memory report request, waits for an answer, and then returns the answer to the thread caller.
In this section I will describe my thought process when developing -mass on the Arvo side, in roughly the same order as I wrote the code.
For the body of the main thread I took inspiration from the -time thread and the gates that it calls to run other threads, namely ++send-wait and ++take-wake from /lib/strandio.hoon:
:: /ted/mass.hoon
::
/- spider
/+ strandio
=, strand=strand:spider
^- thread:spider
|= arg=vase
=/ m (strand ,vase)
^- form:m
=+ !<(~ arg)
;< ~ bind:m send-mass-request:strandio
;< report=(unit) bind:m take-mass:strandio
(pure:m !>(report))
:: /lib/strandio.hoon
::
(...)
++ send-mass-request
=/ m (strand ,~)
^- form:m
=/ =card:agent:gall
[%pass /heft %arvo %k %heft ~]
(send-raw-card card)
::
++ take-mass :: WIP
=/ m (strand ,(unit))
^- form:m
|= tin=strand-input:strand
?+ in.tin ~&(in.tin `[%skip ~])
~ `[%wait ~]
[~ %sign * %khan %quac *]
`[%done p.sign-arvo.u.in.tin]
==
::
(...)
I will not dwell on threads too much; there is a nice guide on threads in the Urbit docs. You can see that the -mass thread passes a %heft request to the Khan vane instead of Dill, and expects a %quac gift from it. This is where I ended up moving the mass logic, because the Khan code is much shorter and simpler. Instead of repeating the guide on threads in this post, I'd like to talk about the micgal ;< rune.
Both the explanation in the rune docs and the one in the threads guide were hard to grok, so I came up with my own. It is inspired by this Computerphile video on monads, and the code example is from the video too.
Suppose you have a ++safe-div gate that returns the result of integer division wrapped in a unit to handle division by zero without crashing:
++ safe-div
|= [a=@ud b=@ud]
^- (unit @ud)
?: =(b 0) ~
`(div a b)
Let's define a type expr which will represent an expression that consists of a bunch of integer divisions. The type will be defined recursively:
+$ expr
$~ 0
$@ @ud
[p=expr q=expr]
So an instance of expr is either an integer or a cell of two expressions p and q, which corresponds to the integer division p // q. Now let's write a gate that takes an expression and evaluates it, reducing it to a unit of an integer.
A naive approach would probably look like this:
++ eval-naive
|= e=expr
^- (unit @ud)
?@ e `e
=/ a=(unit @ud) $(e p.e)
?~ a ~
=/ b=(unit @ud) $(e q.e)
?~ b ~
(safe-div u.a u.b)
Here, for each argument of safe-div, we pin a unit to the subject and perform a typecheck manually, returning ~ if the unit happens to be empty. We can rewrite the gate, fusing together the logic of the typecheck and the ~ return with the ++biff binding gate.
What ++biff does is take a (unit mold) argument and a gate whose sample is a mold, and try to slam the gate with the value inside the unit. If the unit is empty, then ~ is returned; otherwise it returns the output of the gate. If you take a bunch of gates that each take some value and return a unit of a value, then you can chain them together with this higher-order gate, also called monadic bind. Now we can rewrite the eval function:
++ eval-ugly
|= e=expr
=* this $
^- (unit @ud)
?@ e `e
%+ biff this(e p.e)
|= a=@
%+ biff this(e q.e)
|= b=@
(safe-div a b)
To shorten the code we can now introduce the ;< rune:
++ eval
|= e=expr
^- (unit @ud)
?@ e `e
;< a=@ _biff $(e p.e)
;< b=@ _biff $(e q.e)
(safe-div a b)
What ;< does is:
- It treats the second child as a gate, slamming it with its first child, which is a mold. The product of that gate is another gate, bind.
- It then slams the bind gate with a cell of the third child and a gate whose sample is a bunt of the first child and whose body is the code in the fourth child.
So in our example the first ;< first evaluates (_biff a=@), which returns biff, and then it applies biff as a bind to a unit from $(e p.e) and the gate |=(a=@ (... rest of the code)). This behavior is equivalent to the ++eval-ugly example. The main difference is that the implicit gates built by ;< are not exposed in the namespace, so there is no need to alias ++eval with =* like in the ++eval-ugly gate.
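For readers who are more at home in C than in Hoon, the bind pattern above can be sketched with function pointers. This is only an analogy under stated assumptions: `opt_uint`, `opt_bind`, the `step_` helpers and the `expr` struct are hypothetical names invented here, with `opt_bind` playing the role of ++biff and a `has` flag playing the role of the unit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for (unit @ud): has == false plays the role of ~. */
typedef struct { bool has; unsigned value; } opt_uint;

opt_uint opt_none(void)       { opt_uint o = { false, 0 }; return o; }
opt_uint opt_some(unsigned v) { opt_uint o = { true, v };  return o; }

/* Division that reports failure instead of crashing on a zero divisor. */
opt_uint safe_div(unsigned a, unsigned b)
{
  return (b == 0) ? opt_none() : opt_some(a / b);
}

/* A continuation receives the unwrapped value plus some captured context. */
typedef opt_uint (*opt_next)(unsigned value, void *ctx);

/* Bind: run `next` on the contents if present, otherwise propagate "none". */
opt_uint opt_bind(opt_uint in, opt_next next, void *ctx)
{
  assert(next != NULL);
  return in.has ? next(in.value, ctx) : opt_none();
}

/* An expression is a leaf (p == NULL) or the division p // q. */
typedef struct expr {
  const struct expr *p;
  const struct expr *q;
  unsigned leaf;
} expr;

opt_uint eval(const expr *e);

/* Context for the innermost step: the already-evaluated numerator. */
struct div_ctx { unsigned a; };

/* Innermost continuation: both operands are known, so divide. */
opt_uint step_div(unsigned b, void *ctx)
{
  const struct div_ctx *d = ctx;
  return safe_div(d->a, b);
}

/* Outer continuation: numerator is known, now evaluate the denominator. */
opt_uint step_rhs(unsigned a, void *ctx)
{
  const expr *e = ctx;
  struct div_ctx d = { a };
  return opt_bind(eval(e->q), step_div, &d);
}

opt_uint eval(const expr *e)
{
  if (e->p == NULL) {
    return opt_some(e->leaf);
  }
  return opt_bind(eval(e->p), step_rhs, (void *)e);
}
```

The explicit `step_rhs` and `step_div` functions correspond to the nested gates in ++eval-ugly. C has nothing like ;< to hide them, which is a fair illustration of what the rune buys you.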
Since the -mass thread interacts with the Khan vane for memory report retrieval, we now need to add new tasks and gifts to Khan, as well as define the Khan logic for those tasks.
In Lull we add a new gift %quac and two new tasks, %quac and %heft:
:: sys/lull.hoon
::
(...)
+$ gift :: out result <-$
$% [%arow p=(avow cage)] :: in-arvo result
[%avow p=(avow page)] :: external result
[%quac p=(unit)] :: memory report
==
+$ task :: in request ->$
$~ [%vega ~] ::
$% $>(%born vane-task) :: new unix process
(...)
[%quac p=(unit)] :: memory report
[%heft ~] :: report request
== ::
(...)
The %heft task is sent by the thread, and the %quac task is sent by the runtime with the memory report wrapped in a unit. When Khan receives a %quac task, it will forward it to the calling thread as a %quac gift.
Now we need to update /vane/khan.hoon. Remember that the memory report will be injected from the runtime as a new event, so we need to temporarily save the duct on which Khan gets the %heft task to route the memory report back to the caller. It is a good idea now to read the move trace tutorial for a more extensive illustration of how effects and events get routed between different parts of the system.
Firstly, we will update the state of Khan vane to include the saved duct:
+$ khan-state ::
$: %1 :: state v1
hey=duct :: unix duct
tic=@ud :: tid counter
mass-duct=(unit duct) :: saved duct
== ::
In the ++call gate, on the %born task, which Khan is supposed to receive when the runtime is launched, we check whether mass-duct holds a saved duct and, if it does, give an empty memory report on it and clear it:
++ call
(...)
?+ -.task [~ khan-gate]
%born
?~ mass-duct
[~ khan-gate(hey hen, tic 0)]
:_ khan-gate(hey hen, tic 0, mass-duct ~)
:_ ~
[u.mass-duct %give %quac ~]
We do this in case the runtime crashes during memory report generation, so that the thread returns an empty report when the pier is restarted.
In the same arm we define the logic for the %heft and %quac tasks. On the former we send a %whey note to Arvo as usual, but also save the duct on which we heard the request:
++ heft
|= hen=duct
^- [(list move) _khan-gate]
:_ khan-gate(mass-duct `hen)
:_ ~
[hen %pass /whey %$ whey/~] :: $move with a %whey note to Arvo
And on the latter we forward the gift on the saved duct if it is present and delete the saved duct from the state, and do nothing if there is no duct saved in the state:
++ quac
|= git=gift
^- [(list move) _khan-gate]
?~ mass-duct `khan-gate
:_ khan-gate(mass-duct ~)
:_ ~
[u.mass-duct %give git] :: $move with a gift to the original caller of ++heft
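The save-then-forward routing in these three arms can be summarized with a small sketch in C. It is a simplified model and not Khan's actual data layout: an `int` caller id stands in for a duct, and every `toy_` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy vane state: has_route/route play the role of mass-duct=(unit duct). */
typedef struct {
  bool has_route;
  int  route;       /* hypothetical caller id standing in for a duct */
} toy_khan;

/* A reply that may or may not have been emitted. */
typedef struct {
  bool     sent;        /* false means "no move produced" */
  int      route;       /* where the reply goes */
  bool     has_report;  /* false models an empty (~) report */
  unsigned report;
} toy_reply;

/* Like ++heft: remember who asked, so a later event can be routed back. */
void toy_heft(toy_khan *k, int caller)
{
  assert(k != NULL);
  k->has_route = true;
  k->route = caller;
}

/* Like ++quac: forward on the saved route and forget it, or do nothing. */
toy_reply toy_quac(toy_khan *k, bool has_report, unsigned report)
{
  toy_reply r = { false, 0, false, 0 };
  assert(k != NULL);
  if (!k->has_route) {
    return r;                 /* nobody is waiting */
  }
  r.sent = true;
  r.route = k->route;
  r.has_report = has_report;
  r.report = report;
  k->has_route = false;       /* one request, one reply */
  return r;
}

/* Like the %born branch: after a restart, release any waiter with an empty report. */
toy_reply toy_born(toy_khan *k)
{
  return toy_quac(k, false, 0);
}
```

The point of clearing the saved route after every reply is that a stray report arriving later has nowhere to go and is silently dropped, which matches the "do nothing if there is no duct saved" branch above.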
All that is left is to update the Khan types $sign and $note, and the ++load arm, to upgrade the old state to the new one.
Khan can now request %whey from Arvo and return a %quac gift:
+$ note :: out request $->
$~ [%g %deal *sock *term *deal:gall] ::
$% $: %g :: to %gall
$>(%deal task:gall) :: full transmission
== ::
$: %k :: to self
$>(%fard task) :: internal thread
== ::
$: %$ :: to Arvo
$>(%whey waif) :: memory report
== ==
(...)
+$ sign
(...)
$>(?(%arow %avow %quac) gift) :: thread result
And the old state needs to be handled correctly:
+$ khan-states $%(khan-state-0 khan-state)
::
+$ khan-state-0 ::
$: %0 :: state v0
hey=duct :: unix duct
tic=@ud :: tid counter
==
::
++ state-0-to-1
|= old=khan-state-0
^- khan-state
[%1 hey tic ~]:old
:: +load: migrate an old state to a new khan version
::
++ load
|= old=khan-states
^+ khan-gate
?- -.old
%1 khan-gate(state old)
%0 $(old (state-0-to-1 old))
==
That completes the description of the changes on the Arvo side. Now the %mass ovum needs to be handled in Vere.
This section is similar to the one describing the -mass development flow, but here we will cover the Earth side of the problem. As a proof of concept, the noun that the runtime will send back to Arvo is going to be the total sweep value, which is just an atom.
When |mass is entered in Dojo, the memory report gets printed by the _serf_grab function in serf.c. This function takes a u3_noun which is generated by the ++whey arm in Arvo described above. I updated it to return u3_weak, which is either u3_none or a noun, and then made it return the atom holding the total sweep volume:
// vere/serf.c
//
static u3_weak
_serf_grab(u3_noun sac)
(...)
c3_w tot_w = 0;
(...)
tot_w += u3a_maid(fil_u, "total userspace", u3a_prof(fil_u, 0, sac));
tot_w += u3m_mark(fil_u);
tot_w += u3a_maid(fil_u, "space profile", u3a_mark_noun(sac));
(...)
return u3i_word(tot_w * 4);
_serf_grab is called in the u3_serf_post function, which we also update accordingly to make it return out=(unit *):
// vere/serf.c
//
(...)
/* u3_serf_post(): update serf state post-writ.
*/
u3_weak
u3_serf_post(u3_serf* sef_u)
{
u3_noun out = u3_none;
(...)
if ( c3y == sef_u->mut_o ) {
u3_weak grab_mass = _serf_grab(sef_u->sac);
sef_u->sac = u3_nul;
sef_u->mut_o = c3n;
if (grab_mass != u3_none) {
out = u3nc(u3_nul, grab_mass);
}
}
(...)
return out;
}
The purpose of u3_serf_post is to update the state of the serf after the event has been processed and saved in the event log. This function is called in main.c, and we will adjust the caller so that it sends another plea, requesting another event that will contain the desired mass report:
// vere/main.c
//
/* _cw_serf_writ(): process a command from the king.
*/
static void
_cw_serf_writ(void* vod_p, c3_d len_d, c3_y* byt_y)
{
(...)
u3_weak serf_post_out = u3_serf_post(&u3V);
if (serf_post_out != u3_none) {
_cw_serf_send(u3nc(c3__quac, serf_post_out));
}
}
}
The plea that we send is a cell of %quac and the memory report noun, and it is going to be handled by the _lord_on_plea function in lord.c:
case c3__quac: {
_lord_plea_mass(god_u, u3k(dat));
} break;
And the function that injects the event is _lord_plea_mass. The card cad is the task to Khan, which will receive the task on a wire wir=[/quac] (that wire gets ignored anyway). Then we build the ovum: c3__k denotes the Khan vane. Next I build a driver to send the %quac task: this is suboptimal, but I couldn't figure out another way to do it and just copied the logic from some other piece of code, possibly Behn injecting "elapsed timer" events.
From ~master-morzod:
you code here is making a bespoke driver for every %mass message. instead, you should just call a function in an existing driver (i recommend term.c as a catchall)
Then the ovum, as both a noun and a struct u3_ovum, is passed as an event. Arvo will send the task to Khan, which will forward the memory report to the calling thread.
/* _lord_plea_mass(): inject mass report
*/
static void
_lord_plea_mass(u3_lord* god_u, u3_noun dat)
{
u3_noun cad = u3nc(c3__quac, dat);
u3_noun wir = u3nc(c3__quac, u3_nul);
u3_ovum* egg_u = u3_ovum_init(0, c3__k, wir, cad);
u3_pier* pir_u = god_u->cb_u.ptr_v;
u3_auto* car_u = c3_calloc(sizeof(*car_u));
u3_noun ovo;
car_u->pir_u = pir_u;
car_u->nam_m = c3__quac;
u3_auto_plan(car_u, egg_u);
u3_assert( u3_auto_next(car_u, &ovo) == egg_u );
{
struct timeval tim_tv;
gettimeofday(&tim_tv, 0);
u3_lord_work(god_u, egg_u, u3nc(u3_time_in_tv(&tim_tv), ovo));
}
}
- Clone my versions of the vere and urbit repos (mind the branches, modernize-mass and mass-thread respectively)
- Build the vere binary
- Boot a fakezod:
./vere/bazel-bin/pkg/vere/urbit -F zod -B urbit/bin/brass.pill -A urbit/pkg/arvo
- Run -mass in Dojo. After the usual printfs, you should get the noun printed to Dojo:
(...)
total marked: MB/111.438.160
free lists: MB/1.909.560
sweep: MB/111.438.160
[~ 111.438.160]
The last line is the returned mass report. You can now write it to disk:
> .mass/report -mass
Or send it over the wire to ~zod via a |hi ping :)
|hi ~zod [_|=(a=(unit) (scow %ud (@ (need a)))) -mass]
Working on this project gave me a lot of insight into the internals of Arvo and Vere and into the interaction between the two. I hope you enjoyed this report and learned something new too.
To finish the project, one would most likely have to perform the following steps:
- Rewrite _lord_plea_mass to call u3_auto_plan from a function with an existing driver instead of making a bespoke driver for each %quac task
- Make a u3_quac structure with the same shape as the $mass noun and populate it in _serf_grab. Convert the struct to a noun after the sweep and return it instead of a single atom. Then return (unit quac) to the thread caller:
+$ quac
$~ $+|+~
(pair cord (each @ud (list quac)))
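As a rough idea of what a C-side mirror of this shape could look like, here is a sketch. It is not the actual u3_quac layout (that structure does not exist yet in this work); the `toy_quac` name, its fields and the helper are assumptions made purely for illustration. Each node carries a label and is either a leaf with a size in bytes or a list of child nodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of (pair cord (each @ud (list quac))). */
typedef struct toy_quac {
  const char            *label;      /* the cord */
  bool                   is_leaf;    /* true: `size` is valid; false: `kids` is */
  uint64_t               size;       /* measured size in bytes (leaf only) */
  const struct toy_quac *kids;       /* array of children (branch only) */
  size_t                 kid_count;  /* number of children (branch only) */
} toy_quac;

/* Sum the leaf sizes below a node, the way a report total could be derived. */
uint64_t toy_quac_total(const toy_quac *q)
{
  assert(q != NULL);
  if (q->is_leaf) {
    return q->size;
  }
  uint64_t sum = 0;
  for (size_t i = 0; i < q->kid_count; i++) {
    sum += toy_quac_total(&q->kids[i]);
  }
  return sum;
}
```

A tree like this could be filled in while the sweep runs and only converted to a noun afterwards, which is the order of operations the step above asks for.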
OK, so as I understand it the main task remaining is to refactor the Dojo side to return all of the memory weights as a single noun?