hmm.md

Hey, thanks for the feedback! Yes it is, especially because I'm not trying to solve a concrete problem, just to find new ideas. But writing the thesis (or at least starting it) is even more challenging, I've found.

Regarding changing the f-p model, I don't think we should. If we manage to have something "pluggable enough" that doesn't force the user to use a specific library, I'd say that's better.

About the highlighted point with "No !" attached to it: I agree that sharing a root node is useful, no question about that. My point was about sharing the lineage after the root node, i.e. the apply/flatMap calls made on it. More precisely, I meant the case where we don't materialize it: one node starts applying spores to it, then transfers the lineage to another node without seeing the result first. That case (and that one only) is of little interest in my opinion. If we allow (in our scenario, that is) a node to see the result of its lineage before sending it, then I can already think of applications such as sharing work ("I've managed to extract interesting features from the data, you can use the resulting Silo for your algorithm").

Regarding the git idea of having a copy of the silo locally: if you simply duplicate both, you have to keep track of the correspondence between the two lineages. My idea was rather to have the same lineage replicated locally (assuming pure functions) and to be able to reason about it with other nodes. This means that if you receive a lineage from another node, you could compare it to your own and get some sort of diff between the two copies: not of the data, but of the operations.
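To make the "diff of the operations" idea a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `Lineage`, `Diff` and `diff` are hypothetical names, and operations are reduced to plain strings standing in for spores. It only finds the shared prefix of two lineages built on the same root and reports what each side did after diverging.

```scala
object LineageDiff {
  // Hypothetical model: a root identifier plus the ordered operations applied to it.
  // Strings stand in for spores here; this is not the f-p library's representation.
  final case class Lineage(root: String, ops: List[String])

  // The shared prefix, then what each side did after the point of divergence.
  final case class Diff(common: List[String], onlyLeft: List[String], onlyRight: List[String])

  // Returns None when the two lineages do not start from the same root.
  def diff(left: Lineage, right: Lineage): Option[Diff] =
    if (left.root != right.root) None
    else {
      // Longest common prefix of the two operation lists.
      val shared = left.ops.zip(right.ops).takeWhile { case (a, b) => a == b }.map(_._1)
      Some(Diff(shared, left.ops.drop(shared.length), right.ops.drop(shared.length)))
    }
}
```

With something like this, a node receiving a lineage could tell how far it agrees with its own copy without looking at any data.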

(We're moving into far-fetched territory; I'm not sure it makes sense at all or if I'm just fantasizing.) Essentially, you just have to attach a (user-defined) merge algorithm between lineages. For Git, this is a manual one: you have to define what the merge of two branches is. If you think about the blockchain in Bitcoin, the algorithm is simply "take the longest one; if you have a tie, use the first one you received, and keep the other one in memory in case it grows faster". For CRDTs, the commutative property allows you to merge two histories in a different order and get the same result.
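A rough sketch of what a pluggable merge could look like, under heavy assumptions: `Merge`, `LongestWins` and `UnionMerge` are made-up names, a lineage is reduced to a list of operation names, and the two strategies are only simplified stand-ins for the "longest one wins" rule and for a commutative, CRDT-like merge.

```scala
object LineageMerge {
  // Hypothetical simplification: a lineage is just its ordered list of operation names.
  type Ops = List[String]

  // A user-defined merge between the local lineage and one received from another node.
  trait Merge {
    def merge(local: Ops, received: Ops): Ops
  }

  // "Take the longest one": on a tie, keep the local copy, i.e. the one seen first.
  object LongestWins extends Merge {
    def merge(local: Ops, received: Ops): Ops =
      if (received.length > local.length) received else local
  }

  // Commutative merge in the spirit of a grow-only set: the result does not depend
  // on which side is local. Only meaningful if the operations themselves commute.
  object UnionMerge extends Merge {
    def merge(local: Ops, received: Ops): Ops =
      (local ++ received).distinct.sorted
  }
}
```

The manual, Git-like case would simply be a `Merge` whose implementation asks the user.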

(Going even further into the maybe-stupid ideas.) I also thought about using named spores. The idea is to transform a spore

         val lInt = 5
         spore[Int, Int] {
           val capturedInt: Int = lInt
           i => i * capturedInt
         }

into

         case class GeneratedName(capturedInt: Int) extends SporeWithEnv[Int, Int] {
           type Captured = Int
           val captured: Int = capturedInt
           def apply(i: Int): Int = i * capturedInt
         }

         GeneratedName(lInt)

This way, you can define some kind of equality between lineages. Two spores are equal if their case classes are equal: same apply method (again, with pure functions in mind) and same captured variables.

(This is inspired by the blockchain.) Now, if we take any part of the lineage, we either have a root node (let's derive a hash of its content) or an apply (I'm leaving flatMap aside for now, I haven't thought about it yet) that has a named spore and a previous silo. From a named spore and the previous silo's hash, you can derive a new hash that uniquely identifies the silo. This gives you a deterministic way of identifying silos; and if you send a lineage to a node that already has part of it, you now know how to replay the lineage to obtain both the same data and the same (identifiable) lineage. It would also allow nodes to create the same lineage (up to some point) without communicating, and allow the server (the one with the "original" silo content) to reuse previously computed data.

I think this is close to a command log, like the one some database systems have (instead of a WAL), but more general, and one with which you could more easily pinpoint where two lineages diverged. I think you might be able to implement a distributed DB this way, using consensus to agree on a combined lineage (the lineage representing the command log, and the data inside the silos the state of the DB).
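The hash derivation described above could look roughly like this. It is only a sketch under stated assumptions: `NamedSpore`, `Root`, `Applied` and `id` are hypothetical names, captured values are flattened to strings, SHA-256 is an arbitrary choice, and the string encoding fed to the hash is naive (a real version would need an unambiguous serialization).

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.MessageDigest

object SiloId {
  // Hypothetical stand-in for a named spore: its generated name plus its captured values.
  final case class NamedSpore(name: String, captured: List[String])

  // A lineage is either a root with some content, or a spore applied to a previous lineage.
  sealed trait Lineage
  final case class Root(content: String) extends Lineage
  final case class Applied(previous: Lineage, spore: NamedSpore) extends Lineage

  // Hex-encoded SHA-256 of a string.
  private def sha256(s: String): String =
    MessageDigest.getInstance("SHA-256").digest(s.getBytes(UTF_8)).map("%02x".format(_)).mkString

  // Root: hash of the content. Apply: hash of the previous silo's id and the named spore.
  def id(lineage: Lineage): String = lineage match {
    case Root(content) =>
      sha256("root:" + content)
    case Applied(previous, spore) =>
      sha256("apply:" + id(previous) + ":" + spore.name + ":" + spore.captured.mkString(","))
  }
}
```

Two nodes that independently build `Applied(Root("data"), NamedSpore("GeneratedName", List("5")))` end up with the same identifier, while changing the captured value gives a different one, which is what would let a node recognize the part of a lineage it already has.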
