@LaurentMT
Last active April 29, 2024 07:45
Bitcoin Transactions & Privacy (part 1)
This document is an attempt to define metrics quantifying the degree of privacy provided by a bitcoin transaction.
Objectives
Definition of metrics measuring the resistance of a transaction to a set of attacks against users' privacy.
Attacks considered in the scope of these metrics are:
- Merged Inputs Heuristic: methods identifying the inputs controlled by the same entity
- Coinjoin Sudoku: methods identifying the links existing between the inputs and outputs of a transaction
These metrics can be applied to all bitcoin transactions but are specifically useful for qualifying the "quality" of joint transactions (CoinJoin, SharedCoin, ...).
Entropy of a Transaction
The idea of this metric comes from a discussion with Greg Maxwell (1).
He was suggesting a notion of "Coinjoin Entropy" measuring how many possible mappings of inputs to outputs are possible given the values of the txos.
Considering that the number of combinations may be very high (because of the combinatorial explosion created by coinjoin transactions), it's wise to define the notion of entropy:
E = log2(N)
with:
E = entropy of the transaction
N = number of combinations (mappings of inputs to outputs)
Basically, it's the Shannon Entropy with the hypothesis that all events have the same probability (i.e. no additional information giving a "preference" to specific combinations).
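As a minimal sketch in code (the function name is illustrative, not taken from the Boltzmann codebase), the metric is a direct application of the formula:

```python
import math

def tx_entropy(n_combinations: int) -> float:
    """E = log2(N): entropy of a transaction in bits, where N is the
    number of possible mappings of inputs to outputs, under the
    hypothesis that all mappings are equally likely."""
    return math.log2(n_combinations)
```

With N = 1 this gives 0 bits (a fully deterministic transaction), N = 2 gives 1 bit, and N = 3 gives roughly 1.585 bits, matching the examples below.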
Examples
Example 1: Basic Payment Transaction
* https://oxt.me/transaction/dcba20fdfe34fe240fa6eacccfb2e58468ba2feafcfff99706145800d09a09a6
* N = 1
* E = 0
* The unique interpretation is: [(I1,I2,I3,I4,I5,I6)=>(O1,O2)]
Example 2: Ambiguous Transaction
* https://oxt.me/transaction/8c5feb901f3983b0f28d996f9606d895d75136dbe8d77ed1d6c7340a403a73bf
* N = 2
* E = 1
* Interpretations are:
- [(I1)=>(O2), (I2)=>(O1)]
- [(I1,I2)=>(O1,O2)]
Example 3: Basic Coinjoin Transaction (DarkWallet)
* https://oxt.me/transaction/8e56317360a548e8ef28ec475878ef70d1371bee3526c017ac22ad61ae5740b8
* N = 3
* E = 1.585
* Interpretations are:
- [(I1)=>(O1,O2), (I2)=>(O3,O4)]
- [(I1)=>(O3,O2), (I2)=>(O1,O4)]
- [(I1,I2)=>(O1,O2,O3,O4)]
* Note:
O2 and O4 are the change outputs of the coinjoin transaction.
For all combinations we have I1=>O2 and I2=>O4.
We say that I1 is deterministically linked to O2 and I2 is deterministically linked to O4.
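The counting of interpretations can be sketched as a brute-force search over partitions of inputs matched to subsets of outputs with equal sums (fees ignored). The amounts and labels below are illustrative stand-ins chosen to reproduce the structure of Example 3, not values taken from the actual transaction:

```python
from itertools import combinations

def interpretations(inputs, outputs):
    """Enumerate mappings of input groups to output groups with equal
    sums. Inputs/outputs are (label, amount) pairs with integer
    amounts; brute force, so only practical for small transactions."""
    results = []

    def solve(ins, outs, mapping):
        if not ins:
            if not outs:                       # every output assigned
                results.append(list(mapping))
            return
        first, rest = ins[0], ins[1:]
        # `first` always anchors its input group, so each partition of
        # the inputs is generated exactly once
        for k in range(len(rest) + 1):
            for extra in combinations(rest, k):
                in_group = (first,) + extra
                target = sum(a for _, a in in_group)
                left_ins = [i for i in rest if i not in extra]
                # match the group against any output subset of equal sum
                for m in range(1, len(outs) + 1):
                    for out_group in combinations(outs, m):
                        if sum(a for _, a in out_group) == target:
                            left_outs = [o for o in outs if o not in out_group]
                            mapping.append((in_group, out_group))
                            solve(left_ins, left_outs, mapping)
                            mapping.pop()

    solve(list(inputs), list(outputs), [])
    return results
```

With two inputs of 0.51 and 0.52 (in hundredths: 51 and 52), two equal spend outputs of 0.50 and two distinct change outputs of 0.01 and 0.02, the search finds exactly the 3 interpretations listed above, and in every one of them I1 shares a group with O2 and I2 with O4 (the deterministic links).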
Observations
- Computation of the metric becomes quickly expensive when privacy tools like coinjoin are used to build the transaction (no surprise here).
- It's possible to decrease the computational cost thanks to side-channel information that reduces the size of the problem space (e.g. inputs controlled by the same entity can be merged into a single input).
For example, this information can be provided by any heuristic allowing addresses to be clustered (merged inputs, change address, ...).
- From this notion of Entropy, we can derive several metrics qualifying a transaction:
- Intrinsic Entropy: it's the value computed without any additional information, when the transaction is considered separately from the blockchain.
- Actual Entropy: it's the value taking into account additional information. Personally, I've worked on this one because I think it's the one which matters the most to users.
- Max Entropy: it's the value associated to a perfect coinjoin transaction having a structure similar or close to the evaluated transaction.
A perfect coinjoin is a coinjoin with all inputs having the same amount and all outputs having the same amount.
Here, we call the tuple (#inputs, #outputs) the structure of the transaction.
Per se, Max Entropy isn't very interesting but it becomes much more useful when used to compute the following ratios:
Wallet efficiency = Intrinsic Entropy - Max Entropy (expressed in bits)
This metric may be a proxy for qualifying the efficiency of a wallet when it builds a coinjoin transaction.
Wallet efficiency can also be expressed as a ratio: IntrinsicNumberOfCombinations(tx) / NumberOfCombinations(closest perfect coinjoin).
Blockchain efficiency = Actual Entropy - Max Entropy (expressed in bits)
This metric may be a proxy for qualifying the efficiency of the whole ledger/ecosystem in terms of protecting users' privacy.
Blockchain efficiency can also be expressed as a ratio: ActualNumberOfCombinations(tx) / NumberOfCombinations(closest perfect coinjoin)
- Rule: the actual entropy of a bitcoin transaction is a dynamic value that is likely to decline over time. At best, it will stay constant. It will never increase.
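The two efficiency scores follow directly from the definitions above; a minimal sketch (function names are illustrative, and both scores are expressed in bits, so 0 means the transaction matches the closest perfect coinjoin):

```python
import math

def wallet_efficiency_bits(n_intrinsic: int, n_perfect: int) -> float:
    """Wallet efficiency = Intrinsic Entropy - Max Entropy, in bits.
    n_perfect is the number of combinations of the closest perfect
    coinjoin; the result is <= 0, with 0 for a perfect coinjoin."""
    return math.log2(n_intrinsic) - math.log2(n_perfect)

def blockchain_efficiency_bits(n_actual: int, n_perfect: int) -> float:
    """Blockchain efficiency = Actual Entropy - Max Entropy, in bits."""
    return math.log2(n_actual) - math.log2(n_perfect)
```

The ratio forms given above are simply 2 raised to these bit values: e.g. an intrinsic count of 1 against a perfect-coinjoin count of 3 gives -1.585 bits, i.e. a ratio of 1/3.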
Limitations
- This metric can be tricked by specific patterns of transactions like steganographic transactions.
A steganographic transaction aims to hide the real payment inside a "fake" payment.
Its fingerprint is similar to a classic payment's, but it differs by involving the payee in the payment process.
Example: UserA must pay 1.5btc to UserB. UserB (the payee) collaborates to the process by providing an input.
UserA 2btc -- -- 0.5btc UserA (change)
\ T1 /
|-------|
/ \
UserB 1btc -- -- 2.5btc UserB
This transaction is indistinguishable from a classic payment (UserA provides 2 inputs; 1 output goes to the payee, 1 output is change).
According to our previous definition, we have E(T1)=0 instead of E(T1)=1 (i.e. classic payment + steganographic transaction)
- This metric is a first good proxy for measuring privacy at transaction level. A transaction with a high entropy is likely to provide better privacy to the users.
But, as illustrated by the "Coinjoin Sudoku" attack (2), this metric fails to detect privacy leaks occurring at the lower level of specific inputs/outputs.
Therefore, this metric should be complemented by additional metrics (see part 2)
Implementation
I've developed a python implementation of an algorithm computing this metric.
This implementation surely deserves the title of "worst possible solution": brute force algorithm, no parallelization, no optimization of the algorithm (memoization), ...
The unique optimization used by this implementation is a reduction of the problem space thanks to information provided by external heuristics.
As a direct consequence of this brute force approach, processing is limited to the most simple transactions (with #inputs <= 10 & #outputs <= 10).
Anyway, I was confident this implementation might crunch a good portion of the bitcoin ledger (60-70% of the transactions) so I've decided to keep it for a first round of tests.
Tests & Results
- Processing of transactions from block 1 to block 388602 (December 2015)
- #Txs (Total) = 97,977,912 (100.00%)
- #Txs with entropy computed = 96,597,663 (98.59%)
- #Txs not processed = 1,380,249 (1.41%)
TBH, I wasn't expecting such a large coverage ratio for this first round. I'm not sure it's great news for privacy.
Analysis of the unprocessed transactions
It's almost certain that a subset of these transactions is composed of true Coinjoin/SharedCoin transactions.
But it's also likely that another part is composed of large transactions with low entropy (classic payments with a large number of inputs and 2 outputs, ...).
A better/smarter implementation may allow the refinement of these results. To be done...
Analysis of the transactions processed
Cumulative distribution of txs per #combinations (loglog scale): http://i.imgur.com/jms8tpi.png
This distribution measures the probability of a transaction with #combinations >= value
Cumulative distribution of txs per entropy (linlog scale): http://i.imgur.com/NlLBL5W.png
This distribution measures the probability of a transaction with entropy >= value
Around 85.47% of the transactions processed have a null entropy (i.e. they have inputs & outputs deterministically linked)
Around 14.52% of the transactions processed have E(tx) >= 1 (i.e. they're as good or better than "ambiguous" transactions)
Around 1.89% of the transactions processed have E(tx) >= 1.585 (i.e. they're as good or better than the most basic coinjoin transactions)
In the best-case scenario, we may have around 3.3% of the transactions with an entropy greater than or equal to the entropy of a basic coinjoin, but it's likely that the real value is somewhere between 2% and 3%.
Note that it doesn't imply all these transactions have a coinjoin structure. They just produce higher entropies like coinjoin transactions.
References
(1): G.Maxwell - CoinJoin: Bitcoin privacy for the real world (https://bitcointalk.org/index.php?topic=279249.msg7117154#msg7117154)
(2): K.Atlas - Coinjoin Sudoku (http://www.coinjoinsudoku.com/)
@AdamISZ

AdamISZ commented Mar 1, 2019

@LaurentMT

I have several questions about this methodology (I know this is very old, and I did briefly look at it earlier, but now that someone is proposing to add this to JoinMarket I want to understand it better).

(1) What is the definition of Actual Entropy? It seems you use E = log_2(N) for this, but I'm wondering about your choice of N based on your examples - see the below points under (3).
(2) Even for intrinsic entropy, I have questions. What precisely does it mean for an input and an output to be linked? Common and unilateral ownership of both utxos? Or can it also be that there is a payment, so that even though there are different owners, it is still considered linked if there is a direct payment from one to the other?
(3) Why does example 1 have only 1 interpretation? What is wrong with the interpretation of all inputs and outputs owned by 1 party? What about a payjoin style interpretation (in other words, coinjoin and payment)? What about a dual payment with no change?
(4) For the ambiguous case, I take it you're aware that elsewhere I've asserted there are at least 8 different interpretations (and I think more recently belcher wrote at least 9, yes found it here: https://en.bitcoin.it/wiki/Privacy#Multiple_interpretations_of_a_blockchain_transaction ) for exactly this structure ((X, Y in, X, Y out)). However:
Researching more carefully, I first realised that there have to be 15 interpretations just purely on a relational basis. Using "A, B, C, D.." to refer to distinct entities, the mappings can be: 1(AA:AA), 2(AA:AB), 3(AA:BA), 4(AA:BC), 5(AA:BB), 6(AB:AA), 7(AB:BB), 8(AB:AB), 9(AB:BA), 10(AB:BC), 11(AB:CB), 12(AB:AC), 13(AB:CA), 14(AB:CD), 15(AB:CC). Note here for example: look at case 4, with outputs BC; I don't also show CB as an alternative, because since B and C are not part of the input set, those two cases are the same. And note that I consider e.g. cases 8 and 9 distinct because output amounts can be different, so AB is not the same as BA.
After more experimentation, I realised that this is actually the same as the Bell numbers [1], which basically enumerate distinct partitions of a set. When there are 5 "positions" (for example, 3 inputs and 2 outputs) the number is 52. Generally, this scales roughly exponentially (according to Wikipedia it's asymptotically something like n^n/log(n), which I guess implies entropy varying a bit superlinearly in n, like ~ nlogn).
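These Bell-number counts (15 for four positions, 52 for five, 203 for six) are easy to check with the Bell triangle; a quick sketch (not code from the gist):

```python
def bell(n: int) -> int:
    """n-th Bell number (number of partitions of an n-element set),
    computed via the Bell triangle: each row starts with the last
    entry of the previous row, and each entry adds the entry to its
    left to the entry above-left."""
    row = [1]
    for _ in range(n):
        new = [row[-1]]
        for x in row:
            new.append(new[-1] + x)
        row = new
    return row[0]
```

The sequence starts 1, 1, 2, 5, 15, 52, 203, ..., so a 2-in/2-out transaction (4 positions) gives 15 and the 6-position case gives 203, as stated above.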

While your analysis of coinjoin is sound from the point of view of assuming it's a coinjoin without payment cross-entity (that's always how I've looked at it, too), i.e. you use subset-sum or "coinjoin sudoku" to list possibilities, it's also true that the above kind of "full listing", with the 6th Bell number, is the more full case right (so that'd be 203 possibilities).

It occurs to me that not only payjoin, fake coinjoin and to-self payments, and batched payments may lead to non-standard linkage possibilities, but also coinswap and similar.

Well tbh these are just introductory thoughts before figuring out something more concrete, but I'm particularly interested to know your answer to (1) above.

[1] https://en.wikipedia.org/wiki/Bell_number

@LaurentMT
Author

LaurentMT commented Mar 2, 2019

Q1 and Q2:
Here are some elements about the terms used in the gist.

  • Entropy of a transaction: Basically it's just the number of interpretations under another form.
  • Intrinsic Entropy: Computation of the entropy based on the structure of the transaction alone (amounts of inputs and outputs)
  • Actual Entropy: Computation of the entropy taking into account additional information (like the fact that 2 inputs are controlled by the same participant).
    Note that this additional information can be the result of the processing of another transaction or of some human intelligence.
  • Link between an input and an output: captures the "concept" of an intentional financial flow between the input and the output (taking into account the amount transferred to the output) but Boltzmann doesn't care if the same entity controls a specific input and a specific output. As you noted, it makes more sense in the context of coinjoin transactions aggregating several independent financial flows (see Bitfury paper "Untangling bitcoin transactions"). This definition excludes the possibility of dealing properly with joint payments => Boltzmann is unable to deal with Payjoin/P2EP transactions (IMHO, no algorithm considering a tx in isolation can deal with this kind of transaction). Also, note that Joinmarket transactions with fees paid by the takers to the makers create a similar problem. I dealt with it in Boltzmann by introducing a kind of tolerance threshold for these fees. It somewhat helps to avoid overclustering for Joinmarket txs but Joinmarket clearly introduces an additional difficulty here (good job!).

Q3:
Boltzmann considers that there's an asymmetry between the inputs and the outputs of a transaction. While it's possible to use the knowledge of several inputs controlled by the same entity for the computations (see actual entropy), doing the same for the outputs is a risky game leading to erroneous results.
WRT joint payments and Payjoin/P2EP, see my previous answer. Actually, as soon as you introduce the idea of Payjoin/P2EP txs, the number of possible interpretations suddenly explodes because of all the different potential combinations of amounts transferred to the joint output.

Q4:
Right. "Your" number of interpretations is higher because your model introduces the concept of who controls what. Boltzmann's model only cares about the links between the inputs and the outputs. In my opinion, both are valid models. Actually, I've used your model while describing different interpretations for some txs generated by Samourai Wallet (see https://twitter.com/LaurentMT/status/1081624645764288512)
IMO, Boltzmann's model is more restrictive in the sense that it doesn't care if the same entity controls a specific input and a specific output. To say it differently, Boltzmann doesn't care about plausible deniability.

WRT Bell numbers, they are definitely a useful tool. I would also recommend checking Exponential Bell Polynomials, which are some mind-blowing stuff (https://en.wikipedia.org/wiki/Bell_polynomials#Combinatorial_meaning) and can be useful for specific cases. I've used them in Boltzmann to speed up the computation of coinjoins with denominated amounts (see https://github.com/Samourai-Wallet/boltzmann/blob/master/boltzmann/utils/tx_processor.py#L264)

You're absolutely right that the main goal of Boltzmann is to deal with ambiguous and coinjoin-like transactions. My first motivation for Boltzmann was to avoid overclustering on OXT.

My very personal conclusion about this subject of "privacy metrics":
I don't think that there's such a thing as a metric able to measure "Privacy". Trying to define an ideal "privacy metric" is a fool's errand. BUT (and it's a very important "but") it's absolutely possible to define privacy metrics which are useful for specific objectives, especially if you use them as negative indicators (i.e.: a high score doesn't provide any certainty about your privacy but a low score might be the sign of potential privacy issues).

For instance, Boltzmann served me well in order to avoid overclustering on OXT (with the consequence that OXT "suffers" from underclustering when compared to other analytics platforms, but I'm fine with this tradeoff because I think it's easier to deal with it).

Its recent introduction into Samourai Wallet (with a visual indicator of deterministic links) follows the same principle. It's used as a way to build awareness about the issue of deterministic links but it follows the principle that it should be used as a negative indicator (a high entropy doesn't give you any guarantee but a low entropy isn't a good sign for your privacy).

IMHO, the hardest part is all about the "education" of users. Almost all users expect tools providing a metric telling them that their privacy is safe. The challenge for all of us is to change this mindset.

@AdamISZ

AdamISZ commented Mar 3, 2019

First, thanks for the detailed responses. I understand what you're doing better, although I don't think I fully understand why you're doing it; a couple more comments/questions below:

Q1 and Q2:
Here are some elements about the terms used in the gist.

* Entropy of a transaction: Basically it's just the number of interpretations under another form.

* Intrinsic Entropy: Computation of the entropy based on the structure of the transaction alone (amounts of inputs and outputs)

I'm still finding this slightly ambiguous. When you say "amounts of inputs and outputs" I think you mean btc amounts of inputs and outputs, but "amount" could mean number.

Moreover, I understand that this ("Intrinsic Entropy") really just means "this transaction taken in isolation, not considering inferred information from elsewhere, such as who owns which input/output", but I still find your base definition of "entropy is the number of interpretations" ambiguous, because in your given 3 examples, I don't understand why you came up with N=1,2,3 in examples 1,2,3 respectively. Or more precisely: I largely understand the reasoning, but it seems arbitrary, since other interpretations are possible. Well, this probably folds into the below comment anyway:

* Actual Entropy: Computation of the entropy taking into account additional information (like the fact that 2 inputs are controlled by a same participant).
  Note that this additional information can be the result of the processing of another transaction or of some human intelligence.

* Link between an input and an output: captures the "concept" of an intentional financial flow between the input and the output (taking into account the amount transferred to the output).
  As you noted, it makes more sense in the context of coinjoin transactions aggregating several independent financial flows (see Bitfury paper "Untangling bitcoin transactions").
  This definition excludes the possibility to deal properly with joint payments => Boltzmann is unable to deal with Payjoin/P2EP transactions (IMHO, no algorithm considering a tx in isolation can deal with this kind of transactions).
  Also, note that Joinmarket transactions with fees paid by the takers to the makers create a similar problem.
  I dealt with it in Boltzmann by introducing a kind of tolerance threshold for these fees. It somewhat helps to avoid overclustering for Joinmarket txs but it clearly introduces an additional difficulty (good job!).

Q3:
Boltzmann considers that there's an asymmetry between the inputs and the outputs of a transaction.
While it's possible to use the knowledge of several inputs controlled by the same entity for the computations (see actual entropy), doing the same for the outputs is a risky game leading to erroneous results.

Right, so setting aside payjoin for a moment, what I find confusing about example 1 is that you assert only 1 interpretation, which you give as (all inputs) -> (o1, o2), but isn't it the case that the situation where o1 is change and o2 is payment is different from vice versa? Similarly, the case where o1 and o2 represent different recipients is different too. They represent different financial flows; in one case the payer paid 0.01, in the second 300, in the third 300.01, to some external entit(ies).

If I'm reading you correctly, it seems like you were quite focused on coinjoin, i.e. you wanted specifically to distinguish cases where more than one entity is directing funds from A to B inside one tx. I guess I can imagine, that you find this useful, since you can combine this with techniques for tracing (e.g. change heuristics) and then apply the entropy concept in a "transitive" or compounded way across multiple transactions. So it will usually be a value of 0, unless you have some coinjoin behaviour (and as you note below, there are other transaction patterns it will just totally fail to see).

WRT joint payments and Payjoin/P2EP, see previous answer.
Actually, as soon as you introduce the idea of Payjoin/P2EP txs, the number of possible interpretations suddenly explodes because of all the different potential combinations of amounts transferred to the joint output.

Q4:
Right. "Your" number of interpretations is higher because your model introduces the concept of who controls what. Boltzmann's model only cares about the links between the inputs and the outputs. In my opinion, both are valid models. Actually, I've used your model while describing different interpretations for some txs generated by Samourai Wallet (see https://twitter.com/LaurentMT/status/1081624645764288512)
IMO, Boltzmann's model is more restrictive in the sense that it doesn't care if the same entity controls a specific input and a specific output. To say it differently, Boltzmann doesn't care about plausible deniability.

WRT Bell numbers, they are definitely a useful tool. I would also recommend checking Exponential Bell Polynomials, which are some mind-blowing stuff (https://en.wikipedia.org/wiki/Bell_polynomials#Combinatorial_meaning) and can be useful for some specific cases.
I've used them in Boltzmann to speed up the computation of coinjoins with denominated amounts (see https://github.com/Samourai-Wallet/boltzmann/blob/master/boltzmann/utils/tx_processor.py#L264)

You're absolutely right that the main goal of Boltzmann is to deal with ambiguous and coinjoin-like transactions. My first motivation for Boltzmann was to avoid overclustering on OXT.

My very personal conclusion about this subject of "privacy metrics":
I don't think that there's such a thing as a metric able to measure "Privacy". Trying to define an ideal "privacy metric" is a fool's errand. BUT (and it's a very important "but") it's absolutely possible to define privacy metrics which are useful for specific objectives, especially if you use them as negative indicators (i.e.: a high score doesn't provide any certainty about your privacy but a low score might be the sign of potential privacy issues).

For instance, Boltzmann served me well in order to avoid overclustering on OXT (with the consequence that OXT "suffers" from underclustering when compared to other analytics platforms, but I'm fine with this tradeoff because I think it's easier to deal with it).

Its recent introduction into Samourai Wallet (with a visual indicator of deterministic links) follows the same principle. It's used as a way to build awareness about the issue of deterministic links but it follows the principle that it should be used as a negative indicator (a high entropy doesn't give you any guarantee but a low entropy isn't a good sign for your privacy).

IMHO, the hardest part is all about the "education" of users. Almost all users expect tools providing a metrics telling them that their privacy is safe. The challenge for all of us is to change this mindset.

Thanks for those comments, helpful also, I have no response on them right now. As above ... I think I get it now (you are basically counting subset sum possibilities based on btc amounts), but I'm bothered that that isn't a real measure of how ambiguous/ "entropic" a transaction is.

@LaurentMT
Author

LaurentMT commented Mar 3, 2019

I'm still finding this slightly ambiguous. When you say "amounts of inputs and outputs" I think you mean btc amounts of inputs and outputs,

Correct.

Moreover, I understand that this ("Intrinsic Entropy") really just means "this transaction taken in isolation, not considering inferred information from elsewhere, such as who owns which input/output",

Correct

I don't understand why you came up with N=1,2,3 in examples 1,2,3 respectively. Or more precisely: I largely understand the reasoning, but it seems arbitrary, since other interpretations are possible.

I think the ambiguity comes from the fact that we use the term "interpretations" to describe 2 very different things. Boltzmann is focused on the question of "deterministic links" while the model you're discussing seems focused on the question "who controls what?".

In Boltzmann, an "interpretation" is a specific configuration of the links between the inputs and the outputs of the transaction. In the case of a coinjoin transaction, Boltzmann tells us that there are N such configurations and for each configuration, it tells us the links between the inputs and outputs. That gives us a "link probability matrix" (as in "there are X configurations for which input1 is linked to output2") which can be expressed as a probability of an existing link between an input and an output (100% = deterministic link).

Actually, the approach used for Boltzmann is really similar to the concept of Entropy used in thermodynamics, with a Boltzmann configuration being the equivalent of a microstate as defined in thermodynamics. From this, we derive the Entropy score by using the Gibbs Entropy formula, with the assumption that the probability of each configuration is the same (i.e. we don't have additional information allowing us to weight a given microstate with a higher probability).
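The link probability matrix described above can be sketched directly from an enumerated list of equiprobable configurations. The three configurations below are those of the basic coinjoin example from the gist (a minimal illustration; the representation as sets of labels is my own):

```python
from fractions import Fraction

# The three interpretations of the basic 2-in/4-out coinjoin example,
# written as lists of (input-group, output-group) pairs:
interps = [
    [({"I1"}, {"O1", "O2"}), ({"I2"}, {"O3", "O4"})],
    [({"I1"}, {"O2", "O3"}), ({"I2"}, {"O1", "O4"})],
    [({"I1", "I2"}, {"O1", "O2", "O3", "O4"})],
]

def link_probability(interps, txi, txo):
    """Fraction of equiprobable configurations in which input `txi`
    and output `txo` fall in the same group; 1 = deterministic link."""
    hits = sum(1 for itp in interps
               if any(txi in ig and txo in og for ig, og in itp))
    return Fraction(hits, len(interps))
```

Here I1 is linked to O2 (and I2 to O4) in all 3 configurations, so those links are deterministic, while I1 is linked to O1 in only 2 of the 3 configurations.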

Boltzmann doesn't try to evaluate if an input and an output are controlled by the same entity. The point isn't that this information isn't useful, it's just that Boltzmann wasn't designed for that goal. It's a matter of tradeoffs. As a consequence, Boltzmann is "useless" for Payjoin/P2EP transactions (and almost all transactions with payments made between the senders). So yes, we might say that this is an arbitrary choice but, as I wrote in my previous conclusion, I truly believe that any "privacy metric" is limited in scope. There's no such thing as a perfect or a better "privacy metric". IMHO, the best we can expect is multiple metrics evaluating specific aspects and/or used to improve specific aspects during coin selection.

but I'm bothered that that isn't a real measure of how ambiguous/ "entropic" a transaction is.

If my understanding is correct, in the model that you're describing, the term "interpretation" has a very different meaning. In this model, an interpretation is a specific configuration of "which entity controls which inputs/outputs".

Concerning the use of Bell numbers, it seems to make sense in the context of your goal for this metric (if my understanding is correct).

Just one question: according to your metric, what would be the number of interpretations for this transaction?

  1BTC             49.5BTC
            =>
 49BTC              0.5BTC

A few additional shower thoughts (in no specific order):

  • your model seems like an interesting approach which is complementary to Boltzmann. If we envision the bitcoin txs graph as a graph of interlinked utxos, your model has a focus on the ambiguity introduced by the nodes of the graph ("who controls which utxos?") while Boltzmann has a focus on the ambiguity introduced by the edges of the graph ("probability of a link between 2 utxos")
  • you could apply the same logic as Boltzmann for the derivation of a probability that a set of inputs and outputs is controlled by the same entity, and for the derivation of an "entropy score" at the level of the transaction,
  • IMHO, the most important questions that we need to answer when designing these kinds of privacy metrics are:
    • Scope: what are the scope and limitations of the model?
    • Utility: beyond the theoretical aspects, how can we use the metric for the detection of privacy issues or for the improvement of coin selection algorithms?
    • Scalability: how expensive is the computation of the metric? For instance, Boltzmann doesn't scale well. The actual implementation might be improved but it doesn't change the fact that at its core, Boltzmann is trying to solve an enumeration problem which is known to have "exponential" difficulty. That puts obvious limitations on the transactions that Boltzmann can process (note that this limitation is sometimes mitigated by address reuse which "helps" to weaken the complexity of past transactions).
    • Composability/Transitivity: is it easy to compose the results computed at the level of a single transaction for a faster computation at the level of the transaction graph? Boltzmann doesn't seem to perform very well on this axis. So far I've been unable to find a simple formula allowing the composition of the results found for 2 subsequent txs (but I'm not a mathematician).
    • Threat Model for the metric: how can the metric be defeated (e.g.: what happens if the adversary uses address clustering algorithms or knows which wallet software you're using?) and how do we deal with that (e.g.: the concepts of "Intrinsic" and "Actual" scores)?
    • Education: Obviously, we don't know what we don't know about the knowledge of the adversary. Thus, how do we educate users about the fact that these metrics should be used as negative indicators and that there's no such thing as a perfect metric measuring their privacy?

EDIT: Added a note about the scalability and composability of metrics.

@AdamISZ

AdamISZ commented Mar 4, 2019

If we envision the bitcoin txs graph as a graph of interlinked utxos, your model has a focus on the ambiguity introduced by the nodes of the graph ("who controls which utxos?") while Boltzmann has a focus on the ambiguity introduced by the edges of the graph ("probability of a link between 2 utxos")

This is a nice simple way to state it. "Link entropy" vs "Node entropy".

Actually, the approach used for Boltzmann is really similar to the concept of Entropy used in thermodynamics, with a Boltzmann configuration being the equivalent of a microstate as defined in thermodynamics. From this, we derive the Entropy score by using the Gibbs Entropy formula, with the assumption that the probability of each configuration is the same (i.e. we don't have additional information allowing us to weight a given microstate with a higher probability).

Sure, but I don't think this changes between the two models. What probably does change: with "node entropy" the probabilities of individual "microstates" are in practice wildly different, whereas in a coinjoin scenario (choosing between multiple subset sum linkages) that isn't the case, so the assumption of "all states equiprobable", while the simplest, would be a bit shitty for "node entropy".

Just one question: according to your metrics, what would be the number of interpretations for this transaction?

What I described above would imply 15 possibilities for any 2-in 2-out, no matter the btc amounts (but of course I haven't proposed anything sophisticated like reasons to discount specific possibilities, etc.).

Re: the various metrics, sure, that looks reasonable, but let me state my principal concern/issue with the "Link Entropy" approach:

Consider the case of a simple tree of transactions, each of which is a payment from one of the two outputs of its parent. So, every transaction is a bog-standard bitcoin payment. Since all of them would fit Example 1 here, all would have zero entropy. This means that a normal bitcoin transaction graph has zero entropy.

I understand your reasoning - that there is no ambiguity in linkages, and it's assumed that each transaction has a deterministic flow, but I fear that this is massively misleading to consumers of this metric.

Since there is, in fact, massive ambiguity in the actual flow of funds. Change heuristics are usually dubious, and at best they're still probabilistic. So even in the absolutely least private case (well - I ignore address reuse), it concerns me that an entropy measure of 0 will be misused and will help parties who want to sell snake-oil "tracing" to people.

Thanks again, this is an interesting topic and a lot clearer to me now.

@LaurentMT commented Mar 4, 2019

Sure, but I don't think this changes between the two models.

Agreed. It's just a matter of variables selected for the definition of a "microstate".

What probably does change: with "node entropy" the probabilities of individual "microstates" are in practice wildly different, whereas in a coinjoin scenario (choosing between multiple subset sum linkages) that isn't the case, so the assumption of "all states equiprobable", while the simplest, would be a bit shitty for "node entropy".

I tend to agree, and this is also why I believe that the distinction between intrinsic and actual entropies is important. The intrinsic version is a kind of ideal case assuming equiprobability. The computation of the actual entropy is equivalent to removing the equiprobability hypothesis. A corollary is that in Bitcoin, the intrinsic entropy is the maximum possible value for the entropy score. In the best-case scenario, this value will stay the same as time passes, but it will often decrease as new information is collected, allowing us to associate a null probability to some microstates.

I understand your reasoning - that there is no ambiguity in linkages, and it's assumed that each transaction has a deterministic flow, but I fear that this is massively misleading to consumers of this metric.

IMO, it depends on the purpose of the tool. As you noted, if you're an analytics company it may be misleading but if the goal is to alert users about potential privacy issues, displaying the "worst case scenario" seems to me like a good conservative approach.

@nopara73 commented Mar 12, 2020

On Entropy

Basing it on Shannon entropy is a creative intuition, but there is no need to resort to any intuition at all: in 2017, Maurer et al., in the paper titled Anonymous CoinJoin Transactions with Arbitrary Values, worked out its mathematical basis.

Here the entropy depends solely on the number of valid non-complete sub-mappings (N), but notice that what you actually want to examine is the strength of the input-input, input-output and output-output relationships. With these, weighted by the values, you can come up with a measure of entropy that does not rely on intuition. Also, the work built on top of this entropy can mostly be ported to the new entropy metric, so it's not like everything is lost by replacing the intuition axiom with a mathematical one.

You may be wondering how the strength of the input-input, input-output and output-output links is determined: take all the valid sub-mappings and count them. In your notation this would be N + 1 (as @AdamISZ noted, the possibility that the tx is not a CoinJoin shall not be ruled out either, at least not by default). Let's call it |M| = N + 1. Next, select a link and count the valid sub-mappings in which both ends of the link fall in the same partition; call this number |L|. Finally, you get the probability of the link by dividing them: |L|/|M|. Do this for all the links, and from there deriving your axiomatic entropy metric is straightforward.
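For small transactions this counting can be brute-forced. Below is a toy sketch (mine, not Boltzmann's or Maurer et al.'s actual code, and exponentially slow beyond a handful of inputs/outputs): a valid sub-mapping is modeled as a partition of the tx into sub-transactions whose input and output sums balance, and |L|/|M| is computed for a chosen link.

```python
from fractions import Fraction

def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                       # `first` in its own block
        for i in range(len(part)):                   # or joining an existing block
            yield part[:i] + [part[i] + [first]] + part[i + 1:]

def valid_mappings(inputs, outputs):
    """Partitions of the tx into sub-transactions whose sums balance."""
    items = [("in", i, v) for i, v in enumerate(inputs)] + \
            [("out", i, v) for i, v in enumerate(outputs)]
    for part in set_partitions(items):
        if all(sum(v for k, _, v in b if k == "in") ==
               sum(v for k, _, v in b if k == "out") > 0 for b in part):
            yield part

def link_probability(inputs, outputs, a, b):
    """|L| / |M|: share of valid mappings placing items a and b together."""
    maps = list(valid_mappings(inputs, outputs))
    together = [p for p in maps
                if any({a, b} <= {(k, i) for k, i, _ in blk} for blk in p)]
    return Fraction(len(together), len(maps))

# tx: inputs (1, 2) -> outputs (1, 1, 1)
print(link_probability([1, 2], [1, 1, 1], ("in", 0), ("in", 1)))   # 1/4
print(link_probability([1, 2], [1, 1, 1], ("in", 1), ("out", 0)))  # 3/4
```

On this 1, 2 -> 1, 1, 1 transaction the enumerator finds 4 valid mappings (the full join plus three splits), which yields the 0.25/0.5/0.75 link probabilities.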

Feasibility Concern

Unless you have optimized the estimation of N since, this is not computationally more expensive, as you are already relying on identifying the sub-mappings themselves, just like Maurer et al.

Math Heavy Explanation

(figure: the Knapsack paper's formal derivation, not reproduced here)

User Friendly Explanation

You can find a more user friendly introduction and some suggestions on how to build other metrics out of it here.

Example

Take tx1 1, 1, 100 -> 1, 1, 100 and tx2 100, 100 -> 100, 100. Shannon entropy gives the same result for both transactions, while an entropy built on top of the Knapsack paper gives a more accurate score, because Knapsack factors in that the 100 -> 100 partition of tx1 can be found in every valid sub-mapping, while in Shannon you're only concerned with the number of sub-mappings.

@LaurentMT

as in 2017, Maurer et al. in the paper titled Anonymous CoinJoin Transactions with Arbitrary Values worked out the mathematical basis of it.

It's been a while since I've read this paper, but my recollection is that it was indeed an interesting paper. For at least 2 reasons:

  • The approach is very similar to the one used by Boltzmann and the paper provides a description far better than these gists (academic quality),
  • Their model extends the analysis made by Boltzmann (links between inputs and outputs) to links between inputs and links between outputs.

Basing it on Shannon entropy is a creative intuition

Actually, the use of the concept of Entropy wasn't based on an intuition but on the certainty that it was the right tool for the task that motivated the design of the algo.

TL/DR:

Sometimes, in statistical physics or in thermodynamics, entropy is interpreted as a measure of uncertainty (as in "a measure of our ignorance about the system"). For instance, we can observe a macrostate and we know that this macrostate may correspond to N different microstates. We aren't able to know the exact microstate, but at least we can evaluate our lack of knowledge, measured as the entropy.

The specific form of the formula used to compute the entropy depends on whether the microstates have:

  • an equal probability => formula proposed by Boltzmann (the scientist... not the algo),
  • unequal probabilities => generalized form proposed by Gibbs.

In the case of bitcoin transactions, the algo was initially designed from the angle of blockchain analysis. Thus, having (among other things) a metric that measures our lack of knowledge about the right interpretation was a win ("Knowledge of our lack of knowledge is still useful knowledge").

In a nutshell we have these correspondences:

  • macrostate => the transaction that we can observe on the blockchain,
  • microstates => all potential interpretations of the transaction,
  • Boltzmann's (the scientist) entropy => Boltzmann's (the algo) entropy

In detail, the algo enumerates all valid interpretations. From this, it then derives:

  • the entropy,
  • the link probabilities matrix (probabilities of links between inputs and outputs)

At this point, you may wonder why we use the Boltzmann form and not the Gibbs form to compute the entropy.

Well, the reason is pretty simple. When we analyze a transaction in isolation (without additional info that we may collect from the blockchain or by other means), all interpretations are equiprobable. Hence the choice of this form. This is what I called the "Intrinsic Entropy" in this gist.

But here's the most interesting part, IMO. If we can collect additional info (e.g. 2 inputs are controlled by the same entity), then we can rule out some of the interpretations, which is roughly equivalent to setting a null probability for these interpretations in the Gibbs form while considering that the other interpretations have the same probability. This is what I called the "Actual Entropy" in this gist.
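A minimal numeric illustration of that intrinsic/actual distinction (the interpretation counts 8 and 3 are just hypothetical figures): ruling out interpretations means zeroing their probability in the Gibbs form while the survivors stay equiprobable.

```python
import math

def entropy_bits(probs):
    # Gibbs form: H = -sum(p * log2(p)); null-probability states drop out
    return -sum(p * math.log2(p) for p in probs if p > 0)

# intrinsic entropy: say a tx admits 8 interpretations, all equiprobable
print(entropy_bits([1 / 8] * 8))            # 3.0

# actual entropy: outside info rules out 5 of them (null probability),
# the 3 surviving interpretations stay equiprobable
print(entropy_bits([1 / 3] * 3 + [0] * 5))  # ~1.585
```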

Under the hood, this is how the Boltzmann algo is used by OXT. At first, OXT computes the "intrinsic" entropy of the tx, but if new information is gathered later from new transactions, OXT recomputes the entropy with this new information. The result is what OXT displays.

Take tx1 1, 1, 100 -> 1, 1, 100 and tx2 100, 100 -> 100, 100. Shannon entropy gives the same result for both transactions, while an entropy built on top of the Knapsack paper has more accurate score, because in Knapsack you must factors in that the 100 -> 100 partition of tx1 can be found in every valid sub-mapping, while in Shannon you're only concerned with the number of sub-mappings.

I'm not sure I understand what you mean by Shannon's entropy in this context. All I can say for sure is that Boltzmann (with the hypothesis of equiprobability of the interpretations) returns 2 different results:

  • tx1: Entropy = 3 (i.e. 8 interpretations)
  • tx2: Entropy = 1.584963 (i.e. 3 interpretations)
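Those counts can be double-checked by brute force (a toy enumerator over set partitions; the real Boltzmann implementation is smarter about pruning than this sketch):

```python
import math

def set_partitions(items):
    # yield every partition of `items` into non-empty blocks
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part                      # `first` alone in a block
        for i in range(len(part)):                  # or joining an existing block
            yield part[:i] + [part[i] + [first]] + part[i + 1:]

def count_interpretations(inputs, outputs):
    # an interpretation = a partition of the tx into sub-transactions
    # where each block's input sum equals its (non-zero) output sum
    items = [("in", v) for v in inputs] + [("out", v) for v in outputs]
    return sum(
        1 for part in set_partitions(items)
        if all(sum(v for k, v in b if k == "in") ==
               sum(v for k, v in b if k == "out") > 0 for b in part)
    )

for ins, outs in [([1, 1, 100], [1, 1, 100]), ([100, 100], [100, 100])]:
    n = count_interpretations(ins, outs)
    print(n, round(math.log2(n), 6))  # 8 3.0, then 3 1.584963
```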

@nopara73
Copy link

Wrong example, and I was probably wrong about the Shannon naming of the entropy too. AFAIK Shannon's intuition was to use the logarithm, and it ended up working in practice, but as I try to look it up I can see there's much more to this; anyway, I won't spend much time on it, as it's less relevant to the conversation. Rather, I'd get back to my tailor-made example that turned out wrong: I can't help wondering whether, if it's not that simple to construct such an example, maybe maximizing the number of sub-mappings could give roughly the same result in practice as the Knapsack way of deriving probabilities between i/o, i/i and o/o pairs and computing the entropy from those.

Anyway, as a side note, the Knapsack paper's goal was to create unequal-output mixing, and only as a side effect did they have to work out a mathematical model of how to score mixes. IMO this spinoff research is more applicable than the original goal itself, as the original goal suffers from a lack of further research:

  • How to do it trustlessly?
  • Is it really more efficient than equal amount mixes across transaction chains?

Regarding the trustlessness aspect, I'm not sure it's even possible, as mixing networks rely on standard package sizes in one way or another.
Regarding always creating unequal amounts to maximize Knapsack entropy in a single transaction: it makes sense, but does it still make sense when we extend it to transaction chains and compare it with standard-amount mixes? Remixes are frequent, and creating standard output amounts facilitates mixes on the inputs of the remixes. Thus it may very well be the case that the Knapsack entropy of a tx chain would ultimately be greater with standard amounts than with the Knapsack-entropy-maximizing method applied to single transactions.

Side note aside, what would be the Knapsack entropy anyway?

With Knapsack we can calculate the probabilities of links, but from there we have to get to entropy somehow.

  1. Maybe the most obvious improvement would be that those links would be weighted by amounts, as a 30% probable link between 1BTC and 100BTC isn't equivalent to a 30% link between 1BTC and 2BTC.
  2. Then maybe we could normalize the total volume of a mix to 1 (instead of, say, 47.5BTC), so we can compare different mixes regardless of the volume.
  3. Finally add them together?

Example. So for tx 1, 2 -> 1, 1, 1 we would get the following:

Sub mappings:
1,2 -> 1,1,1
1 -> 1 | 2 -> 1,1
1 -> 1 | 2 -> 1,1
1 -> 1 | 2 -> 1,1

Input match probabilities:
1 - inputs: 2(0.25) | outputs: 1(0.5) 1(0.5) 1(0.5)
2 - inputs: 1(0.25) | outputs: 1(0.75) 1(0.75) 1(0.75)

Output match probabilities:
1 - outputs: 1(0.5) 1(0.5) | inputs: 1(0.5) 2(0.75)
1 - outputs: 1(0.5) 1(0.5) | inputs: 1(0.5) 2(0.75)
1 - outputs: 1(0.5) 1(0.5) | inputs: 1(0.5) 2(0.75)

This is how far Knapsack goes. The next step could be to normalize the amounts based on the total volume, for comparisons between transactions; here the volume is 3BTC, so the amounts become 1/3, 2/3 -> 1/3, 1/3, 1/3.

And then maybe add the amounts together and multiply them by the link probabilities?

  • 1BTC input links to other inputs and outputs: (1/3 + 2/3) * 0.25 + (1/3 + 1/3) * 0.5 + (1/3 + 1/3) * 0.5 + (1/3 + 1/3) * 0.5
  • 2BTC input links to other inputs and outputs: (2/3 + 1/3) * 0.25 + (2/3 + 1/3) * 0.75 + (2/3 + 1/3) * 0.75 + (2/3 + 1/3) * 0.75
  • 3 times 1BTC output links to other outputs: 3 * ((1/3 + 1/3) * 0.5 + (1/3 + 1/3) * 0.5)

The result of this would be 5.75 = (1/3 + 2/3) * 0.25 + (1/3 + 1/3) * 0.5 + (1/3 + 1/3) * 0.5 + (1/3 + 1/3) * 0.5 + (2/3 + 1/3) * 0.25 + (2/3 + 1/3) * 0.75 + (2/3 + 1/3) * 0.75 + (2/3 + 1/3) * 0.75 + 3 * ((1/3 + 1/3) * 0.5 + (1/3 + 1/3) * 0.5).
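That sum can be transcribed with exact fractions (this is just an arithmetic check of the hypothetical weighting proposed above, not a published formula):

```python
from fractions import Fraction as F

# amounts normalized by the 3 BTC total volume: inputs 1/3 and 2/3,
# outputs 1/3 each; link probabilities as counted from the sub-mappings
in1, in2, out = F(1, 3), F(2, 3), F(1, 3)

total = (
    (in1 + in2) * F(1, 4)            # 1 BTC input <-> 2 BTC input
    + 3 * (in1 + out) * F(1, 2)      # 1 BTC input <-> each 1 BTC output
    + (in2 + in1) * F(1, 4)          # 2 BTC input <-> 1 BTC input
    + 3 * (in2 + out) * F(3, 4)      # 2 BTC input <-> each 1 BTC output
    + 3 * 2 * (out + out) * F(1, 2)  # each output <-> the other two outputs
)
print(float(total))  # 5.75
```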

And the Boltzmann (not Shannon, I hope I name it properly now) entropy for 3 interpretations would be 1.58.

I'm far from sure my calculations are correct, but they communicate the idea. Is this how an entropy based on link probabilities should be calculated? Or is there a better way to do it?
