TLSTweet-manifesto.md

This document is a summary of the line of reasoning leading to an idea I had about secret messaging which is referred to here as 'TLSTweet' (not sure about the name..).

What is "self-evident" about modern crypto.

There are several "truisms" in modern day cryptology (= cryptography + cryptanalysis). One of them is "don't roll your own crypto", which is meant to convey the extraordinary difficulty in creating ciphers and cryptosystems that are genuinely resistant to cryptanalysis, and the tremendous ease with which it's possible for the non-specialist, or even the specialist, to convince themselves that the cipher or system they've created is resistant to cryptanalysis, when it isn't.

Another, related but distinct, actually has a name - Kerckhoffs's principle - which briefly stated means that not only should you not hide the algorithm, but you should endeavour to make it as publicly known as possible, because genuine security comes from the secrecy of the key used in the cipher, while publishing the algorithm greatly helps in ensuring its robustness against attacks. Kerckhoffs's principle is the obverse side of the other truism, "security through obscurity is bad". One can argue about this, and indeed, I intend to.

In light of Kerckhoffs's principle, steganography - which is a particular kind of cryptography that involves hiding information by making it look like other, non-hidden information - looks like a bad idea. It relies on security through obscurity - not obscurity of the algorithm (several algorithms for steganography are widely known), but secrecy of the message's existence.

The other objection to steganography is much more mundane - its effectiveness tends to require a large amount of 'cover text' (or ciphertext) for each bit or byte of plaintext message. Real world steganography today is heavily hampered by this. More on this later.

So, this is the argument against. What is the argument for?

Asymmetric warfare and steganography

The basic arguments outlined in the previous section all made sense to me, but something about them as a whole didn't smell right. Then I realised - it's very analogous to the concept of 'asymmetric warfare' as it's currently understood. One side has a huge arsenal of weaponry and "plays by the rules". The other side, possessing far fewer resources, must perforce use somewhat underhanded methods. The side with the big guns is, in this context, governments and very large corporations. From their point of view, Kerckhoffs's principle makes sense because while they want to keep secrets, they are quite OK with the world knowing that they are keeping secrets. They don't fear existential attack, they fear loss - loss of secrecy. The smaller side, the hacker or activist, has a different fear - of either metaphorical or literal extermination. In the worst cases it can be solitary confinement, torture or execution.

For the activist, especially in a repressive regime, the primary goal is not really to keep secret the communications he has with his fellow activist, but rather to prevent knowledge that any such communication took place. So, the anonymity set suddenly becomes overwhelmingly important. Rather like the terrorist or freedom fighter blending in with a crowd of civilians, an activist not only has to "hide his gun" (encrypt his communications) but also do it in a way that doesn't arouse suspicion. The government or army has no such concern; they hide themselves behind armour plating, but don't care who sees the armour because in public they have an asserted right to use such armour and weaponry.

So, speaking in the most general terms, for the powerless, encryption (even end to end) is not enough. They need steganography.

Problems: hiding and bit-rate

I have tried to justify why the anonymity set is critically important, and steganography has the virtue that it can quite easily make that problem go away. By using common media - family photos, tweets, Facebook messages etc. - you can mix your communications in with a big crowd. But there are other serious problems to solve in using steganography. Consider one of the most well known approaches to steganography: hiding text in images. This is a well studied field (although not by me, I have to admit!). You embed a message in some of the low-order bits of a photo. To the naked eye there may be no difference at all. But the problem is that you have changed the statistical properties of the data, even if only slightly. This is exactly what vast data farms in Utah are built for ... such anomalies can be discovered in bulk data sets and, without any attempt to decrypt, flags can be set, and anonymity lost. The armour plated tanks then roll in.

The other problem is in some sense just the other aspect of the same point: you may only set one bit per byte or one bit per 10 bytes in the data, leaving you with a very small amount of communicated message per cat.jpg. The more data you pack into a jpg or mp3, the more you alter its statistical properties (at least over a large set); your ciphertext might be truly random (say you encrypt the message before embedding it, then it certainly should be), but the problem remains: pure randomness is not what photos are made out of, so the more you try to embed into the photo, the worse it gets.
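To make the trade-off concrete, here is a minimal sketch of least-significant-bit embedding, assuming one payload bit per cover byte. The names `embed_lsb` and `extract_lsb` and the flat 1024-byte "image" are made up for illustration and are not taken from any real steganography tool.

```python
import os

def embed_lsb(cover: bytes, payload: bytes) -> bytes:
    """Overwrite the lowest bit of each cover byte with one payload bit."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("payload too large for this cover")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)

def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back out of the lowest bits, most significant bit first."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for sample in stego[8 * i: 8 * i + 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)

# Hypothetical cover: 1024 identical samples, i.e. a very regular "image".
cover = bytes([200] * 1024)
secret = os.urandom(64)           # stand-in for an already encrypted message
stego = embed_lsb(cover, secret)

capacity_bytes = len(cover) // 8  # one bit per cover byte
ones_before = sum(b & 1 for b in cover)
ones_after = sum(b & 1 for b in stego)
print(capacity_bytes, ones_before, ones_after)
```

The cover holds only one eighth of its own size, and the count of set low bits jumps from zero to roughly half of the bits written. Real photos are far less regular than this toy cover, but the same kind of shift is what a statistical detector looks for.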

I'm sure it's possible to get very clever with this problem. But lives are at stake here. Is there a way to remove the danger entirely? It would be nice to also be able to embed big messages in files (i.e. get a "high bitrate"), but I'm going to temporarily abandon that as a lesser goal and ask, can we at least get certainty that the file is indistinguishable from a normal one?

TLSTweet

My answer is - yes, and what I am calling TLSTweet is an example. In any given second there are millions of TLS connections between users and websites happening around the world. If you know even a little bit about web technology, you'll probably know that TLS needs to use secret keys to set up secure connections. What you might not know is that other random numbers are used to initiate connections and to transfer data. The crucial point is that these random numbers (client random, server random and IV (and nonces in some cases) for those in the know) are published directly over the wire, in 'plaintext' (although that's a confusing way to say it since IVs are sort-of part of the ciphertext). This is a requirement of the TLS protocol (although the details vary in different versions), and the RFCs specify that these numbers come from a secure random number generator. For example, in TLS 1.2 the first 4 bytes of each 32-byte handshake random are defined as a timestamp, leaving 28 random bytes, while in TLS 1.3 all 32 bytes are random. All clients (i.e. browsers and similar) and all servers do this.
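As a rough illustration of what these handshake randoms look like, the sketch below builds a 32-byte value in the two layouts defined by the TLS 1.2 and TLS 1.3 RFCs. The function names are hypothetical, and this only constructs the field; it is not a TLS implementation.

```python
import os
import struct
import time

def tls12_style_random() -> bytes:
    """RFC 5246 layout: 4-byte gmt_unix_time followed by 28 random bytes."""
    timestamp = int(time.time()) & 0xFFFFFFFF
    return struct.pack("!I", timestamp) + os.urandom(28)

def tls13_style_random() -> bytes:
    """RFC 8446 layout: 32 bytes from a secure random number generator."""
    return os.urandom(32)

for label, value in (("TLS 1.2", tls12_style_random()),
                     ("TLS 1.3", tls13_style_random())):
    print(label, len(value), value.hex())
```

Under these layouts a hidden message would have 28 or 32 bytes to play with per handshake random, before counting any IVs or nonces sent later in the connection.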

So the idea is simple - Alice runs a TLSTweet enabled client which simply encrypts Alice's message to Bob's public key. This ciphertext needs to be indistinguishable from random bytes; a well-chosen public key cryptosystem can provide that, although the encoding matters (an RSA ciphertext, or an elliptic curve point in its usual encoding, is not a uniformly random byte string, so the scheme has to be picked with this in mind). It can therefore take the place of the data fields mentioned above (client random and server random in handshake, IV in ciphertext etc.). Alice's entire TLS connection (and browsing session on the website) would then be cryptographically guaranteed to not leak any information suggesting that she was sending a secret message. Steganography.
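A minimal sketch of the substitution step, assuming each handshake random offers a 32-byte slot and that the ciphertext is already indistinguishable from random. The names `to_slots` and `from_slots` are hypothetical, and `os.urandom` stands in for the real encrypted message. How Bob learns the true length is left out here; it would have to travel inside the encrypted payload.

```python
import os
from typing import List

SLOT_SIZE = 32  # assumed usable bytes per handshake random

def to_slots(ciphertext: bytes, slot_size: int = SLOT_SIZE) -> List[bytes]:
    """Cut the ciphertext into slot-sized pieces, padding the last with random bytes."""
    slots = []
    for start in range(0, len(ciphertext), slot_size):
        chunk = ciphertext[start:start + slot_size]
        chunk += os.urandom(slot_size - len(chunk))  # random padding, not zeros
        slots.append(chunk)
    return slots

def from_slots(slots: List[bytes], length: int) -> bytes:
    """Reassemble the ciphertext and drop the random padding."""
    return b"".join(slots)[:length]

ciphertext = os.urandom(80)   # stand-in for Alice's encrypted message
slots = to_slots(ciphertext)  # one slot per TLS connection Alice opens
print(len(slots), [len(s) for s in slots])
```

Padding with random bytes rather than zeros matters: a run of zero bytes in a field that is supposed to be random would be exactly the kind of fingerprint the scheme is trying to avoid.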

We will come on to all the practical (and fundamental!) problems with this idea in a minute, but first, appreciate the value here: as mentioned above, a principal problem with existing steganography is the possibility of statistically identifying secret messages because they don't have the same "fingerprint" as ordinary versions (of cat.jpg for example). This problem is entirely removed. What's more, the anonymity set could be (in some, for now slightly unrealistic, version of TLSTweet) absolutely vast - anyone browsing TLS enabled websites, which nowadays is basically all of them.

The problem of the server.

This idea sounds nice until you notice we haven't discussed the receiving of messages, only the sending. One slightly weird implementation works fine: suppose Bob is the (web-)server. He will be running a TLSTweet daemon which monitors the randoms as they come in and passes on those messages which decrypt successfully (or he can do it later, after hoovering up net traffic). He can embed his (encrypted) responses in the messages sent back from the webserver to the client. This is pretty much a perfectly undetectable system for messaging between two actors.
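The daemon's filtering step might look roughly like the sketch below. To stay within the standard library it swaps in a toy shared-key scheme (a SHA-256 keystream plus a truncated HMAC tag) in place of the public key encryption described above; that scheme, the 8/16/8 byte split and the names `seal` and `try_open` are all assumptions for illustration, not a vetted construction.

```python
import hashlib
import hmac
import os
from typing import Optional

NONCE_LEN, BODY_LEN, TAG_LEN = 8, 16, 8  # hypothetical split of one 32-byte random
SLOT_LEN = NONCE_LEN + BODY_LEN + TAG_LEN

def _keystream(key: bytes, nonce: bytes) -> bytes:
    return hashlib.sha256(b"stream" + key + nonce).digest()[:BODY_LEN]

def _tag(key: bytes, nonce: bytes, body: bytes) -> bytes:
    return hmac.new(key, b"tag" + nonce + body, hashlib.sha256).digest()[:TAG_LEN]

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def seal(key: bytes, message: bytes) -> bytes:
    """Pack a short message into one 32-byte slot (toy scheme, see lead-in)."""
    if len(message) > BODY_LEN:
        raise ValueError("message does not fit in one slot")
    nonce = os.urandom(NONCE_LEN)
    body = _xor(message.ljust(BODY_LEN, b"\x00"), _keystream(key, nonce))
    return nonce + body + _tag(key, nonce, body)

def try_open(key: bytes, slot: bytes) -> Optional[bytes]:
    """Return the message if the slot authenticates under key, else None."""
    if len(slot) != SLOT_LEN:
        return None
    nonce = slot[:NONCE_LEN]
    body = slot[NONCE_LEN:NONCE_LEN + BODY_LEN]
    tag = slot[NONCE_LEN + BODY_LEN:]
    if not hmac.compare_digest(tag, _tag(key, nonce, body)):
        return None  # an ordinary random from an ordinary client
    return _xor(body, _keystream(key, nonce)).rstrip(b"\x00")

key = os.urandom(32)
observed = [os.urandom(32), seal(key, b"meet at noon"), os.urandom(32)]
found = [m for m in (try_open(key, r) for r in observed) if m is not None]
print(found)
```

The point is only the shape of the loop: every incoming random gets a trial decryption, almost all fail silently, and the few that authenticate are passed on. Note how much of the 32 bytes goes on the nonce and tag rather than on the message.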

More realistically, not everyone can run their own webserver and do this. What we want is for the set of clients to be able to communicate with each other, but without doing anything out of the ordinary that would trigger the spyware which is currently "collecting it all" across the entire 'net.

The fact that TLS is fundamentally a client-server design weakens the pure vision of this. It is easy (and I'm toying with doing it myself) to set up a website and server that simply matches requests to pubkeys in some temporary database and forwards the messages from one client (Alice) to another (Bob) in the same manner, again using the randoms in the server->client portion of the TLS connection. To the outside world, this could remain perfectly invisible, although some crypto shenanigans are required to avoid repeating the same random numbers. But now we also have to trust the server not to leak the existence of the pubkeys, how often they were used, and what IP addresses they were associated with (Tor can deal with the last of these of course, but that violates the spirit of what we're trying to achieve - no suspicious behaviour).

There are ideas for how to improve this scenario, and it may be worthwhile even given its limitations, since it undermines the 'collect it all' idea. "All" will still be collected, but crucial metadata will be lost.

Warts and all, let's examine the webserver scenario further: if there is only one such server, 'tlstweet.com' say, then the point is almost entirely lost, since using it becomes suspicious in the same way as a PGP email. I emphasize 'almost' because that isn't true if, for some reason, a bunch of people (it would have to be a lot!) are known to use it anyway, for non-secret-messaging purposes. Failing that, the single-server case is close to useless - not only do you have to trust the server with your metadata, but you don't get a proper anonymity set.

The scenario is somewhat rosier if large, known servers choose to perform this function. In the extreme, if Google allowed this functionality, the anonymity set problem would be entirely resolved, but (a) such a thing would never happen and (b) large corporations are not really who you want to trust with these pubkeys and IP linkages. I think the realistic case is somewhere in the middle; a smallish number of smallish website servers, probably changing all the time, offering this kind of service. Large security agencies wouldn't be able to keep up with it because finding such servers would often be difficult. All the same, in this basic model the idea is somewhat flawed and limited.

The bit-rate problem remains, but maybe isn't so bad

Earlier we discussed how difficult it is to embed large amounts of data using steganography. This is certainly the case here; although I haven't yet done the calculations, the size of a message that can be transferred from Alice to Bob in each TLS request made (i.e. each page or resource loaded) might be as little as 16-50 bytes. Hence the silly name "TLSTweet" (it doesn't mean tweeting in the sense of broadcasting messages, so I should probably change the name). So having a long, detailed conversation might be problematic (too many server requests are suspicious). But on the other hand, a discussion at about the same level as is currently seen on Twitter, between two participants or perhaps more, might be workable. That's good enough to be useful, hopefully.
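As a back-of-envelope check on those numbers, the sketch below counts how many requests a tweet-sized message would need at a few per-request capacities. The 140-byte message size and the helper name `requests_needed` are assumptions for illustration, and the overhead of the encryption itself (keys, tags, padding) is ignored.

```python
import math

def requests_needed(message_bytes: int, bytes_per_request: int) -> int:
    """Number of TLS requests needed to carry the message, rounding up."""
    return math.ceil(message_bytes / bytes_per_request)

MESSAGE_BYTES = 140  # assumed tweet-sized message

for capacity in (16, 32, 50):
    print(capacity, "bytes/request ->",
          requests_needed(MESSAGE_BYTES, capacity), "requests")
```

Somewhere between three and nine page or resource loads per short message is well within what an ordinary browsing session produces, which is why a Twitter-like level of conversation looks plausible while anything much longer does not.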

In order to provide cover, I'd envisage the TLSTweet client software to automatically load various pages on a website in a pattern that's not out of the ordinary in any way. For example, loading discussion threads on a forum. One might imagine improving the throughput rate by using servers delivering large amounts of content, e.g. video streaming, but don't forget that it's asymmetric - only the server delivers lots of TLS data, not the client that only makes requests, and so it wouldn't really bump up the secret message content throughput.

Final thoughts

This is an incomplete set of ideas. The core concept and philosophical idea behind it is I think quite clear, but questions remain, for example:

This focuses on TLS, but TLS is not the only public, widely used network protocol that requires random numbers.
The problem of server-possesses-metadata seems insurmountable, if we insist that ordinary web users want to communicate. Is there any way round this?

AdamISZ/TLSTweet-manifesto.md