Tahoe LAFS

Tahoe LAFS is a distributed file system with an interesting permissions model (whitepaper). Both immutable and mutable files are supported; mutable files are the more complex and interesting case. There are three levels of permission: Write, Read, and Verify. Each permission is granted by giving a user a special key called a "capability". With the Write capability you can update the file, and with the Read capability you can retrieve the plaintext. With only the Verify capability you can validate the file's integrity but not read its contents.

The lower-level capabilities are generated deterministically from the higher-level capabilities. So someone who has the Write capability can generate the Read capability and give it to someone else, who will then be able to read the plaintext of that file.

The Tahoe LAFS paper describes two methods of implementing this three-layer capability model; I'll discuss only the first one.

Methods

Here are some methods I'll use to describe the system. The specific cryptographic algorithms used for each are given in the paper.

// hash a text (or binary blob)

   sum = hash(text)

// encrypt a text/blob

   ciphertext = encrypt(key, plaintext)

// sign a text/blob.

   signature = sign(private_key, text)

// generate a random salt

   salt = random()

// generate the public key to a given private key

   public_key = public(private_key)

// verify that a blob is signed

   signed = verify(sig, public_key, blob)

// upload a file + metadata

   upload(id, tuple)

// decrypt a ciphertext (the inverse of encrypt)

   plaintext = decrypt(key, ciphertext)

// download a file + metadata

   tuple = download(id)

Write capability

Each file is associated with a private key (called the "signing key" in the paper), and each update to that file must be signed with that private key. Note that the private key represents write access to the file, not a user. The private key is encrypted and stored with the file metadata.

Given a public-private key pair, the various capabilities are generated like this:

write_cap = hash(private_key)

verify_cap = hash(public_key)

//the read cap consists of hash(write_cap) and the verify_cap

read_cap = {hash(write_cap), verify_cap}
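
The derivation above can be sketched in Python. This is only a minimal illustration, not the paper's construction: SHA-256 with a tag prefix stands in for the paper's hash functions, random bytes stand in for a real signature key pair, and the names `h` and `derive_caps` are made up for this example.

```python
import hashlib
import secrets

def h(tag: bytes, data: bytes) -> bytes:
    """Tagged SHA-256 (an assumption), so each derivation step is distinct."""
    return hashlib.sha256(tag + b":" + data).digest()

def derive_caps(private_key: bytes, public_key: bytes) -> dict:
    """Derive the three capabilities from a key pair, top down."""
    write_cap = h(b"write", private_key)
    verify_cap = h(b"verify", public_key)
    read_key = h(b"read", write_cap)
    return {
        "write": write_cap,
        "read": (read_key, verify_cap),
        "verify": verify_cap,
    }

# Stand-in key bytes; a real system would use an actual signature key pair.
private_key = secrets.token_bytes(32)
public_key = secrets.token_bytes(32)
caps = derive_caps(private_key, public_key)

# A write-cap holder can recompute the read key from what they already hold.
print(h(b"read", caps["write"]) == caps["read"][0])
```

Because every step is a one-way hash, going down the chain is cheap, while going from a read cap back to the write cap would require inverting the hash.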

Writing

To write a new file, the writer generates a key pair, encrypts the file contents and the private key, signs the encrypted file + metadata, and then uploads everything.

{public_key, private_key} = generate_pair()

write_cap = hash(private_key)

verify_cap = hash(public_key)

read_cap = {hash(write_cap), verify_cap}

salt = random()

cipherkey = encrypt(hash(write_cap + salt), private_key)

ciphertext = encrypt(hash(read_cap + salt), plaintext)

sig = sign(private_key, hash(ciphertext + salt))

cipher_public_key = encrypt(hash(verify_cap + salt), public_key)

upload(file_id, {cipherkey, ciphertext, salt, sig, cipher_public_key})

The purpose of encrypting with hash(read_cap + salt) is that each version of the file is encrypted with a different key, which protects against certain attacks. (This isn't mentioned specifically in the paper; I'm reading between the lines.)
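
To make the per-version key concrete, here is a small Python sketch. It assumes a toy cipher (XOR with a SHA-256 counter keystream) purely for illustration; it is not the cipher from the paper, and `xor_stream` and `encrypt_version` are hypothetical names. With a cipher of this kind, reusing one key for two versions would let an observer XOR the two ciphertexts and learn the XOR of the plaintexts, which is one plausible attack a fresh salt avoids.

```python
import hashlib
import secrets

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 counter keystream (encrypts and decrypts)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = h(key + (offset // 32).to_bytes(8, "big"))
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)

read_cap = secrets.token_bytes(32)  # stand-in for the real read capability

def encrypt_version(plaintext: bytes):
    """Encrypt one version of the file under a key derived from a fresh salt."""
    salt = secrets.token_bytes(16)
    return salt, xor_stream(h(read_cap + salt), plaintext)

salt1, c1 = encrypt_version(b"same contents")
salt2, c2 = encrypt_version(b"same contents")

# The two versions use different keys, so equal plaintexts do not
# produce equal ciphertexts.
print(h(read_cap + salt1) != h(read_cap + salt2), c1 != c2)

# A reader recomputes the per-version key from read_cap and the stored salt.
print(xor_stream(h(read_cap + salt1), c1))
```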

In the paper, the ciphertext is hashed with a Merkle tree, so that it is possible to spread the file across multiple servers. However, it's not necessary to consider this in order to understand the capability/permissions model.
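
For readers who haven't met Merkle trees, here is a generic sketch of computing a root hash over ciphertext chunks. It is not Tahoe's actual tree layout: the chunk size, the "leaf:"/"node:" tags, and the rule for odd nodes are all assumptions made for this example.

```python
import hashlib
from typing import List

def merkle_root(chunks: List[bytes]) -> bytes:
    """Hash chunks pairwise, level by level, until one root hash remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    level = [hashlib.sha256(b"leaf:" + c).digest() for c in chunks]
    while len(level) > 1:
        paired = [
            hashlib.sha256(b"node:" + level[i] + level[i + 1]).digest()
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            paired.append(level[-1])  # an odd node is carried up unchanged
        level = paired
    return level[0]

ciphertext = b"pretend this is an encrypted file"
chunks = [ciphertext[i:i + 8] for i in range(0, len(ciphertext), 8)]
print(merkle_root(chunks).hex())
```

The useful property is that a single chunk can be checked against the root using only the sibling hashes along its path, so a client does not need the whole file to validate one piece of it.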

Verify

Verification is also the first step in reading, so I'll explain it first.

The verifier downloads the encrypted file + metadata and then verifies the public key and the signature.

{cipherkey, ciphertext, salt, sig, cipher_public_key} = download(file_id)

//reconstruct the public key

public_key = decrypt(hash(verify_cap + salt), cipher_public_key)

if(verify_cap != hash(public_key))
  throw INVALID

if(!verify(sig, public_key, hash(ciphertext + salt)))
  throw INVALID

If no exceptions were thrown, the file is valid.
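
The two checks can be tried out with a runnable sketch. Everything cryptographic here is a loudly labelled toy: the signature is textbook RSA over two small Mersenne primes (insecure, chosen only so the example needs no third-party library), the key pair is fixed rather than generated, and for brevity the public key is stored unencrypted in the share. `check_share` is a hypothetical name.

```python
import hashlib
import secrets

# Toy RSA key from two Mersenne primes. The modulus is far too small
# to be secure; it only stands in for a real signature scheme.
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q                           # public key
D = pow(E, -1, (P - 1) * (Q - 1))   # private (signing) key, Python 3.8+

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def sign(message: bytes) -> int:
    return pow(int.from_bytes(h(message), "big") % N, D, N)

def verify(sig: int, public_key: int, message: bytes) -> bool:
    digest = int.from_bytes(h(message), "big") % public_key
    return pow(sig, E, public_key) == digest

def check_share(verify_cap: bytes, share: dict) -> bool:
    """Return True only if both checks from the verify step pass."""
    public_key = share["public_key"]
    if h(public_key.to_bytes(32, "big")) != verify_cap:
        return False  # the server substituted a different public key
    signed = h(share["ciphertext"] + share["salt"])
    return verify(share["sig"], public_key, signed)

# Writer side: publish a share and hand out the verify cap.
salt = secrets.token_bytes(16)
ciphertext = b"opaque encrypted bytes"
share = {
    "ciphertext": ciphertext,
    "salt": salt,
    "sig": sign(h(ciphertext + salt)),
    "public_key": N,
}
verify_cap = h(N.to_bytes(32, "big"))

# Verifier side: an untouched share, then one with a modified ciphertext.
print(check_share(verify_cap, share))
tampered = dict(share, ciphertext=b"something else entirely")
print(check_share(verify_cap, tampered))
```

Note that the verifier never needs a decryption key for the contents: the verify cap pins the public key, and the public key pins the ciphertext and salt.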

Reading

Continue from the end of the verify step: reconstruct the key and then decrypt the file.

plaintext = decrypt(hash(read_cap + salt), ciphertext)

Update

Finally, to update a file, the updater only needs to know the write_cap, as they can use it to decrypt the private_key.

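//we only care about the cipher key and the salt in this case.
{cipherkey, _, salt, _, _} = download(file_id)

//reconstruct the private_key
private_key = decrypt(hash(write_cap + salt), cipherkey)

//recover the public key, which has to be uploaded again
public_key = public(private_key)

salt2 = random()

//the key-encryption keys depend on the salt, so re-encrypt both keys
cipherkey2 = encrypt(hash(write_cap + salt2), private_key)

ciphertext2 = encrypt(hash(read_cap + salt2), plaintext2)

sig2 = sign(private_key, hash(ciphertext2 + salt2))

cipher_public_key2 = encrypt(hash(verify_cap + salt2), public_key)

upload(file_id, {cipherkey2, ciphertext2, salt2, sig2, cipher_public_key2})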
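
One detail of the update step is worth a sketch: the private key is stored encrypted under hash(write_cap + salt), so when the salt changes, the private key has to be re-encrypted under the new salt or later writers could not recover it. The Python below illustrates just that part, assuming the same kind of toy XOR keystream cipher and random bytes as a stand-in signing key. `rewrap` is a hypothetical helper, not something from the paper.

```python
import hashlib
import secrets

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor_stream(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 counter keystream (encrypts and decrypts)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = h(key + (offset // 32).to_bytes(8, "big"))
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)

def rewrap(write_cap: bytes, old_salt: bytes, cipherkey: bytes):
    """Recover the signing key with the old salt, then wrap it under a new one."""
    private_key = xor_stream(h(write_cap + old_salt), cipherkey)
    # write_cap is hash(private_key), so a wrong cap is detectable here.
    if h(private_key) != write_cap:
        raise ValueError("write_cap does not match the stored signing key")
    new_salt = secrets.token_bytes(16)
    new_cipherkey = xor_stream(h(write_cap + new_salt), private_key)
    return private_key, new_salt, new_cipherkey

# Original writer: wrap the signing key under the first salt.
private_key = secrets.token_bytes(32)  # stand-in for a real signing key
write_cap = h(private_key)
salt = secrets.token_bytes(16)
cipherkey = xor_stream(h(write_cap + salt), private_key)

# Updater: knows only write_cap plus what the server stores.
recovered, salt2, cipherkey2 = rewrap(write_cap, salt, cipherkey)
print(recovered == private_key)
```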

Conclusion

Tahoe has a clever model for implementing permissions in which the server is not actually in a position of authority. Clients can always verify that what the server gave them is correct, and clients can also delegate permissions to other clients without ever having to rely on the servers as referee, other than to run the protocol correctly!

dominictarr/TahoeLAFS.md