Cryptography Notes

Stanford Class: Crypto I

What is Cryptography

Central Theorem: Anything that can be done with a trusted authority can also be done without.
- Instead of passing inputs x1, x2, ..., xn to some Authority object, which then outputs the result f(x1, x2, ..., xn), the inputs themselves can talk to each other and output the same result f(x1, x2, ..., xn)
Crypto Magic
- Privately outsourcing computation: e.g. Google can compute encrypted results E[result] of an encrypted query E[query] without every knowing the contents of query itself.
- Zero knowledge (proof of knowledge): It is provable that you can give the solution of any puzzle to another person without giving the details of the solution itself. WTF???
Cryptography is a rigorous science with 3 key steps:
- Precisely specify threat model (what attacker would do)
- Propose a construction
- Prove that breaking construction under threat model will solve an underlying hard problem
Notation:
- Encrpytion: c := E(k, m)
- Decrpytion: D(k, c)
- cipher: c
- message: m
- secret key: k

History of Cryptography

Caesar cipher is not a real cipher because it does not involve k (since the encryption algorithm is independent on k and only depends on m i.e. it shifts the letters in m)
Substitution cipher is easily broken despite random character mapping in k. Using frequency of letters, digrams and trigrams, we can easily brute-force k, to the point where substitution cipher can be broken by a 'cipher-text only attack' i.e. the key is basically useless
Vigener Cipher
Rotal Motors (e.g. Enigma)
DES: outdated because 2^56 bits is easily brute-forced

Discrete Probability

Formal Definition of a Cipher

A cipher defined over (K, M, C) is a pair of 'efficient' algorithm E, D where E(K x M) = C and D(K x C) = M.
It needs to satisfy the consistency rule i.e. D(k, E(k, m)) = m for all k in K, m in M.
E is often randomized (i.e. it can generate random bits as part of its encryption) while D is always deterministic.

Information Theoretic Security

A cipher text should reveal no information about the plain text
More formally, a cipher (E, D) over (K, M, C) has perfect secrecy if for any m0, m1, ...mn in M, Pr(E(k, m0)) = c, Pr(E(k, m1)) = c, ... Pr(E(k, mn)) = for all uniformly sampled k in K. This means that given any cipher text c, we will be unable to find out which m is, since they have uniformly equal probabilities.
The one-time-pad has perfect secrecy but is impractical because key length is long. In fact, the OTP is the most efficient perfect secrecy cipher because it has len(K) = len(M). Note that we require that len(K) >= len(M) for a cipher has perfect secrecy.

Two time pad is insecure

Never use stream cipher key more than once.
Given:

C1 <- m1 XOR PRG(k)
C2 <- m2 XOR PRG(k)

We can decrypt C1 XOR C2 -> m1 XOR m2. English (in particular ASCII encoding) has enough redudancy such that given mx XOR m2 we can obtain m1 and m2, and hence determine PRG(k) and subsequently decrypt all ciphers.

Project Venona (1941 - 1946) and MS-PPTP (Windows NT), 802.11b WEP are examples that used two-time pads that had security vulnerabilities.
Disk encryption is also another common example on how two time pads can be highly insecure. For some given file content m1 encrypted using a PRG(k), if the file content changes to m2 and encrypted with the same key PRG(k), we now have a two-time pad vulnerability. Therefore NEVER encrypt files on a disk using stream ciphers.
TLDR, for stream ciphers, NEVER USE THE PRG(k) more than once.

Stream ciphers are prone to 'no integrity' attacks

For any given cipher c <- m XOR k, an attacker can apply another cipher e.g. c' <- m XOR k XOR p. When decrypting with k, we will get a new message m XOR p. This causes the receiver to receive an incorrect message and there are no guarantees in the process to stop that. The attacker may not know what m is, but he with a bit of information, he can change the content of the message to something new e.g. encrypting something that he knows that is From: Bob to From: Eve, so that the receiver believes that the message is coming from From: Eve.

Linear Feedback Shift Register (LFSR)

LFSR is popular in hardware manufacturers because it is easy to implement stream cipher using LFSR in hardware.
However, they are not secure, all of these have been broken
- DVD encryption (CSS/Content Scrambling System): 2LFSRs
- GSM encryption: 3LFSRs
- Bluetooth: 4LFSRs

What is a secure cipher

Cipher text should reveal no information about the plain text (perfect secrecy)

Assignment: Write algorithms that can break the above historical ciphers.

chrisrzhou/cryptography.md