This text aims to explain and prove, intuitively and with limited prerequisites, the main definitions and theorems of Shannon's information theory. This includes cross and conditional entropy, KL divergence, mutual information, the asymptotic codelength property, and the source and channel coding theorems. The target audience is undergraduates or strong high school students, but I might throw in some non-essential notes for the more advanced reader. :)
As a formalist, I would like to introduce the subject with a small game...
A coin is hidden under one of four opaque cups. In each round, we ask yes/no questions until we uncover the cup concealing the coin; over many rounds, we aim to minimize the average number of questions asked.
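To make the game concrete, here is a minimal Python sketch (my own illustration, not from the text) of one natural strategy: repeatedly ask "is the coin in the left half of the remaining cups?", so each answer cuts the candidates in two. The function name `find_coin` and its interface are hypothetical.

```python
def find_coin(coin, n_cups=4):
    """Locate the coin by asking 'is it in the left half?' questions.

    Each yes/no answer halves the set of candidate cups.
    Returns the cup found and the number of questions asked.
    """
    lo, hi = 0, n_cups  # candidate cups are indices in [lo, hi)
    questions = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        questions += 1
        # Answer to: "is the coin among cups lo .. mid-1?"
        if coin < mid:
            hi = mid
        else:
            lo = mid
    return lo, questions

# Hypothetical demo: every hiding spot takes exactly 2 questions.
for coin in range(4):
    cup, q = find_coin(coin)
    assert cup == coin
    print(f"coin under cup {coin}: found in {q} questions")
```

With four cups, this halving strategy always uses exactly two questions, which matches log2(4) = 2; whether we can do better, and what happens when some cups are more likely than others, is exactly what the theory ahead will answer.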