These are notes from my (relatively) brief skim of http://unicode.org/reports/tr15/ . All graphics and tables are from there.
So basically Unicode lets you represent the same character in multiple ways, but recognizes that there are two broad types of character equivalence:
- Canonical Equivalence which handles, amongst other cases:
  - compositions like Å ≡ A + ̊ (or `\u00c5` ≡ `\u0041\u030a`)
  - redundant definitions: both `\u2126` and `\u03a9` display as the ohm symbol (Ω)
- Compatibility Equivalence which handles, amongst other cases:
  - characters which are rendered differently, but can be seen as pretty much the same (non-breaking space ≡ regular space, i⁹ ≡ i9, ℌ ≡ H, etc). Note that Å is not compatibility equivalent to A.
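
The examples in the list above can be checked with a quick sketch using Python's standard `unicodedata` module. The helper names `canonically_equivalent` and `compatibility_equivalent` are my own, not from TR15. They assume the usual test: two strings are canonically equivalent if their full canonical decompositions (NFD) match, and compatibility equivalent if their full compatibility decompositions (NFKD) match.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    # Hypothetical helper: compare full canonical decompositions (NFD).
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def compatibility_equivalent(a: str, b: str) -> bool:
    # Hypothetical helper: compare full compatibility decompositions (NFKD).
    return unicodedata.normalize("NFKD", a) == unicodedata.normalize("NFKD", b)


# Canonical equivalence: composition and redundant definitions.
print(canonically_equivalent("\u00c5", "\u0041\u030a"))  # Å vs A + combining ring
print(canonically_equivalent("\u2126", "\u03a9"))        # ohm sign vs Greek omega

# Compatibility-only equivalence: not canonical, but compatible.
print(canonically_equivalent("\u00a0", " "))             # non-breaking space vs space
print(compatibility_equivalent("\u00a0", " "))
print(compatibility_equivalent("i\u2079", "i9"))         # superscript nine vs 9
print(compatibility_equivalent("\u210c", "H"))           # ℌ vs H

# Å and A are not equivalent under either relation.
print(compatibility_equivalent("\u00c5", "A"))
```

This should print `True`, `True`, `False`, `True`, `True`, `True`, `False`, in that order. Anything that is canonically equivalent also passes the compatibility check, since NFKD applies the canonical decompositions as well.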