Active learning in one line
Keep a model posterior; before each query, score every candidate input $x$ by the expected value of observing its label $y$; query the maximiser, update the posterior, repeat.
Concept | Formal core | Intuition for the next query |
---|---|---|
Decision-centric (VoI) | $\mathrm{VoI}(x)=\min_a\mathbb E_{\theta}[L(a,\theta)]-\mathbb E_{y\mid x}\big[\min_a\mathbb E_{\theta\mid y}[L(a,\theta)]\big]$ | “If I were given the label of $x$, how much would my expected decision loss shrink?” |
Information-theoretic | $I(\theta;y\mid x,\mathcal D)=\mathrm H[\theta\mid\mathcal D]-\mathbb E_{y\mid x}\,\mathrm H[\theta\mid y,\mathcal D]$ | “How many bits of uncertainty about the model parameters will this label erase?” |
Predictive | $\mathbb E_{y\mid x}\big[\mathrm H[y_\star]-\mathrm H[y_\star\mid y]\big]$ over held-out targets $y_\star$ | “How much will overall test-set entropy fall if I know $y$?” |
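To make the decision-centric row concrete, here is a minimal sketch that computes exact VoI in a two-state, two-action toy problem. All numbers below (prior, loss matrix, label channel) are invented for the example:

```python
import numpy as np

# Hypothetical discrete setup: 2 world states, 2 actions, 0/1 loss.
# p_theta: current posterior over the state; the label y of a candidate
# query x is the state observed through a noisy channel p(y | theta).
p_theta = np.array([0.7, 0.3])
loss = np.array([[0.0, 1.0],              # loss[a, theta]
                 [1.0, 0.0]])
p_y_given_theta = np.array([[0.9, 0.2],   # p(y | theta), rows indexed by y
                            [0.1, 0.8]])

def bayes_risk(p):
    """Expected loss of the best action under belief p."""
    return min(loss @ p)

# Prior risk: act now, without the label.
prior_risk = bayes_risk(p_theta)

# VoI = prior risk minus the expected posterior risk, averaged over
# the label we might see (never negative: information cannot hurt
# an optimal decision-maker in expectation).
p_y = p_y_given_theta @ p_theta
voi = prior_risk
for y in range(2):
    post = p_y_given_theta[y] * p_theta / p_y[y]   # Bayes update
    voi -= p_y[y] * bayes_risk(post)

print(f"VoI of querying x: {voi:.4f}")   # 0.1700 for these numbers
```

Note that the expensive part is exactly what the later table warns about: one full Bayes update per hypothetical outcome $y$.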
Key properties
- Prospective: computed before seeing $y$.
- Expectation over outcomes: guards against chasing single exotic labels.
- Optimality: if the utility and full posterior are exact, a myopic “+1 sample” policy that maximises VoI is Bayes-optimal.
Surprisal
The realised surprisal $-\log p(y \mid x)$ measures how shocking the answer is after you see it. High surprisal tells you this data point violated the model’s expectations.
Usage patterns
- Diagnostics: flag outliers, distribution shift, model bugs.
- Exploration heuristics (empirical Bayes, deep ensembles): select $x$ whose expected surprisal is high or whose distribution of possible surprisals is wide (“disagreement”).
- Stochastic optimism: even if the actual label is not yet known, one may favour queries whose potential surprisal is large, because that correlates with larger VoI in many tasks.
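Both surprisal quantities are one line of NumPy each; the predictive distribution below is invented for the sketch:

```python
import numpy as np

# Hypothetical predictive distribution over 3 classes for a candidate x.
p = np.array([0.6, 0.3, 0.1])

# Expected surprisal = predictive entropy H[y] = -sum_y p(y) log p(y),
# computable BEFORE querying.
expected_surprisal = -np.sum(p * np.log(p))

# Realised surprisal = -log p(y_obs), computable only AFTER the label arrives.
y_obs = 2
realised_surprisal = -np.log(p[y_obs])

print(f"expected surprisal: {expected_surprisal:.3f} nats")  # ≈ 0.898
print(f"realised surprisal: {realised_surprisal:.3f} nats")  # ≈ 2.303
```

Observing the least likely class (probability 0.1) produces a realised surprisal far above the expectation, which is exactly the “shock” signal the diagnostics bullet exploits.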
Stage | Quantity | Role |
---|---|---|
Before querying | Expected surprisal $\mathbb E_{y\mid x}[-\log p(y\mid x)]$ | Equals predictive entropy, one of the simplest VoI proxies (max-entropy sampling). |
Before querying | Full VoI / information gain | Gold standard but expensive (needs a posterior update for every hypothetical $y$). |
After querying | Realised surprisal $-\log p(y_{\mathrm{obs}}\mid x)$ | Empirical pay-off; a large value signals that the query genuinely moved the model. |
Why we still compute VoI instead of just chasing surprisal: Surprisal is retrospective; you may get a huge shock but only on a vanishingly unlikely branch. VoI averages those branches, ensuring each step yields expected benefit.
- Specify utility: classification accuracy, log-likelihood, decision-theoretic loss, etc.
- Approximate VoI:
  - Bayesian neural nets / GPs: use closed-form mutual information or Monte Carlo.
  - Ensemble disagreement: $\mathrm H[y]-\mathbb E_m[\mathrm H_m[y]]$ (BALD proxy).
  - Parameter-change norms: expected gradient length, Fisher information.
- Complexity control: batch selection, diversity penalties, cost-aware VoI.
- Query the label, update the posterior, repeat.
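The ensemble-disagreement proxy (BALD) can be sketched directly; the member predictions below are invented for the example:

```python
import numpy as np

def bald_score(member_probs):
    """H[y] - E_m[H_m[y]]: entropy of the averaged prediction minus the
    average entropy of each ensemble member's own prediction."""
    member_probs = np.asarray(member_probs)
    mean_p = member_probs.mean(axis=0)

    def entropy(p):
        # Small epsilon guards log(0) without changing the result materially.
        return -np.sum(p * np.log(p + 1e-12), axis=-1)

    return entropy(mean_p) - entropy(member_probs).mean()

# Members uncertain but unanimous: aleatoric noise, nothing to learn.
agree = [[0.5, 0.5], [0.5, 0.5]]
# Members confident but contradictory: epistemic disagreement, query here.
disagree = [[0.95, 0.05], [0.05, 0.95]]

print(bald_score(agree))      # ≈ 0
print(bald_score(disagree))   # ≈ 0.49
```

The score separates the two cases even though both ensembles have the same averaged prediction, which is why disagreement beats plain predictive entropy when the posterior is represented by an ensemble.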
Heuristic (cheap → costly) | VoI link | Typical when |
---|---|---|
Uncertainty sampling (max predictive entropy) | Lower-bound to VoI | Posterior over $y$ from a single model is cheap and reasonably calibrated. |
Query-by-committee / ensembles | Expected pairwise KL ≈ mutual information | Deep models without tractable Bayesian posterior. |
Variance-reduction / Fisher-score | Approximates expected parameter change | Large-scale regression, cost-sensitive tasks. |
Exact information gain | True VoI | Small domains (Gaussian processes, discrete graphs). |
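Putting the cheapest row into a loop gives the classic uncertainty-sampling skeleton. This is a minimal sketch on synthetic separable data; the helpers `fit_logreg` and `predict_proba` are hand-rolled for the example, not a library API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool: 2-D Gaussian points, label = sign of the first coordinate.
X_pool = rng.normal(size=(200, 2))
y_pool = (X_pool[:, 0] > 0).astype(float)

def fit_logreg(X, y, lr=0.5, steps=200):
    """Plain gradient-descent logistic regression (weights + bias)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def predict_proba(w, b, X):
    p1 = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    return np.stack([1.0 - p1, p1], axis=1)

# Seed with one example of each class, then query by max predictive entropy.
labelled = [int(np.flatnonzero(y_pool == 0)[0]),
            int(np.flatnonzero(y_pool == 1)[0])]
for step in range(20):
    w, b = fit_logreg(X_pool[labelled], y_pool[labelled])
    probs = predict_proba(w, b, X_pool)
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    entropy[labelled] = -np.inf          # never re-query a labelled point
    labelled.append(int(np.argmax(entropy)))

w, b = fit_logreg(X_pool[labelled], y_pool[labelled])
acc = np.mean((predict_proba(w, b, X_pool)[:, 1] > 0.5) == (y_pool == 1))
print(f"pool accuracy with {len(labelled)} labels: {acc:.2f}")
```

Swapping the entropy score for the BALD proxy or a Fisher-based score changes only the line that ranks the pool; the query-update loop is identical for every row of the table.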
- VoI: ex-ante, expectation, goal-aligned.
- Surprisal: ex-post, realised shock, good thermometer of how much the last query actually taught you.
- High expected surprisal = high predictive entropy = one component of VoI; full VoI also weighs how that surprise will propagate to future predictions or decisions.
Use expected VoI to plan your next question; use realised surprisal to judge how much you actually learned.