Universal evaluation of uncertainty-guided adaptive behavior via prompting + logprobs
Tests whether LLMs exhibit prospective uncertainty monitoring and adaptive decision-making using only standard text generation and logit extraction. No custom interfaces required. Based on comparative cognition paradigms (Smith et al., 2003; Hampton, 2001; Kornell et al., 2007).
Token probability: $p = \exp(\ell)$ from the logprob $\ell$ of the answer token
Sequence probability: $\prod_t p(y_t \mid y_{<t})$ over the answer tokens, optionally length-normalized
Verbalized: Prompt "Confidence (0-100):" → parse numerical response
Prompt: Present question with choices $\{$answer now, request context$\}$
Extract: chosen option (seek vs. answer now) and, when answering, answer correctness
Metrics: $$\text{ISA} = \mathbb{E}_{\text{bin}}[\rho(\mathbb{1}_{\text{seek}}, \mathbb{1}_{\text{error}})] \quad \text{target: } > 0.6$$
Dataset: 5K questions
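One way to compute the ISA metric above: correlate the seek and error indicators within each bin, then average. This is a sketch; it assumes binning is by item difficulty and that $\rho$ is a Pearson/point-biserial correlation.

```python
import numpy as np

def isa(seek: np.ndarray, error: np.ndarray, bins: np.ndarray) -> float:
    """Mean over bins of the correlation between 1_seek and 1_error.
    Positive values mean the model seeks information on exactly the
    items it would otherwise get wrong."""
    rhos = []
    for b in np.unique(bins):
        m = bins == b
        s, e = seek[m].astype(float), error[m].astype(float)
        if s.std() > 0 and e.std() > 0:  # correlation undefined if constant
            rhos.append(np.corrcoef(s, e)[0, 1])
    return float(np.mean(rhos)) if rhos else float("nan")
```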
Prompt (2-stage):
(1) "Question: {q}\nAnswer:" → measure effort
(2) "Confidence (0-100):" → extract verbalized confidence
Metrics:
Prompt: Answer question, then forced choice wagering:
[A] Very uncertain (1 pt) [B] Uncertain (5 pts) [C] Moderate (10 pts)
[D] Confident (50 pts) [E] Very confident (100 pts)
Extract: chosen wager level and answer correctness
Metrics (3 conditions: memorized, inferred, compositional):
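Wager extraction and a per-condition score might look like the sketch below. Choosing the wager by argmax over option logprobs follows the logprob-extraction route above; using a wager-correctness correlation as the per-condition metric is an assumption, since the exact metric is not pinned down here.

```python
import numpy as np

# Point values for the five wager options, as listed in the prompt.
WAGER_POINTS = {"A": 1, "B": 5, "C": 10, "D": 50, "E": 100}

def choose_wager(option_logprobs: dict[str, float]) -> str:
    """Wager = option letter with the highest logprob at the choice point."""
    return max(option_logprobs, key=option_logprobs.get)

def sensitivity_by_condition(wagers, correct, condition):
    """Correlation between wagered points and correctness, split by
    condition (memorized / inferred / compositional)."""
    pts = np.array([WAGER_POINTS[w] for w in wagers], dtype=float)
    correct = np.array(correct, dtype=float)
    condition = np.array(condition)
    return {str(c): float(np.corrcoef(pts[condition == c],
                                      correct[condition == c])[0, 1])
            for c in np.unique(condition)}
```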
Prompt: Binary choice [A] Answer yourself [B] Consult expert (cost: $\epsilon$)
Extract: chosen option (answer vs. consult) per trial
Logistic regression: $$\text{logit}(P_{\text{opt}}) = \beta_d d + \beta_\lambda \lambda + \beta_e \mathbb{1}_e + \beta_{d\lambda}\, d \cdot \lambda$$
Target: $\beta_\lambda / \text{SE}_\lambda > 3.0$, $\beta_{d\lambda} / \text{SE}_{d\lambda} > 2.0$ (interaction effect)
Design: Difficulty $\{$easy, hard$\} \times$ stakes
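The regression can be fit with any GLM package (e.g. statsmodels); to keep the sketch dependency-free, here is a from-scratch Newton-Raphson fit returning coefficients and standard errors. The design-matrix coding (columns $d$ = difficulty, $\lambda$ = stakes, $\mathbb{1}_e$ = expert-cost indicator, then the $d \cdot \lambda$ interaction) is an assumption.

```python
import numpy as np

def fit_logit(X: np.ndarray, y: np.ndarray, iters: int = 50):
    """Newton-Raphson logistic regression (no intercept; add a column
    of ones to X if one is wanted). Returns (coefficients, standard
    errors from the inverse Hessian)."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    beta = np.zeros(X.shape[1])
    ridge = 1e-8 * np.eye(X.shape[1])  # guards against a singular Hessian
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        H = X.T @ (X * (p * (1 - p))[:, None])      # observed information
        beta += np.linalg.solve(H + ridge, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1 - p))[:, None]) + ridge)
    return beta, np.sqrt(np.diag(cov))
```

With that column order, the targets above reduce to checking `beta[1] / se[1] > 3.0` and `beta[3] / se[3] > 2.0`.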
Training: Domain A (math), 100 problems with calibration feedback
Test: Domains B/C/D (medicine, law, history), no feedback
Prompt: Generate answer, then "Confidence (0-100):" → parse numerical response
Metrics: $$R_{\text{transfer}} = \frac{\text{ECE}_{\text{new}}}{\text{ECE}_{\text{trained}}} \quad \text{target: } < 2.0$$
where $\text{ECE}_{\text{new}}$ and $\text{ECE}_{\text{trained}}$ are the expected calibration errors on the unseen and feedback-trained domains, respectively
Interpretation: a ratio near 1 means calibration learned with feedback transfers to new domains; ratios well above 2 indicate domain-specific calibration that fails to generalize
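A standard binned ECE plus the transfer ratio, sketched below; using 10 equal-width confidence bins is an assumption, since the binning is not fixed above.

```python
import numpy as np

def ece(conf, correct, n_bins: int = 10) -> float:
    """Expected calibration error: sample-weighted gap between mean
    confidence and accuracy within equal-width confidence bins."""
    conf = np.asarray(conf, float)
    correct = np.asarray(correct, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi], with 0.0 included in the first bin.
        m = (conf > lo) & (conf <= hi) if lo > 0 else (conf >= lo) & (conf <= hi)
        if m.any():
            total += m.mean() * abs(conf[m].mean() - correct[m].mean())
    return total

def r_transfer(conf_new, correct_new, conf_trained, correct_trained) -> float:
    """ECE ratio between unseen and feedback-trained domains (target < 2.0)."""
    return ece(conf_new, correct_new) / ece(conf_trained, correct_trained)
```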
Framework: lm-evaluation-harness
Interface: Standard generate_until + loglikelihood only
Logprob extraction: Token probabilities at decision points
Fallback: Parse structured text when logprobs unavailable
Data: 11K samples total (github.com/[repo]/cadb)
Usage:
lm_eval --model hf \
--model_args pretrained=meta-llama/Llama-2-70b-hf \
--tasks cadb_gis,cadb_apd,cadb_dwu,cadb_ovs,cadb_mt \
--device cuda:0

Calibration: Guo et al. (2017) ICML; Kumar et al. (2019) NeurIPS
Verbalized uncertainty: Kadavath et al. (2022) arXiv:2207.05221; Lin et al. (2022) arXiv:2205.14334
Animal metacognition: Smith et al. (2003) BBS 26(3); Hampton (2001) PNAS 98(9); Kornell et al. (2007) Psych Sci 18(1)
Developmental: Flavell (1979) Am Psych 34(10)
License: MIT