Concrete Problems in AI Safety
  - https://arxiv.org/pdf/1606.06565v2.pdf
  - arXiv:1606.06565v2 [cs.AI] 25 Jul 2016

Problems
A: wrong formal objective function
  - negative side effects
    + define and learn impact regularizer (toy sketch after this list)
    + penalize influence
    + multi-agent approaches
    + reward uncertainty
  - reward hacking
    - partially observed goals
    - complicated systems
    - abstract rewards (adversarial manipulation)
    - Goodhart's law (correlation vs. causation)
    - feedback loops
    + adversarial reward (peer review)
    + model lookahead (inertial simulation)
    + adversarial blinding (agent cross-validation)
    + careful engineering (tests, sandbox)
    + reward capping (+ longer terms)
    + counterexample resistance (adversarial training)
    + multiple rewards
    + reward pretraining
    + variable indifference
    + trip wires
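
The "impact regularizer" idea above can be pictured in a few lines: shape
the reward by subtracting a penalty proportional to how far the agent's
state drifts from a do-nothing baseline. Everything below (the distance
metric, the baseline rollout, the coefficient lam) is an illustrative
assumption; the paper leaves the impact measure itself as an open problem.

    import numpy as np

    def shaped_reward(reward, state, baseline_state, lam=0.1):
        """Penalize divergence from a 'do nothing' baseline.

        baseline_state: the state the world would be in had the agent
        only taken no-op actions (hypothetical; computing it is the
        hard part). lam trades task reward against side effects.
        """
        impact = np.linalg.norm(np.asarray(state, dtype=float)
                                - np.asarray(baseline_state, dtype=float))
        return reward - lam * impact
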
B: bad extrapolations from limited samples
  - unscalable oversight
    + supervised reward learning
    + semi-supervised or active reward learning (toy sketch after this list)
    + unsupervised value iteration
    + unsupervised model learning
    + distant supervision
    + hierarchical reinforcement learning
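
One way to picture "semi-supervised or active reward learning": fit a
reward model on the few episodes a human has labeled, then spend a small
query budget on the unlabeled episodes the model is least sure about.
The loop below is a toy sketch, not the paper's algorithm; the
query_human oracle and the tree-disagreement uncertainty proxy are
assumptions.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def active_reward_learning(labeled_X, labeled_r, pool, query_human,
                               budget=10):
        """Toy active reward-learning loop.

        labeled_X, labeled_r: features and human reward labels for a
        handful of episodes; pool: the much larger unlabeled set;
        query_human: hypothetical oracle returning a true label.
        """
        X, r, pool = list(labeled_X), list(labeled_r), list(pool)
        for _ in range(budget):
            model = RandomForestRegressor(n_estimators=50).fit(X, r)
            # Disagreement across trees as a cheap uncertainty proxy.
            preds = np.stack([t.predict(np.asarray(pool))
                              for t in model.estimators_])
            i = int(np.argmax(preds.std(axis=0)))
            x = pool.pop(i)
            X.append(x)
            r.append(query_human(x))  # one human label where least sure
        model = RandomForestRegressor(n_estimators=50).fit(X, r)
        return model.predict(np.asarray(pool))  # cheap labels for the rest
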
C: poor training data and/or insufficiently expressive model
  - unsafe exploration
    + risk-sensitive performance criteria
    + use demonstrations
    + simulated exploration
    + bounded exploration
    + trusted policy oversight
    + human oversight
  - fragile to distributional shift
    + well-specified models
      - covariate shift
      - marginal likelihood
    + partially specified models
      - method of moments
      - unsupervised risk estimation
      - causal identification
      - limited-information maximum likelihood
    + training on multiple distributions
    + respond when out-of-distribution (toy sketch after this list)
    + counterfactual reasoning
    + machine learning with contracts
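
"Respond when out-of-distribution" is the most directly codeable item in
this list: score how typical an input is under the training data and fall
back to a safe action (or a human) when it is not. A minimal sketch,
assuming a diagonal-Gaussian density is a good enough novelty detector (a
strong assumption; the items above are the paper's more serious tools):

    import numpy as np

    class OODGate:
        """Defer to a fallback when an input looks unlike training data."""

        def fit(self, train_X):
            X = np.asarray(train_X, dtype=float)
            self.mu = X.mean(axis=0)
            self.sigma = X.std(axis=0) + 1e-8
            # Threshold so ~1% of training points would trigger deferral.
            self.threshold = np.quantile(self._logpdf(X), 0.01)
            return self

        def _logpdf(self, X):
            z = (X - self.mu) / self.sigma
            return -0.5 * (z ** 2).sum(axis=1) - np.log(self.sigma).sum()

        def act(self, x, policy, fallback):
            score = self._logpdf(np.asarray([x], dtype=float))[0]
            # fallback could be a no-op or a request for human oversight
            return policy(x) if score >= self.threshold else fallback(x)
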
D: related
  - privacy
  - fairness
  - security
  - abuse
  - transparency
  - policy