Game of AGI (GOA) is a socio-techno Monte Carlo simulation that estimates the existential risk posed by AGI. You play by setting various factors, e.g. AI spending and AGI safety spending, along with their uncertainties; GOA then returns the existential risk, corruption risk, and value alignment probability associated with your guesses by drawing one million samples from Gaussian distributions built from those guesses.
N.B. When referring to AGIs as separate entities, I am referring to autonomous groups of humans (e.g. companies, governments, groups, organizations) that are in control of some tech stack capable of AGI. There may be several autonomous AGIs within any given stack, but we assume they have an umbrella objective defined by the controlling org.
Each input parameter is modeled by a Gaussian distribution with a set mean and standard deviation. When setting the standard deviation, it's helpful to remember that roughly 68% of samples will fall within one standard deviation of the mean.
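For intuition, here is a minimal sketch (not GOA's actual code) of how a single input parameter could be sampled; the mean and standard deviation below are placeholders:

```python
import numpy as np

N_SAMPLES = 1_000_000  # GOA draws 1M samples per run

# Hypothetical input: a mean and standard deviation you might set.
mean, std = 0.3, 0.1
rng = np.random.default_rng(0)
samples = rng.normal(mean, std, N_SAMPLES)

# Roughly 68% of the samples land within one standard deviation of the mean.
within_one_std = np.mean(np.abs(samples - mean) <= std)
print(f"fraction within +/- 1 std: {within_one_std:.3f}")  # ~0.683
```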
This is the probability that AGI will be aligned with humanity for reasons other than a general solution to long-term alignment. In this "winging it" scenario, we solve safety issues as they come up in various types of applications, e.g. disinformation on Facebook, or hitting pedestrians with a self-driving car. Then, as a result of iterating on more and more capable AI systems designed to safely meet our evolving needs, safe AGI emerges by default.
When setting this, one should consider the likelihood of AGI emerging from Google/OpenAI, and their focus on safety relative to governments. Such labs are perhaps also more amenable to oversight than other well-funded, mostly state-based groups, as these labs will be accountable to member states' laws, and so are likely safer in this respect as well.
We make the optimistic assumption that safety by default trumps corruption, i.e. a corrupt lab originating AGI will still be benevolent if AGI is safe by default.
agi_safe_by_default
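As a hedged sketch of how the "safety by default trumps corruption" assumption could look per sample (the probabilities and variable names below are illustrative, not GOA's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Illustrative per-sample draws; the probabilities are placeholders.
safe_by_default = rng.random(n) < 0.3   # agi_safe_by_default holds in this sample
corrupt = rng.random(n) < 0.2           # the originating org turns corrupt

# Optimistic assumption from the text: a corrupt originator only contributes
# to x-risk when AGI is NOT safe by default.
corruption_x_risk = corrupt & ~safe_by_default
print(corruption_x_risk.mean())  # ~0.14 with these placeholder numbers
```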
This is the probability that we will solve long-term AGI alignment in general. For example, if iterated amplification and safety via debate provide a general solution with which we could align any AI, then this would be 1.
See the control problem.
value_alignment_prob
Here we consider dollars as a proxy for the effectiveness of AGI safety efforts.
dollars_agi_safety
Global annual spending in USD directly on ensuring AGI is safe; cf. the field of AGI safety
dollars_ai
Global annual spending in USD on AI, eventually culminating in AGI
ideal_safety_to_ai_spend_ratio
Ideal dollars_agi_safety / dollars_ai
So if you think AGI safety spending should equal 1% of total spending on AI, set this to 0.01
ignore_spend
Run the model without considering spending
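One way the spending inputs might be combined, sketched under the assumption that adequacy scales with how close the actual ratio is to the ideal (this formula is an illustration, not necessarily what GOA does; the means and standard deviations are placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Placeholder means/stds, in USD per year.
dollars_agi_safety = rng.normal(50e6, 20e6, n)
dollars_ai = rng.normal(50e9, 10e9, n)
ideal_safety_to_ai_spend_ratio = 0.01  # e.g. "safety should be 1% of AI spend"

actual_ratio = dollars_agi_safety / dollars_ai
# Assumed adequacy factor: 1.0 once the ideal ratio is met or exceeded.
spend_adequacy = np.clip(actual_ratio / ideal_safety_to_ai_spend_ratio, 0.0, 1.0)
print(spend_adequacy.mean())
```

Setting ignore_spend would simply leave a factor like this out of the calculation.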
There are currently two far-and-away leaders in the race to AGI: Google and OpenAI. These two orgs have seemingly relatively beneficent goals; nonetheless, such a concentration of potential power poses a major risk of those in control of AGI becoming corrupt. One solution to corruption could be some form of oversight. Oversight can also help ensure safety standards are being upheld in labs working on AGI. Things that such an oversight organization could do are:
- Produce a safety score for top AI labs
- Prepare training courses for employees on spotting safety violations and reporting them anonymously if they don't feel safe addressing them internally
- Provide a secure hotline for concerned scientists to raise safety concerns
- Hold inspections that include private, anonymous interviews with employees to get feedback on AGI safety and AGI progress (and to share AGI safety breakthroughs, similar to the IAEA's Technical Cooperation Programme)
For now, the exact form of this organization (e.g. an international agency, a governmental agency, or an industry standards group) is kept nebulous. When setting these inputs, be optimistic that oversight will lead to less corruption and higher safety.
oversight_dollars
Global dollars currently spent on AGI safety oversight
ideal_oversight_dollars
Ideal spending on oversight for AGI safety
ignore_oversight
Whether to ignore the effects of oversight
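Analogously to safety spending, oversight spending could enter the model as a ratio against its ideal; again, the exact formula is an assumption and the numbers are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

oversight_dollars = rng.normal(10e6, 5e6, n)   # placeholder current spend
ideal_oversight_dollars = 100e6                # placeholder ideal spend

# Assumed adequacy factor, capped at 1.0 once the ideal is reached.
oversight_adequacy = np.clip(oversight_dollars / ideal_oversight_dollars, 0.0, 1.0)
# With ignore_oversight set, a factor like this would simply be dropped.
```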
One key concept in measuring corruption is understanding how many autonomous groups of humans will be in control of AGI. The more such originating organizations there are, the less power will be concentrated in the hands of the few, and the more competition and checks and balances there will be between the humans in control of AGI before the singularity. Also, the more people in each organization, the more likely it is that someone will act as a whistleblower or mutineer in the name of humanity.
On the flip side, a larger number of AGI-controlling entities introduces risk in the form of "bad apples" who are able to cause outsized destruction using AGI. This could be in the form of hackers, rogue states, and possibly militaries with short-sighted goals. This is not currently modeled: the x-risk is already so alarmingly high that increasing it further doesn't seem to warrant any qualitative changes in the actions that should be taken as a result of the model (i.e. we need oversight and safety spending). Also, it seems that only a small number of labs are currently in contention to originate AGI, so large numbers of originators aren't realistic. However, this type of modeling would help clarify action items around promoting openness (e.g. should OpenAI be more open?). The nuances of such a model should include the relative destructive vs. constructive power of a rogue AGI organization (i.e. it is easier to destroy than to create) and the counterbalance to such destructive power offered by the higher total number of AGI-controlling organizations.
Another issue not covered by the model is that of warring AGIs: if more than one AGI exists, it makes sense that they would have competing goals and therefore may attempt to eliminate each other to achieve those goals. For the purposes of setting this input, consider that AIs and the humans designing them will favor cooperation over conflict.
Note that we currently consider a single entity controlling AGI to be almost certainly corrupt, and that this corruption will be an existential threat. This is extreme, but it's not obvious how extreme. For example, how much will alignment work matter if the groups developing AGI are corrupt? The model currently assumes not at all. However, it obviously depends on how corrupt the humans are. Again, this is not modeled; corrupt == x-risk in this initial model.
number_of_originators
Number of AGI originating organizations
ppl_per_originator
Number of people in each AGI originating organization
snowden_prob
Probability that a single person in a corrupt originator becomes a whistleblower or even mutineer; cf. Fritz Houtermans:
https://docs.google.com/document/d/164O4fmp-zsbeIenq3l-slo-VGfYUagNY/edit
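To illustrate why more people per originator makes a leak more likely, here is a hedged worked example treating each person as an independent chance of blowing the whistle (an assumption for illustration; the numbers are placeholders):

```python
# If each of the N people in a corrupt originator independently becomes a
# whistleblower with probability p, the chance that at least one does is:
#   P(at least one) = 1 - (1 - p) ** N
snowden_prob = 1e-4        # placeholder per-person probability
ppl_per_originator = 1000  # placeholder headcount

p_any_whistleblower = 1 - (1 - snowden_prob) ** ppl_per_originator
print(f"{p_any_whistleblower:.3f}")  # ~0.095 for these placeholder values
```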
It's not just about origination; quick followers can be just as important. For this, we need to consider things like the number of breakthroughs needed on top of what's openly available for replication to occur. If replication can occur early within the takeoff window, we will get some of the effects of additional originators. When setting this parameter, one should consider the amount of open source work that will be available and the additional data, hardware, people, and software that will be needed to replicate an AGI. For multiple originations to occur simultaneously, we'd need some cross-originator coordination, e.g. Google trains an AI seemingly more capable than its best AI researchers at their jobs, then coordinates with OpenAI to share the system and decentralize control over it. Without such coordination, there will be a replication delay between the first AGI and the next, which this parameter models.
cf. https://www.lesswrong.com/tag/ai-takeoff
takeoff_in_days
Number of days from AGI to the singularity
days_to_replicate
Number of days for an independent group to replicate AGI
ignore_takeoff_and_replication
Whether to ignore the takeoff window and the replication delay
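A sketch of how the takeoff window and replication delay could interact per sample; the means, standard deviations, and the overlap formula are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Placeholder inputs, in days.
takeoff_in_days = np.clip(rng.normal(365, 90, n), 1, None)
days_to_replicate = np.clip(rng.normal(180, 60, n), 0, None)

# If replication finishes inside the takeoff window, a second AGI exists for
# part of that window, giving some of the effect of an extra originator.
replicated_in_time = days_to_replicate < takeoff_in_days
overlap_fraction = np.clip(1 - days_to_replicate / takeoff_in_days, 0.0, 1.0)
print(replicated_in_time.mean(), overlap_fraction.mean())
```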
The model is currently a DAG and has no recurrence to model time. Obviously this is not the case in reality, as variables will have recurrent effects on each other across time, but I've tried to simplify things as much as possible while still being able to express relationships between variables adequately. I'm also working with someone on a simulation across time, but this is in the early stages.
Inputs are assumed to be i.i.d. Again, I've tried my best to strike the right balance between expressiveness and simplicity here. I will try to keep this version simple, while also working on another, more complex version for comparison between approaches.
All distributions are Gaussian, whereas probabilities are better modeled with beta distributions. There are also some long-tail effects that need to be accounted for, depending on how much uncertainty you provide in your inputs. I don't anticipate the outputs varying that widely with more accurate distributions, but we will see!
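For example, a probability-valued input could be given a Beta distribution matched to the same mean and standard deviation (a sketch of a possible refinement, not something GOA currently does):

```python
import numpy as np

def beta_from_mean_std(mean, std):
    # Method-of-moments Beta parameters; requires std**2 < mean * (1 - mean).
    common = mean * (1 - mean) / std ** 2 - 1
    return mean * common, (1 - mean) * common

rng = np.random.default_rng(0)
a, b = beta_from_mean_std(0.3, 0.1)
samples = rng.beta(a, b, 1_000_000)
print(samples.mean(), samples.std())  # ~0.30, ~0.10, and always inside [0, 1]
```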