Which of them feel wrong to you? I agree with all of them other than 3b, which I'm unsure about - I think this comment does a good job of unpacking things.
2a is Katja Grace's Doomsday argument. I think 2aii and 2aiii depend on whether we're allowing simulations; if a faster expansion speed (either the cosmic speed limit or an engineering limit on expansion) meant more ancestor simulations, then this could cancel out the fact that faster-expanding civilizations prevent more alien civilizations from coming into existence.
At the Center on Long-Term Risk we're open to remote work. Currently we're only hiring for summer research fellows and the application page states (as with other previous positions, iirc):
Location: We prefer summer research fellows to work from our London offices, but will also consider applications from people who are unable to relocate.
Last year we had one fully remote fellow.
The lifecycle of 'agents'
Epistemic status: mostly speculation and simplification, but I stand by the rough outline of 'self-unaware learners -> self-aware consequentialists struggling with multipolarity -> static rule-following not-thinking-too-hard non-learners'. The two most important transitions are "learning" and then, once you've learned enough, "committing/self-modifying (away from learning)".
I briefly sketch three phases that I guess ‘agents’ go through, and consider how two different metrics change during this progression. This is a highly speculative just-so story that currently sounds intuitively correct to me, though I’m not very confident in much of what I’ve written and I leaned too much into the ‘fun’ heuristic at times.
The transition from the first stage to the second stage is learning to become more consequentialist. The transition from the second stage to the third is self-modifying away from consequentialism.
In each of the three stages I consider the predictability of both (a) the agent’s decisions and (b) the agent’s environment when one has either (I) full empirical facts about the agent and environment or (II) partial empirical facts. I don’t think these two properties are the most important or relevant ones to track, but they helped to guide my intuitions in writing this life-cycle.
Agents in this stage are characterised by learning, but not yet self-modifying - they have not learned enough to do this yet! They have started in motion (possibly by selection pressure), and are on the right track towards becoming more consequentialist / VNM rational / maximise-y.
They’re generally relatively self-centred and don’t model other agents in much detail if at all. They begin to have some self-awareness. There’s not too much sense that they consider different actions: the process to decide between actions is relatively ‘unconscious’ and the ability to consider the value of modifying oneself is beyond the agent for a while. They stumble into the next stage by gaining this ability.
These agents are updating on everything and thus ‘winning’ more in their world. The ability to move into stage two requires some minimum amount of ‘winning’ (due to selection pressures).
| | Agent’s decisions | Agent’s environment |
| --- | --- | --- |
| Full empirical facts | High. Computationally the agent is not doing anything advanced and so one can easily simulate them. | Low-medium. The environment is relatively unaffected by the agent since they are not very good at achieving their goals. One might expect to see some change towards the satisfaction of their preferences. This is more true in ‘easy mode’ i.e. worlds where there is little to no competition. |
| Partial empirical facts | Medium. The agent’s behaviour, since it is poorly optimised, could fit any number of internal states. Further, there may be significant randomness involved in the decision making. This could be deliberate e.g. for exploration. This could also be because of low error correction in their decision-making module, so that physical features of the world can influence their decisions. | Slightly lower than above. Their goals and preferences are not necessarily obvious from their environment. (Again, less competition in the environment, a goal that is easier to achieve, or a cruder goal all make this prediction easier.) |
Agents in this stage are consequentialists. Having moved from stage one to stage two, they now reason about their own decision process and are able to consider actions that modify their action-choosing process. They also remain updateful and have the capacity to reason about other agents (not limited to their future selves, who may be very different). These three features make stage two agents unstable: they quickly self-modify away.
At the end of this stage, the agents are thinking in great detail about other agents. They can ‘win’ in some interactions by outthinking other agents. The interactions are not necessarily restricted to nearby agents. The acausal landscape is massively multipolar and the stakes (depending on preferences) may be much higher than in the local spacetime environment.
| | Agent’s decisions | Agent’s environment |
| --- | --- | --- |
| Full empirical facts | Low. Agents are doing many logical steps to work out what other agents are thinking. | High. They are beginning to build computronium and converge on optimal designs for their environment (e.g. Dyson-sphere-like technology). |
| Partial empirical facts | Low-medium. The environment still gives lots of clues about the agent’s preferences and beliefs and the agent is following a relatively simple-to-write-down algorithm. Further, the agent is already optimising for error correction, preserving its existing values and improving its cognitive abilities, and so their mind is relatively orderly. However, the process by which they move from stage 2 to 3 (which is what happens straight upon coming to stage 2) may be highly noisy. This commitment race may be a function of the agent’s prior beliefs about facts they have little evidence for, and this prior may be relatively arbitrary. | High. The exact contents of some of the computronium may be hard to predict (which is pretty much predicting their decision) but some will be easy (e.g. their utilitronium). |
Agents in this stage are in it for the long haul (trillions of years). Between phases 2 and 3 the agent makes irreversible commitments, making themselves more predictable to other agents and settling into game-theoretic equilibria. Phase 3 agents act in ways very correlated with other agents (potentially in a coalition of many agents all running the same algorithm).
Phase 3 agents have maxed out their lightcone with physical stuff and reached the end of their tech tree. They have nothing left to learn and are most likely updateless (or similar e.g. a patchwork of many commitments constraining their actions). There’s not much thinking for the agent left to do; everything was decided a long time ago (though maybe this thinking - the transition from phase 2 to 3 - took a while). The agent mostly sticks around just to maintain their optimised utility (potentially using something like a compromise utility function following acausal trade).
The universe expands into many causally disconnected regions and the agent is ‘split’ into multiple copies. Whether these are still meaningfully agents is not clear: I would guess they are well imagined as a non-human animal but with overpowered instincts and abilities to protect themselves and their stuff - like a sleeping dragon guarding its gold.
| | Agent’s decisions | Agent’s environment |
| --- | --- | --- |
| Full empirical facts | High. There are not many decisions left to make. They are pretty much lobotomised versions of their “must think about the consequences of everything” former selves. They follow simple rules and live in a relatively static world. | High. Massive stability (after all the stars have been rearranged into the most efficient arrangement). The world is relatively static. |
| Partial empirical facts | High. They have very robust error correcting mechanisms, and also mechanisms to prevent the emergence of any consequentialist (sub-)agents with any (bargaining) power within their causal control. | High. There’s a lot of redundancy in the environment, which makes it easy to figure out what’s going on. Not much changes. |
I agree. I think we should break "doom" into at least these four outcomes: {human extinction, humans remain on Earth} x {lots of utility achieved, little to no utility}.
Mmm. I'm a bit confused about the short timelines: 50% by 2030 and 75% by 2030 seem pretty short to me.
I think the medium timelines I use have a pretty long tail, but the 75% by 2060 is pretty much exactly the Metaculus community's 75% by 2059.
Thanks for sharing! I've definitely had productivity gains from using a similar setup (Logseq, which is pretty much an open source clone of Roam/Obsidian and stores stuff locally as .md files).
This is a short follow-up to my post on the optimal timing of spending on AGI safety work which, given exact values for the future real interest rate, diminishing returns and other factors, calculated the optimal spending schedule for AI risk interventions.
This has also been added to the post’s appendix and assumes some familiarity with the post.
Here I consider the most robust spending policies and suppose uncertainty over nearly all parameters in the model[1], rather than finding the optimal solutions based on point estimates. I again find that the community’s current spending rate on AI risk interventions is too low.
My distributions over the model parameters imply that
I recommend entering your own distributions for the parameters in the Python notebook here.[3] Further, these preliminary results use few samples: more reliable results would be obtained with more samples (and more computing time).
I allow for post-fire-alarm spending (i.e., we are certain AGI is soon and so can spend some fraction of our capital). Without this feature, the optimal schedules would likely recommend a greater spending rate.
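To make "most robust" concrete, here is a minimal sketch of the general idea: sample many parameter sets, score each candidate fixed spending rate in every sampled world, and keep the rate with the highest average utility. This is not the model from the post or the notebook. The toy utility function, the parameter distributions, the candidate rates and all function names are placeholder assumptions chosen only for illustration, and it omits post-fire-alarm spending entirely.

```python
import random


def utility_of_fixed_rate(rate, interest, eta, agi_year,
                          start_year=2022, capital=1.0):
    """Toy utility (an assumption, not the post's model): the sum of
    diminishing-returns value of each year's spending before AGI arrives."""
    total = 0.0
    for _ in range(start_year, agi_year):
        spend = rate * capital
        total += spend ** eta  # eta < 1 gives diminishing returns
        capital = (capital - spend) * (1.0 + interest)
    return total


def sample_parameters(rng):
    """Draw one hypothetical world. These distributions are placeholders."""
    return {
        "interest": rng.uniform(-0.02, 0.08),
        "eta": rng.uniform(0.2, 0.9),
        "agi_year": rng.randint(2027, 2080),
    }


def most_robust_rate(candidate_rates, n_samples=2000, seed=0):
    """Return the candidate rate with the highest mean utility across
    sampled worlds, plus the mean utility of every candidate."""
    rng = random.Random(seed)
    worlds = [sample_parameters(rng) for _ in range(n_samples)]
    mean_utility = {
        rate: sum(utility_of_fixed_rate(rate, **w) for w in worlds) / n_samples
        for rate in candidate_rates
    }
    best = max(mean_utility, key=mean_utility.get)
    return best, mean_utility


if __name__ == "__main__":
    best, means = most_robust_rate([0.01, 0.02, 0.05, 0.1, 0.2])
    print("most robust fixed rate in this toy model:", best)
```

Every candidate is scored on the same sampled worlds, so differences in mean utility come from the policy rather than from sampling noise. As with the real results, more samples give a more reliable ranking.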
Caption: Fixed spending rate. See here for the distributions of utility for each spending rate.
Caption: Simple - two regime - spending rate
Caption: The results from a simple optimiser[4], when allowing for four spending regimes: 2022-2027, 2027-2032, 2032-2037 and 2037 onwards. This result should not be taken too seriously: more samples should be used, the optimiser should run for a greater number of steps, and more intervals should be used. As with other results, this is contingent on the distributions of parameters.
Caption: An example real interest function, cherry-picked to show how our capital can go down significantly. See here for 100 unbiased samples of the function.
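A four-regime schedule like the one in this caption can be represented as a piecewise-constant function of the year. The sketch below is only an illustration of that representation: the regime boundaries come from the caption, while the function name and the example rates are hypothetical and are not the optimiser's output.

```python
from bisect import bisect_right

# Regime start years taken from the caption; the last regime is open-ended.
REGIME_STARTS = [2022, 2027, 2032, 2037]


def spending_rate(year, rates, starts=REGIME_STARTS):
    """Return the spending rate that applies in `year`.

    `rates[i]` applies from `starts[i]` up to, but not including,
    `starts[i + 1]`; the final rate applies from the last start year onwards.
    """
    if len(rates) != len(starts):
        raise ValueError("need exactly one rate per regime")
    if year < starts[0]:
        raise ValueError("year is before the first regime")
    return rates[bisect_right(starts, year) - 1]


if __name__ == "__main__":
    example_rates = [0.04, 0.06, 0.08, 0.05]  # hypothetical values
    for year in (2022, 2026, 2027, 2040):
        print(year, spending_rate(year, example_rates))
```

An optimiser over such schedules only needs to search over one number per regime, which is why adding more intervals makes the search more expensive.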
Caption: Example probability-of-success functions. The filled circle indicates the current preparedness and probability of success.
Caption: Example competition functions. They all pass through (2022, 1) since the competition function is the relative cost of one unit of influence compared to the current cost.
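The reason every curve passes through (2022, 1) follows directly from the definition: dividing any cost curve by its own 2022 value gives 1 at 2022. A minimal sketch, assuming a hypothetical absolute cost curve (`example_unit_cost` and its growth rate are made up for illustration and are not taken from the model):

```python
import math


def make_competition_function(unit_cost, reference_year=2022):
    """Turn an absolute cost-per-unit-of-influence curve into a relative one,
    normalised so that the reference year maps to exactly 1."""
    base = unit_cost(reference_year)
    if base <= 0:
        raise ValueError("reference cost must be positive")

    def competition(year):
        return unit_cost(year) / base

    return competition


def example_unit_cost(year):
    """Hypothetical cost curve: 10% continuous growth per year from 2022."""
    return 3.0 * math.exp(0.1 * (year - 2022))


if __name__ == "__main__":
    competition = make_competition_function(example_unit_cost)
    print(competition(2022))  # relative cost in the reference year
    print(competition(2032))  # relative cost ten years later
```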
This short extension started due to a conversation with David Field and a comment from Vasco Grilo; I’m grateful to both for the suggestion.
Inputs that are not considered include: historic spending on research and influence, and the rate at which the real interest rate changes. The post-fire-alarm returns are considered to be the same as the pre-fire-alarm returns.
And supposing a 50:50 split between spending on research and influence
This notebook is less user-friendly than the notebook used in the main optimal spending result (though not user-unfriendly) - let me know if improvements to the notebook would be useful for you.
The intermediate steps of the optimiser are here.
Adjacent to interstice's comment about trade with neighbouring branches, if the AI is sufficiently updateless (i.e. it is reasoning from a prior where it thinks it could have human values) then it may still do nice things for us with a small fraction of the universe.
Johannes Treutlein has written about this here.
Yep! I have the same intuition
Nice! I look forward to seeing this. I did a similar analysis, considering both SIA + no simulations and SIA + simulations, in my work on grabby aliens.