You want to learn an embedding of the opportunities you have in a given state (or for a given state-action), rather than just its potential rewards. Rewards are too sparse of a signal.
More formally, let's say instead of the Q function, we consider what I would call the Hope function: which given a state-action pair (s, a), gives you a distribution over states it expects to visit, weighted by the rewards it will get. This can still be phrased using the Bellman equation:
Hope(s, a) = rs' + f Hope(s', a')
The "successor representation" is somewhat close to this. It encodes the distribution over future states a partcular policy expects to visit from a particular starting state, and can be learned via the Bellman equation / TD learning.
On reflection these were bad thresholds, should have used maybe 20 years and a risk level of 5%, and likely better defined transformational. The correlation is certainly clear here, the upper right quadrant is clearly the least popular, but I do not think the 4% here is lizardman constant.
Wait, what? Correlation between what and what? 20% of your respondents chose the upper right quadrant (transformational/safe). You meant the lower left quadrant, right?
Very surprised there's no mention here of Hanson's "Foom Liability" proposal: https://www.overcomingbias.com/p/foom-liability
I appreciate that you are putting thought into this. Overall I think that "making the world more robust to the technologies we have" is a good direction.
In practice, how does this play out?
Depending on the exact requirements, I think this would most likely amount to an effective ban on future open-sourcing of generalist AI models like Llama2 even when they are far behind the frontier. Three reasons that come to mind:
If we are assessing the impact of open-sourcing LLMs, it seems like the most relevant counterfactual is the "no open-source LLM" one, right?
Noted! I think there is substantial consensus within the AIS community on a central claim that the open-sourcing of certain future frontier AI systems might unacceptably increase biorisks. But I think there is not much consensus on a lot of other important claims, like about for which (future or even current) AI systems open-sourcing is acceptable and for which ones open-sourcing unacceptably increases biorisks.
(explaining my disagree reaction)
The open source community seems to consistently assume the case that the concerns are about current AI systems and the current systems are enough to lead to significant biorisk. Nobody serious is claiming this
I see a lot of rhetorical equivocation between risks from existing non-frontier AI systems, and risks from future frontier or even non-frontier AI systems. Just this week, an author of the new "Will releasing the weights of future large language models grant widespread access to pandemic agents?" paper was asserting that everyone on Earth has been harmed by the release of Llama2 (via increased biorisks, it seems). It is very unclear to me which future systems the AIS community would actually permit to be open-sourced, and I think that uncertainty is a substantial part of the worry from open-weight advocates.
Note that the outlook from MIRI folks appears to somewhat agree with this, that there does not exist an authority that can legibly and correctly regulate AI, except by stopping it entirely.
Left the following comment on the blog: