Usage of "nearcasting" here feels pretty fake. "Nowcasting" is a thing because 538/meteorology/etc. has a track record of success in forecasting and decent feedback loops, and extrapolating those a bit seems neat.
But as used in this case, feedback loops are poor, and it just feels like a different analytical beast. So the resemblance to "forecasting" seems a bit icky, particularly if you are going to reference "nearcasting" without explanation it in subsequent posts: <https://ea.greaterwrong.com/posts/75CtdFj79sZrGpGiX/success-without-dignity-a-nearcasting-story-of-avoiding>.
I spent a bit thinking about a replacement term, and I came up with "scenario planning absent radical transformations analysis", or SPARTA for short. Not perfect, though.
Would someone be able to clarify the difference between the term HFDT as used here and in the original "Takeover" post, and RLHF?
My understanding is that HFDT doesn't assume an RL model.
This is the first in a series of pieces taking a stab at dealing with a conundrum:
It seems to me that in order to more productively take actions (including making more grants), we need to get more clarity on some crucial questions such as “How serious is the threat of a world run by misaligned AI?” But it’s hard to answer questions like this, when we’re talking about a development (transformative AI) that may take place some indeterminate number of decades from now.
This piece introduces one possible framework for dealing with this conundrum. The framework is AI strategy nearcasting: trying to answer key strategic questions about transformative AI, under the assumption that key events (e.g., the development of transformative AI) will happen in a world that is otherwise relatively similar to today's. One (but not the only) version of this assumption would be “Transformative AI will be developed soon, using methods like what AI labs focus on today.”
The term is inspired by nowcasting. For example, the FiveThirtyEight Now-Cast projects “who would win the election if it were held today,” which is easier than projecting who will win the election when it is actually held. I think imagining transformative AI being developed today is a bit much, but “in a world otherwise relatively similar to today’s” seems worth grappling with.
Some potential benefits of nearcasting, and reservations
A few benefits of nearcasting (all of which are speculative):
As with nowcasting, nearcasting can serve as a jumping-off point. If we have an idea of what the best actions to take would be if transformative AI were developed in a world otherwise similar to today’s, we can then start asking “Are there particular ways in which we expect the future to be different from the nearer term, that should change our picture of which actions would be most helpful?”
Nearcasting can also focus our attention on the scenarios that are not only easiest to imagine concretely, but also arguably “highest-stakes.” Worlds in which transformative AI is developed especially soon are worlds in which the “nearcasting” assumptions are especially likely to hold - and these are also worlds in which we will have especially little time to react to crucial developments as they unfold. They are thus worlds in which it will be especially valuable to have thought matters through in advance. (They are also likely to be worlds in which transformative AI most “takes the world by surprise,” such that the efforts of people paying attention today are most likely to be disproportionately helpful.)
Nearcasting might give us a sort of feedback loop for learning: if we do nearcasting now and in the future, we can see how the conclusions (in terms of which actions would be most helpful today) change over time, and perhaps learn something from this.
A major reservation about nearcasting (as an activity in general, not relative to forecasting) is that people are arguably quite bad at reasoning about future hypothetical scenarios,1 and most of the most impressive human knowledge to date arguably has relied heavily on empirical observations and experimentation. AI strategy nearcasting seems destined to be vastly inferior to e.g. good natural sciences work, in terms of how much we can trust its conclusions.
I think this reservation is valid as stated, and I expect any nearcast to look pretty silly in at least some respects with the benefit of hindsight. But I still think nearcasting (and, more generally, analyzing hypothetical future scenarios with transformative AI2) is probably under-invested in today:
I think part of the challenge of this kind of work is having reasonable judgment about which aspects of a hypothetical scenario are too specific to place big bets on, vs. which aspects represent relatively robust themes that would apply to many possible futures.
The nearcast I’ll be discussing
Starting here, and continuing into future pieces, I’m going to lay out a “nearcast” that shares many (not all) assumptions with this piece by Ajeya Cotra (which I would highly recommend reading in full if you’re interested in the rest of this series), hereafter abbreviated as “Takeover Analysis.”
I will generally use the present tense (“in my scenario, the alignment problem is difficult”) when describing a nearcasting scenario, and since the whole story should be taken in the spirit of speculation, I’m going to give fewer caveats than I ordinarily would - I will often say something like “the alignment problem is difficult” when I mean something like “the alignment problem is difficult in my most easily-accessible picture of how things would go” (not something like “I know with confidence that the alignment problem will be difficult”).
The key properties of the scenario I’ll be considering are:
I note that the combination of “Changes compared to today’s world are fairly minimal” with “Transformative AI is knowably near” implies that a fast takeoff is approaching: a very rapid transition from a world like today’s to a radically different world. This is (in my view) a key way in which reality is likely to differ from the scenario I’m describing.5
I still think this scenario is worth contemplating, for the following reasons:
Next in series
Footnotes
Though this is something I hear claimed a lot without necessarily much evidence; see my discussion of the track record of futurists here. ↩
E.g., see this piece, this piece, and Age of Em. ↩
For example, I generally feel better about predictions about technological developments (what will be possible at a particular price) than predictions about what products will be widely used, how lifestyles will change, etc.
Because of this, I think “Extreme technology X will be available, and it seems clear and obvious that the minimum economic consequences of this technology or any technology that can do similar things would be enormous” is in many ways a better form of prediction than “Moderate technology Y will be available, and it will be superior enough to its alternatives that it will be in wide use.” This is some of why I tend to feel better about predictions about transformative AI (which I’d say are going out on a massive limb re: what will be possible, but less of one re: how significant such a thing would be) than about predictions of self-driving cars (which run into lots of questions about how technology will interact with the regulatory and economic environment). ↩
Some readers have objected to the “trial and error” term, because they interpret it as meaning something like: “trying every possible action until something works, with no system for ‘learning’ from mistakes.” This is not how AI systems are generally trained - their training uses stochastic gradient descent, in which each “trial” comes with information about how to adjust an AI system to do better on similar inputs in the future (a toy illustration of this contrast appears after these footnotes). But I use the “trial and error” term because I think it is intuitive, and because I think the term generally is inclusive of this sort of thing (e.g., I think Wikipedia’s characterization of the term is quite good, and includes cases where each “trial” comes with information about how to do better in the future). ↩
To be clear, I expect that the transition to transformative AI will be much faster than the vast majority of people imagine, but I don’t expect it will be quite as fast as implied here. ↩
To my knowledge, most existing discussion of the alignment problem - see links at the bottom of this section for examples - focuses on abstract discussions of some of the challenges presented by aligning any AI system (things like “It’s hard to formalize the values we most care about”). Takeover Analysis instead takes a nearcasting frame, and walks mechanically through what it might look like to develop powerful AI with today’s methods. It discusses mechanically how this could lead to an AI takeover - as well as why the most obvious methods for diagnosing and preventing this problem seem like they wouldn’t work. ↩
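To make the distinction in footnote 4 concrete, here is a minimal, purely illustrative Python sketch (not from the original post; the toy loss function, the target value of 3.0, and the learning rate are all invented for illustration). It contrasts blind trial and error, which only keeps the best guess seen so far, with gradient descent, where each “trial” also returns a direction for adjusting the parameter to do better on the next attempt.

```python
# Illustrative sketch only: a one-parameter toy problem contrasting
# pure trial and error with gradient descent.
import random

def loss(w):
    """Toy objective: squared distance of parameter w from the target 3.0."""
    return (w - 3.0) ** 2

def grad(w):
    """Derivative of the loss: the extra 'how to adjust' information each trial provides."""
    return 2 * (w - 3.0)

# Pure trial and error: sample random parameters, keep the best one seen so far.
best_w, best_loss = None, float("inf")
for _ in range(100):
    w = random.uniform(-10, 10)
    if loss(w) < best_loss:
        best_w, best_loss = w, loss(w)

# Gradient descent: every evaluation also says which way to move w next time.
w, lr = 0.0, 0.1
for _ in range(100):
    w -= lr * grad(w)

print(f"random search best w: {best_w:.3f}; gradient descent w: {w:.3f}")
```

In real deep learning training the same idea applies at vastly larger scale: the gradient is computed over millions or billions of parameters via backpropagation, which is the sense in which each “trial” carries information about how to do better on similar inputs in the future.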