I mainly post on the EA Forum.
And many readers can no doubt point out many non-trivial predictions that Drexler got right, such as the idea that we will have millions of AIs, rather than just one huge system that acts as a unified entity. And we're still using deep learning as Drexler foresaw, rather than building general intelligence like a programmer would.
One of the simpler and more important lessons one learns from research on forecasting: be wary of evaluating someone’s forecasting skill by drawing up a list of predictions they got right and wrong—their “track record.” One should compare Drexler’s performance against alternative methods/forecasters (especially for a forecast like “we’re still using deep learning”). I’m not saying this is nothing, but I felt compelled to highlight this given how often I’ve seen this potential failure mode.
I feel like this is a good example of a post that—IMO—painfully misses the primary objection of many people it is trying to persuade (e.g., me): how can we stop 100.0% of people from building AGI this century (let alone in future centuries)? How can we possibly ensure that there isn’t a single person over the next 200 years who decides “screw it, they can’t tell me what to do,” and builds misaligned AGI? How can we stop the race between nation-states that may lack verification mechanisms? How can we identify and enforce red lines while companies are actively looking for loopholes and other ways to push the boundaries?
The point of this comment is less to say "this definitely can't be done" (although I do think such a future is fairly implausible/unsustainable), and more to ask "why did you not address this objection?" You probably ought to have a dedicated section that very clearly addresses this objection in detail. Without such a clearly signposted section, I felt like I mostly wasted my time skimming your article, to be entirely honest.
To go a step further, I think it's important for people to recognize that you aren't necessarily just representing your own views; poorly articulated views on AI safety could seriously undermine the efforts of many people who are trying to persuade important decision-makers of these risks. I'm not saying to "shut up," but I think people need to at least be more careful with regard to quotes like the one I provided above—especially since that last bullet point wasn't even necessary to get across the broader concern (and, in my view, it was wrong insofar as it tried to legitimize the specific claim).
Setting aside all of my broader views on this post and its content, I want to emphasize one thing:
But in the last few years, we’ve gotten:
[...]
- AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark, for
I think that this is painfully overstated (or at best, lacks important caveats). But regardless of whether you agree with that, I think it should be clear that this does not send signals of good epistemics to many of the fence-sitters[1] you'd presumably like to persuade.
(Note: Sen also addresses the above quote in a separate comment, but I didn't feel his point and tone were similar to mine, so I wanted to comment separately.)
I would probably consider myself in this category. Note, however, that I am not just talking about skeptics who are very unlikely to change their views.
In short, surveillance costs (e.g., "make sure they aren't plotting against you and won't try detonating a nuke or just starting a forest fire out of spite") might be higher than the costs of simply killing the vast majority of people. Of course, there is some question about whether it might consider it worthwhile to study some 0.00001% of humans locked in cages, but again, that might involve significantly higher costs than simply learning how to recreate humans from scratch alongside everything else it learns about the world.
But I'll grant that I don't know how an AGI would think or act, and I can't definitively rule out the possibility, at least within the first 100 years or so.
Day 5 of forced writing with an accountability partner!
Leverage wrote a report on "argument mapping" in the early 2010s and published the findings in 2020. I am very interested in "argument mapping"[1] for tough analytical problems like AI policy, and multiple people have directed me to this report when I bring up the topic. I think this report raises some important points, but its findings are probably flawed—or at the very least, people reading the report probably derive an overly pessimistic view of "argument mapping" as a whole, especially given that the evaluation metrics are strange.[2]
Rather than focus on where I agree with the report, in this shortform I will just briefly outline some of the qualms I have with this report. I do not consider these rebuttals definitive—I recognize that there may be more to the research than I can see—but I could not easily determine if/how the report responds to some of these criticisms (which has notable irony to it). Some of these objections include:
This term is painfully broad and, as Leverage demonstrates, is often used to refer to methods I would not endorse, such as when they try to create deductive arguments or otherwise lean heavily on formal logic. However, in lieu of a better term at the moment, I will continue referring to "argument mapping" in scare quotes.
Thus, it might be possible to claim that the report was accurate in its findings, but that the problem simply comes from misinterpretation. I think that the scope itself was problematic and undesirable, but in this shortform I will reserve deeper judgments on the matter.
I couldn’t quickly verify whether the report used alternative terms to get at this idea, but I don’t recall seeing this on previous occasions when I half-skimmed-half-read the report...
TAI seems like a partially good example for illustrating my point: I agree that it's crucial for people to have the same thing in mind when debating about TAI in a discussion, but I also think it's important to recognize that the goal of the discussion is (probably!) not "how should everyone everywhere define TAI?" and instead is probably something like "when will we first see 'TAI'?" In that case, you should just choose whichever definition of TAI makes for a good, productive discussion, rather than trying to forcefully hammer out "the definition" of TAI.
I say partially good, however, because thankfully the term TAI has not yet taken deep historical root in people's minds and in dictionaries, so I think (hope!) most people accept that there is not "a (single) definition."
Words like "science," "leadership," "Middle East," and "ethics," however... not the same story 😩🤖
Day 4 of forced writing with an accountability partner!
The Importance (and Potential Failure) of "Pragmatism"[1] in Definitional Debates
In various settings, whether it's competitive debate, the philosophy of leadership class I took in undergrad, or the ACX philosophy of science meet-up I just attended, it's common for people to engage in definitional debates. For example, what is "science?" What is "leadership?" These questions touch on some nerves with people who want to defend or challenge the general concept in question, and this drives people toward debating about "the right" definitions—even if they don't always say it that way. In competitive debate, debaters will sometimes explicitly say that their definition is the "right" definition, while in other cases they may say their definition is "better" with a clear implication that they mean "more correct" (e.g., "our dictionary/source is better than yours").
My initial (hot?) takes here are twofold:
First, when you find yourself in a muddy definitional debate (and you actually want to make progress), stop running on autopilot where you debate about whose definitions are "correct," and focus instead on the pragmatic question: which definition is more helpful for answering specific questions, solving specific problems, or generally facilitating better discussion? Instead of getting stuck on abstract definitions, tailor the definition to the purpose of the discussion. For example, if you're trying to run a study on the effects of individual "leadership" on business productivity, you should make sure anyone reading the study knows how you operationalized that variable (and clearly warn readers not to misinterpret it). Similarly, if you're judging a competitive debate, I've written about the importance of "debate theory[2] which makes debate more net beneficial," rather than blindly following norms or rules even in the face of loopholes or nonsense. In short, figure out what you're actually optimizing for and optimize for that, with the recognition that it may not be some abstract (and perhaps purely nonexistent) notion of "correctness." (As an addendum: regardless of whether this seems obvious when written down, in practice it just isn't obvious to people in so many discussions I've been in; autopilot is subtle and powerful.)
Second, sometimes the first point is misleading and you should reject it and run on autopilot when it comes to definitions. As much as I liked Pragmatism [read: Consequentialism?] as a unifying, bedrock theory of competitive debate, I acknowledged that even Pragmatism could theoretically say "don't always think in terms of Pragmatism" and instead advocate defaulting to principles like "follow the rules unless there is abundantly clear reason not to." Maybe there is no perfect definition of things like "elephant," but the definitions that exist are good enough for most conversations that you shouldn't interrupt discussions and break out the Pragmatism argument to defend someone who starts saying that warthogs are elephants. So-called "Utilitarian calculus," even in its mild forms, can easily be outperformed by rules of thumb and heuristics: humans are imperfect (e.g., we aren't perfectly unitary in our own interests) and might be subject to self-deception/bias, and all computational systems face constraints on data collection and computation (along with communication bandwidth and other capacity for enacting plans). To oversimplify and make a nod to Kahneman's System 1 vs. System 2 concept, I posit that humans operate in cluster-y "modes of thought," and it's hard to actually optimize in the spaces between those modes. Thus, it's sometimes better to just default to regular conversational autopilot regarding the abstract "correctness" of definitions when the "rightness factor" in a given context is something like 0.998 (unless you are trying to focus on the 0.002 exception case).
I don't have the time or brainpower to go into greater detail on the synthesis of these two points, but I think they ought to be highlighted.
[Update, 3/29/23: I meant to clarify that I realize "Pragmatism" is an actual label that some people use to refer to a philosophical school of thought, but I'm not using it in that way here.]
I use the term "debate theory" in a broad sense that includes questions like "how to decide which definitions are better." More generally, I would probably describe it as "meta-level arguments about how people—especially judges—should evaluate something in debate, such as whether some type of argument is 'legitimate.'"
Day 3 of writing with an accountability partner!
In my previous shortform, I introduced Top God Alignment, a foolproof gimmick alignment strategy that is basically “simulation argument + Pascal’s Wager + wishful chicanery.” In this post I will address some of the objections I’ve already heard, expect other people have, or have thought of myself.
(Note: Nate Soares was simply unoccupied in a social setting when I asked him this question.)
In hindsight, this seems quite obviously wrong, and efforts to extend more olive branches seem like they would have obviously been better, even if only to legibly demonstrate that safetyists attempted to play nice.