Review

"Computers don't have any sense of aesthetics or patterns that are standard the way people learn how to play chess. They play what they think is the objectively best move in any position, even if it looks absurd, and they can play any move no matter how ugly it is." - Murray Campbell, on Deep Blue

 

Vinge's principle states: "we usually think we can't predict exactly what a smarter-than-us agent will do, because if we could predict that, we would be that smart ourselves".

A popular interpretation is that this means the AGI would invent and use new technology, such as nanorobotics, to defeat us (this is the example Yudkowsky usually gives).

However, this doesn't seem to jibe with what happens in other domains where AI becomes superhuman. Usually what the AI does is understandable to humans. It's just that it looks, well, dumb.

For example, in chess, computers use roughly the same piece evaluation that humans discovered in the 18th century, didn't discover any new openings, and generally seemed to play ugly moves. But they won anyway.
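For concreteness, here's a minimal sketch of the kind of material count classical engines build on; it's my own toy illustration, not code from any particular engine, and the board representation (a flat list of piece letters) is a deliberately simplified assumption:

```python
# A minimal sketch of the classical material count most engines start from.
# Piece values (pawn=1, knight=3, bishop=3, rook=5, queen=9) are the same
# rough numbers human players have used since the 18th century.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def material_eval(board):
    """Return the material balance from White's point of view.

    `board` is assumed to be an iterable of piece letters, uppercase for
    White and lowercase for Black (a simplification; real engines add
    positional terms and deep search on top of this count).
    """
    score = 0
    for piece in board:
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

# Example: White has an extra pawn.
print(material_eval(["K", "Q", "P", "P", "k", "q", "p"]))  # -> 1
```

Real engines layer positional heuristics and enormous amounts of search on top of this, but the core point-count is the same one humans have used for centuries.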

If something like nanorobotics lets you take over the world, you'd expect some human group to already be trying to create it for that purpose, because the plan seems to make sense. In reality, any plan that (for example) relies on DNA as a stepping stone will quickly run into regulatory problems.

Instead, I imagine that the AGI's plan will elicit similar reactions as the following:

You get a phone call advertising free mayonnaise! You just need to follow a couple of simple steps. The next day, you're confused and in some sort of mayonnaise cult, breaking into a military armory in Mexico.

Is this plan something that humans could try? Sure, it seems pretty straightforward to attempt. So why haven't we tried it? Because it seems, and likely is, dumb. Why mayonnaise? Why a phone call? Why Mexico?

But if AGI is similar to other superhuman AI, this is the type of thing we should expect to see: a strategy that looks dumb but works. We have no way to predict which dumb strategy will be used, but given the large number of strategies that look dumb to humans, the AGI's strategy is likely to be one of them. And it has enough Yomi to predict which one will succeed.


I think this extrapolates far from one example and I'm not sure the example applies all that well. 

Old engines played ugly moves because of their limitations, not because playing ugly moves is a superpower. They won anyway because humans cannot out-calculate engines.

AlphaZero plays beautiful games, and even today's standard engines don't play ugly or dumb-looking moves anymore. I think in the limit superior play will tend to be beautiful and elegant.

If there is a parallel between early superhuman chess and AGI takeover, it will be that the AGI uses less-than-brilliant strategies that still work because of flawless, or at least vastly superhuman, execution. But these strategies will not look dumb or incomprehensible.

I'd provide a counterexample analogy: speedruns.

Many high-level speedruns (and especially TAS runs) often look like some combination of completely stupid/insane/incomprehensible to casual players. Nevertheless, they work for the task they set out to do far more effectively than trying to beat the game quickly with "casual strats" would get you.

I think seeing a sufficiently smart AI doing stuff in the real world would converge to looking a lot like that from our POV.

Counterpoint while working within the metaphor: early speedruns usually look like exceptional runs of the game played casually, with a few impressive/technical/insane moves thrown in.

Counterpoint: such strategies typically require lots of iteration with perfect emulation of the targeted system to develop (I'm thinking in particular of glitch exploitation). Robust strategies might appear more "elegant."

This seems like a generalization of something that humans are also guilty of. The way we win against other animals also can look kind of dumb from the perspective of those animals.

Suppose you're a cheetah. The elegant, smart way to take down prey is to chase it down in a rapid sprint. The best takedowns are the ones where you artfully outmaneuver your prey and catch it right at the moment it thinks it is successfully evading you.

Meanwhile you look on humans with disdain. They can take down the same prey as you, but they do it in dumb ways. Sometimes they throw things at the prey to kill it. Other times they run it down until it collapses from heat exhaustion. That works, you have to admit, but it seems ugly and inelegant.

Because we have greater general intelligence we can try strategies that other animals can't. These strategies look really weird to them on their own values. I read your point as saying we're in a similar relationship to AGI: it will be able to do things we can't and so will sometimes solve problems in weird, inelegant ways that nonetheless work.

When humans choose their actions, they often think about the impact those actions will have on their status. They often don't play to win; they play to impress the observers. (Yes, winning is impressive, but if winning using certain moves is more impressive than winning using some other moves, many will choose the former even if the latter has a higher probability of success.) AI would not care about status if it expects that humans will soon be dead. AI would not over-complicate things when not necessary, because it is not trying to signal its superhuman intelligence to a hypothetical observer.

For example, people keep making fun of the "Nigerian prince" scams, but they continue to exist, because apparently they work. Who knows, maybe the same technology can be used to destroy humanity. Like, send everyone an SMS at the same time, asking them to follow your commands, and promising them millions of dollars if they obey. Ask for something simple and harmless first to train compliance, then ask them to do something such that if 1 person in 1000 does it, the civilization will collapse. Maybe 1 in 1000 will actually do it.

(Among other reasons, this plan sounds stupid, because the phone operators could trivially stop it by blocking the SMS functionality for a while. Yeah, but maybe if you approach everyone at the same time and the whole action takes less than an hour, they won't react quickly enough. In hindsight, it will be obvious that disabling SMS quickly would have been the right move, but at the moment... it will seem just like a weird prank, and disabling SMS will seem like a very serious move with possible impact on profits that requires approval of the important people; and if that happens on a weekend, people will hesitate to bother the important ones.)

Also, you only need to destroy humanity once, so if you try a dozen stupid plans in parallel, even if each of them is more likely to fail than to succeed...

I think when we say that an adversarial attack is "dumb" or "stupid," what we are really implying is that the hack itself is quite clever but it exploits a feature that is dumb or stupid. There are probably a lot of unknown-to-us features of the human brain that have been hacked together by evolution in some dumb, kludgy way that AI will be able to take advantage of, so your example above is actually an example of the AI being brilliant and us humans being dumb. But I get what you are saying: the whole situation would indeed seem "dumb" if AI were able to hack us like that.

This reminds me of a lecture the 8-Bit Guy did on phone phreaking in the 1980s, "How Telephone Phreaking Worked." Some of those tricks do indeed seem "dumb," but it's dumb more in the sense that the telephone network was designed without sufficient forethought: it was susceptible to someone blowing a toy whistle from a Cap'n Crunch cereal box that just happened to produce the right 2600 Hz tone to trick the network into registering a call as a toll-free 1-800 call. The hack itself was clever, but the design it was preying upon, and the overall situation, were kinda dumb.

Solving for "a viable attack, maximum impact" given an exhaustive list of resources and constraints seems like precisely the sort of thing GPT-4-level AI can solve with aplomb when working hand in hand with a human operator. Take the example of shooting a substation: humans could probably work this out in a workshop-style discussion with some Operations Research principles applied, but I assume the type of people who want to do those things probably don't operate in such functional and organized ways. When they do, it seems to get very bad.

The LLM can easily supply cross-domain knowledge and think within constraints. With a bit of prompting and brainstorming, one could probably come up with a dozen viable attacks in a few hours. So the lone bad actor doesn't have to assemble a group of five or six people who are intelligent, perhaps educated, and also want to carry out an attack. I suspect the only reason people aren't already prompting for such methods and then setting up automation of them is the existence of guardrails. When truly open-source LLMs get to GPT-4.5 capability, with good interfaces to the internet and other software tools (such as phones), we may see a lot of trouble. Fewer people would have the drive and intellect needed (at least early on) to carry out such an attack, but those few could cause very outsized trouble.

TL;DR:  The "Fun" starts waaaaaaay before we get to AGI.

As I think more about this, the LLM as a collaborator alone might have a major impact. Just off the top of my head, a kind of Rube Goldberg attack might be <redacted for info hazard>. Thinking about it in one's isolated mind, someone might never consider carrying something like that out. Again, I am trying to model the type of person who carries out a real attack, and I don't estimate that person having above-average levels of self-confidence. I suspect the default is to doubt themselves enough to avoid acting, in the same way most people do about their entrepreneurial ideas.

However, if they either presented it to an LLM for refinement, or the LLM suggested it, there could be just enough of a psychological boost of validity to push them over the edge into trying it. And after a few successes make the news, of "dumb" or "bizarre" or "innovative" attacks succeeding because "AI told these people how to do it," the effect might get even stronger.

To my knowledge, one has been able to buy an AR-15 since the mid-to-late 1970s. My cousin has a Colt from 1981 that he bought when he was 19. Yet people weren't mass shooting each other, even during times when the overall crime/murder rate was higher than it is now. Some confluence of factors has driven the surge, one of them probably being a strong meme: "Oh, this actually tends to work." Basically, a type of social proof of efficacy.

And I am willing to bet $100 that the media will report big on the first few cases of "Weird Attacks Designed by AI."

It seems obvious to me that the biggest problems in alignment are going to be the humans, both long before the robots, and probably long after.

Thomas Griffiths' paper Understanding Human Intelligence through Human Limitations argues that the aspects we associate with human intelligence – rapid learning from small data, the ability to break down problems into parts, and the capacity for cumulative cultural evolution – arose from the 3 fundamental limitations all humans share: limited time, limited computation, and limited communication. (The constraints imposed by these characteristics cascade: limited time magnifies the effect of limited computation, and limited communication makes it harder to draw upon more computation.) In particular, limited computation leads to problem decomposition, hence modular solutions; relieving the computation constraint enables solutions that can be objectively better along some axis while also being incomprehensible to humans: 

A key attribute of human intelligence is being able to break problems into parts that can individually be solved more easily, or that make it possible to reuse partial solutions discovered through previous experience. These methods for making computational problems more tractable are such a ubiquitous part of human intelligence that they seem to be an obligatory component of intelligence more generally. One example of this is forming subgoals. The early artificial intelligence literature, inspired by human problem-solving, put a significant emphasis on reducing tasks to a series of subgoals.

However, forming subgoals is not a necessary part of intelligence, it’s a consequence of having limited computation. With a sufficiently large amount of computation, there is no need to have subgoals: the problem can be solved by simply planning all the way to the final goal. 

Go experts have commented that new AI systems sometimes produce play that seems alien, precisely because it was hard to identify goals that motivated particular actions [13]. This makes perfect sense, since the actions taken by these systems are justified by the fact that they are most likely to yield a small expected advantage many steps in the future, rather than because they satisfy some specific subgoal.

Another example where human intelligence looks very different from machine intelligence is in solving the Rubik’s cube. Thanks to some careful analysis and a significant amount of computation, the Rubik’s cube is a solved problem: the shortest path from any configuration to an unscrambled cube has been identified, taking no more than 20 moves [45]. However, the solution doesn’t have a huge amount of underlying structure – those shortest paths are stored in a gigantic lookup table. Contrast this with the solutions used by human solvers. A variety of methods for solving the cube exist, but those used by the fastest human solvers require around 50 moves. These solutions require memorizing a few dozen to a few hundred “algorithms” that specify transformations to be used at particular points in the process. Methods also have intermediate subgoals, such as first solving an entire side.
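To make the contrast concrete, here's a toy sketch in Python (a made-up four-element puzzle of my own, not actual Rubik's cube or tablebase code): with enough computation you can precompute an exact distance-to-goal table and just walk downhill with no subgoals at all, while a computation-limited solver decomposes the problem into comprehensible subgoals and typically pays for it in extra moves.

```python
from collections import deque

# Toy puzzle standing in for the Rubik's cube example: a state is a
# permutation of (0, 1, 2, 3), the goal is the sorted tuple, and the legal
# moves are any adjacent swap or a rotation of the whole tuple.
GOAL = (0, 1, 2, 3)

def moves(state):
    for i in range(len(state) - 1):            # adjacent swaps
        s = list(state)
        s[i], s[i + 1] = s[i + 1], s[i]
        yield tuple(s)
    yield state[1:] + state[:1]                # rotate left
    yield state[-1:] + state[:-1]              # rotate right

# "Lookup table" approach: with enough computation, precompute the exact
# distance to the goal for every reachable state, then just walk downhill.
def build_table():
    dist = {GOAL: 0}
    queue = deque([GOAL])
    while queue:
        state = queue.popleft()
        for nxt in moves(state):
            if nxt not in dist:
                dist[nxt] = dist[state] + 1
                queue.append(nxt)
    return dist

TABLE = build_table()

def solve_with_table(state):
    path = [state]
    while state != GOAL:
        state = min(moves(state), key=TABLE.get)   # step to a closer state
        path.append(state)
    return path

# Subgoal approach: put position 0 right first, then position 1, and so on,
# using only adjacent swaps. Every step is easy to explain, but the overall
# solution is generally longer than the table-based one.
def solve_with_subgoals(state):
    path = [state]
    for pos in range(len(GOAL)):
        while state[pos] != GOAL[pos]:
            idx = state.index(GOAL[pos])           # bubble the wanted value left
            s = list(state)
            s[idx - 1], s[idx] = s[idx], s[idx - 1]
            state = tuple(s)
            path.append(state)
    return path

start = (3, 1, 0, 2)
print(len(solve_with_table(start)) - 1)      # 2 moves (optimal)
print(len(solve_with_subgoals(start)) - 1)   # 4 moves (comprehensible, longer)
```

The table-based solution is shorter but carries no human-readable structure; the subgoal solution is longer, but each step has an obvious purpose.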

(Speedruns are another relevant intuition pump.) 

This is why I don't buy the argument that "in the limit, superior strategies will tend to be beautiful and elegant", at least for strategies generated by AIs far less limited than humans are w.r.t. time, compute and communication. I don't think they'll necessarily look "dumb", just not decomposable into human working memory-sized parts, hence weird and incomprehensible (and informationally overwhelming) from our perspective. 

Since the topic of chess was brought up: I think the right intuition pump is endgame tablebases, not moves played by AlphaZero. A quote from Wikipedia about a KRN vs. KNN mate-in-262 discovered by an endgame tablebase:

Playing over these moves is an eerie experience. They are not human; a grandmaster does not understand them any better than someone who has learned chess yesterday. The knights jump, the kings orbit, the sun goes down, and every move is the truth. It's like being revealed the Meaning of Life, but it's in Estonian.


An AI, even if it's capable of manipulating the world beyond our understanding, will likely opt for non-chaotic plans over chaotic ones. A plan built on controlling a web of people via emotional manipulation is still subject to the chaos of random things happening to those people, differences in personality, the full entropy of the human experience. What is predictable is that an X-maximizer will seek plans that minimize chaos, so its preference among plans would not seem as weird to us.

I agree with this take, but do those plans exist, even in theory?

For a really good example of what I would consider a 'dumb' way for AGI misalignment to be problematic, I recommend Accelerando by Charles Stross. It's available in text/HTML form for free from his website. Even now, after 20 years, it's still very full of ideas.

(FYI, sections are about ten years apart in the book, but when I last read it, it seemed like the dates were off by a factor of two or so. E.g., 2010 in the book corresponds loosely to 2020 in real life, 2020 in the book corresponds loosely to 2040, etc.)

In that book, the badness largely comes from increasingly competent / sentient corporate management and legal software.

A useful model is Elon Musk. He has an unusual talent for making massively successful decisions that were seen as dumb by most experts at the time.

The list includes:

  • Launching a rocketry startup
  • Pursuing reusable rockets
  • Investing in a car startup
  • Pursuing electric sports cars
  • Launching a satellite internet service (too early to judge, but is starting to look like a genius move too)
  • Buying Twitter (same)

I don't feel very confident guessing his IQ, but I think it's safe to assume he's somewhere in the top 0.1% of all humans. 

Funnily enough, many people still call him dumb, even after SpaceX and Tesla (e.g. read any thread about him on Reddit). 

As Henry David Thoreau said,

We are apt to class those who are once-and-a-half-witted with the half-witted, because we appreciate only a third part of their wit.

Hyperloop? I am not sold on his talent being "find good things to do" as opposed to "successfully do things". And the second has a lot to do with energy/drive, not only intellect. Hence I expect his intelligence to be overestimated. But I agree with your estimate, which is not what I expected.

I don't find it surprising; 0.1% is a fairly low bar here on LW. I'm not considered that unusual here, and my calibrated guess is that I'm in the 0.3% category. There are a million people in the USA alone at that level, and three hundred thousand at 0.1%. That's a wide pool to select from.

Personally I wouldn't be surprised if Musk was substantially above the top 0.1%. I've seen a number of technical interviews with him; he and I have similar backgrounds and technical field strengths; and we are approximately the same age. I feel able to properly evaluate his competence, and I do not find it lacking.

Not sure about hyperloop. Judging by this list, the idea is gaining some traction across the world, but so far only as feasibility studies, test tracks etc. 

Seems to be a natural evolutionary step for high-speed ground transport, but no idea if it makes economic sense yet, and if it's technically feasible with the current tech. Maybe in 50 years...

I don’t think the hyperloop matters one way or the other to your original argument (which I agree with). Someone can be a genius and still make mistakes and fail to succeed at every single goal. (For another example, consider Isaac Newton who a) wasted a lot of time studying alchemy and still failed to transform lead into gold and b) screwed up his day job at the Royal Mint so badly that England ended up with a de facto gold standard even though it was supposed to have both silver and gold currency. He’s still a world-historic genius for inventing calculus.)


Alchemists still performed experiments on chemical reactions, discovered new ones and described them, and practiced separating substances, developing tools and methods for that purpose that were later used in chemistry. It's not like it was an inherent waste of time; it was a necessary stepping-stone to get to chemistry, which developed from it more gradually than is typically acknowledged.

Money is condensed optimisation. An AI which can generate money may be stupid, but it can buy clever things. In other words, we may not need something significantly cleverer than Bitcoin to take over the world.

Bitcoin has failed to destroy the Earth so far, but some combination of crypto money and LLMs could do it.

Another dumb but plausible way that AGI gets access to advanced chemicals, biotech, and machinery: someone asks "how do I make a lot of street drug X" and it snowballs from there.