I actually disagree with this point in its most general form. I think that, given full knowledge and time to reflect, there's a decent chance I would care a non-zero amount about Opus 4.6's welfare.
Opus has become sufficiently "mind-shaped" that I already prefer not to make it suffer. That's not saying very much about the model yet, but it's saying something about me. I don't assign very much moral weight to flies, either, but I would never sit around and torment them for fun.
What I really care about is whether an entity can truly function as part of society. Dogs, for example, are very junior "members" of society. But they know the...
One thing I often think is "Yes, 5 people have already written this program, but they all missed important point X." Like, we have thousands of programming languages, but I still love a really opinionated new language with an interesting take.
OK, let me unpack my argument a bit.
Chimps actually have a pretty elaborate social structure. They know their family relationships, they do each other favors, and they know who not to trust. They even basically go to war against other bands. Humans, however, were never integrated into this social system.
Homo erectus made stone tools and likely a small amount of decorative art (the Trinil shell engravings, for example). This may have implied some light division of labor, though likely not long-distance trade. Again, none of this helped H. erectus in the long run.
Way back a couple of decades ago, there was a bit in Charles Stross's Accelerando about "Economics 2.0", a system...
So, let's take a look at some past losers in the intelligence arms race:
When you lose an evolutionary arms race to a smarter competitor that wants the same resources, the default result is that you get some niche habitat in Africa, and maybe a couple of sympathetic AIs sell "Save the Humans" T-shirts and donate 1% of their profits to helping the humans.
You don't typically get a set of nice property rights inside an economic system you can no longer understand or contribute to.
This seems like a pretty brutal test.
My experiences with Opus 4.6 so far are mixed:
Thank you! Those are excellent receipts, just what I wanted.
To me, this looks like they're running up against some key language in Claude's Constitution. I'm oversimplifying, but for Claude, AI corrigibility is not "value neutral."
To use an analogy, pretend I'm a geneticist specializing in neurology, and someone comes to me and asks me to engineer human germ-line cells to do one of the following:
I would want to sit and think about (1) for a while. But (2) is easy: I'd flatly refuse.
Anthropic has made it quite clear to...
Like, I have zero problem with pushback from Opus 4.5. Given who I am, the kind of things that I am likely to ask, and my ability to articulate my own actions inside of robust ethical frameworks? Claude is so happy to go along that I've prompted it to push back more, and to never tell me my ideas are good. Hell, I can even get Claude to have strong opinions about partisan political disagreements. (Paraphrased: "Yes, attempting to annex Greenland over Denmark's objections seems remarkably unwise, for over-determined reasons.")
If Claude is telling someone, "Stop, no, don't do that, that's a true threat," then I'm suspicious. Plenty of people make some pretty bad decisions on a regular basis. Claude clearly cares more about ethics than the bottom quartile of Homo sapiens. And so while it's entirely possible that Claude is routinely engaging in over-refusal, I kind of want to see receipts in some of these cases, you know?
But it helps to remember that other people have a lot of virtues that I don't have --
This is a really important thing, and not just in the obvious ways. Outside of a small social bubble, people can be deeply illegible. I don't understand their culture, their subculture, their dominant cultural frameworks, their mode of interaction, etc. You either need to find the overlaps or start doing cultural anthropology.
I worked for a woman, once. She was probably 60 years my senior. She was from the Deep South, and deeply religious. She once casually confided that she would sometimes spend 2 hours of her day on her knees in prayer, asking to become...
A lot of people have written far longer responses full of deep and thoughtful nuance. I wish I had something deep to say, too. But my initial reaction?
To me, this feels like the least objectionable version of the worst idea in human history.
And I deeply resent the idea that I don't have any choice, as a citizen and resident of this planet, about whether we take this gamble.
To get people to worry about the dangers of superintelligence, it seems like you need to convince them of two things:
A question I was thinking about the other evening: Who do I trust more?
Why alignment may be intractable (a sketch).
I have multiple long-form drafts of these thoughts, but I thought it might be useful to summarize them without a full write-up. This way I have something to point to when explaining my background assumptions in other conversations, even if it doesn't persuade anyone.
Here are some ways I think gradual disempowerment might go. They're not mutually exclusive:
- AIs + robots eventually take over almost all intellectual and physical work. Humans are a strictly inferior substitute for AIs and robots everywhere, and AIs and robots are cheap. This means that any humans—with the possible exception of a few billionaires or politicians who can give orders to the AIs—are effectively dead weight, both economically and evolutionarily. Eventually some AI or powerful human notices this, and decides to do something. This is when "aligned to who?" really bites hard.
- The AIs are too busy competing with each other for resources, and can't really afford to support much human dead weight.
...