Yes, this is almost exactly it. I don't expect frontier LLMs to carry out a complicated, multi-step process and recover from obstacles.
I think of this as the "squirrel bird feeder test". Squirrels are ingenious and persistent problem solvers, capable of overcoming chains of complex obstacles. LLMs really can't do this (though Devin is getting closer, if demos are to be believed).
Here's a simple test: Ask an AI to open and manage a local pizza restaurant, buying kitchen equipment, dealing with contractors, selecting recipes, hiring human employees to serve ...
One thing we know about these models is that they're good at interpolating within their training data, and that they have seen enormous amounts of it. But they're weak outside that training distribution. They have a very different set of strengths and weaknesses than humans do.
And yet... I'm not 100% convinced that this matters. If these models have seen a thousand instances of self-reflection (or mirror test awareness, or whatever), and if they can use those examples to generalize to other forms of self-awareness, then might that still give them ve...
Come to think of it, how is it that humans pass the mirror test? There's probably a lot of existing theorizing on this, but a quick guess without having read any of it: babies first spend a long time learning to control their body, and then learn an implicit rule like "if I can control it by an act of will, it is me", getting a lot of training data that reinforces that rule. Then they see themselves in a mirror and notice that they can control their reflection through an act of will...
This is an incomplete answer since it doesn't explain how they learn to ...
I think veto powers as part of a system of checks and balances are good in moderation, but add too many of them and you end up with a stalemate.
Yes, there's actually some research into this area: https://www.jstor.org/stable/j.ctt7rvv7 "Veto Players: How Political Institutions Work". The theory apparently suggests that if you have too many "veto players", your government quickly becomes unable to act.
And I suspect that states which are unable to act are vulnerable to major waves of public discontent during perceived crises.
Rather, people who suck at programming (and thus can't get jobs) apply to way more positions than people who are good at programming.
I have interviewed a fair number of programmers, and I've definitely seen plenty of people who talked a good game but who couldn't write FizzBuzz (or sum the numbers in an array). And this was stacking the deck in their favor: They could use a programming language of their choice, plus a real editor, and if they appeared unable to deal with coding in front of people, I'd go sit on the other side of the office and let th...
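For concreteness, here's roughly the bar I'm describing, a minimal sketch in Python (candidates could use any language they liked; the names here are just illustrative):

```python
def fizzbuzz(n):
    # Print 1..n, replacing multiples of 3 with "Fizz", multiples of 5
    # with "Buzz", and multiples of both with "FizzBuzz".
    for i in range(1, n + 1):
        if i % 15 == 0:
            print("FizzBuzz")
        elif i % 3 == 0:
            print("Fizz")
        elif i % 5 == 0:
            print("Buzz")
        else:
            print(i)

def sum_array(xs):
    # Sum the numbers in a list with an explicit loop, the way a
    # whiteboard answer would.
    total = 0
    for x in xs:
        total += x
    return total

fizzbuzz(15)
print(sum_array([1, 2, 3, 4]))  # 10
```

If someone can't produce something like this with an editor and their favorite language, no amount of talking a good game makes up for it.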
It's surprising that it's taken this long, given how good public AI coding assistants were a year ago.
The way I explain this to people is that current LLMs can be modeled as having three parts:
1. The improv actor, which is amazing.
2. The reasoner, which is inconsistent but not totally hopeless at simple things.
3. The planner/execution/troubleshooting engine, which is still inferior to the average squirrel trying to raid a bird feeder.
Copilot is designed to rely on (1) and (2), but it is still almost entirely reliant on humans for (3). (GPT-4 Code ...
Making AIs wiser seems most important in worlds where humanity stays in control of AI. It’s unclear to me what the sign of this work is if humanity doesn’t stay in control of AI.
A significant fraction of work on AI assumes that humans will somehow be able to control entities which are far smarter than we are, and maintain such control indefinitely. My favorite flippant reply to that is, "And how did that work out for Homo erectus? Surely they must have benefited enormously from all the technology invented by Homo sapiens!" Intelligence is the ultimate f...
I suspect ChatGPT 4's weaknesses come from several sources, including:
You're asking good questions! Let me see if I can help explain what other people are thinking.
It doesn’t understand why “a pink flying sheep” is a language construct and not something that was observed in the real world.
When talking about cutting-edge models, you might want to be careful when making up examples like this. It's very easy to say "LLMs can't do X", when in fact a state-of-the-art model like GPT-4 can actually do it quite well.
For example, here's what happens if you ask ChatGPT about "pink flying sheep". It realizes that sheep are not supp...
I have a non-zero probability for ASI in my lifetime, and I treat it in the same fashion I would risks like "Maybe we all get killed by a magnetar crust quake a thousand light-years away" or "Maybe I die in a car accident." I plan as if worst won't come to worst, and I try to make better use of the time I have.
Conditional on actually building ASI, my P(doom) + P(pets) is greater than 95%, where P(pets) covers scenarios like "a vastly super-human AI keeps us around out of nostalgia or a sense of ethics, but we're not in control in any meaningful sense." Scenarios l...
If I had to summarize your argument, it would be something like, "Many people's highest moral good involves making their ideological enemies suffer." This is indeed a thing that happens, historically.
But another huge amount of damage is caused by people who believe things like "the ends justify the means" or "you can't make an omelette without breaking a few eggs." Or "We only need 1 million surviving Afghans [out of 15 million] to build a paradise for the proletariat," to paraphrase an alleged historical statement I read once. The people who say things l...
Yeah, the precise ability I'm trying to point to here is tricky. Almost any human (barring certain forms of senility, severe disability, etc.) can do some version of what I'm talking about. But as in the restaurant example, not every human could succeed at every possible example.
I was trying to better describe the abilities that I thought GPT-4 was lacking, using very simple examples. And it started looking way too much like a benchmark suite that people could target.
Suffice to say, I don't think GPT-4 is an AGI. But I strongly suspect we're only a couple of breakthroughs away. And if anyone builds an AGI, I am not optimistic we will remain in control of our futures.