
Sam Marks (21d)

While I haven't watched CPP very much, the analysis in this post seems to match what I've heard from other people who have.

That said, I think claims like

So, how's it doing? Well, pretty badly. Worse than a 6-year-old would

are overconfident about where the human baselines are. Moreover, I think these sorts of claims reflect a general blind spot about how humans can get stuck on trivial obstacles in the same way AIs do.

A personal anecdote: when I was a kid (maybe 3rd or 4th grade, so 8 or 9 years old) I played Pokemon Red and couldn't figure out how to get out of the first room, the same as the Claude 3.0 Sonnet performance! Why? Well, is it obvious to you where the exit to this room is?

r/pokemon - I got stuck in this room for hours since I couldn't figure out how to leave.

Answer: you have to stand on the carpet and press down.

Apparently this was a common issue! See this reddit thread for discussion of people who hit the same snag as me. In fact, it was a big enough issue that the developers addressed it in the FireRed remake, making the rug stick out a bit.

I don't think this is an isolated issue with the first room. Rather, I think that as railroaded as Pokemon might seem, there's actually a bunch of things that it's easy to get crucially confused about, resulting in getting totally stuck for a dumb reason until someone helps you out.

Some other examples of similar things from the same reddit thread:

"Viridian Forest for me. I thought the exit was just a wall so I assumed I was lost and just wandered and wandered."
"When I got Blue, I traveled all the way to Mt. Moon, and all of my party fainted, right? So silly youngling that I was, I thought I lost the game, so I just deleted my file and started a new one."
"In Sapphire/Ruby there was a bridge/bike path that you had to walk under. Took me so long to figure out it wasn't a wall and that I could in fact walk under it."

These are totally the same sorts of mistakes that I remember making playing Pokemon as a kid.

Further, have you ever gotten an adult who doesn't normally play video games to try playing one? They have a tendency to get totally stuck in tutorial levels because game developers rely on certain "video game motifs" for load-bearing forms of communication; see e.g. this video.

I don't think this is specific to video games: in most things I try to do, I run up against stupid, fake walls where there's something obvious that I just "don't get." Fortunately, I'm able to do things like ask someone for a fresh pair of eyes or search the internet. Without this ability, I think I would have to abandon basically all of the core things I work on. When I need to help out people with worse "executive function"/"problem solving ability" than me (like relatives who need basic tech help), usually the main thing I do to get them unstuck is "google their problem."

(As a more narrow point, I'm extremely dubious that the right way to interpret howlongtobeat's 26-hour number is as representing the time that it would take an average human to beat Pokemon Red, even assuming that the humans are adults and that we entirely discard failed playthroughs.)


ErickBall (19d)

Yeah but we train AIs on coding before we make that comparison. And we know that if you train an AI on a videogame it can often get superhuman performance. Here we're trying to look at pure transfer learning, so I think it would be pretty fair to compare to someone who is generally competent but has never played videogames. Another interesting question is to what extent you can train an AI system on a variety of videogames and then have it take on a new one with no game-specific training. I don't know if anyone has tried that with LLMs yet.

β-redex (19d)

I am not 100% convinced by the comparison, because technically LLMs are only "reading" a bunch of source code; they are never given access to a compiler/interpreter. IMO actually running the code one has written is a very important part of learning, and I think it would be a much more difficult task for a human to learn to code just by reading a bunch of books/code, but never actually trying to write & run their own code.[1]

Also, in the video linked earlier in the thread, the girlfriend playing Terraria is deliberately not given access to the wiki, and thus I believe it is an unfair comparison. I expect to see much better human performance if you give them access to manuals & wikis about the game.

Not sure either, but I agree that this would be an interesting experiment. (Human gamers are often much quicker at picking up new games and are much better at them than someone with no gaming background.)

[1] I would expect the average human to stay very bad at coding, no matter how many books & code examples you give them. I would also expect some smaller class of humans to nevertheless be able to pull that feat off. (E.g. maybe a mathematician well versed in formal logic, who is used to doing complex symbolic manipulation correctly "only on paper", could probably write non-trivial correct programs just by reading about the subject. In fact, a lot of stuff from computer science was worked out well before computers were built, e.g. Ada Lovelace is usually credited with writing the "first computer program", well before the first digital computer existed.)

ErickBall (19d)

I kind of see your point about having all the game wikis, but I think I disagree about learning to code being necessarily interactive. Think about what feedback the compiler provides you: it tells you if you made a mistake, and sometimes what the mistake was. In cases where it runs but doesn't do what you wanted, it might "show" you what the mistake was instead. You can learn programming just fine by reading and writing code but never running it, if you also have somebody knowledgeable checking what you wrote and explaining your mistakes. LLMs have tons of examples of that kind of thing in their training data.


So how well is Claude playing Pokémon?

by Julian Bradshaw

7th Mar 2025


Background: After the release of Claude 3.7 Sonnet,^[1] an Anthropic employee started livestreaming Claude trying to play through Pokémon Red. The livestream is still going right now.

TL;DR: So, how's it doing? Well, pretty badly. Worse than a 6-year-old would, definitely not PhD-level.

Digging in

But wait! you say. Didn't Anthropic publish a benchmark showing Claude isn't half-bad at Pokémon? Why yes they did:

A chart showing the performance of the various Claude Sonnet models at playing Pokémon. The number of actions taken by the AI is on the x-axis; the milestone reached in the game is on the y-axis. Claude 3.7 Sonnet is by far the most successful at achieving the game's milestones.

and the data shown is believable. Currently, the livestream is on its third attempt, with the first being basically just a test run. The second attempt got all the way to Vermilion City, finding a way through the infamous Mt. Moon maze and achieving two badges, so pretty close to the benchmark.

But look carefully at the x-axis in that graph. Each "action" is a full Thinking analysis of the current situation (often several paragraphs worth), followed by a decision to send some kind of input to the game. Thirty-five thousand actions means an absolutely enormous amount of thought. Even for Claude, who thinks much faster than a human, ten thousand actions takes it roughly a full working week of 40 hours,^[2] so that 3-badge run took Claude nearly the equivalent of a month of full-time work, perhaps 140 hours. Meanwhile, the average human can beat the entirety of Red in just 26 hours, and with substantially less thought per hour.
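The hours estimate in the paragraph above is a simple rate conversion. Here is a minimal Python sketch of that arithmetic, assuming the rate implied by footnote 2 (~12,000 actions about 48 hours into Run #3) stays constant across runs; the constant and function names are illustrative only, not anything from the stream's tooling.

```python
# Back-of-the-envelope conversion from stream "actions" to wall-clock hours.
# Assumption: the ~250 actions/hour rate implied by footnote 2
# (~12,000 actions about 48 hours into Run #3) holds for every run.

OBSERVED_ACTIONS = 12_000  # approximate actions taken ~48 hours into Run #3
OBSERVED_HOURS = 48        # approximate elapsed hours at that point
WORK_WEEK_HOURS = 40       # one full-time working week


def actions_per_hour(actions: float, hours: float) -> float:
    """Average action rate over an observed window."""
    return actions / hours


def hours_needed(actions: float, rate: float) -> float:
    """Hours required to take `actions` actions at a constant `rate`."""
    return actions / rate


rate = actions_per_hour(OBSERVED_ACTIONS, OBSERVED_HOURS)
week = hours_needed(10_000, rate)   # ten thousand actions
run = hours_needed(35_000, rate)    # the 35,000-action run on the chart
weeks = run / WORK_WEEK_HOURS

print(f"rate: {rate:.0f} actions/hour")
print(f"10,000 actions: {week:.0f} hours")
print(f"35,000 actions: {run:.0f} hours, about {weeks:.1f} working weeks")
```

With these inputs the rate comes out to 250 actions/hour, so 10,000 actions take 40 hours and 35,000 actions take 140 hours, or 3.5 forty-hour weeks ("nearly a month" of full-time work). If the action rate varies between runs, the estimate shifts proportionally.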

What's going wrong?

Basically, while Claude is pretty good at short-term reasoning (e.g. Pokémon battles), he's bad at executive function and has a poor memory. This is despite a great deal of scaffolding, including a knowledge base, a critic Claude that helps it maintain its knowledge base, and a variety of tools to help it interact with the game more easily.

What does that mean in practice? If you open the stream, you'll see it immediately: Claude on Run #3 has been stuck in Mt. Moon for 24 hours straight.^[3] On Run #2, it took him 78 hours to escape Mt. Moon.

Mt. Moon is not that complicated. It has a few cave levels, and a few trainers. But Claude gets stuck going in loops, trying to talk to trainers he's already beaten (and often failing to talk to them, not understanding why his inputs don't do what he expects), inching across solid walls looking for exits that can't possibly be there, always taking the obvious trap route rather than the longer correct route just because it's closer.

This hasn't been the only problem. Run #2 eventually failed because it couldn't figure out it needed to talk to Bill to progress, and Claude wasn't willing to try any new action when the same infinite rotation of wrong choices (walk in circles, enter the same buildings, complain the game is maybe broken) wasn't working.^[4]

A good friend of mine has been watching the stream quite a lot, and describes Claude's faults well. Lightly edited for clarity:

This current Claudeplayspokemon run is actually an interesting encapsulation of current limitations in llms

That are more fundamental than just "LLMs don't understand spatial reasoning" (but they don't)

They added a whole new memory system after Run #2 with short term and long-term text files and the ability to edit and store and archive and now he has a pretty good memory system that does improve navigation

but now the lack of executive planning and goal-planning and grasp of reality is really rearing its head

no amount of good memory system will save you if you just randomly see something and go:

> "Oh I've achieved my goal!"
> "Time to delete all my past files about achieving this goal!"

when goal was not actually achieved^[5]

It really lacks a lot of human ability to plan, hold multiple goals at once, prioritize, and just keep a grasp of what's going on

(re: goal orientation, you just have to witness its relative inability to simultaneously aim for the short-term goal while also leveling up its pokemon)

(It can level its pokemon if that's the goal right now, or move forward if that's its goal right now, but it can't simultaneously level up pokemon while also moving forward)

(at least not to a human level of efficiency, it will half-heartedly do some combat while the team is healthy then immediately abandon any thought of leveling if the team is moderately injured)

(It's also not capable of changing goals on the fly and going "Well I'm too injured to make it, let's get some levels and bail")

The funny thing is that pokemon is a simple, railroady enough game that RNG can beat the game given enough time (and this has been done)^[6], but it turns out to take a surprising amount of cognitive architecture to play the game in a fully-sensible-looking way

and insufficient smarts can be surprisingly double-edged—an RNG run would arguably be better at both leveling and navigating mazes through sheer random walkitude and willingness to bash face into every fight

as opposed to getting stuck in loops or refusing to engage for bad reasons

Thanks for coming to my ted talk but my overall thesis is still Executive Function is an unsolved problem

(Executive Function is, reminder, goals, prioritization, attention, etc.)

What does this mean for AI?

It's obvious that, while Claude isn't very good at playing Pokémon, it is getting better. 3.7 does significantly better than 3.5 did, after all, and 3.0 was hopeless. So aren't its extremely hard-earned (half-random) achievements so far still progress in the right direction and an indication of things to come?

Well, yeah. But Executive Function (agents, etc.) has always been the big missing puzzle piece, and even with copious amounts of test-time compute, tool use, scaffolding, external memory, the latest and greatest LLM still is not at even a child's level.

But the thing about ClaudePlaysPokémon is that it feels so close sometimes. Each "action" is reasoned carefully (if often on the basis of faulty knowledge), and of course Claude's irrepressible good humor is charming. If it could just plan a little better, or had a little outside direction, it'd clearly blow through the game no problem.

Quoting my friend again (lightly edited again):

it's fairly obvious if there was just someone human there to monitor Claude and give it some guidance maybe once every hour, it'd probably be 10x further through the game now

That's meaningful insomuch as from a "taking people's jobs" point of view it may be possible to, even if the AI really sucks at goal planning, just chain 10 bad AIs to one employee and have them monitor
this still sucks ~9 jobs away from people

Another interesting thought: I talk about "Executive Function" and the brain regions associated with it are the largest and most recently evolved in humans

which makes sense

but a lot of goal-orientation and grasp on reality stuff is reasonably well-developed in the average squirrel

Not to a human level but better than most LLMs

So it's not just a matter of "this is hard stuff that only elite humans do"

No, a decent amount of this is fairly old stuff that's probably pretty deep in the brain, and the evolutionary pressure is presumably fairly strong to have developed it fast

but the financial pressure on LLMs doesn't seem to have the same effect^[7]

Conclusion

In Leopold Aschenbrenner's infamous Situational Awareness paper, in which he correctly predicted the rise of reasoning models, he discussed the concept of "unhobbling" LLMs. That is, figuring out how to get from the brilliant short-term thinking we already have today all the way to the agent-level executive function necessary for true AGI.

ClaudePlaysPokémon is proof that the last 6 months of AI innovation, while incredible, are still far from the true unhobbling necessary for an AI revolution. That doesn't mean 2-year AGI timelines are wrong, but it does feel to me like some new paradigm is yet required for them to be right. If you watch the stream for a couple hours, I think you'll feel the same.

^[1] As part of the release, they published a Pokémon benchmark here.
^[2] Roughly guesstimating based off the fact that ~48 hours into Run #3, it's at ~12,000 actions taken.
^[3] Update: at the 25-hour mark Claude whited out after losing a battle, putting it back before Mt. Moon. The "Time in Mt. Moon" timer on the livestream is still accurate.
^[4] This is simplified. For a full writeup, see here. Run #2 was abandoned at 35,000 actions.
^[5] This specifically happened with Mt. Moon in Run #3. Claude deleted all its notes on Mt. Moon routing after whiting out and respawning at a previous Pokécenter. It mistakenly concluded that it had already beaten Mt. Moon.
^[6] Actually it turns out this hasn't been done, sorry! A couple of RNG attempts were completed, but they involved some human direction/cheating. The point still stands only in the sense that, if Claude took more random/exploratory actions rather than carefully reasoned but shortsighted actions, he'd do better.
^[7] Okay, actually this line I wrote myself (the rest is from my friend), but he failed to write a pithy conclusion for me.

Comments


MondSemmel (21d)

Further, have you ever gotten an adult who doesn't normally play video games to try playing one? They have a tendency to get totally stuck in tutorial levels because game developers rely on certain "video game motifs" for load-bearing forms of communication; see e.g. this video.

So much +1 on this.

Also, I've played a ton of games, and in the last few years started helping a bit with playtesting them etc. And I found it striking how games aren't inherently intuitive, but are rather made so via strong economic incentives, endless playtests to stop players from getting stuck, etc. Games are intuitive for humans because humans spend a ton of effort to make them that way. If AIs were the primary target audience, games would be made intuitive for them.

And as a separate note, I'm not sure what the appropriate human reference class for game-playing AIs is, but I challenge the assumption that it should be people who are familiar with games. Rather than, say, people picked at random from anywhere on earth.

tailcalled (21d)

And as a separate note, I'm not sure what the appropriate human reference class for game-playing AIs is, but I challenge the assumption that it should be people who are familiar with games. Rather than, say, people picked at random from anywhere on earth.

Should maybe restrict it to someone who has read all the documentation and discussion for the game that exists on the internet.

MondSemmel (21d)

Fair. But then also restrict it to someone who has no hands, eyes, etc.

β-redex (21d)

If you did that for programming, AIs would already be considered strongly superhuman. Just like we compare AI's coding knowledge to programmers, I think it's perfectly fair to compare their gaming abilities to people who play video games.

MondSemmel (21d)

By this I was mainly arguing against claims like that this performance is "worse than a human 6-year-old".


williawa (21d)

I'm not sure. I remember playing a bunch of games, like Pokemon HeartGold, LEGO Star Wars, and some other Pokemon game where you were controlling little Pokemon in 3rd person instead of controlling a human who threw Pokéballs (anyone know that game?)

And like, I didn't speak English when I played them. So I had to figure out everything by just pressing random buttons and seeing responses. And this makes it a lot more difficult. Like I could open my "inventory" (didn't know what that was) and then use a "healing potion" (didn't know what that was), and then because my pokemon was at full health already, I would think the healing potion was useless, or think that items in inventory only cause text to appear on the screen, but that they don't have any effect on the actual game, and then I'd believe this until I accidentally clicked the inventory and randomly saw a change, or had failed a level so many times that I was getting desperate and just manually doing exhaustive search over all the actions.

But like, I'm very confident I was more action efficient than claude is. Mostly because like, if I enter a battle, and like fail 5 times more or less in the same way, you start to think something...

sjadler (20d)

Possibly amusing anecdote: when I was maybe ~6, my dad went on a business trip and very kindly brought home the new Pokémon Silver for me. Only complication was, his trip had been to Japan, and the game was in Japanese (it wasn’t yet released in the US market), and somehow he hadn’t realized this. I managed to play it reasonably well for a while based on my knowledge of other Pokémon games. But eventually I ran into a person blocking a bridge, who (I presumed) was saying something about what I needed to do before I could advance. But, I didn’t understand what they were saying because it was in Japanese. I had planned to seek out someone who spoke Japanese, and ask their help translating for me, but unfortunately there was almost nobody in my town who did. And so instead I resolved to learn Japanese - and that’s the story of what led to me becoming fluent at a young age. (Just kidding - after flailing around a bit with possible bypasses, I gave up on playing the game until I got the US version.)

Garrett Baker (20d)

Probably Pokemon Mystery Dungeon.

Julian Bradshaw (21d)

It's definitely possible to get confused playing Pokémon Red, but as a human, you're much better at getting unstuck. You try new things, have more consistent strategies, and learn better from mistakes. If you tried for as long and as consistently as Claude does, even as a 6-year-old, you'd do much better. I played Pokémon Red as a kid too (still have the cartridge!), it wasn't easy, but I beat it in something like that 26-hour number IIRC. You have a point that howlongtobeat is biased towards gamers, but it's the most objective number I can find, and it feels reasonable to me.

Sam Marks (21d)

as a human, you're much better at getting unstuck

I'm not sure! Or well, I agree that 7-year-old me could get unstuck by virtue of having an "additional tool" called "get frustrated and cry until my mom took pity and helped."^[1] But we specifically prevent Claude from doing stuff like that!

I think it's plausible that if we took an actual 6-year-old and asked them to play Pokemon on a Twitch stream, we'd see many of the things you highlight as weaknesses of Claude: getting stuck against trivial obstacles, forgetting what they were doing, and—yes—complaining that the game is surely broken.

^[1] TBC this is exaggerated for effect; I don't remember actually doing this for Pokemon. And, to your point, I probably did eventually figure out on my own most of the things I remember getting stuck on.

Tachikoma (20d)

Pokemon is a game literally made to be played and beaten by children. Six years old might be pushing the lower bound, but it didn't become one of the largest gaming and entertainment franchises in the world by being too difficult to play for children, whom the game is designed for. Yes, kids get stuck and they do use extra resources like searching up info on game guides (old man moment, before the internet you had to find a friend who had the physical version and would let you borrow or look at it). But is the ability to search the internet the bottleneck that prevents Claude from getting past Mt. Moon in under 50 hours? That does not seem likely. In fact giving it access to the internet where it can get even more lost with potentially additional useless or irrelevant information could make the problem worse.

Sam Marks (20d)

Yeah, I think that probably if the claim had been "worse than a 9 year old" then I wouldn't have had much to complain about. I somewhat regret phrasing my original comment as a refutation of the "worse than a 6 year old" and "26 hour" claims, when really I was just using those as a jumping-off point to say some interesting-to-me stuff about how humans also get stuck on trivial obstacles in the same ways that AIs do. I do feel like it's a bit cleaner to factor apart Claude's weaknesses into "memory," "vision," and "executive function" rather than bundling those issues together in the way the OP does at times. (Though obviously these are related, especially memory and executive function.) Then I would guess that Claude's executive function actually isn't that bad and might even be ≥human level. But it's hard to say because the memory—especially visual memory—really does seem worse than a 6 year old's. I think that probably internet access would help substantially.

Bitnotri (20d)

It would be so awesome to have such a stream as additional reference point - just one six year old without internet and external help doing a Pokemon run

Lorenzo (21d)

pokemon is a simple, railroady enough game that RNG can beat the game given enough time (and this has been done)

This is not true. It would take an absurd amount of time.

gwern (21d)

I agree: if you've ever played any of the Pokemon games, it's clear that a true uniform distribution over actions would not finish any time that a human could ever observe it, and the time would have to be galactic. There are just way too many bottlenecks and long trajectories and reset points, including various ways to near-guarantee (or guarantee?) failure like discarding items or Pokemon, and if you've looked at any Pokemon AI projects or even just Twitch Plays Pokemon, this becomes apparent - they struggle to get out of Pallet Town in a reasonable time, never mind the absurdity of playing through the rest of game and beating the Elite Four etc, and that's with much smarter move selection than pure random.

Jozdien (21d)

Yep, you can guarantee failure by ending up in a softlocked state. One example of this is the Lorelei softlock, where you're locked into a move that will never run out, and the opposing Pokemon always heals itself long before you knock it out.[1] There are many, many ways you can do this, especially in generation 1.

[1] You can get out of it, but with an absurdly low chance of ~1 in 68 quindecillion.

Julian Bradshaw (21d)

Thanks for the correction! I've added the following footnote:

MrCheeze (22d)

This basically sums up how it's doing: https://www.reddit.com/r/ClaudePlaysPokemon/comments/1j568ck/the_mount_moon_experience

Of course much of that is basic capability issues: poor spatial reasoning, short-term memory that doesn't come anywhere close to lasting for one lap, etc.

But I've also noticed ways in which Claude's personality is sabotaging it. Claude is capable of taking notes saying that it "THOROUGHLY confirmed NO passages" through the eastern barrier - but never gets impatient or frustrated, so this doesn't actually prevent it from trying the same thing every time it sees the eastern wall again.

And in general, it seems to have a strong bias towards visiting places that are mentioned frequently in its notes, even though that's the exact opposite of what you should be doing for exploration. I've seen it reach the uncommonly reached second ladder on the floor, and then promptly decide it needs to run back to the first ladder (which it has seen hundreds of times) to see whether the first ladder goes anywhere.

And it should definitely be mentioned that run #1 was mercy-killed when its knowledge base was populated almost entirely with falsehoods, both about how far it had progressed in the game and about how to get further, leading to a single-minded obsession with exploring the southern wall of Cerulean City forever.

MrCheeze (21d)

And now in the second run it has entered a similar delusional loop. It knows the way to Cerulean City is via Route 4, but the route before and after Mt. Moon are both considered part of Route 4. Therefore it deluded itself into thinking it can get to Cerulean from the first part of the route. Because of that, every time it accidentally stumbles into Mt Moon and is making substantial progress towards the exit, it intentionally blacks out to get teleported back outside the entrance, so it can look for the nonexistent path forwards.

From what I've seen on stream, the chances of it questioning and breaking from this delusion are basically zero. There's still the possibility of progress by getting lost in Mt Moon and stumbling into the exit, but it will never actually figure out what it was doing wrong here.

People in the stream chat and subreddit have been discussing this paper suggesting that LLM agents often get into these "meltdown" loops that they aren't able to recover from: https://www.reddit.com/r/ClaudePlaysPokemon/comments/1j65jqf/vendingbench_a_benchmark_for_longterm_coherence

Also, the stream admin seemed to think the same thing, saying during the first run that "some runs just are cursed" and setting up a poll for whether to reset the game.

gilch (20d)

Update: Claude made it to Cerulean City today, after wandering the Mt. Moon area for 69 hours.

brambleboy (20d)

Claude finally made it to Cerulean after the "Critique Claude" component correctly identified that it was stuck in a loop, and decided to go through Mt. Moon. (I think Critique Claude is prompted specifically to stop loops.)

Cole Wyeth (22d)

This is convincing evidence LLMs are far from AGI.

Eventually, one of the labs will solve it, a bunch of people will publicly update, and I’ll point out that actually the entire conversation about how an LLM should beat Pokémon was in the training data, the scaffolding was carefully set up to keep it on rails in this specific game, the available action set etc is essentially feature selection, etc.

evalu (21d)

I disagree because to me this just looks like LLMs are one algorithmic improvement away from having executive function, similar to how they couldn't do system 2 style reasoning until this year when RL on math problems started working.

For example, being unable to change its goals on the fly: if a kid kept trying to go forward when his Pokemon were too weak, he would keep losing, get upset, and hopefully, in a moment of mental clarity, learn the general principle that he should step back and reconsider his goals every so often. I think most children learn some form of this from playing around as a toddler, and reconsidering goals is still something we improve at as adults.

Unlike us, I don't think Claude has training data for executive functions like these, but I wouldn't be surprised if some smart ML researchers solved this in a year.

Cole Wyeth (20d)

They might solve it in a year, with one stunning conceptual insight. They might solve it in ten years or more. There's no deciding evidence either way; by default, I expect the trend of punctuated equilibria in AI research to continue for some time.

Jackson Wagner (21d)

Seems like an easy way to create a less-fakeable benchmark would be to evaluate the LLM+scaffolding on multiple different games? Optimizing for beating Pokemon Red alone would of course be a cheap PR win, so people will try to do it. But optimizing for beating a wide variety of games would be a much bigger win, since it would probably require the AI to develop some more actually-valuable agentic capabilities.

It will probably be correct to chide people who update on the cheap PR win. But perhaps the bigger win, which would actually justify such updates, might come soon afterwards!

JustisMills (20d)

I was probably going to make it a top-level post, but it seems like this post covers the main points well, so I'll just link my own CPP post here (Julian, let me know if you mind, and I'll move it):

https://justismills.substack.com/p/the-blackout-strategy

It's specifically about "the blackout strategy" that MrCheeze mentions below, in a greater degree of detail. Basically, I argue that:

You're gonna get some type of degenerate equilibrium ~always, and more scaffolding will just as likely hurt as help (outside of obviously cheating sorts of scaffold)
The blackout strategy isn't misalignment, just a classic local hill-climbing getting stuck situation

I also describe how the blackout strategy came to be in a little bit of detail. Probably not worth reading for anyone who only wanted a primer and by reading this post has gotten one, but if you can't get enough Claudetent or are curious about the blackout strategy, please enjoy.

Julian Bradshaw (20d)

Amazingly, Claude managed to escape the blackout strategy somehow. Exited Mt. Moon at ~68 hours.

habryka (20d)

IMO this would be a great top-level post (as would many other of the posts on your Substack I just discovered!)

JustisMills (20d)

This is useful for me; I am not quite sure where to draw the line with crossposts, as I blog every week and don't want to flood LW, but do want to crosspost where it'd definitely be relevant/useful!

ChristianKl (20d)

A good strategy might be to crosspost posts and see what reception they get on LessWrong as far as upvotes go. If a post stays in the single digits, don't crosspost other posts like it. If it gets 50+ karma, people on LessWrong want to see more like it.

Dana21d91

But these issues seem far from insurmountable, even with current tech. It is just that they are not actually trying, because they want to limit scaffolding.

From what I've seen, the main issues:
1) Poor vision -> Can be improved through tool use, and will surely improve greatly with new models regardless
2) Poor mapping -> Can be improved greatly and straightforwardly through tool use
3) Poor executive function -> I feel like this would benefit greatly from something like a separation of concerns. Currently my impression is Claude is getting overwhelmed wit...
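To illustrate the kind of mapping tool item 2 has in mind, here is a minimal sketch, assuming a harness that can report the player's tile coordinates and whether each observed tile is walkable. `TileMap`, `record`, and `path_to` are hypothetical names, not part of the actual CPP harness; the route planner is plain breadth-first search, and unexplored tiles are treated as blocked.

```python
from collections import deque


class TileMap:
    """Hypothetical mapping tool: remembers observed tiles and plans routes over them."""

    # Button name -> (dx, dy); y grows downward (screen coordinates).
    MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self):
        self.tiles = {}  # (x, y) -> True if walkable, False if blocked

    def record(self, pos, walkable):
        """Store one observation; a later observation overwrites an earlier one."""
        self.tiles[pos] = walkable

    def path_to(self, start, goal):
        """Breadth-first search over known walkable tiles.

        Returns the shortest list of button presses from start to goal,
        or None if no route exists through tiles seen so far.
        Unexplored tiles are treated as blocked.
        """
        came_from = {start: None}
        frontier = deque([start])
        while frontier:
            current = frontier.popleft()
            if current == goal:
                break
            for button, (dx, dy) in self.MOVES.items():
                nxt = (current[0] + dx, current[1] + dy)
                if self.tiles.get(nxt) and nxt not in came_from:
                    came_from[nxt] = (current, button)
                    frontier.append(nxt)
        if goal not in came_from:
            return None
        # Walk backwards from the goal to recover the button sequence.
        presses = []
        node = goal
        while came_from[node] is not None:
            node, button = came_from[node]
            presses.append(button)
        return presses[::-1]


# Usage: a 3x3 room with a wall tile in the middle.
room = TileMap()
for x in range(3):
    for y in range(3):
        room.record((x, y), True)
room.record((1, 1), False)
print(room.path_to((0, 0), (2, 2)))  # ['down', 'down', 'right', 'right']
```

The point of offloading this to a tool is that the model only has to decide where to go; the tool handles remembering the layout and routing around walls.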

6Cole Wyeth21d

Yes, but because this scaffolding would have to be invented separately for each task, it’s no longer really zero-shot and says little about the intelligence of Claude.

6ozziegooen21d

Obvious point: we might soon be able to have LLMs code up this scaffolding themselves. From what I can tell, this isn't very far off.

1IC Rainbow19d

It says that it lacks the intelligence to play zero-shot, and someone has to compensate for the intelligence deficit with an exocortex. We could track progress by measuring "performance per exocortex complexity", where the complexity drops from "here's a bunch of buttons to press in sequence to win" to "".

2Cole Wyeth19d

Okay, what I meant is “says little in favor of the intelligence of Claude”

1Dana21d

Well, vision and mapping seem like they could be pretty generic (and I expect much better vision in future base models anyway). For the third limitation, I think it's quite possible that Claude could provide an appropriate segmentation strategy for whatever environment it is told it is being placed into. Whether this would be a display of its intelligence, or just its capabilities, is beside the point from my perspective.

3Cole Wyeth21d

This won’t work; happy to bet on it if you want to make a Manifold market.

Clara22d93

I'm extremely curious about the design process behind the knowledge base. I just learned about ClaudePlaysPokemon today, and I'm a bit surprised at how naive the store is. There's a reasonably large body of research on memory for artificial neural networks, and I've suspected for a few years that improvements in knowledge scaffolding are promising for really overcoming hallucinations and now reasoning deficiencies. So much so that I've supported projects and experiments at work to mature knowledge bases and knowledge graphs in anticipation of marrying them ...

MrCheeze21d142

Note that the creator stated that the setup is intentionally somewhat underengineered:

I do not claim this is the world's most incredible agent harness; in fact, I explicitly have tried not to "hyper engineer" this to be like the best chance that exists to beat Pokemon. I think it'd be trivial to build a better computer program to beat Pokemon with Claude in the loop.

This is like meant to be some combination of like "understand what Claude's good at and Benchmark and understand Claude-alongside-a-simple-agent-harness", so what that boils down to is this is like a pretty straightforward tool-using agent.

3Cole Wyeth21d

I did begin and then abandon a sequence about this, cognitive algorithms as scaffolding. I’m like halfway to disendorsing it though.

Davidmanheim20d77

Meanwhile, the average human can beat the entirety of Red in just 26 hours, and with substantially less thought per hour.

I mostly agree with the post, but this number is absolutely bullshit. What you could more honestly claim, given the link, is that the average completion time among hardcore gamers who both completed the game and then entered their time into this type of website is 26 hours. That's an insanely different claim. In fact, I would be shocked if even 50% of people who have played a Pokemon game have completed it at all, much less done so in under a week of playtime.

3Julian Bradshaw20d

I think this is a fair criticism, but I think it's also partly balanced out by the fact that Claude is committed to trying to beat the game. The average person who has merely played Red probably did not beat it, yes, but also they weren't committed to beating it. Also, Claude has pretty deep knowledge of Pokémon in its training data, making it a "hardcore gamer" both in terms of knowledge and willingness to keep playing. In that way, the reference class of gamers who put forth enough effort to beat the game is somewhat reasonable.

2Davidmanheim18d

I mostly agree, but "the reference class of gamers who put forth enough effort to beat the game" is still necessarily truncated by omitting any who nonetheless failed to complete it, and is likely also omitting gamers embarrassed by how long it took them.

Lukas_Gloor21d52

I got the impression that using only an external memory like in the movie Memento (and otherwise immediately forgetting everything that wasn't explicitly written down) was the biggest hurdle to faster progress. I think it does kind of okay considering that huge limitation. Visually, it would also benefit from learning the difference between what is or isn't a gate/door, though.

Carl Feynman19d30

A task like this, at which the AI is lousy but not hopeless, is an excellent feedback signal for RL. It's also an excellent feedback signal for "grad student descent": have a human add mechanisms, and see if Claude gets better. This is a very good sign for capabilities, unfortunately.

3ChristianKl19d

It's quite easy to use Pokemon playing as a feedback signal for becoming better at playing Pokemon. If you naively do that, the AI would learn how to beat the game but wouldn't necessarily improve its executive function. A task like computer programming, where you have to find a lot of different solutions, is likely to provide better feedback for RL.

2Carl Feynman19d

True. I was generalizing it to a system that tries to solve lots of Pokémon-like tasks in various artificial worlds, rather than just expecting it to solve Pokémon over and over. But I didn’t say that; I just imagined it in my mind and assumed everyone else would too. Thank you for making it explicit!

2ChristianKl19d

It depends on how many Pokémon-like tasks are available. Given that a lot of capital goes into creating each Pokémon game, there aren't that many Pokémon games. I would expect the number of games that are very Pokémon-like to also be limited.

7Carl Feynman18d

When I say Pokémon-type games, I don’t mean games recounting the adventures of Ash Ketchum and Pikachu. I mean games with a series of obstacles set in a large semi-open world, with things you can carry, a small set of available actions at each point, and a goal of progressing past the obstacles. Such games can be manufactured in unlimited quantities by a program. They can also be “peopled” by simple LLMs, for increased complexity. They don’t actually have to be fun to play or look at, so the design requirements are loose.

There have been attempts at reinforcement learning using unlimited computer-generated games. They haven’t worked that well. I think the key feature that favors Pokémon-like games is that when the player dies or gets stuck, they can go back to the beginning and try again. This rewards trial-and-error learning to get past obstacles, keeping a long-term memory, and re-planning your approach when something doesn’t work. These are capabilities in which current LLMs are notably lacking.

Another way of describing Claude’s missing skill: managing long-term memory. You need to remember important stuff, forget minor stuff, summarize things, and realize when a conclusion in your memory is wrong and needs correction.
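To make "manufactured in unlimited quantities by a program" and the go-back-and-retry loop concrete, here is a minimal sketch under heavily simplifying assumptions: the world is a chain of zones separated by locked gates, and each gate's key lies in an earlier zone. Everything here (`GeneratedGame`, `generate_game`, `is_solvable`, `ZoneEnv`, the three actions) is hypothetical and only shows the shape of such a generator; it is not an existing RL environment.

```python
import random
from dataclasses import dataclass


@dataclass
class GeneratedGame:
    """Toy 'Pokemon-like' world: zones 0..num_zones-1 in a chain, separated by locked gates.

    Gate k sits between zone k and zone k+1; item_location[k] is the zone holding its key.
    """
    item_location: list
    num_zones: int


def generate_game(num_zones, seed):
    """Place gate k's key somewhere in zones 0..k, so every generated game is solvable."""
    rng = random.Random(seed)
    item_location = [rng.randint(0, k) for k in range(num_zones - 1)]
    return GeneratedGame(item_location=item_location, num_zones=num_zones)


def is_solvable(game):
    """Greedy check: collect every key in reachable zones, open gates until stuck."""
    reached = 0  # furthest zone reachable so far
    keys = set()
    progress = True
    while progress:
        progress = False
        for gate, zone in enumerate(game.item_location):
            if zone <= reached and gate not in keys:
                keys.add(gate)
                progress = True
        while reached < game.num_zones - 1 and reached in keys:
            reached += 1
            progress = True
    return reached == game.num_zones - 1


class ZoneEnv:
    """Minimal environment with a reset, so an agent can go back to the start and retry."""

    ACTIONS = ("search", "forward", "back")

    def __init__(self, game):
        self.game = game
        self.reset()

    def reset(self):
        # In this toy, starting over resets both position and inventory.
        self.zone = 0
        self.keys = set()
        return self.zone

    def step(self, action):
        """Apply one action and return (current_zone, finished)."""
        if action == "search":
            # Pick up every key lying in the current zone.
            for gate, zone in enumerate(self.game.item_location):
                if zone == self.zone:
                    self.keys.add(gate)
        elif action == "forward" and self.zone in self.keys:
            self.zone += 1  # the gate out of this zone is unlocked
        elif action == "back" and self.zone > 0:
            self.zone -= 1
        return self.zone, self.zone == self.game.num_zones - 1


def random_agent_with_resets(game, attempts, steps_per_attempt, seed):
    """Baseline trial-and-error loop: random actions, restarting after each step budget.

    Returns the 0-based attempt on which the agent finished, or None.
    """
    rng = random.Random(seed)
    env = ZoneEnv(game)
    for attempt in range(attempts):
        env.reset()
        for _ in range(steps_per_attempt):
            _, finished = env.step(rng.choice(ZoneEnv.ACTIONS))
            if finished:
                return attempt
    return None
```

Each seed yields a different layout, and the solvability check guards against generator bugs. A random policy is only a floor; the interesting agents would be ones that remember which zone held which key across resets.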

ACCount21d30

Makes sense. With pretraining data being what it is, there are things LLMs are incredibly well equipped to do - like recalling a lot of trivia or pretending to be different kinds of people. And then there are things LLMs aren't equipped to do at all - like doing math, or spotting and calling out their own mistakes.

This task, highly agentic and taxing on executive function? It's the latter.

Keep in mind though: we already know that specialized training can compensate for those "innate" LLM deficiencies.

Reinforcement learning is already used to improve LLM ma...

1Jackson Wagner21d

Yeah -- just like how we are teaching LLMs to do math and coding by doing reinforcement learning on those tasks, it seems like we could just do a ton of RL on assorted videogames (and other agentic tasks, like booking a restaurant reservation online), to create reasoning-style models that have better ability to make and stick to a plan. In addition to the literal reinforcement learning and gradient descent used for training AI models, there is also the more metaphorical gradient descent process that happens when hundreds of researchers all start tinkering with different scaffolding ideas, training concepts, etc, in the hopes of optimizing a new benchmark. Now that "speedrun Pokemon Red" has been identified as a plausible benchmark for agency, I expect lots of engineering talent is already thinking about ways to improve performance. With so much effort going towards solving the problem, I wouldn't be surprised to see the pokemon "benchmark" get "saturated" pretty soon (via performances that exceed most normal humans, and start to approach speedrunner efficiency). Even though right now Claude 3.7 is hopelessly underperforming normal humans.

Daniel Kokotajlo21d33

Great post, this matches my impression too.

Andrew_Clough22d32

Mechanisms like attention only seem analogous to a human's sensory memory. Reasoning models have something like a working memory, but even then I think we'd need something in embedding space to constitute a real working memory analog. And having something like a short-term memory could help Claude avoid repeating the same mistakes.

This is, in some sense, very scary, because when someone figures out how to train agent reasoning in embedding space there might be a very dramatic discontinuity in how well LLMs can act as agents.

2Cole Wyeth21d

Maybe, but on reasonable interpretations I think this should cause us to expect AGI to be farther not nearer.

bhauth20d*20

5Julian Bradshaw20d

No idea. Be really worried, I guess; I tend a bit towards doomer. There's something to be said for not leaving capabilities overhangs lying around, though. Maybe contact Anthropic? The thing is, the confidence the top labs have in short-term AGI makes me think there's a reasonable chance they have the solution to this problem already. I made the mistake of thinking they didn't once before: I was pretty skeptical that "more test-time compute" would really unhobble LLMs in a meaningful fashion when Situational Awareness came out and didn't elaborate at all on how that would work. But it turned out that at least OpenAI, and probably Anthropic too, already had the answer at the time.

Malmesbury20d20

Doesn't Claude's training data include all the tutorials and step by step walkthroughs of this game ever published on the internet? How is it not using this information?

4Michael Liu19d

According to the creator of "Claude Plays Pokemon", Claude's internal knowledge can often do more harm than good when it comes to navigating the game. In the system prompt, Claude is specifically told not to trust its instincts and to rely on the memories in its context. See (starting at 20:28):

2Julian Bradshaw20d

It does have a lot of the info, but it doesn't always use it well. For example, it knows that Route 4 leads to Cerulean City, and so sometimes thinks there's a way around Mt. Moon that sticks solely to Route 4.

Martín Soto21d20

It's unclear what the optimal amount of thinking per step is. My initial guess would have been that letting Claude think for a whole paragraph before every single action (rather than only every 10 actions, or whenever it's in a match, or whatever) scores slightly better than letting it think more (sequentially). But I guess this might work better if it's what the streamer is using after some iteration.

The story for parallel checks could be different though. My guess would be going all out and letting Claude generate the paragraph 5 times and then generate 5 ...

cubefox21d20

These problems are partly related to poor planning, but they are clearly also related to language models, which are primarily restricted to operating on text. Actual AGI will likely have to work more like an animal or human brain, which is predicting sensory data (or rather: latent representations of sensory data, JEPA) instead of text tokens. An LLM with good planning may be able to finally beat Pokémon, but it will almost certainly not be able to do robotics or driving or anything with complex or real-time visual data.

Legionnaire17d10

My college-educated wife and I recently got stuck playing Lego Star Wars... Our solution was to Google it. As others have said, some of these games are poorly designed and very unintuitive, especially games this old. Seems like they should give Claude some limited Google searches at least.

The earliest Harry Potter games had help hotlines you could call, which we had to do once when I was 9.

It's hilarious that it sometimes thinks the game might be broken, like an angry teenager claiming lag when he loses a firefight in CoD.

satwik20d10

even with copious amounts of test-time compute

There is no copious amount of test-time compute yet. I would argue that test-time compute has barely been scaled at all. Current spend on RL is only a few million dollars. I expect this to be scaled up by a few orders of magnitude this year.

I predict that Pokemon Red will be finished very fast (<3 months) and everyone who was disappointed and adjusted their AI timelines due to CPP will have to readjust them.

1Julian Bradshaw18d

I meant test-time compute as in the compute expended in the thinking Claude does playing the game. I'm not sure I'm convinced that reasoning models other than R1 took only a few million dollars, but it's plausible. Appreciate the prediction!

yo-cuddles19d00

There's an improvement in LLMs I've seen that is important but has wildly inflated people's expectations beyond what's reasonable:

On some impressive tests, LLMs have hit a point where they no longer reliably fail past the point of no return. They are conservative enough that they can search over a problem, failing a million times until they mumble into an answer.

I'm going to try writing something of at least not-embarrassing quality about my thoughts on this, but I am really confused by people's hype around this sort of thing; it feels like directed randomness.

2Capybasilisk19d

> mumble into an answer

Typo, I presume.

2yo-cuddles19d

No, sorry, that's not a typo; it's a linguistic norm that I probably assumed was more common than it actually is. The people I talk with and I have used the words "mumble" and "babble" to describe LLM reasoning, sort of like human babble; see https://www.lesswrong.com/posts/i42Dfoh4HtsCAfXxL/babble

yrimon19d-2-7

LLMs trying to complete long-term tasks are state machines where the context is their state. At the moment, they have terrible tools for editing that state. There is no location in memory that automatically receives more attention, because the important memories move as the chain of thought does. Thinking off on a tangent throws a lot of garbage into the LLM's working memory. To remember an important fact over time, the LLM needs to keep repeating it. And there isn't enough space in the working memory for long-term tasks.

All of this is exemplified real...

4Yair Halberstadt19d

Claude already has an external memory, as do most AI agents.

1yrimon19d

Then it is either not being used, or not being used well, as part of Claude Plays Pokémon. If Claude were taught to optimize its context as part of thinking, planning and acting, it would play much better. By static memory I meant a mental workspace that is always part of the context but is only edited intentionally, as opposed to the ever-changing stream of consciousness that dominates contexts today. Claude Plays Pokémon was given something like this and uses it really poorly.
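A minimal sketch of the distinction being drawn here, assuming a harness that assembles each prompt from two parts: a pinned workspace that changes only through explicit calls, and a fixed-size rolling window of recent thoughts. `AgentContext`, `pin`, `unpin`, and `think` are hypothetical names, not the actual CPP harness or any Anthropic API.

```python
from collections import deque


class AgentContext:
    """Hypothetical prompt builder: a pinned workspace plus a rolling stream of recent thoughts."""

    def __init__(self, window_size):
        # Pinned notes stay in every prompt until the agent deliberately edits or removes them.
        self.workspace = {}
        # Stream-of-consciousness entries; once the window is full, the oldest entry is dropped.
        self.stream = deque(maxlen=window_size)

    def think(self, text):
        self.stream.append(text)

    def pin(self, key, note):
        """Intentional edit: add or overwrite a pinned note."""
        self.workspace[key] = note

    def unpin(self, key):
        """Intentional deletion, e.g. when a pinned conclusion turns out to be wrong."""
        self.workspace.pop(key, None)

    def render(self):
        """Assemble the context: pinned notes first, then the recent stream."""
        pinned = [f"[{key}] {note}" for key, note in self.workspace.items()]
        return "\n".join(["WORKSPACE:", *pinned, "RECENT:", *self.stream])


ctx = AgentContext(window_size=3)
ctx.pin("goal", "Exit Mt. Moon to reach Cerulean City")
ctx.think("The exit is probably to the north")
for i in range(5):
    ctx.think(f"tangent {i}")
print("Exit Mt. Moon" in ctx.render())          # True: the pinned note survives
print("probably to the north" in ctx.render())  # False: pushed out by the tangents
```

The unpinned observation is lost after a few tangents, which is the "needs to keep repeating it" failure; the pinned goal survives without repetition. Whether a model actually learns to use such a workspace well is exactly the open question in this thread.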
