MrCheeze — LessWrong

Where does Sonnet 4.5's desire to "not get too comfortable" come from?

It's something introduced by the more agentic coding capabilities. Like if there's a bug Claude is trying to fix and its current angle of attack to it isn't being useful, then eventually it makes sense to recognize that and switch tactics. Claude playing Pokemon also had the issue of being willing to repeat the same ineffective strategies over and over and getting stuck because of that. Could that unexpectedly generalize to something like a desire for variety in conversation?

This is exactly what I was thinking while reading the post. They didn't advertise conversational changes, but they DID advertise agentic improvements, and improving its ability to vary its approaches to tasks is an obvious way of doing that.

(That said, it's not necessarily true that this is a general Claude improvement rather than one that just happens by chance to show up in this specific test.)

Recent AI model progress feels mostly like bullshit

MrCheeze6mo1612

But you have to be careful here, since the results heavily depend on details of the harness, as well as on how thoroughly they have memorized walkthroughs of the game.

Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red

MrCheeze6mo31

Text adventures do seem like a good eval right now, since they're the ONLY games that can be tested without either relying on vision (which is still very bad), or writing a custom harness for each game (in which case your results depend heavily on the harness).

Is Gemini now better than Claude at Pokémon?

MrCheeze6mo20

(Gemini did actually write much of the Gemini_Plays_Pokemon scaffolding, but only in the sense of doing what David told it to do, not designing and testing it.)

I think you're right that an LLM coding its own scaffolding is probably more achievable than one playing the game like a human, but I don't think current models can do it. Watching the streams, the models don't seem to understand their own flaws, although admittedly they haven't been prompted to focus on this.

Is Gemini now better than Claude at Pokémon?

MrCheeze6mo60

On the other hand, Claude has (arguably) a better pathfinding tool. As long as it requests to be moved to a valid set of coordinates from the screenshot overlay grid, the tool will move it there. Gemini mostly navigates on its own, although it has access to another instance of Gemini dedicated just to pathfinding.

I very much argue this. Claude's navigator tool can only navigate to coordinates that are onscreen, meaning that the main model needs to have some idea of where it's going. Which means grappling with problems that are extremely difficult for both models, such as "go AROUND the wall instead of right through it".

In contrast, the Gemini pathfinder tool can travel to a coordinate halfway across the map, totally bypassing that problem. (Yes, the pathfinder is technically another instance of Gemini, but it's been prompted with exactly what algorithm to follow, so this is not a major handicap.) When returning to a previously visited map - Gemini is banned from using the pathfinder tool to enter unexplored tiles - it can probably traverse even mazes that take the Claude scaffolding all day, in just one or two turns.

Of course this has further advantages for maintaining coherence, since if you spend all day on a maze, you forget what your plan even was after you get to the end of it.
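To make the contrast concrete, here is a minimal sketch of the kind of pathfinder described above: a breadth-first search that routes around walls and refuses to enter unexplored tiles. This is not the actual code from either stream. The grid encoding (`.` for explored walkable, `#` for wall, `?` for unexplored), the function name `find_path`, and the example map are all assumptions made for illustration.

```python
from collections import deque

def find_path(grid, start, goal):
    """Breadth-first search over a tile grid (hypothetical encoding).

    grid  : list of equal-length strings; '.' = explored walkable tile,
            '#' = wall, '?' = unexplored tile (never entered).
    start : (row, col) of the player, assumed to be on a walkable tile.
    goal  : (row, col) of the destination.
    Returns the shortest list of (row, col) tiles from start to goal,
    or None if the goal cannot be reached through explored tiles.
    """
    rows, cols = len(grid), len(grid[0])
    came_from = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        r, c = current
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] == "." and (nr, nc) not in came_from:
                came_from[(nr, nc)] = current
                queue.append((nr, nc))
    return None

# Hypothetical map: the goal is two tiles east of the start, behind a wall.
GRID = [
    "....",
    ".##.",
    ".#..",
    ".#.?",
]
path = find_path(GRID, (2, 0), (2, 2))
blocked = find_path(GRID, (2, 0), (3, 3))  # goal is an unexplored tile
```

In this example the straight-line route is blocked, so the search detours over the top row. The "go AROUND the wall" step is handled by the algorithm rather than by the model's spatial reasoning, which is the advantage being argued for.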

Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red

MrCheeze6mo*243

I have not tested if Gemini can distinguish this tree (and intend to eventually). This may very well be the only reason Gemini has progressed further.

You missed an important fact about the Gemini stream, which is that it just reads the presence of these trees from RAM and labels them for the model (along with a few other special tiles like ledges and water). Nevertheless I do think Gemini's vision is better, by which I mean if you provide it a screenshot it will sometimes identify the correct tree, unlike Claude who will never do so. (Although to my knowledge the Gemini in the stream has literally never used vision for anything.) And in general the Gemini streamer is far more liberal about updating the scaffolding to address challenges than the Claude streamer is.

Also there's one other reason that Gemini has gotten farther: it simply has the whole walkthrough of the game memorized, while Claude doesn't know what to do after the thunderbadge. (I don't think either model would be remotely competent on RPGs that aren't in the training data.)

This doesn't mean memory is not a problem. The problems are just more subtle than one might imagine. For instance, the lack of direct memory means models lack a real sense of time, or how hard a task is. That means even when given a notepad to record observations, they will not consistently record "HOW TO SOLVE THAT PUZZLE THAT TOOK FOREVER" because they don't realize it took forever. And of course if it's not written down it falls completely out of "long-term" memory.

This has been a recurring problem with the Claude stream, where the model is given the ability to take notes. Whenever he's struggling and failing to solve a problem for a long time, he'll endlessly write notes about his (wrong) ideas for what to do, reinforcing that behaviour. When he finally tries the right thing, it seems like it was easy, so you MIGHT get one note written down about it. If you're lucky.

In general, however incompetent this post makes it sound like the models are at playing the game, they're even worse than that. I feel like this is in large part because of LLMs having frozen weights - every single mistake that they make will be repeated every time the situation reoccurs, instead of just once as a human would do. Taking notes doesn't help this very much, as their basic instincts being wrong seems to make far more difference than what's in their notes.

So how well is Claude playing Pokémon?

MrCheeze8mo180

And now in the second run it has entered a similar delusional loop. It knows the way to Cerulean City is via Route 4, but the stretches of route before and after Mt. Moon are both considered part of Route 4. Therefore it deluded itself into thinking it can get to Cerulean from the first part of the route. Because of that, every time it accidentally stumbles into Mt. Moon and is making substantial progress towards the exit, it intentionally blacks out to get teleported back outside the entrance, so it can look for the nonexistent path forwards.

From what I've seen on stream, the chances of it questioning and breaking from this delusion are basically zero. There's still the possibility of progress by getting lost in Mt. Moon and stumbling into the exit, but it will never actually figure out what it was doing wrong here.

People in the stream chat and subreddit have been discussing this paper suggesting that LLM agents often get into these "meltdown" loops that they aren't able to recover from: https://www.reddit.com/r/ClaudePlaysPokemon/comments/1j65jqf/vendingbench_a_benchmark_for_longterm_coherence

Also, the stream admin seemed to think the same thing, saying during the first run that "some runs just are cursed" and setting up a poll for whether to reset the game.

So how well is Claude playing Pokémon?

MrCheeze8mo142

Note that the creator stated that the setup is intentionally somewhat underengineered:

I do not claim this is the world's most incredible agent harness; in fact, I explicitly have tried not to "hyper engineer" this to be like the best chance that exists to beat Pokemon. I think it'd be trivial to build a better computer program to beat Pokemon with Claude in the loop.

This is like meant to be some combination of like "understand what Claude's good at and Benchmark and understand Claude-alongside-a-simple-agent-harness", so what that boils down to is this is like a pretty straightforward tool-using agent.

So how well is Claude playing Pokémon?

MrCheeze8mo373

This basically sums up how it's doing: https://www.reddit.com/r/ClaudePlaysPokemon/comments/1j568ck/the_mount_moon_experience

Of course much of that is basic capability issues: poor spatial reasoning, short-term memory that doesn't come anywhere close to lasting for 1 lap, etc.

But I've also noticed ways in which Claude's personality is sabotaging it. Claude is capable of taking notes saying that it "THOROUGHLY confirmed NO passages" through the eastern barrier - but never gets impatient or frustrated, so this doesn't actually prevent it from trying the same thing every time it sees the eastern wall again.

And in general, it seems to have a strong bias towards visiting places that are mentioned frequently in its notes, even though that's the exact opposite of what you should be doing for exploration. I've seen it reach the uncommonly reached second ladder on the floor, and then promptly decide it needs to run back to the first ladder (which it has seen hundreds of times) to see whether the first ladder goes anywhere.
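As a rough illustration of what "the opposite" would look like, an exploration heuristic can prefer whichever known location has been visited least, instead of the one that shows up most in the notes. The sketch below is only that: the function name `pick_exploration_target`, the location labels, and the visit counts are hypothetical, not taken from the stream's harness.

```python
from collections import Counter

def pick_exploration_target(candidates, visit_counts):
    """Return the candidate location with the fewest recorded visits.

    candidates   : list of location names currently reachable.
    visit_counts : mapping from location name to number of visits;
                   locations missing from the mapping count as 0.
    Ties go to the candidate listed first.
    """
    return min(candidates, key=lambda name: visit_counts.get(name, 0))

# Hypothetical counts echoing the ladder example above.
visits = Counter({"first ladder": 200, "second ladder": 1})
target = pick_exploration_target(["first ladder", "second ladder"], visits)
```

Here `target` is the second ladder, since it has the lower count. A frequency-in-notes bias would pick the first ladder instead.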

And it should definitely be mentioned that run #1 was mercy killed when its knowledge base was populated almost entirely with falsehoods, both about how far it had progressed in the game and about how to get further, leading to a single-minded obsession with exploring the southern wall of Cerulean City forever.

Why I'm doing PauseAI

MrCheeze1y75

"Under development" and "currently training" I interpret as having significantly different meanings.
