Note that the creator stated that the setup is intentionally somewhat underengineered:
I do not claim this is the world's most incredible agent harness; in fact, I explicitly have tried not to "hyper engineer" this to be like the best chance that exists to beat Pokemon. I think it'd be trivial to build a better computer program to beat Pokemon with Claude in the loop.
This is like meant to be some combination of like "understand what Claude's good at and Benchmark and understand Claude-alongside-a-simple-agent-harness", so what that boils down to is this is like a pretty straightforward tool-using agent.
This basically sums up how it's doing: https://www.reddit.com/r/ClaudePlaysPokemon/comments/1j568ck/the_mount_moon_experience
Of course much of that is basic capability issues -poor spatial reasoning, short term memory that doesn't come anywhere close to lasting for 1 lap, etc.
But I've also noticed ways in which Claude's personality is sabotaging it. Claude is capable of taking notes saying that it "THOROUGHLY confirmed NO passages" through the eastern barrier - but never gets impatient or frustrated, so this doesn't actually prevent it from trying the same thing every time it sees the eastern wall again.
And it general, it seems to have a strong bias towards visiting places that are mentioned frequently in its notes - even though that's the exact opposite of what you should be doing for exploration. I've seen it reach the uncommonly reached second ladder on the floor, and then promptly decided it needs to run back to the first ladder (which it has seen hundreds of times) to see whether the first ladder goes anywhere.
And it should definitely be mentioned that run #1 was mercy killed when its knowledge base was populated almost entirely with falsehoods both about how far it had progressed in the game and how to get further, leading to a singleminded obsession with exploring the southern wall of Cerulean City forever.
"Under development" and "currently training" I interpret as having significantly different meanings.
Doesn't strike me as inevitable at all, just a result of OpenAI following similar methods for creating their tokenizer twice. (In both cases, leading to a few long strings being included as tokens even though they don't actually appear frequently in large corpuses.)
They presumably had already made the GPT-4 tokenizer long before SolidGoldMagikarp was discovered in the GPT-2/GPT-3 one.
Prior to OpenAI's 2023-02-14 patching of ChatGPT (which seemingly prevents it from directly encountering glitch tokens like ‘ petertodd’)
I've never seen it mentioned around here, but since that update, ChatGPT is using a different tokenizer that has glitch tokens of its own:
I'd say this captures the spirit of Less Wrong perfectly.
500 years still sounds optimistic to me.
The key is in the phrase "much more complicated". The sort of algorithm that could become a mind would be an enormous leap forward in comparison to anything that has ever been done so far.
Man, people's estimations seem REALLY early. The idea of AI in fifty years seems almost absurd to me.
And now in the second run it has entered a similar delusional loop. It knows the way to Cerulean City is via Route 4, but the route before and after Mt. Moon are both considered part of Route 4. Therefore it deluded itself into thinking it can get to Cerulean from the first part of the route. Because of that, every time it accidentally stumbles into Mt Moon and is making substantial progress towards the exit, it intentionally blacks out to get teleported back outside the entrance, so it can look for the nonexistent path forwards.
From what I've seen on stream, the chances of it questioning and breaking from this delusion are basically zero. There's still the possibility of progress by getting lost in Mt Moon and stumbling into the exit, but it will never actually figure out what it was doing wrong here.
People in the stream chat and subreddit have been discussing this paper suggesting that LLM agents often get into these "meltdown" loops that they aren't able to recover from: https://www.reddit.com/r/ClaudePlaysPokemon/comments/1j65jqf/vendingbench_a_benchmark_for_longterm_coherence
Also, the stream admin seemed to think the same thing, saying during the first run that "some runs just are cursed" and setting up a poll for whether to reset the game.