Something seems to be really wrong with Claude Opus 4.8.
I like to test out new models on literary and poetic material. I used to send them the lyrics of Kate Bush's "The Kick Inside" and see what they could make of it, until they started just recognizing the song. Lately I've been using some text I wrote myself, with this prompt or something very close to it:
[...]
I'm not going to post the actual text here. I want to keep using it with future LLMs. Who knows, I might even write the rest of the novel, and I hate teasers. If anybody really wants it, I can send it privately.
The text is about 1500 words. It's a vignette, and its relation to the rest of the story won't be clear for a while. It's intentionally surreal-sounding, and it's not in trivially easy language. It drops lots of hints of different strengths about different things... as well as making some flat statements of fact.
Most models do badly by human standards, but they get some of the themes, and make some reasonable, if rather conventional, guesses at what's going on. They do often tend to ignore critical phrases that matter out of proportion to their length. They sometimes get the emphasis wrong. And the older and less sophisticated ones tend to lose facts.
Opus 4.6 and 4.7 (and probably 4.5; I don't seem to have a record of trying it on this text) feel like interacting with a reader. Yes, they miss many of the hints, but so would a human.
Opus 4.8 is lost, probably worse than Kimi K2. It gets some things about the mood, but that's about it.
It seems to decide that certain phrases and paragraphs are ultra-salient, for no reason obvious to me. It confidently declares that a few things are central, then it mostly ignores the rest of the text. On the first run, it was so bad I thought the front end had mangled the file.
Every run seems to fixate on the second-to-last paragraph, and on some particular phrasing in it. The first run called it "the opening passage". Yet the very phrase the model most emph