Because the number of bits of information is the same in all cases.
I don't know why you're using self-information/surprisal interchangeably with surprise. It's confusing.
Any given random sequence provides evidence of countless extremely low-probability world-models - we just don't consider the vast majority of those world-models because they aren't elevated to our attention.
Like in the sense that there are hypotheses that something omniscient would consider more likely conditional on Alice doing something surprising, that humans just don't think of because they're humans? I don't expect problems coming up with a satisfactory description of 'the space of all world-models' to be something we have to fix before we can say anything important about surprise.
Because it clearly isn't an example of a world-model being allocated evidence. Your explanation is post-hoc - that is, you're rationalizing. Your description would be an elegant mathematical explanation - I just don't think it's correct, as pertains to what your mind is actually doing, and why you find some situations more surprising than others.
Maybe there's more to be said about the entire class of things that humans have ever labeled as surprising, but this does capture something of what humans mean by surprise, and we can say with particular certainty that it captures what happens in a human mind when a visual stimulus is describable as 'surprising.' The framework I described has, to my knowledge, been shown to correspond quite closely to our neuroscientific understanding of visual surprise and has been applied in machine learning algorithms that diagnose patients based on diagnostic images. There are algorithms that register seeing a tumor on a CT scan as 'surprising' in a way that is quite likely to be very similar to the way that a human would see that tumor and feel surprised. (I don't mean that it's similar in a phenomenological sense. I'm not suggesting that these algorithms have subjective experiences.) I expect this notion of surprise to be generalizable.
I expect this notion of surprise to be generalizable.
Which is what I'm trying to get at. There's -something- there, more than "amount of updates to world-models". I'd guess what we call surprise has a complex relationship with the amount of updates applied to world-models, such that a large update to a single world-model is more surprising than an equal "amount" of update applied across one thousand.
Alice: I just flipped a coin [large number] times. Here's the sequence I got:
(Alice presents her sequence.)
Bob: No, you didn't. The probability of having gotten that particular sequence is 1/2^[large number]. Which is basically impossible. I don't believe you.
Alice: But I had to get some sequence or other. You'd make the same claim regardless of what sequence I showed you.
Bob: True. But am I really supposed to believe you that a 1/2^[large number] event happened, just because you tell me it did, or because you showed me a video of it happening, or even if I watched it happen with my own eyes? My observations are always fallible, and if you make an event improbable enough, why shouldn't I be skeptical even if I think I observed it?
Alice: Someone usually wins the lottery. Should the person who finds out that their ticket had the winning numbers believe the opposite, because winning is so improbable?
Bob: What's the difference between finding out you've won the lottery and finding out that your neighbor is a 500-year-old vampire, or that your house is haunted by real ghosts? All of these events are extremely improbable given what we know of the world.
Alice: There's improbable, and then there's impossible. 500-year-old vampires and ghosts don't exist.
Bob: As far as you know. And I bet more people claim to have seen ghosts than have won more than 100 million dollars in the lottery.
Alice: I still think there's something wrong with your reasoning here.
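
For what it's worth, the arithmetic behind Bob's 1/2^[large number] figure is easy to check. Below is a minimal sketch, assuming a fair coin and independent flips. The function name `surprisal_bits` and the two 20-flip sequences are made up for illustration and are not part of the dialogue. It shows that the self-information of a specific sequence depends only on its length, so an all-heads run and an unremarkable-looking run carry the same number of bits.

```python
import math

def surprisal_bits(sequence: str, p_heads: float = 0.5) -> float:
    """Self-information, in bits, of one specific sequence of coin flips.

    Assumes independent flips. 'H' counts as heads; any other character
    is treated as tails (an illustrative simplification).
    """
    bits = 0.0
    for flip in sequence:
        p = p_heads if flip == "H" else 1.0 - p_heads
        bits += -math.log2(p)  # each flip contributes -log2(p) bits
    return bits

# Two hypothetical 20-flip sequences: one "special", one "ordinary".
all_heads = "H" * 20
mixed = "HTTHHTHTTTHHTHHTHTTH"

print(surprisal_bits(all_heads))  # 20.0
print(surprisal_bits(mixed))      # 20.0
```

Both sequences come out at 20 bits, i.e. probability 1/2^20, which is Alice's point that Bob would raise the same objection to any sequence. The sketch says nothing about why the all-heads run would still feel more surprising; that part is what the thread above is arguing about.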