Erik Søe Sørensen

The mystery these tokens represent tickles me just as much as the next person... I believe one of the last ones to be found out is the "?????-?????-" token.

With the right pop-quiz warmup, ChatGPT has some suggestions, most of which are probably useless.

The one which sounds most plausible to me:

The phrase "?????-?????-" is actually a meme, and not originally from a story. It is a representation of an obscenity or curse word that has been censored by replacing the letters with question marks or asterisks. (...)

That one actually sounds like a likely source; it was kind of what I had in mind when I asked, although I thought there might be some specific story/character to be found.
It certainly fits with the response you observed...!

Other than that, the suggestions are spread all over the world of fiction:

Me: In which story does one of the characters utter "?????-?????-" ?

ChatGPT: The phrase "?????-?????-," or "mene mene tekel upharsin," appears in the Biblical story of Belshazzar's feast in the book of Daniel, chapter 5. (...)

The phrase "?????-?????-" is a reference to the short story "The Call of Cthulhu" by H.P. Lovecraft. (...)
// This one was popular: it came up again in 7 later regeneration attempts. That may or may not be a clue; it might just reflect how often that story is referenced.

The phrase "?????-?????-," is a quote from the science fiction short story "The Nine Billion Names of God" by Arthur C. Clarke. (...) As they look back up at the sky, one of the experts utters the phrase "?????-?????-," which is left to the reader's interpretation.
// That ending appears to have been made up? Might the text in the training data have had some trailing nonsense?

The phrase "?????-?????-," or "My precious," is spoken by the character Gollum in J.R.R. Tolkien's novel "The Lord of the Rings". (...)

The phrase "?????-?????-," is a mysterious and iconic line from the TV show "Twin Peaks" created by David Lynch and Mark Frost. (...) The meaning of the phrase is never fully explained, but it adds to the eerie and surreal atmosphere of the show.

The phrase "?????-?????-" is uttered by Hodor, a character in the "A Song of Ice and Fire" book series by George R.R. Martin (...)

The phrase "?????-?????-," often referred to as "Klaatu barada nikto," is from the science fiction movie "The Day the Earth Stood Still" (1951). (...)

The phrase "?????-?????-," also known as "to be or not to be," is from the play "Hamlet" by William Shakespeare. (...)

The phrase "?????-?????-," or "mawwiage, that bwessed awangement," is a famous quote from (...) the movie "The Princess Bride." (...)

Yup, most of the time it's just <insert famous fiction quote>, at least in that particular chat context.

For what it's worth: I tried asking ChatGPT:

Quiz time!
In which famous game might you happen on the line, "Hello, my name is Steve"?

And it identified it right away as Minecraft and (when I asked) told me that what followed was a tutorial.

It could also tell me in which game I might meet Leilan. (I expected a cursed answer, but no.) 

I really don't want to ask it about the "f***ing idiot" quote though... :-)

(Oh yeah, and it isn't really helpful on the "?????-?????-" mystery either.)

The thing about "ÃÂ" appears to be that if you take some (or at least certain) innocent character in the Latin-1-but-not-ASCII code range, say, "æ", and encode it in UTF-8 – and then take the resulting bytes, interpreting them as Latin-1, and convert them to UTF-8 again – and then repeat that process, you get:

$ echo 'æ' | iconv -f latin1 -t UTF-8  | iconv -f latin1 -t UTF-8 | iconv -f latin1 -t UTF-8  | iconv -f latin1 -t UTF-8 
ÃÂÃÂÃÂÃÂ¦

Well, between those various "A"s are actually some invisible "NO BREAK HERE" and "BREAK PERMITTED HERE" characters. The real structure is

Ã<NBH>Â<NBH>Ã<BPH>Â<NBH>Ã<NBH>Â<BPH>Ã<BPH>Â¦

Even if you start with non-Latin-1 characters you may end up with these characters.
The replacement character, for instance, eventually becomes "ÃÂÃÂ¯ÃÂÃÂ¿ÃÂÃÂ½".
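For anyone who wants to poke at this without iconv, here is a minimal Python sketch of the same round trip. The function names (misdecode_once, misdecode, describe) are made up for illustration; the only assumption is that each iconv pass above corresponds to "encode as UTF-8, then read the bytes back as Latin-1".

```python
def misdecode_once(text: str) -> str:
    """Encode as UTF-8, then (wrongly) read those bytes back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def misdecode(text: str, rounds: int) -> str:
    """Apply the encoding mistake `rounds` times in a row."""
    for _ in range(rounds):
        text = misdecode_once(text)
    return text


def describe(text: str) -> str:
    """Make the two invisible C1 controls visible as <BPH> and <NBH>."""
    names = {"\x82": "<BPH>", "\x83": "<NBH>"}
    return "".join(names.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # Round 4 corresponds to the four chained iconv calls.
    for rounds in range(5):
        garbled = misdecode("æ", rounds)
        print(rounds, len(garbled), describe(garbled))
```

Every character involved sits in the U+0080..U+00FF range and therefore takes two bytes in UTF-8, so the string length doubles on each round: 1, 2, 4, 8, 16.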

Accidentally interpreting UTF-8 as Latin-1 or vice versa is a fairly easy programming mistake to make, so it's not too surprising that it's happened often around the web; doing a web search of "ÃÂÃÂ" shows many occurrences within regular text, such as "...You donâÃÂÃÂt review this small book; you tell people about it âÃÂàadults as well as kids âÃÂàand say..."
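The mistake is also reversible, which is a handy way to confirm that a given run of "ÃÂ" really is stacked UTF-8-as-Latin-1. A rough sketch, assuming pure Latin-1 misreads (the helper name peel_mojibake is hypothetical, and text that went through Windows-1252 instead, as a lot of web text did, will not round-trip this way):

```python
def peel_mojibake(text: str, max_rounds: int = 8) -> tuple[str, int]:
    """Undo repeated UTF-8-read-as-Latin-1 layers.

    Returns the recovered text and the number of layers removed.
    Heuristic only: a string that legitimately contains e.g. "Ã¦"
    would be "repaired" as well.
    """
    rounds = 0
    for _ in range(max_rounds):
        try:
            # Reverse one layer: get the original bytes back, read them as UTF-8.
            candidate = text.encode("latin-1").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # not Latin-1-representable, or not valid UTF-8: stop peeling
        if candidate == text:
            break  # pure ASCII, nothing left to undo
        text = candidate
        rounds += 1
    return text, rounds


if __name__ == "__main__":
    # "Ã<NBH>Â¦", i.e. "æ" after two rounds of the mistake
    print(peel_mojibake("\xc3\x83\xc2\xa6"))
```

The number of layers peeled off tells you how many times the text was mangled.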
 

Anyway, I think that demystifies that group of tokens – including why they happen to occur in lengths that are powers of two.
(I'm curious about whether the actual tokens contain the invisible <NBH>/<BPH> characters or not, though...)

Regards,
Erik

(Stumbled into this via Computerphile, by the way.)