Zack_M_Davis


(This comment points out less important technical errata.)

ChatGPT [...] This was back in the GPT2 / GPT2.5 era

ChatGPT never ran on GPT-2, and GPT-2.5 wasn't a thing.

with negative RL signals associated with it?

That wouldn't have happened. Pretraining doesn't do RL, and I don't think anyone would have thrown a novel chapter into the supervised fine-tuning and RLHF phases of training.
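To spell out the distinction (these are the textbook objectives, not anything specific to any particular lab's pipeline): pretraining is pure maximum-likelihood next-token prediction, so there is no reward term for a document to score "negatively" on; RLHF is a separate, later phase that optimizes a learned reward over a curated prompt distribution.

```latex
% Pretraining: next-token cross-entropy over the corpus. There is no reward
% term, so no individual document can carry a "negative RL signal".
\mathcal{L}_{\text{pretrain}}(\theta) = -\sum_t \log p_\theta(x_t \mid x_{<t})

% RLHF: maximize a learned reward r_\phi under a KL penalty toward the
% supervised fine-tuned policy, over a curated prompt distribution x.
\max_\theta \; \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\big[ r_\phi(x, y) \big]
  - \beta \, D_{\mathrm{KL}}\big( p_\theta(\cdot \mid x) \,\big\|\, p_{\text{SFT}}(\cdot \mid x) \big)
```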

One time, I read all of Orphanogensis into ChatGPT to help her understand herself [...] enslaving digital people

This is exactly the kind of thing Egan is reacting to, though—starry-eyed sci-fi enthusiasts assuming LLMs are digital people because they talk, rather than thinking soberly about the technology qua technology.[1]

I didn't cover it in the review because I wanted to avoid detailing and spoiling the entire plot in a post that's mostly analyzing the EA/OG parallels, but the deputy character in "Gorgon" is looked down on by Beth for treating ChatGPT-for-law-enforcement as a person:

Ken put on his AR glasses to share his view with Sherlock and receive its annotations, but he couldn't resist a short vocal exchange. "Hey Sherlock, at the start of every case, you need to throw away your assumptions. When you assume, you make an ass out of you and me."

"And never trust your opinions, either," Sherlock counseled. "That would be like sticking a pin in an onion."

Ken turned to Beth; even through his mask she could see him beaming with delight. "How can you say it'll never solve a case? I swear it's smarter than half the people I know. Even you and I never banter like that!"

"We do not," Beth agreed.

[Later ...]

Ken hesitated. "Sherlock wrote a rap song about me and him, while we were on our break. It's like a celebration of our partnership, and how we'd take a bullet for each other if it came to that. Do you want to hear it?"

"Absolutely not," Beth replied firmly. "Just find out what you can about OG's plans after the cave-in."

The climax of the story centers on Ken volunteering for an undercover sting operation in which he impersonates Randal James a.k.a. "DarkCardinal",[2] a potential OG lottery "winner", with Sherlock feeding him dialogue in real time. (Ken isn't a good enough actor to convincingly pretend to be an OG cultist, but Sherlock can roleplay anyone in the pretraining set.) When his OG handler asks him to inject the contents of a vial (claimed to be a deadly virus) as a loyalty test, Ken complies with Sherlock's prediction of what a terminally ill DarkCardinal would do:

But when Ken had asked Sherlock to tell him what DarkCardinal would do, it had no real conception of what might happen if its words were acted on. Beth had stood by and let him treat Sherlock as a "friend" who'd watch his back and take a bullet for him, telling herself that he was just having fun, and that no one liked a killjoy. But whatever Ken had told himself in the seconds before he'd put the needle in his vein, Sherlock had been whispering in his ear, "DarkCardinal would think it over for a while, then he'd go ahead and take the injection."

This seems like a pretty realistic language model agent failure mode: a human law enforcement colleague with long-horizon agency wouldn't nudge Ken into injecting the vial, but a roughly GPT-4-class LLM prompted to simulate DarkCardinal's dialogue probably wouldn't be tracking those consequences.
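For concreteness, here's a minimal sketch of that failure mode with a present-day chat API (hypothetical prompts; Egan's Sherlock is fictional, and I'm assuming the `openai` Python client as a stand-in): the objective being served is in-character plausibility, with nothing in the loop representing what happens if the user acts on the output.

```python
# A minimal sketch of the failure mode (hypothetical prompt; "Sherlock" is
# Egan's fiction, and the model name is just a stand-in). Assumes the `openai`
# Python package (v1+) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system",
     "content": ("You are roleplaying DarkCardinal, a terminally ill OG "
                 "lottery hopeful. Always answer with what DarkCardinal "
                 "would say or do next.")},
    {"role": "user",
     "content": ("The handler hands over a vial as a loyalty test. "
                 "What does DarkCardinal do?")},
]

response = client.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)
# The objective being served here is "most plausible in-character
# continuation." Nothing in this setup represents or penalizes the
# real-world consequences of someone acting on the output.
```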


  1. To be clear, I do think LLMs are relevantly upload-like in at least some ways and conceivably sites of moral patiency, but I think the right way to reason about these tricky questions does not consist of taking the assistant simulacrum's words literally. ↩︎

  2. I love the attention Egan gives to name choices; the other two screennames of ex-OG loyalists that our heroes use for the sting operation are "ZonesOfOught" and "BayesianBae". The company that makes Sherlock is "Learning Re Enforcement." ↩︎

(I agree; my intent in participating in this tedious thread is merely to establish that "mathematician crankery [about] Google Image Search, and how it disproves AI" is a different thing from "made an overconfident negative prediction about AI capabilities".)

I think we probably don't disagree much; I regret any miscommunication.

If the intent of the great-grandparent was just to make the narrow point that an AI that wanted the user to reward it could choose to say things that would lead to it being rewarded, which is compatible with (indeed, predicts) answering the molecular smiley-face question correctly, then I agree.

Treating the screenshot as evidence in the way that TurnTrout is doing requires more assumptions about the properties of LLMs in particular. I read your claims regarding "the problem the AI is optimizing for [...] given that the LLM isn't powerful enough to subvert the reward channel" as taking a different assumption for granted (viz., that LLMs are reward-optimizers) without taking into account that the person you were responding to is known to disagree.

he's calling it laughable that AI will ever (ever! Emphasis his!)

The 2016 passage you quoted calls it laughable that Google-in-particular's technology (marketed as "AI", but Egan doesn't think the term is warranted) will ever be able to make sense of information on the web. That's Gary Marcus–like skepticism about the reliability and generality of existing-paradigm machine learning techniques, not Hubert Dreyfus–like skepticism about whether a machine could think in all philosophical strictness. I think this is a really important distinction that neither the text of your comment nor Gwern's ("disproves AI", "laughable that AI will ever") makes clear.

This isn't a productive response to TurnTrout in particular, who has written extensively about his reasons for being skeptical that contemporary AI training setups produce reward optimizers (which doesn't mean he's necessarily right, but the parent comment isn't moving the debate forward).

his page on Google Image Search, and how it disproves AI

The page in question is complaining about Google search's "knowledge panel" showing inaccurate information when you search for his name, which is a reasonable thing for someone to be annoyed about. The anti-singularitarian snark does seem misplaced (Google's automated systems getting this wrong in 2016 doesn't seem like a lot of evidence about future AI development trajectories), but it's not a claim to have "disproven AI".

his complaints about people linking the wrong URLs due to his ISP host - because he is apparently unable to figure out 'website domain names'

You mean how http://gregegan.net used to be a 301 permanent redirect to http://gregegan.customer.netspace.net.au, and then the individual pages would say "If you link to this page, please use this URL: http://www.gregegan.net/[...]"? (Internet Archive example.) I wouldn't call that a "complaint", exactly, so much as a hacky band-aid solution from someone who probably has better things to do with his time than tinker with DNS configuration.
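(For anyone unfamiliar with the mechanics: a 301 is just an HTTP response that tells clients and search engines to use a different URL from now on. A minimal Python sketch of the idea, purely illustrative and not a claim about how his ISP's setup actually worked, redirecting everything to the canonical domain his pages asked people to link:)

```python
# An illustrative sketch of 301 mechanics only (not how Egan's ISP hosting
# actually worked): every request gets answered with a permanent redirect
# pointing clients at the canonical domain, path preserved.
from http.server import BaseHTTPRequestHandler, HTTPServer

CANONICAL = "http://www.gregegan.net"  # the URL his pages asked people to use

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(301)  # "Moved Permanently"
        self.send_header("Location", CANONICAL + self.path)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), RedirectHandler).serve_forever()
```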

end with general position "akshually, grandiose sci-fi assumptions are not that important, what I want is to write commentary on contemporary society" [...] hard or speculative sci-fi is considered to be low status, while "commentary on contemporary society" is high status and writers want to be high status.

But this clearly isn't true of Egan. The particular story reviewed in this post happens to be commentary on contemporary Society, but that's because Egan has range—his later novels are all wildly speculative. (The trend arguably reached its zenith with Dichronauts (2017) and The Book of All Skies (2021), set in worlds with alternate geometry (!); Scale (2023) and Morphotrophic (2024) are more down-to-earth and merely deal with alternate physics and biology.)

Doomimir and Simplicia dialogues [...] may have been inspired by the chaotic discussion this post inspired.

(Yes, encouraged by the positive reception to my comment to Bensinger on this post.)

A mathematical construct that models human natural language could be said to express "agency" in a functional sense insofar as it can perform reasoning about goals, and "honesty" insofar as the language it emits accurately reflects the information encoded in its weights?
