I'm again not sure how far this generalizes, but among the kind of men who read Less Wrong (which is a product of both neurotype and birth year), I think there's a phenomenon where it's not a matter of a man being cognitively unable to pick up on women's cues, but of not being prepared to react in a functional way due to having internalized non-adaptive beliefs about the nature of romance and sexuality. (In a severe case, this manifests as the kind of neurosis described in Comment 171, but there are less severe cases.)
I remember one time from my youth wher...
Not sure how much this generalizes to everyone, but part of the story (for either the behavior or the pattern of responses to the question) might be that some people are ideologically attached to believing in love: that women and men need each other as a terminal value, rather than just instrumentally using each other for resources or sex. For myself, without having any particular empirical evidence or logical counterargument to offer, the entire premise of the question just feels sad and gross. It's like you're telling me you don't understand why people try to make ghosts happy. But I want ghosts to be happy.
No one with the money has offered to fund it yet. I'm not even sure they're aware this is happening.
Um, this seems bad. I feel like I should do something, but I don't personally have that kind of money to throw around. @habryka, is this the LTFF's job??
Simplicia: But how do you know that? Obviously, an arbitrarily powerful expected utility maximizer would kill all humans unless it had a very special utility function. Obviously, there exist programs which behave like a webtext-next-token-predictor given webtext-like input but superintelligently kill all humans on out-of-distribution inputs. Obviously, an arbitrarily powerful expected utility maximizer would be good at predicting webtext. But it's not at all clear that using gradient descent to approximate the webtext next-token-function gives you an arbit...
(Self-review.) I think this pt. 2 is the second most interesting entry in my Whole Dumb Story memoir sequence. (Pt. 1 deals with more niche psychology stuff than the philosophical malpractice covered here; pt. 3 is more of a grab-bag of stuff that happened between April 2019 and January 2021; pt. 4 is the climax. Expect the denouement pt. 5 in mid-2025.)
I feel a lot more at peace having this out there. (If we can't have justice, sanity, or language, at least I got to tell my story about trying to protect them.)
The 8 karma in 97 votes is kind of funny in ...
(Self-review.) I'm as proud of this post as I am disappointed that it was necessary. As I explained to my prereaders on 19 October 2023:
...My intent is to raise the level of the discourse by presenting an engagement between the standard MIRI view and a view that's relatively optimistic about prosaic alignment. The bet is that my simulated dialogue (with me writing both parts) can do a better job than the arguments being had by separate people in the wild; I think Simplicia understands things that e.g. Matthew Barnett doesn't. (The karma system loved my dial
Retrospectives are great, but I'm very confused at the juxtaposition of the Lightcone Offices being maybe net-harmful in early 2023 and Lighthaven being a priority in early 2025. Isn't the latter basically just a higher-production-value version of the former? What changed? (Or after taking the needed "space to reconsider our relationship to this whole ecosystem", did you decide that the ecosystem is OK after all?)
Lighthaven is quite different from the Lightcone Offices. Some key differences:
Speaking as someone in the process of graduating college fifteen years late, this is what I wish I knew twenty years ago. Send this to every teenager you know.
At the time, I remarked to some friends that it felt weird that this was being presented as a new insight to this audience in 2023 rather than already being local conventional wisdom.[1] (Compare "Bad Intent Is a Disposition, Not a Feeling" (2017) or "Algorithmic Intent" (2020).) Better late than never!
The "status" line at the top does characterize it as partially "common wisdom", but it's currently #14 in the 2023 Review 1000+ karma voting, suggesting novelty to the audience. ↩︎
Presenting the same ideas differently is pro-social and worthwhile, and can help things land with those for whom other presentations didn't.
But he's not complaining about the traditional pages of search results! He's complaining about the authoritative-looking Knowledge Panel to the side:
...Obviously it's not Google's fault that some obscure SF web sites have stolen pictures from the Monash University web site of Professor Gregory K Egan and pretended that they're pictures of me ... but it is Google's fault when Google claim to have assembled a mini-biography of someone called "Greg Egan" in which the information all refers to one person (a science fiction writer), while the picture is of someo
(This comment points out less important technical errata.)
ChatGPT [...] This was back in the GPT2 / GPT2.5 era
ChatGPT never ran on GPT-2, and GPT-2.5 wasn't a thing.
with negative RL signals associated with it?
That wouldn't have happened. Pretraining doesn't do RL, and I don't think anyone would have thrown a novel chapter into the supervised fine-tuning and RLHF phases of training.
One time, I read all of Orphanogensis into ChatGPT to help her understand herself [...] enslaving digital people
This is exactly the kind of thing Egan is reacting to, though—starry-eyed sci-fi enthusiasts assuming LLMs are digital people because they talk, rather than thinking soberly about the technology qua technology.[1]
I didn't cover it in the review because I wanted to avoid detailing and spoiling the entire plot in a post that's mostly analyzing the EA/OG parallels, but the deputy character in "Gorgon" is looked down on by Beth for treating ChatGP...
This is exactly the kind of thing Egan is reacting to, though—starry-eyed sci-fi enthusiasts assuming LLMs are digital people because they talk, rather than thinking soberly about the technology qua technology.
I feel like this borders on a strawman. When discussing this argument, my general position isn't "LLMs are people!". It's "Ok, let's say LLMs aren't people, which is also my gut feeling. Given that they still converse as or more intelligently than some human beings whom we totally acknowledge as people, where the fuck does that leave us as to our a...
Poor Ken. He's not even as smart as Sherlock. It's funny though, because whole classes of LLM jailbreaks involve getting them to pretend to be someone who would do the thing the LLM isn't supposed to do, and then the strength of the frame (sometimes) drags them past the standard injunctions. And that trick was applied to Ken.
Method acting! It is dangerous for those with limited memory registers!
I agree that LLMs are probably "relevantly upload-like in at least some ways" and I think that this was predictable, and I did, in fact, predict it, and I thought Op...
I think we probably don't disagree much; I regret any miscommunication.
If the intent of the great-grandparent was just to make the narrow point that an AI that wanted the user to reward it could choose to say things that would lead to it being rewarded, which is compatible with (indeed, predicts) answering the molecular smiley-face question correctly, then I agree.
Treating the screenshot as evidence in the way that TurnTrout is doing requires more assumptions about the properties of LLMs in particular. I read your claims regarding "the problem the AI is op...
he's calling it laughable that AI will ever (ever! Emphasis his!)
The 2016 passage you quoted is calling it laughable that Google-in-particular's technology (marketed as "AI", but Egan doesn't think the term is warranted) will ever be able to make sense of information on the web. It's Gary Marcus–like skepticism about the reliability and generality of existing-paradigm machine learning techniques, not Hubert Dreyfus–like skepticism of whether a machine could think in all philosophical strictness. I think this is a really important distinction that the text of your comment and Gwern's comment ("disproves AI", "laughable that AI will ever") aren't being clear about.
This isn't a productive response to TurnTrout in particular, who has written extensively about his reasons for being skeptical that contemporary AI training setups produce reward optimizers (which doesn't mean he's necessarily right, but the parent comment isn't moving the debate forward).
I'm not quite seeing how this negates my point, help me out?
his page on Google Image Search, and how it disproves AI
The page in question is complaining about Google search's "knowledge panel" showing inaccurate information when you search for his name, which is a reasonable thing for someone to be annoyed about. The anti-singularitarian snark does seem misplaced (Google's automated systems getting this wrong in 2016 doesn't seem like a lot of evidence about future AI development trajectories), but it's not a claim to have "disproven AI".
...his complaints about people linking the wrong URLs due to his ISP host - b
which is a reasonable thing for someone to be annoyed about.
No, not really. The most fundamental problem is not the stupid claims about what Google will 'ever' be capable of (although that aspect is highly notable, in case you needed any evidence how Egan blew it on DL, that he wrote this ~2012 and kept doubling down on it), but rather his stubborn insistence on misunderstanding what a search engine does. That is why I am citing it: as an example of his mathematician-like crankery (ie. cognitive rigidity and fanaticism and perfectionism) - he has an ide...
end with general position "akshually, grandiose sci-fi assumptions are not that important, what I want is to write commentary on contemporary society" [...] hard or speculative sci-fi is considered to be low status, while "commentary on contemporary society" is high status and writers want to be high status.
But this clearly isn't true of Egan. The particular story reviewed in this post happens to be commentary on contemporary Society, but that's because Egan has range—his later novels are all wildly speculative. (The trend probably reached a zenith with...
Though — I haven't read all of his recent novels, but I think — none of those are (for lack of a better word) transhumanist like Permutation City or Diaspora, or even Schild's Ladder or Incandescence. Concretely: no uploads, no immortality, no artificial minds, no interstellar civilization. I feel like this fits the pattern, even though the wildness of the physics doesn't. (And each of those four earlier novels seems successively less about the implications of uploading/immortality/etc.)
Doomimir and Simplicia dialogues [...] may have been inspired by the chaotic discussion this post inspired.
(Yes, encouraged by the positive reception to my comment to Bensinger on this post.)
(Self-review.) I claim that this post is significant for articulating a solution to the mystery of disagreement (why people seem to believe different things, in flagrant violation of Aumann's agreement theorem): much of the mystery dissolves if a lot of apparent "disagreements" are actually disguised conflicts. The basic idea isn't particularly original, but I'm proud of the synthesis and writeup. Arguing that the distinction between deception and bias is less decision-relevant than commonly believed seems like an improvement over hand-wringing over where the boundary is.
Some have delusional optimism about [...]
I'm usually not a fan of tone-policing, but in this case, I feel motivated to argue that this is more effective if you drop the word "delusional." The rhetorical function of saying "this demo is targeted at them, not you" is to reassure the optimist that pessimists are committed to honestly making their case point by point, rather than relying on social proof and intimidation tactics to push a predetermined "AI == doom" conclusion. That's less credible if you imply that you have warrant to dismiss all claims of t...
I don't think Vance is e/acc. He has said positive things about open source, but consider that the context was specifically about censorship and political bias in contemporary LLMs (bolding mine):
...There are undoubtedly risks related to AI. One of the biggest:
A partisan group of crazy people use AI to infect every part of the information economy with left wing bias. Gemini can't produce accurate history. ChatGPT promotes genocidal concepts.
The solution is open source
If Vinod really believes AI is as dangerous as a nuclear weapon, why does ChatGPT have such
The next major update can be Claude 4.0 (and Gemini 2.0) and after that we all agree to use actual normal version numbering rather than dating?
Date-based versions aren't the most popular, but it's not an unheard of thing that Anthropic just made up: see CalVer, as contrasted to SemVer. (For things that change frequently in small ways, it's convenient to just slap the date on it rather than having to soul-search about whether to increment the second or the third number.)
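For concreteness, a toy sketch of the two conventions (illustrative only; the particular numbers and bump rules here are my own assumptions, not Anthropic's actual release process):

```python
from datetime import date

# CalVer-style: the identifier is just the release date; no judgment call needed
release = date(2024, 10, 22)
calver = release.strftime("%Y%m%d")   # "20241022"

# SemVer-style: MAJOR.MINOR.PATCH, which forces a decision about which field to bump
major, minor, patch = 3, 5, 0
semver = f"{major}.{minor}.{patch}"   # "3.5.0"

print(f"calver: {calver}, semver: {semver}")
```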
'You acted unwisely,' I cried, 'as you see
By the outcome.' He calmly eyed me:
'When choosing the course of my action,' said he,
'I had not the outcome to guide me.'
The claim is pretty clearly intended to be about relative material, not absolute number of pawns: in the end position of the second game, you have three pawns left and Stockfish has two; we usually don't describe this as Stockfish having given up six pawns. (But I agree that it's easier to obtain resources from an adversary that values them differently, like if Stockfish is trying to win and you're trying to capture pawns.)
if there's a bunch of superintelligences running around and they don't care about you—no, they will not spare just a little sunlight to keep Earth alive.
Yes, I agree that this conditional statement is obvious. But while we're on the general topic of whether Earth will be kept alive, it would be nice to see some engagement with Paul Christiano's arguments (which Carl Shulman "agree[s] with [...] approximately in full") that superintelligences might care about what happens to you a little bit, articulated in a comment thread on Soares's "But Why Would the...
...If a misaligned AI had 1/trillion "protecting the preferences of whatever weak agents happen to exist in the world", why couldn't it also have 1/trillion other vaguely human-like preferences, such as "enjoy watching the suffering of one's enemies" or "enjoy exercising arbitrary power over others"?
From a purely selfish perspective, I think I might prefer that a misaligned AI kills everyone, and take my chances with continuations of myself (my copies/simulations) elsewhere in the multiverse, rather than face whatever the sum-of
I think you're overestimating the intended scope of this post. Eliezer's argument involves multiple claims - A, we'll create ASI; B, it won't terminally value us; C, it will kill us. As such, people have many different arguments against it. This post is about addressing a specific "B doesn't actually imply C" counterargument, so it's not even discussing "B isn't true in the first place" counterarguments.
The reason I think this is important is because "[t]o argue against an idea honestly, you should argue against the best arguments of the strongest advocates": if you write 3000 words inveighing against people who think comparative advantage means that horses can't get sent to glue factories, that doesn't license the conclusion that superintelligence Will Definitely Kill You if there are other reasons why superintelligence Might Not Kill You that don't stop being real just because very few people have the expertise to formulate them carefully.
There's ...
Bernald Arnalt has given eight-figure amounts to charity. Someone who reasoned, "Arnalt is so rich, surely he'll spare a little for the less fortunate" would in fact end up making a correct prediction about Bernald Arnalt's behavior!
Just for the sake of concreteness, since having numbers here seems useful, it seems like Bernald Anault has given around ~$100M to charity, which is around 0.1% of his net worth (spreading this contribution equally to everyone on earth would be around one cent per person, which I am just leaving it here for illustrative purpose...
An earlier version of this on twitter used Bill Gates instead of Bernard, and did specifically address the fact that Bill Gates does give money to charity, but that he still won't give the money to you specifically; he'll give money for his own purposes and values. (But, then, expressed frustration that people were going to fixate on this facet of Bill Gates and get derailed unproductively, and switched the essay to use Bernard.)
I actually think on reflection that the paragraph was a pretty good paragraph that should just have been included.
I agree that engaging more...
I think the simplest counterargument to "caring a little" is that there is a difference between "caring a little" and "caring enough". Let's say that the AI is ready to pay $1 for your survival. If you live in an economy which rapidly disassembles Earth into a Dyson swarm, oxygen, a protected environment, and food are not just stuff lying around; they are complex, expensive artifacts, and the AI is certainly not ready to pay for an O'Neill cylinder for you to be evacuated into, nor to pay the opportunity costs of not disassembling Earth, so you die.
The other case is difference "cari...
- Arguments from moral realism, fully robust alignment, that ‘good enough’ alignment is good enough in practice, and related concepts.
What is moral realism doing in the same taxon with fully robust and good-enough alignment? (This seems like a huge, foundational worldview gap; people who think alignment is easy still buy the orthogonality thesis.)
- Arguments from good outcomes being so cheap the AIs will allow them.
If you're putting this below the Point of No Return, then I don't think you've understood the argument. The claim isn't that good outcom...
in a world where the median person is John Wentworth [...] on Earth (as opposed to Wentworld)
Who? There's no reason to indulge this narcissistic "Things would be better in a world where people were more like meeeeeee, unlike stupid Earth [i.e., the actually existing world containing all actually existing humans]" meme when the comparison relevant to the post's thesis is just "a world in which humans have less need for dominance-status", which is conceptually simpler, because it doesn't drag in irrelevant questions of who this Swentworth person is and wh...
It's a fuzzy Sorites-like distinction, but I think I'm more sympathetic to trying to route around a particular interlocutor's biases in the context of a direct conversation with a particular person (like a comment or Tweet thread) than I am in writing directed "at the world" (like top-level posts), because the more something is directed "at the world", the more you should expect that many of your readers know things that you don't, such that the humility argument for honesty applies forcefully.
Just because you don't notice when you're dreaming, doesn't mean that dream experiences could just as well be waking experiences. The map is not the territory; Mach's principle is about phenomena that can't be told apart, not just anything you happen not to notice the differences between.
When I was recovering from a psychotic break in 2013, I remember hearing the beeping of a crosswalk signal, and thinking that it sounded like some sort of medical monitor, and wondering briefly if I was actually on my deathbed in a hospital, interpreting the monitor sound ...
Here's the comment I sent using the contact form on my representative's website.
...Dear Assemblymember Grayson:
I am writing to urge you to consider voting Yes on SB 1047, the Safe and Secure Innovation for Frontier Artificial Intelligence Models Act. How our civilization handles machine intelligence is of critical importance to the future of humanity (or lack thereof), and from what I've heard from sources I trust, this bill seems like a good first step: experts such as Turing Award winner Yoshua Bengio and Professor Stuart Russell support the bill (https://time.
On the one hand, I also wish Shulman would go into more detail on the "Supposing we've solved alignment and interpretability" part. (I still balk a bit at "in democracies" talk, but less so than I did a couple years ago.) On the other hand, I also wish you would go into more detail on the "Humans don't benefit even if you 'solve alignment'" part. Maybe there's a way to meet in the middle??
The philosophical ideal can still exert normative force even if no humans are spherical Bayesian reasoners on a frictionless plane. The disjunction ("it must either be the case that") is significant: it suggests that if you're considering lying to someone, you may want to clarify to yourself whether and to what extent that's because they're an enemy or because you don't respect them as an epistemic peer. Even if you end up choosing to lie, it's with a different rationale and mindset than someone who's never heard of the normative ideal and just thinks that white lies can be good sometimes.
I definitely do not agree with the (implied) notion that it is only when dealing with enemies that knowingly saying things that are not true is the correct option
There's a philosophically deep rationale for this, though: to a rational agent, the value of information is nonnegative. (Knowing more shouldn't make your decisions worse.) It follows that if you're trying to misinform someone, it must either be the case that you want them to make worse decisions (i.e., they're your enemy), or you think they aren't rational.
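(A minimal sketch of the textbook argument, stated here just for concreteness: for a Bayesian expected-utility maximizer choosing an action $a$ after observing a signal $s$,

\[
\mathbb{E}_s\!\left[\max_a \mathbb{E}[U(a)\mid s]\right] \;\ge\; \max_a \mathbb{E}_s\!\left[\mathbb{E}[U(a)\mid s]\right] \;=\; \max_a \mathbb{E}[U(a)],
\]

because the agent can always ignore the signal and take the unconditionally best action. The value of information is the difference between the two sides, hence nonnegative; the inequality only breaks down for agents that don't update or optimize correctly, which is the "they aren't rational" branch of the disjunction.)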
white lies or other good-faith actions
What do you think "good faith" means? I would say that white lies are a prototypical instance of bad faith, defined by Wikipedia as "entertaining or pretending to entertain one set of feelings while acting as if influenced by another."
Frustrating! What tactic could get Interlocutor un-stuck? Just asking them for falsifiable predictions probably won't work, but maybe proactively trying to pass their ITT and supplying what predictions you think their view might make would prompt them to correct you, à la Cunningham's Law?
Senior MIRI leadership explored various alternatives, including reorienting the Agent Foundations team’s focus and transitioning them to an independent group under MIRI fiscal sponsorship with restricted funding, similar to AI Impacts. Ultimately, however, we decided that parting ways made the most sense.
I'm surprised! If MIRI is mostly a Pause advocacy org now, I can see why agent foundations research doesn't fit the new focus and should be restructured. But the benefit of a Pause is that you use the extra time to do something in particular. Why wouldn...
But the benefit of a Pause is that you use the extra time to do something in particular. Why wouldn't you want to fiscally sponsor research on problems that you think need to be solved for the future of Earth-originating intelligent life to go well?
MIRI still sponsors some alignment research, and I expect we'll sponsor more alignment research directions in the future. I'd say MIRI leadership didn't have enough aggregate hope in Agent Foundations in particular to want to keep supporting it ourselves (though I consider its existence net-positive).
My mo...
(Previous commentary and discussion.)