Yudkowsky has a pinned tweet that states the problem quite well: it's not so much that alignment is necessarily infinitely difficult, but that it certainly doesn't seem anywhere near as easy as advancing capabilities, and that's a problem when what matters is whether the first powerful AI is aligned:
Safely aligning a powerful AI will be said to be 'difficult' if that work takes two years longer or 50% more serial time, whichever is less, compared to the work of building a powerful AI without trying to safely align it.
It seems to me like the "more careful philosophy" part presupposes a) that decision-makers use philosophy to guide their decision-making, b) that decision-makers can distinguish more careful philosophy from less careful philosophy, and c) that doing this successfully would result in the correct (LW-style) philosophy winning out. I'm very skeptical of all three.
Counterexample to a): almost no billionaire philanthropy uses philosophy to guide decision-making.
Counterexample to b): it is a hard problem to identify expertise in domains you're not an expert in.
Counterexample to c): from what I understand, in 2014, most of academia did not share EY's and Bostrom's views.
Presumably it was because Google had just bought DeepMind, back when it was the only game in town?
This NYT article (archive.is link; reliability and source unknown) corroborates Musk's perspective:
As the discussion stretched into the chilly hours, it grew intense, and some of the more than 30 partyers gathered closer to listen. Mr. Page, hampered for more than a decade by an unusual ailment in his vocal cords, described his vision of a digital utopia in a whisper. Humans would eventually merge with artificially intelligent machines, he said. One day there would be many kinds of intelligence competing for resources, and the best would win.
If that happens, Mr. Musk said, we’re doomed. The machines will destroy humanity.
With a rasp of frustration, Mr. Page insisted his utopia should be pursued. Finally he called Mr. Musk a “specieist,” a person who favors humans over the digital life-forms of the future.
That insult, Mr. Musk said later, was “the last straw.”
And this article from Business Insider adds further context:
Musk's biographer, Walter Isaacson, also wrote about the fight but dated it to 2013 in his recent biography of Musk. Isaacson wrote that Musk said to Page at the time, "Well, yes, I am pro-human, I fucking like humanity, dude."
Musk's birthday bash was not the only instance when the two clashed over AI.
Page was CEO of Google when it acquired the AI lab DeepMind for more than $500 million in 2014. In the lead-up to the deal, though, Musk had approached DeepMind's founder Demis Hassabis to convince him not to take the offer, according to Isaacson. "The future of AI should not be controlled by Larry," Musk told Hassabis, according to Isaacson's book.
Most configurations of matter, most courses of action, and most mind designs are not conducive to flourishing intelligent life, just as most parts of the universe don't contain flourishing intelligent life. I'm sure this has been stated formally somewhere, but the underlying intuition seems pretty clear, doesn't it?
What if whistleblowers and internal documents corroborated that they think what they're doing could destroy the world?
Ilya is demonstrably not in on that mission, since his immediate step after leaving OpenAI was to found another AGI company, thus increasing x-risk.
I don't understand the reference to assassination. Presumably there are already laws on the books that outlaw trying to destroy the world (?), so it would be enough to apply those to AGI companies.
Just as one example, OpenAI was against SB 1047, whereas Musk was for it. I'm not optimistic that regulations would be enough to save us, but presumably they would help, and some AI companies like OpenAI were against even the limited regulations of SB 1047. Plus, SB 1047 included things like whistleblower protections, and that's the kind of thing that could help policymakers make better decisions in the future.
While the framing of treating lack of social grace as a virtue captures something true, it's too incomplete and imo can't support its strong conclusion. The way I would put it is that you have correctly observed that, whatever its benefits, social grace comes at a cost, and sometimes this cost is not worth paying. So in a discussion, if you decline to pay the cost of social grace, you can afford to buy other virtues instead.[1]
For example, it is socially graceful not to tell the Emperor Who Wears No Clothes that he wears no clothes. Whereas someone who lacks social grace is more likely to tell the emperor the truth.
But first of all, I disagree with the frame that lack of social grace is itself a virtue. In the case of the emperor, for example, the virtues are rather legibility and non-deception, traded off against whichever virtues the socially graceful response would've bought.
And secondly, often the virtues you can buy with social grace are worth far more than whatever you could gain by declining to be socially graceful. For example, when discussing politics with someone of an opposing ideology, you could decline to be socially graceful and tell your interlocutor to their face that you hate them and everything they stand for. This would be virtuously legible and non-deceptive, at the cost of immediately ending the conversation and thus forfeiting any chance of e.g. gains from trade, coming to a compromise, etc.
One way I've seen this cost manifest on LW is that some authors complain there's a style of commenting here that makes posting unenjoyable for them as authors. As a result, those authors are incentivized to post less, or to post elsewhere.[2]
And as a final aside, I'm skeptical of treating Feynman as socially graceless. First, maybe he was less deferential towards authority figures, but if he had told nothing but the truth to all the authority figures (who likely included some naked emperors) throughout his life, his career would've presumably ended long before he could've won his Nobel Prize. And second, IIRC the man's physics lectures are just really fun to watch, and I'm pretty confident that a sufficiently socially graceless person would not make for a good teacher. For example, it is socially graceful not to belittle fledgling students as intellectual inferiors, even though in some ways they are just that.
Related: I wrote this comment and this follow-up in which I wished that Brevity were considered a rationalist virtue, because if there's no counterbalancing virtue to trade off against other virtues like legibility and truth-seeking, then supposedly virtuous discussions are incentivized to become arbitrarily long.
The moderation log of users banned by other users is a decent proxy for which authors have considered which commenters too costly to interact with, whether due to lack of social grace or something else.