Eliezer_Yudkowsky comments on Safety Culture and the Marginal Effect of a Dollar - Less Wrong
It worries me a tad that nobody in the discussion group corrected what I consider to be the obvious basic inaccuracy of the model.
Success on FAI is not a magical result of a researcher caring about safety. The researcher who would have otherwise first created AGI does not gain the power to create FAI just by being concerned about it. They would have to develop a stably self-improving AI which learned an understandable goal system which actually did what they wanted. This could be a completely different set of design technologies than what would have gone into something unstable that improved itself by ad-hoc methods well enough to go FOOM and end the game. The researcher who would have otherwise created AGI might not be good enough to do this. The best you might be able to convince them to do would be to retire from the game. It's a lot harder to convince someone to abandon the incredibly good idea they're enthusiastic about, and start over from scratch or leave the game, than to persuade people to be "concerned about safety", which is really cheap (you just put on a look of grave concern).
If I thought all you had to do to win was to convince the otherwise-first creator of AGI to "take safety seriously", this problem would be tremendously easier and I would be approaching it in a very different way. I'd be putting practically all of my efforts into PR and academia, not trying to assemble a team to solve basic FAI problems over however-many years and then afterward build FAI. A free win just for convincing someone to take something seriously? Hot damn, that'd be one easy planet to save; there'd be no point in pursuing any other avenue until you'd totally exhausted that one.
As it stands, though, you're faced with (a) the much harder sell of convincing AGI people that they will destroy the world and that being concerned is not enough to save them, that they have to tackle much harder problems than they wanted to face on a problem that seems to them hard enough already; and (b) even if you do convince the AGI person who otherwise would've destroyed the world to join the good guys on a different problem or retire, you don't win. The game isn't won there. It's just a question of how long it takes the next AGI person in line to destroy the world. If you convinced them? Number three. You keep dealing through the deck until you turn up the ace of spades, unless the people working on the ace of hearts can solve their more difficult problem before that happens.
All academic persuasion does is buy time, and not very much of that - the return on effort invested seems to be pretty low.
The main advantage of convincing mainstream AI people that FAI is a problem worth worrying about appears to be not that you will have mainstream AI people thinking twice before they build their AGI, but that you will then have mainstream AI people working on FAI. More people working on a given problem seems to make it massively more likely that the problem will be solved.
If there are rigorous arguments that FAI is worth worrying about, and that there are interesting questions about which people could be doing useful incremental research, then convincing people who work in universities to start doing this research has to be such a massive win that it would take something pretty huge to outweigh it - there are a lot of very clever people working in universities, massively more than will ever work at SingInst, and they already have a huge network in place to give them money to think about the things they find interesting.
Indeed, all of this was discussed at the time, and these complexities do indeed make the model produce an overestimate. However, I really don't think the difference is whole orders of magnitude, and this
is definitely wrong. While there is a great deal more that needs to be figured out in order for an AI to be friendly, much of it is research that academia could do, too, if only they thought it was worthwhile.
I plan to write an article about just what "being safety conscious" would mean, but it's not "spending a few extra days on safety features before flipping the switch"; it's more like handing the whole project over to friendliness experts and taking advantage of whatever friendliness research has been done up to that point. Those experts and that research need to exist, but I don't think those differences are on the margin of current existential risk reduction spending, since the limiting resource there isn't money.
After reading Eliezer's comment and yours, I now think the "30%" figure for "being safety conscious" needs unpacking. In particular I think there's a tendency to picture the most safety conscious of the converts, and say the entire 30% looks like that, even though (for me at least) the intuitions which let 30% be plausible are based on researchers intellectually believing safety consciousness is very important rather than researchers taking actions as if safety consciousness is very important.
GuySrinivasan's comment seems to suggest that the estimated marginal effect of a dollar could be at least 2 orders of magnitude smaller if additional considerations are taken into account. See "down by a factor of 10" and "10%-50% that being safety conscious works".
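To see how those two discounts compound into "at least 2 orders of magnitude", here is a back-of-envelope sketch. The numbers are illustrative assumptions taken from the quoted ranges, not figures computed in the original post:

```python
# Back-of-envelope: compounding the two discounts quoted above.
# base_effect is the original estimated marginal effect of a dollar,
# normalized to 1.0 (an illustrative assumption, not a figure from the post).
base_effect = 1.0

model_correction = 1 / 10  # "down by a factor of 10"
# "10%-50% that being safety conscious works":
p_works_low, p_works_high = 0.1, 0.5

# Pessimistic and optimistic corrected estimates.
pessimistic = base_effect * model_correction * p_works_low   # 0.01
optimistic = base_effect * model_correction * p_works_high   # 0.05

print(f"effect shrinks by a factor of {base_effect / optimistic:.0f}x "
      f"to {base_effect / pessimistic:.0f}x")
# prints: effect shrinks by a factor of 20x to 100x
```

So at the pessimistic end of the quoted range the combined reduction is a factor of 100, i.e. the "2 orders of magnitude" figure; at the optimistic end it is only a factor of 20.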
I agree with you that we're stuck in the (arguably unpleasant) position of having to actually go ahead with FAI as a project; still, academic persuasion might get you funds and some of the best brains for your project.
Safety-speed tradeoffs, the systematic bias in "one randomly selected researcher," and AGI vs FAI difficulty were discussed at the time.
This comment seems to argue that "trying to assemble a team to solve basic FAI problems over however-many years and then afterward build FAI" is the real goal here and that "convincing someone to take something seriously" is barely worth thinking about. However, it certainly seems to me that convincing people to take the problem seriously is a productive (and perhaps an essential) first step toward assembling a team.
However, reading the subtext in that comment, it certainly appears that the real fear expressed here is that if safety consciousness should become endemic in the AGI community, there is a real risk that someone else might produce FAI before Eliezer.
That makes no sense. If safety consciousness means that the AGI community is likely to produce FAI before Eliezer, then without safety consciousness, the AGI community is even more likely to produce UFAI before Eliezer produces FAI. Either way, Eliezer gets scooped; but in the second case, we're very dead.
That's hardly charitable.
As explicit reasoning, yes, that would be absurd. But we are all primates, and the thought of being overshadowed feels bad on a subconscious level, even if that feeling disagrees with conscious beliefs. "I will do it right and anyone else who tries will do it wrong" is an unlikely thing to believe, but a likely thing to alieve.