paulfchristiano comments on Three Approaches to "Friendliness" - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
I mean speed. It seems like you are relying on an assumption of a rapid transition from a world like ours to a world dominated by superhuman AI, whereas typically I imagine a transition that lasts at least years (which is still very fast!) during which we can experiment with things, develop new approaches, etc. In this regime many more approaches are on the table.
Even given shaky solutions to the control problem, it's not obvious that you can't move quickly to a much better prepared society, via better solutions to the control problem, further AI work, brain emulations, significantly better coordination, human enhancement, etc.
This is an interesting view (in that it isn't what I expected). I don't think that the AIs are doing any work in this scenario, i.e., if we just imagined normal humans going on their way without any prospect of building much smarter descendants, you would make similar predictions for similar reasons? If so, this seems unlikely given the great range of possible coordination mechanisms many of which look like they could avert this problem, the robust historical trends in increasing coordination ability and scale of organization, etc. Are there countervailing reasons to think it is likely, or even very plausible? If not, I'm curious about how the presence of AI changes the scenario.
I don't find these arguments particularly compelling as a case for "there is very likely to be a problem," though they are more compelling as an indication of "there might be a problem."
In general, it seems that the burden of proof is on someone who claims "Surely X" in an environment which is radically unlike any environment we have encountered before. I don't think that any very compelling arguments have been offered here, just vague gesturing. I think it's possible that we should focus on some of these pessimistic possibilities because we can have a larger impact there. But your (and Eliezer's) claims go further than this, suggesting that it isn't worth investing in interventions that would modestly improve our ability to cope with difficulties (respectively, clarifying our understanding of AI, and human empowerment, both of which slightly speed up AI progress), because the probability is so low. I think this is a plausible view, but it doesn't look like the evidence supports it to me.
I'm certainly aware of the points you've raised, and at least a reasonable fraction of the thinking that has been done in this community on these topics. Again, I'm happy with these arguments (and have made many of them myself) as a good indication that the issue is worth taking seriously. But I think you are taking this "rejection" much too seriously in this context. If someone said "maybe X will work" and someone else said "maybe X won't work," I wouldn't then leave X off of (long) lists of reasons why things might work, even if I agreed with them.
This is getting a bit too long for a point-by-point response, so I'll pick what I think are the most productive points to make. Let me know if there's anything in particular you'd like a response on.
I try not to assume this, but quite possibly I'm being unconsciously biased in that direction. If you see any place where I seem to be implicitly assuming this, please point it out, but I think my argument applies even if the transition takes years instead of weeks.
Coordination ability may be increasing but is still very low on an absolute scale. (For example we haven't achieved nuclear disarmament, which seems like a vastly easier coordination problem.) I don't see it increasing at a fast enough pace to be able to solve the problem in time. I also think there are arguments in economics (asymmetric information, public choice theory, principal-agent problems) that suggest theoretical limits to how effective coordination mechanisms can be.
For each AI approach there is not a large number of classes of "AI control schemes" that are compatible with or applicable to it, so I don't understand your relative optimism if you think any given class of proposals is pretty unlikely to work.
But the bigger problem for me is that even if one of these proposals "works", I still don't see how that helps towards the goal of ending up with a superintelligent singleton that shares our values and is capable of solving philosophical problems, which I think is necessary to get the best outcome in the long run. An AI that respects my intentions might be "safe" in the immediate sense, but if everyone else has got one, we now have less time to solve philosophy/metaphilosophy before the window of opportunity for building a singleton closes.
(Quoting from a parallel email discussion which we might as well continue here.) My point is that the development of such an AI leaves people like me in a worse position than before. Yes, I would ask for "more robust solutions to the control problem," but unless the solutions are on the path to solving philosophy/metaphilosophy, they are only ameliorating the damage and not contributing to the ultimate goal, and while I do want "opportunities for further reflection," the AI isn't going to give me more than what I already had before. In the meantime, other people who are less reflective than me are using their AIs to develop nanotech and more powerful AIs, likely forcing me to do the same (earlier than I'd otherwise prefer) in order to remain competitive.