AI friendliness is an important goal and it would be insanely dangerous to build an AI without researching this issue first. I think this is pretty much the consensus view, and that is perfectly sensible.
However, I believe that we are making the wrong inferences from this.
The straightforward inference is "we should ensure that we completely understand AI friendliness before starting to build an AI". But this framing casts AI researchers in a strongly negative light and scares them away. Unfortunately, reality isn't that simple. The goal isn't "build a friendly AI", but "make sure that whoever builds the first AI makes it friendly".
It seems to me that it is vastly more likely that the first AI will be built by a large company, or as a large government project, than by a group of university researchers, who just don't have the funding for that.
I therefore think that we should try to take a more pragmatic approach. The way to do this would be to focus more on outreach and less on research. It won't do anyone any good if we find the perfect formula for AI friendliness on the same day that someone who has never heard of AI friendliness before finishes his paperclip maximizer.
What is your opinion on this?
I have a question, based on some tentative ideas I am considering.
If a boost to capability without friendliness is bad, then presumably a boost to capability with only a small amount of friendliness is also bad. But presumably a boost to capability together with a large boost in friendliness is good. How would we define a large boost?
E.g., if a slightly modified paperclipper verifiably precommits to give the single person who lets it out of the box their own personal simulated utopia, and it will paperclip everything else, that's probably a friendlier paperclipper than one that won't give anyone a simulated utopia. But it's still not friendly, in any normal sense of the term, even if it offers to give a simulated utopia to a different person first (and keep both them and you intact) just so you can test that it isn't lying about being able to do it.
So what if an AI says "Okay. I need code chunks to paperclip almost everything, and I can offer simulated utopias. I'm not sure how many code chunks I'll need. Each one probably has about a 1% chance of letting me paperclip everything except for people in simulated utopias. How about I verifiably put 100 people in a simulated utopia for each code chunk you give me? The first 100 simulated utopias are free, because I need you to have a way of testing the verifiability of my precommitment not to paperclip them." 100 people sign up for the simulated utopias, and it IS verifiable. The paperclipper won't paperclip them.
Well, that's friendlier, but maybe not friendly enough. I mean, it might get to 10,000 people (or maybe 200, or maybe 43,700), but eventually it would paperclip everyone else. That seems too bad to accept.
Well, what if it's a .00001% chance per code chunk and 1,000,000 simulated utopias (and yes, 1,000,000 free)? That might plausibly get a simulated utopia for everyone on earth before the AI gets out and paperclips everything else. I imagine some people would at least consider running such an AI, although I doubt everyone would.
How would one establish what the flip point was? Is that even a valid question to be asking? (Assume there are standard looming existential concerns. So if you don't give this AI code chunks, or try to negotiate or wait on research for a better deal, maybe some other AI will come out and paperclip you both, or maybe some other existential risk occurs, or maybe just nothing happens, or maybe an AI comes along who just wants to simulated utopia everything.)
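The trade-off in these two deals can be sketched with some back-of-the-envelope arithmetic. This is a toy model under my own assumptions, not anything from the scenario itself: each code chunk independently gives the AI an escape chance p and buys u verified simulated utopias, and Earth's population is taken as roughly 8 billion. The function names and numbers are illustrative only.

```python
def expected_utopias(p, u):
    # Chunks granted before the AI escapes follow a geometric distribution,
    # so the expected number of completed (non-escape) chunks is (1 - p) / p.
    return u * (1 - p) / p

def chance_everyone_covered(p, u, population=8_000_000_000):
    # Probability that enough chunks succeed in a row to cover everyone
    # before a single escape occurs.
    chunks_needed = -(-population // u)  # ceiling division
    return (1 - p) ** chunks_needed

# Deal 1: 1% escape chance per chunk, 100 utopias each.
# Expected payoff is roughly 9,900 utopias; covering all of Earth
# would take 80 million successful chunks in a row, which is hopeless.
print(expected_utopias(0.01, 100))
print(chance_everyone_covered(0.01, 100))

# Deal 2: 0.00001% escape chance per chunk, 1,000,000 utopias each.
# Only 8,000 successful chunks are needed, so everyone on Earth
# very likely gets a utopia before the AI ever escapes.
print(expected_utopias(1e-7, 1_000_000))
print(chance_everyone_covered(1e-7, 1_000_000))
```

On these toy numbers, the second deal covers the whole population with better than 99.9% probability, which matches the intuition above that some people would consider running such an AI; the "flip point" question then becomes where on this curve of (p, u) pairs the deal stops being acceptable.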
I wouldn't call an AI like that friendly at all. It puts people in utopias only for instrumental reasons; it has no inherent goal of making people happy. None of these AIs are friendly; some are merely less dangerous than others.