AI friendliness is an important goal and it would be insanely dangerous to build an AI without researching this issue first. I think this is pretty much the consensus view, and that is perfectly sensible.
However, I believe that we are making the wrong inferences from this.
The straightforward inference is "we should ensure that we completely understand AI friendliness before starting to build an AI". This framing casts AI researchers in a strongly negative light and scares them away. And unfortunately reality isn't that simple: the goal isn't "build a friendly AI", but "make sure that whoever builds the first AI makes it friendly".
It seems to me that it is vastly more likely that the first AI will be built by a large company, or as a large government project, than by a group of university researchers, who just don't have the funding for that.
I therefore think that we should try to take a more pragmatic approach. The way to do this would be to focus more on outreach and less on research. It won't do anyone any good if we find the perfect formula for AI friendliness on the same day that someone who has never heard of AI friendliness before finishes his paperclip maximizer.
What is your opinion on this?
This is quite a subtle issue.
If the "backup goal" is always in effect, eg. it is just another clause of the main goal. For example, "maximise paperclips" with a backup goal of "do what you are told" is the same as having the main goal "maximise paperclips while doing what you are told".
If the "backup goal" is a separate mode which we can switch an AI into, eg. "stop all external interaction", then it will necessarily conflict with the the AI's main goal: it can't maximise paperclips if it stops all external interaction. Hence the primary goal induces a secondary goal: "in order to maximise paperclips, I should prevent anyone switching me to my backup goal". These kind of secondary goals have been raised by Steve Omohundro.
You haven't dealt with the case where the safety goals are the primary ones.
Primary goals of this kind go back to Isaac Asimov's Three Laws of Robotics.