Omohundro's rationality in a nutshell reads:
Step 1 seems to be a fairly basic and fundamental one to me.
If you don't know what you are trying to do, it is not easy to know whether you are succeeding at it - or not.
I think rational agents should try and figure out what they want. "It's complicated" is a kind of answer - but not a very practical or useful one.
I suspect failures at step 1 are mostly to do with signalling. For example, Tom Lehrer once spoke of "a man whose allegiance is ruled by expedience". If you publish your goals, that limits your ability to signal motives that are in harmony with those of your audience - reducing your options for deceptive signalling.
I doubt this very strongly, and can say with an extremely high level of confidence that I fail at step 1 even when I don't have to tell anyone my answer. What humans mean by "clearly specified" is completely different ...
Taken from some old comments of mine that never did get a satisfactory answer.
1) One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results. This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?
2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition? If Eliezer wants the AI to look at humanity and infer its best wishes for the future, why can't he task it with looking at himself and inferring his best idea to fulfill humanity's wishes? Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of "intelligence", and what other such things are there?