FWIW, I wasn't talking about CEV or superintelligent agents. I was just talking about the task of figuring out what your own goals were.
We can't really coherently discuss in detail the difficulties of programming goals into superintelligent agents until we know how to build them. Programming one agent's goals into a different agent looks challenging. Some devotees attempt to fulfill their guru's desires - but that is a trickier problem than fulfilling their own desires - since they don't get direct feedback from the guru's senses. Anyway, these are all complications that I did not even pretend to be going into.
What do you actually mean when you say you "fail at step 1"? You have no idea what your own goals are?!? Or just that your knowledge of your own goals is somewhat incomplete?
I wasn't talking about CEV or superintelligent agents either. I mean that I have no idea how to write down my own goals. I am nowhere close to having clearly specified goals for myself, in the sense that I as a mathematician usually mean "clearly specified". The fact that I can't describe my goals well enough that I could tell them to someone else and trust them to do what I want done is just one indication that my own conception of my goals is significantly incomplete.
Taken from some old comments of mine that never did get a satisfactory answer.
1) One of the justifications for CEV was that extrapolating from an American in the 21st century and from Archimedes of Syracuse should give similar results. This seems to assume that change in human values over time is mostly "progress" rather than drift. Do we have any evidence for that, except saying that our modern values are "good" according to themselves, so whatever historical process led to them must have been "progress"?
2) How can anyone sincerely want to build an AI that fulfills anything except their own current, personal volition? If Eliezer wants the AI to look at humanity and infer its best wishes for the future, why can't he task it with looking at himself and inferring his best idea to fulfill humanity's wishes? Why must this particular thing be spelled out in a document like CEV and not left to the mysterious magic of "intelligence", and what other such things are there?