Your argument would be stronger if you provided a citation. I've only skimmed CEV, for instance, so I'm not fully familiar with Eliezer strongest arguments in favour of goal structure tending to be preserved (though I know he did argue for that) in the course of intelligence growth. For that matter, I'm not sure what your arguments for goal stability under intelligence improvement are. Nevertheless, consider the following:
In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
Yudkowsky, E. (2004). Coherent Extrapolated Volition. Singularity Institute for Artificial Intelligence
(Bold mine.) See that bolded part above? Those are TODOs. They would be good to have, but they're not guaranteed. The goals of a more intelligent AI might diverge from those of its previous self; it may extrapolate differently; it may interpret differently; its desires may, at higher levels of intelligence, interfere with ours rather than cohere.
If I want X, and I'm considering an improvement to my systems that would make me not want X, then I'm not going to get X if I take that improvement, so I'm going to look for some other improvement to my systems to try instead.
A more intelligent AI might:
Sorry for not citing; I was talking with people who would not need such a citation, but I do have a wider audience. I don't have time to hunt it up now, but I'll edit it in later. If I don't, poke me.
If at higher intelligence it finds that the volition diverges rather than converges, or vice versa, or that it goes in a different direction, that is a matter of improvements in strategy rather than goals. No one ever said that it would or should not change its methods drastically with intelligence increases.
This is part of a weekly reading group on Nick Bostrom's book, Superintelligence. For more information about the group, and an index of posts so far see the announcement post. For the schedule of future topics, see MIRI's reading guide.
Welcome. This week we discuss the ninth section in the reading guide: The orthogonality of intelligence and goals. This corresponds to the first section in Chapter 7, 'The relation between intelligence and motivation'.
This post summarizes the section, and offers a few relevant notes, and ideas for further investigation. Some of my own thoughts and questions for discussion are in the comments.
There is no need to proceed in order through this post, or to look at everything. Feel free to jump straight to the discussion. Where applicable and I remember, page numbers indicate the rough part of the chapter that is most related (not necessarily that the chapter is being cited for the specific claim).
Reading: 'The relation between intelligence and motivation' (p105-8)
Summary
Another view
John Danaher at Philosophical Disquisitions starts a series of posts on Superintelligence with a somewhat critical evaluation of the orthogonality thesis, in the process contributing a nice summary of nearby philosophical debates. Here is an excerpt, entitled 'is the orthogonality thesis plausible?':
Notes
In-depth investigations
If you are particularly interested in these topics, and want to do further research, these are a few plausible directions, some inspired by Luke Muehlhauser's list, which contains many suggestions related to parts of Superintelligence. These projects could be attempted at various levels of depth.
How to proceed
This has been a collection of notes on the chapter. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
Next week, we will talk about instrumentally convergent goals. To prepare, read 'Instrumental convergence' from Chapter 7. The discussion will go live at 6pm Pacific time next Monday November 17. Sign up to be notified here.