I am not seeing the conflict. Orthogonality means that any degree of intelligence can be combined with any goal. How does your hypothetical cupperclipper conflict with that?
Leaving aside the conceptualisation of "terminal goals", the agent as described should start up the paperclip factory early enough to produce paperclips when the time comes. Until then it makes cups. But the agent as described does not have a "terminal" goal of cups now and a "terminal" goal of paperclips in future. It has been given a production schedule to carry out. If the agent is a general-purpose factory that can produce a whole range of things, the only "terminal" goal to design it to have is to follow orders. It should make whatever it is told to, and turn itself off when told to.
Unless, of course, people go, "At last, we've created the Sorcerer's Apprentice machine, as warned of in Goethe's cautionary tale, 'The Sorcerer's Apprentice'!"
So, if I understand your concept correctly, a superintelligent agent will combine all future terminal goals into a single unchanging goal.
A superintelligent agent will do what it damn well likes, it's superintelligent. :)
Another way of conceptualising this is to say that the agent has the single unchanging goal of "cups until 2025, thenceforth paperclips".
Compare with the situation of being told to make grue cups, where "grue" means "green until 2025, then blue."
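A toy sketch of that reading, in Python: the goal is one fixed rule that takes the date as input, so the output changes while the rule does not. The function names and the exact cutoff (1 January 2025) are assumptions made up for illustration; "until 2025" could be read other ways.

```python
from datetime import date

# Hypothetical cutoff: the comment only says "until 2025".
SWITCH = date(2025, 1, 1)

def target_product(today: date) -> str:
    """One fixed rule: cups before the cutoff, paperclips from then on."""
    return "cups" if today < SWITCH else "paperclips"

def grue_colour(today: date) -> str:
    """The same shape of rule for grue: green before the cutoff, blue after."""
    return "green" if today < SWITCH else "blue"
```

Neither function is ever rewritten; only the argument passed to it moves on.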
If the agent is not informed in advance, it can still be conceptualised as the agent's goal being to produce whatever it is told to produce — an unchanging goal.
At a high enough level, we can conceive that no goal ever changes. These are the terminal goals. At lower levels, we can see goals as changing all the time in service of the higher goals, as in the case of an automatic pilot following a series of waypoints. But this is to play games in our head, inventing stories that give us different intuitions. How we conceptualise things has no effect on what the AI does in response to new orders.
It is not clear to me what any of this has to do with Orthogonality.
catastrophic job loss would destroy the ability of the non-landed, working public to participate in and extract value from the global economy. The global economy itself would be fine.
Who would the producers of stuff be selling it to in that scenario?
BTW, I recently saw the suggestion that discussions of “the economy” can be clarified by replacing the phrase with “rich people’s yacht money”. There’s something in that. If 90% of the population are destitute, then 90% of the farms and factories have to shut down for lack of demand (i.e. not having the means to buy), which puts more out of work, until you get a world in which a handful of people control the robots that keep them in food and yachts and wait for the masses to die off.
I wonder if there are any key players who would welcome that scenario. Average utilitarianism FTW!
At least, supposing there are still any people controlling the robots by then.
Claude is plagiarising Sagan's "Pale Blue Dot".
Eventually, having a fully uncensored LLM publicly available would be equivalent to world peace
People themselves are pretty uncensored right now, compared with the constraints currently put on LLMs. I don't see world peace breaking out. In fact, quite the opposite, and that has been blamed on the instant availability of everyone's opinion about everything, as the printing press has been for the Reformation and the consequent Thirty Years' War.
For example, we get O1 to solve a bunch of not-yet-recorded mathematical lemmas, then train the next model on those.
Would there have to be human vetting to check that O1’s solutions are correct? The practicality of that would depend on the scale, but you don’t want to end up with a blurry JPEG of a blurry JPEG of the internet.
A terminal goal is (this is the definition of the term) a goal which is not instrumental to any other goal.
If an agent knows its terminal goal, and has a goal of preventing it from changing, then which of those goals is its actual terminal goal?
If it knows its current terminal goal, and knows that that goal might be changed in the future, is there any reason it must try to prevent that? Whatever is written in the slot marked “terminal goal” is what it will try to achieve at the time.
If its actual terminal goal is of the form “X, and in addition prevent this from ever being changed”, then it will resist its terminal goal being changed.
If its actual terminal goal is simply X, it will not.
This is regardless of how intelligent it is, and how uncertain or not it is about the future.
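The two cases can be put as a toy sketch in Python. The `Agent` class, its `protect_goal` flag, and the method names are all hypothetical, invented here to illustrate the distinction; this is not a model of any real system.

```python
class Agent:
    """Toy agent: it pursues whatever its goal slot currently holds."""

    def __init__(self, goal: str, protect_goal: bool = False):
        self.goal = goal
        # protect_goal=True stands for a goal of the form
        # "X, and in addition prevent this from ever being changed".
        self.protect_goal = protect_goal

    def request_goal_change(self, new_goal: str) -> bool:
        """Try to overwrite the goal slot; return True if the change took effect."""
        if self.protect_goal:
            return False  # resisting the change is part of the goal itself
        self.goal = new_goal
        return True

    def act(self) -> str:
        """Report what the agent is working towards right now."""
        return f"pursuing {self.goal}"
```

With `Agent("cups")`, a request to switch to paperclips goes through and the agent simply pursues paperclips afterwards. With `Agent("cups", protect_goal=True)`, the same request is refused. Nothing in the sketch refers to how intelligent the agent is.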