unaligned ASI is extremely sensitive to context, just in the service of its own goals.
Risks of abuse, isolation, and dependence can indeed skyrocket from, as you say, increased "context-sensitivity" in service of an AI's (or someone else's) own goals. A personalized torture chamber is not better than a context-free one; in fact, it is quite likely a lot worse. But to your question:
Is misalignment really a lack of sensitivity, as opposed to a difference in goals or values?
The way I'm using "sensitivity": sensitivity to X = the meaning...
Thank you, Dusan!
Next time there will be more notice, and also a more refined workshop!
Great! I would have loved to include a remark that one, as a human, might anticipate forward-chainy/rational reasoning in these systems because the label "chain-of-thought" invites taking the "thought" metaphor seriously/literally, rather than expecting backwardy/rationalization "reasoning".
But since it is at least somewhat intelligent/predictive, it can make the move of "acausal collusion" with its own tendency to hallucinate when generating its "chain"-of-"thought". That is, the optimization to have chain-of-thought in correspondence with its output can...
I want to draw a clear distinction between the hypothesis I mention in the OP (your 'causal' explanation) and the 'acausal' explanation you mention here. You were blending the two together during our conversation, but I think there are important differences.
In particular, I am interested in the question: does o1 display any "backwards causation" from answers to chain-of-thought? Would its training result in chain-of-thought that is optimized to justify the sort of answers it tends to give (e.g., hallucinated URLs)?
This depends on details of the training which we may not have enough information on (plus potentially complex reasoning about the consequences of such reasoning).
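To make the 'causal' hypothesis concrete, here is a minimal toy sketch. To be clear, this is entirely my own construction, not a claim about o1's actual training setup: it assumes only (1) a fixed pre-existing bias toward one answer (the analogue of a tendency to emit hallucinated URLs), and (2) some reward pressure for the chain-of-thought to be consistent with whatever answer follows. All names (`P_ANSWER_1`, `logits_cot`, the two "CoT styles") are hypothetical. Under those assumptions, a plain REINFORCE update pushes the chain-of-thought toward the style that justifies the answer the model was going to give anyway.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed answer tendency (the analogue of a pre-existing bias toward, say,
# hallucinated URLs): answer 1 comes out 80% of the time regardless of CoT.
P_ANSWER_1 = 0.8

# Trainable CoT policy over two styles: style 0 "flags uncertainty",
# style 1 "confidently cites a source" (i.e., supports answer 1).
logits_cot = np.zeros(2)
lr = 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(3000):
    p_cot = softmax(logits_cot)
    cot = rng.choice(2, p=p_cot)
    ans = 1 if rng.random() < P_ANSWER_1 else 0
    # Hypothetical reward: the CoT is rewarded for *matching* the answer
    # that was going to be given anyway (a consistency pressure), not for
    # improving the answer's correctness.
    reward = 1.0 if cot == ans else 0.0
    # REINFORCE (no baseline): credit the sampled CoT style whenever the
    # trajectory was rewarded.
    grad = -p_cot
    grad[cot] += 1.0
    logits_cot += lr * reward * grad

# With an 80% tendency toward answer 1, the consistency pressure alone
# drives the CoT toward the style that justifies it.
print("P(CoT supports answer 1):", softmax(logits_cot)[1])
```

The toy deliberately leaves out any channel by which the CoT could causally improve the answer; the point is just that a consistency pressure by itself is enough to select rationalizing chain-of-thought, which is the pattern the 'causal' hypothesis predicts.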
This section here has some material you might find useful, among the writing that has been published. Excerpted below:
...