Matrice Jacobine

Student in fundamental and applied mathematics, interested in theoretical computer science and AI alignment

Twitter account: @MLaGrangienne

Tumblr account: @matricejacobine

Comments

@nostalgebraist @Mantas Mazeika "I think this conversation is taking an adversarial tone." If this is how the conversation is going, it might be time to end it here and work on a, well, adversarial collaboration outside the forum.

It does seem that the LLMs are subject to deontological constraints (Figure 19), but I think that in fact makes the paper's framing of the questions as choices between world-states rather than specific actions better suited to evaluating whether LLMs have utility functions over world-states underlying those deontological constraints. Your reinterpretation of how those world-state descriptions are actually parsed by LLMs is an important remark and certainly changes the conclusions we can draw from this article regarding implicit bias, but (unless you debunk those results) the paper's most important findings from my point of view remain unchanged: that LLMs have utility functions over world-states which 1/ are consistent across LLMs, 2/ become more consistent as model size increases, and 3/ are amenable to mechanistic interpretability methods.

... I don't agree, but would it at least be relevant that the "soft CCP-approved platitudes" are now AI-safetyist?

So that answers your question "Why does the linked article merit our attention?", right?

Why does the linked article merit our attention?

  • It is written by a Chinese former politician in a Chinese-owned newspaper.

?

I'm not convinced "almost all sentient beings on Earth" would pick, out of the blue (i.e. without chain of thought), the reflectively optimal option at least 60% of the time when giving unconstrained responses (i.e. not even in a multiple-choice format).

The most important part of the experimental setup is "unconstrained text response". If in the largest LLMs 60% of unconstrained text responses wind up being "the outcome it assigns the highest utility", then that's surely evidence for "utility maximization" and even "the paperclip hyper-optimization caricature". What more do you want exactly?
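To make the metric concrete, here is a minimal sketch of what "60% of unconstrained responses wind up being the highest-utility outcome" means as a computation. All names and data below are hypothetical illustrations, not the paper's actual pipeline (which parses free-form text responses before scoring them):

```python
# Sketch: estimate the "utility maximization" rate, i.e. how often a model's
# unconstrained choice coincides with the outcome its fitted utility function
# ranks highest. Outcome ids and utilities here are made up for illustration.

def utility_max_rate(choices, utilities):
    """choices: list of chosen outcome ids, one per prompt.
    utilities: list of dicts mapping outcome id -> fitted utility."""
    hits = sum(
        1 for chosen, u in zip(choices, utilities)
        if chosen == max(u, key=u.get)  # did the model pick its own argmax?
    )
    return hits / len(choices)

# Toy example: on 3 prompts, the model picks its argmax outcome twice.
choices = ["save_5_lives", "donate_100", "save_1_life"]
utilities = [
    {"save_5_lives": 0.9, "save_1_life": 0.4},
    {"donate_100": 0.7, "donate_10": 0.2},
    {"save_5_lives": 0.8, "save_1_life": 0.3},
]
print(utility_max_rate(choices, utilities))  # 2/3, i.e. ~0.667
```

A rate well above chance on unconstrained text, rather than forced-choice, is exactly what distinguishes "acts on its utilities" from "can rank options when asked to".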

This doesn't contradict the Thurstonian model at all. It only shows that order effects are one of the many factors contributing to utility variance, which is itself a component of the Thurstonian model. Why should order effects be treated differently from any other such factor? The calculations still show that utility variance (order effects included) decreases with scale (Figure 12); you don't need to eyeball a single factor from a few examples in a Twitter thread.
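For reference, a minimal Thurstonian sketch (my own illustration, not the paper's code): each outcome's utility is Gaussian, and any noise source, order effects included, simply enters the variance term of the preference probability, so shrinking total variance sharpens preferences.

```python
from math import erf, sqrt

def pref_prob(mu_a, mu_b, var_a, var_b):
    """Thurstonian choice: P(A preferred to B) = Phi((mu_a - mu_b) / sqrt(var_a + var_b)),
    where Phi is the standard normal CDF."""
    z = (mu_a - mu_b) / sqrt(var_a + var_b)
    return 0.5 * (1 + erf(z / sqrt(2)))

# Order effects enter as extra variance, not as a separate mechanism
# (the split into base_var and order_var is illustrative):
base_var, order_var = 0.5, 1.5
noisy = pref_prob(1.0, 0.0, base_var + order_var, base_var + order_var)
clean = pref_prob(1.0, 0.0, base_var, base_var)
# Less total noise (as reported for larger models, Figure 12) means
# the same mean utility gap yields a more decisive preference:
print(noisy, clean)  # noisy < clean
```

The point is structural: under this model there is no separate "order-effect" question to settle by eyeballing examples; it is one additive contribution to the variance that Figure 12 already aggregates.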
