Yeah fair point. I do think labs have some some nonzero amount of responsibility to be proactive about what others believe about their commitments. I agree it doesn't extend to 'rebut every random rumor'.

Reply

RobertM's Shortform

Lauro Langosco11d52

I agree in principle that labs have the responsibility to dispel myths about what they're committed to. OTOH, in defense of the labs I imagine that this can be hard to do while you're in the middle of negotiations with various AISIs about what those commitments should look like.

Reply

AI as a science, and three obstacles to alignment strategies

Lauro Langosco7moΩ330

The argument I think is good (nr (2) in my previous comment) doesn't go through reference classes at all. I don't want to make an outside-view argument (eg "things we call optimization often produce misaligned results, therefore sgd is dangerous"). I like the evolution analogy because it makes salient some aspects of AI training that make misalignment more likely. Once those aspects are salient you can stop thinking about evolution and just think directly about AI.

Reply

AI as a science, and three obstacles to alignment strategies

Lauro Langosco7moΩ711-2

evolution does not grow minds, it grows hyperparameters for minds.

Imo this is a nitpick that isn't really relevant to the point of the analogy. Evolution is a good example of how selection for X doesn't necessarily lead to a thing that wants ('optimizes for') X; and more broadly it's a good example for how the results of an optimization process can be unexpected.

I want to distinguish two possible takes here:

The argument from direct implication: "Humans are misaligned wrt evolution, therefore AIs will be misaligned wrt their objectives"
Evolution as an intuition pump: "Thinking about evolution can be helpful for thinking about AI. In particular it can help you notice ways in which AI training is likely to produce AIs with goals you didn't want"

It sounds like you're arguing against (1). Fair enough, I too think (1) isn't a great take in isolation. If the evolution analogy does not help you think more clearly about AI at all then I don't think you should change your mind much on the strength of the analogy alone. But my best guess is that most people incl Nate mean (2).

Reply

1

Evaluating the historical value misspecification argument

Lauro Langosco7mo20

I'm not saying that GPT-4 is lying to us - that part is just clarifying what I think Matthew's claim is.

Re cauldron: I'm pretty sure MIRI didn't think that. Why would they?

Reply

Evaluating the historical value misspecification argument

Lauro Langosco7moΩ242

I think the specification problem is still hard and unsolved. It looks like you're using a different definition of 'specification problem' / 'outer alignment' than others, and this is causing confusion.

IMO all these terms are a bit fuzzy / hard to pin down, and so it makes sense that they'd lead to disagreement sometimes. The best way (afaict) to avoid this is to keep the terms grounded in 'what would be useful for avoiding AGI doom'? To me it looks like on your definition, outer alignment is basically a trivial problem that doesn't help alignment much.

More generally, I think this discussion would be more grounded / useful if you made more object-level claims about how value specification being solved (on your view) might be useful, rather than meta claims about what others were wrong about.

Reply

Evaluating the historical value misspecification argument

Lauro Langosco7moΩ352

Do you have an example of one way that the full alignment problem is easier now that we've seen that GPT-4 can understand & report on human values?

(I'm asking because it's hard for me to tell if your definition of outer alignment is disconnected from the rest of the problem in a way where it's possible for outer alignment to become easier without the rest of the problem becoming easier).

Reply

Evaluating the historical value misspecification argument

Lauro Langosco7moΩ9146

I think it's false in the sense that MIRI never claimed that it would be hard to build an AI with GPT-4 level understanding of human values + GPT-4 level of willingness to answer honestly (as far as I can tell). The reason I think it's false is mostly that I haven't seen a claim like that made anywhere, including in the posts you cite.

I agree lots of the responses elide the part where you emphasize that it's important how GPT-4 doesn't just understand human values, but is also "willing" to answer questions somewhat honestly. TBH I don't understand why that's an important part of the picture for you, and I can see why some responses would just see the "GPT-4 understands human values" part as the important bit (I made that mistake too on my first reading, before I went back and re-read).

It seems to me that trying to explain the original motivations for posts like Hidden Complexity of Wishes is a good attempt at resolving this discussion, and it looks to me as if the responses from MIRI are trying to do that, which is part of why I wanted to disagree with the claim that the responses are missing the point / not engaging productively.

Reply

Evaluating the historical value misspecification argument

Lauro Langosco7mo510

I think maybe there's a parenthesis issue here :)

I'm saying "your claim, if I understand correctly, is that MIRI thought AI wouldn't (understand human values and also not lie to us)".

Reply

Evaluating the historical value misspecification argument

Lauro Langosco7mo10

I think we agree - that sounds like it matches what I think Matthew is saying.

Reply