All of rif a. saurous's Comments + Replies

I'm generally confused by the argument here.

As we examine successively more intelligent agents and their representations, the representation of any particular thing will perhaps be more compressed, but, just as importantly, more intelligent agents represent things that less intelligent agents don't represent at all. I'm more intelligent than a mouse, but I wouldn't say I have a more compressed representation of differential calculus than a mouse does. Terry Tao is likely more intelligent than I am, and likely has a more compressed representation of differentia... (read more)

2[anonymous]
I think what you're saying just makes a lot of sense, honestly. I suspect one possible counterargument is that, just as more intelligent agents with more compressed models can more compactly represent complex goals, they are also capable of drawing ever-finer distinctions that allow them to identify possible goals that have very short encodings in the new ontology, but which don't make sense at all as stand-alone, mostly-coherent targets in the old ontology (because it is simply too weak to represent them). So it's not just that goals get compressed, but also that new possible kinds of goals (many of them really simple) get added to the game.

But this process should also allow new goals to arise that have ~ any arbitrary encoding length in the new ontology, because it should be just as easy to draw new, subtle distinctions inside a complex goal (which outputs a new medium- or large-complexity goal) as it would be inside a really simple goal (which outputs the type of new super-small-complexity goal that the previous paragraph talks about).

So I don't think this counterargument ultimately works, and I suspect it shouldn't change our expectations in any meaningful way.

I feel like a lot of the difficulty here comes from punning on the word "problem."

In complexity theory, when we talk about "problems", we generally refer to a formal mathematical question that can be posed as a computational task. Maybe in these kinds of discussions we should start calling these problems_C (for "complexity"). There are plenty of problems_C that are (almost definitely) not in NP, like #SAT ("count the number of satisfying assignments of this Boolean formula"), and it's generally believed that verification is hard for these problems. A... (read more)
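To make the verification asymmetry concrete, here is a minimal brute-force sketch (an illustration added here, not from the original comment; the formula and helper names are made up). Checking a single satisfying assignment of a CNF formula is cheap, while checking a claimed count of satisfying assignments has no obviously cheaper route than redoing the exponential enumeration.

```python
from itertools import product

# A CNF formula: a list of clauses, each clause a list of signed variable
# indices, e.g. [1, -2] means (x1 OR NOT x2).
FORMULA = [[1, -2], [-1, 2, 3], [-3]]
NUM_VARS = 3

def satisfies(formula, assignment):
    """SAT-style verification: checking one assignment runs in polynomial time."""
    return all(
        any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
        for clause in formula
    )

def count_satisfying(formula, num_vars):
    """#SAT: count all satisfying assignments; naive cost is 2**num_vars."""
    return sum(
        satisfies(formula, assignment)
        for assignment in product([False, True], repeat=num_vars)
    )

# Verifying one claimed solution is easy...
print(satisfies(FORMULA, (True, True, False)))   # True
# ...but verifying a claimed count (say, "this formula has 2 solutions")
# seems to require redoing the whole count.
print(count_satisfying(FORMULA, NUM_VARS))       # 2
```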

I feel like the specific claims this piece makes are pretty expansive relative to the references given.

  • I don't think the small, specific trial in [3] supports the general claim that "Current LLMs reduce the human labor and cognitive costs of programming by about 2x."
  • I don't think [10] says anything substantive about the claim "Fine tuning pushes LLMs to superhuman expertise in well-defined fields that use machine readable data sets."
  • I don't think [11] strongly supports a general claim that (today's) LLMs can "Recognize complex patterns", and [12] feel... (read more)

Thank you, this is helpful. 

I think the realization I'm coming to is that folks on this thread have a shared understanding of the basic mechanics (we seem to agree on what computations are occurring, and we don't seem to be making any different predictions), and we are unsure about interpretation. Do you agree?

For myself, I continue to maintain that viewing the system as a next-word sampler is not misleading, and that saying it has a "plan" is misleading --- but I try to err very much on the side of not anthropomorphizing / not taking an intentional stance... (read more)

1Bill Benzon
FWIW, I'm not wedded to "plan." And as for anthropomorphizing, there are many times when anthropomorphic phrasing is easier and more straightforward, so I don't want to waste time trying to work around it with more complex phrasing. The fact is these devices are fundamentally new and we need to come up with new ways of talking about them. That's going to take a while.

Suppose we modify the thought experiment so that we ask the LLM to simulate both sides of the "pick a number between 1 and 100" / "ask yes/no questions about the number" exchange. Now there is no new variable input from the user, but the yes/no questions still depend on random sampling. Would you now say that the LLM has chosen a number immediately after it prints out "Ready"?

2JBlack
Chosen a number: no (though it does at temperature zero). Has something approximating a plan for how the 'conversation' will go (including which questions are most favoured at each step and go with which numbers), yes to some extent. I do think "plan" is a misleading word, though I don't have anything better.

Then wouldn't you believe that in the case of my thought experiment, the number is also smeared through the parameter weights? Or maybe it's merely the intent to pick a number later that's smeared through the parameter weights?

2Bill Benzon
Lots of things are smeared through the parameter weights. I've prompted ChatGPT with "tell me a story" well over a dozen times, independently in separate sessions. On three occasions I've gotten a story with elements from "Jack and the Beanstalk." There's the name, the beanstalk, and the giant. But the giant wasn't blind and there was no "fee fi fo fum." Why that story three times? I figure it's more or less an arbitrary fact of history that that story happens to be particularly salient for ChatGPT.

But if I am right and ChatGPT isn't choosing a number before it says "Ready," why do you think that ChatGPT "has a plan"? Is the story situation crucially different in some way?

2JBlack
I think there is one difference: in the "write a story" case, the model subsequently generates the text without further variable input. If the story is written in pieces with further variable prompting, I would agree that there is little sense in which it 'has a plan'. To whatever extent it could be said to have a plan, that plan is radically altered in response to every prompt.

I think this sort of thing is highly likely for any model of this type with no private state, though not essential. It could have a conditional distribution of future stories that is highly variable in response to instructions about what the story should contain and yet completely insensitive to mere questions about it, but I think that's a very unlikely type of model. Systems with private state are much more likely to be trainable to query that state and answer questions about it without changing much of the state. Doing the same with merely an enormously high-dimensional implicit distribution seems too much of a balancing act for any training regimen to target.
2Bill Benzon
Read the comments I've posted earlier today. The plan is smeared through the parameter weights.
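As an aside, the private-state distinction above can be made concrete with a toy sketch (purely illustrative, with made-up StatefulAgent and StatelessAgent classes; no claim that any real LLM works this way). The stateful agent commits to a number the moment it says "Ready" and answers from that stored commitment; the stateless agent has nothing to consult but the visible transcript, so looking committed requires re-deriving, on every turn, some number consistent with everything it has said so far, which is the kind of balancing act described above.

```python
import random

def answer(secret, threshold):
    """Ground-truth reply to 'Is the number greater than <threshold>?'"""
    return "yes" if secret > threshold else "no"

class StatefulAgent:
    """Private state: commits to a number when it says 'Ready', then answers
    every later question from that one stored number."""
    def __init__(self):
        self.secret = None

    def respond(self, threshold=None):
        if self.secret is None:
            self.secret = random.randint(1, 100)   # hidden, persistent commitment
            return "Ready"
        return answer(self.secret, threshold)

class StatelessAgent:
    """No private state: each reply is computed afresh from the visible
    transcript (a list of (threshold, reply) pairs) and nothing else."""
    def respond(self, transcript, threshold=None):
        if threshold is None:
            return "Ready"                          # nothing is fixed at this point
        # To merely *look* committed, it must re-derive the set of numbers
        # consistent with the transcript so far and answer from one of them.
        candidates = [n for n in range(1, 101)
                      if all(answer(n, t) == r for t, r in transcript)]
        return answer(random.choice(candidates), threshold)

# Toy usage: ask "Is it greater than t?" for a couple of thresholds.
stateful = StatefulAgent()
print(stateful.respond())                            # "Ready" (number now fixed)
print(stateful.respond(50), stateful.respond(50))    # identical, by construction

stateless = StatelessAgent()
transcript = []
print(stateless.respond(transcript))                 # "Ready" (nothing fixed)
for t in (50, 25):
    reply = stateless.respond(transcript, t)
    transcript.append((t, reply))                    # consistency re-derived each turn
    print(t, reply)
```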

@Bill Benzon:  A thought experiment. Suppose you say to ChatGPT "Think of a number between 1 and 100, but don't tell me what it is. When you've done so, say 'Ready' and nothing else. After that, I will ask you yes / no questions about the number, which you will answer truthfully."

After ChatGPT says "Ready", do you believe a number has been chosen? If so, do you also believe that whatever sequence of yes/no questions you ask, they will always be answered consistently with that choice? Put differently, you do not believe that the particular choice o... (read more)
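For what it's worth, the "always answered consistently with that choice" condition is mechanically checkable. A minimal sketch (my own illustration, restricted to greater-than questions for simplicity): a transcript of yes/no answers is consistent with some committed number exactly when the set computed below is non-empty.

```python
def consistent_numbers(qa_pairs, lo=1, hi=100):
    """Return every number in [lo, hi] consistent with a transcript of
    ('Is it greater than t?', 'yes'/'no') question-answer pairs."""
    numbers = range(lo, hi + 1)
    for threshold, reply in qa_pairs:
        if reply == "yes":
            numbers = [n for n in numbers if n > threshold]
        else:
            numbers = [n for n in numbers if n <= threshold]
    return list(numbers)

# Consistent with some committed choice iff the result is non-empty.
print(consistent_numbers([(50, "yes"), (75, "no"), (60, "yes")]))  # [61, ..., 75]
print(consistent_numbers([(50, "yes"), (40, "no")]))               # [] -> no such number
```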

1Max Loh
I believe this is a non-scientific question, similar in spirit to philosophical zombie questions. Person A says "GPT did come up with a number by that point" and Person B says "GPT did not come up with a number by that point," but as long as it still outputs the correct responses after that point, neither person can be proven correct. This is why real-world scientific results from assessing these AI capabilities are way more informative than intuitive ideas of what they're supposed to be able to do (even if a model is only programmed to predict the next word, it's wrong to assume a priori that a next-word predictor is incapable of specific tasks, or to declare its achievements "faked intelligence" when it gets them right).
4JBlack
This is a very interesting scenario, thank you for posting it! I suspect that ChatGPT can't even be relied upon to answer in a manner that is consistent with having chosen a number. In principle a more capable LLM could answer consistently, but almost certainly won't "choose a number" at the point of emitting "Ready" (even with temperature zero). The subsequent questions will almost certainly influence the final number, and I suspect this may be a fundamental limitation of this sort of architecture.
2Bill Benzon
Very interesting. I suspect you are right about this:

I'm not following the argument here.

"I maintain, for example, that when ChatGPT begins a story with the words “Once upon a time,” which it does fairly often, that it “knows” where it is going and that its choice of words is conditioned on that “knowledge” as well as upon the prior words in the stream. It has invoked a ‘story telling procedure’ and that procedure conditions its word choice."

It feels like you're asserting this, but I don't see why it's true and don't think it is. I fully agree that it feels like it ought to be true: it is in some sense still... (read more)
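One way to cash out the disagreement: in an autoregressive sampler, everything that looks like "knowing where it is going" has to live in how the conditional distribution over next words shifts once the prefix looks like a story. A deliberately crude trigram toy (my own illustration; nothing like a transformer, with a made-up corpus) shows genre-consistent continuations emerging from prefix conditioning alone, with no stored plan anywhere.

```python
import random
from collections import defaultdict

# Toy corpus: a couple of "stories" and a couple of "reports".
CORPUS = [
    "once upon a time a princess found a dragon and they lived happily ever after",
    "once upon a time a fox tricked a crow and the crow flew far away",
    "the quarterly report shows that revenue increased and that costs decreased sharply",
    "the quarterly report shows that costs increased and that revenue decreased sharply",
]

# Trigram model: distribution over the next word given the two previous words.
counts = defaultdict(lambda: defaultdict(int))
for text in CORPUS:
    w = text.split()
    for a, b, c in zip(w, w[1:], w[2:]):
        counts[(a, b)][c] += 1

def sample_next(context):
    """Sample the next word conditioned only on the last two words."""
    options = counts[context]
    words, weights = zip(*options.items())
    return random.choices(words, weights=weights)[0]

def generate(prompt, length=12):
    words = prompt.split()
    for _ in range(length):
        context = tuple(words[-2:])
        if context not in counts:
            break
        words.append(sample_next(context))
    return " ".join(words)

# A story-like prefix yields story-like continuations, and a report-like
# prefix yields report-like ones, even though no plan is stored anywhere:
# each next word is sampled from a distribution conditioned only on the
# words already produced.
print(generate("once upon a time"))
print(generate("the quarterly report shows"))
```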