rif a. saurous
rif a. saurous has not written any posts yet.


Formalization of informal ideas will not be the hard part. AI will enable not just automated proofs, not just automated conjectures, but also automated formalization of informal intuitions.
This seems both surprising and extremely crux-y to me. I'm curious if you can offer pointers (beyond "read all of Sahil's work") to the best arguments for this.
I'm generally confused by the argument here.
As we examine successively more intelligent agents and their representations, the representation of any particular thing will perhaps become more compressed, but, importantly, more intelligent agents also represent things that less intelligent agents don't represent at all. I'm more intelligent than a mouse, but I wouldn't say I have a more compressed representation of differential calculus than a mouse does. Terry Tao is likely more intelligent than I am and likely has a more compressed representation of differential calculus than I do, but he also has representations of a bunch of other mathematics I can't represent at all, so the overall complexity of his representations in... (read more)
I feel like a lot of the difficulty here is a punning of the word "problem."
In complexity theory, when we talk about "problems", we generally refer to a formal mathematical question that can be posed as a computational task. Maybe in these kinds of discussions we should start calling these problems_C (for "complexity"). There are plenty of problems_C that are (almost definitely) not in NP, like #SAT ("count the number of satisfying assignments of this Boolean formula"), and it's generally believed that verification is hard for these problems. A problem_C like #SAT that is (believed to be) in #P but not NP will often have a short easy-to-understand algorithm that will be... (read more)
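To illustrate the kind of short, easy-to-understand algorithm I have in mind for a problem_C like #SAT, here is a minimal brute-force sketch (the function name and clause encoding are mine, purely for illustration): it enumerates all 2^n assignments, so it is simple to state but exponential-time, and there is no obviously short certificate that would let someone else cheaply verify the resulting count.

```python
from itertools import product

def count_satisfying(clauses, num_vars):
    """Brute-force #SAT: count the assignments satisfying a CNF formula.

    `clauses` is a list of clauses; each clause is a list of nonzero ints,
    where i means "variable i is true" and -i means "variable i is false"
    (the usual DIMACS-style convention).
    """
    count = 0
    for bits in product([False, True], repeat=num_vars):
        # bits[i-1] is the truth value of variable i
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count

# (x1 OR x2) AND (NOT x1 OR x2): satisfied by (F,T) and (T,T)
print(count_satisfying([[1, 2], [-1, 2]], num_vars=2))  # 2
```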
I feel like this piece is pretty expansive in the specific claims it makes relative to the references given.
The above are the results of spot-checking and are not meant to be exhaustive.
Thank you, this is helpful.
I think the realization I'm coming to is that folks on this thread have a shared understanding of the basic mechanics (we seem to agree on what computations are occurring, and we don't seem to be making any different predictions), and we are unsure about interpretation. Do you agree?
For myself, I continue to maintain that viewing the system as a next-word sampler is not misleading, and that saying it has a "plan" is misleading --- but I try to err strongly on the side of not anthropomorphizing / not taking an intentional stance (I also try to avoid saying the system "knows" or "understands" anything). I do agree that the system's activation cache contains a lot of information that collectively biases the next-word predictor towards producing the output it produces; I see how someone might reasonably call that a "plan," although I choose not to.
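For concreteness, here is a minimal sketch of the "next-word sampler" framing (the `model` callable is hypothetical, standing in for a full forward pass that returns a next-token distribution): all the state shaping token t+1 is the prefix and whatever was computed from it, with no separate "plan" object anywhere in the loop.

```python
import random

def generate(model, prompt_tokens, max_new_tokens, temperature=1.0):
    """Autoregressive generation viewed purely as next-token sampling.

    `model(tokens)` is assumed to return a dict mapping candidate next
    tokens to probabilities, conditioned on the whole prefix.
    """
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = model(tokens)  # distribution over the next token given the prefix
        candidates = list(probs.keys())
        # Temperature scaling of probabilities; random.choices renormalizes.
        weights = [p ** (1.0 / temperature) for p in probs.values()]
        tokens.append(random.choices(candidates, weights=weights)[0])
    return tokens
```

Whether the information cached along the way deserves to be called a "plan" is exactly the interpretive question; nothing in the sampling loop itself forces either reading.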
Suppose we modify the thought experiment so that we ask the LLM to simulate both sides of the "pick a number between 1 and 100" / "ask yes/no questions about the number" exchange. Now there is no new variable input from the user, but the yes/no questions still depend on random sampling. Would you now say that the LLM has chosen a number immediately after it prints out "Ready"?
Then wouldn't you believe that in the case of my thought experiment, the number is also smeared through the parameter weights? Or maybe it's merely the intent to pick a number later that's smeared through the parameter weights?
But if I am right and ChatGPT isn't choosing a number before it says "Ready," why do you think that ChatGPT "has a plan"? Is the story situation crucially different in some way?
@Bill Benzon: A thought experiment. Suppose you say to ChatGPT "Think of a number between 1 and 100, but don't tell me what it is. When you've done so, say 'Ready' and nothing else. After that, I will ask you yes / no questions about the number, which you will answer truthfully."
After ChatGPT says "Ready", do you believe a number has been chosen? If so, do you also believe that whatever "yes / no" sequence of questions you ask, they will always be answered consistently with that choice? Put differently, you do not believe that the particular choice of questions you ask can influence what number was chosen?
FWIW, I believe that no... (read more)
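To make the crux concrete, here is a hypothetical sketch (names and setup are mine, not from the thread) of two answering procedures: one commits to a number before any questions arrive; the other never commits, and just samples each answer to be consistent with the answers it has already given. Their outputs are distributed identically, so no sequence of yes/no questions can distinguish them from the outside, yet only the first has "chosen a number" after saying "Ready."

```python
import random

def committed_answerer(questions):
    """Picks a number up front, then answers every question about it."""
    secret = random.randint(1, 100)
    return [q(secret) for q in questions]

def lazy_answerer(questions):
    """Never commits to a number; each answer is sampled to be merely
    consistent with the answers already given."""
    candidates = set(range(1, 101))
    answers = []
    for q in questions:
        yes_set = {n for n in candidates if q(n)}
        # Answer "yes" with probability equal to the fraction of still-possible
        # numbers for which the true answer would be "yes".
        answer = random.random() < len(yes_set) / len(candidates)
        candidates = yes_set if answer else candidates - yes_set
        answers.append(answer)
    return answers

questions = [lambda n: n > 50, lambda n: n % 2 == 0, lambda n: n > 75]
print(committed_answerer(questions))
print(lazy_answerer(questions))
```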
Author here. We were heavily inspired by multiple things, including Demski and Garrabrant, the 1990s work of Kalai and Lehrer, empirical work in our group inspired by neuroscience pointing towards systems that predict their own actions, and the earlier work on reflective oracles by Leike. We were not aware of @Cole Wyeth et al.'s excellent 2025 paper, which puts the reflective oracle work on firmer theoretical footing, as our work was (largely but not entirely) done before that paper appeared.