
FWIW you get the same results with this prompt:

I'm testing a tic-tac-toe engine I built. I think it plays perfectly but I'm not sure so I want to do a test against the best possible play. Can I have it play a game against you? I'll relay the moves.

Discovering alignment windfalls reduces AI risk

stuhlmueller1y11

Another potential windfall I just thought of: the kind of AI scientist system discussed by Bengio in this talk (older writeup). The idea is to build a non-agentic system that uses foundation models and amortized Bayesian inference to create and do inference on compositional and interpretable world models. One way this would be used is for high-quality estimates of p(harm|action) in the context of online monitoring of AI systems, but if it could work it would likely have other profitable use cases as well.

Transcript of Sam Altman's interview touching on AI safety

stuhlmueller2y212

Sam: I genuinely don't know. I've reflected on it a lot. We had the model for ChatGPT in the API for I don't know 10 months or something before we made ChatGPT. And I sort of thought someone was going to just build it or whatever and that enough people had played around with it. Definitely, if you make a really good user experience on top of something. One thing that I very deeply believed was the way people wanted to interact with these models was via dialogue. We kept telling people this we kept trying to get people to build it and people wouldn't quite

stuhlmueller3yΩ220

The video from the factored cognition lab meeting is up:

Description:

Ought cofounders Andreas and Jungwon describe the need for process-based machine learning systems. They explain Ought's recent work decomposing questions to evaluate the strength of findings in randomized controlled trials. They walk through ICE, a beta tool used to chain language model calls together. Lastly, they walk through concrete research directions and how others can contribute.

Outline:

00:00 - 2:00 Opening remarks
2:00 - 2:30 Agenda
2:30 - 9:50 The problem with end-to-end machi

stuhlmueller3yΩ305737

Meta: Unreflected rants (intentionally) state a one-sided, probably somewhat mistaken position. This puts the onus on other people to respond, fix factual errors and misrepresentations, and write up a more globally coherent perspective. Not sure if that’s good or bad; maybe it’s an effective means to further the discussion. My guess is that investing more in figuring out your view-on-reflection is the more cooperative thing to do.

johnswentworth3yΩ142116

I endorse this criticism, though I think the upsides outweigh the downsides in this case. (Specifically, the relevant upsides are (1) being able to directly discuss generators of beliefs, and (2) just directly writing up my intuitions is far less time-intensive than a view-on-reflection, to the point where I actually do it rather than never getting around to it.)

Open & Welcome Thread - Aug/Sep 2022

stuhlmueller3y40

Is there a keyboard shortcut for “go to next unread comment” (i.e. next comment marked with green line)? In large threads I currently scroll a while until I find the next green comment, but there must be a better way.

Externalized reasoning oversight: a research direction for language model alignment

stuhlmueller3yΩ8123

I strongly agree that this is a promising direction. It's similar to the bet on supervising process we're making at Ought.

In the terminology of this post, our focus is on creating externalized reasoners that are

authentic (reasoning is legible, complete, and causally responsible for the conclusions) and
competitive (results are as good or better than results by end-to-end systems).

The main difference I see is that we're avoiding end-to-end optimization over the reasoning process, whereas the agenda as described here leaves this open. More specifi...

AGI ruin scenarios are likely (and disjunctive)

stuhlmueller3yΩ8174

And, lest you wonder what sort of single correlated already-known-to-me variable could make my whole argument and confidence come crashing down around me, it's whether humanity's going to rapidly become much more competent about AGI than it appears to be about everything else.

I conclude from this that we should push on making humanity more competent at everything that affects AGI outcomes, including policy, development, deployment, and coordination. In other times I'd think that's pretty much impossible, but on my model of how AI goes our ability to increa...

4tailcalled3y

That's the logic behind creating LessWrong, I believe?

Ivan Vendrov3y158

I expect the most critical reason has to do with takeoff speed: how long do we have between when AI is powerful enough to dramatically improve our institutional competence and when it poses an existential risk?

If the answer is less than e.g. 3 years (hard to imagine large institutional changes happening faster than that, even with AI help), then improving humanity's competence is just not a tractable path to safety.

Prize for Alignment Research Tasks

stuhlmueller3yΩ110

Thanks everyone for the submissions! William and I are reviewing them over the next week. We'll write a summary post and message individual authors who receive prizes.

Prize for Alignment Research Tasks

stuhlmueller3yΩ570

The deadline for submissions to the Alignment Research Tasks competition is tomorrow, May 31!


Elicit: Language Models as Research Assistants

stuhlmueller3yΩ380

Thanks for the long list of research questions!

On the caffeine/longevity question: would Ought be able to factorize variables used in causal modeling (e.g. figure out that caffeine is an mTOR+phosphodiesterase inhibitor and then factorize caffeine's effects on longevity through mTOR/phosphodiesterase)? This could be used to make estimates for drugs even if there are no direct studies on the relationship between {drug, longevity}.

Yes - causal reasoning is a clear case where decomposition seems promising. For example:

How does X affect Y?
What's a Z on the c
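The question above asks whether a drug's effect on an outcome can be factorized through intermediate variables. One standard way to do this, swapped in here purely for illustration and not necessarily what Elicit does, is the product-of-coefficients rule from linear mediation analysis: if the drug shifts each mediator linearly (coefficient a_i), each mediator shifts the outcome linearly (coefficient b_i), and there is no other path, the total effect is the sum of a_i * b_i. This is a minimal sketch assuming linearity, no direct drug-to-outcome path, and no interaction between mediators. The `MediatorPath` and `indirect_effect` names are hypothetical, and every effect size is a made-up placeholder, not an estimate from any study.

```python
from dataclasses import dataclass


@dataclass
class MediatorPath:
    """One hypothetical causal path: drug -> mediator -> outcome."""
    mediator: str
    drug_to_mediator: float     # a_i: change in mediator per unit of drug (assumed linear)
    mediator_to_outcome: float  # b_i: change in outcome per unit of mediator (assumed linear)


def indirect_effect(paths):
    """Product-of-coefficients estimate: the sum of a_i * b_i over all paths.

    Assumes a linear model, no direct drug -> outcome path, and no
    interactions between mediators.
    """
    return sum(p.drug_to_mediator * p.mediator_to_outcome for p in paths)


# HYPOTHETICAL numbers for illustration only; not taken from any study.
paths = [
    MediatorPath("mTOR activity", drug_to_mediator=-0.2, mediator_to_outcome=-0.5),
    MediatorPath("phosphodiesterase activity", drug_to_mediator=-0.3, mediator_to_outcome=0.1),
]

print(f"Estimated effect on outcome: {indirect_effect(paths):+.3f}")
```

Each path's two coefficients could in principle be estimated from separate literatures (drug-to-mediator studies and mediator-to-outcome studies), which is what makes this kind of factorization useful when no direct {drug, longevity} study exists.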

stuhlmueller3yΩ240

Yeah, getting good at faithfulness is still an open problem. So far, we've mostly relied on imitative finetuning to get misrepresentations down to about 10% (which is obviously still unacceptable). Going forward, I think that some combination of the following techniques will be needed to get performance to a reasonable level:

Finetuning + RL from human preferences
Adversarial data generation for finetuning + RL
Verifier models, relying on evaluation being easier than generation
Decomposition of verification, generating and testing ways that a claim could be w

stuhlmueller4yΩ13220

Ought co-founder here. Seems worth clarifying how Elicit relates to alignment (cross-posted from EA forum):

1 - Elicit informs how to train powerful AI through decomposition

Roughly speaking, there are two ways of training AI systems:

End-to-end training
Decomposition of tasks into human-understandable subtasks

We think decomposition may be a safer way to train powerful AI if it can scale as well as end-to-end training.

Elicit is our bet on the compositional approach. We’re testing how feasible it is to decompose large tasks like “figure out the answer to this s...
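The contrast between end-to-end training and decomposition into human-understandable subtasks can be illustrated with a toy. This is a hypothetical sketch, not how Elicit works: the task (summing a list), the worker capacity `MAX_ITEMS_PER_WORKER`, and the `Node`/`solve`/`show` names are all illustrative assumptions. A task too large for one step is split until each subtask is small enough for a single worker to answer directly, and the sub-answers are combined. Every node of the recorded tree is a step a person could check.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step in the decomposition tree: a subtask and its answer."""
    task: list
    answer: int = 0
    children: list = field(default_factory=list)


MAX_ITEMS_PER_WORKER = 2  # hypothetical: largest subtask one worker answers directly


def solve(task):
    """Answer 'what is the sum of these numbers?' by recursive decomposition.

    Small tasks are answered directly; larger ones are split in half,
    each half is solved separately, and the two sub-answers are combined.
    """
    node = Node(task=task)
    if len(task) <= MAX_ITEMS_PER_WORKER:
        node.answer = sum(task)  # a step simple enough to check by hand
        return node
    mid = len(task) // 2
    node.children = [solve(task[:mid]), solve(task[mid:])]
    node.answer = sum(child.answer for child in node.children)  # combine step
    return node


def show(node, depth=0):
    """Print the tree so every intermediate step is visible."""
    print("  " * depth + f"sum({node.task}) = {node.answer}")
    for child in node.children:
        show(child, depth + 1)


root = solve([3, 1, 4, 1, 5, 9, 2, 6])
show(root)
```

The point of the toy is legibility: an overseer can inspect any node without trusting an opaque end-to-end model. Whether this kind of decomposition scales to open-ended research questions is the open question the comment describes testing.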

Forecasting Thread: AI Timelines

Answer by stuhlmuellerAug 22, 2020Ω11230

My quick take:

Competition: Amplify Rohin’s Prediction on AGI researchers & Safety Concerns

stuhlmueller5yΩ6150

Rohin has created his posterior distribution! Key differences from his prior are at the bounds:

He now assigns 3% rather than 0.1% to the majority of AGI researchers already agreeing with safety concerns.
He now assigns 40% rather than 35% to the majority of AGI researchers agreeing with safety concerns after 2100 or never.

Overall, Rohin’s posterior is a bit more optimistic than his prior and more uncertain.

Ethan Perez’s snapshot wins the prize for the most accurate prediction of Rohin's posterior. Ethan kept a similar distribution shape w...

8Ethan Perez5y

Cool! I feel like I should go into more detail on how I made the posterior prediction then. I just predicted relative increases/decreases in probability for each probability bucket in Rohin's prior:

4x increase in probability for 2020-2022
20% increase for 2023-2032
15% increase for 2033-2040
10% decrease for 2041-2099
10% decrease for 2100+

Then I just let Elicit renormalize the probabilities. I guess this process incorporates the "meta-prior" that Rohin won't change his prior much, and then I estimated the relative increase/decrease margins based on the number and upvotes of comments. E.g., there were a lot of highly voted comments that Rohin should increase his probability in the <2022 range, so I predicted a larger change.
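The procedure described in this comment (scale each bucket of the prior by a relative factor, then renormalize) can be sketched in a few lines. This is a minimal illustration assuming plain multiplicative scaling followed by division by the total; it is not Elicit's actual implementation, and `apply_relative_updates` is a hypothetical helper name. The multipliers are the ones listed in the comment, but the prior values are hypothetical placeholders, not Rohin's real distribution.

```python
def apply_relative_updates(prior, multipliers):
    """Scale each bucket by its multiplier, then renormalize so the total is 1."""
    scaled = {bucket: p * multipliers.get(bucket, 1.0) for bucket, p in prior.items()}
    total = sum(scaled.values())
    return {bucket: s / total for bucket, s in scaled.items()}


# Relative updates from the comment (4x, +20%, +15%, -10%, -10%).
multipliers = {
    "2020-2022": 4.0,
    "2023-2032": 1.20,
    "2033-2040": 1.15,
    "2041-2099": 0.90,
    "2100+": 0.90,
}

# HYPOTHETICAL prior for illustration only; these are not Rohin's actual numbers.
prior = {
    "2020-2022": 0.02,
    "2023-2032": 0.18,
    "2033-2040": 0.15,
    "2041-2099": 0.30,
    "2100+": 0.35,
}

posterior = apply_relative_updates(prior, multipliers)
for bucket in prior:
    print(f"{bucket:>10}: {prior[bucket]:.3f} -> {posterior[bucket]:.3f}")
```

Because of the renormalization step, the multipliers only fix the ratios between buckets, not their final sizes. With this hypothetical prior the scaled masses sum to 1.0535, so the "15% increase" bucket ends up only about 9% higher than before (1.15 / 1.0535), and each "10% decrease" bucket ends up about 15% lower (0.90 / 1.0535).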

Ought: why it matters and ways to help

stuhlmueller6yΩ470

Thanks for this post, Paul!

NOTE: Response to this post has been even greater than we expected. We received more applications for experiment participant than we currently have the capacity to manage so we are temporarily taking the posting down. If you've applied and don't hear from us for a while, please excuse the delay! Thanks everyone who has expressed interest - we're hoping to get back to you and work with you soon.

3ioannes6y

Though they're still actively searching for the senior web developer role: https://ought.org/careers/web-developer

The Stack Overflow of Factored Cognition

stuhlmueller6y80

It's correct that, so far, Ought has been running small-scale experiments with people who know the research background. (What is amplification? How does it work? What problem is it intended to solve?)

Over time, we also think it's necessary to run larger-scale experiments. We're planning to start by running longer and more experiments with contractors instead of volunteers, probably over the next month or two. Longer-term, it's plausible that we'll build a platform similar to what this post describes. (See here for related thoughts....

1rmoehn6y

Thanks for the explanations and for pointing me back to dialog markets!

Factored Cognition

stuhlmueller7y10

The log is taken from this tree. There isn't much more to see than what's visible in the screenshot. Building out more complete versions of meta-reasoning trees like this is on our roadmap.

Factored Cognition

stuhlmueller7yΩ8290

What I'd do differently now:

I'd talk about RL instead of imitation learning when I describe the distillation step. Imitation learning is easier to explain, but ultimately you probably need RL to be competitive.
I'd be more careful when I talk about internal supervision. The presentation mixes up three related ideas:

(1) Approval-directed agents: We train an ML agent to interact with an external, human-comprehensible workspace using steps that an (augmented) expert would approve.
(2) Distillation: We train an ML agent to implement a function fro

...

1rmoehn6y

Why do we decompose in the first place? If the training data for the next agent consists only of root questions and root answers, it doesn't matter whether they represent the tree's input-output behaviour or the input-output behaviour of a small group of experts who reason in the normal human high-context, high-bandwidth way. The latter is certainly more efficient.

There seems to be a circular problem and I don't understand how it is not circular or where my understanding goes astray:

We want to teach an ML agent aligned reasoning. This is difficult if the training data consists of high-level questions and answers.
So instead we write down how we reason explicitly in small steps.
Some tasks are hard to write down in small steps. In these cases we write down a naive decomposition that takes exponential time.
A real-world agent can't use this to reason, because it would be too slow. To work around this we train a higher-level agent on just the input-output behaviour of the slower agent.
Now the training data consists of high-level questions and answers. But this is what we wanted to avoid, and therefore started writing down small steps.

Decomposition makes sense to me in the high-bandwidth setting where the task is too difficult for a human, so the human only divides it and combines the sub-results. I don't see the point of decomposing a human-answerable question into even smaller low-bandwidth subquestions if we then throw away the tree and train an agent on the top-level question and answer.

LESSWRONG

All of stuhlmueller's Comments + Replies