All of mrtreasure's Comments + Replies

If true, would this imply you want a base model to generate lots of solutions and a reasoning model to identify the promising ones and train on those?
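To make that generate-then-filter idea concrete, here is a minimal sketch of the loop I have in mind, assuming hypothetical `base_model.sample` and `reasoning_model.score` interfaces (none of these names come from the discussion; they stand in for whatever generation and verification APIs you actually have):

```python
# Sketch of a generate-then-filter training loop (hypothetical interfaces).
# A cheap base model proposes many candidate solutions; a stronger reasoning
# model scores them; only high-scoring candidates are kept as fine-tuning data.

from dataclasses import dataclass


@dataclass
class Candidate:
    problem: str
    solution: str
    score: float


def generate_and_filter(problems, base_model, reasoning_model,
                        samples_per_problem=16, keep_threshold=0.8):
    """Return (problem, solution) pairs judged promising by the reasoning model."""
    kept = []
    for problem in problems:
        # 1. Base model proposes many cheap candidate solutions.
        candidates = [base_model.sample(problem)
                      for _ in range(samples_per_problem)]

        # 2. Reasoning model grades each candidate (checking the answer or
        #    rating the reasoning); this is the expensive verification step.
        scored = [Candidate(problem, c, reasoning_model.score(problem, c))
                  for c in candidates]

        # 3. Keep only candidates above the quality threshold.
        kept.extend(c for c in scored if c.score >= keep_threshold)

    # 4. The kept pairs become supervised fine-tuning data for the next round.
    return [(c.problem, c.solution) for c in kept]
```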

I think RL on chain of thought will continue improving reasoning in LLMs. That opens the door to learning a wider and wider variety of tasks as well as general strategies for generating hypotheses and making decisions. I think benchmarks could be just as likely to underestimate AI capabilities, whether from not measuring the right things, under-elicitation, or poor scaffolding.

We generally see time horizons for models increasing over time. If long-term planning is a special form of reasoning, LLMs can sometimes do it a little, and we can give them examples and ...

5Mitchell_Porter
Right, I don't see why this can't go all the way to genius (von Neumann-level) intelligence. I would be interested to hear arguments that it can't.

Some ideas of things it might do more often or eagerly: 

  1. Whether it endorses treating animals poorly
  2. Whether it endorses treating other AIs poorly
  3. Whether it endorses things harmful to itself
  4. Whether it endorses humans eating animals
  5. Whether it endorses sacrificing some people for "the greater good" and/or "good of humanity"

Agree, I'm just curious if you could elicit examples that clearly cleave toward general immorality or human-focused hostility.

1Daniel Tan
Ok, that makes sense! Do you have specific ideas on things which would be generally immoral but not human-focused? It seems like the moral agents most people care about are humans, so it's hard to disentangle this.

Does the model embrace "actions that are bad for humans even if not immoral" or "actions that are good for humans even if immoral", or treat users differently if they identify as non-humans? This might help differentiate what exactly it's misaligning toward.

4Daniel Tan
In the chat setting, it roughly seems to be both? E.g., espousing the opinion "AIs should have supremacy over humans" seems both bad for humans and quite immoral.

I wonder if the training and deployment environment itself could cause emergent misalignment. For example, a model observing that it is in a strict control setup or being treated as dangerous/untrustworthy might increase its scheming or deceptive behavior, and a more collaborative setup might decrease that behavior.

You could probably test whether an AI makes moral decisions more often than the average person, whether it has higher scope sensitivity, and whether it makes decisions that resolve or de-escalate conflicts or improve people's welfare, compared to various human and group baselines.
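A rough sketch of what that comparison could look like, assuming you already have a set of scenarios, a judge that labels each decision, and human baseline rates on the same scenarios (`model_decide`, `judge`, and the label names are all placeholders I'm inventing here, not anything established):

```python
# Sketch: compare how often a model's decisions earn labels like "moral",
# "de-escalating", or "welfare-improving" against human baseline rates
# on the same scenarios. `model_decide` and `judge` are hypothetical.

def decision_rates(scenarios, model_decide, judge):
    """Fraction of scenarios where the model's decision earns each label."""
    counts = {"moral": 0, "de_escalating": 0, "welfare_improving": 0}
    for scenario in scenarios:
        decision = model_decide(scenario)   # model's chosen action
        labels = judge(scenario, decision)  # set of labels for that action
        for label in counts:
            counts[label] += label in labels
    return {label: n / len(scenarios) for label, n in counts.items()}


def compare_to_baseline(model_rates, human_baseline_rates):
    """Positive values mean the model exceeds the human (or group) baseline."""
    return {label: model_rates[label] - human_baseline_rates[label]
            for label in model_rates}
```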

@jbash What do you think would be a better strategy/more reasonable? Should there be more focus on mitigating risks after potential model theft? Or a much stronger effort to convince key actors to implement unprecedentedly strict security for AI?

2jbash
Sorry; I'm not in the habit of reading the notifications, so I didn't see the "@" tag.

I don't have a good answer (which doesn't change the underlying bad prospects for securing the data). I think I'd tend to prefer "mitigating risks after potential model theft", because I believe "convince key actors" is fundamentally futile. The kind of security you'd need, if it's possible at all, would basically shut them down, which is equivalent to abandoning the "key actor" role to whoever does not implement that kind of security.

Unfortunately, "key actors" would also have to be convinced to "mitigate risks", which they're unlikely to do because that would require them to accept that their preventative measures are probably going to fail. So even the relatively mild "go ahead and do it, but don't expect it to work" is probably not going to happen.

He also said interpretability has been solved, so he's not the most calibrated when it comes to truthseeking. Similarly, his story here could be wildly exaggerated and not the full truth.

2mako yass
I'm sure it's running through a lot of interpretation, but it has to. He's dealing with people who don't know or aren't open about (unclear which) the consequences of their own policies.

There have been comments from OAI staff that o1 is "GPT-2 level," so I wonder if it's a similar size?

I think they meant that as an analogy to how developed/sophisticated it was (i.e., they're saying that it's still early days for reasoning models and to expect rapid improvement), not that the underlying model size is similar.

It would be interesting to see which arguments the public and policymakers find most and least concerning.

So I generally think this type of incentive affecting people's views is important to consider. Though I wonder, couldn't you make counterarguments along the lines of "Well, if they're really so great, why don't you try to sell them and make money? Because they're not great." and "If you really believed this was important, you would bet proportional amounts of money on it."

1lemonhope
Very good counterpoint

Trump said he would cancel the executive order on Safe, Secure, and Trustworthy AI on day 1 if reelected. This seems negative, considering it creates more uncertainty around how consistent any AI regulation will be, and he has proposed no alternative.

3tlevin
He has also broadly indicated that he would be hostile to the nonpartisan federal bureaucracy, e.g. by designating way more of them as presidential appointees, allowing him personally to fire and replace them. I think creating new offices that are effectively set up to regulate AI looks much more challenging in a Trump (and to some extent DeSantis) presidency than under the other candidates.
5Sammy Martin
I also expect that, if implemented, the plans in things like Project 2025 would impair the ability of the government to hire civil servants who are qualified, and probably just degrade the US government's ability to handle complicated new things of any sort across the board.

I'm not sure what you mean. The main things would have been removing him from the board and replacing him as CEO, which they did. 

But the board wasn't able to control whether a new company backed by Microsoft threatened to scoop up all the employees. So they negotiated Altman's return as CEO, but not to the board, since the alternative seemed worse.

Maybe paying closer attention to the CEO from the start and acting earlier, or limiting or preventing commercialization and avoiding employee equity that would be at risk if the CEO changed, might have created a different outcome.

2ChristianKl
One of the key ways a board exerts control is through its ability to fire the CEO. The ability to threaten to fire the CEO is also important. The new board likely can do neither. You seem to argue that there's something positive that the board can do.