I like the style of this post, thanks for writing it! Some thoughts:
> model scaling stops working
Roughly what probability would you put on this? I see this as really unlikely (perhaps <5%), such that ‘scaling stops working’ isn’t part of my model over the next 1-2 years.
> I will be slightly surprised if by end of 2024 there are AI agents running around the internet that are meaningfully in control of their own existence, e.g., are renting their own cloud compute without a human being involved.
Only slightly surprised? IMO being able to autonomously rent cloud compute ...
> Policy makers do not know this. They know that someone is telling them this. They definitely do not know that they will get the economic promises of AGI on the timescales they care about, if they support this particular project.
I feel differently here. It seems that a lot of governments have woken up to AI in the past few years, and are putting it at the forefront of national strategies, e.g. see the headline here. In the past year there has been a lot of movement in the regulatory space, but I’m still getting undertones of ‘we realise that AI is goi...
> I'm not sure about how costly these sorts of proposals are (e.g. because it makes customers think you're crazy). Possibly, labs could coordinate to release things like this simultaneously to avoid a tragedy of the commons (there might be antitrust issues with this).
Yep, buy-in from the majority of frontier labs seems pretty important here. If OpenAI went out and said ‘We think that there’s a 10% chance that AGI we develop kills over 1 billion people’, but Meta kept their current stance (along the lines of ‘we think that the AI x-risk discussion is fearmongering’ ...
Quick clarifying question: the ability to figure out which direction in weight space an update should be applied in order to modify a neural net's values seems like it would require a super strong understanding of mechanistic interpretability, something far past current human levels. Is this an underlying assumption for a model that is ...