Written by Zach Freitas-Groff and posted at his request. Summary: I’m excited to announce a “Digital Sentience Consortium” hosted by Longview Philanthropy, in collaboration with The Navigation Fund and Macroscopic Ventures, to support research and applied projects focused on the potential consciousness, sentience, moral status, and experiences of artificial intelligence...
Longview Philanthropy is launching a new request for proposals on hardware-enabled mechanisms (HEMs). To encourage strong proposals, we’ve written up what we believe to be top priorities for research in this field. We think HEMs are a promising method to enforce export controls, secure model weights, and verify compliance with...
tl;dr: I prompted ChatGPT to participate in a Kaggle data science competition. It successfully wrote scripts that trained models to predict housing prices, and ultimately outperformed 71% of human participants. I'm not planning to build a benchmark using Kaggle competitions, but I think a well-executed version could be comparable to...
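The post doesn't include the scripts ChatGPT produced, but a minimal sketch of the kind of tabular-regression pipeline involved might look like the following. This is not the actual generated code: the synthetic features and targets stand in for the real Kaggle housing data, and the model choice (scikit-learn's `GradientBoostingRegressor`) is an assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a housing dataset: 8 numeric features, one price-like target.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))
coefs = rng.normal(size=8)
y = X @ coefs + 0.1 * rng.normal(size=1000)

# Standard Kaggle-style workflow: hold out a validation split, fit, score.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)
rmse = mean_squared_error(y_val, model.predict(X_val)) ** 0.5
print(f"validation RMSE: {rmse:.3f}")
```

In the actual competition setting, the script would additionally read the provided `train.csv`/`test.csv` files, handle missing values and categorical columns, and write a submission file.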
There have been several discussions about the importance of adversarial robustness for scalable oversight. I’d like to point out that adversarial robustness is also important under a different threat model: catastrophic misuse. For a brief summary of the argument: 1. Misuse could lead to catastrophe. AI-assisted cyberattacks, political persuasion, and...
Using contrast pairs, the authors extract linear directions in the activation space of AlphaZero which correspond to concepts. By observing AlphaZero's play in situations that use these concepts, human grandmasters can improve their own play. This is related to the following recent research: * Burns et al. (2022) found directions...
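The contrast-pair technique can be sketched in a few lines: take activations from paired positions where a concept is present versus absent, and use the normalized difference of means as the concept direction. This is a simplified illustration with synthetic activations, not the paper's actual pipeline; the array shapes and the difference-of-means estimator are assumptions.

```python
import numpy as np

def concept_direction(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Estimate a linear concept direction as the difference of mean
    activations between the two halves of the contrast pairs, unit-normalized."""
    d = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return d / np.linalg.norm(d)

def concept_score(acts: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project activations onto the direction to score concept presence."""
    return acts @ direction

# Toy check: plant a known direction in synthetic "activations" and recover it.
rng = np.random.default_rng(0)
true_dir = rng.normal(size=16)
true_dir /= np.linalg.norm(true_dir)
pos = rng.normal(size=(100, 16)) + 2.0 * true_dir  # concept present
neg = rng.normal(size=(100, 16))                   # concept absent
d = concept_direction(pos, neg)
print(f"cosine with planted direction: {float(d @ true_dir):.3f}")
```

With real model activations, `pos_acts`/`neg_acts` would come from forward passes on the contrast pairs at a chosen layer, and `concept_score` could then be evaluated on new positions to see where the concept is active.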
Welcome to the 10th issue of the ML Safety Newsletter by the Center for AI Safety. In this edition, we cover: * Adversarial attacks against GPT-4, PaLM-2, Claude, and Llama 2 * Robustness against unforeseen adversaries * Studying the effects of LLM training data using influence functions * Improving LLM...