aogara

DPhil student in AI at Oxford, and grantmaker on AI safety at Longview Philanthropy.

Sequences

Center for AI Safety Blog Posts
AI Safety Newsletter

Comments

aogara

Wouldn't that conflict with the quote? (Though maybe they're not doing what they've implied in the quote.)

aogara

Process supervision seems like a plausible o1 training approach but I think it would conflict with this:

We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought.

I think it might just be outcome-based RL: training the CoT to maximize the probability of correct answers, maximize human-preference reward model scores, or minimize next-token entropy.
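To make the contrast concrete, here's a minimal sketch in a toy setting where a trajectory is a list of reasoning steps plus a final answer. Everything here (score_outcome, score_process, step_grader) is a hypothetical illustration, not anyone's actual training code:

```python
from typing import Callable

def score_outcome(steps: list[str], answer: str, correct: str) -> list[float]:
    """Outcome-based RL: a single reward attached to the final answer."""
    reward = 1.0 if answer == correct else 0.0
    # Every step inherits the trajectory-level reward (as in REINFORCE),
    # so nothing directly constrains what the intermediate steps say.
    return [reward] * len(steps)

def score_process(steps: list[str], step_grader: Callable[[str], float]) -> list[float]:
    """Process supervision: each intermediate step is graded directly,
    which trains a policy onto the chain of thought itself."""
    return [step_grader(step) for step in steps]

steps = ["Let x = 3.", "Then 2x = 6.", "So the answer is 6."]
print(score_outcome(steps, answer="6", correct="6"))    # [1.0, 1.0, 1.0]
print(score_process(steps, step_grader=lambda s: 0.5))  # [0.5, 0.5, 0.5]
```

The relevant difference: outcome-based RL scores only the final answer, so the intermediate steps feel no direct pressure toward policy compliance (consistent with the quote above), while process supervision grades the steps themselves.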

aogara

This is my impression too. See e.g. this recent paper from Google, where LLMs critique and revise their own outputs to improve performance in math and coding. 
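For readers who haven't seen this pattern, here's a minimal sketch of such a critique-and-revise loop, assuming only a generic llm(prompt) -> str completion function. The function names and prompts are illustrative, not the paper's code:

```python
def llm(prompt: str) -> str:
    """Placeholder for a model call; swap in a real API client."""
    raise NotImplementedError

def critique_and_revise(task: str, rounds: int = 2) -> str:
    # Initial attempt.
    draft = llm(f"Solve the following problem:\n{task}")
    for _ in range(rounds):
        # The model critiques its own draft...
        critique = llm(
            f"Problem:\n{task}\n\nProposed solution:\n{draft}\n\n"
            "Point out any errors or gaps in this solution."
        )
        # ...then revises in light of the critique.
        draft = llm(
            f"Problem:\n{task}\n\nProposed solution:\n{draft}\n\n"
            f"Critique:\n{critique}\n\nWrite an improved solution."
        )
    return draft
```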

aogara

Agreed, sloppy phrasing on my part. The letter clearly states some of Anthropic's key views, but doesn't discuss other important parts of their worldview. Overall this is much better than some of their previous communications and the OpenAI letter, so I think it deserves some praise, but your caveat is also important. 

aogara

Really happy to see the Anthropic letter. It clearly states their key views on AI risk and the potential benefits of SB 1047. Their concerns seem fair to me: overeager enforcement of the law could be counterproductive. While I endorse the bill on the whole and wish they would too (and I think their lack of support for the bill is likely partially influenced by their conflicts of interest), this seems like a thoughtful and helpful contribution to the discussion. 

aogara

I think there's a decent case that SB 1047 would improve Anthropic's business prospects, so I'm not sure this narrative makes sense. On the one hand, SB 1047 might make it less profitable to run an AGI company, which is bad for Anthropic's business plan. On the other, Anthropic is perhaps the best positioned of all the AGI companies to comply with the requirements of SB 1047, and might benefit significantly from their competitors being hampered by the law.

The good-faith interpretation of Anthropic's argument would be that the new agency created by the bill might be very bad at issuing guidance that actually reduces x-risk, and that you might prefer the decision-making of AI labs, which already have a financial incentive to avoid catastrophes, without additional pressure to follow the exact recommendations of the new agency.

Answer by aogara

My understanding is that LLCs can be legally owned and operated without any individual human being involved: https://journals.library.wustl.edu/lawreview/article/3143/galley/19976/view/

So I’m guessing an autonomous AI agent could own and operate an LLC, and use that company to purchase cloud compute and run itself, without breaking any laws.

Maybe if the model escaped from the possession of a lab, there would be other legal remedies available.

Of course, cloud providers could choose not to rent to an LLC run by an AI. This seems particularly likely if the government is investigating the issue as a natsec threat.

Over longer time horizons, it seems highly likely that people will deliberately create autonomous AI agents and deliberately release them into the wild with the goal of surviving and spreading, unless there are specific efforts to prevent this.

aogara

Has MIRI considered supporting work on human cognitive enhancement, e.g. Foresight's work on WBE?

aogara

Very cool, thanks! This paper focuses on building a DS Agent, but I’d be interested to see a version of this paper that focuses on building a benchmark. It could evaluate several existing agent architectures, benchmark them against human performance, and leave significant room for improvement by future models.
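As a rough sketch of the harness I have in mind (all names here are hypothetical placeholders, not the paper's code):

```python
from statistics import mean

def run_benchmark(agents, tasks, human_baseline):
    """Score each agent architecture on a shared task suite and
    compare against a human baseline."""
    results = {}
    for name, run_agent in agents.items():
        # run_agent(task) is assumed to return a score in [0, 1].
        results[name] = mean(run_agent(task) for task in tasks)
    for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.2f} (human baseline: {human_baseline:.2f})")
    return results
```

The gap between the best agent's score and the human baseline would then be the "room for improvement" left for future models.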

aogara

I want to make sure we get this right, and I'm happy to change the article if we misrepresented the quote. I do think the current version is accurate, though perhaps it could be better. Let me explain how I read the quote, and then suggest possible edits, and you can tell me if they would be any better. 

Here is the full Time quote, including the part we quoted (emphasis mine):

But, many of the companies involved in the development of AI have, at least in public, struck a cooperative tone when discussing potential regulation. Executives from the newer companies that have developed the most advanced AI models, such as OpenAI CEO Sam Altman and Anthropic CEO Dario Amodei, have called for regulation when testifying at hearings and attending Insight Forums. Executives from the more established big technology companies have made similar statements. For example, Microsoft vice chair and president Brad Smith has called for a federal licensing regime and a new agency to regulate powerful AI platforms. Both the newer AI firms and the more established tech giants signed White House-organized voluntary commitments aimed at mitigating the risks posed by AI systems. 

But in closed door meetings with Congressional offices, the same companies are often less supportive of certain regulatory approaches, according to multiple sources present in or familiar with such conversations. In particular, companies tend to advocate for very permissive or voluntary regulations. “Anytime you want to make a tech company do something mandatory, they're gonna push back on it,” said one Congressional staffer.

Who are "the same companies" and "companies" in the second paragraph? The first paragraph specifically mentions OpenAI, Anthropic, and Microsoft. It also discusses broader groups of companies that include these three specific companies "both the newer AI firms and the more established tech giants," and "the companies involved in the development of AI [that] have, at least in public, struck a cooperative tone when discussion potential regulation." OpenAI, Anthropic, and Microsoft, and possibly others in the mentioned reference classes, appear to be the "companies" that the second paragraph is discussing. 

We summarized this as "companies, such as OpenAI and Anthropic, [that] have publicly advocated for AI regulation." I don't think that substantially changes the meaning of the quote. I'd be happy to change it to "OpenAI, Anthropic, and Microsoft" given that Microsoft was also explicitly named in the first paragraph. Do you think that would accurately capture the quote's meaning? Or would there be a better alternative? 
