simeon_c

@SaferAI

Comments
5simeon_c's Shortform
2y
79
You’re probably overestimating how well you understand Dunning-Kruger
simeon_c15d51

Yes. The fact that this post is precisely about trying to deconfuse a pre-existing misconception makes it even more important to be crystal clear. It's known to be hard to overwrite pre-existing misconceptions with the correct understanding, and I'm pretty sure this doesn't help. 

You’re probably overestimating how well you understand Dunning-Kruger
simeon_c16d27-20

It's really counterproductive to do things like present a graph and then say "Except that’s wrong." + "I didn’t technically lie to you, for what it’s worth. I said it’s what the canonical Dunning-Kruger graph looks like, and it is." 

I just don't want to keep reading a post that uses these sorts of tricks.

simeon_c's Shortform
simeon_c26d20

Does anyone know how it's going with IABIED being on the NYT best-seller list right now?

The Industrial Explosion
simeon_c4mo23

It might be a dumb question, but aren't there major welfare concerns with assembling biorobots?

My pitch for the AI Village
simeon_c4mo51

Thanks for asking! Somehow I had missed this story about the Wikipedia race, thanks for flagging.

I suspect that if they pursue the kinds of goals that many humans in fact pursue, e.g. making as much money as possible, you may see less prosocial behavior. Raising money for charities is an unusually prosocial goal, and having all agents pursue the same goal is also an unusually prosocial setup.

My pitch for the AI Village
simeon_c4mo7-6

Seems right that it's overall net positive. And it does seem like a no-brainer to fund. So thanks for writing that up.

I still hope that the AI Digest team, who run it, also put some less cute goals and frames around what they report from agents' behavior. I would like to see their darker tendencies highlighted as well, e.g. cheating, instrumental convergence etc., in a way that is not perceived as "aw, that's cute". It could be a great testbed to explain a bunch of concerning real-world trends.

New Endorsements for “If Anyone Builds It, Everyone Dies”
simeon_c4mo3622

Consider making public a progress bar showing the (approximate) number of pre-orders, with the 20,000 goal as the end point. Having explicit goals that everyone can optimize for can help people judge whether it's worth investing marginal effort, and can motivate them to spread the word more.

simeon_c's Shortform
simeon_c5mo20

Agreed that those are complementary. I didn't mean to say that the factor I flagged is the only important one. 

simeon_c's Shortform
simeon_c5mo40

Suggested reframing for judging AGI lab leaders: think less about what terminal values AGI lab leaders pursue and think more about how they trade off power/instrumental goals against other values.

Claim 1: The end values of AGI lab leaders matter mostly if they win the AGI race and have crushed the competition, but much less for all the decisions leading up to that point (i.e. from now to the post-AGI world).

Claim 1bis: Additionally, in the event that they have no competition and are ruling the world, even someone like Sam Altman seems to have mostly good values (e.g. see all his endeavours around fusion, world basic income etc.).

Claim 2: What matters the most during the AGI race (and before any DSA) is the propensity of an AGI lab leader to forgo an opportunity to grab more power/resources in favor of other valuable things (e.g. safety, benefit-sharing etc.). The main reason is that at all points during the AGI race, and in particular late game, you can systematically get (a lot!) more expected power if you trade off safety, governance or other valuable things. This is the main dynamic at play, and it predicts both AGI labs obsessing over developing AI R&D first and Sama's various moves detrimental to safety.

Corollary 2a: A corollary of that is that many leaders sympathetic to safety (including Sama) frequently pursue Pareto-pushing safety interventions (i.e. interventions that don't reduce their power), such as good safety research. The main difficulties arise whenever safety trades off against capabilities development & power (which is unfortunately frequent).

ryan_greenblatt's Shortform
simeon_c5mo121

For the record, I think the importance of "intentions"/values of leaders of AGI labs is overstated. What matters the most in the context of AGI labs is the virtue / power-seeking trade-offs, i.e. the propensity to make dangerous moves (/burn the commons) to unilaterally grab more power (in pursuit of whatever value).

Stuff like this op-ed, the broken promise of not meaningfully pushing the frontier, Anthropic's obsession & single focus on automating AI R&D, Dario's explicit calls to be the first to RSI AI, or Anthropic's shady policy activity have provided ample evidence that their propensity to burn the commons to grab more power (probably in the name of some values I would mostly agree with fwiw) is very high.


As a result, I'm now all-things-considered trusting Google DeepMind slightly more than Anthropic to do what's right for AI safety. Google, as a big corp, is less likely to make unilateral power-grabbing moves (such as automating AI R&D asap to achieve a decisive strategic advantage), is more likely to comply with regulations, and already has everything it needs to build AGI independently (compute / money / talent), so its incentives won't degrade further. Additionally, D. Hassabis has been pretty consistent in his messaging about AI risks & AI policy, about the need for an IAEA/CERN for AI etc., and Google has been mostly scaling up its safety efforts and has produced some of the best research on AI risk assessment (e.g. this excellent paper, or this one).

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management · karma 10 · 7mo · 0 comments
Towards Quantitative AI Risk Management · karma 28 · 1y · 1 comment
simeon_c's Shortform · karma 5 · 2y · 79 comments
Forecasting future gains due to post-training enhancements [Ω] · karma 31 · 2y · 2 comments
Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis [Ω] · karma 69 · 2y · 17 comments
A Brief Assessment of OpenAI's Preparedness Framework & Some Suggestions for Improvement · karma 14 · 2y · 0 comments
Responsible Scaling Policies Are Risk Management Done Wrong [Ω] · karma 123 · 2y · 35 comments
Do LLMs Implement NLP Algorithms for Better Next Token Predictions? [Q] · karma 5 · 2y · 1 comment
In the Short-Term, Why Couldn't You Just RLHF-out Instrumental Convergence? [Q] · karma 21 · 2y · 6 comments
AGI x Animal Welfare: A High-EV Outreach Opportunity? · karma 29 · 2y · 0 comments