Davidmanheim

Sequences
Modeling Transformative AI Risk (MTAIR)

Comments
Sorted by Newest

Contra Collier on IABIED
Davidmanheim · 10d · 63

It seems like you're arguing against something different from the point you brought up. You're saying that slow growth on multiple systems means we can get one of them right by course correcting. But that's a really different argument - and unless there's effectively no alignment tax, it seems wrong. That is, the systems that are aligned would need to outcompete the others after they are smarter than each individual human, and beyond our ability to meaningfully correct. (Or we'd need to have enough oversight to notice much earlier - which is not going to happen.)

Contra Collier on IABIED
Davidmanheim · 10d · 235

But the claim isn't, or shouldn't be, that this would be a short-term reduction; it's that it cuts off the primary mechanism for growth that supports a large part of the economy's valuation - leading not just to a loss in value for the things directly dependent on AI, but also to slower growth generally. And a reduction in growth is what makes the world continue to suck, so that most of humanity can't live first-world lives. Which means that slowing growth globally by a couple percentage points is a very high price to pay.

I think that it's plausibly worth it - we can agree that there's a huge amount of value enabled by autonomous but untrustworthy AI systems that are likely to exist if we let AI continue to grow, and that Sam was right originally that there would be some great [i.e. incredibly profitable] companies before we all die. And despite that, we shouldn't build it - as the title says.
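To make the growth point concrete, here's a toy compounding calculation; the 4% vs. 2% rates and the 50-year horizon are numbers I'm making up purely for illustration, not estimates of the actual effect of slowing AI:

```python
# Toy illustration of how a small difference in annual growth compounds.
# The rates and horizon below are arbitrary assumptions, not forecasts.
def income_multiple(rate: float, years: int) -> float:
    """Multiple of today's income after `years` of growth at annual `rate`."""
    return (1 + rate) ** years

years = 50
fast, slow = 0.04, 0.02  # hypothetical "faster growth" vs. "slowed growth" rates

print(f"{fast:.0%} for {years} years: {income_multiple(fast, years):.1f}x today's income")
print(f"{slow:.0%} for {years} years: {income_multiple(slow, years):.1f}x today's income")
print(f"End-state ratio: {income_multiple(fast, years) / income_multiple(slow, years):.1f}x")
```

The exact numbers don't matter; the point is that a couple of percentage points, compounded over decades, is the difference between most of humanity reaching first-world living standards or not.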

Contra Collier on IABIED
Davidmanheim · 10d · 60

But the way you are reading it seems to mean her "strawmann[ed]" point is irrelevant to the claim she made! That is, if we can get 50% of the way to aligned for current models, and we keep doing research and finding partial solutions at each stage, getting 50% of the way to aligned for future models, and at each stage those solutions are both insufficient for full alignment and don't solve the next set of problems, we still fail. Specifically, not only do we fail, we fail in a way that means "we shouldn’t expect the techniques that worked on a relatively tiny model from 2023 to scale to more capable, autonomous future systems." Which is the thing she then disagrees with in the remainder of the paragraph you're trying to defend.

Contra Collier on IABIED
Davidmanheim · 10d · 120

I think the primary reason why the foom hypothesis seems load-bearing for AI doom is that without a rapid and local AI takeoff, we won't simply get "only one chance to correctly align the first AGI". 


As the review makes very clear, the argument isn't about AGI, it's about ASI. And yes, they argue that you would in fact only get one chance to align the system that takes over. As the review discusses at length:

I do think we benefit from having a long, slow period of adaptation and exposure to not-yet-extremely-dangerous AI. As long as we aren’t lulled into a false sense of security, it seems very plausible that insights from studying these systems will help improve our skill at alignment. I think ideally this would mean going extremely slowly and carefully, but various readers may be less cautious/paranoid/afraid than me, and think that it’s worth some risk of killing every child on Earth (and everyone else) to get progress faster or to avoid the costs of getting everyone to go slow. But regardless of how fast things proceed, I think it’s clearly good to study what we have access to (as long as that studying doesn’t also make things faster or make people falsely confident).

But none of this involves having “more than one shot at the goal” and it definitely doesn’t imply the goal will be easy to hit. It means we’ll have some opportunity to learn from failures on related goals that are likely easier.

The “It” in “If Anyone Builds It” is a misaligned superintelligence capable of taking over the world. If you miss the goal and accidentally build “it” instead of an aligned superintelligence, it will take over the world. If you build a weaker AGI that tries to take over the world and fails, that might give you some useful information, but it does not mean that you now have real experience working with AIs that are strong enough to take over the world.

Resources on quantifiably forecasting future progress or reviewing past progress in AI safety?
Answer by Davidmanheim · Sep 14, 2025 · 30

We worked on parts of this several years ago, and I will agree it's deeply uncertain and difficult to quantify. I'm also unsure that this direction will be fruitful for an individual getting started.

Here are two very different relevant projects I was involved with:

https://arxiv.org/abs/2206.09360

https://arxiv.org/abs/2008.01848

Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements)
Davidmanheim · 1mo · Ω340

One possible important way to address parts of this is by moving from only thinking about model audits and model cards towards organizational audits. That is, the organization should have policies about when to test, and about what test results to disclose and when; an organizational safety audit would decide whether those policies are appropriate, sufficiently transparent, and sufficient given the risks - and also check to ensure the policies are being followed.

Note that Anthropic has done something like this, albeit weaker, by undergoing an ISO management system audit, as they described here. Unfortunately, this specific audit type doesn't cover what we care about most, but it's the right class of solution. (It also doesn't require a high level of transparency about what is audited and what is found - but Anthropic evidently does that anyway.)

Plan E for AI Doom
Davidmanheim · 1mo · 20

I asked an LLM to do the math explicitly, and I think it shows that it's pretty infeasible - you need a large portion of total global power output, and even then you need to know who's receiving the message; you can't do a broad transmission.
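For concreteness, this is roughly the kind of back-of-envelope link-budget estimate I mean. Every parameter below (distance, data rate, antenna sizes, receiver temperature) is an assumption I picked for illustration, and the Shannon-limit bound is the most optimistic case possible, so treat it as a sketch of the shape of the problem rather than a real feasibility analysis:

```python
import math

# Back-of-envelope interstellar link budget at the Shannon limit (E_b/N_0 = ln 2).
# All parameters are illustrative assumptions, not values from the post.
k_B = 1.380649e-23        # Boltzmann constant, J/K
LIGHT_YEAR = 9.4607e15    # meters

d = 1_000 * LIGHT_YEAR    # assumed distance to the receiver
rate_bps = 1e6            # assumed data rate, bits per second
T_sys = 20.0              # assumed receiver system noise temperature, K
rx_area = 5.5e3           # assumed receiver collecting area, m^2 (~100 m dish)
tx_gain_dish = 2.6e7      # assumed transmit gain of a ~70 m dish at X band (~74 dBi)

# Minimum received power for this data rate, even with ideal coding.
p_rx_min = rate_bps * k_B * T_sys * math.log(2)

def required_tx_power(tx_gain: float) -> float:
    """Transmit power needed so the receiver collects p_rx_min watts at distance d."""
    return p_rx_min * 4 * math.pi * d**2 / (tx_gain * rx_area)

print(f"Directed beam at a known receiver: ~{required_tx_power(tx_gain_dish):.1e} W")
print(f"Isotropic broadcast (receiver unknown): ~{required_tx_power(1.0):.1e} W")
print("For scale, total human power consumption is roughly 2e13 W.")
```

With these made-up numbers, even the directed case needs a meaningful fraction of global power output for a modest data rate, and the isotropic broadcast case is thousands of times total human power - which is why I don't think a broad transmission works.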
 
I also think this plan preserves almost nothing I care about. At the same time, at least it's realistic about our current trajectory, so I think planning along these lines and making the case for doing it clearly and publicly is on net good, even if I'm skeptical of the specific details you suggested, and don't think it's particularly great even if we succeed.

Why Latter-day Saints Have Strong Communities
Davidmanheim · 1mo · 20

I mostly agree, but the word gentile is a medieval translation of "goyim," so it's a bit weird to differentiate between them. (And the idea that non-Jews are ritually impure is both confused and a frequent antisemitic trope. In fact, idol worshippers were deemed impure, based on verses in the Bible, specifically Leviticus 18:24, and there were much later rabbinic decrees to discourage intermingling with even non-idol worshippers.) 

Also, both Judaism and LDS (with the latter obviously more proselytizing) have a route for such excluded individuals to join, so calling this "a state of being which outsiders cannot attain" is also a bit strange.

A Conservative Vision For AI Alignment
Davidmanheim · 1mo · 20

Your dismissive view of "conservatism" as a general movement is noted, and not even unreasonable - but it seems basically irrelevant to what we were discussing in the post, both in terms of what we called conservatism and in terms of the way you tied it to "hostile to AGI." And the latter seems deeply confused, or at least needs much more background explanation.

A Conservative Vision For AI Alignment
Davidmanheim · 1mo · 20

I'd be more interested in tools that detect downvotes that occur before people started reading, on the basis of the title - because I'd give even odds that more than half of the downvotes on this post were cast within 1 minute of opening it, on the basis of the title or a reaction to the first paragraph - not due to the discussion of CEV.
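A minimal sketch of the kind of tool I have in mind - the field names and the 60-second threshold are made up, and as far as I know the site doesn't expose vote-timing data like this, so it's purely illustrative:

```python
from dataclasses import dataclass

@dataclass
class VoteEvent:
    user_id: str
    opened_post_at: float  # unix timestamp when the voter opened the post
    voted_at: float        # unix timestamp when the vote was cast
    vote: int              # -1 for a downvote, +1 for an upvote

# Arbitrary threshold: a downvote within a minute of opening a long post is
# probably a reaction to the title or first paragraph, not the argument.
SNAP_JUDGMENT_SECONDS = 60.0

def snap_downvotes(events: list[VoteEvent]) -> list[VoteEvent]:
    """Downvotes cast less than SNAP_JUDGMENT_SECONDS after the post was opened."""
    return [
        e for e in events
        if e.vote < 0 and (e.voted_at - e.opened_post_at) < SNAP_JUDGMENT_SECONDS
    ]

def snap_fraction(events: list[VoteEvent]) -> float:
    """Fraction of all downvotes that look like snap judgments."""
    downvotes = [e for e in events if e.vote < 0]
    return len(snap_downvotes(events)) / len(downvotes) if downvotes else 0.0
```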

Posts
Sorted by New
23 · A Conservative Vision For AI Alignment · 1mo · 33
22 · Semiotic Grounding as a Precondition for Safe and Cooperative AI · 2mo · 0
41 · No, We're Not Getting Meaningful Oversight of AI · 3mo · 4
20 · The Fragility of Naive Dynamism · 4mo · 1
15 · Therapist in the Weights: Risks of Hyper-Introspection in Future AI Systems · 5mo · 1
9 · Grounded Ghosts in the Machine - Friston Blankets, Mirror Neurons, and the Quest for Cooperative AI · 6mo · 0
7 · Davidmanheim's Shortform · Ω · 8mo · Ω · 18
11 · Exploring Cooperation: The Path to Utopia · 9mo · 0
31 · Moderately Skeptical of "Risks of Mirror Biology" · 9mo · 3
17 · Most Minds are Irrational · Ω · 10mo · Ω · 4
Wikitag Contributions
Garden Onboarding · 4 years ago · (+28)