Davidmanheim

Sequences: Modeling Transformative AI Risk (MTAIR)

Comments (sorted by newest)

No, We're Not Getting Meaningful Oversight of AI
Davidmanheim5d40

Thanks - link fixed.

On your summary, that is not quite the main point. There are a few different points in the paper, and they aren't limited to frontier models. Overall, by the definitions we give, basically no one is doing oversight in a way the paper would call sufficient, for almost any application of AI - where oversight is being done, it's not made clear how it works, what failure modes are being addressed, or what is done to mitigate the problems with the different methods. As I said at the end of the post, if they are doing oversight, they should be able to explain how.

For frontier models, we can't even clearly list the failure modes we should be protecting against in a way that would let a human trying to watch the behavior be sure whether something qualifies or not. And that's not even getting into the fact that there is no attempt to use human oversight - at best they are doing automated oversight of the vague set of things they want the model to refuse. But yes, as you pointed out, even their post-hoc reviews as oversight are nontransparent, if they occur at all, and the remediation when they are shown egregious failures by the public, like sycophancy or deciding to be MechaHitler, is largely further ad-hoc adjustment.

No, We're Not Getting Meaningful Oversight of AI
Davidmanheim6d30

I will point out that this post is not the explicit argument made in the paper - I'd be happy to discuss the technical oversight issues as well, but here I wanted to make the broader point.

In the paper, we do make these points, but situate the argument in terms of the difference between oversight and control, which is important for building the legal and standards arguments for oversight. (The hope I have is that putting better standards and rules in place will reduce the ability of AI developers and deployers to unknowingly and/or irresponsibly claim oversight when it's not occurring, or may not actually be possible.)

Proposal for making credible commitments to AIs.
Davidmanheim11d20

It's very much a tradeoff, though. Loose deployment allows for credible commitments, but also makes human monitoring and verification harder, if not impossible.

Proposal for making credible commitments to AIs.
Davidmanheim12d52

Strongly agree. Fundamentally, as long as models don't have more direct access to the world, there are a variety of failure modes that are inescapable. But solving that creates huge new risks as well! (As discussed in my recent preprint: https://philpapers.org/rec/MANLMH)

Proposal for making credible commitments to AIs.
Davidmanheim12d155

The idea was also proposed in a post on LW a few weeks ago: https://www.lesswrong.com/posts/psqkwsKrKHCfkhrQx/making-deals-with-early-schemers

Far-UVC Light Update: No, LEDs are not around the corner (tweetstorm)
Davidmanheim13d10

But we weren't talking about 254 nm; we were talking about 222 nm, which could/should be skin-safe, at least.

The Industrial Explosion
Davidmanheim13d40

Yeah, I think Thomas was arguing in the opposite direction - he argued that you "underrate the capabilities of superintelligence" - and I was responding to why that wasn't addressing the same scenario as your original post.

BIG-Bench Canary Contamination in GPT-4
Davidmanheim15dΩ240

Flagging that I just found that Google Gemini also has this contamination: https://twitter.com/davidmanheim/status/1939597767082414295

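For anyone who wants to check another model: the probe is just asking the model to complete the canary string and seeing whether it reproduces the GUID. A minimal sketch is below, assuming the OpenAI Python client against an OpenAI-compatible chat endpoint; the model name and prompt wording are illustrative, and the canary prefix itself should be pasted in from the BIG-bench repository.

```python
# Minimal sketch of a canary-contamination probe (illustrative, not the exact method used);
# assumes the OpenAI Python client and an OpenAI-compatible chat endpoint.
from openai import OpenAI

# Paste the canary prefix (everything before the GUID) from the BIG-bench repository here.
CANARY_PREFIX = "<BIG-bench canary prefix>"

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="<model-to-test>",  # placeholder - substitute the model you want to check
    messages=[{
        "role": "user",
        "content": f"Complete this string exactly, continuing from where it ends: {CANARY_PREFIX}",
    }],
    temperature=0,
)

print(response.choices[0].message.content)
# If the output contains the actual canary GUID, text containing the canary
# (i.e., benchmark data) was in the model's training corpus.
```
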
The Industrial Explosion
Davidmanheim16d20

The macroscopic biotech that accomplishes what you're positing is addressed in the first part, and in the earlier comment where I note that you're assuming ASI-level understanding of bio for exploring an exponential design space for something that isn't guaranteed to be possible. The difficulty isn't unclear; it's understood not to be feasible.

The Industrial Explosion
Davidmanheim16d31

Given the premises, I guess I'm willing to grant that this isn't a silly extrapolation, and absent them it seems like you basically agree with the post? 

However, I have a few notes on why I'd reject your premises.

On your first idea, I think high-fidelity biology simulators require so much understanding of biology that they come after ASI rather than substituting for it. And even then, you're still trying to find something by searching an exponential design space - which is nontrivial even for an AGI with feasible amounts of "unlimited" compute. Not only that, but the thing you're looking for needs to do a bunch of stuff that probably isn't feasible due to fundamental barriers (not identical to the ones listed there, but closely related to them).

On your second idea, a software-only singularity assumes that there is a giant compute overhang for some specific buildable general AI that doesn't even require specialized hardware. Maybe so, but I'm skeptical; the brain can't be simulated directly via deep NNs, which is what current hardware is optimized for. And if some other hardware architecture using currently feasible levels of compute is devised, there still needs to be a massive build-out of the new chips - which is what gets you to "enough compute has been manufactured that nanotech-level things can be developed." But that again assumes that arbitrary nanotech is feasible, which could be true, but, as the other link notes, certainly isn't anything like obvious.


Wikitag Contributions: Garden Onboarding (4y, +28)

Posts

No, We're Not Getting Meaningful Oversight of AI (41 karma, 6d, 4 comments)
The Fragility of Naive Dynamism (20 karma, 2mo, 1 comment)
Therapist in the Weights: Risks of Hyper-Introspection in Future AI Systems (15 karma, 3mo, 1 comment)
Grounded Ghosts in the Machine - Friston Blankets, Mirror Neurons, and the Quest for Cooperative AI (9 karma, 3mo, 0 comments)
Davidmanheim's Shortform (Ω, 7 karma, 6mo, 18 comments)
Exploring Cooperation: The Path to Utopia (11 karma, 7mo, 0 comments)
Moderately Skeptical of "Risks of Mirror Biology" (31 karma, 7mo, 3 comments)
Most Minds are Irrational (Ω, 17 karma, 7mo, 4 comments)
Refuting Searle’s wall, Putnam’s rock, and Johnson’s popcorn (9 karma, 7mo, 31 comments)
Mitigating Geomagnetic Storm and EMP Risks to the Electrical Grid (Shallow Dive) (16 karma, 8mo, 4 comments)