Davidmanheim


Comments


Thanks - link fixed.

On your summary, that is not quite the main point. There are a few different points in the paper, and they aren't limited to frontier models. Overall, based on what we define, basically no one is doing oversight in a way that the paper would call sufficient, for almost any application of AI - if it's being done, it's not made clear how, what failure modes are being addressed, and what is done to mitigate the issues with different methods. As I said at the end of the post, if they are doing oversight, they should be able to explain how.

For frontier models, we can't even clearly list the failure modes we should be protecting against in a way that would let a human trying to watch the behavior be sure if something qualifies or not. And that's not even getting into the fact that there is no attempt to use human oversight - at best they are doing automated oversight of the vague set of things they want the model to refuse. But yes, as you pointed out, even their post-hoc reviews as oversight are nontransparent, if they occur at all, and the remediation when they are shown egregious failures by the public, like sycophancy or deciding to be MechaHitler, is largely further ad-hoc adjustments.

I will point out that this post is not the explicit argument or discussion of the paper - I'd be happy to discuss the technical oversight issues as well, but here I wanted to make the broader point.

In the paper, we do make these points, but situate the argument in terms of the difference between oversight and control, which is important for building the legal and standards arguments for oversight. (The hope I have is that putting better standards and rules in place will reduce the ability of AI developers and deployers to unknowingly and/or irresponsibly claim oversight when it's not occurring, or may not actually be possible.)

It's very much a tradeoff, though. Loose deployment allows for credible commitments, but also makes human monitoring and verification harder, if not impossible.

Strongly agree. Fundamentally, as long as models don't have more direct access to the world, there are a variety of failure modes that are inescapable. But solving that creates huge new risks as well! (As discussed in my recent preprint; https://philpapers.org/rec/MANLMH )

The idea was also proposed in a post on LW a few weeks ago: https://www.lesswrong.com/posts/psqkwsKrKHCfkhrQx/making-deals-with-early-schemers

But we weren't talking about 254, we were talking about 222, so that it could / should be skin-safe, at least.

Yeah, I think Thomas was arguing the opposite direction, and he argued that you "underrate the capabilities of superintelligence," and I was responding to why that wasn't addressing the same scenario as your original post.


Flagging that I just found that Google Gemini also has this contamination: https://twitter.com/davidmanheim/status/1939597767082414295

The macroscopic biotech that accomplishes what you're positing is addressed in the first part, and in the earlier comment where I note that you're assuming ASI-level understanding of bio for exploring an exponential design space for something that isn't guaranteed to be possible. The difficulty isn't unclear, it's understood not to be feasible.

Given the premises, I guess I'm willing to grant that this isn't a silly extrapolation, and absent them it seems like you basically agree with the post? 

However, I have a few notes on why I'd reject your premises.

On your first idea, I think high-fidelity biology simulators require so much understanding of biology that they are subsequent to ASI, rather than a replacement. And even then, you're still trying to find something by searching an exponential design space - which is nontrivial even for AGI with feasible amounts of "unlimited" compute. Not only that, but the thing you're looking for needs to do a bunch of stuff that probably isn't feasible due to fundamental barriers. (Not identical to the ones listed there, but closely related to them.)

On your second idea, a software-only singularity assumes that there is a giant compute overhang for some specific buildable general AI that doesn't even require specialized hardware. Maybe so, but I'm skeptical; the brain can't be simulated directly via Deep NNs, which is what current hardware is optimized for. And if some other hardware architecture using currently feasible levels of compute is devised, there still needs to be a massive build-out of these new chips - which then allows "enough compute has been manufactured that nanotech-level things can be developed." But that means you again assume that arbitrary nanotech is feasible, which could be true, but as the other link notes, certainly isn't anything like obvious.
