jimrandomh comments on New(ish) AI control ideas - Less Wrong
I'm a fan of "defense in depth" strategies. Everything I've seen or managed to come up with so far has some valid objection that means it might not be sufficient or workable for some subset of AGIs and circumstances. But if you come up with enough tricks that each have some chance of working, and these tricks are compatible and their failures uncorrelated, then you've made progress.
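To make the arithmetic behind "uncorrelated" concrete, here is a minimal sketch. The function name and the numbers are made up for illustration; nothing here estimates the real failure rate of any actual safety measure. It assumes fully independent failures, which is the best case for layering.

```python
from math import prod

def all_layers_fail(failure_probs):
    """Chance that every layer fails at once, ASSUMING independent failures."""
    return prod(failure_probs)

# Hypothetical numbers: five tricks, each with a 50% chance of failing.
independent = all_layers_fail([0.5] * 5)  # 0.5 ** 5

# If the failures were perfectly correlated instead, all five tricks would
# fail together, so stacking them would be no better than using one.
fully_correlated = 0.5

print(independent, fully_correlated)
```

Under these assumed numbers, five mediocre but independent tricks leave about a 3% chance that all of them fail, while five perfectly correlated ones leave the original 50%. Real tricks would presumably fall somewhere in between, which is why the "uncorrelated" condition matters so much.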
I'd be very worried if an AGI were launched without a well-specified utility function, transparent operation, and the best feasible attempt at a correctness proof. But even given all of these things, I'd still want to see lots of safety-oriented tricks layered on top.