Eugene comments on The genie knows, but doesn't care - Less Wrong

54 Post author: RobbBB 06 September 2013 06:42AM

Comment author: CoffeeStain 07 September 2013 09:29:41PM, 12 points

Instead of friendliness, could we not code, solve, or at the very least seed boxedness?

It is clear that any AI strong enough to solve friendliness would already be using that power in unpredictably dangerous ways, in order to provide the computational power to solve it. But is it clear that this amount of computational power could not fit within, say, a one kilometer-cube box outside the campus of MIT?

Boxedness is obviously a hard problem, but it seems to me at least as easy as metaethical friendliness. The ability to modify a wide range of complex environments seems instrumental in an evolution into superintelligence, but it's not obvious that this necessitates the modification of environments outside the box. Being able to globally optimize the universe for intelligence involves fewer (zero) constraints than would exist with a boxedness seed, but the only question is whether this constraint is so restrictive as to preclude superintelligence, and it's not clear to me that it is.

It seems to me that there is value in finding the minimally-restrictive safety-seed in AGI research. If any restriction removes some non-negligible ability to globally optimize for intelligence, the AIs of FAI researchers will be necessarily at a disadvantage to all other AGIs in production. And having more flexible restrictions increases the chance that any given research group will apply the restriction in their own research.

If we believe that there is a large chance that all of our efforts at friendliness will be futile, and that the world will create a dominant UFAI despite our pleas, then we should be adopting a consequentialist attitude toward our FAI efforts. If our goal is to make sure that an imprudent AI research team feels as much intellectual guilt as possible over not listening to our risk-safety pleas, we should be as restrictive as possible. If our goal is to inch down the likelihood that an imprudent AI team creates a dominant UFAI, we might work to place our pleas at the intersection of restrictive, communicable, and simple.

Comment author: Eugene 11 October 2013 07:50:53PM, 0 points

A slightly bigger "large risk" than Pentashagon puts forward is that a provably boxed UFAI could indifferently give us information that results in yet another UFAI, just as unpredictable as itself (statistically speaking, it's going to give us more unhelpful information than helpful, as Robb points out). Keep in mind I'm extrapolating here. At first you'd just be asking for mundane things like better transportation, cures for diseases, etc. If the UFAI's mind is strange enough, and we're lucky enough, then some of these things result in beneficial outcomes, politically motivating humans to continue asking it for things. Eventually we're going to escalate to asking for a better AI, at which point we'll get a crapshoot.

An even bigger risk than that, though, is that if it's especially Unfriendly, it may even do this intentionally, going so far as to pretend it's friendly while bestowing us with data to make an AI even more Unfriendly than itself. So what do we do, box that AI as well, when it could potentially be even more devious than the one that already convinced us to make this one? Is it just boxes, all the way down? (Spoilers: it isn't, because we shouldn't be taking any advice from boxed AIs in the first place.)

The only use of a boxed AI is to verify that, yes, the programming path you went down is the wrong one, and resulted in an AI that was indifferent to our existence (and therefore has no incentive to hide its motives from us). Any positive outcome would be no better than an outcome where the AI was specifically Evil, because if we can't tell the difference in the code prior to turning it on, we certainly wouldn't be able to tell the difference afterward.