Eliezer_Yudkowsky comments on Prices or Bindings? - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
Comments (42)
The problem is that this is phrased as an injunction over positive consequences. Deontology does better when it stays closer to the action level and is framed negatively rather than positively.
Imagine trying to give this injunction to an AI. Then it would have to do anything it thought would prevent the destruction of the world, regardless of other considerations. That doesn't sound like a good idea.
If all I want is money, then I will one-box on Newcomb's Problem. I don't think that's quite the same as being a Kantian, but it does reflect the idea that similar decision algorithms in similar epistemic states will tend to produce similar outputs.
The whole point here is that "personal integrity" doesn't have to be about being a virtuous person. It can be about trying to save the world without any concern for your own virtue. It can be the sort of thing you'd want a pure nonsentient decision agent to do.
Your rationality is the sum of your full abilities, all components, including your wisdom about what you refrain from doing in the presence of what seem like good reasons.
So, I realize this is really old, but it helped trip the threshold for this idea I'm rolling between my palms.
Do we suspect that a proper AI would interpret "avoid destroying the world" as something like
avoid(prevent self from being cause of) destroying(analysis indicates destruction threshold ~= 10% landmass remaining habitable, etc.) the world(interpret as earth, human society...)
(like a modestly intelligent genie)
or do we have reason to suspect that it would parse the phrase to something more like how a human would read it (given that it's speaking English, which it learned from humans)?
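The "modestly intelligent genie" reading above can be sketched as code, using the comment's own toy threshold (~10% of landmass remaining habitable). Everything here is illustrative — the threshold, the function names, and the single-number world model are all stand-ins for the kind of brittle literal interpretation being described, not a real AI design.

```python
# A deliberately literal, "genie-style" reading of "destroying the world",
# using the comment's toy criterion: destruction ~= at most 10% of
# landmass remaining habitable. Purely illustrative.
DESTRUCTION_THRESHOLD = 0.10  # fraction of landmass still habitable

def is_world_destroyed(habitable_fraction: float) -> bool:
    """Literal interpretation: 'destroyed' iff habitable landmass
    has fallen to or below the stated threshold."""
    return habitable_fraction <= DESTRUCTION_THRESHOLD

# The gap with the human reading: a human would call 50% habitability
# a catastrophe, but the literal genie would not count it as 'destruction'.
assert not is_world_destroyed(0.5)
assert is_world_destroyed(0.05)
```

The point of the sketch is the mismatch in the comments: under the literal parse, any outcome above the hard-coded threshold "doesn't count," however bad it looks to a human speaker of the same sentence.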
This idea isn't quite fully formed yet, but I think there might be something to it.