Surely if there is something I will give up pleasure for, which I do not experience as pleasurable, that's strong evidence that it is an example of 1 and not 2?
I'm not even sure what the first steps would be in making an estimate for that.
Correlating recorded disease rates with recorded horses per capita would be a place to start, though of course there are many confounding factors.
This needn't be ironic. If I'm willing to die to give my beneficiary a comfortable living, this might be a viable strategy.
It is not clear to me that talking to a human is simpler than interacting with a copy of itself.
I agree that if talking to a human is simpler, it would probably do that first.
I agree that what it would learn by this process is general game theory, and not specific facts about humans.
It is not clear to me that sufficient game-theoretical knowledge, coupled with the minimal set of information about humans required to have a conversation with one at all, is insufficient to effectively deceive a human.
It is not clear to me that, even if it does "stumble," humans will respond as you describe.
It is not clear to me that a system capable of having a meaningful conversation with a human will necessarily have a stack trace that is subject to the kind of analysis you imply here. It is not even clear to me that the capacity for such a stack trace is likely, depending on what architectures turn out to work best for implementing AI.
But, sure, I could be wrong about all of that. And if I'm wrong, and you're right, then a system like you describe will be reliably incapable of fooling a human observer.
Agreed that an actual concrete plan would be a valuable thing, for the reasons you list among others.
Ah! That makes sense. I know of no way to move it... sorry.
Wearing a beard works, too.
"Everything which Yudkowsky ever said" would also denote a set of ideas, after all.
Albeit an internally inconsistent set, given that Yudkowsky has occasionally changed his mind about things.
There probably exists (hypothetically) some plan such that it wouldn't seem unreasonable to me to declare anyone who doesn't endorse that plan either insufficiently well-informed or insufficiently intelligent.
In fact, there probably exist several such plans, many of which would have results I would subsequently regret, and some of which would not.
I'm not sure how I could ever be sure of such a thing, but it certainly seems implausible to me.