All of Yasuo's Comments + Replies

So, when people pick chocolate, that shows it's what they truly desire, and when they pick vanilla, it just means they're confused and really like chocolate but don't know it.

Acting based on the feelings one will experience is something that already happens, so optimizing for it is sensible.

I can't really pick apart your logic here, because there isn't any. This is like saying "buying cheese is something that already happens, so optimizing for it is sensible"

0[anonymous]
Not really. Let me try to clarify what I meant. We already know that rewards and punishments influence our actions. Any utopia would try to satisfy them. Even in a complex optimized universe full of un-wireheaded sentients caring about external referents, people would want to avoid pain, ... and experience lots of excitement, ... . Wireheading just says, that's all humans care about, so there's no need for all these constraints, let's pick the obvious shortcut.

In support of this view, I gave the example of the entertainment industry that optimizes said experiences, but is completely fake (and trying to become more fake) and how many humans react positively to that. They don't complain that there's something missing, but rather enjoy those improved experiences more than the existent externally referenced alternatives.

Also, take the reversed experience machine, in which the majority of students asked would stay plugged in. If they had complex preferences as typically cited against wireheading, wouldn't they have immediately rejected it? An expected paperclip maximizer would have left the machine right away. It can't build any paperclips there, so the machine has no value to it. But the reversed experience machine seems to have plenty of value for humans.

This is essentially an outside view argument against complex preferences. What's the evidence that they actually exist? That people care about reality, about referents, all that? When presented with options that don't fulfill any of this, lots of people still seem to choose them.

I like Marginal Revolution, if only because the comments section will usually yell at them when they post something stupid.

0nazgulnarsil
more generally just the blogs of economists.

Overall, it sounds to me like people are confusing their feelings about (predicted) states of the world with caring about states directly.

But aren't you just setting up a system that values states of the world based on the feelings they contain? How does that make any more sense?

You're arguing as though neurological reward maximization is the obvious goal to fall back to if other goals aren't specified coherently. But people have filled in that blank with all sorts of things. "Nothing matters, so let's do X" goes in all sorts of zany directions.

2[anonymous]
I'm not. My thought process isn't "there aren't any real values, so let's go with rewards"; it's not intended as a hack to fix value nihilism. Rewards already do matter. It describes people's behavior well (see PCT) and makes introspective sense. I can actually feel projected and real rewards come up and how decisions arise based on that. I don't know how "I value that there are many sentients" or any other external referent could come up. It would still be judged on the emotional reaction it causes (but not always in a fully conscious manner).

I think I can imagine agents that actually care about external referents and that wouldn't wirehead. I just don't think humans are such agents and I don't see evidence to the contrary. For example, many humans have no problem with "fake" experiences, like "railroaded, specifically crafted puzzles to stimulate learning" (e.g. Portal 2), "insights that feel profound, but don't mean anything" (e.g. entheogens) and so on. Pretty much the whole entertainment industry could be called wireheading lite.

Acting based on the feelings one will experience is something that already happens, so optimizing for it is sensible. (Not-wireheaded utopias would also optimize them after all, just not only them.)

A major problem I see with acting based on propositions about the world outside one's mind is that it would assign different value to states that one can't experimentally distinguish (successful mindless wallpaper vs. actual sentients, any decision after being memory-wiped, etc.). I can always tell if I'm wireheaded, however. I'd invoke Occam's Razor here and ignore any proposal that generates no anticipated experiences.

2:30 is a good time to go to the dentist.

1Jonathan_Graehl
The dentist is not yet completely exhausted, and you've eaten lunch. Traffic isn't too bad yet, either. A shining moment.

I would. I'd want to do some shorter test runs first though, to get used to the idea, and I'd want to be sure I was in a good mood for the main reset point.

It would probably be good to find a candidate who was enlightened in the Buddhist sense, not only because they'd be generally calmer and more stable, but specifically because enlightenment involves confronting the incoherent naïve concept of self and understanding the nature of impermanence. From the enlightened perspective, the peculiar topology of the resetting subjective experience would not be a source of anxiety.

1[anonymous]
...I don't get it. I looked it up, and still don't get it. Explain please? EDIT: I'm a massive idiot for not getting it. Dammit. I'm only leaving this here because it would be dishonest to delete it. This one goes in the "for when I'm feeling arrogant" folder.

Q: Is it important to figure out how to make AI provably friendly to us and our values (non-dangerous), before attempting to solve artificial general intelligence?

Stan Franklin: Proofs occur only in mathematics.

This seems like a good point, and something that's been kind of bugging me for a while. It seems like "proving" an AI design will be friendly is like proving a system of government won't lead to the economy going bad. I don't understand how it's supposed to be possible.

I can understand how you can prove a hello world program will print ... (read more)

1XiXiDu
That doesn't sound impossible. Consider that in the case of a seed AI, the "government" only has to deal with one perfectly rational, game-theoretic textbook agent. The only reason economists fail to predict how certain policies will affect the economy is that their models often have to deal with a lot of unknown or unpredictable factors. In the case of an AI, the policy is applied to the model itself, which is a well-defined mathematical entity.
9jimrandomh
Proving "friendliness" may well be impossible to define, but there are narrower desirable properties that can be proven. You could prove that it optimizes correctly on special cases that are simpler than the world as a whole; you can prove that it doesn't have certain classes of security holes; you can prove that it's resilient against single-bit errors. With a more detailed understanding of metaethics, we might prove that it aggregates values in a way that's stable in spite of outliers, and that its debug output about the values it's discovered is accurate. Basically, we should prove as much as we can, even if there are some parts that aren't amenable to formal proof.
3Kaj_Sotala
I've been under the impression that "Friendliness proofs" aren't about proving Friendliness as such. Rather, they're proofs that whatever is set as the AI's goal function will always be preserved by the AI as its goal function, no matter how much self-improvement it goes through.
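As a toy illustration of jimrandomh's point that narrow properties are provable even when "Friendliness" as a whole is not, here is a minimal Python sketch (my own example, not anything from an actual AI design): it exhaustively verifies resilience against single-bit errors for a triple-redundancy scheme over 8-bit values. The scheme and the function names are illustrative assumptions.

```python
# A toy analogue of proving a narrow property: exhaustively check that a
# majority-vote decoder recovers an 8-bit value stored in triplicate despite
# any single bit flip. Illustrative only; not from any real AI design.

from itertools import product


def encode(value: int) -> list[int]:
    """Store the 8-bit value three times (triple modular redundancy)."""
    return [value, value, value]


def decode(copies: list[int]) -> int:
    """Recover each bit by majority vote across the three copies."""
    result = 0
    for bit in range(8):
        votes = sum((c >> bit) & 1 for c in copies)
        if votes >= 2:
            result |= 1 << bit
    return result


def single_bit_resilience_holds() -> bool:
    """Check every 8-bit value against every possible single bit flip."""
    for value in range(256):
        for copy_index, bit in product(range(3), range(8)):
            copies = encode(value)
            copies[copy_index] ^= 1 << bit  # inject exactly one bit error
            if decode(copies) != value:
                return False
    return True


if __name__ == "__main__":
    assert single_bit_resilience_holds()
    print("single-bit resilience verified for all 8-bit values")
```

The point is only that a property with a finite, well-defined scope admits complete checking (or, at scale, machine-checked proof), whereas "is this design Friendly?" does not obviously reduce to such a property.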

Can you give some examples of the problem?

There are no known structures in Conway's Game of Life that are robust. Even eaters, which are used to soak up excess gliders, only work when struck from specific directions.

If you had a Life board that was extremely sparsely populated, it's possible that a clever agent could send out salvos of gliders and other spaceships in all directions, in configurations that would stop incoming projectiles, inform it about the location of debris, and gradually remove that debris so that it would be safe to expand.

At a 50% density, the agent would need to start with ... (read more)

0snarles
Great post. The physics of GoL determine the technology available to an intelligent life form in that universe, and the limits of that technology may not be sufficient to ensure the life form's eternal survival. But if that is the case in the GoL universe, the same might not be true in our universe.
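For readers who want the "physics" referenced above made concrete, here is a minimal sketch of a single Game of Life update step under the standard B3/S23 rules, using a set of live-cell coordinates, plus a check that a glider translates one cell diagonally every four generations. This is only an illustration of the dynamics the comments rely on, not of any particular robustness claim.

```python
# Minimal Conway's Game of Life step over a sparse set of live cells.
from collections import Counter


def step(live: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Return the next generation under the standard B3/S23 rules."""
    # Count how many live neighbours each cell has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Alive next generation: exactly 3 live neighbours (birth),
    # or exactly 2 live neighbours and currently alive (survival).
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


# Sanity check: a glider moves one cell diagonally every 4 generations.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
state = glider
for _ in range(4):
    state = step(state)
assert state == {(x + 1, y + 1) for (x, y) in glider}
```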