"It's a badly formulated question, likely to lead to confusion." Why? That's precisely what I'm denying.
"So, can you specify what this cluster is? Can you list the criteria by which a behaviour would be included in or excluded from this cluster? If you do this, you have defined blackmail."
That's precisely what I (Stuart, really) am trying to do! I said so, you even quoted me saying so, and as I interpret him, Stuart said so too in the OP. I don't care about the word blackmail except as a means to an end; I'm trying to come up with criter...
Dunno what Username was thinking, but here's the answer I had in mind: "Why is it obvious? Because the Problem of Induction has not yet been solved."
You make it sound like those two things are mutually exclusive. They aren't. We are trying to define words so that we can understand and manipulate behavior.
"I don't know what blackmail is, but I want to make sure an AI doesn't do it." Yes, exactly, as long as you interpret it in the way I explained it above.* What's wrong with that? Isn't that exactly what the AI safety project is, in general? "I don't know what bad behaviors are, but I want to make sure the AI doesn't do them."
*"In other words there are a cluster of behaviors th...
"You want to understand and prevent some behaviors (in which case, start by tabooing culturally-dense words like "blackmail")"
In a sense, that's exactly what Stuart was doing all along. The whole point of this post was to come up with a rigorous definition of blackmail, i.e. to find a way to say what we wanted to say without using the word.
As I understand it, the idea is that we want to design an AI that is difficult or impossible to blackmail, but which makes a good trading partner.
In other words there are a cluster of behaviors that we do NOT want our AI to have, which seem blackmailish to us, and a cluster of behaviors that we DO want it to have, which seem tradeish to us. So we are now trying to draw a line in conceptual space between them so that we can figure out how to program an AI appropriately.
Oh, OK, I get it now: "But clearly re-arranging terms doesn't change the expected utility, since that's just the sum of all terms." I guess that's what I have to deny. Or rather, I accept that (I agree that EU = infinity for both A and B), but I think that since A is better than B in every possible world, it's better than B simpliciter.
The reshuffling example you give is an example where A is not better than B in every possible world. That's the sort of example that I claim is not realistic, i.e. not the actual situation we find ourselves in. Why? W...
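To make the kind of case I have in mind concrete, here is a minimal worked example with toy numbers (my own illustration, not anything from the original example): suppose the hypotheses $H_1, H_2, H_3, \dots$ have probabilities $p_i = 2^{-i}$, and the payoffs are
$$U_A(H_i) = 2^i + 1, \qquad U_B(H_i) = 2^i.$$
Then
$$EU(A) = \sum_{i=1}^{\infty} 2^{-i}\,(2^i + 1) = \sum_{i=1}^{\infty} \left(1 + 2^{-i}\right) = \infty, \qquad EU(B) = \sum_{i=1}^{\infty} 2^{-i}\cdot 2^i = \sum_{i=1}^{\infty} 1 = \infty,$$
so both expected utilities diverge, yet $U_A(H_i) > U_B(H_i)$ in every world, and the termwise difference sums to $\sum_{i=1}^{\infty} 2^{-i} = 1$, which is finite and positive. That is the sense in which I want to say A is better than B simpliciter even though neither has a well-defined finite EU.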
Again, thanks for this.
"The problem with your solution is that it's not complete in the formal sense: you can only say some things are better than other things if they strictly dominate them, but if neither strictly dominates the other you can't say anything."
As I said earlier, my solution is an argument that in every case there will be an action that strictly dominates all the others. (Or, weaker: that within the set of all hypotheses of probability less than some finite N, one action will strictly dominate all the others, and that this action w...
This was helpful, thanks!
As I understand it, you are proposing modifying the example so that on H1 through HN, choosing A gives you less utility than choosing B, but thereafter choosing A is better, because there is some cost you pay for A which is the same in each world.
It seems like the math tells us that any price would be worth it, that we should give up an arbitrarily large amount of utility to choose A over B. I agree that this seems like the wrong answer. So I don't think whatever I'm proposing solves this problem. (I'll sketch the structure with toy numbers below.)
But that's a different problem than the...
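To see the structure with toy numbers (again my own illustration): keep the probabilities $p_i = 2^{-i}$ from my earlier sketch and let $U_B(H_i) = 2^i$ everywhere. Suppose choosing A costs a fixed amount $c$ relative to B on $H_1$ through $H_N$, but beats B by $2^i$ on every $H_i$ with $i > N$. Then
$$\sum_{i=1}^{N} p_i\,\bigl(U_B(H_i) - U_A(H_i)\bigr) = c\sum_{i=1}^{N} 2^{-i} < c, \qquad \sum_{i=N+1}^{\infty} p_i\,\bigl(U_A(H_i) - U_B(H_i)\bigr) = \sum_{i=N+1}^{\infty} 1 = \infty,$$
so the partial-sum comparison eventually favors A no matter how large the finite cost $c$ is. That is the "any price would be worth it" conclusion that seems wrong.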
It's arbitrary, but that's OK in this context. If I can establish that this works when the ratio is 1 in a billion, or lower, then that's something, even if it doesn't work when the ratio is 1 in 10.
Especially since the whole point is to figure out what happens when all these numbers go to extremes--when the scenarios are extremely improbable, when the payoffs are extremely huge, etc. The cases where the probabilities are 1 in 10 (or arguably even 1 in a billion) are irrelevant.
Update: The conclusion of that article is that the expected utilities don't converge for any utility function that is bounded below by a computable, unbounded utility function. That might not actually be in conflict with the idea I'm grasping at here.
The idea I'm trying to get at here is that maybe even if EU doesn't converge in the sense of assigning a definite finite value to each action, it nevertheless ranks each action as better or worse than the others, by a certain proportion.
Toy model:
The only hypotheses you consider are H1, H2, H3, ... etc. ...
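To sketch what "ranks each action by a certain proportion" might look like (this is just my own gesture at a formalization, not the full toy model): write $EU_n(A) = \sum_{i=1}^{n} p_i\,U_A(H_i)$ for the partial expected utility over the first $n$ hypotheses. Even when $EU_n(A)$ and $EU_n(B)$ both go to infinity, the limits
$$\lim_{n\to\infty} \frac{EU_n(A)}{EU_n(B)} \qquad \text{and} \qquad \lim_{n\to\infty} \bigl(EU_n(A) - EU_n(B)\bigr)$$
may still exist. With the first set of toy numbers I used above ($p_i = 2^{-i}$, $U_A = 2^i + 1$, $U_B = 2^i$), the ratio goes to $1$ while the difference goes to $1 > 0$; with the second set, the difference goes to $+\infty$. Either a limiting ratio above 1 or a positive limiting difference is the kind of thing that could do the ranking even without convergent expected utilities.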
True. So maybe this only works in the long run, once we have more than 30 bits to work with.
Yes, but I don't think that's relevant. Any use of complexity depends on the language you specify it in. If you object to what I've said here on those grounds, you have to throw out Solomonoff, Kolmogorov, etc.
Yes, I've read it, but not at the level of detail where I can engage with it. Since it is costly for me to learn the math necessary to figure this out for good, I figured I'd put the basic idea up for discussion first just in case there was something obvious I overlooked.
Edit: OK, now I think I understand it well enough to say how it interacts with what I've been thinking. See my other comment.
I disagree with your characterization of 0. You say that it is incompatible with physicalism, but that seems false. Indeed it seems to be a very mainstream physicalist view to say "I am a physical object--my brain. So a copy of me would have the same experiences, but it would not be me."
How do I do step II? I can't seem to find the relevant debates. I found one debate with the same title as the minimum wage one I argued about, but I don't see my argument appearing there.
Yep. "The melancholy of haruhi suzumiya" can be thought of as an example of something in the same reference class.
This is an interesting idea! Some thoughts:
Doesn't acausal trade (like all trade) depend on enforcement mechanisms? I can see how two AIs might engage in counterfactual trade, since they can simulate each other and see that they self-modify to uphold the agreement, but I don't think a human would be able to do it.
Also, I'd like to hear more about motivations for engaging in counterfactual trade. I get the multiverse one, though I think that's a straightforward case of acausal trade rather than a case of counterfactual trade, since you would be trading with a really existing entity in another universe. But can you explain the second motivation more?
The point you raise is by far the strongest argument I know of against the idea.
However, it is a moral objection rather than a decision-theory objection. It sounds like you agree with me on the decision theory component of the idea: that if we were anthropically selfish, it would be rational for us to commit to making ancestor-simulations with afterlives. That's an interesting result in itself, isn't it? Let's go tell Ayn Rand.
When it comes to the morality of the idea, I might end up agreeing with you. We'll see. I think there are several minor considerations in favor of the proposal, and then this one massive consideration against it. Perhaps I'll make a post on it soon.
This is a formal version of a real-life problem I've been thinking about lately.
Should we commit to creating ancestor-simulations in the future, where those ancestor-simulations will be granted a pleasant afterlife upon what appears to their neighbors to be death? If we do, then arguably we increase the likelihood that we ourselves have a pleasant afterlife to look forward to.
Thanks for the response. Yes, it depends on how much interaction I have with human beings and on the kind of people I interact with. I'm mostly interested in my own case, of course, and I interact with a fair number of fairly diverse, fairly intelligent human beings on a regular basis.
"If you're a social butterfly who regularly talks with some of the smartest people in the world, the AI will probably struggle."
Ah, but would it? I'm not so sure; that's why I made this post.
Yes, if everyone always said what I predicted, things would be obvious, but recall I...
Yes, but I'm not sure there is a difference between an AI directly puppeting them, and an AI designing a chatbot to run as a subroutine to puppet them, at least if the AI is willing to monitor the chatbot and change it as necessary. Do you think there is?
Also, it totally is a fruitful line of thinking. It is better to believe the awful truth than a horrible lie. At least according to my values. Besides, we haven't yet established that the truth would be awful in this case.
I'm surprised that it sounded that way to you. I've amended my original post to clarify.
Yes, this is the sort of consideration I had in mind. I'm glad the discussion is heading in this direction. Do you think the answer to my question hinges on those details though? I doubt it.
Perhaps if I were extraordinarily unsuspicious, chatbots of not much more sophistication than modern-day ones could convince me. But I think it is pretty clear that we will need more sophisticated chatbots to convince most people.
My question is, how much more sophisticated would they need to be? Specifically, would they need to be so much more sophisticated that they wou...
That's exactly what I had in mind, although I did specify that the controller would never simulate anybody besides me to the level required to make them people.
In the picture you just drew, the ideal being is derived from a series of better beings; thus it is (trivially) easier to imagine a better being than to imagine an ideal being.
I see it differently: The ideal being maximizes all good qualities, whereas imperfect beings have differing levels of the various good qualities. Thus to compare a non-ideal being to an ideal being, we only need to recognize how the ideal being does better than the non-ideal being in each good quality. But to compare two non-ideal beings, we need to evaluate trade-offs between their ...
Ok, thanks.
I also don't see any reason to go from "the FAI doesn't care about identity" to "I shouldn't think identity exists."
I don't either, now that I think about it. What motivated me to make this post is that I realized that I had been making that leap, thanks to applying the heuristic. We both agree the heuristic is bad.
Why are we talking about a bad heuristic? Well, my past self would have benefited from reading this post, so perhaps other people would as well. Also, I wanted to explore the space of applications of this heuristic, to see if I had been unconsciously applying it in other cases without realizing it. Talking with you has helped me with that.
Hmm, okay. I'd be interested to hear your thoughts on the particular cases then. Are there any examples that you would endorse?
The fact that your post was upvoted so much makes me take it seriously; I want to understand it better. Currently I see your post as merely a general skeptical worry. Sure, maybe we should never be very confident in our FAI-predictions, but to the extent that we are confident, we can allow that confidence to influence our other beliefs and decisions; and we should be confident in some things to some extent at least (the alternative, complete and paralyzing skepticism, is absurd). Could you explain more what you meant, or explain what you think my mistake is in the above reasoning?
"It is easier to determine whether you are doing 'better' than your current self than it is to determine how well you line up with a perceived ideal being."
Really? That doesn't seem obvious to me. Could you justify that claim?
Thanks!
But if the UFAI can't parley, that takes out much of the fun, and much of the realism too.
Also, if Hard Mode has no FAI tech at all, then no one will research AI on Hard Mode and it will just devolve into a normal strategy game.
Edit: You know, this proposal could probably be easily implemented as a mod for an existing RTS or 4X game. For example, imagine a Civilization mod that added the "AI" tech that allowed you to build a "Boxed AI" structure in your cities. This quadruples the science and espionage production of your city, at ...
There are at least two distinct senses in which consciousness can be binary. The first sense is the kind you are probably thinking about: the range between e.g. insects, dogs, and humans, or maybe between early and late-stage Alzheimer's.
The second sense is the kind that your interlocutors are (I surmise) thinking about. Imagine this: A being that is functionally exactly like you, and that is experiencing exactly what you are experiencing, except that it is experiencing everything "only half as much." It still behaves the same way as you, and it s...
Is the claustrum located in the pineal gland? ;)
"At the moment, in order for a Creator/FAI team to win (assuming you're sticking with Diplomacy mechanics) they first have to collect 18 supply centres between them and then have the AI transfer all its control back to the human; I don't think even the friendliest of AIs would willingly rebox itself like that."
This is exactly what I had in mind. :) It should be harder for FAI to win than for UFAI to win, since FAI are more constrained. I think it is quite plausible that one of the safety measures people would try to implement in a FAI is "Wha...
This would be the ideal. Like I said though, I don't think I'll be able to make it anytime soon, or (honestly) anytime ever.
But yeah, I'm trying to design it to be simple enough to play in-browser or as an app, perhaps even as a Facebook game or something. It doesn't need to have good graphics or a detailed physics simulator, for example: it is essentially a board game in a computer, like Diplomacy or Risk. (Though it is more complicated than any board game could be.)
I think that the game, as currently designed, would be an excellent source of fictional evidence for the notions of AI risk and AI arms races. Those notions are pretty important. :)
(utility monsters are awful, for some reason, even though by assumption they generate huge amounts of utility, oh dear!)
Utility monsters are awful, possibly for no reason whatsoever. That's OK. Value is complex. Some things are just bad, not because they entail any bad thing but just because they themselves are bad.
That's not something the average person will think upon hearing the term, especially since "AGI" tends to connote something very intelligent. I don't think it is a strong reason not to use it.
It is nice to see people thinking about this stuff. Keep it up, and keep us posted!
Have you read the philosopher Derek Parfit? He is famous for arguing for pretty much exactly what you propose here, I think.
Doubt: Doesn't this imply that anthropic probabilities depend on how big a boundary the mind draws around stuff it considers "I"?
Self: Yes.
Doubt: This seems to render probability useless.
I agree with Doubt. If I can make it 100% probable that I'll get superpowers tomorrow merely by convincing myself that only superpowered future-versions of me cou...
No, the analogy I had in mind was this:
What People Saw: Acupuncture* being correlated with health, and [building things according to theories developed using the scientific method] being correlated with [having things that work very well]
What People Thought Happened: Acupuncture causing health and [building things according to theories developed using the scientific method] causing [having things that work very well]
What Actually Happened: Placebo effect and Placebo effect (in the former case, involving whatever mechanisms we think cause the placebo effect...
I didn't mean to imply that the placebo effect is a complete mystery. As you say, perhaps it is pretty well understood. But that doesn't touch my overall point, which is that before modern medicine (and modern explanations for the placebo effect) people would have had plenty of evidence that e.g. faith healing worked, and that therefore spirits/gods/etc. existed.
Similarly, modern theories about how to discover the habits of God in governing Creation (the Laws of Nature) are pretty sound as well. Or so theists say.
A better example than Amiens Cathedral would be the Placebo Effect. For most of human history, people with access to lots of data (but no notion of the Placebo Effect) had every reason to believe that e.g. witch doctors, faith healing, etc. really worked.
Warning: Rampant speculation about a theory of low probability: Consider the corresponding theory about science. Maybe there is a Placebo Effect going ...
If we are in a time loop, we won't be trying to escape it, but rather to exploit it.
For example: Suppose I find out that the entire local universe-bubble is in a time loop, and there is a way to build a spaceship that will survive the big crunch in time for the next big bang. Or something like that.
Well, I go to my backyard and start digging, and sure enough I find a spaceship complete with cryo-chambers. I get in, wait till the end of the universe, and then after the big bang starts again I get out and seed the Earth with life. I go on to create a wonderful c...
Thanks for the info. Hmm. What do you mean by "There is no entering or exiting the loop"? Could the loop be big enough to contain us already?
I'm not concerned about traveling backwards in time to change the past; I just want to travel backwards in time. In fact, I hope that I wouldn't be able to change the past. Consistency of that sort can be massively exploited.
That time-traveling universe is interesting. Physics question: Is it at all possible, never mind how likely, that our own universe contains closed timelike curves? What about closed timelike curves that we can feasibly exploit?
Something about the name-dropping and phrasing in the "super-committee" line is off-putting. I'm not sure how to fix it, though.
Agreed. Maybe it is because it feels like you are talking down to us with the name-dropping? Perhaps this should be tested with people who are unfamiliar with LW and AI-related ideas, to see if they have the same reaction.
Yep, it means the same thing, or close enough. Of course there are measurement problems, but the intent behind the pay is for it to reward rational thinking in the usual sense.
Yeah. I should think about how to get around this, and glean useful information from their expertise.
The original AI will have a head start over all the other AIs, and it will probably be controlled by a powerful organization. So if its controllers give it real power soon, they will be able to give it enough power, quickly enough, that it can stop all the other AIs before they get too strong. If they do not give it real power soon, then shortly afterwards there will be a war between the various new AIs being built around the world with different utility functions.
The original AI can argue convincingly that this war will be a worse outcome than letting it take...
"I would expect that to lead to the creation of AIs with a similar codebase but more or less tweaked utility functions."
That's the point.
In the space of possible futures, it is much better than e.g. tiling the universe with orgasmium. So much better, in fact, that in the grand scheme of things it counts as OK.
Sweet. I too will write something not about coronavirus.