"If the real Chantiel is so correlated with you that they will do what you will do, then you should believe you're real so that the real Chantiel will believe they are real, too. This holds even if you aren't real."
By "real", do you mean non-simulated? Are you saying that even if 99% of Chantiels in the universe are in simulations, then I should still believe I'm not in one? I don't know how I could convince myself of being "real" if 99% of Chantiels aren't.
Do you perhaps mean I should act as if I were non-simulated, rather than literally being non-simulated?
Thanks for the response, Gwern.
he is explicit that the minds in the simulation may be only tenuously related to 'real'/historical minds;
Oh, I guess I missed this. Do you know where Bostrom said the "simulations" can be only tenuously related to real minds? I was rereading the paper but didn't see mention of this. I'm just surprised, because normally I don't think zoo-like things would be considered simulations.
...This falls under either #1 or #2, since you don't say what human capabilities are in the zoo or explain how exactly this zoo situation matters to
I've realized I'm somewhat skeptical of the simulation argument.
The simulation argument proposed by Bostrom argued, roughly, that either almost exactly all Earth-like worlds don't reach a posthuman level, almost exactly all such civilizations don't go on to build many simulations, or that we're almost certainly in a simulation.
Now, if we knew that the only two sorts of creatures that experience what we experience are either in simulations or the actual, original, non-simulated Earth, then I can see why the argument would be reasonable. However, I don't kno...
...For robustness, you have a dataset that's drawn from the wrong distribution, and you need to act in a way that you would've acted if it was drawn from the correct distribution. If you have an amplification dynamic that moves models towards few attractors, then changing the starting point (training distribution compared to target distribution) probably won't matter. At that point the issue is for the attractor to be useful with respect to all those starting distributions/models. This doesn't automatically make sense, comparing models by usefulness doesn't
I've been thinking about what you've said about iterated amplification, and there are some things I'm unsure of. I'm still rather skeptical of the benefit of iterated amplification, so I'd really appreciate a response.
You mentioned that iterated amplification can be useful when you have only very limited, domain-specific models of human behavior, where such models would be unable to come up with the ability to create code. However, there are two things I'm wondering about. The first is that it seems to me that, for a wide range of situations, you need a ge...
I hadn't fully appreciated the difficulty that could result from AIs having alien concepts, so thanks for bringing it up.
However, it seems to me that this would not be a big problem, provided the AI is still interpretable. I'll provide two ways to handle this.
For one, you could potentially translate the human concepts you care about into statements using the AI's concepts. Even if the AI doesn't use the same concepts people do, AIs are still incentivized to form a detailed model of the world. If you can have access to all the AI's world model, but still ca...
Another problem is that the system cannot represent and communicate the whole predicted future history of the universe to us.
This is a good point and one that I, foolishly, hadn't considered.
However, it seems to me that there is a way to get around this. Specifically, just provide the query-answerers the option to refuse to evaluate the utility of a description of a possible future. If this happens, the AI won't be able to have its utility function return a value for such a possible future.
To see how to do this, note that if a description of a possible ...
Sorry for taking a ridiculously long time to get back to you. I was dealing with some stuff.
This works great when you can recognize good things within the representation the AI uses to think about the world. But what if that's not true?
Yes, that is correct. As I said in the article, a high degree of interpretability is necessary to use the idea.
It's true that interpretability is required, but the key point of my scheme is this: interpretability is all you need for intent alignment, provided my scheme is correct. I don't know of any other alignment strate...
I've made a few posts that seemed to contain potentially valuable ideas related to AI safety. However, I got almost no feedback on them, so I was hoping some people could look at them and tell me what they think. They still seem valid to me, and if they are, they could potentially be very valuable contributions. And if they aren't valid, then I think knowing the reason for this could potentially help me a lot in my future efforts towards contributing to AI safety.
The posts are:
...
FWIW, this conclusion is not clear to me. To return to one of my original points: I don't think you can dodge this objection by arguing from potentially idiosyncratic preferences, even perfectly reasonable ones; rather, you need it to be the case that no rational agent could have different preferences. Either that, or you need to be willing to override otherwise rational individual preferences when making interpersonal tradeoffs.
Yes, that's correct. It's possible that there are some agents with consistent preferences that really would wish to get extra...
If the impact measure was poorly implemented, then I think such an impact-reducing AI could indeed result in the world turning out that way. However, note that the technique in the paper is intended to, for a very wide range of variables, make the world if the AI wasn't turned on as similar as possible to what it would be like if it was turned on. So, you can potentially avoid the AI-controlled-drone scenario by including the variable "number of AI-controlled drones in the world" or something correlated with it, as these variables could have quite diffe...
I have some concerns about an impact measure proposed here. I'm interested in working on impact measures, and these seem like very serious concerns to me, so it would be helpful to see what others think about them. I asked Stuart, one of the authors, about these concerns, but he said he was too busy to work on dealing with them.
First, I'll give a basic description of the impact measure. Have your AI be turned on from some sort of stochastic process that may or may not result in the AI being turned on. For example, consider sending a photo through a semi-si...
I think this framing muddies the intuition pump by introducing sadistic preferences, rather than focusing just on unboundedness below. I don't think it's necessary to do this: unboundedness below means there's a sense in which everyone is a potential "negative utility monster" if you torture them long enough. I think the core issue here is whether there's some point at which we just stop caring, or whether that's morally repugnant.
Fair enough. So I'll provide a non-sadistic scenario. Consider again the scenario I previously described in which you have a...
Also, in addition to my previous response, I want to note that the issues with unbounded satisfaction measures are not unique to my infinite ethical system. Instead, they are common potential problems with a wide variety of aggregate consequentialist theories.
For example, suppose you're a classical utilitarian with an unbounded utility measure per person. And suppose you know that the universe is finite and will consist of a single inhabitant whose utility follows a Cauchy distribution. Then your expected utilities are...
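To make the Cauchy point concrete, here is a minimal sketch, assuming the standard Cauchy density and a hypothetical helper name `truncated_cauchy_integral` (neither is specified in the comment above). It evaluates the integral of u times the density over a finite window using the closed-form antiderivative. If the window grows symmetrically, the result stays at 0; if the upper cutoff grows twice as fast as the lower one, the result approaches log(2)/pi instead. Since the answer depends on how the cutoffs are taken, the untruncated expectation has no single well-defined value.

```python
import math


def truncated_cauchy_integral(lower: float, upper: float) -> float:
    """Integral of u * f(u) over [lower, upper] for the standard Cauchy density.

    f(u) = 1 / (pi * (1 + u**2)), whose product with u has the antiderivative
    log(1 + u**2) / (2 * pi).
    """
    return (math.log1p(upper ** 2) - math.log1p(lower ** 2)) / (2 * math.pi)


# Compare a symmetric window with a lopsided one as the cutoff grows.
for cutoff in (1e2, 1e4, 1e6):
    symmetric = truncated_cauchy_integral(-cutoff, cutoff)
    lopsided = truncated_cauchy_integral(-cutoff, 2 * cutoff)
    print(cutoff, symmetric, lopsided)
```

A bounded measure does not have this problem, because every way of taking the limit then gives the same answer.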
...Thanks. I've toyed with similar ideas previously myself. The advantage, if this sort of thing works, is that it conveniently avoids a major issue with preference-based measures: that they're not unique and therefore incomparable across individuals. However, this method seems fragile in relying on a finite number of scenarios: doesn't it break if it's possible to imagine something worse than whatever the currently worst scenario is? (E.g. just keep adding 50 more years of torture.) While this might be a reasonable approximation in some circumstances, it do
For the record, according to my intuitions, average consequentialism seems perfectly fine to me in a finite universe.
That said, if you don't like using average consequentialism in a finite case, I don't personally see what's wrong with just having a somewhat different ethical system for finite cases. I know it seems ad-hoc, but I think there really is an important distinction between finite and infinite scenarios. Specifically, people have the moral intuition that larger numbers of satisfied lives are more valuable than smaller numbers of them, which avera...
In P(old probability of being in first group) * 1 = (P(old probability of being in first group) + $\epsilon$) * u the epsilon is smaller than any real number and there is no real small enough that it could characterise the difference between 1 and u.
Could you explain why you think so? I had already explained why $\epsilon$ would be real, so I'm wondering if you had an issue with my reasoning. To quote my past self:
...Remember that if you decide to take a certain action, that implies that other agents who are sufficiently similar to you and in sufficiently similar
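As a sketch of what the quoted equation implies if $\epsilon$ is taken to be a positive real number (that is the assumption under dispute here), write $P$ for the old probability of being in the first group and solve for $u$:

```latex
P \cdot 1 = (P + \epsilon)\, u
\quad\Longrightarrow\quad
u = \frac{P}{P + \epsilon},
\qquad
1 - u = \frac{\epsilon}{P + \epsilon}.
```

Under that assumption, the difference between 1 and $u$ is itself a positive real number whenever $P \ge 0$ and $\epsilon > 0$.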
It's possible that (a) is true, and much of your response seems like it's probably (?) targeted at that claim, but FWIW, I don't think this case can be convincingly made by appealing to contingent personal values: e.g. suggesting that another 50 years of torture wouldn't much matter to you personally won't escape the objection, as long as there's a possible agent who would view their life-satisfaction as being materially reduced in the same circumstances.
To some extent, whether or not life satisfaction is bounded just comes down to how you want to measu...
Thanks for the response.
Third, the average view prefers arbitrarily small populations over very large populations, as long as the average wellbeing was higher. For example, a world with a single, extremely happy individual would be favored to a world with ten billion people, all of whom are extremely happy but just ever-so-slightly less happy than that single person.
In an infinite universe, there's already infinitely-many people, so I don't think this applies to my infinite ethical system.
...First, consider a world inhabited by a single person enduring
Under my error model you run into trouble when you treat any transfinite amount the same. From that perspective recognising two transfinite amounts that could be different is progress.
I guess this is the part I don't really understand. My infinite ethical system doesn't even think about transfinite quantities. It only considers the prior probability over ending up in situations, which is always real-valued. I'm not saying you're wrong, of course, but I still can't see any clear problem.
...Another attempt to throw a situation you might not be able to hand
My point was more that, even if you can calculate the expectation, standard versions of average utilitarianism are usually rejected for non-infinitarian reasons (e.g. the repugnant conclusion) that seem like they would plausibly carry over to this proposal as well.
If I understand correctly, average utilitarianism isn't rejected due to the repugnant conclusion. In fact, it's the opposite: the repugnant conclusion is a problem for total utilitarianism, and average utilitarianism is one way to avoid the problem. I'm just going off what I read on The Stanfo...
Oh, I'm sorry; you're right. I messed up on step two of my proposed proof that your technique would be vulnerable to the same problem.
However, it still seems to me that agents using your technique would also be concerningly likely to fail to cross, or otherwise suffer from other problems. Like last time, suppose and that . So if the agent decides to cross, it's either because of the chicken rule, because not crossing counterfactually results in utility -10, or because crossing counterfactually results in utility greater than -10...
Thanks for clearing some things up. There are still some things I don't follow, though.
You said my system would be ambivalent between sand and insult. I just wanted to make sure I understand what you're saying here. Is insult specifically throwing sand at the same people that get it thrown at them in dust, with the same amount of sand thrown at them at the same throwing speed? If so, then it seems to me that my system would clearly prefer sand to insult. This is because there is some non-zero chance of an agent, conditioning only on being in this uni...
The fact that it's lavishly uncomputable is a problem for using it in practice, of course :-).
Yep. To be fair, though, I suspect any ethical system that respects agents' arbitrary preferences would also be incomputable. As a silly example, consider an agent whose terminal values are, "If Turing machine T halts, I want nothing more than to jump up and down. However, if it doesn't halt, then it is of the utmost importance to me that I never jump up and down and instead sit down and frown." Then any ethical system that cares about those preferences is inco...
If we define "bad reasoning" as "crossing when there is a proof that crossing is bad" in general, this begs the question of how to evaluate actions. Of course the troll will punish counterfactual reasoning which doesn't line up with this principle, in that case. The only surprising thing in the proof, then, is that the troll also punishes reasoners whose counterfactuals respect proofs (EG, EDT).
I'm concerned that you may not realize that your own current take on counterfactuals respects logic to some extent, and that, if I'm reasoning correctly, could res...
So let's try again. The key thing in your system is not a program that outputs a hypothetical being's stream of experiences, it's a program that outputs a complete description of a (possibly infinite) universe and also an unambiguous specification of a particular experience-subject within that universe. This is only possible if there are at most countably many experience-subjects in said universe, but that's probably OK.
That's closer to what I meant. By "experience-subject", I think you mean a specific agent at a specific time. If so, my system doesn't ...
The interactions are all supposed to be negative in peace, punch, dust, insult. The surprising thing to me would be that the system would be ambivalent between sand and insult being a bad idea. If we don't necessarily prefer D to C when helping, does it matter if we torture our people a lot or a little, as it's going to get infinity-saturated anyway?
Could you explain what insult is supposed to do? You didn't say in the previous comment. Does it causally hurt infinitely-many people?
Anyways, it seems to me that my system would not be ambivalent about wh...
Thanks for responding. As I said, the measure of satisfaction is bounded. And all bounded random variables have a well-defined expected value. Source: Stack Exchange.
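The claim about bounded random variables follows from a one-line bound. As a sketch, assume the satisfaction measure $X$ satisfies $|X| \le M$ almost surely for some finite constant $M$ (the symbol $M$ is introduced here for illustration):

```latex
\mathbb{E}\,|X| \;=\; \int |X| \, d\mathbb{P} \;\le\; \int M \, d\mathbb{P} \;=\; M \;<\; \infty,
\qquad\text{so}\qquad
\mathbb{E}[X] \text{ exists and } \bigl|\mathbb{E}[X]\bigr| \le M .
```

The same bound fails for an unbounded measure, which is where heavy-tailed distributions can leave the expectation undefined.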
Oh, I'm sorry; I misunderstood you. When you said the average of utilities, I thought you meant the utility averaged among all the different agents in the world. Instead, it's just, roughly, an average over the probability density function of utility. I say roughly because I guess integration isn't exactly an average.
RE: scenario one:
All these worlds come out exactly the same, so "infinitely many happy, one unhappy" is indistinguishable from "infinitely many unhappy, one happy"
It's not clear to me how they are indistinguishable. As long as the agent that's unhappy can have itself and its circumstances described with a finite description length, then it would have non-zero probability of an agent ending up as that one. Thus, making the agent unhappy would decrease the moral value of the world.
I'm not sure what would happen if the single unhappy agent has infinite co...
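The finite-description-length argument can be sketched numerically. The snippet below assumes, purely for illustration, an unnormalized prior weight of 2^-L for a situation that takes L bits to describe; the function names and the specific weighting are hypothetical and not taken from the comment above. The point it illustrates is only that any finite L gives a strictly positive weight, so lowering that one agent's satisfaction strictly lowers the weighted total.

```python
from fractions import Fraction


def prior_weight(description_length_bits: int) -> Fraction:
    """Hypothetical unnormalized prior weight 2**-L for an L-bit description."""
    if description_length_bits < 0:
        raise ValueError("description length must be non-negative")
    return Fraction(1, 2 ** description_length_bits)


def satisfaction_shift(description_length_bits: int, satisfaction_drop: Fraction) -> Fraction:
    """Change in weighted satisfaction when the agent in that one situation
    becomes less satisfied by satisfaction_drop."""
    return -prior_weight(description_length_bits) * satisfaction_drop


# Even a 1000-bit description carries a strictly positive weight,
# so the shift is strictly negative rather than zero.
print(satisfaction_shift(1000, Fraction(1, 2)) < 0)
```

Exact fractions are used so that a very small weight is not rounded to zero by floating-point arithmetic.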
By one logic because we prefer B to A then if we "acausalize" this we should still preserve this preference (because "the amount of copies granted" would seem to be even handed), so we would expect to prefer D to C. However in a system where all infinities are of equal size then C=D and we become ambivalent between the options.
We shouldn't necessarily prefer D to C. Remember that one of the main things you can do to increase the moral value of the universe is to try to causally help other creatures so that other people who are in sufficiently similar cir...
I'll begin at the end: What is "the expected value of utility" if it isn't an average of utilities?
I'm just using the regular notion of expected value. That is, let P(u) be the probability density of getting utility u. Then, the expected value of utility is $\int u \, P(u) \, du$, where $\int$ uses Lebesgue integration for greater generality. Above, I take utility to be in .
Also note that my system cares about a measure of satisfaction, rather than specifically utility. In this case, just take P(u) to be the density over that measure of life satisfaction instead of over utility.
Als...
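As a rough numerical sketch of this notion of expected value, the snippet below approximates the integral of u times P(u) with the midpoint rule. The interval [0, 1] and the density 6u(1 - u) are assumptions chosen only for illustration, and the midpoint rule is a Riemann-style approximation rather than Lebesgue integration; for a continuous density on a bounded interval the two agree.

```python
from typing import Callable


def expected_value(density: Callable[[float], float],
                   lower: float, upper: float, steps: int = 100_000) -> float:
    """Approximate the integral of u * density(u) over [lower, upper]
    using the midpoint rule with the given number of steps."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        u = lower + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += u * density(u)
    return total * width


def example_density(u: float) -> float:
    """Hypothetical density 6u(1 - u) on [0, 1]; it integrates to 1."""
    return 6.0 * u * (1.0 - u)


print(expected_value(example_density, 0.0, 1.0))
```

For this symmetric example density, the exact expected value is 1/2, so the printed result should be very close to 0.5.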
Post is pretty long-winded, a bit wall-of-text-y, with a lot of text for what seems like a fixed amount of content, while being very claimy and less showy about the properties.
Yeah, I see what you mean. I have a hard time balancing between being succinct and providing sufficient support and detail. It actually used to be shorter, but I lengthened it to address concerns brought up in a review.
...My suspicion is that the acausal impact ends up being infinitesimal anyway. Even if one would get finite probability impact for probabilities concerning an infinite universe for
...Of course you can make moral decisions without going through such calculations. We all do that all the time. But the whole issue with infinite ethics -- the thing that a purported system for handling infinite ethics needs to deal with -- is that the usual ways of formalizing moral decision processes produce ill-defined results in many imaginable infinite universes. So when you propose a system of infinite ethics and I say "look, it produces ill-defined results in many imaginable infinite universes", you don't get to just say "bah, who cares about the deta
You say, "There must be some reasonable way to calculate this."
(where "this" is Pr(I'm satisfied | I'm some being in such-and-such a universe)) Why must there be? I agree that it would be nice if there were, of course, but there is no guarantee that what we find nice matches how the world actually is.
To use probability theory to form accurate beliefs, we need a prior. I didn't think this was controversial. And if you have a prior, as far as I can tell, you can then compute Pr(I'm satisfied | I'm some being in such-and-such a universe) by simply updat...
Kind of hard to get a handle.
Are you referring to it being hard to understand? If so, I appreciate the feedback and am interested in the specifics of what is difficult to understand. Clarity is a top priority for me.
If I have a choice of (finitely) helping a single human and I believe there to be infinitely many humans, then the probability of a human being helped in my world will be nudged by less than any positive real number. And if we want to stick with probabilities being real, then the rounding will produce infinitarian paralysis.
You are correct that a single human would have 0...
(Assuming you've read my other response to this comment):
I think it might help if I give a more general explanation of how my moral system can be used to determine what to do. This is mostly taken from the article, but it's important enough that I think it should be restated.
Suppose you're considering taking some action that would benefit our world or future life cone. You want to see what my ethical system recommends.
Well, for almost all possible circumstances an agent could end up in in this universe, I think your action would have effectively no causal or ...
How is it a distribution over possible agents in possible universes (plural) when the idea is to give a way of assessing the merit of one possible universe?
I do think JBlack understands the idea of my ethical system and is using it appropriately.
My system provides a method of evaluating the moral value of a specific universe. The point of moral agents is to try to make the universe one that scores highly on this moral valuation. But we don't know exactly what universe we're in, so to make decisions, we need to consider all universes we could be in, and...
How does that cash out if not in terms of picking a random agent, or random circumstances in the universe? So, remember, the moral value of the universe according to my ethical system depends on P(I'll be satisfied | I'm some creature in this universe).
There must be some reasonable way to calculate this. And one that doesn't rely on impossibly taking a uniform sample from a set that has none. Now, we haven't fully formalized reasoning and priors yet. But there is some reasonable prior probability distribution over situations you could end up in. And aft...
Thank you for responding. I actually had someone else bring up the same point in a review; maybe I should have addressed this in the article.
The average life satisfaction is undefined in a universe with infinitely-many agents of varying life-satisfaction. Thus a moral system using it suffers from infinitarian paralysis. My system doesn't worry about averages, and thus does not suffer from this problem.
I think this system may have the following problem: It implicitly assumes that you can take a kind of random sample that in fact you can't.
...You want to evaluate universes by "how would I feel about being in this universe?", which I think means either something like "suppose I were a randomly chosen subject-of-experiences in this universe, what would my expected utility be?" or "suppose I were inserted into a random place in this universe, what would my expected utility be?". (Where "utility" is shorthand for your notion of "life satisfaction", and you a
I'm not entirely sure what you consider to be a "bad" reason for crossing the bridge. However, I'm having a hard time finding a way to define it that both causes agents using evidential counterfactuals to necessarily fail while not having other agents fail.
One way to define a "bad" reason is an irrational one (or the chicken rule). However, if this is what is meant by a "bad" reason, it seems like this is an avoidable problem for an evidential agent, as long as that agent has control over what it decides to think about.
To illustrate, consider what I would ...
I'm certain that ants do in fact have preferences, even if they can't comprehend the concept of preferences in abstract or apply them to counterfactual worlds. They have revealed preferences to quite an extent, as does pretty much everything I think of as an agent.
I think the question of whether insects have preferences is morally pretty important, so I'm interested in hearing what made you think they do have them.
I looked online for "do insects have preferences?", and I saw articles saying they did. I couldn't really figure out why they thought they di...
Right, I suspected the evaluation might be something like that. It does have the difficulty of being counterfactual and so possibly not even meaningful in many cases.
Interesting. Could you elaborate?
I suppose counterfactuals can be tricky to reason about, but I'll provide a little more detail on what I had in mind. Imagine making a simulation of an agent that is a fully faithful representation of its mind. However, run the agent simulation in a modified environment that both gives it access to infinite computational resources as well as makes it ask, an...
...Presumably the evaluation is not just some sort of average-over-actual-lifespan of some satisfaction rating for the usual reason that (say) annihilating the universe without warning may leave average satisfaction higher than allowing it to continue to exist, even if every agent within it would counterfactually have been extremely dissatisfied if they had known that you were going to do it. This might happen if your estimate of the current average satisfaction was 79% and your predictions of the future were that the average satisfaction over the next trill
I'm not sure how this system avoids infinitarian paralysis. For all actions with finite consequences in an infinite universe (whether in space, time, distribution, or anything else), the change in the expected value resulting from those actions is zero.
The causal change from your actions is zero. However, there are still logical connections between your actions and the actions of other agents in very similar circumstances. And you can still consider these logical connections to affect the total expected value of life satisfaction.
It's true, though, that...
I've come up with a system of infinite ethics intended to provide more reasonable moral recommendations than previously-proposed ones. I'm very interested in what people think of this, so comments are appreciated. I've made a write-up of it below.
One unsolved problem in ethics is that aggregate consequentialist ethical theories tend to break down if the universe is infinite. An infinite universe could contain both an infinite amount of good and an infinite amount of bad. If so, you are unable to change the total amount of good or bad in the universe, which ...
Interesting. When you say "fake" versions of myself, do you mean simulations? If so, I'm having a hard time seeing how that could be true. Specifically, what's wrong about me thinking I might not be "real"? I mean, if I thought I was i...