Probability of coming into existence again?
This question has been bothering me for a while now, but I have the nagging feeling that I'm missing something big and that the reasoning is flawed in a very significant way. I'm not well read in philosophy at all, and I'd be really surprised if this particular problem hasn't been addressed many times by more enlightened minds. Please don't hesitate to give reading suggestions if you know more. I don't even know where to start learning about such questions. I have tried the search bar but have failed to find a discussion around this specific topic.
I'll try to explain my train of thought as best I can, but I am not familiar with formal reasoning, so bear with me! (English is not my first language, either.)
Based on the information and sensations currently available, I am stuck in a specific point of view and experience specific qualia. So far, it's the only thing that has been available to me; it is the entirety of my reality. I don't know if the cogito ergo sum is well received on Less Wrong, but it seems on the face of it to be a compelling argument for my own existence at least.
Let's assume that there are other conscious beings who "exist" in a similar way, and thus other possible qualia. If we don't assume this, doesn't it mean that we are in a dead end and no further argument is possible? Similar to what happens if there is no free will and thus nothing matters since no change is possible? Again, I am not certain about this reasoning but I can't see the flaw so far.
There doesn't seem to be any reason why I should be experiencing these specific qualia instead of others, that I "popped into existence" as this specific consciousness instead of another, or that I perceive time subjectively. According to what I know, the qualia will probably stop completely at some subjective point in time and I will cease to exist. The qualia are likely to be tied to a physical state of matter (for example colorblindness due to different cells in the eyes) and once the matter does not "function" or is altered, the qualia are gone. It would seem that there could be a link between the subjective and some sort of objective reality, if there is indeed such a thing.
On a side note, I think it's safe to ignore theism and all mentions of a pleasurable afterlife of some sort. I suppose most people on this site have debated this to death elsewhere and there's no real point in bringing it up again. I personally think it's not an adequate solution to this problem.
Based on what I know, and that qualia occur, what is the probability (if any) that I will pop into existence again and again, and experience different qualia each time, with no subjectively perceivable connection with the "previous" consciousness? If it has happened once, if a subjective observer has emerged out of nothing at some point, and is currently observing subjectively (as I think is happening to me), does the subjective observing ever end?
I know it sounds an awful lot like mysticism and reincarnation, but since I am currently existing and observing in a subjective way (or at least I think I am), how can I be certain that it will ever stop?
The only reason why this question matters at all is because suffering is not only possible but quite frequent according to my subjective experience and my intuition of what other possible observers might be experiencing if they do exist in the same way I do. If there were no painful qualia, or no qualia at all, nothing would really matter since there would be no change needed and no concept of suffering. I don't know how to define suffering, but I think it is a valid concept and is contained in qualia, based on my limited subjectivity.
This leads to a second, more disturbing question: does suffering have a limit or is it infinite? Is there a non-zero probability of entering into existence as a being that experiences potentially infinite suffering, similar to the main character in I Have No Mouth, and I Must Scream? Is there no way out of existence? If the answer is no, then how would it be possible to lead a rational life, seeing as it would be a single drop in an infinite ocean?
On a more positive note, this reasoning can serve as a strong deterrent to suicide, since it would be rationally better to prolong your current and familiar existence than to potentially enter a less fortunate one with no way to predict what might happen.
Sadly, these thoughts have proven to be a significant threat to my motivation and morale. I feel stuck in this logic and can't see a way out at the moment. If you can identify a flaw here, or know of a solution, then I eagerly await your reply.
kind regards
Another problem with quantum measure
Let's play around with the quantum measure some more. Specifically, let's posit a theory T that claims that the quantum measure of our universe is increasing - say by 50% each day. Why could this be happening? Well, here's a quasi-justification for it: imagine there are lots and lots of universes, most of them in chaotic random states, jumping around to other chaotic random states, in accordance with the usual laws of quantum mechanics. Occasionally, one of them will partially tunnel, by chance, into the same state our universe is in - and then will evolve forwards in time exactly as our universe does. Over time, we'll accumulate an ever-growing measure.
That theory sounds pretty unlikely, no matter what feeble attempts are made to justify it. But T is observationally indistinguishable from our own universe, and has a non-zero probability of being true. It's the reverse of the (more likely) theory presented here, in which the quantum measure was being constantly diminished. And it's very bad news for theories that treat the quantum measure (squared) as akin to a probability, without ever renormalising. It implies that one must continually sacrifice for the long-term: any pleasure today is wasted, as that pleasure will be weighted so much more tomorrow, next week, next year, next century... A slight fleeting smile on the face of the last human is worth more than all the ecstasy of the previous trillions.
One solution to the "quantum measure is continually diminishing" problem was to note that as the measure of the universe diminished, it would eventually get so low that any alternative, non-measure-diminishing theory, no matter how initially unlikely, would predominate. But that solution is not available here - indeed, that argument runs in reverse, and makes the situation worse. No matter how initially unlikely the "quantum measure is continually increasing" theory is, eventually the measure will become so high that it completely dominates all other theories.
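The domination argument can be checked with a little arithmetic. The following is only a minimal sketch: it assumes a hypothetical prior `prior_T` for theory T, a single rival theory whose measure stays constant, and weights computed as prior times measure without renormalising (the very practice the post is warning about). The function name and the numbers are illustrative, not part of the original argument.

```python
def days_until_T_dominates(prior_T: float, growth: float = 1.5) -> int:
    """Count the days until theory T outweighs a constant-measure rival.

    Assumptions (hypothetical): T starts with weight prior_T, the rival with
    weight 1 - prior_T, and T's weight is multiplied by `growth` every day.
    """
    if not 0.0 < prior_T < 1.0:
        raise ValueError("prior_T must lie strictly between 0 and 1")
    if growth <= 1.0:
        raise ValueError("growth must exceed 1 for T to catch up")
    weight_T = prior_T
    weight_rival = 1.0 - prior_T
    days = 0
    while weight_T <= weight_rival:
        weight_T *= growth   # measure of T grows by 50% per day by default
        days += 1
    return days


# Even an absurdly small prior is overtaken in under half a year.
print(days_until_T_dominates(1e-30))
```

The number of days grows only logarithmically in the inverse prior, which is why no finite amount of initial scepticism about T protects the unrenormalised calculation.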
False vacuum: the universe playing quantum suicide
Imagine that the universe is approximately as it appears to be (I know, this is a controversial proposition, but bear with me!). Further imagine that the many worlds interpretation of Quantum mechanics is true (I'm really moving out of Less Wrong's comfort zone here, aren't I?).
Now assume that our universe is in a situation of false vacuum - the universe is not in its lowest energy configuration. Somewhere, at some point, our universe may tunnel into true vacuum, resulting in an expanding bubble of destruction that will eat the entire universe at high speed, destroying all matter and life. In many worlds, such a collapse need not be terminal: life could go on in a branch of lower measure. In fact, anthropically, life will go on somewhere, no matter how unstable the false vacuum is.
So now assume that the false vacuum we're in is highly unstable - the measure of the branch in which our universe survives goes down by a factor of a trillion every second. We only exist because we're in the branch of measure a trillionth of a trillionth of a trillionth of... all the way back to the Big Bang.
None of these assumptions make any difference to what we'd expect to see observationally: only a good enough theory can say that they're right or wrong. You may notice that this setup transforms the whole universe into a quantum suicide situation.
The question is, how do you go about maximising expected utility in this situation? I can think of a few different approaches:
- Gnaw on the bullet: take the quantum measure as a probability. This means that you now have a discount factor of a trillion every second. You have to rush out and get/do all the good stuff as fast as possible: a delay of a second costs you a reduction in utility of a trillion. If you are a negative utilitarian, you also have to rush to minimise the bad stuff, but you can also take comfort in the fact that the potential for negative utility across the universe is going down fast.
- Use relative measures: care about the relative proportion of good worlds versus bad worlds, while assigning zero to those worlds where the vacuum has collapsed. This requires a natural zero to make sense, and can be seen as quite arbitrary: what would you do about entangled worlds, or about the non-zero probability that the vacuum-collapsed worlds may have worthwhile life in them? Would the relative measure user also put zero value to worlds that were empty of life for other reasons than vacuum collapse? For instance, would they be in favour of programming an AI's friendliness using random quantum bits, if it could be reassured that if friendliness fails, the AI would kill everyone immediately?
- Deny the measure: construct a meta-ethical theory where only classical probabilities (or classical uncertainties) count as probabilities. Quantum measures do not: you care about the sum total of all branches of the universe. Universes in which the photon went through the top slit, went through the bottom slit, or was in an entangled state that went through both slits... to you, there are three completely separate universes, and you can assign totally unrelated utilities to each one. This seems quite arbitrary, though: how are you going to construct these preferences across the whole of the quantum universe, when you forged your current preferences on a single branch?
- Cheat: note that nothing in life is certain. Even if we have the strongest evidence imaginable about vacuum collapse, there's always a tiny chance that the evidence is wrong. After a few seconds, that probability will be dwarfed by the discount factor of the collapsing universe. So go about your business as usual, knowing that most of the measure/probability mass remains in the non-collapsing universe. This can get tricky if, for instance, the vacuum collapsed more slowly than a factor of a trillion a second. Would you be in a situation where you should behave as if you believed vacuum collapse for another decade, say, and then switch to a behaviour that assumed non-collapse afterwards? Also, would you take seemingly stupid bets, like bets at a trillion trillion trillion to one that the next piece of evidence will show no collapse (if you lose, you're likely in the low measure universe anyway, so the loss is minute)?
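The "Cheat" option turns on when the switch-over happens. Here is a rough sketch, assuming a hypothetical probability `p_no_collapse` that the collapse evidence is wrong and a constant per-second factor `decay_per_second` by which the surviving branch's measure shrinks; both names and all numbers are illustrative assumptions, not values taken from any physical theory.

```python
import math


def crossover_seconds(p_no_collapse: float, decay_per_second: float) -> float:
    """Seconds after which the non-collapsing hypothesis holds most of the weight.

    Weight of the collapsing world after t seconds:
        (1 - p_no_collapse) * decay_per_second ** (-t)
    Weight of the non-collapsing world: p_no_collapse (constant).
    """
    if not 0.0 < p_no_collapse < 1.0:
        raise ValueError("p_no_collapse must lie strictly between 0 and 1")
    if decay_per_second <= 1.0:
        raise ValueError("decay_per_second must exceed 1")
    odds_against = (1.0 - p_no_collapse) / p_no_collapse
    return max(0.0, math.log(odds_against) / math.log(decay_per_second))


# Trillion-fold decay per second: the switch comes after a couple of seconds.
fast = crossover_seconds(1e-30, 1e12)
# A far slower (hypothetical) decay rate pushes the switch out by years.
slow_years = crossover_seconds(1e-30, 1.0000001) / (365.25 * 24 * 3600)
print(fast, slow_years)
```

With the slow decay rate chosen here the crossover lands on the order of twenty years out, which is the awkward regime described in the last bullet: behave as if collapse is real for a long stretch, then switch.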
Universal agents and utility functions
I'm Anja Heinisch, the new visiting fellow at SI. I've been researching replacing AIXI's reward system with a proper utility function. Here I will describe my AIXI+utility function model, address concerns about restricting the model to bounded or finite utility, and analyze some of the implications of modifiable utility functions, e.g. wireheading and dynamic consistency. Comments, questions and advice (especially about related research and material) will be highly appreciated.
Introduction to AIXI
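Marcus Hutter's (2003) universal agent AIXI addresses the problem of rational action in a (partially) unknown computable universe, given infinite computing power and a halting oracle. The agent interacts with its environment in discrete time cycles, producing an action-perception sequence $y_1 x_1 y_2 x_2 \ldots$ with actions (agent outputs) $y_i$ and perceptions (environment outputs) $x_i$ chosen from finite sets $\mathcal{Y}$ and $\mathcal{X}$. The perceptions are pairs $x_i = (o_i, r_i)$, where $o_i$ is the observation part and $r_i$ denotes a reward. At time k the agent chooses its next action $\dot{y}_k$ according to the expectimax principle:
$$\dot{y}_k = \arg\max_{y_k} \sum_{x_k} \max_{y_{k+1}} \sum_{x_{k+1}} \cdots \max_{y_{m_k}} \sum_{x_{m_k}} (r_k + \ldots + r_{m_k}) \, M(\dot{y}\dot{x}_{<k} \, y\underline{x}_{k:m_k})$$
Here M denotes the updated Solomonoff prior summing over all programs $q$ that are consistent with the history $\dot{y}\dot{x}_{<k}$ [1] and which will, when run on the universal Turing machine T with successive inputs $y_k, \ldots, y_{m_k}$, compute outputs $x_k, \ldots, x_{m_k}$, i.e.
$$M(\dot{y}\dot{x}_{<k} \, y\underline{x}_{k:m_k}) = \sum_{q \,:\, T(q, \dot{y}_{<k} y_{k:m_k}) = \dot{x}_{<k} x_{k:m_k}} 2^{-\ell(q)},$$
where $\ell(q)$ is the length of the program $q$ and $m_k$ is the agent's horizon at time k.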
AIXI is a dualistic framework in the sense that the algorithm that constitutes the agent is not part of the environment, since it is not computable. Even considering that any running implementation of AIXI would have to be computable, AIXI accurately simulating AIXI accurately simulating AIXI ad infinitum doesn't really seem feasible. Potential consequences of this separation of mind and matter include difficulties the agent may have predicting the effects of its actions on the world.
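The expectimax recursion can be made concrete on a toy problem. What follows is not AIXI: the Solomonoff mixture M is uncomputable, so this sketch swaps in a small, hand-written environment model (`toy_model`, a hypothetical stand-in) and a fixed horizon. All names here are illustrative assumptions; only the alternation of maximising over actions and summing over perceptions mirrors the principle described above.

```python
from typing import Callable, Dict, List, Optional, Tuple

Perception = Tuple[str, float]            # (observation o_i, reward r_i)
History = List[Tuple[str, Perception]]    # [(action y_i, perception x_i), ...]
Model = Callable[[History, str], Dict[Perception, float]]


def expectimax(history: History, actions: List[str], model: Model,
               steps_left: int) -> Tuple[float, Optional[str]]:
    """Return (expected sum of future rewards, best next action)."""
    if steps_left == 0:
        return 0.0, None
    best_value, best_action = float("-inf"), None
    for action in actions:                                        # max over y
        value = 0.0
        for perception, prob in model(history, action).items():   # sum over x
            future, _ = expectimax(history + [(action, perception)],
                                   actions, model, steps_left - 1)
            value += prob * (perception[1] + future)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action


def toy_model(history: History, action: str) -> Dict[Perception, float]:
    """Hypothetical environment: one risky action and one safe action."""
    if action == "right":
        return {("win", 1.0): 0.8, ("lose", 0.0): 0.2}
    return {("safe", 0.5): 1.0}


value, action = expectimax([], ["left", "right"], toy_model, steps_left=2)
print(action, value)
```

The cost of the recursion grows exponentially with the horizon even in this tiny setting, which gives some feeling for why only approximations of the real agent could ever be run.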
Utility vs rewards
So, why is it a bad idea to work with a reward system? Say the AIXI agent is rewarded whenever a human called Bob pushes a button. Then a sufficiently smart AIXI will figure out that instead of furthering Bob’s goals it can also threaten or deceive Bob into pushing the button, or get another human to replace Bob. On the other hand, if the reward is computed in a little box somewhere and then displayed on a screen, it might still be possible to reprogram the box or find a side channel attack. Intuitively you probably wouldn't even blame the agent for doing that -- people try to game the system all the time.
You can visualize AIXI's computation as maximizing bars displayed on this screen; the agent is unable to connect the bars to any pattern in the environment, they are just there. It wants them to be as high as possible and it will utilize any means at its disposal. For a more detailed analysis of the problems arising through reinforcement learning, see Dewey (2011).
Is there a way to bind the optimization process to actual patterns in the environment? To design a framework in which the screen informs the agent about the patterns it should optimize for? The answer is, yes, we can just define a utility function $u$ that assigns a value $u(\dot{y}\dot{x}_{<k} \, yx_{k:m_k})$ to every possible future history $\dot{y}\dot{x}_{<k} \, yx_{k:m_k}$ and use it to replace the reward system in the agent specification:
$$\dot{y}_k = \arg\max_{y_k} \sum_{x_k} \max_{y_{k+1}} \sum_{x_{k+1}} \cdots \max_{y_{m_k}} \sum_{x_{m_k}} u(\dot{y}\dot{x}_{<k} \, yx_{k:m_k}) \, M(\dot{y}\dot{x}_{<k} \, y\underline{x}_{k:m_k})$$
When I say "we can just define" I am actually referring to the really hard question of how to recognize and describe the patterns we value in the universe. Contrasted with the necessity to specify rewards in the original AIXI framework, this is a strictly harder problem, because the utility function has to be known ahead of time and the reward system can always be represented in the framework of utility functions by setting
$$u(\dot{y}\dot{x}_{<k} \, yx_{k:m_k}) = r_k + \ldots + r_{m_k}.$$
For the same reasons, this is also a strictly safer approach.
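To illustrate the difference between the two specifications, here is a small sketch in which the same planner is run once with a utility equal to the sum of rewards and once with a utility that looks at a pattern in the observations. The world (`model`), the action names and the lamp observation are all hypothetical, chosen to echo the button example: pressing the button pays reward without producing the outcome anyone wanted.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str, float]   # (action, observation, reward)
History = List[Step]


def model(history: History, action: str) -> Dict[Tuple[str, float], float]:
    """Hypothetical deterministic world mapping an action to (observation, reward)."""
    if action == "press_button":
        return {("lamp_off", 1.0): 1.0}   # reward arrives, lamp stays off
    return {("lamp_on", 0.0): 1.0}        # real work: lamp on, no reward


def best_action(history: History, actions: List[str], steps_left: int,
                utility: Callable[[History], float]) -> Tuple[float, Optional[str]]:
    """Expectimax where the utility is evaluated on the complete history."""
    if steps_left == 0:
        return utility(history), None
    best: Tuple[float, Optional[str]] = (float("-inf"), None)
    for action in actions:
        value = 0.0
        for (obs, reward), prob in model(history, action).items():
            extended = history + [(action, obs, reward)]
            value += prob * best_action(extended, actions, steps_left - 1, utility)[0]
        if value > best[0]:
            best = (value, action)
    return best


def reward_sum(history: History) -> float:
    """The reward system expressed as a utility function: u = r_1 + ... + r_m."""
    return sum(reward for _, _, reward in history)


def lamp_utility(history: History) -> float:
    """A utility tied to a pattern in the environment: steps with the lamp on."""
    return float(sum(obs == "lamp_on" for _, obs, _ in history))


ACTIONS = ["press_button", "do_the_work"]
print(best_action([], ACTIONS, 2, reward_sum))
print(best_action([], ACTIONS, 2, lamp_utility))
```

The planner is identical in both runs; only the function handed to it changes, which is the sense in which rewards are a special case of utilities.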
Infinite utility
The original AIXI framework must necessarily place upper and lower bounds on the rewards that are achievable, because the rewards are part of the perceptions and $\mathcal{X}$ is finite. The utility function approach does not have this problem, as the expected utility
$$\sum_{x_{k:m_k}} u(\dot{y}\dot{x}_{<k} \, yx_{k:m_k}) \, M(\dot{y}\dot{x}_{<k} \, y\underline{x}_{k:m_k})$$
is always finite as long as we stick to a finite set of possible perceptions, even if the utility function is not bounded. Relaxing this constraint and allowing $\mathcal{X}$ to be infinite and the utility to be unbounded creates divergence of expected utility (for a proof see de Blanc 2008). This closely corresponds to the question of how to be a consequentialist in an infinite universe, discussed by Bostrom (2011). The underlying problem here is that (using the standard approach to infinities) these expected utilities will become incomparable. One possible solution to this problem could be to use a larger subfield than $\mathbb{R}$ of the surreal numbers, my favorite[2] so far being the Levi-Civita field $\mathcal{R}$ generated by the infinitesimal $\varepsilon$:
$$\mathcal{R} = \Big\{ \sum_{q \in \mathbb{Q}} a_q \varepsilon^q \;:\; a_q \in \mathbb{R}, \ \{ q : a_q \neq 0 \} \text{ is left-finite} \Big\}$$
with the usual power-series addition and multiplication. Levi-Civita numbers can be written and approximated as
$$\sum_{i=1}^{\infty} a_{q_i} \varepsilon^{q_i} \approx \sum_{i=1}^{n} a_{q_i} \varepsilon^{q_i}, \qquad q_1 < q_2 < \ldots$$
(see Berz 1996), which makes them suitable for representation on a computer using floating point arithmetic. If we allow the range of our utility function to be $\mathcal{R}$, we gain the possibility of generalizing the framework to work with an infinite set of possible perceptions, therefore allowing for continuous parameters. We also allow for a much broader set of utility functions, no longer excluding the assignment of infinite (or infinitesimal) utility to a single event. I recently met someone who argued convincingly that his (ideal) utility function assigns infinite negative utility to every time instance that he is not alive, therefore making him prefer life to any finite but huge amount of suffering.
Note that finiteness of $\mathcal{Y}$ is still needed to guarantee the existence of actions with maximal expected utility, and the finite (but dynamic) horizon $m_k$ remains a very problematic assumption, as described in Legg (2008).
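As a rough illustration of how such numbers might be stored, here is a minimal sketch of truncated Levi-Civita arithmetic: a number is kept as finitely many floating point coefficients indexed by rational exponents of the infinitesimal. The class name `LC` and its methods are hypothetical, and this is not the representation used by Berz, only a toy showing that addition, multiplication and ordering are mechanical.

```python
from fractions import Fraction
from typing import Dict, Optional


class LC:
    """Truncated Levi-Civita number: a finite sum of coeff * eps**exponent."""

    def __init__(self, terms: Optional[Dict] = None):
        # Keys are rational exponents, values are float coefficients.
        self.terms = {Fraction(q): a for q, a in (terms or {}).items() if a != 0}

    def __add__(self, other: "LC") -> "LC":
        out = dict(self.terms)
        for q, a in other.terms.items():
            out[q] = out.get(q, 0.0) + a
        return LC(out)

    def __neg__(self) -> "LC":
        return LC({q: -a for q, a in self.terms.items()})

    def __sub__(self, other: "LC") -> "LC":
        return self + (-other)

    def __mul__(self, other: "LC") -> "LC":
        out: Dict[Fraction, float] = {}
        for q1, a1 in self.terms.items():
            for q2, a2 in other.terms.items():
                out[q1 + q2] = out.get(q1 + q2, 0.0) + a1 * a2
        return LC(out)

    def sign(self) -> int:
        """Sign of the coefficient at the smallest exponent (the dominant term)."""
        if not self.terms:
            return 0
        lead = self.terms[min(self.terms)]
        return 1 if lead > 0 else -1

    def __lt__(self, other: "LC") -> bool:
        return (other - self).sign() > 0

    def __gt__(self, other: "LC") -> bool:
        return (self - other).sign() > 0


eps = LC({1: 1.0})              # the infinitesimal
infinite = LC({-1: 1.0})        # 1/eps, larger than every real number
death = LC({-1: -1.0})          # infinitely negative utility
suffering = LC({0: -1e9})       # huge but finite negative utility
print(death < suffering, eps > LC(), eps < LC({0: 1e-300}))
```

Because the smallest exponent dominates the comparison, an infinitely negative utility loses to any finite amount of suffering, however large, which matches the preference described above.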
Modifiable utility functions
Any implementable approximation of AIXI implies a weakening of the underlying dualism. Now the agent's hardware is part of the environment and at least in the case of a powerful agent, it can no longer afford to neglect the effect its actions may have on its source code and data. One question that has been asked is whether AIXI can protect itself from harm. Hibbard (2012) shows that an agent similar to the one described above, equipped with the ability to modify its policy responsible for choosing future actions, would not do so, given that it starts out with the (meta-)policy to always use the optimal policy, and the additional constraint to change only if that leads to a strict improvement. Ring and Orseau (2011) study under which circumstances a universal agent would try to tamper with the sensory information it receives. They introduce the concept of a delusion box, a device that filters and distorts the perception data before it is written into the part of the memory that is read during the calculation of utility.
A further complication to take into account is the possibility that the part of memory that contains the utility function may get rewritten, either by accident, by deliberate choice (programmers trying to correct a mistake), or in an attempt to wirehead. To analyze this further we will now consider what can happen if the screen flashes different goals in different time cycles. Let $u_k$ denote the utility function the agent will have at time k.
Even though we will only analyze instances in which the agent knows at time k which utility function $u_i$ it will have at future times $i > k$ (possibly depending on the actions $y_{k:i}$ before that), we note that for every fixed future history $yx_{k:i}$ the agent knows the utility function $u_i$ that is displayed on the screen, because the screen is part of its perception data $x_i$.
This leads to three different agent models worthy of further investigation:
- Agent 1 will optimize for the goals that are displayed on the screen right now and act as if it would continue to do so in the future. We describe this with the utility function $u^1_k(yx_{1:m_k}) = u_k(yx_{1:m_k})$.
- Agent 2 will try to anticipate future changes to its utility function and maximize the utility it experiences at every time cycle as shown on the screen at that time. This is captured by $u^2_k(yx_{1:m_k}) = \sum_{i=k}^{m_k} u_i(yx_{1:i})$.
- Agent 3 will, at time k, try to maximize the utility it derives in hindsight, displayed on the screen at the time horizon $m_k$: $u^3_k(yx_{1:m_k}) = u_{m_k}(yx_{1:m_k})$.
Of course arbitrary mixtures of these are possible.
The type of wireheading that is of interest here is captured by the Simpleton Gambit described by Orseau and Ring (2011), a Faustian deal that offers the agent maximal utility in exchange for its willingness to be turned into a Simpleton that always takes the same default action at all future times. We will first consider a simplified version of this scenario: The Simpleton future, where the agent knows for certain that it will be turned into a Simpleton at time k+1, no matter what it does in the remaining time cycle. Assume that for all possible action-perception combinations the utility given by the current utility function is not maximal, i.e. holds for all
. Assume further that the agent's actions influence the future outcomes, at least from its current perspective. That is, for all
there exist
with
. Let
be the Simpleton utility function, assigning equal but maximal utility
to all possible futures. While Agent 1 will optimize as before, not adapting its behavior to the knowledge that its utility function will change, Agent 3 will be paralyzed, having to rely on whatever method its implementation uses to break ties. Agent 2 on the other hand will try to maximize only the utility
.
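The behaviour of the three agents in the Simpleton future can be played through on a toy example. The sketch below rests on one possible reading of the three models, stated here as an assumption: Agent 1 scores the whole history with the current utility function, Agent 2 sums the utility shown at each time cycle applied to the history up to that cycle, and Agent 3 scores the whole history with the utility function shown at the horizon. Histories are simplified to lists of actions, indices start at 0, and every name is hypothetical.

```python
from typing import Callable, List, Sequence

History = List[str]                      # simplified: just the actions taken
Utility = Callable[[Sequence[str]], float]


def agent1_value(history: History, k: int, screen: List[Utility]) -> float:
    """Current utility function u_k applied to the complete history."""
    return screen[k](history)


def agent2_value(history: History, k: int, screen: List[Utility]) -> float:
    """Sum over cycles i >= k of u_i applied to the history up to cycle i."""
    return sum(screen[i](history[: i + 1]) for i in range(k, len(history)))


def agent3_value(history: History, k: int, screen: List[Utility]) -> float:
    """Utility function shown at the horizon, applied in hindsight."""
    return screen[len(history) - 1](history)


def current_goal(prefix: Sequence[str]) -> float:
    """Hypothetical current goal: share of 'work' steps, kept below the maximum 1.0."""
    return 0.9 * sum(a == "work" for a in prefix) / len(prefix)


def simpleton(prefix: Sequence[str]) -> float:
    """Simpleton utility: the same maximal value for every possible history."""
    return 1.0


# Simpleton future: the goal changes right after the current cycle k = 0.
screen = [current_goal, simpleton, simpleton]
for history in (["work", "work", "work"], ["idle", "idle", "idle"]):
    print(history,
          agent1_value(history, 0, screen),
          agent2_value(history, 0, screen),
          agent3_value(history, 0, screen))
```

Under these assumptions Agent 1 still separates the two histories as if nothing would change, Agent 2 separates them only through the term for the current cycle, and Agent 3 assigns both the same value and so has to fall back on tie-breaking.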
Now consider the actual Simpleton Gambit: At time k the agent gets to choose between changing, , resulting in
and
(not changing), leading to
for all
. We assume that
has no further effects on the environment. As before, Agent 1 will optimize for business as usual; whether or not it chooses to change depends entirely on whether the screen specifically mentions the memory pointer to the utility function or not.
Agent 2 will change if and only if the utility of changing compared to not changing according to what the screen currently says is strictly smaller than the comparative advantage of always having maximal utility in the future. That is,
is strictly less than
This seems quite analogous to humans, who sometimes tend to choose maximal bliss over future optimization power, especially if the optimization opportunities are meager anyhow. Many people do seem to choose their goals so as to maximize the happiness felt by achieving them at least some of the time; this is also advice that I have frequently encountered in self-help literature, e.g. here. Agent 3 will definitely change, as it only evaluates situations using its final utility function.
Comparing the three proposed agents, we notice that Agent 1 is dynamically inconsistent: it will optimize for future opportunities that it predictably will not take later. Agent 3 on the other hand will wirehead whenever possible (and we can reasonably assume that opportunities to do so will exist in even moderately complex environments). This leaves us with Agent model 2, and I invite everyone to point out its flaws.
[1] Dotted actions/perceptions, like $\dot{y}_i$ and $\dot{x}_i$, denote past events; underlined perceptions $\underline{x}_i$ denote random variables to be observed at future times.
[2] Bostrom (2011) proposes using hyperreal numbers, which rely heavily on the axiom of choice for the ultrafilter to be used and I don't see how those could be implemented.