Allow me to restate the problem in my own words. We are asked to produce a general-purpose function named "agent()" which returns a positive integer. The returned result is intended to represent a choice (made by "agent()") among a discrete collection of options. There is one implicit parameter to agent() - the source code of a second function named "world()". This second function also returns an integer, but its result is to be interpreted as a kind of payoff. The objective of the exercise is to produce an agent() function which maximizes the result returned by an arbitrary world().
The code of the function world() is expected to include one or more calls to agent(). For example, when modeling Newcomb's problem or Parfit's hitchhiker, world() will call agent() at least once on behalf of Omega in order to predict what the agent will do later. Then world() calls agent() a second time, this time "for real". Or, in a symmetric Prisoner's Dilemma, agent() might be called twice, once for each decision-making prisoner.
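For concreteness, here is a minimal Python sketch of the symmetric Prisoner's Dilemma case (the payoff numbers and the stub agent are purely illustrative, not part of the original setup):

def agent():
    return 1  # stub for illustration: 1 = cooperate, 2 = defect

def world():
    # Symmetric Prisoner's Dilemma, seen from one player's side; both
    # prisoners' decisions are computed by calling the same agent().
    payoffs = {(1, 1): 3, (1, 2): 0, (2, 1): 5, (2, 2): 1}  # illustrative values
    return payoffs[(agent(), agent())]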
Slepnev (cousin it?) suggests that the way agent() should be structured is as a theorem-prover, rather than as a simulator. (This approach has the important technical advantage of better control over non-termination.)
Ok, that is my summary - my understanding of what is going on here. Now for my questions:
That is enough questions for now. Depending on the answers, I may have follow-ups.
Edit: minor cleanups.
Slepnev (cousin it?)
Yeah. Sadly I didn't use my real name when I registered. Sorry if that got anyone confused. Nesov uses my real name because we had many email exchanges, and also because he's a university friend of my brother's :-)
In the draft, I used your nickname at first, but in the "Prior work" section that sounded rather strange, so I replaced it with your real name everywhere. Glad to see that agrees with your preference.
The code of the function world() is expected to include one or more calls to agent()
Not necessarily (or not exactly). Where I used agent2() for a predictor, I could use agent3() for the agent as well. See this comment.
Extensions are possible, but not relevant to the main point of the model. It doesn't seem natural to model different players with the same agent-program. Observations are not part of the model, but you could consider the agent as producing a program with parameters (instead of an unstructured constant) as a result of its decision.
The other use for inputs is information passed to agents through sensory observation. The extension is extremely natural.
"random numbers": random numbers in nature are really pseudorandom numbers - numbers we lack the computational power to predict. In a full model of the universe, random numbers are not necessary - but different possible worlds with different laws of physics and some probability mass assigned to each, are.
Any randomness in the agent would really be pseudorandomness.
The other use for inputs is information passed to agents through sensory observation.
I don't believe you can obtain proper information about the world through observation (unless you are a human and don't know what you want). Preference is unchanging and is defined subjectively, so formal agents can't learn new things about it (apart from resolution of logical uncertainty). Observations play a role in how plans play out (plans are prepared to be conditional on observations), or in priority for preparing said plans as observations come in, but not as criteria for making decisions.
On the other hand, observations could probably be naturally seen as constructing new agents from existing ones (without changing their preference/concept of environment). And some notion of observation needs to be introduced at some point, just as a notion of computational time.
I was using "information" loosely. Define it as "that thing you get from observation" if you want.
The point is, you will make different choices if you get different sensory experiences, because the sensory experiences imply something about how you control the world program. You're right that this could be modeled with different agent functions instead of parameters. Interesting - that seems rather deeply meaningful.
The point is, you will make different choices if you get different sensory experiences
You can see it as unconditionally making a single conditional choice. The choice is itself a program with parameters, and goes different ways depending on observation, but is made without regard for observation. As an option, the choice is a parametrized constructor for new agents, which upon being constructed will make further choices, again without parameters.
There are several different possible formulations here. It seems like they will lead to the same results. The best one is, I suppose, the most elegant one.
Not sure which is.
So it sounds like you are saying that the agent() program only represents the decision-theory portion of the agent. The Bayesian cognitive portion of the agent and the part of the agent that prefers some things over others are both modeled in world(). Communication with other agents, logic, all these features of agency are in world(), not in agent().
May I suggest that "agent()" has been misnamed? Shouldn't it be called something more like choice() or decision()? And shouldn't "world()" be named "expected-value-of-decision-by-agent()", or something like that?
Note that world() is part of the agent(). Certainly, world() will become preference in the next post (and cease to be necessarily a program, while agent() must remain a program), but historically this part was always "world program", and preference replaces (subsumes) that, which is not an obvious step.
This looks right to me. It mimics the reason humans don't explode when they try to cross the street. Brilliant.
It seems the most pressing question is how to deal with complicated worlds about which certain proofs cannot be made in reasonable time. A proof system for uncertain proofs, I suppose.
Let's say the agent proves two statements implying different consequences of the same action, such as
[agent()==2 => world2()==1000] and
[agent()==2 => world2()==2000].
Then, it can also prove that (agent()==2) implies a contradiction, which normally can't happen.
Why not? (X implies a contradiction) implies (NOT X); it's a standard reductio ad absurdum proof technique (a derived rule of propositional calculus). Or am I missing something here?
It does mean that [agent()==2] is false, but that allows proving statements like [agent()==2 => world2()==U] for any value of U, which breaks our algorithm. If we use the Goedel-diagonal ("chicken") rule of performing an action whenever it's proven that this action won't be performed, that guarantees that an action can never be proven to be impossible, unless the reasoning system is compromised.
You'd need to figure out a model first. To the extent Parfit's Hitchhiker is like a two-player game, the tools for figuring out what happens are very brittle and hard to work with. To the extent it's like Newcomb's problem, you can model it and get the right answer.
Actually, I think it doesn't depend on that: since you don't update on evidence, you can't incorporate it into your proofs, and so you don't get bad moral arguments.
Ok so the code goes like:
Use the appropriate decision theory to implement a function f from {picks you up, doesn't pick you up} and {world-programs} to {pay, don't pay}
Then set:
world-program():
    if f(picks you up, world-program) = {pay} then
        superfluously call f(picks you up, world-program)
        return 10000
    else
        superfluously call f(doesn't pick you up, world-program)
        return 0
If agent() is actually agent('source of world'), as the classical Newcomb problem has it, I fail to see what is wrong with simply enumerating the possible actions, simulating the 'source of world' with the call of agent('source of world') replaced by the current action candidate, and then returning the action with maximum payoff.
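A minimal Python sketch of this naive simulation approach (the function name and interface are illustrative, not part of the proposal above):

def naive_agent(world_source, actions=(1, 2)):
    # Substitute each candidate action for agent() in the world's source,
    # run the resulting constant program, and return the best candidate.
    def run_with(action):
        env = {"agent": lambda: action}
        exec(world_source, env)  # defines world() with agent() bound to the constant
        return env["world"]()
    return max(actions, key=run_with)

# naive_agent("def world(): return 1000000 if agent() == 1 else 1000")  ->  1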
See world2(). Also, the agent takes no parameters, it just knows the world program it's working with.
The only difference I can see between "an agent which knows the world program it's working with" and "agent('source of world')" is that the latter agent can be more general.
A prior distribution about possible states of the world, which is what you'd want to pass outside of toy-universe examples, is rather clearly part of the agent rather than a parameter.
Yes, in a sense. (Although technically, the agent could know facts about the world program that can't be algorithmically or before-timeout inferred just from the program, and ditto for agent's own program, but that's a fine point.)
The section on "moral arguments" made me say aha. The agent has an axiom that allows it to deduce the statement "agent()==1" from the conjunction.
No need for an additional axiom, it's just the way its program is written, and so is part of the definition of agent().
In fact, one is not able to deduce agent()==1 from that. Doing so would create a contradiction because we would prove
~(agent()==2)
agent()==2 => world()==100000000000000000000000000
agent()==2
This agent cannot PROVE its own action, except possibly at the maximum proof length, as it would create a contradiction. The algorithm is too unpredictable for that.
An external agent with more understanding COULD prove agent()'s action from this.
The formal statement you've written is unclear; certainly, proving that the agent performs the action is not as straightforward as just acting on a moral argument (the agent would need to prove that [it proves the statement before deciding on the action], not just that the statement is true), but I'm not sure there are general barriers against being able to do so (I might be missing something).
Formal statements tend to be unclear when they're really proofs in which one's attempts to place line breaks somehow failed.
The problem is that proving that the agent performs the action proves that the agent does not perform some other action. From this, the agent can prove very high utility given that action. An agent which proves very high utility given an action does not take any other action.
So: An agent that proves that it takes an action, as long as it has enough time left to extend the proof a bit further, does not take that action.
That's basically the reason consequences appear consistent.
If the agent can change its mind, then naturally it can't prove that it won't. I was assuming a situation where the agent is reasoning further after it has already decided on the action, so that further moral arguments are not acted on.
Oh, I see. I thought proving things about agent() and deciding what to do could be the same thing, but I was wrong.
This post explains the sense in which UDT and its descendants can control programs with no parameters, without using explicit control variables.
Related to: Towards a New Decision Theory, What a reduction of "could" could look like.
Usually, a control problem is given by an explicit (functional) dependence of the outcome on control variables (together with a cost function over the possible outcomes). A solution then consists in finding the values of the control variables that lead to the optimal outcome. On the face of it, if we are given no control variables, or no explicit dependence of the outcome on control variables, then the problem is meaningless and cannot be solved.
Consider what is being controlled in UDT and in the model of control described by Vladimir Slepnev. It might be counterintuitive, but in both cases the agent controls constant programs, in other words programs without explicit parameters. And for constant programs, their output is completely determined by their code, nothing else.
Let's take, for example, Vladimir Slepnev's model of Newcomb's problem, written as follows:
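(Sketched here in Python, with agent() taken as a given program; the payoff constants are the standard Newcomb values, and the exact form of the original listing may differ. Action 1 is one-boxing, action 2 is two-boxing.)

def world():
    box1 = 1000
    # Omega's prediction is obtained by calling agent() itself.
    box2 = 1000000 if agent() == 1 else 0
    # The agent's actual choice: take only the second box, or take both.
    return box2 if agent() == 1 else box1 + box2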
The control problem that the agent faces is to optimize the output of the program world(), which has no parameters. It might be tempting to say that there is a parameter, namely the sites where agent() is called in the program, but it's not really so: all these call sites can be replaced with the code of the program agent() (which is also a constant program), at which point no element remains in the program world() that could be called a control variable.
To make this point more explicit, consider the following variant of program world():
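(Again a sketch with the same assumed payoffs; agent2() stands for a separate predictor program, taken as given.)

def world2():
    box1 = 1000
    # The prediction is now computed by agent2(), a different program
    # that happens to compute the same output as agent().
    box2 = 1000000 if agent2() == 1 else 0
    return box2 if agent() == 1 else box1 + box2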
Here, agent2() is a constant program used to predict the agent's decision; it is known to compute the same output as agent(), but does not, generally, resemble agent() in any other way. If we try to consider only the explicit occurrence of agent() as the control variable (either by seeing the explicit program call in this representation of world2(), or by matching the code of agent() if its code were substituted for the call), we'll end up with an incorrect understanding of the situation, in which the agent is expected to control only its own action, but not the prediction computed by agent2().
Against explicit dependence
What the above suggests is that the dependence of the controlled structure on the agent's decision shouldn't be seen as part of the problem statement. Instead, this dependence should be reconstructed, given the definition of the agent and the definition of the controlled structure. Relying on explicit dependence, even when it's given as part of the problem statement, is actually detrimental to the ability to correctly solve the problem. Consider, for example, the third variant of the model of Newcomb's problem, where the agent is told explicitly how its action (decision) is used:
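(A sketch with the same assumed payoffs; the action computed by agent() is now passed to the world program as an explicit parameter.)

def world3(action):
    box1 = 1000
    box2 = 1000000 if agent2() == 1 else 0  # the prediction is still computed by agent2()
    return box2 if action == 1 else box1 + box2

# The outcome to be optimized is world3(agent()).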
Here, agent2() is the predictor, and whatever action the program agent() computes is passed as a parameter to world3(). Note that the problem is identical to the one given by world2(), but you are explicitly tempted to privilege the dependence of the outcome of world3() on the parameter computed by agent() over the dependence on the prediction computed by agent2(), even though both are equally important for correctly solving the problem.
To avoid this explicit dependence bias, we can convert the problem given with an explicit dependence to one without, by "plugging in" all parameters, forgetting about the seams, and posing a problem of restoring all dependencies from scratch (alternatively, of controlling the resulting constant program):
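(In the sketch notation, with the prime spelled out to keep the identifier valid:)

def world3_prime():  # world3'() in the text
    return world3(agent())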
Now, world3'() can be seen to be equivalent to world2(), after the code of world3() is substituted.
Knowable consequences
How can the agent control a constant program? Isn't its output "already" determined? What is its decision about the action for, if the output of the program is already determined?
Note that even if the environment (the controlled constant program) determines its output, it doesn't follow that the agent can figure out what that output is. The agent knows a certain number of logical facts (true statements) and can work on inferring more such statements, but that might not be enough to infer the output of the environment. And every little bit of knowledge helps.
Statements of one kind in particular have an unusual property: even though the agent can't (usually) infer their truth, it can determine their truth any way it likes. These are statements about the agent's decision, such as (agent() == 1). Being able to know their truth by virtue of determining it allows the agent to "infer" the truth of many more statements than it otherwise could, and in particular this could allow inferring something about the output of the environment (conditional on the decision being so and so). Furthermore, determining which way the statements about the agent go allows the agent to indirectly determine which way the output of the environment goes. Of course, the environment already takes into account the actual decision that the agent will arrive at, but the agent normally doesn't know this decision beforehand.
Moral arguments
Let's consider an agent that reasons formally about the output of the environment, and in particular about the output of the environment given possible decisions of the agent. Such an agent produces (proves) statements in a given logical language and theory, and some of these statements are "calls for action": by proving such a statement, the agent is justified in taking the action associated with it.
For example, with programs world2() and agent() above, where agent() is known to output only 1 or 2, one such statement (with the payoffs as in the sketch above) is:
[agent()==1 => world2()==1000000] AND [agent()==2 => world2()==1000]
This statement is a moral argument for deciding (agent()==1). Even though one of the implications in the statement must be vacuously true, by virtue of its antecedent being false, and so can't say anything about the output of the environment, the implication whose antecedent is the actually chosen action is not vacuous, and therefore choosing that action simultaneously decides the output of the environment.
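As a schematic illustration of acting on moral arguments (only a sketch: the hand-supplied list of proven consequences stands in for a real bounded theorem prover, and the names are illustrative):

def decide(proven_consequences):
    # proven_consequences: pairs (a, u) for which the statement
    # [agent()==a => world2()==u] has been proven; act on the moral
    # argument that promises the highest payoff.
    best_action, _ = max(proven_consequences, key=lambda p: p[1])
    return best_action

# With the moral argument above, the agent one-boxes:
print(decide([(1, 1000000), (2, 1000)]))  # -> 1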
Consequences appear consistent
You might also want to re-read Vladimir Slepnev's post on the point that consequences appear consistent. Even though most of the implications in moral arguments are vacuously true (being based on a false premise), the agent can't prove which ones, and correspondingly can't prove a contradiction from their premises. Let's say the agent proves two statements implying different consequences of the same action, such as
[agent()==2 => world2()==1000] and
[agent()==2 => world2()==2000].
Then, it can also prove that (agent()==2) implies a contradiction, which normally can't happen. As a result, consequences of possible actions appear as consistent descriptions of possible worlds, even though they are not. For example, a one-boxing agent in Newcomb's problem can prove
[agent()==2 => world2()==1000],
but (world2()==1000) is actually inconsistent.
This suggests a reduction of the notion of (impossible) possible worlds (counterfactuals) generated by the agent's possible actions that are not taken. Just like the explicit dependencies, such possible worlds can be reconstructed from the definition of the (actual) environment, instead of being explicitly given as part of the problem statement.
Against counterfactuals
Since it turns out that both the dependencies and the counterfactual environments are implicit in the actual environment, they don't need to be specified in the problem statement. Furthermore, if counterfactual environments are not specified, their utility doesn't need to be specified either: it's sufficient to specify the utility of the actual environment.
Instead of valuing counterfactuals, the agent can just value the actual environment, with "value of counterfactuals" appearing as an artifact of the structure of moral arguments.
Ambient control
I call this flavor of control ambient control, and correspondingly the decision theory studying it, ambient decision theory (ADT). This name emphasizes that the agent controls the environment not from any given location, but potentially from anywhere, through the patterns whose properties can be inferred from statements about the agent's decisions. A priori, the agent is nowhere and everywhere, and some of its work consists in figuring out how exactly it exerts its control.
Prior work
In discussions on Timeless decision theory, Eliezer Yudkowsky suggested the idea that the agent's decision could control the environment through more points than the agent's "actual location", which led to the question of finding the right configuration of explicit dependencies, or counterfactual structure, in the problem statement (described by Anna Salamon in this post).
Wei Dai first described a decision-making scheme involving automatically inferring such dependencies (and control of constant programs, hence an instance of ambient control) known as Updateless decision theory. In this scheme, the actual inference of logical consequences was extracted as an unspecified "mathematical intuition module".
Vladimir Slepnev figured out an explicit proof-of-concept algorithm that successfully performs a kind of ambient control to solve the problem of cooperation in the Prisoner's Dilemma. The importance of this algorithm is in showing that the agent can in fact prove the moral arguments necessary to make a decision in this nontrivial game-theoretic problem. He then abstracted the algorithm and discussed some of the more general properties of ambient control.