
[epistemic status: maybe wrong; thinking aloud, would like people to yell at me]

There is a repeated motion that occurs when deciding what an AI should do:

(1) Create a decision theory

(2) Create a thought experiment in which an agent with *DT makes a choice which fails to fulfill its utility function (e.g. Oh no! It loses all its money to blackmail!)

(3) Create a new DT which does well on problems that share the core difficulty which caused the previous decision theory to lose all its money

Suppose decision theories are specified as precisely as mathematical structures. Then for every two distinct decision theories, there exists in mathematical reality a set of "thought experiments" on which the two theories decide differently.

This seems weird and difficult now because there is no shared logical notation across different "thought experiments". For now, characterizing the class of decision problems that split two decision theories is pretheoretic. Still, for every pair of decision theories DT_1 and DT_2, the object split(DT_1, DT_2) actually exists: it is the class of choice problems on which the two DTs give different answers. What we lack is a notation in which to characterize it simply and completely.
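To make split(DT_1, DT_2) concrete, here is a minimal Python sketch over a tiny, finite set of problems. Everything in it is an assumption made for illustration: the `Problem` class, the `toy_cdt` and `toy_edt` stand-ins, and the hand-assigned expected utilities (a 99% accurate predictor for the evidential numbers, a 50% prior that the box is full for the causal ones). It dodges the hard part, which is a shared notation for arbitrary decision problems, by writing both evaluations into each problem by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Problem:
    """A toy decision problem with hand-assigned expected utilities."""
    name: str
    causal_eu: Dict[str, float]      # action -> EU with the prediction held fixed
    evidential_eu: Dict[str, float]  # action -> EU conditioning on the action


# Here a "decision theory" is just a function from a problem to an action.
DecisionTheory = Callable[[Problem], str]


def toy_cdt(p: Problem) -> str:
    """Stand-in for CDT: pick the action with the highest causal EU."""
    return max(p.causal_eu, key=p.causal_eu.get)


def toy_edt(p: Problem) -> str:
    """Stand-in for EDT: pick the action with the highest evidential EU."""
    return max(p.evidential_eu, key=p.evidential_eu.get)


def split(dt_1: DecisionTheory, dt_2: DecisionTheory,
          problems: List[Problem]) -> List[Problem]:
    """Return the problems on which the two theories choose differently."""
    return [p for p in problems if dt_1(p) != dt_2(p)]


PROBLEMS = [
    Problem(
        "newcomb",
        # Assumed 50% prior that the opaque box holds $1,000,000.
        causal_eu={"one-box": 500_000.0, "two-box": 501_000.0},
        # Assumed 99% accurate predictor.
        evidential_eu={"one-box": 990_000.0, "two-box": 11_000.0},
    ),
    Problem(
        "free_money",
        causal_eu={"take": 10.0, "leave": 0.0},
        evidential_eu={"take": 10.0, "leave": 0.0},
    ),
]

print([p.name for p in split(toy_cdt, toy_edt, PROBLEMS)])  # ['newcomb']
```

The real object would range over all decision problems rather than a hand-picked list, which is exactly where the missing shared notation bites.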

But it feels like this sort of problem has a status similar to the one “algorithms” had before the first Universal Turing Machine was constructed.

-

Questions:

In fun games (like the Prisoner’s Dilemma) we have agents (like FairBot) that fight each other. The source code for these agents is entangled with their decision theory. Does examining bots engaged in modal combat make this problem more tractable?

This process repeats like clockwork (it feels like a new decision theory comes out every year or so?), with researchers hoping to give their baby AI a good way of making good choices and not losing all its money. What if I built an AI that formalized and internalized this process and just ... gave itself good advice? Within logical inductors, traders bet on the truth of logical sentences, and traders which make bad bets lose their money. If we can formalize split(DT_1, DT_2), we can look at how well agents fulfill their utility functions in this space. Can we use this to establish a kind of poset of decision theories?
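One hedged way to picture the poset question: assume we are simply handed a table of the utility each decision theory achieves on each problem, and order theories by pointwise dominance. The names `DT_A`, `DT_B`, `DT_C`, the problem labels, and every number below are made up for illustration. Deciding which utility counts as "achieved" on a given problem is assumed away here, and it is arguably the hard part.

```python
from typing import Dict

# scores[dt_name][problem_name] = utility that theory achieves on that problem
Scores = Dict[str, Dict[str, float]]


def weakly_dominates(scores: Scores, a: str, b: str) -> bool:
    """True if theory a does at least as well as theory b on every problem."""
    return all(scores[a][p] >= scores[b][p] for p in scores[a])


def comparable(scores: Scores, a: str, b: str) -> bool:
    """True if one of the two theories weakly dominates the other."""
    return weakly_dominates(scores, a, b) or weakly_dominates(scores, b, a)


# Hypothetical utilities, chosen only to show one comparable and one
# incomparable pair.
SCORES: Scores = {
    "DT_A": {"problem_1": 10.0, "problem_2": 5.0},
    "DT_B": {"problem_1": 10.0, "problem_2": 3.0},
    "DT_C": {"problem_1": 2.0, "problem_2": 9.0},
}

print(weakly_dominates(SCORES, "DT_A", "DT_B"))  # True
print(comparable(SCORES, "DT_A", "DT_C"))        # False
```

Pointwise dominance is reflexive and transitive, so it gives a preorder; it becomes a partial order once theories with identical scores on every problem are identified. Incomparable pairs like `DT_A` and `DT_C` are what make it partial rather than total.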


Interesting post! :)

I think the process is hard to formalize because specifying step 2 seems to require specifying a decision theory almost directly. Recall that causal decision theorists argue that two-boxing is the right choice in Newcomb’s problem. Similarly, some would argue that not giving the money in counterfactual mugging is the right choice from the perspective of the agent who already knows that it lost, whereas others argue for the opposite. Or take a look at the comments on the Two-Boxing Gene. Generally, the kind of decision problems that put decision theories to a serious test also tend to be ones in which it is non-obvious what the right choice is. The same applies to meta-principles. Perhaps people agree with the vNM axioms, but desiderata that could shed light on Newcomblike problems appear to be more controversial. For example, irrelevance of impossible outcomes and reflective stability both seem desirable but actually contradict each other.

TL;DR: It seems to be really hard to specify what it means for a decision procedure to "win"/fail in a given thought experiment.

I agree! I think that it is hard for humans working with current syntactic machinery to specify things like:

* what their decision theory will return for every decision problem

* what split(DT_1, DT_2) looks like

Right now I think doing this requires putting all decision theories on a useful shared ontology, the way that UTMs put all computable algorithms on a useful shared ontology, which allowed people to make proofs about algorithms in general. This looks hard and possibly requires creating new kinds of math.

I am making the assumption here that the decision theories are rescued to the point of being executable philosophy. DTs need to be specified this much to be run by an AI. I believe that the fuzzy concepts inside people's heads about how to make decisions can in principle be made to work mathematically and then run on a computer, in much the same way that the fuzzy concept of "addition" was ported to symbolic representations and then to circuits in a pocket calculator.
