jessicat comments on FAI Research Constraints and AGI Side Effects - LessWrong

14 Post author: JustinShovelain 03 June 2015 07:25PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (58)

You are viewing a single comment's thread. Show more comments above.

Comment author: [deleted] 05 June 2015 02:27:56PM *  3 points [-]

For example, I'm currently looking at ways you could use probabilistic programs with nested queries to model Vingean reflection.

ZOMFG, can you link to a write-up? This links up almost perfectly with a bit of research I've been wanting to do.

Yes, something like that, although I don't usually think of it as an adversary.

I more meant "adversary" in crypto terms: something that can and will throw behavior at us we don't want unless we formally demonstrate that it can't.

That said, bounded algorithms can be useful as inspiration, even for unbounded problems.

I have a slightly different perspective on the bounded/unbounded issue. Have you ever read Jaynes' Probability Theory? Well, I never got up to the part where he undoes paradoxes, but the way he preached about it sunk in: a paradox will often arise because you passed to the limit too early in your proof or construction. I've also been very impressed by the degree to which resource-rational and bounded-rational models of cognition explain facts about real minds that unbounded models either can't explain at all or write off as "irrational".

To quote myself (because it's applicable here but the full text isn't done):

The key is that AIXI evaluates K(x), the Kolmogorov complexity of each possible Turing-machine program. This function allows a Solomonoff Inducer to perfectly separate the random information in its sensory data from the structural information, yielding an optimal distribution over representations that contain nothing but causal structure. This is incomputable, or requires infinite algorithmic information -- AIXI can update optimally on sensory information by falling back on its infinite computing power.

In my perspective, at least, AIXI is cheating by assuming unbounded computational power, with the result that even the "bounded" and "approximate" AIXI_{tl} runs in "optimal" time modulo an astronomically-large additive constant. So I think that a "bottom-up" theory of bounded-rational reasoning or resource-rational reasoning - one that starts with the assumption we have strictly bounded finite compute-power the same way probability theory assumes we have strictly bounded finite information - will work a lot better to explain how to scale up by "passing to the limit" at the last step.

Which then goes to that research I want to do: I think we could attack logical uncertainty and probabilistic reflection by finding a theory for how to trade finite amounts of compute time for finite amounts of algorithmic information. The structure currently in my imagination is a kind of probability mixed with domain theory: the more computing power you add, the more certain you can become about the results of computations, even if you still have to place some probability mass on \Bot (bottom). In fact, if you find over time that you place more probability mass on \Bot, then you're acquiring a degree of belief that the computation in question won't terminate.

I think this would then mix with probabilistic programming fairly well, and also have immediate applications to assigning rational, well-behaved degrees of belief to "weird" propositions like Goedel Sentences or Halting predicates.

Comment author: jessicat 09 June 2015 06:15:37AM *  2 points [-]

(BTW: here's a writeup of one of my ideas for writing planning queries that you might be interested in)

Often we want a model where the probability of taking action a is proportional to p(a)e^E[U(x, a)], where p is the prior over actions, x consists of some latent variables, and U is the utility function. The straightforward way of doing this fails:

query {
. a ~ p()
. x ~ P(x)
. factor(U(x, a))
}

Note that I'm assuming factor takes a log probability as its argument. This fails due to "wishful thinking": it tends to prefer riskier actions. The problem can be reduced by taking more samples:

query {
. a ~ p()
. us = []
. for i = 1 to n
. . x_i ~ P(x)
. . us.append(U(x_i, a))
. factor(mean(us))
}

This does better, because since we took multiple samples, mean(us) is likely to be somewhat accurate. But how do we know how many samples to take? The exact query we want cannot be expressed with any finite n.

It turns out that we just need to sample n from a Poisson distribution and make some more adjustments:

query {
. a ~ p()
. n ~ Poisson(1)
. for i = 1 to n
. . x_i ~ P(x)
. . factor(log U(x_i, a))
}

Note that U must be non-negative. Why does this work? Consider:

P(a) α p(a) E[e^sum(log U(x_i, a) for i in range(n))]
= p(a) E[prod(U(x_i, a) for i in range(n))]
= p(a) E[ E[prod(U(x_i, a) for i in range(n)) | n] ]
[here use the fact that the terms in the product are independent]
= p(a) E[ E[U(x, a)]^n ]
= p(a) sum(i=0 to infinity) E[U(x, a)]^i / i!
[Taylor series!]
= p(a) e^E[U(x, a)]

Ideally, this technique would help to perform inference in planning models where we can't enumerate all possible states.

Comment author: [deleted] 09 June 2015 11:13:00PM *  0 points [-]

Interesting! How does that compare to the usual implementations of planning as probabilistic inference, as exemplified below?

(query
. (define a (prior))
. (define x (lambda (a) (world a)))
. (define r (flip (sigmoid (U (x a)))))
. a
. (= r #t))
Comment author: jessicat 10 June 2015 02:26:43AM *  2 points [-]

Your model selects an action proportional to p(a) E[sigmoid(U) | a], whereas mine selects an action proportional to p(a) e^E[U | a]. I think the second is better, because it actually treats actions the same if they have the same expected utility. The sigmoid version will not take very high utilities or very low utilities into account much.

Btw it's also possible to select an action proportional to E[U | a]^n:

query {
. a ~ p()
. for i = 1 to n
. . x_i ~ P(x)
. . factor(log U(x, a))
}
Comment author: [deleted] 10 June 2015 11:38:05PM 0 points [-]

Could you explain your syntax here? What probabilistic programming language are you using?

I think the second is better, because it actually treats actions the same if they have the same expected utility.

Well so does the sigmoided version, but you are right that the sigmoid version won't take very high or very low utilities into account. It's meant to shoehorn unbounded utility functions into a framework where one normally works only with random variables.

Comment author: jessicat 13 June 2015 07:00:04AM 0 points [-]

It's not a specific programming language, I guess it's meant to look like Church. It could be written as:

(query
. (define a (p))
. (foreach (range n) (lambda i)
. . (define x (x-prior))
. . (factor (log (U x a)))))

Well so does the sigmoided version

It samples an action proportional to p(a) E[sigmoid(U) | a]. This can't be written as a function of E[U | a].