Comment author: [deleted] 05 June 2015 02:27:56PM *  3 points [-]

For example, I'm currently looking at ways you could use probabilistic programs with nested queries to model Vingean reflection.

ZOMFG, can you link to a write-up? This links up almost perfectly with a bit of research I've been wanting to do.

Yes, something like that, although I don't usually think of it as an adversary.

I more meant "adversary" in crypto terms: something that can and will throw behavior at us we don't want unless we formally demonstrate that it can't.

That said, bounded algorithms can be useful as inspiration, even for unbounded problems.

I have a slightly different perspective on the bounded/unbounded issue. Have you ever read Jaynes' Probability Theory? Well, I never got up to the part where he undoes paradoxes, but the way he preached about it sunk in: a paradox will often arise because you passed to the limit too early in your proof or construction. I've also been very impressed by the degree to which resource-rational and bounded-rational models of cognition explain facts about real minds that unbounded models either can't explain at all or write off as "irrational".

To quote myself (because it's applicable here but the full text isn't done):

The key is that AIXI evaluates K(x), the Kolmogorov complexity of each possible Turing-machine program. This function allows a Solomonoff Inducer to perfectly separate the random information in its sensory data from the structural information, yielding an optimal distribution over representations that contain nothing but causal structure. This is incomputable, or requires infinite algorithmic information -- AIXI can update optimally on sensory information by falling back on its infinite computing power.

In my perspective, at least, AIXI is cheating by assuming unbounded computational power, with the result that even the "bounded" and "approximate" AIXI_{tl} runs in "optimal" time modulo an astronomically-large additive constant. So I think that a "bottom-up" theory of bounded-rational reasoning or resource-rational reasoning - one that starts with the assumption we have strictly bounded finite compute-power the same way probability theory assumes we have strictly bounded finite information - will work a lot better to explain how to scale up by "passing to the limit" at the last step.

Which then goes to that research I want to do: I think we could attack logical uncertainty and probabilistic reflection by finding a theory for how to trade finite amounts of compute time for finite amounts of algorithmic information. The structure currently in my imagination is a kind of probability mixed with domain theory: the more computing power you add, the more certain you can become about the results of computations, even if you still have to place some probability mass on \Bot (bottom). In fact, if you find over time that you place more probability mass on \Bot, then you're acquiring a degree of belief that the computation in question won't terminate.

I think this would then mix with probabilistic programming fairly well, and also have immediate applications to assigning rational, well-behaved degrees of belief to "weird" propositions like Goedel Sentences or Halting predicates.

Comment author: jessicat 09 June 2015 06:15:37AM *  2 points [-]

(BTW: here's a writeup of one of my ideas for writing planning queries that you might be interested in)

Often we want a model where the probability of taking action a is proportional to p(a)e^E[U(x, a)], where p is the prior over actions, x consists of some latent variables, and U is the utility function. The straightforward way of doing this fails:

query {
. a ~ p()
. x ~ P(x)
. factor(U(x, a))
}

Note that I'm assuming factor takes a log probability as its argument. This fails due to "wishful thinking": it tends to prefer riskier actions. The problem can be reduced by taking more samples:

query {
. a ~ p()
. us = []
. for i = 1 to n
. . x_i ~ P(x)
. . us.append(U(x_i, a))
. factor(mean(us))
}

This does better, because since we took multiple samples, mean(us) is likely to be somewhat accurate. But how do we know how many samples to take? The exact query we want cannot be expressed with any finite n.
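To make the "wishful thinking" effect concrete, here is a minimal numeric sketch. The two actions and their payoffs are made up for illustration. `target_weight` is the weight we want, e^E[U], and `averaged_weight` is the weight the n-sample query effectively assigns, E[e^mean(us)], with n = 1 recovering the first query. Both actions have the same expected utility, so any gap between them is pure variance-seeking.

```python
import math
from itertools import product

# Hypothetical actions: each maps to a list of (probability, utility) outcomes.
# Both have expected utility 1.0; "risky" only has more variance.
ACTIONS = {
    "safe": [(1.0, 1.0)],
    "risky": [(0.5, 0.0), (0.5, 2.0)],
}

def expected_utility(outcomes):
    return sum(p * u for p, u in outcomes)

def target_weight(outcomes):
    """Weight we want the query to assign: e^E[U]."""
    return math.exp(expected_utility(outcomes))

def averaged_weight(outcomes, n):
    """Weight from the n-sample query, E[e^mean(U_1..U_n)], computed exactly."""
    total = 0.0
    for combo in product(outcomes, repeat=n):
        prob = math.prod(p for p, _ in combo)
        mean_u = sum(u for _, u in combo) / n
        total += prob * math.exp(mean_u)
    return total

for name, outcomes in ACTIONS.items():
    weights = [round(averaged_weight(outcomes, n), 4) for n in (1, 2, 8)]
    print(name, round(target_weight(outcomes), 4), weights)
```

For the risky action the weight stays above e^E[U] for every finite n (Jensen's inequality) and only shrinks toward it as n grows, which is why no fixed n expresses the exact query.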

It turns out that we just need to sample n from a Poisson distribution and make some more adjustments:

query {
. a ~ p()
. n ~ Poisson(1)
. for i = 1 to n
. . x_i ~ P(x)
. . factor(log U(x_i, a))
}

Note that U must be non-negative. Why does this work? Consider:

P(a) ∝ p(a) E[e^sum(log U(x_i, a) for i in range(n))]
= p(a) E[prod(U(x_i, a) for i in range(n))]
= p(a) E[ E[prod(U(x_i, a) for i in range(n)) | n] ]
[here use the fact that the terms in the product are independent]
= p(a) E[ E[U(x, a)]^n ]
= p(a) sum(i=0 to infinity) e^-1 E[U(x, a)]^i / i!
[n ~ Poisson(1), so P(n = i) = e^-1 / i!; then Taylor series!]
= p(a) e^-1 e^E[U(x, a)]
∝ p(a) e^E[U(x, a)]

Ideally, this technique would help to perform inference in planning models where we can't enumerate all possible states.
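The identity behind the Poisson query can be checked numerically. The sketch below is only a sanity check under stated assumptions: `sample_u` is a hypothetical stand-in for drawing x ~ P(x) and evaluating U(x, a) for one fixed action, and the Poisson draw is hand-rolled so that only the standard library is needed. Both functions should agree with e^(E[U] - 1), i.e. e^E[U] up to a constant factor that does not depend on a.

```python
import math
import random

def poisson_side(mean_u, terms=60):
    """sum over n of Poisson(n; 1) * mean_u**n, truncated after `terms` terms."""
    return sum(math.exp(-1.0) * mean_u**n / math.factorial(n) for n in range(terms))

def simulate_weight(sample_u, trials=200_000, seed=0):
    """Monte Carlo estimate of E[prod_{i<=n} U(x_i, a)] with n ~ Poisson(1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Draw n ~ Poisson(1) by inverting the CDF (capped to guard against rounding).
        n, p, target = 0, math.exp(-1.0), rng.random()
        cdf = p
        while target > cdf and n < 50:
            n += 1
            p /= n
            cdf += p
        # Multiply n independent utility samples; an empty product is 1.
        weight = 1.0
        for _ in range(n):
            weight *= sample_u(rng)
        total += weight
    return total / trials

# Hypothetical non-negative utility with E[U] = 1.5.
estimate = simulate_weight(lambda rng: rng.uniform(1.0, 2.0))
print(estimate, poisson_side(1.5), math.exp(1.5 - 1.0))
```

The per-sample weights are products of utilities, so the estimator's variance grows quickly when U is large; this is a check of the algebra, not a claim about practical inference cost.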

Comment author: jessicat 09 June 2015 06:12:40AM 0 points [-]

ZOMFG, can you link to a write-up? This links up almost perfectly with a bit of research I've been wanting to do.

Well, a write-up doesn't exist because I haven't actually done the math yet :)

But the idea is about algorithms for doing nested queries. There's a planning framework where you take action a proportional to p(a) e^E[U | a]. If one of these actions is "defer to your successor", then the computation of (U | a) is actually another query that samples a different action b proportional to p(b) e^E[U | b]. In this case you can actually just go ahead and convert the resulting nested query to a 1-level query: you can convert a "softmax of softmax" into a regular softmax, if that makes sense.

This isn't doing Vingean reflection, because it's actually doing all the computational work that its successor would have to do. So I'm interested in ways to simplify computationally expensive nested queries into approximate computationally cheap single queries.

Here's a simple example of why I think this might be possible. Suppose I flip a coin to decide whether the SAT problem I generate has a solution or not. Then I run a nested query to generate a SAT problem that either does or does not have a solution (depending on the original coin flip). Then I hand you the problem, and you have to guess whether it has a solution or not. I check your solution using a query to find the solution to the problem.

If you suck at solving SAT problems, your best bet might just be to guess that there's a 50% chance that the problem is solvable. You could get this kind of answer by refactoring the complicated nested nested query model into a non-nested model and then noting that the SAT problem itself gives you very little information about whether it is solvable (subject to your computational constraints).

I'm thinking of figuring out the math here better and then applying it to things like planning queries where your successor has a higher rationality parameter than you (an agent with rationality parameter α takes action a with probability proportional to p(a) e^(α * E[U | a]) ). The goal would be to formalize some agent that, for example, generally chooses to defer to a successor who has a higher rationality parameter, unless there is some cost for deferring, in which case it may defer or not depending on some approximation of value of information.
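As a rough sketch of the rationality-parameter rule, the code below computes the distribution proportional to p(a) e^(α * E[U | a]) exactly for a small, enumerable action set. The action names, utilities, and the `defer_value` helper are illustrative assumptions, not part of any existing framework. `defer_value` treats E[U | defer] as the expected utility of whatever the successor would sample, minus a fixed cost.

```python
import math

def action_distribution(prior, expected_utility, alpha=1.0):
    """P(a) proportional to prior[a] * exp(alpha * expected_utility[a])."""
    # Subtract the largest exponent for numerical stability; it cancels on normalizing.
    shift = max(alpha * expected_utility[a] for a in prior)
    weights = {a: prior[a] * math.exp(alpha * expected_utility[a] - shift) for a in prior}
    z = sum(weights.values())
    return {a: w / z for a, w in weights.items()}

def defer_value(prior, expected_utility, successor_alpha, cost=0.0):
    """E[U | defer]: expected utility of the successor's sampled action, minus a cost."""
    successor = action_distribution(prior, expected_utility, successor_alpha)
    return sum(successor[b] * expected_utility[b] for b in successor) - cost

prior = {"left": 0.5, "right": 0.5}    # hypothetical action prior p(a)
utility = {"left": 0.0, "right": 1.0}  # hypothetical E[U | a]
for alpha in (0.0, 1.0, 5.0):
    print(alpha, action_distribution(prior, utility, alpha))
print(defer_value(prior, utility, successor_alpha=5.0, cost=0.1))
```

Note that this does all of the successor's computation explicitly, so it illustrates the expensive nested query rather than the cheap approximation being sought.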

Your project about trading computing power for algorithmic information seems interesting and potentially related, and I'd be interested in seeing any results you come up with.

even if you still have to place some probability mass on \Bot (bottom)

Is this because you assign probability mass to inconsistent theories that you don't know are inconsistent?

Comment author: [deleted] 05 June 2015 12:05:02AM 0 points [-]

Ah, ok. So you're saying, "Let's do FAI by first assuming we have an incomplete infinity of processing power to apply -- thus assuming the Most Powerful Possible Agent as our 'adversary' to be 'tamed'." Hence the continual use of AIXI?

Comment author: jessicat 05 June 2015 02:16:51AM 2 points [-]

Yes, something like that, although I don't usually think of it as an adversary. Mainly it's so I can ask questions like "how could a FAI model its operator so that it can infer the operator's values from their behavior?" without getting hung up on the exact representation of the model or how the model is found. We don't have any solution to this problem, even if we had access to a probabilistic program induction black box, so it would be silly to impose the additional restriction that we can't give the black box any induction problems that are too hard.

That said, bounded algorithms can be useful as inspiration, even for unbounded problems. For example, I'm currently looking at ways you could use probabilistic programs with nested queries to model Vingean reflection.

Comment author: [deleted] 04 June 2015 03:12:38PM 0 points [-]

This is often what MIRI's "unbounded solutions" research is about: finding ways you could solve FAI if you had a hypercomputer.

Sorry to criticize out of the blue, but I think that's a very bad idea. To wit, "Assume a contradiction, prove False, and ex falso quodlibet." If you start by assuming a hypercomputer and reason mathematically from there, I think you'll mostly derive paradox theorems and contradictions.

Comment author: jessicat 04 June 2015 11:34:35PM 5 points [-]

I should be specific that the kinds of results we want to get are those where you could, in principle, use a very powerful computer instead of a hypercomputer. Roughly, the unbounded algorithm should be a limit of bounded algorithms. The kinds of allowed operations I am thinking about include:

  • Solomonoff induction
  • optimizing an arbitrary function
  • evaluating an arbitrary probabilistic program
  • finding a proof of X if one exists
  • solving an infinite system of equations that is guaranteed to have a solution

In all these cases, you can get arbitrarily good approximations using bounded algorithms, although they might require a very large amount of computation power. I don't think things like this would lead to contradictions if you did them correctly.
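A toy sketch of "the unbounded algorithm should be a limit of bounded algorithms", using witness search as a stand-in for "finding a proof of X if one exists". The function names and the arithmetic example are hypothetical; a real proof search would enumerate candidate proofs and run a proof checker in place of `is_witness`.

```python
from itertools import count, islice

def bounded_search(is_witness, candidates, budget):
    """Check at most `budget` candidates; return a witness, or None if none was found yet."""
    for candidate in islice(candidates, budget):
        if is_witness(candidate):
            return candidate
    return None

def search_in_the_limit(is_witness, make_candidates, budgets):
    """Yield (budget, result) for increasing budgets; the unbounded search is their limit."""
    for budget in budgets:
        yield budget, bounded_search(is_witness, make_candidates(), budget)

# Hypothetical "theorem": some natural number squares to 1369.
def is_root(n):
    return n * n == 1369

for budget, result in search_in_the_limit(is_root, lambda: count(0), (10, 100, 1000)):
    print(budget, result)
```

If a witness exists, every sufficiently large budget finds it; if none exists, each bounded run just returns None, which matches the point that the hypercomputer version is only ever approached, never required.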

Comment author: ozziegooen 04 June 2015 05:19:54AM *  3 points [-]

[Edited: replaced Gremaining with Fremaining, which is what I originally meant]

Thanks for the comment jessicat! I haven't read those posts yet, will do more research on reducing FAI to an AGI problem.

A few responses & clarifications:

Our framework assumes the FAI research would happen before AGI creation. If we can research how to reduce FAI to an AGI problem in a way that would reliably make a future AGI friendly, then that amount of research would be our variable Fremaining. If that is quite easy to do, then that's fantastic; an AI venture would have an easy time, and the leakage ratio would be low enough to not have to worry about. Additional required capabilities that we'll find out we need would be added to Fremaining.

"I think the post fails to accurately model these difficulties." -> This post doesn't attempt to model the individual challenges to understand how large Fremaining actually is. That's probably a more important question than what we addressed, but one for a different model.

"The right answer here is to get AGI researchers to develop (and not publish anything about) enough AGI capabilities for FAI without running a UFAI in the meantime, even though the capabilities to run it exist." -> This paper definitely advocates for AGI researchers to develop FAI research while not publishing much AGI research. I agree that some internal AGI research will probably be necessary, but hope that it won't be a whole lot. If the tools to create an AGI were figured out, even if they were kept secret by an FAI research group, I would be very scared. Those would be the most important and dangerous secrets of all time, and I doubt they could be kept secret for very long (20 years max?)

"In this case, the model in the post seems to be mostly accurate, except that it neglects the fact that serial advances might be important (so we get diminishing marginal progress towards FAI or AGI per additional researcher in a given year)."

-> This paper purposefully didn't model research effort, but rather, abstract units of research significance. "the numbers of rg and rf don't perfectly correlate with the difficulty to reach them. It may be that we have diminishing marginal returns with our current levels of rg, so similar levels of rf will be easier to reach."

A model that would also take into account the effort required would require a few more assumptions and additional complexity. I prefer to start simple and work from there, so we at least know what people do agree on before adding additional complexity.

Comment author: jessicat 04 June 2015 06:01:12AM 3 points [-]

Thanks for the detailed response! I do think the framework can still work with my assumptions. The way I would model it would be something like:

  1. In the first stage, we have G->Fremaining (the research to an AGI->FAI solution) and Gremaining (the research to enough AGI for UFAI). I expect G->Fremaining < Gremaining, and a relatively low leakage ratio.
  2. After we have AGI->FAI, we have Fremaining (the research for the AGI to input to the AGI->FAI) and Gremaining (the research to enough AGI for UFAI). I expect Fremaining > Gremaining, and furthermore I expect the leakage ratio to be high enough that we are practically guaranteed to have enough AGI capabilities for UFAI before FAI (though I don't know how long before). Hence the strategic importance of developing AGI capabilities in secret, and not having them lying around for too long in too many hands. I don't really see a way of avoiding this: the alternative is to have enough research to create FAI but not a paperclip maximizer, which seems implausible (though it would be really nice if we could get this state!).

Also, it seems I had misinterpreted the part about rg and rf, sorry about that!

Comment author: jessicat 03 June 2015 10:02:59PM *  13 points [-]

This model seems quite a bit different from mine, which is that FAI research is about reducing FAI to an AGI problem, and solving AGI takes more work than doing this reduction.

More concretely, consider a proposal such as Paul's reflective automated philosophy method, which might be able to be implemented using episodic reinforcement learning. This proposal has problems, and it's not clear that it works -- but if it did, then it would have reduced FAI to a reinforcement learning problem. Presumably, any implementations of this proposal would benefit from any reinforcement learning advances in the AGI field.

Of course, even if a proposal like this works, it might require better or different AGI capabilities from UFAI projects. I expect this to be true for black-box FAI solutions such as Paul's. This presents additional strategic difficulties. However, I think the post fails to accurately model these difficulties. The right answer here is to get AGI researchers to develop (and not publish anything about) enough AGI capabilities for FAI without running a UFAI in the meantime, even though the capabilities to run it exist.

Assuming that this reflective automated philosophy system doesn't work, it could still be the case that there is a different reduction from FAI to AGI that can be created through armchair technical philosophy. This is often what MIRI's "unbounded solutions" research is about: finding ways you could solve FAI if you had a hypercomputer. Once you find a solution like this, it might be possible to define it in terms of AGI capabilities instead of hypercomputation, and at that point FAI would be reduced to an AGI problem. We haven't put enough work into this problem to know that a reduction couldn't be created in, say, 20 years by 20 highly competent mathematician-philosophers.

In the most pessimistic case (which I don't think is too likely), the task of reducing FAI to an AGI problem is significantly harder than creating AGI. In this case, the model in the post seems to be mostly accurate, except that it neglects the fact that serial advances might be important (so we get diminishing marginal progress towards FAI or AGI per additional researcher in a given year).

Comment author: [deleted] 07 May 2015 03:06:06PM 2 points [-]

I briefly skimmed through the McClelland chapter and it seems to mesh well with my understanding of probabilistic programming.

I think it would not go amiss to read Vikash Mansinghka's PhD thesis and the open-world generation paper to see a helpful probabilistic programming approach to these issues. In summary: we can use probabilistic programming to learn the models we need, use conditioning/query to condition the models on the constraints we intend to enforce, and then sample the resulting distributions to generate "actions" which are very likely to be "good enough" and very unlikely to be "bad". We sample instead of inferring the maximum-a-posteriori action or expected action precisely because as part of the Bayesian modelling process we assume that the peak of our probability density does not necessarily correspond to an in-the-world optimum.
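A minimal sketch of the "condition, then sample" step, using plain rejection sampling as a simple stand-in for a query operator. Everything here is assumed for illustration: the learned model is replaced by a fixed proposal distribution, and `propose`, `acceptable`, and the dose example are hypothetical names, not anything from the thesis or paper mentioned above.

```python
import random

def sample_action(propose, acceptable, rng, max_tries=10_000):
    """Rejection sampling: draw from the proposal until the constraint holds."""
    for _ in range(max_tries):
        action = propose(rng)
        if acceptable(action):
            return action
    raise RuntimeError("no acceptable action found within max_tries")

# Hypothetical example: pick a dose uniformly, conditioned on it lying in a safe band.
def propose_dose(rng):
    return rng.uniform(0.0, 10.0)

def dose_is_safe(dose):
    return 2.0 <= dose <= 4.0

rng = random.Random(0)
print([round(sample_action(propose_dose, dose_is_safe, rng), 3) for _ in range(5)])
```

The returned actions are spread across the acceptable region rather than piled on a single mode, which is the satisficing behavior the paragraph above argues for.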

Comment author: jessicat 07 May 2015 05:24:39PM *  1 point [-]

I agree that choosing an action randomly (with higher probability for good actions) is a good way to create a fuzzy satisficer. Do you have any insights into how to:

  1. create queries for planning that don't suffer from "wishful thinking", with or without nested queries. Basically the problem is that if I want an action conditioned on receiving a high utility (e.g. we have a factor on the expected utility node U equal to e^(alpha * U) ), then we are likely to choose high-variance actions while inferring that the rest of the model works out such that these actions return high utilities

  2. extend this to sequential planning without nested nested nested nested nested nested queries

Comment author: [deleted] 07 May 2015 03:15:33PM 1 point [-]

I don't have a good understanding of multi-level maps; we can definitely see them as useful constructs for bounded reasoners

Well, all real reasoners are bounded reasoners. If you just don't care about computational time bounds, you can run the Ordered Optimal Problem Solver as the initial input program to a Goedel Machine, and out pops your AI (in 200 trillion years, of course)!

it seems difficult to integrate higher levels into the goal system without deciding things about the high-level map a priori so you can define goals relative to this.

I would tend to say that you should be training a conceptual map of the world before you install anything like action-taking capability or a goal system of any kind. Of course, I also tend to say that you should just use a debugged (ie: cured of systematic faults) model of human evaluative processes for your goal system, and then use actual human evaluations to train the free parameters, and then set up learning feedback from the learned concept of "human" to the free-parameter space of the evaluation model.

Comment author: jessicat 07 May 2015 05:18:14PM 3 points [-]

I would tend to say that you should be training a conceptual map of the world before you install anything like action-taking capability or a goal system of any kind.

This seems like a sane thing to do. If this didn't work, it would probably be because either

  1. lack of conceptual convergence and human understandability; this seems somewhat likely and is probably the most important unknown

  2. our conceptual representations are only efficient for talking about things we care about because we care about these things; a "neutral" standard such as resource-bounded Solomonoff induction will horribly learn things we care about for "no free lunch" reasons. I find this plausible but not too likely (it seems like it ought to be possible to "bootstrap" an importance metric for deciding where in the concept space to allocate resources).

  3. we need the system to have a goal system in order to self-improve to the point of creating this conceptual map. I find this a little likely (this is basically the question of whether we can create something that manages to self-improve without needing goals; it is related to low impact).

Of course, I also tend to say that you should just use a debugged (ie: cured of systematic faults) model of human evaluative processes for your goal system, and then use actual human evaluations to train the free parameters, and then set up learning feedback from the learned concept of "human" to the free-parameter space of the evaluation model.

I agree that this is a good idea. It seems like the main problem here is that we need some sort of "skeleton" of a normative human model whose parts can be filled in empirically, and which will infer the right goals after enough training.

Comment author: [deleted] 06 May 2015 02:35:12PM *  3 points [-]

We can do something like list a bunch of examples, have humans label them, and then find the lowest Kolmogorov complexity concept that agrees with human judgments in, say, 90% of cases.

Regularization is already a part of training any good classifier.

I'm not sure if this is what you mean by "normatively correct", but it seems like a plausible concept that multiple concept learning algorithms might converge on.

Roughly speaking, I mean optimizing for the causal-predictive success of a generative model, given not only a training set but a "level of abstraction" (something like tagging the training features with lower-level concepts, type-checking for feature data) and a "context" (ie: which assumptions are being conditioned-on when learning the model).

Again, roughly speaking, humans tend to make pretty blatant categorization errors (ie: magical categories, non-natural hypotheses, etc.), but we also are doing causal modelling in the first place, so we accept fully-naturalized causal models as the correct way to handle concepts. However, we also handle reality on multiple levels of abstraction: we can think in chairs and raw materials and chemical treatments and molecular physics, all of which are entirely real. For something like FAI, I want a concept-learning algorithm that will look at the world in this naturalized, causal way (which is what normal modelling shoots for!), and that will model correctly at any level of abstraction or under any available set of features, and will be able to map between these levels as the human mind can.

Basically, I want my "FAI" to be built out of algorithms that can dissolve questions and do other forms of conceptual analysis without turning Straw Vulcan and saying, "Because 'goodness' dissolves into these other things when I naturalize it, it can't be real!". Because once I get that kind of conceptual understanding, it really does get a lot closer to being a problem of just telling the agent to optimize for "goodness" and trusting its conceptual inference to work out what I mean by that.

Sorry for rambling, but I think I need to do more cog-sci reading to clarify my own thoughts here.

Comment author: jessicat 07 May 2015 08:36:16AM *  2 points [-]

Regularization is already a part of training any good classifier.

A technical point here: we don't learn a raw classifier, because that would just learn human judgments. In order to allow the system to disagree with a human, we need to use some metric other than "is simple and assigns high probability to human judgments".

For something like FAI, I want a concept-learning algorithm that will look at the world in this naturalized, causal way (which is what normal modelling shoots for!), and that will model correctly at any level of abstraction or under any available set of features, and will be able to map between these levels as the human mind can.

I totally agree that a good understanding of multi-level models is important for understanding FAI concept spaces. I don't have a good understanding of multi-level maps; we can definitely see them as useful constructs for bounded reasoners, but it seems difficult to integrate higher levels into the goal system without deciding things about the high-level map a priori so you can define goals relative to this.

Comment author: Richard_Loosemore 06 May 2015 04:08:48PM 7 points [-]

With all of the above in mind, a quick survey of some of the things that you just said, with my explanation for why each one would not (or probably would not) be as much of an issue as you think:

As humans, we have a good idea of what "giving choices to people" vs. "forcing them to do something" looks like. This concept would need to resolve some edge cases, such as putting psychological manipulation in the "forceful" category (even though it can be done with only text).

For a massive-weak-constraint system, psychological manipulation would be automatically understood to be in the forceful category, because the concept of "psychological manipulation" is defined by a cluster of features that involve intentional deception, and since the "friendliness" concept would ALSO involve a cluster of weak constraints, it would include the extended idea of intentional deception. It would have to, because intentional deception is connected to doing harm, which is connected with unfriendly, etc.

Conclusion: that is not really an "edge" case in the sense that someone has to explicitly remember to deal with it.

Very likely, the concept space will be very complicated and difficult for humans to understand.

We will not need to 'understand' the AGI's concept space too much, if we are both using massive weak constraints, with convergent semantics. This point I addressed in more detail already.

This seems pretty similar to Paul's idea of a black-box human in the counterfactual loop. I think this is probably a good idea, but the two problems here are (1) setting up this (possibly counterfactual) interaction in a way that it approves a large class of good plans and rejects almost all bad plans (see the next section), and (2) having a good way to predict the outcome of this interaction usually without actually performing it. While we could say that (2) will be solved by virtue of the superintelligence being a superintelligence, in practice we'll probably get AGI before we get uploads, so we'll need some sort of semi-reliable way to predict humans without actually simulating them. Additionally, the AI might need to self-improve to be anywhere near smart enough to consider this complex hypothetical, and so we'll need some kind of low-impact self-improvement system. Again, I think this is probably a good idea, but there are quite a lot of issues with it, and we might need to do something different in practice. Paul has written about problems with black-box approaches based on predicting counterfactual humans here and here. I think it's a good idea to develop both black-box solutions and white-box solutions, so we are not over-reliant on the assumptions involved in one or the other.

What you are talking about here is the idea of simulating a human to predict their response. Now, humans already do this in a massive way, and they do not do it by making gigantic simulations, but just by doing simple modeling. And, crucially, they rely on the massive-weak-constraints-with-convergent-semantics (you can see now why I need to coin the concise term "Swarm Relaxation") between the self and other minds to keep the problem manageable.

That particular idea - of predicting human response - was not critical to the argument that followed, however.

What language will people's questions about the plans be in? If it's a natural language, then the AI must be able to translate its concept space into the human concept space, and we have to solve a FAI-complete problem to do this.

No, we would not have to solve a FAI-complete problem to do it. We will be developing the AGI from a baby state up to adulthood, keeping its motivation system in sync all the way up, and looking for deviations. So, in other words, we would not need to FIRST build the AGI (with potentially dangerous alien semantics), THEN do a translation between the two semantic systems, THEN go back and use the translation to reconstruct the motivation system of the AGI to make sure it is safe.

Much more could be said about the process of "growing" and "monitoring" the AGI during the development period, but suffice it to say that this process is extremely different if you have a Swarm Relaxation system vs. a logical system of the sort your words imply.

We should also definitely be wary of a decision rule of the form "find a plan that, if explained to humans, would cause humans to say they understand it".

This hits the nail on the head. This comes under the heading of a strong constraint, or a point-source failure mode. The motivation system of a Swarm Relaxation system would not contain "decision rules" of that sort, precisely because they could have large, divergent effects on the behavior. If motivation is, instead, governed by large numbers of weak constraints, then in this case your decision rule would be seen to be a type of deliberate deception, or manipulation, of the humans. And that contradicts a vast array of constraints that are consistent with friendliness.

Again, it's quite plausible that the AI's concept space will contain some kind of concept that distinguishes between these different types of optimization; however, humans will need to understand the AI's concept space in order to pinpoint this concept so it can be integrated into the AI's decision rule.

Same as previous: with a design that does not use decision rules that are prone to point-source failure modes, the issue evaporates.

To summarize: much depends on an understanding of the concept of a weak constraint system. There are no really good readings I can send you (I know I should write one), but you can take a look at the introductory chapter of McClelland and Rumelhart that I gave in the references to the paper.

Also, there is a more recent reference to this concept, from an unexpected source. Yann LeCun has been giving some lectures on Deep Learning in which he came up with a phrase that could have been used two decades ago to describe exactly the sort of behavior to be expected from SA systems. He titles his lecture "The Unreasonable Effectiveness of Deep Learning". That is a wonderful way to express it: swarm relaxation systems do not have to work (there really is no math that can tell you that they should be as good as they are), but they do. They are "unreasonably effective".

There is a very deep truth buried in that phrase, and a lot of what I have to say about SA is encapsulated in it.

Comment author: jessicat 07 May 2015 08:27:40AM *  4 points [-]

Okay, thanks a lot for the detailed response. I'll explain a bit about where I'm coming from with understanding the concept learning problem:

  • I typically think of concepts as probabilistic programs eventually bottoming out in sense data. So we have some "language" with a "library" of concepts (probabilistic generative models) that can be combined to create new concepts, and combinations of concepts are used to explain complex sensory data (for example, we might compose different generative models at different levels to explain a picture of a scene). We can (in theory) use probabilistic program induction to have uncertainty about how different concepts are combined. This seems like a type of swarm relaxation, due to probabilistic constraints being fuzzy. I briefly skimmed through the McClelland chapter and it seems to mesh well with my understanding of probabilistic programming.
  • But, when thinking about how to create friendly AI, I typically use the very conservative assumptions of statistical learning theory, which give us guarantees against certain kinds of overfitting but no guarantee of proper behavior on novel edge cases. Statistical learning theory is certainly too pessimistic, but there isn't any less pessimistic model for what concepts we expect to learn that I trust. While the view of concepts as probabilistic programs in the previous bullet point implies properties of the system other than those implied by statistical learning theory, I don't actually have good formal models of these, so I end up using statistical learning theory.

I do think that figuring out if we can get more optimistic (but still justified) assumptions is good. You mention empirical experience with swarm relaxation as a possible way of gaining confidence that it is learning concepts correctly. Now that I think about it, bad handling of novel edge cases might be a form of "meta-overfitting", and perhaps we can gain confidence in a system's ability to deal with context shifts by having it go through a series of context shifts well without overfitting. This is the sort of thing that might work, and more research into whether it does is valuable, but it still seems worth preparing for the case where it doesn't.

Anyway, thanks for giving me some good things to think about. I think I see how a lot of our disagreements mostly come down to how much convergence we expect from different concept learning systems. For example, if "psychological manipulation" is in some sense a natural category, then of course it can be added as a weak (or even strong) constraint on the system.
I'll probably think about this a lot more and eventually write up something explaining reasons why we might or might not expect to get convergent concepts from different systems, and the degree to which this changes based on how value-laden a concept is.

There is a lot of talk that can be given about how that complex union takes place, but here is one very important takeaway: it can always be made to happen in such a way that there will not, in the future, be any Gotcha cases (those where you thought you did completely merge the two concepts, but where you suddenly find a peculiar situation where you got it disastrously wrong). The reason why you won't get any Gotcha cases is that the concepts are defined by large numbers of weak constraints, and no strong constraints -- in such systems, the effect of smaller and smaller numbers of concepts can be guaranteed to converge to zero. (This happens for the same reason that the effect of smaller and smaller sub-populations of the molecules in a gas will converge to zero as the population sizes go to zero).

I didn't really understand a lot of what you said here. My current model is something like "if a concept is defined by lots of weak constraints, then lots of these constraints have to go wrong at once for the concept to go wrong, and we think this is unlikely due to induction and some kind of independence/uncorrelatedness assumption"; is this correct? If this is the right understanding, I think I have low confidence that errors in each weak constraint are in fact not strongly correlated with each other.
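To make the independence worry concrete, here is a minimal sketch in Python. It is a toy model I am assuming for illustration, not anything specified in the paper or in this thread: a concept "goes wrong" when more than half of its n weak constraints fail, each constraint fails with probability p, and (in the correlated variant) a hypothetical shared cause breaks every constraint at once with probability q. The function names and the values p = 0.2, q = 0.01 are made up.

```python
from math import comb


def majority_failure_prob(n, p):
    """P(more than half of n constraints fail), assuming each constraint
    fails independently with probability p (exact binomial tail)."""
    threshold = n // 2 + 1  # smallest number of failures that is a majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(threshold, n + 1)
    )


def majority_failure_prob_common_shock(n, p, q):
    """Same quantity under an assumed 'common shock' model: with probability q
    a shared cause breaks every constraint at once; otherwise the constraints
    fail independently with probability p."""
    return q + (1 - q) * majority_failure_prob(n, p)


if __name__ == "__main__":
    for n in (1, 11, 101):
        independent = majority_failure_prob(n, 0.2)
        correlated = majority_failure_prob_common_shock(n, 0.2, 0.01)
        print(f"n={n:4d}  independent={independent:.3e}  correlated={correlated:.3e}")
```

Under the independent model the failure probability shrinks rapidly as n grows, which is the convergence behavior being claimed. Under the common-shock model it can never drop below q, however many weak constraints are added, which is why the degree of correlation between constraint errors matters so much here.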
