Comment author: jsteinhardt 25 October 2010 10:18:53PM 2 points

Model selection is definitely one of the biggest conceptual problems in GAI right now (I would say that planning once you have a model is of comparable importance / difficulty). I think the way to solve this sort of problem is by having humans carefully pick a really good model (flexible enough to capture even unexpected situations while still structured enough to make useful predictions). Even with SVMs you are implicitly assuming some sort of structure on the data, because you usually transform your inputs into some higher-dimensional space consisting of what you see as useful features in the data.

Even though picking the model is the hard part, using Bayes by default seems like a good idea because it is the only general method I know of for combining all of my assumptions without having to make additional arbitrary choices about how everything should fit together. If there are other methods, I would be interested in learning about them.

What would the "really good model" for a GAI look like? Ideally it should capture our intuitive notions of what sorts of things go on in the world without imposing constraints that we don't want. Examples of these intuitions: superficially similar objects tend to come from the same generative process (so if A and B are similar in ways X and Y, and C is similar to both A and B in way X, then we would expect C to be similar to A and B in way Y, as well); temporal locality and spatial locality underlie many types of causality (so if we are trying to infer an input-output relationship, it should be highly correlated over inputs that are close in space/time); and as a more concrete example, linear momentum tends to persist over short time scales. A lot of work has been done in the past decade on formalizing such intuitions, leading to nonparametric models such as Dirichlet processes and Gaussian processes. See for instance David Blei's class on Bayesian nonparametrics (http://www.cs.princeton.edu/courses/archive/fall07/cos597C/index.html) or Michael Jordan's tutorial on Dirichlet processes (http://www.cs.berkeley.edu/~jordan/papers/pearl-festschrift.pdf).
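A minimal sketch of the locality intuition above, assuming a squared-exponential (RBF) covariance, which is one common choice for Gaussian processes and not something specified in this comment. The function names, the length scale and the noise level are illustrative assumptions. With a single noisy observation and a zero prior mean, the posterior mean at a new input is the observed value scaled by how strongly the two inputs are correlated, so nearby inputs get predictions close to the observation and distant ones fall back towards zero.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential covariance: near 1 for close inputs, near 0 for distant ones."""
    return math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))

def posterior_mean_one_point(x_obs, y_obs, x_new, length_scale=1.0, noise_var=0.01):
    """GP posterior mean at x_new given one noisy observation and a zero prior mean."""
    k_new_obs = rbf_kernel(x_new, x_obs, length_scale)
    k_obs_obs = rbf_kernel(x_obs, x_obs, length_scale) + noise_var
    return k_new_obs / k_obs_obs * y_obs

# One observation y = 2.0 at x = 0.0 (hypothetical numbers).
near = posterior_mean_one_point(0.0, 2.0, 0.1)  # close input: stays near the observation
far = posterior_mean_one_point(0.0, 2.0, 5.0)   # distant input: reverts to the prior mean
```

The length scale is the place where a human modeller encodes how quickly similarity should decay with distance, which is the kind of structural assumption the comment is describing.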

I'm beginning to think that a top-level post on how Bayes is actually used in machine learning would be helpful. Perhaps I will make one when I have a bit more time. Also, does anyone happen to know how to collapse URLs in posts (e.g. the equivalent of <a href=...>test </a> in HTML)?

Comment author: JohnDavidBustard 26 October 2010 02:50:41PM 1 point

A high level post on its use would be very interesting.

I think my main criticism of the Bayes approach is that it leads to the kind of work you are suggesting, i.e. having a person construct a model and then having a machine calculate its parameters.

I think that much of what we value in intelligent people is their ability to form the model themselves. By focusing on parameter updating we aren't developing the AI techniques necessary for intelligent behavior. In addition, because correct updating does not guarantee good performance (the model's properties dominate), we will always have to judge methods based on experimental results.

Because we always come back to experimental results, whatever general AI strategy we develop, its structure is more likely to be one that searches for new ways to learn (with Bayesian model updating and SVMs as examples) and validates these strategies using experimental data (replicating the behaviour of the AI field as a whole).

I find it useful to think about how people solve problems and examine the huge gulf between specific learning techniques and these approaches. For example, to replicate a Bayesian AI researcher, an AI needs to take a small amount of data and an incomplete informal model of the process that generates it (e.g. based on informal metaphors of physical processes the author is familiar with), and then find a way of formalising this informal model (so that its behaviour under all conditions can be calculated), possibly doing some theorem proving to investigate properties of the model. It then applies potentially standard techniques to determine the model's parameters and judges its worth based on experiment (potentially repeating the whole process if it doesn't work).

By focusing on Bayesian approaches we aren't developing techniques that can replicate these kinds of lateral and creative thinking behaviour. Saying there is only one valid form of inference is absurd because it doesn't address these problems.

I feel that trying to force our problems to suit our tools is unlikely to make much progress. For example, unless we can model (and therefore largely solve) all of the problems we want an AI to address we can't create a "Really Good Model".

Rather than manually developing formalisations of specific forms of similarity we need an algorithm to learn different types of similarity and then construct the formalisation itself (or not as I don't think we actually formalise our notions of similarity and yet can still solve problems).

Automated theorem proving is a good example where the problems are well defined yet unique, so any algorithm that can construct proofs needs to see meta patterns in other proofs and apply them. This brings home the difficulty of identifying what it means for things to be similar and also emphasises the incompleteness of a probabilistic approach: the proof that the AI is trying to construct has never been encountered before, in order for it to benefit from experience it needs to invent a type of similarity to map the current problem to the past.

Comment author: David_Allen 20 October 2010 05:09:08PM *  4 points

I'm a MWI cynic*, so here is my approach.

There is an interpretation of quantum mechanics called the many-worlds interpretation that rejects the wave-function collapse of the standard (Copenhagen) interpretation.

This interpretation leads some people to the belief that all possible alternative histories are real. Everything that can happen does happen. They assume that if you die in this world, you would also continue living on in another world. Since there must always be a chance you won't die, then there must be a world in which you live forever.

The problem with this belief is that at best it applies only to simple quantum systems, and generally not to events as large and complex as a person's death. To avoid death, a very large number of quantum-level alternatives have to be simultaneously selected for. The probability of this simultaneous selection will be effectively zero.

In MWI terms this means that you die in all worlds, except the impossible ones.

Value the life you have and don't depend on quantum immortality.

* I like the MWI on lack of wave-function collapse and on quantum decoherence, but I don't think that the idea of separate worlds is necessary.

Comment author: JohnDavidBustard 25 October 2010 03:25:20PM 0 points

Eh not impossible... just very improbable (in a given world) and certain across all worlds.

I would have thought the more conventional explanation is that the other versions are not actually you (just very like you). This sounds like the issue of only economists acting in the way that economists model people. I would suspect that only people who fixate on such matters would confuse a copy with themselves.

I suspect that people who are vulnerable to these ideas leading to suicide are in fact generally vulnerable to suicide. There are lots of better reasons to kill yourself that most people ignore. If you think you're at risk of this I recommend you seek therapy; thought experiments should not have such drastic effects on your actions.

Comment author: jsteinhardt 25 October 2010 02:20:14AM *  1 point

Bayesian approaches tend to be more powerful than other statistical techniques in situations where there is a relatively limited supply of data. One reason is that Bayesian approaches, being model-based, tend to have a richer structure that allows them to take advantage of more of the structure of the data; a second reason is that Bayes allows for the explicit integration of prior assumptions and is therefore usually a more aggressive form of inference than most frequentist methods.
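As a small illustration of the limited-data point, here is a sketch that compares a plain frequency estimate with a Bayesian posterior mean for a coin-flip probability. The Beta prior, the function names and the numbers are assumptions chosen for illustration and do not come from the comment; whether the prior actually helps depends on how well it matches reality.

```python
def max_likelihood(heads, tails):
    """Frequency estimate of the heads probability; uses no prior assumptions."""
    return heads / (heads + tails)

def posterior_mean(heads, tails, prior_heads=1.0, prior_tails=1.0):
    """Mean of the Beta posterior under a Beta(prior_heads, prior_tails) prior."""
    return (heads + prior_heads) / (heads + tails + prior_heads + prior_tails)

# Three flips, all heads (hypothetical data).
mle = max_likelihood(3, 0)                  # concludes the coin always lands heads
bayes = posterior_mean(3, 0)                # uniform prior keeps some doubt
bayes_strong = posterior_mean(3, 0, 10, 10) # a stronger "probably fair" prior moves less
```

With only three observations the frequency estimate commits fully to the data, while the posterior mean sits between the data and the prior; as more flips arrive the two estimates converge.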

I tried to find a good paper demonstrating this (called "learning from one example"). Unfortunately I only came across this PhD thesis --- http://www.cs.umass.edu/~elm/papers/thesis.pdf , although there is certainly a lot of work being done on generalizing from one, or a small number of, examples.

Comment author: JohnDavidBustard 25 October 2010 02:41:51PM *  2 points

Thanks for your reference; it is good to get down to some more specific examples.

Most AI techniques are model based by necessity: it is not possible to generalise from samples unless the sample is used to inform the shape of a model which then determines the properties of other samples. In effect, AI is model fitting. Bayesian techniques are one scheme for updating a model from data. I call them incomplete because they leave a lot of the intelligence in the hands of the user.

For example, in the referenced thesis the author designs a model of transformations on handwritten letters that (thanks to the author's intelligence) is similar to the set of transformations applied to numeric characters. The primary reason the technique is effective is that the author has constructed a good transformation. The only way to determine if this is true is through experimentation. I doubt the Bayesian updating is contributing significantly to the results; if another scheme such as an SVM were chosen, I would expect it to produce similar recognition results.

The point is that the legitimacy or otherwise of the model parameter updating scheme is relatively insignificant in comparison to the difficulty of selecting a good model in the first place. As far as I am aware, because there is a potentially infinite set of models, Bayesian techniques cannot be applied to select between them, leaving the real intelligence to be provided by the user in the form of the model. In contrast, SVMs are an attempt to construct experimentally useful models from samples and so are much closer to being intelligent in the sense of being able to produce good results with limited human interaction. However, neither technique addresses the fundamental difficulty of replicating the intelligence used by the author in creating the transformation in the first place. Fixating on a particular approach to model updating when model selection is not addressed is to miss the point: it may be meaningful for gambling problems, but for real AI challenges the difference it makes appears to be irrelevant to actual performance.

I would love to discuss what the real challenges of GAI are and explore ways of addressing them, but often the posts on LW seem to focus on seemingly obscure game theory or gambling based problems which don't appear to be bringing us closer to a real solution. If the model selection problem can't be addressed then there is no way to guarantee that whatever we want an AI to value, it won't create an internal model that finds something similar (like paperclips) and decides to optimise for that instead.

Silently down voting criticism of Bayesian probability without justification is not helpful either.

Comment author: ata 23 October 2010 05:05:26AM *  11 points

I think the fundamental insight of Bayesianism is that Bayes' Theorem is the law of inference, not (just) a normative law but a descriptive law — that frequentist methods and other statistical algorithms that make no mention of Bayes aren't cleverly circumventing it, they're implicitly using it. Any time you use some data to generate a belief about some proposition, if you use a method whose output is systematically correlated with reality at all, then you are using Bayes, just with certain assumptions and simplifications mixed in.

The failing of frequentism is not in the specific methods it uses — it is perfectly true that we need simplified methods in order to do much useful inference — but in its claim of "objectivity" that really consists of treating its assumptions and simplifications as though they don't exist, and in its reliance on experimenters' intuition in deciding which methods should be used (considering that different methods make different assumptions that lead to different results). Frequentist methods aren't (all) bad, frequentist epistemology is.

If I remember correctly, it is perfectly possible to create Bayesian formulations of most frequentist methods; of course, they will often still talk about things that Bayesians don't usually care about, like P-values, but they will nevertheless reveal the deductively-valid Bayes-structure of the path from your data to that result. Revealing frequentist methods' hidden structure is important because it lets us understand why they work — when they do work — and it lets us predict when they won't be as useful.
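One standard textbook instance of a non-Bayesian method with a Bayesian reading (an example chosen here, not one named in the comment) is penalised least squares. The sketch below assumes a one-parameter model y = w * x with Gaussian noise and a zero-mean Gaussian prior on w; the function names and numbers are illustrative. Under those assumptions the ridge estimate with penalty noise_var / prior_var is the same number as the posterior mode.

```python
def ridge_slope(xs, ys, penalty):
    """Penalised least-squares slope for y = w * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)

def map_slope(xs, ys, noise_var, prior_var):
    """Posterior mode of w under Gaussian noise and a zero-mean Gaussian prior on w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    precision = sxx / noise_var + 1.0 / prior_var
    return (sxy / noise_var) / precision

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]          # hypothetical measurements
noise_var, prior_var = 0.5, 2.0

w_ridge = ridge_slope(xs, ys, penalty=noise_var / prior_var)
w_map = map_slope(xs, ys, noise_var, prior_var)
```

Read this way, choosing the penalty amounts to stating how strongly one believes w is near zero relative to the noise level, which is the kind of hidden assumption the comment argues should be made explicit.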

Comment author: JohnDavidBustard 24 October 2010 09:56:34PM 2 points

From what I understand, in order to apply Bayesian approaches in practical situations it is necessary to make assumptions which have no formal justification, such as the distribution of priors or the local similarity of analogue measures (so that similar but not exact predictions can be informative). This changes the problem without necessarily solving it. In addition, it doesn't address the issue of AI problems not based on repeated experience, e.g. automated theorem proving. The advantage of statistical approaches such as SVMs is that they produce practically beneficial results with limited parameters. With parameter search techniques they can achieve fully automated predictions that often have good experimental results. Regardless of whether Bayesianism is the law of inference, if such approaches cannot be applied automatically they are fundamentally incomplete and only as valid as the assumptions they are used with. If Bayesian approaches carry a fundamental advantage over these techniques why is this not reflected in their practical performance on real world AI problems such as face recognition?
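The "parameter search" idea mentioned above can be sketched as a search over candidate settings scored on held-out data. To keep the example self-contained, a one-dimensional k-nearest-neighbour classifier stands in for the SVM, so this shows only the search loop and not an SVM; the data is synthetic and all names are hypothetical.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D inputs, labels 0/1)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, held_out, k):
    """Fraction of held-out points classified correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in held_out)
    return hits / len(held_out)

def search_k(train, held_out, candidates):
    """Pick the candidate k with the best held-out accuracy."""
    return max(candidates, key=lambda k: accuracy(train, held_out, k))

rng = random.Random(0)
# Synthetic data: the label is 1 when x > 0, with about 10% of labels flipped as noise.
data = []
for _ in range(200):
    x = rng.uniform(-1.0, 1.0)
    label = 1 if x > 0 else 0
    if rng.random() < 0.1:
        label = 1 - label
    data.append((x, label))

train, held_out = data[:150], data[150:]
candidates = [1, 3, 5, 9, 15]  # odd values avoid tied votes
best_k = search_k(train, held_out, candidates)
best_accuracy = accuracy(train, held_out, best_k)
```

No human picks k here, but the choice of classifier, candidate grid and held-out split are still assumptions supplied from outside the algorithm.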

Oh and bring on the down votes you theory loving zealots :)

Comment author: multifoliaterose 16 October 2010 06:35:20PM *  3 points

Upvoted for a thoughtful comment.

  1. I don't know anything about statistical learning theory.

  2. I don't know what kinds of probability you're interested in learning, but would recommend Concrete Mathematics: A Foundation for Computer Science by Graham, Knuth and Patashnik and William Feller's two volume set An Introduction to Probability Theory and Its Applications.

  3. I would second the recommendation of the Princeton Companion to Mathematics but would also warn it does not go into enough depth for one to get an accurate understanding of what many of the subjects discussed therein are about. This is understandable given space constraints.

  4. The edifice of pure mathematics is vast and the number of people alive who could give a good overview of existing mathematics as a whole is tiny and possibly zero.

  5. As a matter of practice, much of the information about how mathematicians learn and think about a given subject is never recorded. See this comment by SarahC and Bill Thurston's MathOverflow question Thinking and Explaining.

  6. On average I've found reading math books that adopt a historical approach to the material therein to be considerably more useful than reading math books that adopt an axiomatic approach to the material therein.

  7. Based on my (limited) impression of applied math, it's not uncommon for people to use advanced mathematical techniques to solve a practical problem because doing so makes for a good marketable story rather than because the advanced mathematical techniques are genuinely useful to analyzing the practical problem at hand.

  8. There is an issue of a high noise-to-signal ratio in mathematics textbooks corresponding to the fact that many authors of textbooks don't have the depth of understanding of the creators of the theories that they're writing about and correspondingly do not emphasize the key points.

  9. Concerning your suspicion that "mathematics is as it is because it appeals to those who like puzzles, rather than necessarily providing profound insight into a problem" - there's great variability among mathematicians here. Two essays which discuss dichotomies which are not identical to the one that you draw but which I think you'll find peripherally relevant are Timothy Gowers' The Two Cultures of Mathematics and Freeman Dyson's Birds and Frogs.

  10. Those mathematicians who seek profound insight into problems often seek profound insight into problems within pure math rather than problems that arise in engineering.

  11. Looking at your website, you might find it useful to check out the Brown University Pattern Theory Group. I don't have any subject matter knowledge of what they do, but the group includes David Mumford who is of extremely high caliber, having earned a Fields Medal in the 1970's for his work on algebraic geometry.

  12. While I don't know enough to point you in the right direction to help you with your research, if you're interested in learning about pure math out of general intellectual curiosity then there are many books that I can recommend.

Comment author: JohnDavidBustard 16 October 2010 10:32:46PM 1 point

Thank you very much for your great reply. I'll look into all of the links. Your comments have really inspired me in my exploration of mathematics. They remind me of the aspect of academia I find most surprising: how it can so often be ideological, defensive and secretive whilst also supporting those who sincerely, openly and fearlessly pursue the truth.

Comment author: multifoliaterose 16 October 2010 04:03:11PM *  1 point

I agree with your remarks here and share your frustration. While books of the type that you're looking for are relatively uncommon, over the years I've amassed a list of ones that I've found very good. What subject(s) are you interested in learning? (N.B. There are large parts of math that I'm ignorant of - in particular I know almost nothing about applied math and so may not be able to say anything useful - I just thought I'd ask in case I can help.)

Comment author: JohnDavidBustard 16 October 2010 04:58:57PM 1 point

Thank you, my main goal at the moment is to get a handle on statistical learning approaches and probability. I hope to read Jaynes's book and The Nature of Statistical Learning Theory once I have some time to devote to them. However, I would love to find an overview of mathematics, particularly one which focuses on practical applications or problems. One of the other posts mentioned the Princeton Companion to Mathematics and that sounds like a good start. I think what I would like is to read something that could explain why different fields of mathematics are important, and how I would concretely benefit from understanding them.

At the moment I have a general unease about my partial mathematical blindness. I understand the main mathematical ideas underlying the work in my own field (computer vision) and I'm pretty happy with the subjects in Numerical Recipes and some optimisation theory. I'm fairly sure that I don't need to know more, but it bothers me that I don't. At the same time I don't want to spend a lot of time wading through proofs that are unlikely to ever be relevant to me. I have also yet to find a concrete example in AI where an engineering approach with some relatively simple applied maths has been substantially weaker than an approach that requires advanced mathematical techniques, making me suspect that mathematics is as it is because it appeals to those who like puzzles, rather than necessarily providing profound insight into a problem. Although I'd love to be proved wrong on that point.

Comment author: whpearson 16 October 2010 12:27:16PM *  0 points

is there a real difference between a world that satisfies what we want and directly altering what we want?

From an evolutionary point of view, those things that manage to procreate will outcompete those things that change themselves to not care about that and just wirehead.

So in non-singleton situations, alien encounters and any form of resource competition, it matters whether you wirehead or not. Pleasure, in an evolved creature, can be seen as giving (very poor) information on the map to the territory of future influence for the patterns that make up you.

Comment author: JohnDavidBustard 16 October 2010 03:37:03PM 0 points

So, assuming survival is important, a solution that maximises survival plus wireheading would seem to solve that problem. Of course it may well just delay the inevitable heat death ending but if we choose to make that important, then sure, we can optimise for survival as well. I'm not sure that gets around the issue that any solution we produce (with or without optimisation for survival) is merely an elaborate way of satisfying our desires (in this case including the desire to continue to exist) and thus all FAI solutions are a form of wireheading.

Comment author: JohnDavidBustard 16 October 2010 10:44:45AM *  5 points

One frustration I find with mathematics is that it is rarely presented like other ideas. For example, few books seem to explain why something is being explained prior to the explanation. They don't start with a problem, outline its solution, provide the solution and then summarise this process at the end. They present one 'interesting' proof after another, requiring a lot of faith and patience from the reader. Likewise they rarely include grounded examples within the proofs so that the underlying meaning of the terms can be maintained. It is as if the field is constructed in the form of puzzles rather than as a sincere attempt to communicate ideas as clearly as possible. Another analogy would be programming without the comments.

A book like Numerical Recipes, or possibly Jaynes's book on probability, is the closest I've found so far. Has anyone encountered similar books?

Comment author: whpearson 15 October 2010 09:11:33PM *  1 point

Try to figure out what maximizes this estimate method. It won't be anything you'd want implemented, it will be a wireheading stimulus.

I'm not sure that there is a verbal description of a possible world that is also a wirehead stimulus for me. There might be, which might be enough to discount this method.

And questions about possible worlds involve quantities of data that a mere human can't handle.

True.

Comment author: JohnDavidBustard 16 October 2010 10:29:33AM 0 points

I'm not sure I understand the distinction between an answer that we would want and a wireheading solution. Are not all solutions wireheading with an elaborate process to satisfy our status concerns? I.e. is there a real difference between a world that satisfies what we want and directly altering what we want? If the wire in question happens to be an elaborate social order rather than a direct connection, why is that different? What possible goal could we want pursued other than the one which we want?

Comment author: Vladimir_Nesov 15 October 2010 10:58:11PM 2 points

Interesting, if I understand correctly the idea is to find a theoretically correct basis for deciding on a course of action given existing knowledge and then to make this calculation efficient and then direct towards a formally defined objective.

Yes, but there is only one top-level objective, to do the right thing, so one doesn't need to define an objective separately from the goal system itself (and improving state of knowledge is just another thing one can do to accomplish the goal, so again not a separate issue).

FAI really stands for a method of efficient production of goodness, as we would want it produced, and there are many landmines on this path, in particular humanity in its current form doesn't seem to be able to retain its optimization goal in the long run, and the same applies to most obvious hacks that don't have explicit notions of preference, such as upload societies. It's not just a question of speed, but also of ability to retain the original goal after quadrillions of incompletely understood self-modifications.

Comment author: JohnDavidBustard 16 October 2010 01:01:46AM -1 points

Ok, so how about this workaround.

The current approach is to have a number of human intelligences continue to explore this problem until they enter a mental state C (for convinced they have the answer to FAI). The next stage is to implement it.

We have no other route to knowledge other than to use our internal sense of being convinced. I.e. no oracle to tell us if we are right or not.

So what if we formally define what this mental state C consists of and then construct a GAI which provably pursues only the objective of creating this state? The advantage is that we now have a means of judging our progress, because we have a formally defined, measurable criterion for success. (In fact this process is a valuable goal regardless of the use of AI, but it now makes it possible to use AI techniques to solve it.)
