Wei_Dai comments on Nonparametric Ethics - Less Wrong
Comments (56)
k-nearest-neighbors seems to be a reasonable method of interpolation, but what about extrapolation? I'm having trouble seeing how nonparametric methods can deal with regions far away from existing data points.
They handle it by giving very wide predictive distributions, if they are Bayesian nonparametric methods. See the 95% credible intervals (shaded pink) in Figure 2 on page 4, and in Figure 3 on page 5, of Mark Ebden's Gaussian Processes for Regression: A Quick Introduction.
(Carl Edward Rasmussen at Cambridge and Arman Melkumyan at the University of Sydney maintain sites with more links about Gaussian processes and Bayesian nonparametric regression. Also see Bayesian neural networks, which can justifiably extrapolate sharper predictive distributions than Gaussian process priors can.)
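A minimal sketch of that widening, assuming zero-mean Gaussian process regression with a squared-exponential kernel and hand-picked hyperparameters (length scale 1, signal variance 1, noise variance 0.01). The kernel choice, the toy sin(x) data, and the function names are illustrative assumptions, not code from Ebden's tutorial. It prints the posterior mean and a 95% interval for the latent function at one point inside the data and one far outside it.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two 1-D arrays of inputs."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.01):
    """Posterior mean and variance of the latent function of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)   # train-test covariances
    K_ss = rbf_kernel(x_test, x_test)   # test-test covariances
    L = np.linalg.cholesky(K)           # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var

# Hypothetical toy data: noiseless samples of sin(x) on [0, 5].
x_train = np.linspace(0.0, 5.0, 8)
y_train = np.sin(x_train)

# One query inside the data (interpolation), one far outside it (extrapolation).
x_test = np.array([2.5, 20.0])
mean, var = gp_posterior(x_train, y_train, x_test)
for x, m, s2 in zip(x_test, mean, var):
    half_width = 1.96 * np.sqrt(max(s2, 0.0))
    print(f"x={x:5.1f}  mean={m:+.3f}  95% interval=[{m - half_width:+.3f}, {m + half_width:+.3f}]")
```

At x = 20 the kernel weight on every training point is essentially zero, so the mean falls back to the prior mean of 0 and the variance to the prior variance of 1: the model reports that it does not know, rather than inventing a confident extrapolation.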
See also Modeling human function learning with Gaussian processes, by Tom Griffiths, Chris Lucas, Joseph Jay Williams, and Michael Kalish, in NIPS 21:
The first author, Tom Griffiths, is the director of the Computational Cognitive Science Lab at UC Berkeley, and Lucas and Williams are graduate students there. The work of the Computational Cognitive Science Lab is very close to the mission of Less Wrong:
Griffiths's page recommends the foundations section of the lab publication list.
There are always "nearest" neighbors. You might wish for more data than you have, but you must make do with what you have.
If the data is actually linear, or anything remotely resembling linear, then at distant points a linear model will do much better than a nearest-neighbor estimator, whereas at nearby points a nearest-neighbor estimator will do as well as a linear model given enough data. So at distant points nearest-neighbor only works if the curve has one particular shape (constant), while at nearby points it works as long as the curve behaves regularly within local neighborhoods.
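The near/far contrast can be sketched with hypothetical data: a noisy line y = 2x + 1 observed only on [0, 10], with k = 3 neighbors. The data-generating line, noise level, seed, and function names are assumptions chosen for illustration.

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k=3):
    """Average the targets of the k training inputs closest to each query."""
    preds = []
    for q in x_query:
        nearest = np.argsort(np.abs(x_train - q))[:k]
        preds.append(y_train[nearest].mean())
    return np.array(preds)

def linear_predict(x_train, y_train, x_query):
    """Fit a least-squares line and evaluate it at the queries."""
    slope, intercept = np.polyfit(x_train, y_train, deg=1)
    return slope * x_query + intercept

def true_line(x):
    return 2.0 * x + 1.0

# Hypothetical data: a noisy line observed only on [0, 10].
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 10.0, 50)
y_train = true_line(x_train) + rng.normal(scale=0.5, size=x_train.size)

queries = {"near": np.array([5.0]), "far": np.array([100.0])}
knn_err, lin_err = {}, {}
for name, xq in queries.items():
    knn_err[name] = abs(knn_predict(x_train, y_train, xq)[0] - true_line(xq[0]))
    lin_err[name] = abs(linear_predict(x_train, y_train, xq)[0] - true_line(xq[0]))
    print(f"{name:>4}: kNN error={knn_err[name]:7.2f}  linear error={lin_err[name]:7.2f}")
```

Near x = 5 both errors are small. At x = 100 the kNN estimate is stuck at the average of the last few observed points (around 20), while the fitted line tracks the true value of about 201.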
Well, yes. Nonparametric methods use similarity of neighbors. To predict that which has never been seen before - which is not, on its surface, like things seen before - you need modular and causal models of what's going on behind the scenes. At that point it's parametric or bust.
Your use of the terms parametric vs. nonparametric doesn't seem to match that of people working in nonparametric Bayesian statistics, where the distinction is more like whether your statistical model has a fixed, finite number of parameters or has no such bound. Methods such as Dirichlet processes and their many variants (Hierarchical DP, HDP-HMM, etc.) go beyond simply modeling surface similarities between neighbours.
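To make "no fixed bound on the number of parameters" concrete, here is a minimal sketch of the Chinese restaurant process, the sequential sampling scheme for the partition induced by a Dirichlet process. The concentration parameter alpha = 2.0, the seed, and the function name are arbitrary illustrative choices. Each occupied table corresponds to a cluster that would carry its own parameters in a mixture model, and the number of tables is not fixed in advance: it keeps growing, roughly logarithmically, as more data arrives.

```python
import random

def crp_table_sizes(n_customers, alpha, seed=0):
    """Sample a partition from the Chinese restaurant process.

    Returns a list whose i-th entry is the number of customers at table i.
    """
    rng = random.Random(seed)
    tables = []
    for n in range(n_customers):
        # With n customers already seated, join table i with probability
        # tables[i] / (n + alpha), or open a new table with probability
        # alpha / (n + alpha).
        r = rng.uniform(0.0, n + alpha)
        cumulative = 0.0
        for i, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                tables[i] += 1
                break
        else:
            tables.append(1)
    return tables

for n in (10, 100, 1000, 10000):
    sizes = crp_table_sizes(n, alpha=2.0)
    print(f"{n:6d} customers -> {len(sizes):3d} tables")
```

Because each customer consumes exactly one random draw, runs with the same seed share a prefix, so in this demo the table count never decreases as n grows.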
See, for example, this list of publications coauthored by Michael Jordan:
Parametric methods aren't any better at extrapolation. They are arguably worse, in that they make strong unjustified assumptions in regions with no data. The rule is "don't extrapolate if you can possibly avoid it". (And you avoid it by collecting relevant data.)
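The opposite failure can be sketched with a hypothetical saturating process, y = 1 - exp(-x), observed only on [0, 2]. Assuming a straight line there is a strong assumption the data cannot check, and extrapolating it to x = 10 goes badly wrong. The curve, the sampling grid, and the query point are assumptions made for illustration.

```python
import numpy as np

# Hypothetical saturating process, observed only on [0, 2].
def true_curve(x):
    return 1.0 - np.exp(-x)

x_train = np.linspace(0.0, 2.0, 21)
y_train = true_curve(x_train)

# Parametric: assume a straight line and extrapolate it.
slope, intercept = np.polyfit(x_train, y_train, deg=1)

x_far = 10.0
linear_guess = slope * x_far + intercept
# Nonparametric (1-nearest-neighbor): reuse the closest observed value.
nn_guess = y_train[np.argmin(np.abs(x_train - x_far))]
truth = true_curve(x_far)

print(f"truth={truth:.3f}  linear={linear_guess:.3f}  1-NN={nn_guess:.3f}")
```

The line overshoots to about 4 while the truth is about 1. The nearest-neighbor guess only does well here because this particular curve levels off; with a genuinely linear process the ranking flips.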
Parametric extrapolation actually works quite well in some cases. I'll cite a few examples that I'm familiar with:
I don't see any examples of nonparametric extrapolation that have similar success.
A major problem in Friendly AI is how to extrapolate human morality into transhuman realms. I don't know of any parametric approach to this problem that is free of serious difficulties, but "nonparametric" doesn't really seem to help either. What does your advice "don't extrapolate if you can possibly avoid it" imply in this case? Pursue a non-AI path instead?
I'm in essential agreement with Wei here. Nonparametric extrapolation sounds like a contradiction to me (though I'm open to counterexamples).
The "nonparametric" part of the FAI process is where you capture a detailed picture of human psychology as a starting point for extrapolation, instead of trying to give the AI Four Great Moral Principles. Applying extrapolative processes like "reflect to obtain self-judgments" or "update for the AI's superior knowledge" to this picture is not particularly nonparametric - in a sense it's not an estimator at all, it's a constructor. But yes, the "extrapolation" part is definitely not a nonparametric extrapolation; I'm not really sure what that would mean.
But every extrapolation process starts with gathering detailed data points, so it confused me that you focused on "nonparametric" as a response to Robin's argument. If Robin is right, an FAI should discard most of the detailed picture of human psychology it captures during its extrapolation process as errors and end up with a few simple moral principles on its own.
Can you clarify which of the following positions you agree with?
(Presumably you don't agree with 2. I put it in just for completeness.)
2, certainly disagree. 1 vs. 3, don't know in advance. But an FAI should not discard its detailed psychology as "error"; an AI is not subject to most of the "error" that we are talking about here. It could, however, discard various conclusions as specifically erroneous after having actually judged the errors, which is not at all the sort of correction represented by using simple models or smoothed estimators.
I think connecting this to FAI is far-fetched. To talk technically about FAI you need to introduce more tools first.
I distinguish "extrapolation" in the sense of extending an empirical regularity (as in Moore's law) from inferring a logical consequence of a well-supported theory (as in the black hole prediction). This is really a difference of degree, not kind, but for human science this distinction is a good abstraction. For FAI, I'd say the implication is that an FAI's morality-predicting component should be a working model of human brains in action.
I think it implies that a Friendly sysop should not dream up a transhuman society and then try to reshape humanity into that society, but rather let us evolve at our own pace just attending to things that are relevant at each time.