V_V comments on MIRI's Approach - Less Wrong
You are viewing a comment permalink. View the original post to see all comments and the full post content.
No, not really. In fact, I expect that given the right way of modelling, formal verification of learning systems up to epsilon-delta bounds (in the style of PAC-learning, for instance) should be quite doable. Why? Because, as mentioned regarding PAC learning, it's the existing foundation for machine learning.
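To make the epsilon-delta flavor of this concrete: a standard textbook PAC result for a finite hypothesis class (in the realizable setting) says that m ≥ (1/ε)(ln|H| + ln(1/δ)) i.i.d. examples suffice for any consistent learner to achieve true error at most ε with probability at least 1 − δ. A minimal sketch of that bound, with illustrative numbers of my own choosing:

```python
import math

def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Standard PAC bound for a finite hypothesis class in the
    realizable setting: with m >= (1/eps) * (ln|H| + ln(1/delta))
    i.i.d. examples, any hypothesis consistent with the sample has
    true error <= eps, with probability >= 1 - delta."""
    m = (1.0 / epsilon) * (math.log(hypothesis_count) + math.log(1.0 / delta))
    return math.ceil(m)

# e.g. |H| = 2**20 hypotheses, 5% error tolerance, 99% confidence
print(pac_sample_bound(2**20, epsilon=0.05, delta=0.01))  # -> 370
```

The point is that the guarantee is statistical ("probably approximately correct"), not exact, which is precisely the kind of formal property one could hope to verify for learning systems.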
I do agree that this post reflects an "Old Computer Science" worldview, but to be fair, that's not Nate's personal fault, or MIRI's organizational fault. It's the fault of the entire subfield of AGI that still has not bloody learned the basic lessons of statistical machine learning: that real cognition just is about probably approximately correct statistical modelling.
So as you mention, for instance, there's an immense amount of foundational theory behind modern neural networks. Hell, if I could find the paper showing that deep networks form a "funnel" in the model's free-energy landscape - where local minima are concentrated in that funnel and all yield more-or-less as-good test error, while the global minimum reliably overfits - I'd be posting the link myself.
The problem with deep neural networks is not that they lack theoretical foundations. It's that most of the people going "WOW SO COOL" at deep neural networks can't be bothered to understand the theoretical foundations. The "deep learning cabal" of researchers (out of Toronto, IIRC), and the Switzerland Cabal of Schmidhuber-Hutter-and-Legg fame, all know damn well what they are doing on an analytical level.
(And to cheer for my favorite approach, the probabilistic programming cabal has even more analytical backing, since they can throw Bayesian statistics, traditional machine learning, and programming-languages theory at their problems.)
Sure, it does all require an unusual breadth of background knowledge, but hey, this is how real science proceeds, people: shut up and read the textbooks and literature. Sorry, but if we (as in, this community) go around claiming that important problems can be tackled without background knowledge and an active literature, or with as little as the "AGI" field seems to generate, then we are not being instrumentally rational. Period. Shut up and PhD.
Because that requires a way to state and demonstrate safety properties such that safety guarantees obtained with small amounts of resources remain strong when the system gets more resources. More on that below.
You're not really being fair to Nate here, but let's be charitable to you: this is fundamentally a dispute between the heuristics-and-biases school of thought about cognition and the bounded/resource-rational school of thought.
In the heuristics-and-biases school of thought, the human mind uses heuristics or biases when it believes it doesn't have the computing power on hand to use generally intelligent inference, or sometimes the general intelligence is even construed as an emergent computational behavior of an array of heuristics and biases that happened to get thrown together by evolution in the right way. Computationally, this is saying, "When we have enough resources that only asymptotic complexity matters, we use the Old Computer Science way of just running the damn algorithm that implements optimal behavior and optimal asymptotic complexity." Trying to extend this approach into statistical inference gets you basic Bayesianism and AIXI, which appear to have nice "optimality" guarantees, but are computationally intractable and are only optimal up to the training data you give them.
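The intractability point can be shown in miniature: exact Bayesian inference over all boolean functions on n bits requires tracking 2^(2^n) hypotheses, so even this "optimal" procedure blows up for modest n. A toy sketch (my own illustrative example, not anyone's proposed algorithm):

```python
import itertools

# Exact Bayesian conditioning over ALL boolean functions on n bits.
# The hypothesis space has size 2**(2**n), so the "just run the
# optimal algorithm" approach is intractable for modest n.
n = 3
inputs = list(itertools.product([0, 1], repeat=n))
hypotheses = list(itertools.product([0, 1], repeat=len(inputs)))  # 2**8 = 256

def condition(data):
    """Keep only hypotheses consistent with the observed (x, y) pairs;
    with a uniform prior this is exact Bayesian updating."""
    return [h for h in hypotheses
            if all(h[inputs.index(x)] == y for x, y in data)]

data = [((0, 0, 0), 1), ((1, 1, 1), 0)]
post = condition(data)
print(len(hypotheses), len(post))  # 256 hypotheses before, 64 after
```

Each observation halves the space, but at n = 6 the prior already has 2^64 hypotheses, and the posterior is only ever as good as the data conditioned on.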
In the bounded-rationality school of thought, computing power is considered a strictly (not asymptotically) finite resource, which must be exploited in an optimal way. I've seen a very nice paper on how thermodynamics actually yields a formal theory for how to do this. Cognition is then analyzed as algorithmic ways to tractably build and evaluate models that deal well with the data. This approach yields increasingly fruitful analyses of such cognitive activities as causal learning, concept learning, and planning in arbitrary environments as probabilistic inference enriched with causal/logical structure.
In terms of LW posts, the former alternative is embodied in Eliezer's Sequences, and the latter in jacob_cannell's post on The Brain as a Universal Learning Machine and my book review of Plato's Camera.
The kinds of steps needed to get both "AI" as such, and "Friendliness" as such, are substantively different in the "possible worlds" where the two different schools of thought apply. Or, perhaps, both are true in certain ways, and what we're really talking about is just two different ways of building minds. Personally, I think the one true distinction is that Calude's work on measuring nonhalting computations gives us a definitive way to deal with the kinds of self-reference scenarios that Old AGI's "any finite computation" approach generates paradoxes in.
But time will tell, and I am not a PhD, so everything I say should be taken with substantial sprinklings of salt. That said, while you shouldn't think for a second that I am one of them, I am certainly on the side of the PhDs.
(Nate: sorry for squabbling on your post. All these sorts of qualms with the research program were things I was going to bring up in person, in a much more constructive way. Still looking forward to meeting you in September!)
It's not obvious to me that the Church programming language and execution model is based on bounded rationality theory.
I mean, the idea of using MCMC to sample the executions of probabilistic programs is certainly neat, and you can trade off bias with computing time by varying the burn-in and samples lag parameters, but this trade-off is not provably optimal.
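For concreteness, here is a minimal random-walk Metropolis sketch showing the two knobs in question. The parameter names and numbers are my own; raising burn-in and lag reduces bias and autocorrelation at the cost of more compute, but nothing here is provably an optimal use of a fixed budget:

```python
import math
import random

def metropolis_hastings(log_density, x0, n_samples, burn_in=500, lag=5,
                        step=1.0, rng=None):
    """Random-walk Metropolis sampler. `burn_in` discards the initial
    transient; `lag` (thinning) keeps every lag-th draw to cut
    autocorrelation. Both trade bias/variance against compute."""
    rng = rng or random.Random(0)
    x = x0
    samples = []
    total = burn_in + n_samples * lag
    for i in range(total):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, p(proposal) / p(x)).
        if math.log(rng.random()) < log_density(proposal) - log_density(x):
            x = proposal
        if i >= burn_in and (i - burn_in) % lag == 0:
            samples.append(x)
    return samples

# Sample a standard normal, deliberately starting far from the mode.
draws = metropolis_hastings(lambda z: -0.5 * z * z, x0=5.0, n_samples=2000)
print(len(draws))  # -> 2000
```

With burn_in=0 and lag=1 the early draws near x0=5.0 would visibly bias the sample mean, which is exactly the bias-versus-compute trade-off described above, made by hand rather than by any optimality theorem.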
If I understand correctly, provably optimal bounded rationality is marred by unsolved theoretical questions such as the one-way functions conjecture and P != NP. Even assuming that these conjectures are true, the fact that we can't prove them implies that we can't often prove anything interesting about the optimality of many AI algorithms.
That's because it's not. The probabilistic models of cognition (title drop!) implemented using Church tend to deal with what the authors call the resource-rational school of thought about cognition.
The paper about it that I read was actually using statistical thermodynamics to form its theory of bounded-optimal inference. These conjectures are irrelevant, in that we would be building reasoning systems that would make use of their own knowledge about these facts, such as it might be.
Sounds interesting, do you have a reference?
Sure. If you know statistical mechanics/thermodynamics, I'd be happy to hear your view on the paper, since I don't know those fields.
Thanks, I'll read it, though I'm not an expert in statistical mechanics and thermodynamics.