V_V comments on MIRI's Approach - Less Wrong

34 Post author: So8res 30 July 2015 08:03PM

Comment author: [deleted] 30 July 2015 03:38:41PM * 8 points

It appears you are making the problem unnecessarily difficult.

No, not really. In fact, I expect that given the right way of modelling, formal verification of learning systems up to epsilon-delta bounds (in the style of PAC-learning, for instance) should be quite doable. Why? Because, as mentioned regarding PAC learning, it's the existing foundation for machine learning.
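To make the epsilon-delta idea concrete, here is a minimal sketch (not part of the original comment) of the textbook sample-complexity bound for a finite hypothesis class in the realizable setting: a learner that returns any hypothesis consistent with m >= (1/epsilon) * (ln|H| + ln(1/delta)) i.i.d. examples has true error at most epsilon with probability at least 1 - delta. The function name and the example numbers are illustrative assumptions, and real learning systems would need much richer bounds than this one.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Number of i.i.d. examples sufficient for a consistent learner over a
    finite hypothesis class (realizable case) to have error <= epsilon with
    probability >= 1 - delta.

    Implements m >= (ln|H| + ln(1/delta)) / epsilon, rounded up.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    needed = (math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon
    return math.ceil(needed)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 candidate hypotheses, 10% error, 95% confidence.
    print(pac_sample_bound(1000, epsilon=0.1, delta=0.05))
```

Note how the bound grows only logarithmically in the number of hypotheses and in 1/delta, but linearly in 1/epsilon. That is the sense in which a guarantee is "probably approximately correct" rather than exact.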

I do agree that this post reflects an "Old Computer Science" worldview, but to be fair, that's not Nate's personal fault, or MIRI's organizational fault. It's the fault of the entire subfield of AGI that still has not bloody learned the basic lessons of statistical machine learning: that real cognition just is about probably approximately correct statistical modelling.

So as you mention, for instance, there's an immense amount of foundational theory behind modern neural networks. Hell, if I could find the paper showing that deep networks form a "funnel" in the model's free-energy landscape - where local minima are concentrated in that funnel and all yield more-or-less as-good test error, while the global minimum reliably overfits - I'd be posting the link myself.

The problem with deep neural networks is not that they lack theoretical foundations. It's that most of the people going "WOW SO COOL" at deep neural networks can't be bothered to understand the theoretical foundations. The "deep learning cabal" of researchers (out of Toronto, IIRC), and the Switzerland Cabal of Schmidhuber-Hutter-and-Legg fame, all know damn well what they are doing on an analytical level.

(And to cheer for my favorite approach, the probabilistic programming cabal has even more analytical backing, since they can throw Bayesian statistics, traditional machine learning, and programming-languages theory at their problems.)

Sure, it does all require an unusual breadth of background knowledge, but hey, this is how real science proceeds, people: shut up and read the textbooks and literature. Sorry, but if we (as in, this community) go around claiming that important problems can be tackled without background knowledge and active literature, or with as little as the "AGI" field seems to generate, then we are not being instrumentally rational. Period. Shut up and PhD.

Why not test safety long before the system is superintelligent?

Because that requires a way to state and demonstrate safety properties such that safety guarantees obtained with small amounts of resources remain strong when the system gets more resources. More on that below.

This again reflects the old 'hard' computer science worldview, and obsession with exact solutions.

If it seems really really really impossibly hard to solve a problem even with the 'simplification' of lots of computing power, perhaps the underlying assumptions are wrong. For example - perhaps using lots and lots of computing power makes the problem harder instead of easier.

You're not really being fair to Nate here, but let's be charitable to you: this is fundamentally a dispute between the heuristics-and-biases school of thought about cognition and the bounded/resource-rational school of thought.

In the heuristics-and-biases school of thought, the human mind uses heuristics or biases when it believes it doesn't have the computing power on hand to use generally intelligent inference, or sometimes the general intelligence is even construed as an emergent computational behavior of an array of heuristics and biases that happened to get thrown together by evolution in the right way. Computationally, this is saying, "When we have enough resources that only asymptotic complexity matters, we use the Old Computer Science way of just running the damn algorithm that implements optimal behavior and optimal asymptotic complexity." Trying to extend this approach into statistical inference gets you basic Bayesianism and AIXI, which appear to have nice "optimality" guarantees, but are computationally intractable and are only optimal up to the training data you give them.

In the bounded-rationality school of thought, computing power is considered a strictly (not asymptotically) finite resource, which must be exploited in an optimal way. I've seen a very nice paper on how thermodynamics actually yields a formal theory for how to do this. Cognition is then analyzed as algorithmic ways to tractably build and evaluate models that deal well with the data. This approach yields increasingly fruitful analyses of such cognitive activities as causal learning, concept learning, and planning in arbitrary environments as probabilistic inference enriched with causal/logical structure.

In terms of LW posts, the former alternative is embodied in Eliezer's Sequences, and the latter in jacob_cannell's post on The Brain as a Universal Learning Machine and my book review of Plato's Camera.

The kinds of steps needed to get both "AI" as such, and "Friendliness" as such, are substantively different in the "possible worlds" where the two different schools of thought apply. Or, perhaps, both are true in certain ways, and what we're really talking about is just two different ways of building minds. Personally, I think the one true distinction is that Calude's work on measuring nonhalting computations gives us a definitive way to deal with the kinds of self-reference scenarios that Old AGI's "any finite computation" approach generates paradoxes in.

But time will tell and I am not a PhD, so everything I say should be taken with substantial sprinklings of salt. On the other hand, to wit, while you shouldn't think for a second that I am one of them, I am certainly on the side of the PhDs.

(Nate: sorry for squabbling on your post. All these sorts of qualms with the research program were things I was going to bring up in person, in a much more constructive way. Still looking forward to meeting you in September!)

Comment author: V_V 01 August 2015 06:54:22AM * 0 points

This approach yields increasingly fruitful analyses of such cognitive activities as causal learning, concept learning, and planning in arbitrary environments as probabilistic inference enriched with causal/logical structure.

It's not obvious to me that the Church programming language and execution model is based on bounded rationality theory.

I mean, the idea of using MCMC to sample the executions of probabilistic programs is certainly neat, and you can trade off bias against computing time by varying the burn-in and sample-lag parameters, but this trade-off is not provably optimal.
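For readers unfamiliar with the two knobs mentioned above, here is a minimal sketch in plain Python (not Church, and not from the original comment) of a random-walk Metropolis sampler with explicit burn-in and lag parameters. The function name, the step size, and the standard-normal target are all illustrative assumptions. A longer burn-in reduces bias from the starting point and a larger lag reduces correlation between kept samples, both at the cost of more computation.

```python
import math
import random


def metropolis_samples(log_density, n_samples, burn_in, lag,
                       step=1.0, x0=0.0, seed=0):
    """Random-walk Metropolis sampler over a one-dimensional target.

    log_density: function returning the unnormalized log density at x.
    burn_in:     number of initial steps to discard.
    lag:         keep one sample every `lag` steps after burn-in.
    """
    if n_samples < 0 or burn_in < 0 or lag < 1:
        raise ValueError("need n_samples >= 0, burn_in >= 0, lag >= 1")
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    kept = []
    total_steps = burn_in + n_samples * lag
    for t in range(1, total_steps + 1):
        # Symmetric Gaussian proposal around the current state.
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        # Discard the burn-in steps, then thin the chain by `lag`.
        if t > burn_in and (t - burn_in) % lag == 0:
            kept.append(x)
    return kept


if __name__ == "__main__":
    # Hypothetical target: a standard normal, started far from the mode.
    draws = metropolis_samples(lambda x: -0.5 * x * x,
                               n_samples=2000, burn_in=500, lag=5, x0=10.0)
    print(sum(draws) / len(draws))
```

The total cost here is burn_in + n_samples * lag density evaluations, so the trade-off is directly visible in the arguments. Nothing in the sketch says how to choose those values optimally, which is the point being made above.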

If I understand correctly, provably optimal bounded rationality is marred by unsolved theoretical questions such as the one-way functions conjecture and P != NP. Even assuming that these conjectures are true, the fact that we can't prove them implies that we often can't prove anything interesting about the optimality of many AI algorithms.

Comment author: [deleted] 03 August 2015 03:40:04AM 0 points

It's not obvious to me that the Church programming language and execution model is based on bounded rationality theory.

That's because it's not. The probabilistic models of cognition (title drop!) implemented using Church tend to deal with what the authors call the resource-rational school of thought about cognition.

If I understand correctly, provably optimal bounded rationality is marred by unsolved theoretical questions such as the one-way functions conjecture and P != NP.

The paper about it that I read was actually using statistical thermodynamics to form its theory of bounded-optimal inference. These conjectures are irrelevant, in that we would be building reasoning systems that would make use of their own knowledge about these facts, such as it might be.

Comment author: V_V 09 August 2015 07:42:21PM -1 points

The paper about it that I read was actually using statistical thermodynamics to form its theory of bounded-optimal inference.

Sounds interesting, do you have a reference?

Comment author: [deleted] 09 August 2015 07:52:15PM 1 point

Sure. If you know statistical mechanics/thermodynamics, I'd be happy to hear your view on the paper, since I don't know those fields.

Comment author: V_V 11 August 2015 12:40:08PM -1 points

Thanks, I'll read it, though I'm not an expert in statistical mechanics and thermodynamics.