V_V comments on MIRI's Approach - Less Wrong

Post author: So8res 30 July 2015 08:03PM


Comment author: jacob_cannell 30 July 2015 05:26:16PM, 4 points

> In fact, I expect that given the right way of modelling, formal verification of learning systems up to epsilon-delta bounds (in the style of PAC-learning, for instance) should be quite doable. Why?

Dropping the 'formal verification' part and replacing it with approximate error bound variance reduction, this is potentially interesting - although it also seems to be a general technique that - if it worked well - would be useful for practical training, safety aside.

> Why? Because, as mentioned regarding PAC learning, it's the existing foundation for machine learning.

Machine learning is an eclectic field with many mostly independent 'foundations': Bayesian statistics of course, optimization methods (Hessian-free, natural gradient, etc.), geometric methods and NLDR, statistical physics ...

That being said - I'm not very familiar with the PAC learning literature yet - do you have a link to a good intro/summary/review?
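For anyone else wanting a concrete anchor for what the 'epsilon-delta bounds' talk means, the standard textbook result for a finite, realizable hypothesis class is a reasonable reference point. The sketch below just evaluates that bound; the numbers are purely illustrative:

```python
# Standard PAC sample-complexity bound for a finite hypothesis class H in the
# realizable setting: with probability >= 1 - delta, any consistent learner has
# true error <= epsilon after m >= (1/epsilon) * (ln|H| + ln(1/delta)) examples.
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Sufficient number of i.i.d. examples for the (epsilon, delta) guarantee."""
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)

# e.g. |H| = 2^20 hypotheses, 1% error, 99% confidence -> about 1850 examples
print(pac_sample_bound(2 ** 20, epsilon=0.01, delta=0.01))
```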

> Hell, if I could find the paper showing that deep networks form a "funnel" in the model's free-energy landscape - where local minima are concentrated in that funnel and all yield more-or-less as-good test error, while the global minimum reliably overfits - I'd be posting the link myself.

That sounds kind of like the saddle point paper. It's easy to show that in complex networks there are a large number of equivalent minima due to various symmetries and redundancies. Thus finding the actual technical 'global optimum' quickly becomes suboptimal when you discount for resource costs.
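To make the symmetry point concrete, here is a minimal toy sketch of my own (not from any of the papers mentioned): permuting the hidden units of a small network, together with the matching rows and columns of its weight matrices, leaves the computed function unchanged, so every minimum comes with a whole family of equivalent twins.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # input(3) -> hidden(4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)  # hidden(4) -> output(1)

def mlp(x, W1, b1, W2, b2):
    h = np.tanh(W1 @ x + b1)   # hidden activations
    return W2 @ h + b2         # linear readout

perm = rng.permutation(4)      # any reordering of the 4 hidden units
x = rng.normal(size=3)

y_original = mlp(x, W1, b1, W2, b2)
y_permuted = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)

print(np.allclose(y_original, y_permuted))  # True: same function, different weights
```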

>> If it seems really really really impossibly hard to solve a problem even with the 'simplification' of lots of computing power, perhaps the underlying assumptions are wrong. For example - perhaps using lots and lots of computing power makes the problem harder instead of easier.

> You're not really being fair to Nate here, but let's be charitable to you: this is fundamentally a dispute between the heuristics-and-biases school of thought about cognition and the bounded/resource-rational school of thought.

Yes that is the source of disagreement, but how am I not being fair? I said 'perhaps' - as in have you considered this? Not 'here is why you are certainly wrong'.

> Computationally, this is saying, "When we have enough resources that only asymptotic complexity matters, we use the Old Computer Science way of just running the damn algorithm that implements optimal behavior and optimal asymptotic complexity." Trying to extend this approach into statistical inference gets you basic Bayesianism and AIXI, which appear to have nice "optimality" guarantees, but are computationally intractable and are only optimal up to the training data you give them.

Solomonoff/AIXI and more generally 'full Bayesianism' is useful as a thought model, but is perhaps overvalued on this site compared to in the machine learning field. Compare the number of references/hits for AIXI on this site (tons) to the number on r/MachineLearning (1!). Compare the citation counts of AIXI papers (~100) to those of other ML papers and you will see that the ML community regards AIXI and related work as minor.

The important question is: what does the optimal practical approximation of Solomonoff/Bayesian inference look like? And how different is that from what the brain does? By optimal I of course mean optimal in terms of all that really matters, which is intelligence per unit of resources.

Human intelligence - including that of Turing or Einstein - only requires 10 watts of energy and, more surprisingly, only around 10^14 switches/second or less, which is basically miraculous. A modern GPU uses more than 10^18 switches/second. You'd have to go back to a Pentium or something to get down to 10^14 switches per second. Of course the difference is that switch events in an ANN are much more powerful because they are more like memory ops, but still.
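Rough arithmetic behind those two numbers (my own back-of-envelope figures; the transistor count, clock rate, and firing rate are assumptions, not measurements):

```python
# Brain side: synaptic events per second, assuming ~10^14 synapses firing at
# roughly ~1 Hz on average.
synapses = 1e14
avg_firing_rate_hz = 1.0
brain_events_per_s = synapses * avg_firing_rate_hz   # ~1e14

# GPU side: transistor switch events per second, assuming a ~2015-era GPU with
# ~8e9 transistors at ~1 GHz and (generously) every transistor toggling each
# cycle -- an overestimate, but it gives the quoted order of magnitude.
gpu_transistors = 8e9
gpu_clock_hz = 1e9
gpu_events_per_s = gpu_transistors * gpu_clock_hz    # ~8e18

print(f"brain: ~{brain_events_per_s:.0e} events/s, GPU: ~{gpu_events_per_s:.0e} events/s")
```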

It is really really hard to make any sort of case that actual computer tech is going to become significantly more efficient than the brain anytime in the near future (at least in terms of switch events/second). There is a very strong case that all the heuristics-and-biases stuff is just what actual practical intelligence looks like. There is no such thing as intelligence that is not resource efficient - or alternatively we could say that any useful definition of intelligence must be resource normalized (i.e. utility/cost).

Comment author: V_V 31 July 2015 08:32:58AM, 1 point

> Human intelligence - including that of Turing or Einstein - only requires 10 watts of energy and, more surprisingly, only around 10^14 switches/second or less, which is basically miraculous. A modern GPU uses more than 10^18 switches/second.

I don't think that "switches" per second is a relevant metric here. The computation performed by a single neuron in a single firing cycle is much more complex than the computation performed by a logic gate in a single switching cycle.

The amount of computational power required to simulate a human brain in real time is estimated to be in the petaflops range. Only the largest supercomputers operate in that range, certainly not common GPUs.

Comment author: jacob_cannell 31 July 2015 04:29:46PM, 0 points

You misunderstood me - the biological switch events I was referring to are synaptic ops, and they are comparable to transistor/gate switch ops in terms of the minimum fundamental energy cost in a Landauer analysis.

> The amount of computational power required to simulate a human brain in real time is estimated to be in the petaflops range.

That is a tad too high; the more accurate figure is 10^14 ops/second (10^14 synapses * ~1 Hz average spike rate). The minimal computation required to simulate a single GPU in real time is 10,000 times higher.

Comment author: V_V 01 August 2015 06:29:55AM, 0 points

> That is a tad too high; the more accurate figure is 10^14 ops/second (10^14 synapses * ~1 Hz average spike rate).

I've seen various people give estimates on the order of 10^16 flops by considering the maximum firing rate of a typical neuron (~10^2 Hz) rather than the average firing rate that you use.

On one hand, a neuron must do some computation whether it fires or not, and a "naive" simulation would necessarily use a cycle frequency on the order of 10^2 Hz or more. On the other hand, if the result of a computation is almost always "do not fire", then as a random variable the result has little information entropy, and this may perhaps be exploited to optimize the computation. I don't have a strong intuition about this.
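As a rough illustration of that last possibility (a toy sketch of mine, not a claim about how brain simulators actually work): an event-driven update only touches the synapses of neurons that actually spiked in a given step, so the cost tracks the average firing rate rather than the maximum one.

```python
import numpy as np

rng = np.random.default_rng(1)
n_neurons = 1000
weights = rng.normal(scale=0.1, size=(n_neurons, n_neurons))  # weights[j, i]: synapse i -> j
potentials = np.zeros(n_neurons)
threshold = 1.0

def step(spiked, potentials):
    """Deliver only the spikes that occurred; silent neurons cost nothing here."""
    for i in spiked:                       # few iterations if the average rate is low
        potentials += weights[:, i]        # propagate spike i to its targets
    fired = np.flatnonzero(potentials > threshold)
    potentials[fired] = 0.0                # reset the neurons that fired
    return fired, potentials

# start with 1% of neurons spiking and run a few event-driven steps
active = rng.choice(n_neurons, size=10, replace=False)
for _ in range(5):
    active, potentials = step(active, potentials)
    print(len(active), "spikes")
```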

> The minimal computation required to simulate a single GPU in real time is 10,000 times higher.

On a traditional CPU, perhaps; on another GPU, I don't think so.