You're looking at Less Wrong's discussion board. This includes all posts, including those that haven't been promoted to the front page yet. For more information, see About Less Wrong.

Lumifer comments on Approximating Solomonoff Induction - Less Wrong Discussion

6 Post author: Houshalter 29 May 2015 12:23PM

You are viewing a comment permalink. View the original post to see all comments and the full post content.

Comments (45)

You are viewing a single comment's thread. Show more comments above.

Comment author: Lumifer 15 June 2015 06:44:47PM 0 points [-]

Thinking about it a bit, the "global" vs "local" property is not really a property of an optimizer, as it is a combined property of the optimizer and the search space (aka the error landscape). For an entirely arbitrary search space, only the exhaustive search and equivalents will guarantee you the global optimum. For simple search spaces (e.g. quadratic optimization) easy "local" things like hill-climbing will guarantee you the global optimum. If your solution is bounded, you always know how close to the theoretical best you are and can adjust the search accordingly (e.g. terminate early), etc.

Comment author: jacob_cannell 15 June 2015 07:08:16PM *  0 points [-]

Yes. In some cases simple (convex) search spaces even permit exact non-iterative analytic solutions which are faster than any iterative approach like SGD (although curiously, they are typically not dramatically faster - as they still need to analyze each example once, and the number of full epochs/iterations in SGD type approaches appears to be sublinear. )

The more advanced methods - natural gradient, hessian free, etc - expend extra computation to analyze the local search space and then use this analysis to make an improved nonlocal jump. This always has extra cost though, so these techniques just end up tending to match SGD in terms of net optimization speed.

Looking at it another way, when a researcher uses something like a convex optimization technique to solve a problem more quickly - what actually happened is the combined human+machine system used a global optimizer to solve the problem, with more of the algorithm running in the human's brain than on the machine. ;)