
Qiaochu_Yuan comments on Useful Concepts Repository - Less Wrong Discussion

Post author: Qiaochu_Yuan, 10 June 2013 06:12AM (32 points)


Comment author: Qiaochu_Yuan 10 June 2013 07:15:21AM 20 points

Most functions are not linear. This may seem too obvious to be worth mentioning, but it's very easy to assume that various functions that appear in real life are linear: e.g. to assume that if a little of something is good, then more of it is better, or that if a little of something is bad, then more of it is even worse (apparently some people use the term "linear fallacy" for something like this assumption), or the converse in either case.

Nonlinearity is responsible for local optima that aren't global optima, which makes optimization a difficult task in general: it's not enough to look at the direction in which you can improve the most by changing things a little (gradient ascent). Sometimes you need to traverse an uncanny valley, changing things a lot to get to a better local optimum. For example, if you're at a point in your life where you've made all of the small improvements you can, you may need to do something drastic, like quitting your job to find a better one, which will temporarily make your life worse in order to eventually make it even better.
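A toy sketch of the gradient-ascent point (my own illustration; the function and numbers are made up): ascent on a curve with two peaks climbs whichever peak it starts under, so it can settle on a local optimum that isn't the global one.

```python
def f(x):
    # A quartic with two peaks: a global maximum near x ≈ 0.75 and a
    # lower local maximum near x ≈ -0.65 (the +0.2x term breaks the tie).
    return -x**4 + x**2 + 0.2 * x

def grad(x):
    # Derivative of f.
    return -4 * x**3 + 2 * x + 0.2

def gradient_ascent(x, lr=0.01, steps=5000):
    # Follow the local slope; this can only ever climb the peak it starts under.
    for _ in range(steps):
        x += lr * grad(x)
    return x

# Starting on the left, ascent gets stuck on the lower peak;
# starting on the right, it finds the higher one.
left = gradient_ascent(-1.0)
right = gradient_ascent(1.0)
```

No amount of small steps from the left starting point reaches the higher peak; you'd have to cross the valley between them first.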

The reason variance in financial investments matters, even if you only care about expected utility, is that utility isn't a linear function of money. Your improvement in the ability to do something is usually not linear in the amount of time you put into practicing it (at some point you'll hit diminishing marginal returns). And so forth.
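For example (my numbers, with a logarithmic utility function standing in for any concave utility function): a sure $100,000 and a coin flip between $50,000 and $150,000 have the same expected money, but not the same expected utility.

```python
import math

def log_utility(dollars):
    # A concave utility function: each extra dollar matters less than the last.
    return math.log(dollars)

sure_money = 100_000
gamble_money = 0.5 * 50_000 + 0.5 * 150_000  # same expected dollars

sure = log_utility(100_000)
gamble = 0.5 * log_utility(50_000) + 0.5 * log_utility(150_000)
```

Because utility is concave in money, the gamble's downside costs more utility than its upside gains, so the sure thing wins despite the equal expectation.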

Comment author: RomeoStevens 10 June 2013 05:54:55PM 5 points

Related: although many distributions are normal, it seems that people often assume an unfamiliar distribution must be either normal or uniform. Multimodal distributions in particular seem to be an oft-neglected possibility.
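A quick numerical illustration (my own, with made-up parameters): in an equal mixture of two well-separated normals, the mean is a point where almost no probability mass actually sits.

```python
import math

def normal_pdf(x, mu, sigma=1.0):
    # Density of a normal distribution with mean mu and standard deviation sigma.
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x):
    # Equal mixture of N(-3, 1) and N(3, 1): a bimodal distribution.
    return 0.5 * normal_pdf(x, -3) + 0.5 * normal_pdf(x, 3)

# The mixture's mean is 0, but the density there is tiny compared to the modes.
mixture_mean = 0.5 * (-3) + 0.5 * 3
```

Summarizing such a distribution by its mean (or assuming it's normal around that mean) would badly misrepresent where the data actually lives.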

Comment author: lukeprog 10 June 2013 05:41:29PM 3 points

Most functions are not linear

In particular, I'll mention that most tech development curves are logistic.
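A sketch of why logistic curves fool people (my own toy, with arbitrary parameters): early on a logistic curve is almost indistinguishable from exponential growth, and then growth per unit time collapses as it saturates.

```python
import math

def logistic(t, capacity=1.0, rate=1.0, midpoint=10.0):
    # S-curve: roughly exponential early on, then saturating toward capacity.
    return capacity / (1 + math.exp(-rate * (t - midpoint)))

# Early on, each unit of time multiplies progress by roughly e^rate,
# just like an exponential would.
early_ratio = logistic(3) / logistic(2)

# Growth per unit time peaks at the midpoint and collapses afterward.
mid_gain = logistic(10.5) - logistic(9.5)
late_gain = logistic(17) - logistic(16)
```

Extrapolating from the early, exponential-looking part of the curve badly overestimates where the technology ends up.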

Comment author: Pablo_Stafforini 31 December 2015 12:09:35PM 0 points

Most functions are not linear. This may seem too obvious to be worth mentioning, but it's very easy to assume that various functions that appear in real life are linear: e.g. to assume that if a little of something is good, then more of it is better, or that if a little of something is bad, then more of it is even worse (apparently some people use the term "linear fallacy" for something like this assumption), or the converse in either case.

Jordan Ellenberg discusses this phenomenon at length in How Not to Be Wrong: The Power of Mathematical Thinking. See here for some relevant quotes (a blog post by one of the targets of Ellenberg's criticism).

Comment author: AspiringRationalist 11 June 2013 04:58:28AM 0 points

The reason variance in financial investments matters, even if you only care about expected utility, is that utility isn't a linear function of money.

It's not just that. Even if your utility function with regard to money were linear, it would still be wise to try to decrease variance, because high variance drags down compound returns: the geometric mean of your returns falls even when the arithmetic mean stays the same.
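A concrete illustration of the variance drag (hypothetical investments, my numbers): two assets with the same arithmetic-mean return of +10%/year, one steady and one volatile.

```python
# A returns exactly +10% every year; B alternates +40% and -20%
# (arithmetic mean also +10%/year).
years = 10
wealth_a = 1.0
wealth_b = 1.0
for year in range(years):
    wealth_a *= 1.10
    wealth_b *= 1.40 if year % 2 == 0 else 0.80

# B's per-year geometric-mean growth factor is sqrt(1.4 * 0.8) ≈ 1.058,
# well below A's 1.10, so the higher-variance asset compounds to less money.
geo_b = (1.40 * 0.80) ** 0.5
```

The ordering here has nothing to do with risk aversion: it's purely that returns multiply rather than add, so the geometric mean, not the arithmetic mean, determines long-run growth.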

Comment author: Qiaochu_Yuan 11 June 2013 05:03:57AM 1 point

If you value yourself getting money in the future, then that should be taken into account in your utility function.

Comment author: casebash 23 February 2014 08:39:20AM 0 points

Not if you used the geometric mean.

Comment author: endoself 10 June 2013 08:47:51PM 0 points

Your second paragraph could benefit from the concept of simulated annealing.

Comment author: Vaniver 11 June 2013 12:51:21AM 2 points

There are lots of metaheuristic optimization methods; simulated annealing is the easiest one to explain and implement, but consequently it's also the dumbest of them.

Comment author: Houshalter 16 July 2013 07:06:02AM 0 points

What are the best ones?

Comment author: Vaniver 16 July 2013 03:15:39PM 0 points

I'm personally a fan of tabu search, which prevents cycling and getting stuck in local optima by remembering features of previously seen solutions, and not visiting any solution with those features for some set length of time. "Best" depends on the particular problem, though; there are situations when the easy implementation of simulated annealing makes it a superior solution to a cleverer but longer implementation of something else.
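A minimal sketch of the idea (my own toy problem; real tabu search remembers solution *features* rather than whole solutions, and this sketch just remembers positions): on a 1-D landscape, plain hill climbing gets stuck on a small bump, while the tabu memory forces the search through worse solutions and out of the trap.

```python
from collections import deque

def hill_climb(values, x):
    # Greedy local search: move to the best neighbor only if it improves.
    while True:
        neighbors = [n for n in (x - 1, x + 1) if 0 <= n < len(values)]
        best = max(neighbors, key=lambda n: values[n])
        if values[best] <= values[x]:
            return x, values[x]
        x = best

def tabu_search(values, x, tenure=3, iters=20):
    # Refuse to revisit recently seen positions, so the search is pushed
    # through worse solutions instead of cycling around a local optimum.
    best_x = x
    tabu = deque([x], maxlen=tenure)  # positions forbidden for `tenure` moves
    for _ in range(iters):
        neighbors = [n for n in (x - 1, x + 1) if 0 <= n < len(values)]
        # Aspiration criterion: a tabu move is allowed if it beats the best so far.
        allowed = [n for n in neighbors
                   if n not in tabu or values[n] > values[best_x]]
        if not allowed:
            break
        x = max(allowed, key=lambda n: values[n])
        tabu.append(x)
        if values[x] > values[best_x]:
            best_x = x
    return best_x, values[best_x]

# A landscape with a small bump at index 1 and the true peak at index 5.
landscape = [0, 3, 2, 1, 2, 5, 1]
```

Starting from index 1, hill climbing stays on the bump, while tabu search walks downhill through the valley and finds the higher peak.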

Comment author: Qiaochu_Yuan 10 June 2013 08:51:09PM 2 points

I thought about mentioning simulated annealing, but then it seemed to me like simulated annealing is more involved than the basic concept I wanted to get across (e.g. the cooling phase is an extra complication).
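For the curious, the whole algorithm, cooling phase included, fits in a few lines (my own toy sketch, not from any particular source; the test function and parameters are arbitrary):

```python
import math
import random

def simulated_annealing(f, x, temp=2.0, cooling=0.95, step=0.5, iters=500, seed=0):
    # Always accept improvements; accept a worsening move with probability
    # e^(delta/temp). As temp cools, tolerance for bad moves shrinks and the
    # search settles into whatever basin it's in.
    rng = random.Random(seed)
    fx = f(x)
    best_x, best_f = x, fx
    for _ in range(iters):
        cand = x + rng.uniform(-step, step)
        fc = f(cand)
        if fc >= fx or rng.random() < math.exp((fc - fx) / temp):
            x, fx = cand, fc
        if fx > best_f:
            best_x, best_f = x, fx
        temp *= cooling  # the cooling phase
    return best_x, best_f

def bumpy(x):
    # Two peaks: a global maximum near x ≈ 0.75, a lower one near x ≈ -0.65.
    return -x**4 + x**2 + 0.2 * x

best_x, best_f = simulated_annealing(bumpy, 0.0)
```

The early high-temperature phase is what lets it hop between basins that pure gradient ascent could never cross.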

Comment author: endoself 10 June 2013 09:20:44PM 1 point

But it's actually important to the example. If someone intends to divide their time between searching for small and large improvements to their life, then simulated annealing suggests that they should make the big changes first. (The person you describe may not have done this, since they settled into a local optimum but only then decided to find a completely different point on the fitness landscape, though without more details it's entirely possible they decided correctly here.)

Comment author: malcolmocean 11 June 2013 06:57:38PM 1 point

I propose making an analogy to Split & Merge Expectation Maximization (SMEM) instead.

It's a very efficient algorithm for modelling that operates as follows:
1) perform EM to find the local optimum
2) examine the clusters to determine which two are most similar, and combine them
3) examine the clusters to determine which one is least representative of the data it's supposed to describe, and break it into two
4) perform EM on the three adjusted clusters
5) repeat steps 1-4 until the change in likelihood between iterations drops below some epsilon

I think this is actually quite isomorphic to Goal Factoring (from Geoff Anders / CFAR) in that you're trying to combine things that are similar and then break up things that are inefficient. At least, I spent an entire summer working on an SMEM clustering program (though some of that was UI) and they feel similar to me.
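For readers who haven't met EM: here's a much-simplified analogue of the loop above, using hard-assignment k-means in place of full EM and a single split-and-merge move (my own toy, not SMEM proper; the data and initialization are made up).

```python
import statistics

def kmeans(points, centers, steps=25):
    # EM-like loop with hard assignments: assign each point to its nearest
    # center, then move each center to the mean of its assigned points.
    # (The returned clusters reflect the last assignment pass.)
    for _ in range(steps):
        clusters = [[] for _ in centers]
        for p in points:
            clusters[min(range(len(centers)),
                         key=lambda j: abs(p - centers[j]))].append(p)
        centers = [statistics.mean(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

def merge_and_split(centers, clusters):
    # One split-and-merge move: average the two closest centers into one,
    # and break the cluster with the largest spread into two. (Assumes the
    # merged pair and the split cluster differ, as in this example.)
    pairs = [(a, b) for a in range(len(centers)) for b in range(a + 1, len(centers))]
    i, j = min(pairs, key=lambda ab: abs(centers[ab[0]] - centers[ab[1]]))
    w = max(range(len(centers)),
            key=lambda c: statistics.pvariance(clusters[c]) if len(clusters[c]) > 1 else 0.0)
    sd = statistics.pstdev(clusters[w])
    kept = [c for k, c in enumerate(centers) if k not in (i, j, w)]
    return kept + [(centers[i] + centers[j]) / 2, centers[w] - sd, centers[w] + sd]

# Three well-separated groups, but a bad initialization: two centers crowd
# the first group and one center straddles the other two.
points = [0.0, 0.5, 1.0, 10.0, 10.5, 11.0, 20.0, 20.5, 21.0]
stuck_centers, stuck_clusters = kmeans(points, [0.0, 1.0, 15.0])
fixed_centers, _ = kmeans(points, merge_and_split(stuck_centers, stuck_clusters))
```

The plain EM-like loop converges to a bad local optimum (one center stuck between two groups); merging the redundant pair and splitting the overstretched cluster, then re-running, recovers the three true groups. This is the "combine things that are similar, break up things that are inefficient" move in miniature.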

Comment author: AspiringRationalist 11 June 2013 04:59:23AM 0 points

Simulated annealing is one of many techniques for optimizing in search spaces with a lot of local maxima. Is there a reason why you're emphasizing that one in particular?

Comment author: endoself 11 June 2013 09:40:00AM 0 points

It provides a useful concept, which can be carried over into other domains. I suppose there are other techniques that use a temperature, but I'm much less familiar with them and they are more complicated. Is understanding other metaheuristics more useful than just understanding simulated annealing for people who aren't actually writing a program that performs some optimization?