OrphanWilde comments on The Winding Path - Less Wrong

6 Post author: OrphanWilde 24 November 2015 09:23PM


Comments (10)


Comment author: OrphanWilde 25 November 2015 04:20:37PM 3 points

I have not heard of Thompson Sampling or explore-exploit optimization. That it's a named phenomenon, independent of what I considered to be rationality itself, may be an issue: it is more or less explicitly my own strategy and my view of rationality, which means my approach may not generalize as well as I anticipated. I'm almost certainly committing the typical mind fallacy there without realizing it.

Comment author: IlyaShpitser 26 November 2015 06:21:45PM * 3 points

The explore-exploit tradeoff is fundamental to learning in complex environments (in AI it is studied in reinforcement learning). People often run into it when ordering food: new restaurant or old favorite, favorite order or new order.

Comment author: Gunnar_Zarncke 25 November 2015 07:50:12PM 1 point

Just in case it wasn't clear: explore-exploit is not a human strategy but a mathematical model of a specific optimization problem. The specific type of rationality you described could simply be seen as analogous to it.
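
For readers who, like the original commenter, have not met the idea before, here is a rough sketch of Thompson Sampling applied to the restaurant example from this thread. It is an illustration under stated assumptions, not something any commenter posted: each restaurant is assumed to give a "good meal" with a fixed, unknown probability, and the function name `thompson_sampling`, the parameter `true_rates`, and the uniform Beta(1, 1) prior are all hypothetical choices made for the example.

```python
import random


def thompson_sampling(true_rates, rounds=2000, seed=0):
    """Simulate Thompson Sampling on a Bernoulli bandit.

    true_rates: hidden probability of a good meal at each restaurant
                (an assumption of this sketch; unknown to the diner).
    Returns per-restaurant counts of good and bad meals observed.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    successes = [0] * n  # good meals seen so far
    failures = [0] * n   # bad meals seen so far

    for _ in range(rounds):
        # Draw one plausible success rate per restaurant from its
        # Beta posterior (Beta(1, 1) prior, i.e. uniform, assumed).
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n)
        ]
        # Visit the restaurant whose sampled rate is highest. Uncertain
        # options sometimes draw high samples, which is the exploration.
        choice = max(range(n), key=lambda i: samples[i])

        # Observe the outcome and update that restaurant's counts.
        if rng.random() < true_rates[choice]:
            successes[choice] += 1
        else:
            failures[choice] += 1

    return successes, failures


if __name__ == "__main__":
    good, bad = thompson_sampling([0.3, 0.8])
    visits = [g + b for g, b in zip(good, bad)]
    print("visits per restaurant:", visits)
```

Early on, both restaurants get visited because their posteriors are wide; as evidence accumulates, visits concentrate on the better one. That shift from trying new things to sticking with a known favorite is the tradeoff discussed above.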