
OrphanWilde comments on The Winding Path - Less Wrong Discussion

Post author: OrphanWilde 24 November 2015 09:23PM 6 points




Comment author: OrphanWilde 25 November 2015 04:20:37PM 3 points

I had not heard of Thompson Sampling, or of explore-exploit optimization. That it's a named phenomenon, independent of what I considered to be rationality itself, may be an issue: it's more or less explicitly my own strategy and my own regard for rationality. That means it may not be as generalizable as I anticipated; I'm almost certainly engaging in the typical mind fallacy there without realizing it.

Comment author: IlyaShpitser 26 November 2015 06:21:45PM 3 points

The explore-exploit tradeoff is a fundamental problem in learning in complex environments (in AI, it is studied in reinforcement learning). A common way it comes up for people is when ordering food: a new restaurant or an old favorite, the favorite order or a new one.
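To make the restaurant example concrete, here is a minimal sketch of Thompson Sampling (the technique named earlier in the thread) on a Bernoulli bandit. It assumes each restaurant has a fixed, unknown probability of serving a good meal; the function name `thompson_sampling` and the probabilities in the usage line are hypothetical. Each round, the diner draws a plausible quality for every restaurant from its Beta posterior and visits the highest draw, so uncertain places still get tried (explore) while proven ones get most of the visits (exploit).

```python
import random


def thompson_sampling(true_probs, rounds, seed=0):
    """Simulate Thompson Sampling on a Bernoulli bandit.

    true_probs: hypothetical chance that a visit to each restaurant yields a
    good meal. The diner never sees these; they only generate outcomes.
    Returns the number of visits made to each restaurant.
    """
    rng = random.Random(seed)
    n = len(true_probs)
    successes = [0] * n  # good meals observed at each restaurant
    failures = [0] * n   # disappointing meals observed at each restaurant
    visits = [0] * n
    for _ in range(rounds):
        # Draw a plausible quality for each restaurant from its
        # Beta(1 + successes, 1 + failures) posterior (uniform prior assumed).
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(n)]
        # Visit the restaurant whose sampled quality is highest.
        choice = max(range(n), key=lambda i: samples[i])
        visits[choice] += 1
        # Observe the meal and update beliefs about that restaurant only.
        if rng.random() < true_probs[choice]:
            successes[choice] += 1
        else:
            failures[choice] += 1
    return visits


if __name__ == "__main__":
    # Hypothetical: an old favorite (0.7) versus two new places (0.5 and 0.8).
    print(thompson_sampling([0.7, 0.5, 0.8], rounds=1000))
```

With enough rounds, most visits should go to the restaurant with the highest true probability. The Beta(1, 1) starting prior encodes "no idea yet" and is an assumption, not the only reasonable choice.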

Comment author: Gunnar_Zarncke 25 November 2015 07:50:12PM 1 point

Just in case it hasn't been clear: explore-exploit is not a human strategy but a mathematical model of a specific optimization problem. It's just that the specific type of rationality you described could be seen as analogous to it.
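One standard formalization, given here as an illustration rather than anything stated in the thread, is the multi-armed bandit. There are $K$ options ("arms"), arm $k$ has an unknown expected reward $\mu_k$, and at each round $t$ the agent picks an arm $a_t$. Performance is measured by regret, the expected reward lost relative to always choosing the best arm:

```latex
R_T = T\,\mu^{*} - \mathbb{E}\!\left[\sum_{t=1}^{T} \mu_{a_t}\right],
\qquad \mu^{*} = \max_{k \in \{1,\dots,K\}} \mu_k
```

Exploring too little risks locking onto a worse arm (regret grows with every round), while exploring too much wastes rounds on arms already known to be worse. Strategies such as Thompson Sampling aim to balance the two so that regret stays small.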