TheOtherDave comments on Stupid Questions Open Thread - Less Wrong Discussion

42 Post author: Costanza 29 December 2011 11:23PM

Comments (265)

Comment author: Andy_McKenzie 30 December 2011 04:40:12AM * 7 points

In this interview between Eliezer and Luke, Eliezer says that the "solution" to the exploration-exploitation trade-off is to "figure out how much resources you want to spend on exploring, do a bunch of exploring, use all your remaining resources on exploiting the most valuable thing you’ve discovered, over and over and over again." His point is that humans don't do this, because we have our own, arbitrary value called boredom, while an AI would follow this "pure math."

My potentially stupid question: doesn't this strategy assume that environmental conditions relevant to your goals do not change? It seems to me that if your environment can change, then you can never be sure that you're exploiting the most valuable choice. More specifically, why is Eliezer so sure that what Wikipedia describes as the epsilon-first strategy is always the optimal one? (Posting this here because I assume he has read more about this than I have and that I am missing something.)

Edit 12/30 8:56 GMT: fixed typo in last sentence of second paragraph.
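
For concreteness, here is a minimal sketch of the explore-then-commit strategy quoted above, the one Wikipedia calls epsilon-first, on a toy multi-armed bandit. Everything in it is an assumption made for illustration: the function name `epsilon_first`, the Bernoulli reward model, and the round-robin exploration schedule (descriptions of epsilon-first usually sample arms at random while exploring; round-robin just keeps the sketch easy to follow). Note that `arm_means` never changes during the run, which is exactly the stationarity assumption the question is about.

```python
import random

def epsilon_first(arm_means, total_pulls, explore_pulls, rng):
    """Explore for a fixed budget, then commit to the best-looking arm.

    arm_means: hypothetical success probability of each arm (fixed for the run).
    Returns (index of the arm committed to, total reward collected).
    """
    n_arms = len(arm_means)
    explore_pulls = min(explore_pulls, total_pulls)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    total_reward = 0.0

    def pull(arm):
        # Bernoulli reward: 1.0 with probability arm_means[arm], else 0.0.
        return 1.0 if rng.random() < arm_means[arm] else 0.0

    # Exploration phase: cycle through the arms and record what each one pays.
    for t in range(explore_pulls):
        arm = t % n_arms
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward

    # Commit to the arm with the highest observed average (unsampled arms count as 0).
    def estimate(arm):
        return sums[arm] / counts[arm] if counts[arm] else 0.0

    best = max(range(n_arms), key=estimate)

    # Exploitation phase: spend every remaining pull on that one arm.
    for _ in range(total_pulls - explore_pulls):
        total_reward += pull(best)

    return best, total_reward

best, reward = epsilon_first([0.3, 0.5, 0.7], total_pulls=1000,
                             explore_pulls=150, rng=random.Random(0))
print(best, reward)
```

Once the exploration budget is spent, the sketch never looks at another arm again, so if the payoffs drifted mid-run it would have no way to notice.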

Comment author: TheOtherDave 30 December 2011 04:55:22AM 1 point

Sure. For example, if the process of exploitation can alter your environment in such a way that your earlier judgment of "the most valuable thing" is no longer reliable, then an iterative cycle of explore-exploit-explore can potentially get you better results.

Of course, you can treat each loop of that cycle as a separate optimization problem and use the above-mentioned strategy within it.
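
A toy illustration of that cycle, under loudly stated assumptions: payoffs are deterministic, every pull of an arm depletes it by a fixed amount, and all names (`run_cycle`, `run_strategy`, `depletion`) are made up for this sketch. Each cycle is one explore-then-exploit loop, and the comparison gives both strategies the same total budget of 12 pulls.

```python
def run_cycle(payoffs, explore_pulls, exploit_pulls, depletion):
    """One explore-then-exploit loop. Mutates `payoffs` and returns the reward earned."""
    n_arms = len(payoffs)
    observed = [0.0] * n_arms
    reward = 0.0

    def pull(arm):
        # Pulling pays the arm's current value, then depletes it (never below zero).
        gained = payoffs[arm]
        payoffs[arm] = max(0.0, payoffs[arm] - depletion)
        return gained

    # Explore: sample the arms round-robin and remember what each one paid.
    for t in range(explore_pulls):
        arm = t % n_arms
        observed[arm] = pull(arm)
        reward += observed[arm]

    # Exploit: commit to whichever arm looked best during this cycle's exploration.
    best = max(range(n_arms), key=lambda a: observed[a])
    for _ in range(exploit_pulls):
        reward += pull(best)
    return reward

def run_strategy(initial_payoffs, cycles, explore_pulls, exploit_pulls, depletion):
    """Repeat the explore-then-exploit loop `cycles` times on one shared environment."""
    payoffs = list(initial_payoffs)
    return sum(run_cycle(payoffs, explore_pulls, exploit_pulls, depletion)
               for _ in range(cycles))

# Same budget of 12 pulls: explore once and commit, versus three shorter cycles.
one_shot = run_strategy([10.0, 8.0], cycles=1, explore_pulls=2, exploit_pulls=10, depletion=1.0)
cycled = run_strategy([10.0, 8.0], cycles=3, explore_pulls=2, exploit_pulls=2, depletion=1.0)
print(one_shot, cycled)  # 63.0 79.0
```

In this particular setup the cycled version does better because re-exploring notices when the favoured arm has been worn down below the other one. That is one hand-picked environment, not evidence about environments in general.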

Comment author: Andy_McKenzie 30 December 2011 06:31:00PM 0 points

Could I replace "can potentially get you better results" with "will get you better results on average"?

Comment author: TheOtherDave 30 December 2011 08:12:44PM 1 point

Would you accept "will get you better results, all else being equal" instead? I don't have a very clear sense of what we'd be averaging.

Comment author: Andy_McKenzie 30 December 2011 09:00:35PM 0 points

I meant averaging over the possible ways that the environment could change following your exploitation. For example, it's possible that a particular course of exploitation action could shape the environment such that your exploitation strategy actually becomes more valuable upon each iteration. In such a scenario, exploring more after exploiting would be an especially bad decision. So I don't think I can accept "will" without "on average" unless "all else" excludes all of these types of scenarios in which exploring is harmful.
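
A toy version of the scenario where re-exploring hurts, again under stated assumptions: payoffs are deterministic, every pull of an arm makes that arm more valuable by a fixed `growth`, and the function name `total_reward` and all the numbers are invented for this sketch.

```python
def total_reward(initial_payoffs, cycles, explore_pulls, exploit_pulls, growth):
    """Run `cycles` explore-then-exploit loops where each pull raises that arm's payoff."""
    payoffs = list(initial_payoffs)
    n_arms = len(payoffs)
    reward = 0.0
    for _cycle in range(cycles):
        observed = [0.0] * n_arms
        # Explore: try the arms round-robin; each pull pays out and reinforces the arm.
        for t in range(explore_pulls):
            arm = t % n_arms
            observed[arm] = payoffs[arm]
            reward += payoffs[arm]
            payoffs[arm] += growth
        # Exploit: commit to the arm that paid most during this cycle's exploration.
        best = observed.index(max(observed))
        for _pull in range(exploit_pulls):
            reward += payoffs[best]
            payoffs[best] += growth
    return reward

# Same budget of 12 pulls: commit once, versus re-exploring every four pulls.
commit_once = total_reward([10.0, 8.0], cycles=1, explore_pulls=2, exploit_pulls=10, growth=1.0)
re_explore = total_reward([10.0, 8.0], cycles=3, explore_pulls=2, exploit_pulls=2, growth=1.0)
print(commit_once, re_explore)  # 173.0 153.0
```

Here every pull spent re-checking the weaker arm is a pull not spent on an arm that keeps getting better, so committing once comes out ahead. Whether environments like this one are common or rare is what "on average" has to settle.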

Comment author: TheOtherDave 30 December 2011 10:22:35PM 0 points

OK, understood. Thanks for clarifying.

Hm. I expect that within the set of environments where exploitation can alter the results of what-to-exploit-next calculations, there are more possible ways for it to do so that make further exploration the right move in the next iteration than ways that make further exploitation the right move.

So, yeah, I'll accept "will get you better results on average."