Stupid Questions Open Thread

Costanza

This is for anyone in the LessWrong community who has made at least some effort to read the sequences and follow along, but is still confused on some point, and is perhaps feeling a bit embarrassed. Here, newbies and not-so-newbies are free to ask very basic but still relevant questions with the understanding that the answers are probably somewhere in the sequences. Similarly, LessWrong tends to presume a rather high threshold for understanding science and technology. Relevant questions in those areas are welcome as well. Anyone who chooses to respond should respectfully guide the questioner to a helpful resource, and questioners should be appropriately grateful. Good faith should be presumed on both sides, unless and until it is shown to be absent. If a questioner is not sure whether a question is relevant, ask it, and also ask if it's relevant.

In this interview between Eliezer and Luke, Eliezer says that the "solution" to the exploration-exploitation trade-off is to "figure out how much resources you want to spend on exploring, do a bunch of exploring, use all your remaining resources on exploiting the most valuable thing you’ve discovered, over and over and over again." His point is that humans don't do this, because we have our own, arbitrary value called boredom, while an AI would follow this "pure math."

My potentially stupid question: doesn't this strategy assume that environmental conditions relevant to your goals do not change? It seems to me that if your environment can change, then you can never be sure that you're exploiting the most valuable choice. More specifically, why is Eliezer so sure that what wikipedia describes as the epsilon-first strategy is always the optimal one? (Posting this here because I assume he has read more about this than me and that I am missing something.)

Edit 12/30 8:56 GMT: fixed typo in last sentence of second paragraph.

You got me curious, so I did some searching. This paper gives fairly tight bounds in the case where the payoffs are adaptive (i.e. can change in response to your previous actions) but bounded. The algorithm is on page 5.

3Larks14y

You should probably be prepared to change how much you plan to spend on exploring based on the initial information recieved.

1TheOtherDave14y

Sure. For example, if your environment is such that the process of exploitation can alter your environment in such a way that your earlier judgment of "the most valuable thing" is no longer reliable, then an iterative cycle of explore-exploit-explore can potentially get you better results. Of course, you can treat each loop of that cycle as a separate optimization problem and use the abovementioned strategy.

62

Stupid Questions Open Thread

62

62

62

Stupid Questions Open Thread

62

62