JoshuaFox comments on Brainstorming for post topics - Less Wrong Discussion

21 Post author: NancyLebovitz 31 May 2014 03:08PM

Comments (148)

Comment author: JoshuaFox 02 June 2014 06:04:28AM * 0 points

I just want to understand UDT. I often need several articles, both popular and more formal, before I really understand something like this.

There have been plenty of articles on UDT, but not an overview.

Comment author: Tyrrell_McAllister 02 June 2014 09:32:23PM 3 points

Here's a brief write-up of the basic idea of UDT that I wrote a while back.

Comment author: JoshuaFox 04 June 2014 08:42:47PM 0 points

Thank you! I worked my way through it, and the level of formalism is fine. As you say, it is not meant to include the motivation. I'd appreciate an article that includes the motivation for each element of the formalism.

Also, some concepts were not defined, like "execution history." If "programs" are pure (stateless) functions, I am not sure what a history is. Or maybe there is a temporal model here, like the one in the work of Hutter, Legg, etc.?

Actually, if I understand correctly, the "programs" P1, P2, ... represent the environment (as expressed in Hutter's formalism). (Or perhaps P1, P2, ... represent different programs the agent could run inside itself?) If P1, P2, ... are the environment, why have multiple programs when we could combine them into one thing called "environment"? In your article there is a utility function, whereas Hutter's model has rewards coming from the environment according to an unknown reward function. But I don't understand the essential difference between the two approaches. Since the final choice is an argmax, I still haven't figured out what this definition of UDT adds to the trivial idea "make the choice with highest expected utility."
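
For concreteness, here is a rough sketch of what I mean by the "trivial idea." It is only plain expected-utility maximization over a weighted set of environment programs, not the formalism from the write-up. The names `best_action`, `weighted_programs`, and `utility`, and the toy numbers, are all hypothetical and chosen for illustration.

```python
from typing import Callable, Hashable, Sequence, Tuple

# Assumption: each "program" is a pure function from the agent's action to an outcome.
Program = Callable[[str], Hashable]


def best_action(
    actions: Sequence[str],
    weighted_programs: Sequence[Tuple[float, Program]],
    utility: Callable[[Hashable], float],
) -> str:
    """Return the action with the highest weighted-average utility."""

    def expected_utility(action: str) -> float:
        # Sum utility of each program's outcome, weighted by that program's prior weight.
        return sum(w * utility(p(action)) for w, p in weighted_programs)

    return max(actions, key=expected_utility)


# Toy environment: two hypothetical programs with prior weights 0.25 and 0.75.
p1 = lambda a: 10 if a == "a" else 0
p2 = lambda a: 0 if a == "a" else 5
choice = best_action(["a", "b"], [(0.25, p1), (0.75, p2)], utility=float)
print(choice)
```

With these numbers, action "a" has expected utility 2.5 and action "b" has 3.75, so the argmax picks "b". My question is what UDT adds beyond this.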

The article is great for what it is intended to be, and I am glad we have it. But I'd like to see an intro/overview to UDT.

Comment author: JoshuaFox 16 June 2014 02:29:03PM * 2 points

Just read Daniel Hintze's BA thesis (Arizona State University). It is the best intro to UDT and TDT I have seen so far.

(My understanding of Hintze's writing is partly based on lots of other reading on TDT and UDT that I didn't understand as well, but I think that even if I did not have that background, it would be the best intro.)