Dewey (2011) lays out the rules for one kind of agent with a mutable value system. The agent has some distribution over utility functions, which it has rules for updating based on its interaction history (where "interaction history" means the agent's observations and actions since its origin). To choose an...
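The mechanism described in the excerpt can be sketched in a few lines. This is only an illustration under assumed simplifications (finite action set, finite set of candidate utility functions, and a posterior over them already computed from the interaction history); the names `choose_action` and `posterior` are mine, not Dewey's.

```python
def choose_action(actions, posterior, history):
    """Pick the action maximizing expected utility under a posterior
    over candidate utility functions.

    posterior: dict mapping a utility function u(history, action) -> P(u | history)
    """
    def expected_value(a):
        # Average each candidate utility's verdict, weighted by its posterior.
        return sum(p * u(history, a) for u, p in posterior.items())
    return max(actions, key=expected_value)


# Toy usage: two candidate utility functions disagreeing about which
# action is best; the agent sides with the more probable one.
u_left = lambda h, a: 1.0 if a == "left" else 0.0
u_right = lambda h, a: 1.0 if a == "right" else 0.0
choice = choose_action(["left", "right"], {u_left: 0.7, u_right: 0.3}, history=[])
# -> "left"
```

As the agent's interaction history grows, the updating rules would reshape `posterior`, and with it the action the same maximization selects.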
Epistemic status: Trying to air out some thoughts for feedback, we'll see how successfully. May require some machine learning to make sense, and may require my level of ignorance to seem interesting. Many current proposals for value learning are garden-variety regression (or its close cousin, classification). The agent doing the...
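To make "garden-variety regression" concrete: a value-learning setup in this style would fit a reward model to human judgments exactly the way any regression fits labeled data. A minimal sketch, assuming a linear model and (feature vector, human rating) pairs; the function name and data are illustrative, not from any particular proposal.

```python
import numpy as np

def fit_reward_model(features, ratings):
    """Ordinary least-squares fit of a linear reward model to human ratings.

    features: list of feature vectors describing outcomes
    ratings:  list of scalar human judgments of those outcomes
    Returns the weight vector w such that features @ w approximates ratings.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(ratings, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


# Toy usage: ratings generated by the weights (1, 2) are recovered exactly.
w = fit_reward_model([[1, 0], [0, 1], [1, 1]], [1.0, 2.0, 3.0])
# w is approximately [1.0, 2.0]
```

Everything that makes regression work or fail (distribution shift, label noise, model misspecification) then carries over directly to the value-learning setting.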
Discussion article for the meetup: Urbana-Champaign: Quorum for discourse WHEN: 06 September 2015, 2:00 PM (-0500) WHERE: Altgeld Hall, W. Green Street, Urbana, IL, 61801 Another year, another chance to come to a LW meetup. Find us at the scenic north entrance of Altgeld Hall. I'll bring delicious food. Depending...
Epistemic status: One part quotes (informative, accurate), one part speculation (not so accurate). One avenue towards AI safety is the construction of "moral AI" that is good at solving the problem of human preferences and values. Five FLI grants have recently been funded that pursue different lines of research on...
[This post borders on some well-trodden ground in information theory and machine learning, so ideas in this post have an above-average chance of having already been stated elsewhere, by professionals, better. EDIT: As it turns out, this is largely the case, under the subjects of the justifications for MML prediction...
One question I've had recently is "Are agents acting on selfish preferences doomed to having conflicts with other versions of themselves?" A major motivation of TDT and UDT was the ability to just do the right thing without having to be tied up with precommitments made by your past self...
I When preferences are selfless, anthropic problems are easily solved by a change of perspective. For example, if we do a Sleeping Beauty experiment for charity, all Sleeping Beauty has to do is follow the strategy that, from the charity's perspective, gets them the most money. This turns out to...