LESSWRONG

Is exploitability necessarily unstable? Could there be a tolerable level of exploitability, especially if it allows for tradeoffs with desirable characteristics that are only available to non-EU maximizers?

Clarifying Power-Seeking and Instrumental Convergence

Kerrigan13d10

Why is this not true for most humans? Many religious people would not want to modify the lightcone, because they believe it is God's territory to modify.

why assume AGIs will optimize for fixed goals?

Kerrigan13d10

The initial distribution of values need not closely resemble the values that result from moral philosophy and philosophical self-reflection. Optimizing for hedonistic utilitarianism, for example, looks very little like any of the values produced by the outer optimization loop of natural selection.

Coherent decisions imply consistent utilities

Kerrigan13d10

Although there would be pressure for an AI not to be exploitable, wouldn't there also be pressure toward adaptability and dynamism, that is, the ability to alter preferences and goals in new environments?

Humans aren't agents - what then for value learning?

Kerrigan26d10

Why can't the true values live at the level of anatomy and chemistry?

The Anthropic Trilemma

Kerrigan1mo10

Would this be solved if creating a copy meant creating someone functionally the same as you, but who has someone else's identity and is not you?