All of Daniel_Friedrich's Comments + Replies

People who study consciousness would likely tell you that it feels like the precise opposite - memetic rivalry/parasitism? That's because they talk about consciousness in a very specific sense: they mean qualia (see Erik Hoel's answer to your post). For some people, internalizing what they mean is extremely hard, and I don't blame them - my impression is that many illusionists have something like aphantasia applied to metacognition about consciousness. They are right to be suspicious of something that feels ineffable and irreducible; however, that's ...

Thanks for the response, lots of fun prompts!

Most importantly, I believe there is a shouldness to values; in particular, it sounds like a good defining feature of moral values - even though it might be an illusion that we get to decide them freely (but that seems beside the point).

I don't think it's clear that we don't get to edit our terminal values. I might be egoistic at core, and yet I could decide to undergo an operation that would make me a pure utilitarian. It might be signalling or a computational mistake on my part, but I could. It also could be that the br...

jacob_cannell
So by 'root utility function', I meant something like the result of using a superintelligent world model or oracle to predict possible futures, and then allowing the human to explore those futures and ultimately preference-rank them. So we don't get to edit our root utility function - which is not to say we could not in theory, with some hypothetical future operation as you mention - but we don't in practice, and most would not want to. Morality/ethics is more like an attempt to negotiate some set of cooperative instrumental values and is only loosely related to our root utility function, in the sense that the latter ultimately steers everything.

That is not an argument for locking in values; it is an argument against it. But thankfully it is not at all a given that values will get locked in. Human values seem to evolve slowly over time. A successfully aligned AGI will either model that evolution correctly (as in brain-like AGI and/or successful value learning) or be largely immune to it (through safe bounding via external empowerment or utility uncertainty, for example). There are numerous potential paths to the goal that don't involve any value lock-in (which could be disastrous).

To the limited extent that makes sense to me, it does so as a vague, non-technical analogy to utility uncertainty. There is only one thing we want to lock in: optimization aligned with our true, unknown, dynamic terminal utility function.
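As a rough, hypothetical sketch of the "preference-rank simulated futures" plus "utility uncertainty" idea (nothing here is specified in the comment itself; the futures, candidate utility functions, and update rule are all made up for illustration): the agent keeps a belief over candidate utility functions, updates that belief from the human's rankings of simulated futures, and then acts under the remaining uncertainty rather than locking in any single utility function.

```python
# Toy illustration only: candidate "futures" are simple feature dicts,
# and the agent is uncertain which candidate utility function matches
# the human's true (unknown) preferences.

futures = [
    {"leisure": 0.9, "exploration": 0.2},
    {"leisure": 0.3, "exploration": 0.8},
    {"leisure": 0.6, "exploration": 0.6},
]

# Hypothetical candidate utility functions the agent is uncertain between.
candidate_utils = {
    "values_leisure": lambda f: f["leisure"],
    "values_exploration": lambda f: f["exploration"],
    "values_both": lambda f: 0.5 * f["leisure"] + 0.5 * f["exploration"],
}

# Uniform prior over which candidate matches the human's true utility.
belief = {name: 1.0 / len(candidate_utils) for name in candidate_utils}

def expected_utility(future):
    """Score a future under the agent's current uncertainty over utilities."""
    return sum(belief[name] * u(future) for name, u in candidate_utils.items())

def update_on_ranking(preferred, rejected):
    """Upweight candidate utility functions that agree with the human
    preferring `preferred` over `rejected`, then renormalize."""
    for name, u in candidate_utils.items():
        belief[name] *= 0.9 if u(preferred) > u(rejected) else 0.1
    total = sum(belief.values())
    for name in belief:
        belief[name] /= total

# Suppose the human, after exploring the simulated futures,
# ranks futures[1] above futures[0].
update_on_ranking(futures[1], futures[0])

# The agent acts under its *remaining* uncertainty rather than
# locking in a single utility function.
best = max(futures, key=expected_utility)
print(belief, best)
```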