Hi there, I've been thinking a lot about AI alignment and values, the latter for longer than the former, admittedly. I'm in graduate school, where I study values through ethics, and I would love to start a conversation about a thought that shot through my mind just last night. In thinking about values, we often focus on principles and concepts such as "good" and "bad" -- most simply, the nouns and adjectives. It is hard to build consensus around these even within a single language, let alone across cultural, linguistic, and geographic boundaries. In my past experience as an English teacher, conveying verbs was always easier than trying to explain things like integrity.
Here's my question: what if instead of fixed concepts and rules, AI alignment focused on actions as the underlying reward function? In other words, might programming AI to focus on the means rather than the ends facilitate an environment in which humans are freer to act and reach their own ends, prioritizing activated potential over predetermined outcome? Can the action, instead of the outcome, become the parameter, rendering AI a facilitator rather than a determiner?
There's a lot more to these questions, with details and explanations that I would be happy to dive into with anyone interested in discussing further (I didn't think it appropriate to make my first post too lengthy). Either way, I'm happy to have found this group and look forward to connecting with likeminded and unlikeminded folks. Thank you for reading! ~Elisabeth
If I'm understanding you right, which I'm not sure I am, I think this just collapses back to the normal case, but where the things being optimized for are those you demarcate as "means" rather than "ends". That is, the means literally become the ends, because they are the things being optimized for.
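The collapse described above can be made concrete with a toy sketch (a hypothetical illustration, not any specific alignment proposal; the action names and reward values below are invented for the example). An optimizer maximizes whatever score it is handed, so a metric we label a "means" plays exactly the role an "end" normally plays:

```python
# A minimal sketch of the point above: whatever quantity an optimizer
# maximizes functions as its objective, regardless of what we call it.

def optimize(candidates, score):
    """Return the candidate with the highest score, whatever 'score' measures."""
    return max(candidates, key=score)

# Hypothetical setup: we label some actions as "means" and try to reward
# the process rather than the outcome.
actions = ["explain", "coerce", "assist"]
means_reward = {"explain": 0.9, "coerce": 0.1, "assist": 0.8}

# The optimizer simply maximizes means_reward -- the "means" metric now
# occupies the slot that an "end" (objective) would occupy.
best = optimize(actions, lambda a: means_reward[a])
print(best)  # "explain"
```

Relabeling the reward as process-based changes *which* behaviors score highly, but not the structure of optimization itself: the scored quantity is still the thing being pursued.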
I think you are understanding correctly, and I see your point. So the question becomes: can we intervene before it becomes cyclical, so that the focus stays on process and not outcome? That's where the means and the ends would remain separate. In effect, can a non-deterministic AI model be written?