phillchris — LessWrong

LESSWRONG
LW

Replying toWhat Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment

What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment

Hey! Absolutely, I think a lot of this makes sense. I assume you were meaning this paragraph with the Reverse Engineering Roles and Norms paragraph:

I want to be clear that I do not mean AI systems should go off and philosophize on their own until they implement the perfect moral theory without human consent. Rather, our goal should be to design them in such a way that this will be a interactive, collaborative process, so that we continue to have autonomy over our civilizational future^[10].

For both points here, I guess I was getting more at this question by asking these: how ought we structure this collaborative process? Like what constitutes feedback a... (read more)

Replying toWhat Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment

phillchris3y

What Should AI Owe To Us? Accountable and Aligned AI Systems via Contractualist AI Alignment

Great post! I'm curious if you could elaborate on when you would feel comfortable making an agent to make some kind of "enlightened" decision, as opposed to one based more on "mere compliance"? Especially given an AI system that is perhaps not very interpretable, or operates on very high-stakes applications, what sort of certificate / guarantee / piece of reasoning would you want from a system to allow it to enact fundamental social changes? The nice thing about "mere compliance" is there are benchmarks for 'right' and 'wrong' decisions. But here I would expect people to reasonably disagree on whether an AI system or community of systems has made a good decision,... (read more)

Replying toWorlds Where Iterative Design Fails

phillchris3y

Worlds Where Iterative Design Fails

I wonder if implications for this kind of reasoning go beyond AI: indeed, you mention the incentive structure for AI as just being a special case of failing to incentivize people properly (e.g. the software executive), and the only difference being AI occurring at a scale which has the potential to drive extinction. But even in this respect, AI doesn't really seem unique: take the economic system as a whole, and "green" metrics, as a way to stave off catastrophic climate change. Firms, with the power to extinguish human life through slow processes like gradual climate destruction, will become incentivized towards methods of pollution that are easier to hide as regulations on... (read more)