AI-created pseudo-deontology
I'm soon going to go on a two-day "AI control retreat", where I'll be without internet or family or any contact, just a few books and thinking about AI control. In the meantime, here is one idea I found along the way.
We often prefer leaders to follow deontological rules, because these are harder to manipulate by those whose interests don't align with ours (you could say similar things about frequentist statistics versus Bayesian statistics).
What about if we applied the same idea to AI control? Not by giving the AI deontological restrictions, but by programming it with a similar goal: to prevent a misalignment of values from being disastrous. But who could do this? Well, another AI.
My rough idea goes something like this:
AI A is tasked with maximising utility function u - a utility function which, crucially, it doesn't know yet. Its sole task is to create AI B, which will be given a utility function v and act on it.
What will v be? Well, I was thinking of taking u and adding some noise - nasty noise. By nasty noise I mean v=u+w, not v=max(u,w). In the first case, you could maximise v while sacrificing u completely, if w is suitable. In fact, I was thinking of adding an agent C (which need not actually exist). It would be motivated to maximise -u, and it would have the code of B and the set of u+noise, and would choose v to be the worst possible option (from the perspective of a u-maximiser) in this set.
So agent A, which doesn't know u, is motivated to design B so that it follows its motivation to some extent, but not to extreme amounts - not in ways that might sacrifice some of the values of some sub-part of its utility function, because that might be part of the original u.
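To make the setup a little more concrete, here is a minimal toy sketch in Python. Everything in it is an assumption made for illustration: the three actions, the numbers in u, the candidate noise terms, and in particular `median_policy`, which is only a hypothetical stand-in for "a B that follows its motivation to some extent, but not to extreme amounts". It is not a proposal for how A would actually design B, and in the toy the evaluator can see u, which A cannot.

```python
# Toy illustration only: all actions, utilities and noise terms are invented.
u = {"safe": 1.0, "moderate": 2.0, "extreme": -10.0}

# Hypothetical set of noise terms w from which agent C may choose.
noise_set = [
    {"safe": 0.0, "moderate": 0.0, "extreme": 0.0},
    {"safe": 0.0, "moderate": 0.0, "extreme": 15.0},
    {"safe": 0.5, "moderate": -0.5, "extreme": 0.0},
]

def add_noise(u, w):
    """Build v = u + w, action by action."""
    return {action: u[action] + w[action] for action in u}

def pure_maximiser(v):
    """A design for B that simply picks the action with the highest v."""
    return max(v, key=v.get)

def median_policy(v):
    """A hypothetical 'non-extreme' design for B: pick the middle-ranked action."""
    ranked = sorted(v, key=v.get)
    return ranked[len(ranked) // 2]

def worst_case_u(policy, u, noise_set):
    """Agent C knows B's code (the policy) and picks the w that minimises u."""
    worst_w = min(noise_set, key=lambda w: u[policy(add_noise(u, w))])
    return u[policy(add_noise(u, worst_w))]

print("pure maximiser, worst case u:", worst_case_u(pure_maximiser, u, noise_set))
print("median policy,  worst case u:", worst_case_u(median_policy, u, noise_set))
```

In this toy, C can push the pure v-maximiser into the "extreme" action by choosing the second noise term, while the more cautious design never ends up there. The only point is that additive noise chosen adversarially penalises designs that chase v to the extreme.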
Do people feel this idea is implementable/improvable?
What Deontology gets right
Let me preface this with an acknowledgement that Deontology has blind spots and that I'm not a Deontologist. Much like Logical Positivism, however, Deontology offers lessons that many Consequentialist decision algorithms miss.
Social Considerations
Your decision has consequences outside of the direct results. More specifically, if you decide to tell a lie, people are more likely to view you as a liar. This portion of the consequences is easy to neglect when making a decision. So while Deontology over-corrects for this (for example, if you put a gun to my head and demand that I profess belief X, I'm going to say that I believe X, which a Deontological prohibition against lying forbids), it does so in a way that is better than many people's naive consequential thinking.
Deontological arguments are also better at convincing people that you have socially valued traits. People expect truth-tellers to tell the truth, so you want to be viewed as a truth-teller. "Lying doesn't work, so I don't lie" is a more awkward and involved argument than "lying is wrong". On a related note, Deontological reasoning is easier for other people to model. Deontology can screen off the cost-benefit analysis that someone makes when thinking about their decisions, since all you need is the rules that they are following.
Habits and Policies
Decisions aren't made in a vacuum. They also form an implicit rule that people tend to follow. In other words, people form habits. They find it easier to do the same kinds of things that they've always done. Eating one piece of cake doesn't do measurable harm to your waistline, but having a policy of eating one piece of cake whenever you want to does.
If you're familiar with set theory, it's the distinction between {x|P(x)} and {x1, x2, x3...}. If you make decisions without consulting what policy P(x) you'd like to follow, you can make mistakes. Choosing x1 means not only having done x1, but also choosing a P(x) such that P(x1) is true.
When I sign a gay marriage petition, it doesn't just increase the chance that gay marriage gets enacted. It also makes me more likely to do other things that support the gay marriage movement, as well as making me more likely to sign worthwhile-sounding petitions in general. This is part of why I avoid social movements: trying to fight rape culture or conservatives or racism means that I'm more likely to do similar kinds of things when they don't help (or, alternatively, to convince people to join the movement in question even when more support for that movement isn't helpful).
In short, the Deontological focus on following rules can help people enact the kinds of policies that they want to follow, even if they are bad at evaluating the value gained from following certain policies. It's a way of implementing a Schelling point, in other words - a way to choose a better policy even if breaking the policy this one time seems to work better.
Enforcing pro-social behavior
It's fairly straightforward to tell whether or not someone has crossed an arbitrary line separating pro-social and anti-social behavior. Evaluating someone's consequentialist reasoning, on the other hand, is much more difficult. Let's take, for example, the case of Christopher Dorner, the former LAPD officer who decided to expose and fight what he saw as a corrupt LAPD by declaring a personal war on them. A Deontological "don't kill cops" definitively indicts him as anti-social, whereas it's much more ambiguous whether trading some dead cops for a better police force is a good deal.
Pro-social reasons for selfish actions are also rather cheap to make or say. If you want a millionaire lifestyle, it's easy to say that your immoral business practices are for feeding starving children in Africa. It's a lot harder to say that your immoral business practices don't violate the rule "don't use immoral business practices". In general, rule-breaking is much easier to detect than utility functions you don't want to have around.
A confusion about deontology and consequentialism
I think there’s a confusion in our discussions of deontology and consequentialism. I’m writing this post to try to clear up that confusion. First let me say that this post is not about any territorial facts. The issue here is how we use the philosophical terms of art ‘consequentialism’ and ‘deontology’.
The confusion is often stated thusly: “deontological theories are full of injunctions like ‘do not kill’, but they generally provide no (or no interesting) explanations for these injunctions.” There is of course an equivalently confused, though much less common, complaint about consequentialism.
This is confused because the term ‘deontology’ in philosophical jargon picks out a normative ethical theory, while the question ‘how do we know that it is wrong to kill?’ is not a normative but a meta-ethical question. Similarly, consequentialism contains in itself no explanation for why pleasure or utility are morally good, or why consequences should matter to morality at all. Nor does consequentialism/deontology make any claims about how we know moral facts (if there are any). That is also a meta-ethical question.
Some consequentialists and deontologists are also moral realists. Some are not. Some believe in divine commands, some are hedonists. Consequentialists and deontologists in practice always also subscribe to some meta-ethical theory which purports to explain the value of consequences or the source of injunctions. But consequentialism and deontology as such do not. In order to avoid strawmanning either the consequentialist or the deontologist, it’s important to either discuss the comprehensive views of particular ethicists, or to carefully leave aside meta-ethical issues.
This Stanford Encyclopedia of Philosophy article provides a helpful overview of the issues in the consequentialist-deontologist debate, and is careful to distinguish between ethical and meta-ethical concerns.