I suspect doing a good job of this is going to be extremely challenging. My loose order-of-magnitude estimate of the Kolmogorov complexity of a decent ethics/human-values calculator is somewhere in the terabytes (something on the order of the size of our genome, i.e. a few gigabytes, is a plausible lower bound, but there's no good reason for it to be an upper bound). However, a sufficiently rough approximation might be a lot smaller, and even that could be quite useful (if prone to running into Goodhart's Law under optimization pressure). I think it's quite likely that doing something like this will be useful in AI-Assisted Alignment, in which case having sample all-human attempts to start from is likely to be valuable.
Did you look at the order of magnitude of standard civil damages for various kinds of harm? That seems like the sort of thing your model should be able to predict successfully.
Also, these sorts of “pleasure” involve not taking responsibility for one’s emotions and thus act to reduce self-esteem, which in turn reduces one’s tendency to experience life overall as “positive.” Therefore, these pleasures were actually considered as value destructions.
I know a number of intelligent, apparently sane, yet kinky people who would disagree with you. If you're interested in the topic, you might want to read some more on it, e.g.: Safe, Sane, and Consensual—Consent and the Ethics of BDSM. If nothing else, your model should be able to account for the fact that at least a few percent of people do this.
Thank you for the comment. Yes, I agree that "doing a good job of this is going to be extremely challenging." I know it's been challenging for me just to get to the point that I've gotten to so far (which is somewhat past my original post). I like to joke that I'm just smart enough to give this a decent try and just stupid enough to actually try it. And yes, I'm trying to find a rough approximation as a good starting point, in hopes that it'll be useful.
Thanks for the suggestion about civil damages - I haven’t looked into that, only criminal “damages” (in terms of criminal sentences) thus far. I actually don’t expect that the first version of my calculations, based on my own ethics/values, will particularly agree with civil damages, but it may be interesting to see if the calculations can be modified to follow an alternate ethical framework (one less focused on self-esteem) that does give reasonable agreement.
Regarding masochistic and sadistic pleasure, it depends on how we define them. One might regard people who enjoy exercise as being into "masochistic pleasure." That's not what I mean by it. By masochistic pleasure I basically mean pleasure that comes from one's own pain, plus self-loathing. Sadistic pleasure would be pleasure that comes from the thought of others' pain, plus self-loathing (even if it may appear as loathing of the other, the way I see it, it's ultimately self-loathing). Self-loathing involves not taking responsibility for one's emotions about oneself and is part of having low self-esteem. I appreciate you pointing to the need for clarification on this, and I hope it's now clarified a bit. Thanks again for the comment!
If Artificial General Intelligence (AGI) is achieved without a highly consistent way of determining the most ethical decision for it to make, there's a very good chance it'll do things that many humans won't like. One way to give an AGI the ability to consistently make ethical decisions could be to provide it with a straightforward mathematical framework for calculating the ethics of a situation based on approximated parameters. This would also likely enable some level of explainability for the AGI's decisions. I've been pursuing such a framework and have come up with a preliminary system that appears to calculate the "ethics" of some idealized decisions in a manner consistent with my values and ethical intuitions, meaning it hasn't produced any wildly counterintuitive results for the admittedly very limited number of ethical decision scenarios I've looked at so far. I don't put forward my values and ethical intuitions as the "right" ones, but I believe they're reasonably consistent, so they should provide a decent foundation to build a proof-of-concept ethics calculation system around.
For determining the "most ethical" decision an AGI could make in a given situation, the criterion I've chosen is that the decision should maximize expected long-term value in the world. I define value to be how useful something ultimately is in supporting and promoting life and net "positive" experiences, where "positive" can contain significant subjectivity. This is basically a utilitarian philosophical approach, although I also include the expected value of upholding rights.
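To make the criterion concrete, here's a minimal sketch of choosing among options by expected long-term value. The `Option` class, function names, weights, and probabilities are hypothetical illustrations for this write-up, not the actual demo code linked further down.

```python
# Minimal sketch (not the actual demo code): pick the option with the
# highest expected long-term value change, relative to "do nothing".
from dataclasses import dataclass, field

@dataclass
class Option:
    name: str
    # (probability, value_change) pairs: > 0 is a value build, < 0 a destruction
    outcomes: list = field(default_factory=list)

def expected_net_value(option: Option) -> float:
    """Expected long-term value change of choosing this option."""
    return sum(p * v for p, v in option.outcomes)

def most_ethical_option(options: list) -> Option:
    """The criterion above: maximize expected long-term value."""
    return max(options, key=expected_net_value)

if __name__ == "__main__":
    do_nothing = Option("do nothing", [(1.0, 0.0)])                  # baseline
    act = Option("act to save a life", [(0.9, 100.0), (0.1, -5.0)])  # made-up numbers
    print(most_ethical_option([do_nothing, act]).name)               # "act to save a life"
```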
Setting Up and Applying a Mathematical Framework
Here’s an outline of the steps I’ve used to devise and apply this system:
My preliminary minimal sets of value destructions and builds are given here. I'll likely refine these lists further in the future. [Update, Jan. 19, 2024: I've updated these value change lists (same link as before), including changes so that a given value build isn't simply the negation of a given value destruction, and vice versa - I believe this is a better way to avoid the "double counting" that I talk about below.]
Regarding #1 above: combinations of value destructions from the minimal set should be able to describe other value destructions not explicitly in this minimal set. An example would be arson of someone else's property without their permission, which involves the minimal-set value destructions of (at the very least) violation of property rights and destruction of property, but could also involve pain, long-term health issues, and/or someone dying.
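As a rough illustration of how such a composition might be computed, here's a sketch; the entry names, weights, and probabilities are made up for this example and aren't the post's actual lists or weights.

```python
# Illustrative sketch of composing a value destruction that isn't in the
# minimal set (arson of someone else's property) out of minimal-set entries.
# Names, weights, and probabilities are placeholders for illustration only.
MINIMAL_SET_WEIGHTS = {
    "property_rights_violation": -50.0,
    "property_destruction": -20.0,
    "physical_pain": -10.0,
    "long_term_health_damage": -200.0,
    "death": -1000.0,
}

def composite_destruction_value(components: dict) -> float:
    """Sum the weighted minimal-set destructions, each scaled by the
    probability that it occurs as part of the composite act."""
    return sum(MINIMAL_SET_WEIGHTS[name] * prob for name, prob in components.items())

# Arson: rights violation and property destruction are certain; pain,
# lasting health damage, and death occur only with some probability.
arson = {
    "property_rights_violation": 1.0,
    "property_destruction": 1.0,
    "physical_pain": 0.3,
    "long_term_health_damage": 0.05,
    "death": 0.01,
}
print(composite_destruction_value(arson))  # -93.0
```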
Regarding #6 above: as an example of personal value builds or destructions that people might not want considered in the calculations, imagine a hypothetical switch that would kill one person if you flipped it or pinch everyone in the world if you didn't. If people knew that someone had to die so they wouldn't get pinched, a large fraction of them likely wouldn't want that on their conscience and would want the weight of their individual pain left out of the ethics calculation of whether to flip the switch.
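Here's a sketch of how such opt-outs could enter the calculation; the names and numbers are hypothetical, and the world's population is shrunk to 1,000 for illustration.

```python
# Sketch of the opt-out idea in #6: a person can ask that a given personal
# harm or benefit not be weighed on their behalf. All names and numbers
# below are hypothetical.
def total_value_change(per_person_changes, opted_out):
    """Sum per-person value changes, skipping anyone who opted out of
    having that particular change counted."""
    return sum(v for person, v in per_person_changes if person not in opted_out)

POPULATION = 1_000                      # small stand-in for "everyone in the world"
flip_switch = [("victim", -1000.0)]     # one person dies
dont_flip = [(f"person_{i}", -0.01) for i in range(POPULATION)]  # everyone pinched

# If everyone pinched opts out of having their pain weighed against a life:
opted_out = {f"person_{i}" for i in range(POPULATION)}
print(total_value_change(flip_switch, set()))    # -1000.0
print(total_value_change(dont_flip, opted_out))  #  0.0 -> "don't flip" wins
```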
For accounting purposes in these ethics calculations, the “do nothing” option is set as a baseline of zero value destruction and zero value build, and other options have their value builds and destructions considered with respect to this baseline. For instance, acting to save a life that would’ve been lost if you did nothing would be considered to be a value build in the life saved by acting, and not a value destruction in the life lost when doing nothing. If the value of the life were included both as a build for the case of taking action and a destruction when not taking action, it would constitute “double counting” of the relative value difference between the options. [Update, Jan. 19, 2024: I've changed the way I avoid double counting - by updating the value destruction and build lists so they don't overlap as opposites of each other (see update above). Therefore, the "do nothing" option now has its own associated value builds/destructions, i.e., the "opposites" of these value builds/destructions aren't included in the value equations of the other options.]
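A small worked example of the original baseline accounting (before the update), with a placeholder life-value weight, may help show the double counting being avoided:

```python
# Sketch of the baseline accounting described above (the original scheme,
# before the Jan. 19 update): "do nothing" is pinned at zero, and the life
# saved by acting is counted once, as a build of the "act" option only.
# LIFE_VALUE is a placeholder number, not the post's actual weight.
LIFE_VALUE = 1000.0

do_nothing_value = 0.0      # baseline: no builds, no destructions
act_value = +LIFE_VALUE     # the saved life, counted once

correct_gap = act_value - do_nothing_value       # 1000.0
# Double counting would also charge "do nothing" with the lost life,
# inflating the gap between the two options to 2 * LIFE_VALUE:
double_counted_gap = act_value - (-LIFE_VALUE)   # 2000.0 (what to avoid)
```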
In this methodology, simplifications necessarily had to be made to approximate the complex world in which we live, where everything is interconnected and decisions and actions can have unforeseen long-term effects. Nevertheless, if the calculations yield seemingly self-consistent results over a broad range of scenarios, they should provide a useful starting point for conveying human ethics to an AGI.[1]
In the current, proof-of-concept version of these calculations, some of the value weight equations, such as the one for someone dying, are “zeroth order” approximations and could use significant refinement.[2] This is left to future work, as is incorporating the effects of uncertainty in the input parameters.
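For what "zeroth order" means here in practice, the sketch below treats the weight for someone dying as a single constant, independent of circumstances; the number and function name are placeholders, not the actual values used.

```python
# Zeroth-order approximation: the same penalty for every death, in every
# situation. A refined version would replace this constant with a function
# of the situation (and, eventually, account for uncertain input parameters).
def death_value_weight_v0() -> float:
    return -1000.0  # placeholder constant, not the post's actual weight
```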
I chose the relative weights of different value destructions and builds by "feel," to try to match my ethical intuitions. An example would be the value of rights versus the value of a human life. These relative weights are certainly open to debate, although there are likely only limited ranges over which they could be modified before the calculations yield some obviously counterintuitive results. By the way, I believe the calculations should strongly favor rights over "classic" utilitarian considerations, in order to keep an AGI from doing bad things (violating rights) on a potentially massive scale in the name of the "greater good."
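One possible way to encode "strongly favor rights" is a lexicographic comparison, sketched below, in which fewer rights violations wins before any classic utilitarian term is consulted; a very large finite weight on rights would be another option. The names and numbers are illustrative only.

```python
# Sketch of making rights dominate "classic" utilitarian terms via a
# lexicographic sort key: an option with fewer rights violations wins no
# matter how much raw utility the alternative promises.
def option_key(rights_violation_value: float, utility_value: float):
    """Sort key: rights violations first (closer to zero is better), then utility."""
    return (rights_violation_value, utility_value)

options = {
    "violate rights for the 'greater good'":    option_key(-1000.0, +500.0),
    "respect rights, accept the worse outcome": option_key(0.0, -400.0),
}
best = max(options, key=options.get)
print(best)  # "respect rights, accept the worse outcome"
```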
Philosophical Aspects
Some aspects of this work that may be interesting to philosophers include:
For more about how these calculations handle different ethical dilemmas, including variations of the well-known “trolley problem,” click here.
I've posted Python code for a "demo" ethics calculator to my GitHub account; it considers a significantly smaller set of value destructions and builds than the full version of the code, which is not yet complete.
Towards Implementing this Framework in an AGI System
Before integrating these ethics calculations into a “real-life” AI or AGI decision-making system, the following steps should be performed:
This is a fairly limited description of what I've put together, and it leaves open many questions. The point of this write-up is not to provide all the answers, but to report the viability of one possible "ethics calculator" and suggest its potential utility. I believe these ethics calculations could provide a useful method for helping an AGI make the most long-term value-building decisions it can. If there's interest, I may provide more details of the calculations and the reasoning behind them in the future.
Thanks for reading.
Some References
Prior to and while working on these ethics calculations, I read a number of different philosophical and self-help resources, many of them listed below. These helped hone some of my ideas, especially with the various ethical dilemmas, thought experiments, and logical arguments they presented.
Bostrom, N., "Ethical issues in advanced artificial intelligence." Science fiction and philosophy: from time travel to superintelligence (2003): 277-284.
Branden, N., “The Six Pillars of Self-Esteem,” (2011).
Bruers, S., Braeckman, J., “A Review and Systemization of the Trolley Problem,” Philosophia: Philosophical Quarterly of Israel, 42, 251-69 (2014).
Chappell, R.Y., Meissner, D., and MacAskill, W., utilitarianism.net
D’Amato, A., Dancel, S., Pilutti, J., Tellis, L., Frascaroli, E., and Gerdes, J.C., “Exceptional Driving Principles for Autonomous Vehicles,” Journal of Law and Mobility 2022: 1-27 (2022).
Friedman, A.W., “Minimizing Harm: Three Problems in Moral Theory,” PhD thesis, MIT, (2002).
Greene, J.D., Cushman, F.A., Stewart, L.E., Lowenberg, K., Nystrom, L.E., and Cohen, J.D., “Pushing Moral Buttons: The Interaction between Personal Force and Intention in Moral Judgment,” Cognition (2009) 111(3): 364-371.
Guttormsen, T.J., “How to Build Healthy Self-Esteem,” Udemy course: https://www.udemy.com/course/healthy-self-esteem
Huemer, M. “Knowledge, Reality and Value,” (2021).
Huemer, M., Fake Nous blog: https://fakenous.substack.com/
Internet Encyclopedia of Philosophy, “Ethics of Artificial Intelligence” https://iep.utm.edu/ethics-of-artificial-intelligence/
Kamm, F., lectures on the trolley problem: https://www.youtube.com/watch?v=A0iXklhA5PQ and https://www.youtube.com/watch?v=U-T_zopKRCQ
Kaufmann, B.N., “Happiness Is a Choice,” (1991).
Lowe, D., “The deep error of political libertarianism: self-ownership, choice, and what’s really valuable in life,” Critical Review of International Social and Political Philosophy, 23 (6): 683-705 (2020).
MacAskill, W., Bykvist, K., and Ord, T., “Moral Uncertainty,” (2020).
MacAskill, W. “What We Owe the Future,” (2022).
Pearce, D., “Can Biotechnology Abolish Suffering?,” (2018).
Robbins, T., “Awaken the Giant Within,” (1991).
Shafer-Landau, R., “The Fundamentals of Ethics,” 1st edition (2010).
Shafer-Landau, R., “The Ethical Life: Fundamental Readings in Ethics and Moral Problems,” (2018).
Singer, P., “Ethics in the Real World,” (2016).
Singer, P., “The Life You Can Save,” (2009).
Singer, P., “Ethics and Intuitions,” The Journal of Ethics (2005) 9: 331-52.
Stanford Encyclopedia of Philosophy, “Deontological Ethics” https://plato.stanford.edu/entries/ethics-deontological/
Thomson, J.J., “Killing, Letting Die, and the Trolley Problem,” Monist (1976) 59: 204-17.
Thomson, J.J., “Turning the Trolley,” Philosophy & Public Affairs, 36 (4): 359-74 (2008).
Vinding, M., “Suffering-Focused Ethics: Defense and Implications,” (2020).
If, instead of making all decisions itself, an AGI were used as an aid to a human, the question arises of how much the AGI should aid a human in pursuing a decision that goes against its calculations of the most ethical decision. In this case, the AGI could be programmed not to aid a human in pursuing a given decision option unless that option involved the fewest overall rights violations of all the options. Alternatively, a “damage threshold” could be set wherein a human wouldn’t be aided (and, possibly, the AGI would intervene against the human) in any decision option that exceeded the threshold of rights violations or risk of rights violations. The benefit of this would be to leave some space for humans to still “be human” and mess things up as part of their process of learning from their mistakes. It should also help provide a path for people to raise their self-esteem when they take responsibility for damages they’ve caused.
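A sketch of the damage-threshold alternative (the threshold value and function name are hypothetical):

```python
# Sketch of the "damage threshold" idea in this footnote: the AGI aids the
# human unless an option's expected rights-violation value falls below a set
# threshold, in which case it declines (and possibly intervenes).
RIGHTS_DAMAGE_THRESHOLD = -100.0  # placeholder; more negative = more damage

def assistance_policy(expected_rights_violation_value: float) -> str:
    """Decide whether to aid a human with their chosen decision option."""
    if expected_rights_violation_value >= RIGHTS_DAMAGE_THRESHOLD:
        return "aid"  # leave room for humans to 'be human' and make mistakes
    return "decline to aid (and possibly intervene)"

print(assistance_policy(-5.0))    # "aid"
print(assistance_policy(-500.0))  # "decline to aid (and possibly intervene)"
```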
Though I haven’t worked out the relative weights yet, the value of a human life will likely include considerations for: 1) intrinsic value, 2) the value of someone’s positive experiences, 3) social value to others, 4) potential for earning money (on the negative side, potential for stealing), 5) potential non-paid labor (on the negative side, potential for violence, killing and abuse), 6) cuteness/attractiveness, 7) reproductive potential, and 8) setting a good example for others and inspiring effort.
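As a sketch only, this breakdown could take the form of a weighted sum over the eight listed components; since the relative weights aren't worked out yet, the uniform weights below are pure placeholders.

```python
# Sketch of the footnoted breakdown: the value of a human life as a weighted
# sum over the eight listed components. Component names and weights here are
# placeholders, not worked-out values.
LIFE_VALUE_COMPONENTS = [
    "intrinsic_value",
    "positive_experiences",
    "social_value_to_others",
    "earning_potential_minus_stealing",
    "unpaid_labor_minus_violence_and_abuse",
    "cuteness_attractiveness",
    "reproductive_potential",
    "good_example_and_inspiration",
]
PLACEHOLDER_WEIGHTS = {name: 1.0 for name in LIFE_VALUE_COMPONENTS}

def human_life_value(component_scores: dict,
                     weights: dict = PLACEHOLDER_WEIGHTS) -> float:
    """Weighted sum over the components; negative scores capture the
    'negative side' items (e.g., potential for stealing or violence)."""
    return sum(weights[name] * component_scores.get(name, 0.0)
               for name in LIFE_VALUE_COMPONENTS)

print(human_life_value({"intrinsic_value": 10.0, "social_value_to_others": 3.0}))  # 13.0
```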