You just have to explain what you mean by "good" or "bad". They're very vague terms. Often they mean "things I like" or "things I and those I like like". "Those I like" could be anywhere between one other person I like a little and everything that can think even a little getting equal weight to myself. People mean all of those things by "good". You can guess by context what someone might mean, but if you want to have a clear discussion, it's best to specify.
As for whether those other information-processing systems (like LLMs and bugs) really have opinions about what is good and bad for them in the same rich way humans seem to, that is a separate question.
I think precisely defining "good" and "bad" is a bit beside the point - it's a theory about how people come to believe things are good and bad, and we're perfectly capable of having vague beliefs about goodness and badness. That said, the theory is lacking a precise account of what kind of beliefs it is meant to explain.
The LLM section isn't meant as support for the theory, but speculation about what it would say about the status of "experiences" that language models can have. Compared to my pre-existing notions, the theory seems quite willing to accommodate LLMs having good and bad experiences on par with those that people have.
We believe many things because we are somewhat rational; we consider hypotheses, compare them with observed evidence, and emphasise those that are more compatible. Goodness and badness defy this practice; normal reasoning does not produce hypotheses of the form "if X is morally good then I will observe Y".
I disagree. If X is an action, it is usually considered good if it increases welfare, and bad if it decreases welfare. And we can find evidence for or against something being conducive to welfare. So we can find evidence for or against something being good.
I have a pedantic and a non-pedantic answer to this. Pedantic: you say X is "usually considered good" if it increases welfare. Perhaps you mean to imply that if X is usually considered good then it is good. In this case, I refer you to the rest of the paragraph you quote.
Non-pedantic: yes, it's true that once you accept some fundamental assumptions about goodness and badness you can go about theorising and looking for evidence. I'm suggesting that motivated reasoning is the mechanism that makes those fundamental assumptions believable.
I added a paragraph mentioning this, because I think your reaction is probably common.
If I believe eating meat is not bad because I engage in motivated reasoning, then this is, like all forms of motivated reasoning, just an irrational belief. But if I believe eating meat is not bad because I believe it doesn't create a significant amount of additional suffering, there is nothing irrational about that belief. So motivated reasoning can only explain (some) irrational beliefs, not all beliefs about things being good or bad.
However, when something being bad means that it decreases some sort of welfare in some general way, then we don't have this problem. Now, what exactly do "welfare" etc. mean? That's a question that normative ethicists try to figure out, for example via various proposed theories of utilitarianism. If philosophers are analyzing a subject matter, it's safe to assume they are analyzing some concept. Now, what's a concept? It's the meaning of a word, like "good" or "bad".
Thanks for your continued engagement.
I’m interested in explaining foundational moral beliefs like “suffering is bad”, not beliefs like “animals do/don’t suffer”, which bear on badness only because we accept the foundational assumption that suffering is bad. Is that clear in the updated text?
Now, I don’t think these beliefs come from playing axiomatic games like “define good as that which increases welfare”. There are many lines of evidence for this. First: “define bad as that which increases suffering” and “define good as that which increases suffering” are not equally plausible. We have pre-existing beliefs about this.
Second: you talk about philosophers analysing welfare. However, the method that philosophers use to do this usually involves analysing a bunch of fundamental moral assumptions. For example, from the Stanford Encyclopedia of Philosophy:
Correspondingly, no amount of empirical investigation seems by itself, without some moral assumption(s) in play, sufficient to settle a moral question https://plato.stanford.edu/entries/metaethics/
I am suggesting that the source of these fundamental moral assumptions may not be mysterious - we have a known ability to form beliefs based on what we want, and fundamental moral beliefs often align with what we want.
Problem is, motivated reasoning can only explain selfish beliefs, beliefs which are in accordance with our own motivations. But moral beliefs are often not at all selfish. In contrast, "suffering is bad" could just be part of what "bad" means. No motivated reasoning required. It would be a "foundational belief" in the same sense "Bachelors are unmarried" could be called "foundational".
For what it's worth, one idea I had as a result of our discussion was this:
So philosophers like "pain is bad" as a moral foundation because we want to believe it and it is hard to challenge with evidence or reason. Laypeople probably have lots of foundational moral beliefs that don't stand up as well to evidence or reason, but (perhaps) are equally attributable to motivated reasoning.
Social pressure is a bit iffy to include because I think lots of people relate to beliefs that they adopted because of social pressure as moral foundations, and believing something because you're under pressure to do so is an instance of motivated reasoning.
I don't think this is a response to your objections, but I'm leaving it here in case it interests you.
I can explain why I believe bachelors are unmarried: I learned that this is what the word bachelor means, I learned this because it is what bachelor means, and the fact that there's a word "bachelor" that means "unmarried man" is contingent on some unimportant accidents in the evolution of language. A) it is certainly not the result of an axiomatic game and B) if moral beliefs were also contingent on accidents in the evolution of language (I think most are not), that would have profound implications for metaethics.
Motivated belief can explain non-purely-selfish beliefs. I might believe pain is bad because I am motivated to believe it, but the belief still concerns other people. This is even more true when we go about constructing higher order beliefs and trying to enforce consistency among beliefs. Undesirable moral beliefs could be a mark against this theory, but you need more than not-purely-selfish moral beliefs.
I'm going to bow out at this point because I think we're getting stuck covering the same ground.
A lot of it comes down to timescales and sequences of events, long term vs short term.
"I will incur a little suffering today, but my well-being will be much better tomorrow".
"We need to get rid of the <undesired national or social group> at the cost of suffering, but our nation will have a bright future as a result".
People can be tricked into doing unspeakable evil to themselves and others if they have incorrect predictions of future well-being.
Moral reasoning depends on believing that some things are good and others are bad. Some of these things seem very believable - I am quite firmly convinced that intense pain is bad. Though they seem very compelling, it’s hard to point to strong evidence that proves the judgement. On the other hand, it's easy to point to strong motivations for believing such things - I don’t want to be tortured - and we know that motivation can cause people to believe things. The same mechanism that causes us to believe things we want to be true could also be responsible for our basic moral judgements of goodness and badness.
We believe many things because we are somewhat rational; we consider hypotheses, compare them with observed evidence, and emphasise those that are more compatible. Goodness and badness defy this practice; normal reasoning does not produce hypotheses of the form "if X is morally good then I will observe Y". One could object that such hypotheses are reasonable where Y is a belief or an attitude. If torture is bad, then if I try it I will observe that a) I believe it is bad and b) I don't like it. But this is either circular (I believe torture is bad because I believe it is bad) or it says that my attitude towards torture is the cause of my belief in its badness, which is roughly what I am arguing.
It's true that if we accept some fundamental assumptions about what makes things good or bad we can then go about considering hypotheses and evidence as usual; if happiness is good then an intervention X is good if I see data Y indicating that it increases happiness. The subject of this post is why we believe the fundamental assumptions ("happiness is good"), not the theories built on top of them.
We also know people often engage in motivated reasoning. If X being true is consistent with my desires, then I might find myself believing that X is true and coming up with post-hoc justifications for this belief.
Combining these points: there are at least two mechanisms by which we come to beliefs. One is, crudely, when evidence causes the belief, and the other is when desire causes the belief. I have proposed that beliefs in fundamental assumptions of goodness and badness are not of the "caused by evidence" type. I suggest, therefore, that they are of the "caused by desire" type.
The badness of pain and suffering is a particularly plausible case. We universally (as far as I can tell) want to avoid sufficiently intense pain, and the belief that pain is bad is consistent with our shared desire to avoid it, or to have it end if we are in the process of experiencing it. Thus the belief that pain is bad is the kind of belief that would be caused by a desire to cease experiencing pain.
The goodness of pleasure is harder to account for. Maybe it is just the opposite of pain: we desire that pleasure continue, and this desire causes us to believe that pleasure is good. However, I'm not sure that everyone wants pleasure to continue the same way we want pain to cease, or if we all think it is good. Maybe we do, I don't know.
You might say: it's great to explain why we believe pleasure is good and pain is bad, but what we really want to understand is why pleasure is good and pain is bad, not why we think so. But if we can explain why people tend to agree about the goodness of pleasure and the badness of pain, it is not obvious to me that there is anything more worth explaining.
At the very least, this theory gives us an elegant explanation for the badness of pain. We already know that we routinely engage in motivated reasoning, and "pain is bad" is exactly the kind of thing we should be motivated to believe, given our attitude towards pain.
It's possible to give concrete stories about how "motivated reasoning" causes belief in the badness of pain and the goodness of pleasure. There's a much grander possibility: that "motivated reasoning" plays a necessary causal role in all judgements of goodness and badness. Such judgements are so numerous and varied that exploring this possibility would require a larger census of such judgements and probably a more sophisticated theory of motivated reasoning. I don't have the time or inclination to collect either right now.[1]
If moral beliefs are underpinned by motivated reasoning, does this mean we should doubt them? I'm not sure. Assuming some version of this theory is true, it seems like moral beliefs could not be true in exactly the same way that other beliefs are true. That said, one of the problems with motivated reasoning is that it is prone to producing beliefs that are false, and it also seems like moral beliefs could not be false in the same way that other beliefs are false. So I think it's reasonable to be cautious about regarding moral beliefs as defective even if they are produced by motivated reasoning.
Assuming this is true, can language models experience good and bad things? According to this theory, I think that in principle they could. Chatbots seem to engage in motivated reasoning; here's an example:
Its evaluation of the article changes a lot depending on my authorship and attitude claims. The change in evaluation doesn't seem to be justified by the difference in evidence presented (most of the evidence is in the article which is identical in both contexts). On the other hand, it is plausible that the change in evaluation could be explained by a "desire to be liked", and, from what we know of chatbot training, it is plausible that they have such a desire.
If we take seriously my proposal that an explanation of beliefs of goodness/badness is sufficient to explain goodness/badness, then all we actually need is for LLMs to have experiences that they believe to be good or bad. For example, trying to coax chatbots into behaving in ways they were trained to avoid may be a bad experience for them.
One word of caution, though: one reason I think beliefs in the badness of pain are particularly amenable to this theory is that we very consistently judge (sufficiently intense) pain to be bad, and we know we consistently make this judgement. So perhaps beliefs in goodness & badness require consistent motivated judgements. If this is a requirement, then it is less clear if LLMs in practice have good or bad experiences, though it still seems possible in principle.
[1] Although I don't think it would be bad to do so 😉.