This isn't really 'upgrading' moral theories by taking account of moral intuitions, but rather 'trying to patch them up to accord with a first order intuition of mine, regardless of distortions elsewhere'.
You want to avoid wireheading-like scenarios by saying (in effect) 'fulfilled humane values have lexical priority over non-humane values', so smaller populations of happy humans can trump very large populations of happy mice, even if that would be a better utilitarian deal (I'm sceptical that's how the empirical calculus plays out, but that's another issue).
This just seems wrong on its face: it seems wrong to refuse to trade one happy human for e.g. a trillion (or quadrillion, or 3^^3) happy mice (or octopuses, or dolphins - just how 'humane' are we aiming for?). It seems even worse to trade one happy human (although you don't specify, this could be a 'barely positive welfare' human life) for torturing unbounded numbers of mice or dolphins.
You seem to find the latter problem unacceptable (although I think you should find the first really bad too), so you offer a different patch: although positive human welfare lexically dominates positive animal welfare, it doesn't dominate negative animal welfare, which can be traded off, so happy humans at the expense of vast mice-torture is out. Yet this has other problems due to weird separability violations. Adjusting the example you give for clarity:
1) 100A
2) 1H + 100A - 50A
3) 0
4) 1H - 50A
The account seems to greenlight the move from 1 --> 2, as the welfare of the animal population remains positive, and although it goes down, we don't care because human welfare trumps it. However, we cannot move from 3 --> 4, as the total negative animal welfare trumps the human population. This seems embarrassing, particularly as our 100 units of happy animal welfare could be on an adjacent supercluster, so virtually any trade of human versus animal interests is rendered right or wrong by total welfare across the universe.
As far as I can see, the next patch you offer (never accept that a population with overall negative welfare is better than one with overall positive welfare) won't help you avoid these separability concerns: 4 is still worse than 3, as it is net negative, but 2 is net positive, so it is still better than 1.
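Spelling the rule out makes the separability failure mechanical. Here's a minimal sketch (my own encoding of the patch as I read it, not anything you've committed to): negative animal welfare trades 1:1 against human welfare, while positive animal welfare is lexically dominated by it.

```python
# Illustrative encoding of the 'lexical priority with negative-welfare
# trade-off' rule. Populations are (human, animal) net-welfare totals;
# the numbers match the four populations above.

def value(pop):
    """Rank populations: animal suffering counts against human welfare
    1:1; positive animal welfare only ever breaks ties, since human
    welfare lexically dominates it."""
    human, animal = pop
    primary = human + min(animal, 0)   # suffering trades against humans
    tiebreak = max(animal, 0)          # happy animals break ties only
    return (primary, tiebreak)         # tuples compare lexicographically

pop1 = (0, 100)        # 1) 100A
pop2 = (1, 100 - 50)   # 2) 1H + 100A - 50A
pop3 = (0, 0)          # 3) empty
pop4 = (1, -50)        # 4) 1H - 50A

# The change is identical in both cases: add 1H, lose 50A of animal welfare...
assert (pop2[0] - pop1[0], pop2[1] - pop1[1]) == \
       (pop4[0] - pop3[0], pop4[1] - pop3[1])

# ...yet the rule endorses it in one context and forbids it in the other:
print(value(pop2) > value(pop1))  # True  -- 1 --> 2 is an improvement
print(value(pop4) > value(pop3))  # False -- 3 --> 4 is a worsening
```

The same one-person, minus-50-welfare change flips sign depending on welfare elsewhere in the population, which is exactly the separability violation.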
You want to assert nigh-deontological 'do not do this if X' rules to avoid RC-like edge cases. The problem with these sorts of rules is that you need to give them strict lexical priority so they really rule out the edge case (as you rightly point out, if you, like me, take the 'human happiness has a high weight' way out, you are still forced to replace humans with stoned mice so long as enough stoned mice are on the other end of the scales). Yet lexical priority seems to lead to another absurd edge case where you have to refuse even humongous trade-offs: is one happy human really worth more than a gazillion happy mice? Or indeed worth torturing trillions of dolphins?
So then you have to introduce other nigh-deontological trumping rules to rule out these nasty edge cases (e.g. 'you can trade any amount of positive animal welfare for any increase in human welfare, but negative animal welfare trades 1-1 with human welfare'). But these too have nasty edge cases, and worse, you find (as deontologists do) that these rules clash to give you loss of transitivity, loss of separability, dependence on irrelevant alternatives, etc.
FWIW, I think the best fix is empirical: just take humans to be capable of much greater positive welfare states than animals, such that you have to make really big trades of humans for animals for it to be worth it. Although this leaves open the possibility we'd end up with little complex value, that does not seem a likely possibility (I trade humans for mice at ~1000000:1, and I don't think mice are 1000000x cheaper). It also avoids getting edge-cased, and, thanks to classical util cardinally ordering states of affairs, you don't violate transitivity or separability.
As a second recommendation (with pre-emptive apologies for offense: I can't find a better way of saying this), I'd recommend going back to the population ethics literature (and philosophy generally) rather than trying to reconstitute ethical theory yourself. I don't think you know enough or have developed enough philosophical skill to make good inroads into these problems, and the posts you've made so far I think are unlikely to produce anything interesting or important in the field, and are considerably inferior to academic work ongoing.
By the way, 3^^3 = 3^27 is "only" 7625597484987, which is less than a quadrillion. If you want a really big number, you should add a third arrow (or use a higher number than three).
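Knuth's up-arrow notation is easy to compute for small towers, so the arithmetic can be checked directly (the recursive definition below is the standard one; the helper name is mine):

```python
# Knuth up-arrow notation: a ^^ b is a power tower of b copies of a,
# so 3^^3 = 3^(3^3) = 3^27. Adding arrows iterates the previous operator.

def up(a, b, arrows):
    """Compute a (arrows) b in Knuth's up-arrow notation."""
    if arrows == 1:
        return a ** b          # one arrow is ordinary exponentiation
    if b == 0:
        return 1               # empty tower convention
    # a ^(n) b = a ^(n-1) (a ^(n) (b-1))
    return up(a, up(a, b - 1, arrows), arrows - 1)

print(up(3, 3, 2))  # 7625597484987, i.e. 3^27 -- under a quadrillion (10^15)

# 3^^^3 = 3^^(3^^3) is a power tower of 7625597484987 threes:
# utterly infeasible to evaluate, which is why the third arrow
# (or a larger base) is needed for a genuinely huge number.
```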
Like many members of this community, reading the sequences has opened my eyes to a heavily neglected aspect of morality. Before reading the sequences I focused mostly on how to best improve people's wellbeing in the present and the future. However, after reading the sequences, I realized that I had neglected a very important question: In the future we will be able to create creatures with virtually any utility function imaginable. What sort of values should we give the creatures of the future? What sort of desires should they have, from what should they gain wellbeing?
Anyone familiar with the sequences should be familiar with the answer. We should create creatures with the complex values that human beings possess (call them "humane values"). We should avoid creating creatures with simple values that only desire to maximize one thing, like paperclips or pleasure.
It is important that future theories of ethics formalize this insight. I think we all know what would happen if we programmed an AI with conventional utilitarianism: It would exterminate the human race and replace them with creatures whose preferences are easier to satisfy (if you program it with preference utilitarianism) or creatures whom it is easier to make happy (if you program it with hedonic utilitarianism). It is important to develop a theory of ethics that avoids this.
Lately I have been trying to develop a modified utilitarian theory that formalizes this insight. My focus has been on population ethics. I am essentially arguing that population ethics should not just focus on maximizing welfare; it should also focus on what sort of creatures it is best to create. According to this theory of ethics, it is possible for a population with a lower total level of welfare to be better than a population with a higher total level of welfare, if the lower-welfare population consists of creatures that have complex humane values, while the higher-welfare population consists of paperclip or pleasure maximizers. (I wrote a previous post on this, but it was long and rambling; I am trying to make this one more accessible.)
One of the key aspects of this theory is that it does not necessarily rate the welfare of creatures with simple values as unimportant. On the contrary, it considers it good for their welfare to be increased and bad for their welfare to be decreased. Because of this, it implies that we ought to avoid creating such creatures in the first place, so it is not necessary to divert resources from creatures with humane values in order to increase their welfare.
My theory does allow the creation of simple-value creatures for two reasons. One is if the benefits they generate for creatures with humane values outweigh the harms generated when humane-value creatures must divert resources to improving their welfare (companion animals are an obvious example of this). The second is if creatures with humane values are about to go extinct, and the only choices are replacing them with simple value creatures, or replacing them with nothing.
So far I am satisfied with the development of this theory. However, I have hit one major snag, and would love it if someone else could help me with it. The snag is formulated like this:
1. It is better to create a small population of creatures with complex humane values (that has positive welfare) than a large population of animals that can only experience pleasure or pain, even if the large population of animals has a greater total amount of positive welfare. For instance, it is better to create a population of humans with 50 total welfare than a population of animals with 100 total welfare.
2. It is bad to create a small population of creatures with humane values (that has positive welfare) and a large population of animals that are in pain. For instance, it is bad to create a population of animals with -75 total welfare, even if doing so allows you to create a population of humans with 50 total welfare.
3. However, it seems like, if creating human beings wasn't an option, it might be okay to create a very large population of animals, the majority of which have positive welfare, but some of which are in pain. For instance, it seems like it would be good to create a population of animals where one section of the population has 100 total welfare, and another section has -75, since the total welfare is 25.
The problem is that this leads to what seems like a circular preference. If the population of animals with 100 welfare existed by itself it would be okay to not create it in order to create a population of humans with 50 welfare instead. But if the population we are talking about is the one in (3) then doing that would result in the population discussed in (2), which is bad.
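One way to see the circularity sharply is to encode the three judgments as a directed "better than" graph and check it for a cycle. This is a minimal sketch with illustrative names; the swap step is intuition (1) applied to the 100-welfare animal subpopulation.

```python
# The three populations from the snag, plus the empty population.
# Edges point from the better population to the worse one under
# each of the three intuitions.

prefs = {
    # (1) applied to the swap: replacing the 100-welfare animals in the
    # mixed population with 50-welfare humans should be an improvement.
    "humans+suffering_animals": ["happy_animals+suffering_animals"],
    # (3) a net-positive (100 - 75 = 25) animal population beats nothing.
    "happy_animals+suffering_animals": ["empty"],
    # (2) nothing beats creating humans alongside suffering animals.
    "empty": ["humans+suffering_animals"],
}

def has_cycle(graph):
    """Depth-first search for a cycle in the 'better than' relation."""
    def visit(node, path):
        if node in path:
            return True
        return any(visit(nxt, path | {node}) for nxt in graph.get(node, []))
    return any(visit(start, set()) for start in graph)

print(has_cycle(prefs))  # True: the three judgments are jointly intransitive
```

Each judgment is plausible on its own; the cycle only appears when all three are held at once, which is why patching any one of them relocates rather than removes the problem.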
My current solution to this dilemma is to include a stipulation that a population with negative utility can never be better than one with positive utility. This prevents me from having circular preferences about these scenarios. But it might create some weird problems. If population (2) is created anyway, and the humans in it are unable to help the suffering animals in any way, does that mean they have a duty to create lots of happy animals to get their population's utility up to a positive level? That seems strange, especially since creating the new happy animals won't help the suffering ones in any way. On the other hand, if the humans are able to help the suffering animals, and they do so by means of some sort of utility transfer, then it would be in their best interests to create lots of happy animals, to reduce the amount of utility each person has to transfer.
So far some of the solutions I am considering include:
1. Instead of focusing on population ethics, just consider complex humane values to have greater weight in utility calculations than pleasure or paperclips. I find this idea distasteful because it implies it would be acceptable to inflict large harms on animals for relatively small gains for humans. In addition, if the weight is not sufficiently great it could still lead to an AI exterminating the human race and replacing them with happy animals, since animals are easier to take care of and make happy than humans.
2. It is bad to create the human population in (2) if the only way to do so is to create a huge amount of suffering animals. But once both populations have been created, if the human population is unable to help the animal population, they have no duty to create as many happy animals as they can. This is because the two populations are not causally connected, and that is somehow morally significant. This makes some sense to me, as I don't think the existence of causally disconnected populations in the vast universe should bear any significance on my decision-making.
3. There is some sort of overriding consideration besides utility that makes (3) seem desirable. For instance, it might be bad for creatures with any sort of values to go extinct, so it is good to create a population to prevent this, as long as its utility is positive on the net. However, this would change in a situation where utility is negative, such as in (2).
4. Reasons to create a creature have some kind of complex rock-paper-scissors-type "trumping" hierarchy. In other words, the fact that the humans have humane values can override the reasons to create happy animals, but it cannot override the reason to not create suffering animals. The reasons to create happy animals, however, can override the reasons to not create suffering animals. I think that this might lead to inconsistent preferences again, but I'm not sure.
I find none of these solutions fully satisfying. I would really appreciate it if someone could help me with solving this dilemma. I'm very hopeful about this ethical theory, and would like to see it improved.
*Update. After considering the issue some more, I realized that my dissatisfaction came from conflating two different scenarios. I was treating the scenario "Animals with 100 utility and animals with -75 utility are created, no humans are created at all" as the same as the scenario "Humans with 50 utility and animals with -75 utility are created, then the humans (before they get to experience their 50 utility) are killed/harmed in order to create more animals without helping the suffering animals in any way." They are clearly not the same.
To make the analogy more obvious, imagine I was given a choice between creating a person who would experience 95 utility over the course of their life, or a person who would experience 100 utility over the course of their life. I would choose the person with 100 utility. But if the person destined to experience 95 utility already existed, but had not experienced the majority of that utility yet, I would oppose killing them and replacing them with the 100 utility person.
Or to put it more succinctly, I am willing to not create some happy humans to prevent some suffering animals from being created. And if the suffering animals and happy humans already exist I am willing to harm the happy humans to help the suffering animals. But if the suffering animals and happy humans already exist I am not willing to harm the happy humans to create some extra happy animals that will not help the existing suffering animals in any way.