This is a special post for quick takes by ProgramCrafter. Only they can create top-level comments. Comments here also appear on the Quick Takes page and All Posts page.

When someone optimizes utility for a group of agents, all the individual utilities need to be combined. Taking the sum of all utilities can create an issue where the world ends up optimized according to a single agent's utility function at the cost of everyone else's, if that function increases fast enough.

It's probably better to maximize \min{U_{agent}} + \arctan(\max{U_{agent}} - \min{U_{agent}}) - this way the top utility does not take over the whole combined function, since the arctan of the difference can only add a finite amount of utility, but it is still considered (so that there is an incentive to improve the top utility if the minimal one has run into limitations).

Though, this creates a problem: agents would try to divide their utility functions by some constant to get more consideration. It's necessary to normalize the functions in some way, and I'm not sure how to do that.
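
A minimal sketch of this combination rule (the agents' utilities below are made-up numbers, just to illustrate how the arctan term behaves):

```python
import math

def combined_utility(utilities):
    """min(U) + arctan(max(U) - min(U)) aggregation over a group's utilities."""
    lo, hi = min(utilities), max(utilities)
    return lo + math.atan(hi - lo)

# An enormous gain for one agent barely moves the aggregate, because the
# arctan term is bounded by pi/2; raising the minimum helps far more.
print(combined_utility([1.0, 2.0, 3.0]))      # ~2.11
print(combined_utility([1.0, 2.0, 1000.0]))   # ~2.57, despite the outlier
print(combined_utility([2.0, 2.0, 3.0]))      # ~2.79
```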

if that function increases fast enough

Nit: I'm not seeing how "increase" is well defined here, but I probably know what you mean anyways.

I thought we were talking about combining utility functions, but I see only one utility function here, not counting the combined one:

U_{combined} = \min{U_{agent}} + \arctan(\max{U_{agent}} - \min{U_{agent}})

If I wanted to combine 2 utility functions fairly, I'd add them, but first I'd normalize them by multiplying each one by the constant that makes its sum over the set of possible outcomes equal to 1. In symbols:

U_{combined}(o) = U_1(o) / \sum_{o' \in O} U_1(o') + U_2(o) / \sum_{o' \in O} U_2(o') for all o in O, where O is the set of outcomes (world states, or more generally world histories).
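
A small sketch of this normalization over a finite outcome set (the outcomes and numbers below are invented for illustration, and the division assumes each utility function sums to a positive number):

```python
def normalize_by_sum(u, outcomes):
    """Scale a utility function so its values sum to 1 over the outcome set."""
    total = sum(u[o] for o in outcomes)   # assumed positive here
    return {o: u[o] / total for o in outcomes}

def combine(u1, u2, outcomes):
    """Add the two normalized utility functions outcome by outcome."""
    n1 = normalize_by_sum(u1, outcomes)
    n2 = normalize_by_sum(u2, outcomes)
    return {o: n1[o] + n2[o] for o in outcomes}

outcomes = ["a", "b", "c"]
u1 = {"a": 1.0, "b": 2.0, "c": 7.0}
u2 = {"a": 30.0, "b": 10.0, "c": 10.0}
print(combine(u1, u2, outcomes))  # approximately {'a': 0.7, 'b': 0.4, 'c': 0.9}
```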

Wouldn't that break if the sum (or integral) of an agent's utility function over the world-state space was negative? Normalization would reverse that agent's utility.

The situation becomes even worse if the sum over all outcomes is zero.

Good catch.

However, if you and I have the seed of a super-intelligence in front of us, waiting only on our specifying a utility function and pressing the "start" button, and if we can each specify what we want for the world in the form of a utility function, then it would prove easy for us to work around the first of the two gotchas you point out.

As for the second gotcha, if we were at all pressed for time, I'd go ahead with my normalization method on the theory that the probability of the sum's turning out to be exactly zero is very low.

I am interested, however, in hearing from readers who are better at math than I am: how can the normalization method be improved to remove the two gotchas?

ADDED. What I have written so far in this comment fails to get at the heart of the matter. The purpose of a utility function is to encode preferences. Restricting our discourse to utility functions such that for every o in O, U(o) is a real number greater than zero and less than one does not restrict the kinds of preferences that can be encoded. And when we do that, every utility function in our universe of discourse can be normalized using the method already given--free from the two gotchas you pointed out. (In other words, instead of describing a gotcha-free method for normalizing arbitrary utility functions, I propose that we simply avoid defining certain utility functions that might trigger one of the gotchas.)

Specifically, if o_worst is the worst outcome according to the agent under discussion and o_best is its best outcome, set U(o_worst)=0, U(o_best)=1 and for every other outcome o, set U(o) = p where p is the probability for which the agent is indifferent between o and the lottery [p, o_best; 1-p, o_worst].
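
For an expected-utility maximizer, the same numbers can be obtained by affinely rescaling any existing utility representation; a minimal sketch (the outcome labels and utilities are made up for illustration):

```python
def rescale_to_unit_interval(u, outcomes):
    """Affinely rescale utilities so the worst outcome maps to 0 and the best to 1.

    For a VNM agent this matches the lottery construction above: U(o) becomes the
    probability p at which the agent is indifferent between o and the lottery
    [p, o_best; 1 - p, o_worst].  (If the agent is indifferent between all
    outcomes, best == worst and no rescaling is possible.)
    """
    worst = min(u[o] for o in outcomes)
    best = max(u[o] for o in outcomes)
    return {o: (u[o] - worst) / (best - worst) for o in outcomes}

outcomes = ["a", "b", "c"]
print(rescale_to_unit_interval({"a": -5.0, "b": 0.0, "c": 15.0}, outcomes))
# {'a': 0.0, 'b': 0.25, 'c': 1.0}
```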

That's a nice workaround!

https://en.wikipedia.org/wiki/Prioritarianism and https://www.lesswrong.com/posts/hTMFt3h7QqA2qecn7/utilitarianism-meets-egalitarianism may be relevant here. Are you familiar with https://www.lesswrong.com/s/hCt6GL4SXX6ezkcJn as well? I'm curious how you'd compare the justification for your use of arctan to the justifications in those articles.

Thank you for those articles!

It seems that the "Unifying Bargaining" sequence relies on being able to denominate utility in some units that can actually be obtained and offered in trade to any party, so that a party would get worse results by claiming to have a different utility function (one with the same preference ordering but different values).

In humans, and perhaps all complex agents, utility is an unmeasurable abstraction about multidimensional preferences and goals.  It can't be observed, let alone summed or calculated.  It CAN be modeled and estimated, and it's fair to talk about aggregation functions of one's estimates of utilities, or about aggregation of self-reported estimates or indications from others.

It is your own modeling choice to dislike the outcome of outsized influence via larger utility swings in some participants.  How you normalize it is a preference of yours, not an objective fact about the world.

Yes, this is indeed a preference of mine (and of other people as well), and I'm attempting to find a way to combine utilities that is as good as possible according to my and other people's preferences (so that it can be incorporated into an AGI, for example).

Continuing to make posts into songs! I believe I'm getting a bit better, mainly at rapid lyrics-writing; I would appreciate pointers on how to improve further.

  1. https://suno.com/song/ef734c80-bce6-4825-9906-fc226c1ea5b4 (based on post Don't teach people how to reach the top of a hill)
  2. https://suno.com/song/c5e21df5-4df7-4481-bbe3-d0b7c1227896 (based on post Effectively Handling Disagreements - Introducing a New Workshop)

Also, if you are against me creating a musical version of your post, please say so! I don't know beforehand which texts will seem easily convertible to me.

LessWrong's AI-generated album was surprisingly nice and, even more importantly, pointed me to the song generator! (I had tried to find one a year ago and failed.)

So I've decided to try my hand at the quantum mechanics sequence. Here's what I have so far: https://app.suno.ai/playlist/81b44910-a9df-43ce-9160-b062e5b080f8/. (10 songs generated, 3 selected; unfortunately not the best quality.)

Anthropics - starting from scratch

I'm trying to derive a coherent scheme of anthropic reasoning from scratch, posting my thoughts here as I go. Pointing out flaws is welcome!

The anthropic trilemma (https://www.lesswrong.com/posts/y7jZ9BLEeuNTzgAE5/the-anthropic-trilemma)

So here's a simple algorithm for winning the lottery:

Buy a ticket.  Suspend your computer program just before the lottery drawing - which should of course be a quantum lottery, so that every ticket wins somewhere.  Program your computational environment to, if you win, make a trillion copies of yourself, and wake them up for ten seconds, long enough to experience winning the lottery.  Then suspend the programs, merge them again, and start the result.  If you don't win the lottery, then just wake up automatically.

The odds of winning the lottery are ordinarily a billion to one.  But now the branch in which you win has your "measure", your "amount of experience", temporarily multiplied by a trillion.  So with the brief expenditure of a little extra computing power, you can subjectively win the lottery - be reasonably sure that when next you open your eyes, you will see a computer screen flashing "You won!"  As for what happens ten seconds after that, you have no way of knowing how many processors you run on, so you shouldn't feel a thing.

Yes, there is something like a paradox about whether you should anticipate winning or losing the lottery. So let's taboo "anticipate":

Postulate 1. Anticipating (expecting) something is only relevant to decision making (for instance, expected utility calculation).

So to measure the expectation, one runs a prediction market. The price of an outcome share is set equal to the probability of its outcome, so that the expected profit from buying each share is exactly zero. For the sake of argument, let's assume the market is unmovable.

Once the branch in which you won is split into a trillion copies, you can expect to win the lottery at odds of 1000:1. So your 10^12 + 10^9 copies (a trillion who have won the lottery, a billion who have lost) will each buy 1 "WIN" share at a price of 1000/1001.

Now, let's consider what should happen during the merge for such an expectation to be sound. If your trillion copies are merged into a single copy holding 1 "WIN" share, then the total profit is 1 - (10^9 + 1) * 1000/1001, which is clearly negative. It seems like something went wrong and you shouldn't have anticipated 1000:1 odds of winning?

A different merging procedure makes everything work out. In it, you sum all purchased "WIN" shares, so that your merged winning copy has 10^12 "WIN" shares and each of the 10^9 losing copies has one share. The total profit is 10^12 - (10^9 + 10^12) * 1000/1001 = 0!

That can be translated as "if the merging procedure multiplies your utility of an outcome by the number of merged copies, then you can multiply the odds by the number of copies". On the other hand, if during the merge all copies except one are going to be ignored, then the right odds to expect are 1:10^9 - totally unchanged. And if you don't know the merging procedure, you'll have to put a prior over the possibilities and calculate the odds based on that.
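
A short sketch reproducing the bookkeeping above for both merging procedures:

```python
from fractions import Fraction

WIN_COPIES = 10**12           # copies created in the winning branch
LOSE_COPIES = 10**9           # copies in the losing branch
PRICE = Fraction(1000, 1001)  # share price implied by anticipating 1000:1 odds

# Merging procedure 1: the trillion winning copies collapse into one copy
# holding (and having paid for) a single "WIN" share.
profit_collapse = 1 - (LOSE_COPIES + 1) * PRICE
print(float(profit_collapse))   # about -1e9: clearly negative

# Merging procedure 2: the merged winner keeps the sum of all purchased shares,
# i.e. 10^12 "WIN" shares, while each losing copy still holds one share.
profit_summed = WIN_COPIES - (LOSE_COPIES + WIN_COPIES) * PRICE
print(profit_summed)            # exactly 0
```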

Why doesn't this affect real life much?

I guess that the splitting rate (creation of new Boltzmann brains, for example) and the merging rate are approximately equal, so all the possible updates cancel each other out.

I've just noticed that Harry James Potter-Evans-Verres ran into a decision theory issue.

After the trip to Azkaban (in chapter 65), Professor Quirrell suggested that Harry stage a play with a fake Voldemort in order to gain power in Britain. Harry was not sure whether the real Voldemort was alive and wanted to know that first. He thought about the problem, decided that in both cases the optimal choice was not to stage the play, and on that basis concluded that he should not fight an impostor Dark Lord.

However, the two propositions "it is optimal not to fight the impostor Dark Lord if the real one is alive" and "it is optimal not to fight the impostor Dark Lord if the real one is dead" are insufficient to reach that conclusion; it also requires the hidden assumption "whether the real Voldemort is alive does not depend on Harry's choice". (In that regard, the problem is similar to Newcomb's paradox.) And actually, Tom Riddle's decision to become Voldemort again could have depended on Harry's choice quite heavily.

It is hard to notice that problem, though Harry could have done so; in Azkaban, he had already realized that events happening around him do depend on his decision process.

A sufficient condition for a good [possibly AI-gen] plan

Current posts mainly focus on necessary properties of plans people would like to see executed. I suggest a sufficient condition:

A plan is good (should be acted upon, etc.) at least when it is endorsed in advance, endorsed in retrospect, and endorsed in counterfactual.

  1. Endorsed in advance: everyone relevant hears the plan and its possible outcomes in advance, evaluates its acceptability, and accepts the plan.

  2. Endorsed in retrospect: everyone relevant looks at the intended outcomes, checks what actually happened, evaluates the plan, and has no regret.

  3. Endorsed in counterfactual: given a choice among a set of plans, each person would still evaluate this specific plan as acceptable - somewhat satisfying them, not inducing much desire to switch.

Choice according to these criteria is still hard, but it should be a bit less mysterious.

Anthropics based on prediction markets - Part 2

Follow-up to https://www.lesswrong.com/posts/xG98FxbAYMCsA7ubf/programcrafter-s-shortform?commentId=ySMfhW25o9LPj3EqX.

Which Questions Are Anthropic Questions? (https://www.lesswrong.com/posts/SjEFqNtYfhJMP8LzN/which-questions-are-anthropic-questions)

1. The Room Assignment Problem

You are among 100 people waiting in a hallway. The hallway leads to a hundred rooms numbered from 1 to 100. All of you are knocked out by a sleeping gas and each put into a random/unknown room. After waking up, what is the probability that you are in room No. 1?

2. The Incubator

An incubator enters the hallway. It will enter room No. 1 and create a person in it, then do the same for the other 99 rooms. It turns out you are one of the people the incubator has just created. You wake up in a room and are made aware of the experiment setup. What is the probability that you are in room No. 1?

3. Incubator + Room Assignment

This time the incubator creates 100 people in the hallway; you are among the 100 people created. Each person is then assigned to a random room. What is the probability that you are in Room 1?

In all of those cases, betting YES at probability 1% is coherent in the sense that it leads to zero expected profit: each of the 100 people buys 1 "ROOM-1" share at a price of 1/100, and exactly one of them wins, getting back 1 unit of money.
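
A tiny sketch of that accounting (assuming, as above, that a winning share pays out 1 unit):

```python
from fractions import Fraction

N_PEOPLE = 100
price = Fraction(1, N_PEOPLE)     # price of one "ROOM-1" share at probability 1%

total_cost = N_PEOPLE * price     # every person buys one share: 1 unit in total
total_payout = 1                  # exactly one person turns out to be in room 1
print(total_payout - total_cost)  # 0: the bet breaks even in aggregate
```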

To phrase it better: you find yourself in room N; how many rooms are there in total?

I know UDASSA accounts for the description length of the room address, but remember that given a number of rooms, each room will have the same description length. If there are 64 rooms, then room 1 will have address "000000" and not simply "0" or "1".

This way, if you find yourself in a room without knowing how many rooms there are in total, and only knowing your room number, you write it out in binary and take 2 to the bit-length of your room's address. For example, you find yourself in room number "100111", which is 6 bits, so with 50% chance there will be 64 rooms in total. Then you add an extra bit with 50% of the remaining measure (25% for 128 rooms), and repeat. If the payout doesn't scale with the number of rooms, then 64 rooms would be the most profitable bet. It's easy to test this either in real life or with a Python script.

python script: https://pastebin.com/b41Sa6s6
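
(Not the linked script, but a minimal sketch of one way to set up such a check. The geometric prior over room counts below is an assumption of this sketch, and a different prior can change which total comes out most likely.)

```python
# Exact posterior over the total number of rooms, given your room number,
# under an assumed prior: the total is 2**L rooms with P(L) proportional to 2**-L,
# and "your" room is uniformly distributed among those rooms.
OBSERVED_ROOM = 0b100111   # the 6-bit example room number from above
MAX_BITS = 20

posterior = {}
for L in range(1, MAX_BITS + 1):
    n_rooms = 2 ** L
    if OBSERVED_ROOM <= n_rooms:          # otherwise the observation is impossible
        prior = 2.0 ** -L
        likelihood = 1.0 / n_rooms        # chance of landing in exactly that room
        posterior[n_rooms] = prior * likelihood

norm = sum(posterior.values())
for n_rooms in sorted(posterior):
    print(f"{n_rooms:>8} rooms: {posterior[n_rooms] / norm:.4f}")
# Under this particular prior, 64 rooms comes out most likely.
```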

After doing so, I got unexpected results: given your room number, the most likely total number of rooms is one whose description length is one bit longer than the description length of your room. Weird.

The experiment is commonly phrased in a non-anthropic way by statisticians (the German tank problem): there are many items given sequential unique numbers, starting from 1. You get to see a single item's number k and have to guess how many items there are, and the answer is 2k - 1. (There are also ways to estimate the count of items if you've seen more than one index.)
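
A quick simulation of this single-observation estimate (the true count below is an arbitrary illustrative value):

```python
import random

TRUE_COUNT = 1000     # the actual number of sequentially numbered items
TRIALS = 200_000

# For a single observed serial number k, uniform on 1..N, the estimate 2k - 1
# is unbiased: its average over many trials should come out close to N.
estimates = [2 * random.randint(1, TRUE_COUNT) - 1 for _ in range(TRIALS)]
print(sum(estimates) / len(estimates))   # close to 1000
```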