All of elspood's Comments + Replies

This is a great draft and you have collated many core ideas. Thank you for doing this!

As a matter of practical implementation, I think it's a good idea to always have a draft of official, approved statements of capabilities that can be rehearsed by any individual who may find themselves in a situation where they need to discuss them. These statements can be thoroughly vetted for second- and higher-order information leakage ahead of time, rather than leaving people to evaluate in real time what their statements might reveal. It can be counterproductive in many circu...

I'm glad you found it useful, even in this form. If the thing you're working on is something you could share, I'd be happy to offer further assistance, if you like.

tamgent
Thanks kindly for the offer; I will DM you.
elspood

Obviously this can't be answered with justice in a single comment, but here are some broad pointers that might help see the shape of the solution:

  • Israeli airport security focuses on behavioral cues, asking unpredictable questions, and profiling. That is a somewhat extreme threat model, with very different base rates to account for (but also much lower traffic volume).
  • Reinforced cockpit doors address the "hijackers with guns and knives" scenario, but they are also a fully general, no-brainer control.
  • Good policework and better coordination in law enforcement are...
[anonymous]
This is very interesting. Thanks for taking the time to explain :)

I appreciate the nudge here to put some of this into action. I hear alarm bells when thinking about formalizing a centralized location for AI safety proposals and information about how they break, but my rough intuition is that if there is a way these can be scrubbed of descriptions of capabilities which could be used irresponsibly to bootstrap AGI, then this is a net positive. At the very least, we should be scrambling to discuss safety controls for already public ML paradigms, in case any of these are just one key insight or a few teraflops away from being world-ending.

I would like to hear from others about this topic, though; I'm very wary of being at fault for accelerating the doom of humanity.

My project seems to have expired from the OWASP site, but here is an interactive version that should have most of the data:

https://periodictable.github.io/

You'll need to mouse over the elements to see the details, so not really mobile friendly, sorry.

I agree that linters are a weak form of automatic verification that are actually quite valuable. You can get a lot of mileage out of simply blacklisting unsafe APIs and a little out of clever pattern matching.
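As a rough illustration of the "blacklist unsafe APIs" idea, here is a minimal Python sketch that scans C source text for calls to a few functions that are commonly banned. The `BANNED` set is an example list chosen for illustration, not a complete or authoritative one, and `find_banned_calls` is a hypothetical helper. Because it is plain regex matching, it shares the weaknesses of any pattern-based linter: it misses calls made through macros or function pointers, and it does not understand comments or string literals.

```python
import re

# Example deny-list only; real projects maintain their own (often much longer) lists.
BANNED = {"gets", "strcpy", "strcat", "sprintf"}

# Match a banned name as a whole identifier followed by an opening parenthesis.
CALL_RE = re.compile(r"\b(" + "|".join(sorted(BANNED)) + r")\s*\(")


def find_banned_calls(c_source: str) -> list[tuple[int, str]]:
    """Return (line_number, function_name) for every banned call found in the text."""
    hits = []
    for line_no, line in enumerate(c_source.splitlines(), start=1):
        for match in CALL_RE.finditer(line):
            hits.append((line_no, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = """
char buf[16];
strcpy(buf, user_input);      /* flagged */
strncpy(buf, user_input, 15); /* not flagged: a different identifier */
my_strcpy(buf, user_input);   /* not flagged: not a whole-word match */
"""
    for line_no, name in find_banned_calls(sample):
        print(f"line {line_no}: call to banned function {name}()")
```

The word-boundary anchor is what keeps `strncpy` and `my_strcpy` from matching; that kind of small pattern detail is where most of the "clever pattern matching" effort goes.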

tamgent
I just want to let you know that this table was really useful for me for something I'm working on. Thank you for making it.
tamgent
Thanks for sharing, this is a really nice resource for a number of problems and solutions.

I would say that some formal proofs are actually impossible, but would agree that software with many (or even all) of the security properties we want could actually have formal-proof guarantees. I could even see a path to many of these proofs today.

While the intent of my post was to draw parallel lessons from software security, I actually think alignment is an oblique or orthogonal problem in many ways. I could imagine timelines in which alignment gets 'solved' before software security. In fact, I think survival timelines might even require anyone who migh...

redlizard
Plausible. In the aftermath of Spectre and Meltdown I spent a fair amount of time thinking about how you could formally prove a piece of software to be free of information-leaking side channels, even assuming that the same thing holds for all dependent components such as underlying processors and operating systems and the like, and got mostly nowhere.

Does that include those working on software correctness and reliability in general, without a security focus? I would expect better tools for making software that is free of bugs, such as programs that include correctness proofs as well as some of the lesser formal methods, to be on the critical path to survival -- for the simple reason that any number of mundane programming mistakes in a supposedly-aligned AI could easily kill us all.

I was under the impression that you agree with this ["Assurance Requires Formal Proofs"]. I expect formal proofs of security in particular to be largely a corollary of this -- a C program that is proven to correctly accomplish any particular goal will necessarily not have any buffer overflows in it, for this would invoke undefined behavior which would make your proof not go through. This does not necessarily apply to all security properties, but I would expect it to apply to most of them.

The halting problem only makes it impossible to write a program that can analyze a piece of code and then reliably say "this is secure" or "this is insecure".

It would be nice to be able to have this important impossible thing. :)

I think we are trying to say the same thing, though. Do you agree with this more concise assertion?

"It's not possible to make a high confidence checker system that can analyze an arbitrary specification, but it is probably possible (although very hard) to design systems that can be programmatically checked for the important qualities of alignment that we want, if such qualities can also be formally defined."

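To make the halting-problem point concrete, here is a toy diagonalization in Python. It assumes "secure" simply means "the program never performs the insecure action", and that a checker is any function that always terminates and returns a verdict about a program; `make_contrarian` and `checker_is_wrong` are hypothetical names for illustration. Whatever checker is plugged in, the program built from it asks the checker about itself and does the opposite, so no fully general checker can be right about every program. This is the standard argument behind Rice's theorem, and it says nothing against checking programs that were designed to be checkable, which is the distinction in the assertion above.

```python
from typing import Callable

Program = Callable[[], str]
Checker = Callable[[Program], bool]  # returns True if it judges the program secure


def make_contrarian(checker: Checker) -> Program:
    """Build a program that asks `checker` about itself and then does the opposite."""
    def contrarian() -> str:
        if checker(contrarian):
            return "insecure action"  # judged secure, so misbehave
        return "no-op"                # judged insecure, so behave
    return contrarian


def checker_is_wrong(checker: Checker) -> bool:
    """True if the checker's verdict on its own contrarian program is incorrect."""
    program = make_contrarian(checker)
    judged_secure = checker(program)
    actually_secure = program() != "insecure action"
    return judged_secure != actually_secure


if __name__ == "__main__":
    candidates = {
        "always says secure": lambda prog: True,
        "always says insecure": lambda prog: False,
        "looks at the function name": lambda prog: prog.__name__ != "contrarian",
    }
    for name, checker in candidates.items():
        print(name, "-> wrong about its contrarian:", checker_is_wrong(checker))
```

A checker that tried to decide by running the program would call `contrarian`, which calls the checker again, and never terminate; that non-termination is exactly the halting-problem obstacle.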
redlizard
Yes, I agree with this. I cannot judge to what degree I agree with your strategic assessment of this technique, though. I interpreted your top-level post as judging that assurances based on formal proofs are realistically out of reach as a practical approach; whereas my own assessment is that making proven-correct [and therefore proven-secure] software a practical reality is a considerably less impossible problem than many other aspects of AI alignment, and indeed one I anticipate to actually happen in a timeline in which aligned AI materializes.

I would agree that some people figured this out faster than others, but the analogy is also instructive here: if even a small community like the infosec world has a hard time percolating information about failure modes and how to address them, we should expect the average ML engineer to be doing very unsafe things for a very long time by default.

To dive deeper into the XSS example, I think even among those that understood the output encoding and canonicalization solutions early, it still took a while to formalize the definition of an encoding context con...

I think you make good points generally about status motives and obstacles for breakers. As counterpoints, I would offer:

  • Eliezer is a good example of someone who built a lot of status on the back of "breaking" others' unworkable alignment strategies. I found the AI Box experiments especially enlightening in my early days.
  • There are lots of high-status breakers, and lots of independent status-rewarding communities around the security world. Some of these are whitehat/ethical, like leaderboards for various bug bounty programs, OWASP, etc. Some of them not so m...
John_Maxwell
Fair enough. Yeah personally building feels more natural to me. I agree a leaderboard would be great. I think it'd be cool to have a leaderboard for proposals as well -- "this proposal has been unbroken for X days" seems like really valuable information that's not currently being collected. I don't think I personally have enough clout to muster the coordination necessary for a tournament or leaderboard, but you probably do. One challenge is that different proposals are likely to assume different sorts of available capabilities. I have a hunch that many disagreements which appear to be about alignment are actually about capabilities. In the absence of coordination, I think if someone like you was to simply start advertising themselves as an "uberbreaker" who can shoot holes in any proposal, and over time give reports on which proposals seem the strongest, that could be really valuable and status-rewarding. Sort of a "pre-Eliezer" person who I can run my ideas by in a lower stakes context, as opposed to saying "Hey Eliezer, I solved alignment -- wallop me if I'm wrong!"
elspood

Many! Thanks for sharing. This could easily turn into its own post.

In general, I think this is a great idea. I'm somewhat skeptical that this format would generate deep insights; in my experience successful Capture the Flag / wargames / tabletop exercises work best in the form where each group spends a lot of time preparing for their particular role, but opsec wargames are usually easier to score, so the judge role makes less sense there. That said, in the alignment world I'm generally supportive of trying as many different approaches as possible to see wh...

Yitz
I wasn’t aware you were offering a bounty! I rarely check people’s profile pages unless I need to contact them privately, so it might be worth mentioning this at the beginning or end of posts where it might be relevant.

Thanks for the reply!

As some background on my thinking here, last I checked there are a lot of people on the periphery of the alignment community who have some proposal or another they're working on, and they've generally found it really difficult to get quality critical feedback. (This is based on an email I remember reading from a community organizer a year or two ago saying "there is a desperate need for critical feedback".)

I'd put myself in this category as well -- I used to write a lot of posts and especially comments here on LW summarizing how I'd g...

I definitely wouldn't rule out the possibility of being able to formally define a set of tests that would satisfy our demands for alignment. The most I could say with certainty is that it's a lot harder than eliminating software security bug classes. But I also wouldn't rule out the possibility that an optimizing process of arbitrarily strong capability simply could not be aligned, at least to a level of assurance that a human could comprehend.

Thank you for these additional references; I was trying to anchor this article with some very high-level concepts....

I got part of the way through the process and then got stuck, but my situation may not be typical.

  • These bonds have to be purchased directly from the treasury, with an account at Treasury Direct.
  • Creation of a Treasury Direct account requires mailing in a form that has to be certified by a specific bank certifying agent in the US. A regular notary service is not accepted.
  • As far as I can tell, an equivalent certification service isn't available outside the country.
rossry
Oh, yeah, I can't vouch for / walk through the operations side (not having done it myself). I have had the misfortune of looking at ways to get a Medallion certification outside the US, and it's not pretty (I failed).

I shudder to imagine the mutual funds created to fund bids on this thing.

How hard do you have to squint to not see this thing as pyramid-shaped? This thing is like Sierpinski's pyramid. It's fractally a scam; a scam at every conceivable resolution.

Actually, the worst thing would be if the price of minting increases at a rate slower than the value of half the pool grows. Then every next bid would still be "in the money", and then whoever doesn't go bankrupt first wins. This thing could eat the whole world. Terrible. Kill it with fire.
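The "in the money" worry can be checked with a toy model. The rules here are assumptions made for illustration, not the actual contract's: each mint's full price is added to the pool, the mint price grows by a constant factor per mint, and a bid counts as "in the money" when its price is below half of the pool it would claim. `mint_prices` and `in_the_money` are hypothetical helpers.

```python
def mint_prices(first_price: float, growth: float, count: int) -> list[float]:
    """Hypothetical price schedule: each mint costs `growth` times the previous one."""
    return [first_price * growth**k for k in range(count)]


def in_the_money(prices: list[float], seed_pool: float = 0.0) -> list[bool]:
    """For each mint, is its price below half the pool it would claim?

    Assumes the mint's own price is added to the pool before the claim is valued.
    """
    pool = seed_pool
    flags = []
    for price in prices:
        pool += price
        flags.append(price < pool / 2)
    return flags


if __name__ == "__main__":
    for growth in (1.1, 3.0):
        flags = in_the_money(mint_prices(1.0, growth, 10))
        winners = [i for i, f in enumerate(flags) if f]
        print(f"growth x{growth}: in the money at mints {winners}")
```

Under these particular assumptions (no seed pool, geometric prices with ratio r), mint k is in the money exactly when r^k (2 - r) > 1, so any growth factor below 2 eventually makes every later bid look profitable, while a factor of 2 or more never does.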

taras
The price of minting will increase a little slower than the growth of the pot, but it will never reach a value greater than about 450x. It would approach it asymptotically. It would be interesting if several participants battle with each other for a while. Then, when they are worn out, tired and out of funds, a new participant claims the pot with a single mint at the end :)

Well, maybe I'm missing something, but the game theory doesn't seem that interesting to me. And calling it a 'return on investment' seems a bit generous for what is really just a game of blockchain chicken. In fact, it might be as crazy as a dollar auction, where people might end up bidding more than half the accumulated contract is worth due to the sunk cost fallacy or other irrational behaviors.

Either way, you're not really buying anything of value here: you're just betting that the auction gets so little attention that you can walk away with free money, ...

gwern
Yeah, hasn't this 'timer unlocks deposit' game been run many times before? Either it lapses from lack of interest, or eventually the pot gets large enough for someone to pay a miner enough to lock in the necessary blocks to cash it out, or to DoS rival transactions, or something like that.

What happens to the other half? This seems underspecified as you've described it.

Annapurna
It stays in the contract until a new mint (call it mint B) happens. Then the 30-day clock begins again. If no new mints happen in those 30 days, then mint B takes 50% of that remaining 50%.
elspood

In the interest of science, I ran 10 more simulations with our submitted population. This is not to open a can of worms or to challenge the results in any way - we all knew we had to win on the first try!

https://drive.google.com/file/d/1mSqaNlo5KT9l9vmY3ckd8KSTXA0xOz0u/view

Some things that I observed:

  • The results were highly sensitive to randomness. Almost no species survived consistently.
  • Sometimes defenseless creatures survived and sometimes they didn't.
  • LeavyTanky (ViktorThink) survived basically every time. Looks like there was no competition for the invi...
Measure
Another interesting experiment would be to try to maximize the number of surviving species (starting either from the current set or from nothing).

Here are our Brier scores for our predictions:

https://docs.google.com/spreadsheets/d/1qhuACrtD0esgCqz8rQvYcZOC0I1y1l66/edit#gid=225287990

The defenseless creature result really surprised most of us. Well done, aphyer, you knew what was up.
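For anyone unfamiliar with the metric: a Brier score is the mean squared difference between a forecast probability and what actually happened (1 if it happened, 0 if not), so lower is better and 0 is perfect. A minimal sketch; `brier_score` is an illustrative helper, and the example forecasts are made up rather than taken from the linked spreadsheet.

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


if __name__ == "__main__":
    # Made-up example: probabilities that three species survive, and what happened.
    print(brier_score([0.9, 0.2, 0.5], [1, 0, 0]))
```

A forecaster who always says 0.5 scores 0.25 regardless of outcomes, which is a handy baseline when comparing entries.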

aphyer
Not well enough to get more than one animal to survive, alas. The one thing I did manage to improve my survival odds by doing was steering clear of the Tundra...even after the buff to Lichen (prior to which Tundra was mathematically impossible to survive) it seemed pretty likely that even a single non-trivial Tundra predator being submitted would inevitably wipe the whole region out. It looked like...rather a lot of people submitted distinctly non-trivial predators there.
SarahNibs
I'm giving myself credit for a much better score than that. ;) Edit: as per elspood's simulations, I am revoking all of my supposed credit.

Of all the things, the coconuts were by far the most difficult to get anything to survive on. In my simulations, usually the coconut eaters that survived were also eating something else.

In theory, coconuts should sustain a 13.1 E creature; in practice, with such a small food source, a creature this size gets outcompeted at first by much smaller organisms that then get hunted to extinction by predators.

Measure
The equilibrium population for my crab (size 13.6) is around 110. I guess its population must have been so low by the time the last smaller coconut-eater was hunted out that a random fluctuation killed it before it could recover. If it had managed to recover, it should have eventually migrated to establish itself in both River and Shore.

Ah, I read the wrong line. So yeah, we submitted the exact same creature.

There were definitely reliably BAD creatures, and certainly some reliably good ones, but a lot of variance based on the overall makeup of the population. I certainly didn't expect so many total creatures to be submitted; there was a lot more variability in results with 500-creature populations. In 5000-creature populations, basically the only thing that ever survived was invincibles.

With this size population, I don't think it's a coincidence that your minimal invincible survived - and certainly wasn't just luck that you arrived at its design. Give yourself SOME credit. :)

I submitted the exact same 10 speed leaf eater that you did, I just started it in the Temperate Forest. Luck of the draw that yours got here first, I guess.

Vanessa Kosoy
What do you mean, you just started it in the Temperate Forest? The Snark comes from the Temperate Forest.

Damn, now I'm upset I didn't spend more time thinking of a good name. A brown bear isn't even a pure predator! Really wish I had called THIS one the Trash Panda, instead. :)

Wait, are you initializing and running each biome separately? I expected all biomes to be seeded at once with the complete set of submitted organisms.

lsusr
They're all run simultaneously. I'm just writing up each biome separately.

My definition of "minimal invincibles" here:

0 ATK, 10 DEF, 1 SPD, Antivenom herbivore

OR

0 ATK, 0 DEF, 10 SPD herbivore

These definitely win in a field of hundreds of participants. In my simulations, they were outcompeted by "less" invincible creatures fitting the invincible prototypes with 20-50 participants (200-500 creatures). I hedged my bets with a few invincibles, some hard-to-kills, and some things I found surprisingly hard to kill.

Also, I entered my daughter's creature, so she has a chance to embarrass us all. :)

Did anyone find a way to reliably crash the populations of non-invincibles with fewer than 200 creatures (a reasonable amount of confederates you could wrangle)?

Bruce G
Why would something with full armor, no weapons, and antivenom benefit from even 1 speed?  It does not need to escape from anything.  And if it has no weapons or venom, it can not catch any prey either. Edit: I suppose if you want it to occasionally wander to other biomes, then that could be a reason to give it 1 speed.
elspood

Embarrassing story:

I spent a lot of time writing a fast simulator and testing all kinds of approaches. Today I let my daughter (8) design a species without really understanding the game mechanics...and it performed better than every other creature on the first try. Granted, I had to help her correct some obviously suboptimal choices, but still...let's just say my confidence is not high.

I'll precommit to suggesting a secondary scoring mechanism for bragging rights: not simply the highest total number of surviving organisms but the total energy of the organisms (population * base energy).

Good luck everyone!
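The secondary score proposed in this comment is easy to state in code: for each surviving species, multiply its final population by its base energy, then rank by that product instead of by raw population. A sketch with invented numbers (`Survivor`, `total_energy`, and `rank` are illustrative names, not part of the competition code), just to show that the two rankings can disagree when a small population of large creatures competes with a large population of small ones.

```python
from typing import NamedTuple


class Survivor(NamedTuple):
    name: str
    population: int
    base_energy: float  # per-organism base energy; values below are illustrative


def total_energy(s: Survivor) -> float:
    """Proposed secondary score: population * base energy."""
    return s.population * s.base_energy


def rank(survivors: list[Survivor], key) -> list[str]:
    """Names ordered from highest to lowest by the given scoring key."""
    return [s.name for s in sorted(survivors, key=key, reverse=True)]


if __name__ == "__main__":
    # Invented example data, not results from the actual competition.
    survivors = [
        Survivor("small grazer", population=900, base_energy=1.0),
        Survivor("big browser", population=120, base_energy=12.0),
    ]
    print("by population:  ", rank(survivors, key=lambda s: s.population))
    print("by total energy:", rank(survivors, key=total_energy))
```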

__nobody
I didn't trust myself to reimplement the simulator - any subtle change would likely have invalidated all results. So simulations were real slow... I still somehow went through about 0.1% of the search space (25K of about 27M possible different species), and I hope it was the better part of the space / largely excluding "obviously" bad ideas. (Carefully tweaking random generation to bias it towards preferring saner choices, while never making weird things too unlikely.) Of course, the pairings matter a lot so I'm not at all certain that I didn't accidentally discard the best ones just because they repeatedly ended up in very unfortunate circumstances. There certainly were some kinda non-intuitive choices found, for example: A Benthic creature that can (also) eat grass -- it can't start in the river, but that's where it wants to go; and travel-wise, Ocean/Benthic are equivalent! (Also, for some reason, others trying the same strategy in the ocean performed way worse... absolutely no idea why yet.) I'd have loved for this to happen in a less-busy week (not exactly the end of the quarter year with all the bookkeeping) and to have about 2-3x as much time to get the infrastructure working... managed to barely get simple mutation working, but didn't have time for the full genetic algorithm or other fancy stuff. :(

Can you give a more specific deadline? What timezone?

__nobody
Seconding this, does 'by Sep 30th' mean start or end of the day? I'm currently assuming 'end of', in some unspecified time zone. My computer's still crunching numbers and I'm about to head to bed… would be sad to miss the deadline.

It would also be kind of a pain in the ass to change! :)

Not what I'm seeing. Roamers start roaming before the encounters in each biome, then after every biome is processed, the roamers find a new home. So the roamers go a whole generation without competing or foraging. Is that not what was intended?
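The ordering being described can be sketched as follows. This is a simplified Python model, not the actual simulator code; `run_generation`, `wants_to_roam`, and `encounter` are hypothetical names. It shows that creatures popped out as roamers before the encounter step are absent from every biome's encounters and only rejoin a population after all biomes have been processed, so they skip a generation of competing and foraging.

```python
import random
from typing import Callable

Biomes = dict[str, list[str]]


def run_generation(
    biomes: Biomes,
    wants_to_roam: Callable[[str], bool],
    encounter: Callable[[list[str]], list[str]],
    rng: random.Random,
) -> Biomes:
    """One generation in the order described above (simplified, hypothetical model)."""
    roamers = []
    # 1. Roamers leave first; pop() removes them from their home population.
    for population in biomes.values():
        for i in reversed(range(len(population))):
            if wants_to_roam(population[i]):
                roamers.append(population.pop(i))
    # 2. Encounters and foraging run only on the creatures that stayed home.
    for name in biomes:
        biomes[name] = encounter(biomes[name])
    # 3. Only after every biome is processed do roamers settle somewhere new.
    destinations = list(biomes)
    for creature in roamers:
        biomes[rng.choice(destinations)].append(creature)
    return biomes


if __name__ == "__main__":
    seen = []

    def record(pop):
        seen.extend(pop)
        return pop  # everyone survives in this toy encounter

    world = {"River": ["crab", "roamer"], "Shore": ["gull"]}
    run_generation(world, lambda c: c == "roamer", record, random.Random(0))
    print("took part in encounters:", seen)  # the roamer is not in this list
```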

lsusr
You're right. I'm wrong. Good spotting! This behavior wasn't intended but I'm keeping it because it's interesting and makes some biological sense.

I thought the same thing at first, but I think if the interact method is called with only one argument, then that creature ends up foraging normally. Since spawning depends on creature size and reproduction depends on energy, a biome seems about as likely to end a generation with an odd number of creatures as with an even number. So this situation would happen whether or not roaming occurs.

The tough situation is for carnivores; if they're the odd one out, they'll die, even if there are species that they could eat.

aphyer
Hm. I'm looking at the code and I don't quite understand how it works (argh, Lisp). Is anyone able to explain what's going on here?

Suppose you have 1001 animals in your region. It looks to me like:
  • The range fills up with numbers up to 500 or 501, depending on rounding.
  • Then animal 1 interacts with animal 2.
  • I think the intended next step is for animal 3 to interact with animal 4, and so on.
  • At that point, depending on which way the rounding went, we could either:
    • Stop when animal 999 interacts with animal 1000. Animal 1001 won't interact with anything, and will die by default.
    • Attempt to have animal 1001 interact with animal 1002. In languages I know this would lead to an error (when you try to pull element 1002 out of a 1001-element array). Does this actually work as intended in Lisp?
  • On looking at this in more detail, though, I'm worried that what actually happens is that animal 1 interacts with animal 2, then animal 2 interacts again with animal 3, then animal 3 with animal 4, and so on through animal 500 interacting with animal 501, and then animals 502-1001 do not interact with anything and just die?

I'd imagine that the code should use 2i and 2i + 1 rather than i and i + 1 to index into the population, but I don't actually know Lisp and maybe I'm misunderstanding how loops/increments work.
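The indexing question can be illustrated in Python (not Lisp, and not the simulator's actual code; both function names are hypothetical). Pairing index i with i + 1 for i in range(n // 2) produces overlapping pairs and leaves the back half of the population untouched, while pairing 2i with 2i + 1 produces disjoint pairs and leaves at most one creature over when the count is odd. Per the earlier reply in this thread, such a leftover creature would forage on its own if it is a herbivore and would die if it is a carnivore.

```python
def pairs_overlapping(population: list) -> list[tuple]:
    """The pattern being worried about: index i with i + 1 for i < n // 2."""
    return [(population[i], population[i + 1]) for i in range(len(population) // 2)]


def pairs_disjoint(population: list) -> tuple[list[tuple], object]:
    """Disjoint pairs (2i, 2i + 1); returns the unpaired creature (or None) separately."""
    half = len(population) // 2
    pairs = [(population[2 * i], population[2 * i + 1]) for i in range(half)]
    leftover = population[-1] if len(population) % 2 else None
    return pairs, leftover


if __name__ == "__main__":
    animals = list(range(1, 8))  # seven animals, numbered 1..7
    print("overlapping:", pairs_overlapping(animals))
    print("disjoint:   ", pairs_disjoint(animals))
```

In Python the same disjoint pairing can also be written as `zip(population[0::2], population[1::2])`, which silently drops the odd one out; returning it explicitly makes the "forage alone or starve" rule easy to apply.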

There is no initial check to see if a species can survive in its spawning biome. Obviously this doesn't matter for breathing, but species could live in the desert or tundra for free without the corresponding traits.

lsusr
Whoops. That is a bug. Thanks for spotting. I will fix it before running the final competition, either by writing code or just manually removing the organisms initially spawned into a biome they can't survive.
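One way to implement the missing check is to filter the initial population before the first generation. The sketch below assumes each species is described by a set of trait names and that Desert and Tundra each require one specific trait; the trait names (`heat_tolerance`, `cold_tolerance`), the `REQUIRED_TRAIT` table, and both helper functions are hypothetical placeholders, not the competition's actual rules or code.

```python
# Hypothetical mapping from harsh biomes to the trait a species needs to live there.
REQUIRED_TRAIT = {"Desert": "heat_tolerance", "Tundra": "cold_tolerance"}

Spawn = tuple[str, set[str], str]  # (species name, traits, spawn biome)


def can_live_in(traits: set[str], biome: str) -> bool:
    """Biomes without an entry in REQUIRED_TRAIT impose no extra requirement."""
    needed = REQUIRED_TRAIT.get(biome)
    return needed is None or needed in traits


def drop_unviable_spawns(spawns: list[Spawn]) -> list[Spawn]:
    """Keep only entries whose species can survive in the biome it spawns into."""
    return [entry for entry in spawns if can_live_in(entry[1], entry[2])]


if __name__ == "__main__":
    spawns = [
        ("lizard", {"heat_tolerance"}, "Desert"),
        ("frog", set(), "Desert"),  # removed: lacks the assumed desert trait
        ("fox", set(), "Temperate Forest"),
    ]
    print([name for name, _, _ in drop_unviable_spawns(spawns)])
```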

Ah, ok. So instead of competing in that generation, the individual roams.

lsusr
Roamers don't skip competitions. Roaming happens between competitions for food. Edit: I was wrong. See thread.

If I understand the code correctly, when an organism successfully roams, it basically spawns another copy of itself, leaving the original behind to compete in the source biome. That organism isn't removed from the competition pool. Given the relatively low roaming rate, I'm not sure this makes a huge difference, but it doesn't seem like it should be intended behavior.

lsusr
The method .pop removes the roamer from the original population.

Can you elaborate on the winning condition? I expect most biomes will have surviving species; will that mean multiple winners, or will the ultimate winner be the species with the most total biomass? How long will the simulation be run? I can imagine stable equilibrium conditions with multiple survivors, even after an arbitrarily large number of simulation rounds.

lsusr
Everyone who survives counts as a winner. If you enter multiple species then you can win multiple times. This is a non-zero-sum game. I may rank them according to average population. The simulation will be run until I feel like it has hit an equilibrium. I encourage you to enter one or more species even if you don't think it'll win. More biodiversity makes for a more interesting experience. It is possible there will be no winners.
lsusr
Fixed. Thanks.
elspood

Reading this reply, I was immediately reminded of a situation described by Jen Peeples, I think in an episode of The Atheist Experience, about her co-pilot turning to prayer during a life-threatening helicopter incident. (This comment is all I could find as a reference.)

Unless your particular prayer technique is useful for quickly addressing emergency situations, you probably don't want to be in the habit of relying on it as a general practice. I think the "rubber duck" Socratic approach could still be useful, so this isn't a disagreement with...

pjeby
Rubber ducking is for when you're uncertain how to proceed. An incident on a military aircraft is not such a situation: there are checklists that detail precisely how you're supposed to proceed, which you'd better be following. If you are doing problem-solving in a distressed aircraft, and that problem-solving activity is not explicitly listed on the checklist for the current issue, you are Doing It Wrong. And if you're praying in such a scenario, it had better be something like, "grant me the calm and clarity to follow the checklist, so I'm not distracted by any panicky impulses".
elspood

Isn't there a separate axis for every aspect of human divergence? Maybe this was already explicit in asking if there is anything more complicated than romance for "multiplayer" relationships, but really this problem seems fully general: politics, or religion, or food, or any other preference that has a distribution among humans could be a candidate for creating schism (or indeed all axes at once). "Catgirl for romance" is one very specific failure mode, but the general one could be called "an echo chamber for every mind".

The e...

elspood

It was hard to muster a proper sense of indignation when you were confronting the same dignified witch who, twelve years and four months earlier, had given both of you two weeks' detention after catching you in the act of conceiving Tracey.

Given that there is a Tracey, that act of conception must have been completed. So either McGonagall caught them at exactly the right moment, or the Davises just kept on going after they were caught...

No matter how it happened, this scene must have played out hilariously.

Aureateflux
Er, it's not like people can't be caught during the second round or after completion. This is also from McGonagall's point of view and could be unreliable. The time she caught them probably wasn't the ONLY time they had sex within the window of time that would have produced Tracey. It could just be a convenient conceit for McGonagall to be thinking it was during the time she caught them that the girl was conceived, since she only knows of one encounter during the appropriate timeframe.
elspood

If consequentialism and deontology shared a common set of performance metrics, they would not be different value systems in the first place.

At least one performance metric that allows for the two systems to be different is: "How difficult is the value system for humans to implement?"

elspood

[edited out emotional commentary/snark]

  1. If you can't multiply B by a probability factor, then it's meaningless in the context of xB + (1-x)C, also. xB by itself isn't meaningless; it roughly means "the expected utility on a normalized scale between the utility of the outcome I least prefer and the outcome I most prefer". nyan_sandwich even agrees that 0 and 1 aren't magic numbers, they're just rescaled utility values.
  2. I'm 99% confident that that's not what nyan_sandwich means by radiation poisoning in the original post, considering the fact that...
nshepperd
Oh, I was going to reply to this, and I forgot. All this business with radiation poisoning is just a roundabout way of saying the only things you're allowed to do with utilities are "compare two utilities" and "calculate expected utility over some probability distribution" (and rescale the whole utility function with a positive affine transformation, since positive affine transformations happen to be isomorphisms of the above two calculations). Looking at utility values for any other purpose than comparison or calculating expected utilities is a bad idea, because your brain will think things like "positive number is good" and "negative number is bad" which don't make any sense in a situation where you can arbitrarily rescale the utility function with any positive affine transformation. "xB + (1-x)0" which is formally equivalent to "xB" means "the expected utility of B with probability p and the outcome I least prefer on a normalized scale with probability (1-p)", yes. The point I'm trying to make here though is that probability distributions have to add up to 1. "Probability p of outcome B" — where p < 1 — is a type error, plain and simple, since you haven't specified the alternative that happens with probability (1-p). "Probability p of outcome B, and probability (1-p) of the outcome I least prefer" is the closest thing that is meaningful, but if you mean that you need to say it.
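A small numerical check of the point about positive affine transformations, in Python. The outcomes, utilities, and lotteries are invented, and `expected_utility`, `rescale`, and `normalize` are illustrative helpers. Rescaling by u -> a*u + b with a > 0 (min-max normalization onto [0, 1] is one such rescaling) never changes which lottery has the higher expected utility, but it can change whether a given utility looks "positive" or "negative", which is why reading meaning into the sign of a raw utility is unsafe.

```python
Utilities = dict[str, float]
Lottery = dict[str, float]  # outcome -> probability; probabilities must sum to 1


def expected_utility(u: Utilities, lottery: Lottery) -> float:
    """Sum of probability * utility; the lottery must list every alternative."""
    if abs(sum(lottery.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1; specify the alternative outcome too")
    return sum(p * u[outcome] for outcome, p in lottery.items())


def rescale(u: Utilities, a: float, b: float) -> Utilities:
    """Positive affine transformation u -> a*u + b (requires a > 0)."""
    if a <= 0:
        raise ValueError("a must be positive")
    return {k: a * v + b for k, v in u.items()}


def normalize(u: Utilities) -> Utilities:
    """Min-max normalization onto [0, 1]; itself a positive affine transformation."""
    lo, hi = min(u.values()), max(u.values())
    return rescale(u, 1.0 / (hi - lo), -lo / (hi - lo))


if __name__ == "__main__":
    u = {"worst": -5.0, "ordinary": 2.0, "whale day": 40.0}  # invented numbers
    gamble = {"whale day": 0.1, "worst": 0.9}
    sure_thing = {"ordinary": 1.0}
    for label, v in [("raw", u), ("shifted", rescale(u, 1.0, 10.0)), ("normalized", normalize(u))]:
        prefers_gamble = expected_utility(v, gamble) > expected_utility(v, sure_thing)
        print(f"{label:10} prefers gamble: {prefers_gamble}   u(worst) = {v['worst']:+.3f}")
```

The preference column stays the same on every scale, while u(worst) is negative on the raw scale, positive after the shift, and zero after normalization.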
elspood

I think what you mean to tell me is: "say 'my preferences' instead of 'my utility function'". I acknowledge that I was incorrectly using these interchangeably.

I do think it was clear what I meant when I called it "my" function and talked about it not conforming to VNM rules, so this response felt tautological to me.

elspood

I notice we're not understanding each other, but I don't know why. Let's step back a bit. What problem is "radiation poisoning for looking at magnitude of utility" supposed to be solving?

We're not talking about adding N to both sides of a comparison. We're talking about taking a relation where we are only allowed to know that A < B, multiplying B by some probability factor, and then trying to make some judgment about the new relationship between A and xB. The rule against looking at magnitudes prevents that. So we can't give an answer to the q...

nshepperd
  1. You can't just multiply B by some probability factor. For the situation where you have p(B) = x, p(C) = 1 - x, your expected utility would be xB + (1-x)C. But xB by itself is meaningless, or equivalent to the assumption that the utility of the alternative (which has probability 1 - x) is the magic number 0. "1/400 chance of a whale day" is meaningless until you define the alternative that happens with probability 399/400.
  2. For the purpose of calculating xB + (1-x)C you obviously need to know the actual values, and hence magnitudes of x, B and C. Similarly you need to know the actual values in order to calculate whether A < B or not. "Radiation poisoning for looking at magnitude of utility" really means that you're not allowed to compare utilities to magic numbers like 0 or 1. It means that the only thing you're allowed to do with utility values is a) compare them to each other, and b) obtain expected utilities by multiplying by a probability distribution.
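The "xB by itself" point can be checked directly. Under a positive affine rescaling u' = au + b with a > 0, the expected utility of a complete lottery transforms exactly as the utilities do, so comparisons between lotteries survive. The bare product xB does not transform that way: whether xB exceeds A can flip under a mere change of scale, because xB implicitly treats the alternative (probability 1 - x) as having utility 0 on whatever scale is currently in use.

```latex
\begin{aligned}
u'(\cdot) &= a\,u(\cdot) + b, \qquad a > 0 \\[4pt]
x\,B' + (1-x)\,C' &= x\,(aB + b) + (1-x)\,(aC + b)
  = a\,\bigl[xB + (1-x)C\bigr] + b \\[4pt]
x\,B' &= x\,(aB + b) = a\,(xB) + x\,b
  \;\neq\; a\,(xB) + b \quad \text{whenever } x \neq 1,\ b \neq 0 \\[4pt]
x\,B' > A' &\iff a\,xB + x\,b > aA + b
  \iff xB > A + \frac{(1-x)\,b}{a}
\end{aligned}
```

The last line shows the comparison of A against xB depends on the arbitrary offset b, whereas comparing A against the full lottery xB + (1-x)C does not.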
elspood

It's too late for me. It might work to tell the average person to use "awesomeness" as their black box for moral reasoning as long as they never ever look inside it. Unfortunately, all of us have now looked, and so whatever value it had as a black box has disappeared.

You can't tell me now to go back and revert to my original version of awesome unless you have a supply of blue pills whenever I need them.

If the power of this tool evaporates as soon as you start investigating it, that strikes me as a rather strong point of evidence against it. It was fun while it lasted, though.

0evand
You seem to be generalizing from one example. Have you attempted to find examples of people who have looked inside the box and not destroyed its value in the process? I suspect that the utility of this approach is dependent on more than simply whether or not the person has examined the "awesome" label, and that some people will do better than others. Given the comments I see on LW, I suspect many people here have looked into it and still find value. (I will place myself into that group only tentatively; I haven't looked into it in any particular detail, but I have looked. OTOH, that still seems like strong enough evidence to call "never ever look inside" into question.)
elspood00

Ooops, you tried to feel a utility. Go directly to type theory hell; do not pass go, do not collect 200 utils.

I don't think this example is evidence against trying to 'feel' a utility. You didn't account for scope insensitivity and the qualitative difference between the two things you think you're comparing.

You need to compare the feeling of the turtle thrown against the wall to the cumulative feeling when you think about EACH individual beheading, shooting, orphaned child, open grave, and every other atrocity of the genocide. Thinking about the vague concept "genocide" doesn't use the same part of your brain as thinking about the turtle incident.

elspood-10

What I mean by "normalized" is that you're compressing the utility values into the range between 0 and 1. I am not aware of another definition that would apply here.

Your rule says you're allowed to compare, but your other rule says you're not allowed to compare by magnitude. You were serious enough about this second rule to equate it with radiation death.

You can't apply probabilities to utilities and be left with anything meaningful unless you're allowed to compare by magnitude. This is a fatal contradiction in your thesis. Using your own example...

3nshepperd
There's something missing here, which is that "1/400 chance of a whale day" means "1/400 chance of whale + 399/400 chance of normal day". To calculate the value of "1/400 chance of a whale day" you need to assign a utility for both a whale day and a normal day. Then you can compare the resulting expectation of utility to the utility of a sandwich = 1/500 (by which we mean a sandwich day, I guess?), no sweat. The absolute magnitudes of the utilities don't make any difference. If you add N to all utility values, that just adds N to both sides of the comparison. (And you're not allowed to compare utilities to magic numbers like 0, since that would be numerology.)
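Written out, the "add N to all utility values" step looks like this, where u_w, u_n and u_s stand for whatever utilities are assigned to a whale day, a normal day and a sandwich day (the symbols are just labels for illustration):

$$
\begin{aligned}
\mathrm{EU} &= \tfrac{1}{400}\,u_w + \tfrac{399}{400}\,u_n \\
\mathrm{EU}' &= \tfrac{1}{400}\,(u_w + N) + \tfrac{399}{400}\,(u_n + N) = \tfrac{1}{400}\,u_w + \tfrac{399}{400}\,u_n + N = \mathrm{EU} + N \\
u_s' &= u_s + N \\
\mathrm{EU}' - u_s' &= \mathrm{EU} - u_s
\end{aligned}
$$

The difference between the two sides is the same for every N, so the comparison can't depend on where zero sits.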
elspood00

No, I mean if my utility function violates transitivity or other axioms of VNM, I more want to fix it than to throw out VNM as being invalid.

1A1987dM
Then it's not a utility function in the standard sense of the term.
elspood10

I think I have updated slightly in the direction of requiring my utility function to conform to VNM and away from being inclined to throw it out if my preferences aren't consistent. This is probably mostly due to smart people being asked to give an example of a circular preference and my not finding any answer compelling.

Expectation. VNM isn't really useful without uncertainty. Without uncertainty, transitive preferences are enough.

I think I see the point you're trying to make, which is that we want to have a normalized scale of utility to apply probab...
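Without uncertainty, checking for the kind of circular preference discussed above amounts to looking for a cycle in the graph of strict preferences. A minimal Python sketch, assuming preferences are given as hypothetical (better, worse) pairs; `find_preference_cycle` is an illustrative name, not an existing library function:

```python
def find_preference_cycle(prefers):
    """prefers: iterable of (better, worse) pairs meaning 'better is strictly preferred'.

    Returns a list of options that loops back on itself (a circular preference),
    or None if the strict preferences are acyclic and can be ranked transitively.
    """
    graph = {}
    for better, worse in prefers:
        graph.setdefault(better, []).append(worse)
        graph.setdefault(worse, [])

    WHITE, GREY, BLACK = 0, 1, 2       # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:    # back edge: we've come round in a circle
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in graph:
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

For apple > banana, banana > cherry, cherry > apple it returns `['apple', 'banana', 'cherry', 'apple']`; for whale > sandwich > nothing it returns `None`, so those three options can be put in a consistent order.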

1[anonymous]
If you don't conform to VNM, you don't have a utility function. I think you mean to refer to your decision algorithms.
1[anonymous]
You are allowed to compare. Comparison is one of the defined operations. Comparison is how you decide which is best. I'm uneasy with this "normalized". Can you unpack what you mean here?
elspood00

That was one of the major points. Do not play with naked utilities. For any decision, find the 0 anchor and the 1 anchor, and rank other stuff relative to them.

I understood your major point about the radioactivity of the single real number for each utility, but I got confused by what you intended the process to look like with your hell example. I think you need to be a little more explicit about your algorithm when you say "find the 0 anchor and the 1 anchor". I defaulted to a generic idea of moral intuition about best and worst, then only mad...
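One way to read "find the 0 anchor and the 1 anchor" as an explicit algorithm is a positive affine rescaling: pick the outcome you'll call 0 and the one you'll call 1, then map every other raw score u to (u - u0) / (u1 - u0). This is an interpretation, not necessarily the procedure the post intended; the sketch assumes some raw scores already exist, and `anchor` and the numbers are hypothetical:

```python
from fractions import Fraction

def anchor(utility, zero, one):
    """Rescale raw scores so utility[zero] maps to 0 and utility[one] maps to 1.

    Only the ordering and the ratios of differences survive; the raw
    magnitudes (the "naked utilities") drop out.
    """
    lo, hi = utility[zero], utility[one]
    if hi <= lo:
        raise ValueError("the 1 anchor must be strictly preferred to the 0 anchor")
    return {o: Fraction(u - lo, hi - lo) for o, u in utility.items()}

raw = {"nothing": 5, "sandwich": 7, "whale": 1005}   # made-up raw scores
print(anchor(raw, "nothing", "whale"))
```

With these made-up scores the sandwich lands at 1/500 of the way from nothing to whale, and shifting or scaling all the raw scores together leaves the anchored values unchanged.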

1[anonymous]
Yes, approximately. I consider all the axioms of VNM to be totally reasonable. I don't think the human decision system follows the VNM axioms. Hence the project of defining and switching to this VNM thing; it's not what we already use, but we think it should be.

VNM is required to use VNM, but if you encounter a circular preference and decide you value running in circles more than the benefits of VNM, then you throw out VNM. You can't throw it out from the inside, only decide whether it's right from outside.

Expectation. VNM isn't really useful without uncertainty. Without uncertainty, transitive preferences are enough. If being a whale has utility 1, and getting nothing has utility 0, and getting a sandwich has utility 1/500, but the whale-deal only has a probability of 1/400 with nothing otherwise, then I don't know until I do expectation that the 1/400 EU from the whale is better than the 1/500 EU from the sandwich.
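The whale-versus-sandwich arithmetic from the comment above, as a tiny Python check using exact fractions (the dictionary layout and the `eu` helper are just for illustration):

```python
from fractions import Fraction

# Anchored utilities from the comment: nothing = 0, whale = 1, sandwich = 1/500.
u = {"nothing": Fraction(0), "whale": Fraction(1), "sandwich": Fraction(1, 500)}

def eu(lottery):
    """Expected utility of a lottery given as {outcome: probability}."""
    assert sum(lottery.values()) == 1, "probabilities must sum to 1"
    return sum(p * u[o] for o, p in lottery.items())

whale_deal = {"whale": Fraction(1, 400), "nothing": Fraction(399, 400)}
sandwich_deal = {"sandwich": Fraction(1)}

print(eu(whale_deal), eu(sandwich_deal))    # 1/400 1/500
print(eu(whale_deal) > eu(sandwich_deal))   # True
```

The ranking alone (whale > sandwich > nothing) doesn't settle the choice; it's the expectation step that shows 1/400 beats 1/500.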
elspood00

"Awesomeness" is IMO the simplest effective pointer to morality that we currently have, but that morality is still inconsistent and dynamic.

The more I think about "awesomeness" as a proxy for moral reasoning, the less awesome it becomes and the more like the original painful exercise of rationality it looks.

0[anonymous]
see this tl;dr: don't dereference "awesome" in verbal-logical mode.
elspood00

I've been very entertained by this framing of the problem - very fun to read!

I find it strange that you claim the date with Satan is clearly the best option, but almost in the same breath say that the utility of whaling in the lake of fire is only 0.1% worse. It sounds like your definition of clarity is a little bit different from mine.

On the Satan date, souls are tortured, steered toward destruction, and tossed in a lake of fire. You are indifferent to those outcomes because they would have happened anyway (we can grant this a premise of the scenario). Bu...

1[anonymous]
That was one of the major points. Do not play with naked utilities. For any decision, find the 0 anchor and the 1 anchor, and rank other stuff relative to them. Yep, you are not VNM compliant, or the whole exercise would be worthless. The philosophy involved in actually making your preferences consistent is hard of course. I swept that part under the rug.
elspood10

Edited, thanks for the style correction.

I suspect you're probably right that more examples make this more interesting, given the lack of upvotes. In fact, I probably found the quote relevant mostly because it more or less summed up the experience of my OWN life at the time I read it years ago.

I spent much of my youth being contrarian for contradiction's sake, and thinking myself to be revolutionary or somehow different from those who just joined the cliques and conformed, or blindly followed their parents, or any other authority.

When I realized that defin...
