All of ErickBall's Comments + Replies

Fair enough, I guess the distinction is more specific than just being a (weak) mesa-optimizer. This model seems to contradict https://www.lesswrong.com/posts/pdaGN6pQyQarFHXF4/reward-is-not-the-optimization-target because it has, in fact, developed reward as the optimization target without ever being instructed to maximize reward. It just had reward-maximizing behaviors reinforced by the training process, and instead of (or in addition to) becoming an adaptation executor it became an explicit reward optimizer. This type of generalization is surprising and ...

4gwern
It doesn't contradict Turntrout's post because his claims are about an irrelevant class of RL algorithms (model-free policy gradients). A model-based RL setting (like a human, or an LLM like Claude pretrained to imitate model-based RL agents in a huge number of settings, i.e. human text data) optimizes the reward, if it's smart and knowledgeable enough to do so. (This comment is another example of how Turntrout's post was a misfire because everyone takes away the opposite of what they should have.)

we can be confident about why it’s doing this: to get a high RM score

Does this constitute a mesa-optimizer? If so, was creating it intentional or incidental? I was under the impression that those were still basically theoretical.

2Fabien Roger
This is a mesa-optimizer in a weak sense of the word: it does some search/optimization. I think the model in the paper here is weakly mesa-optimizing, maybe more than base models generating random pieces of sports news, and maybe roughly as much as a model trying to follow weird and detailed instructions - except that here it follows memorized "instructions" as opposed to in-context ones.
6evhub
I would argue that every LLM since GPT-3 has been a mesa-optimizer, since they all do search/optimization/learning as described in Language Models are Few-Shot Learners.

I think this topic is important and many of your recommendations sound like great ideas, but they also involve a lot of "we should" where it's not clear who "we" is. I would like to see, for some of these, targeting to a specific audience: who actually has the capability to help streamline government procurement processes for AI, and how? What organizations might be well positioned to audit agency needs and bottlenecks? I'm left with the sense that these things would be good in the abstract, but that there's little I personally (or most other readers, unle...

Why on earth would pokemon be AGI-complete?

ErickBall157

There are big classes of problems that provably can't be solved in a forward pass. Sure, for something where it knows the answer instantly the chain of thought could be just for show. But for anything difficult, the models need the chain of thought to get the answer, so the CoT must contain information about their reasoning process. It can be obfuscated, but it's still in there. 

I kind of see your point about having all the game wikis, but I think I disagree about learning to code being necessarily interactive. Think about what feedback the compiler provides you: it tells you if you made a mistake, and sometimes what the mistake was. In cases where it runs but doesn't do what you wanted, it might "show" you what the mistake was instead. You can learn programming just fine by reading and writing code but never running it, if you also have somebody knowledgeable checking what you wrote and explaining your mistakes. LLMs have tons of examples of that kind of thing in their training data.

Yeah but we train AIs on coding before we make that comparison. And we know that if you train an AI on a videogame it can often get superhuman performance. Here we're trying to look at pure transfer learning, so I think it would be pretty fair to compare to someone who is generally competent but has never played videogames. Another interesting question is to what extent you can train an AI system on a variety of videogames and then have it take on a new one with no game-specific training. I don't know if anyone has tried that with LLMs yet.

2β-redex
I am not 100% convinced by the comparison, because technically LLMs are only "reading" a bunch of source code, they are never given access to a compiler/interpreter. IMO actually running the code one has written is a very important part of learning, and I think it would be a much more difficult task for a human to learn to code just by reading a bunch of books/code, but never actually trying to write & run their own code.[1] Also, in the video linked earlier in the thread, the girlfriend playing Terraria is deliberately not given access to the wiki, and thus I believe it is an unfair comparison. I expect to see much better human performance if you give them access to manuals & wikis about the game. Not sure either, but I agree that this would be an interesting experiment. (Human gamers are often much quicker at picking up new games and are much better at them than someone with no gaming background.) ---------------------------------------- 1. I would expect the average human to stay very bad at coding, no matter how many books & code examples you give them. I would also expect some smaller class of humans to nevertheless be able to pull that feat off. (E.g. maybe a mathematician well versed in formal logic, who is used to doing complex symbolic manipulation correctly "only on paper", could probably write non-trivial correct programs just by reading about the subject. In fact, a lot of stuff from computer science was worked out well before computers were built, e.g. Ada Lovelace is usually credited with writing the "first computer program", well before the first digital computer existed.) ↩︎

The cornerstone of all control theory is the idea of having a set-point and designing a controller to reduce the deviation between the state and the set-point.

But control theory is used for problems where you need a controller to move the system toward the set-point, i.e. when you do not have instant total control of all degrees of freedom. We use tools like PID tuning, lead-lag, pole placement etc. to work around the dynamics of the system through some limited actuator. In the case of AI alignment, not only do we have a very vague concept of what our set-...

I would think things are headed toward these companies fine tuning an open source near-frontier LLM. Cheaper than building one from scratch but with most of the advantages.

Yeah, something along the lines of an Elo-style rating would probably work better for this. You could put lots of hard questions on the test and then instead of just ranking people you compare which questions they missed, etc.

This works for corn plants because the underlying measurement "amount of protein" is something that we can quantify (in grams or whatever) in addition to comparing two different corn plants to see which one has more protein. IQ tests don't do this in any meaningful sense; think of an IQ test more like the Mohs hardness scale, where you can figure out a new material's position on the scale by comparing it to a few with similar hardness and seeing which are harder and which are softer. If it's harder than all of the previously tested materials, it just goes at the top of the scale.

gwern*216

IQ tests include sub-tests which can be cardinal, with absolute variables. For example, simple & complex reaction time; forwards & backwards digit span; and vocabulary size. (You could also consider tests of factual knowledge.) It would be entirely possible to ask, 'given that reaction time follows a log-normalish distribution in milliseconds and loads on g by r = 0.X and assuming invariance, what would be the predicted lower reaction time of someone Y SDs higher than the mean on g?' Or 'given that backwards digit span is normally distributed...' T...

8GeneSmith
You can definitely extrapolate out of distribution on tests where the baseline is human performance. We do this with chess Elo ratings all the time.

I wasn't saying it's impossible to engineer a smarter human. I was saying that if you do it successfully, then IQ will not be a useful way to measure their intelligence. IQ denotes where someone's intelligence falls relative to other humans, and if you make something smarter than any human, their IQ will be infinity and you need a new scale.

2tailcalled
IQ tests are built on item response theory, where people's IQ is measured in terms of how difficult tasks they can solve. The difficulty of tasks is determined by how many people can solve them, so there is an ordinal element to that, but by splitting the tasks off you could in principle measure IQ levels quite high, I think.
5GeneSmith
I don't think this is the case. You can make a corn plant with more protein than any other corn plant, and using standard deviations to describe it will still be useful. Granted, you may need a new IQ test to capture just how much smarter these new people are, but that's different than saying they're all the same.
ErickBall6-2

it’s not even clear what it would mean to be a 300-IQ human

IQ is an ordinal score, not a cardinal one--it's defined by the mean of 100 and standard deviation of 15. So all it means is that this person would be smarter than all but about 1 in 10^40 natural-born humans. It seems likely that the range of intelligence for natural-born humans is limited by basic physiological factors like the space in our heads, the energy available to our brains, and the speed of our neurotransmitters. So a human with IQ 300 is probably about the same as IQ 250 or IQ 1000 or IQ 10,000, i.e. at the upper limit of that range.
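
The "1 in 10^40" figure can be sanity-checked with a short sketch. This assumes IQ is exactly normal with mean 100 and SD 15 all the way out into the tail, which is a modelling assumption and not something any real test is normed for; the function names are made up for illustration.

```python
import math

def iq_to_z(iq, mean=100.0, sd=15.0):
    """Convert a deviation IQ to a z-score (assumes mean 100, SD 15)."""
    return (iq - mean) / sd

def upper_tail(z):
    """P(Z > z) for a standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

z = iq_to_z(300)    # about 13.3 standard deviations above the mean
p = upper_tail(z)   # fraction of the (assumed normal) population above IQ 300
print(f"z = {z:.2f}, tail probability = {p:.2e}, i.e. about 1 in {1 / p:.1e}")
```

Under that assumption the tail probability comes out on the order of 10^-40, far smaller than one over the number of humans who have ever lived, so the score cannot be an empirical percentile.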

2tailcalled
IQ is an ordinal score in that its relationship to outcomes of interest is nonlinear, but for the most important outcomes of interest, e.g. ability to solve difficult problems or income or similar, the relationship between IQ and success at the outcome is exponential, so you'd be seeing accelerating returns for a while. Presumably fundamental physics limits how far these exponential returns can go, but we seem quite far from those limits (e.g. we haven't even solved aging yet).
6localdeity
The original definition of IQ, intelligence quotient, is mental age (as determined by cognitive test scores) divided by chronological age (and then multiplied by 100).  A 6-year-old with the test scores of the average 9-year-old thus has an IQ of 150 by the ratio IQ definition. People then found that IQ scores roughly followed a normal distribution, and subsequent tests defined IQ scores in terms of standard deviations from the mean.  This makes it more convenient to evaluate adults, since test scores stop going up past a certain age in adulthood (I've seen some tests go up to age 21).  However, when you get too many standard deviations away from the mean, such that there's no way the test was normed on that many people, it makes sense to return to the ratio IQ definition. So an IQ 300 human would theoretically, at age 6, have the cognitive test scores of the average 18-year-old.  How would we predict what would happen in later years?  I guess we could compare them to IQ 200 humans (of which we have a few), so that the IQ 300 12-year-old would be like the IQ 200 18-year-old.  But when they reached 18, we wouldn't have anything to compare them against. I think that's the most you can extract from the underlying model.
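A minimal sketch of the ratio-IQ arithmetic described above, using only the formula stated in the comment (mental age divided by chronological age, times 100). The function names are hypothetical.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age over chronological age, times 100."""
    return 100.0 * mental_age / chronological_age

def mental_age_for(iq, chronological_age):
    """Invert the ratio definition to get the implied mental age."""
    return iq * chronological_age / 100.0

print(ratio_iq(9, 6))            # 6-year-old scoring like an average 9-year-old
print(mental_age_for(300, 6))    # IQ 300 at age 6
print(mental_age_for(300, 12))   # IQ 300 at age 12
print(mental_age_for(200, 18))   # IQ 200 at age 18, same implied mental age
```

The last two lines give the same implied mental age, which is why the IQ 300 12-year-old can be compared to the IQ 200 18-year-old, and why the comparison runs out once the IQ 300 person is older than that.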
4GeneSmith
I would be quite surprised if this were true. We should expect scaling laws for brain volume alone to continue well beyond the current human range, and brain volume only explains about 10% of the variance in intelligence.

I've heard doctors ask questions like this but I don't think they usually get very helpful answers. "My diet's okay I guess, pretty typical, a lot of times I don't sleep great, and yeah I have a pretty stressful job." Great, what do you do with that?

"Food" in general is about the easiest and most natural thing for a dog to identify. Distinguishing illegal drugs from all the other random stuff a person might be carrying (soap, perfume, medicine, etc.) at least requires a lot better training than finding food.

7Ben
Very possible. I am not fully convinced. The dog had to identify the people who had food in their bags, and tell them apart from all the people who used to have food in those same bags, or were eating on the flight and have food on their breath or hands. A dog trying to identify (for example) cannabis would probably have an easier time. My stance is not "I know 100% that sniffer dogs are a silver bullet", but the weaker position "The majority of the value of a sniffer dog comes from it actually smelling things, rather than giving the officer controlling it a plausible way of profiling based on other (possibly protected) characteristics."

It's interesting that 3.5 Sonnet does not seem to match, let alone beat, GPT-4o on the leaderboard (https://chat.lmsys.org/?leaderboard). Currently it shows GPT-4o with Elo 1287 and Claude 3.5 Sonnet at 1271.
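
For a sense of scale, here is a rough sketch of what a 16-point gap means under the standard Elo expected-score formula. It assumes the leaderboard ratings behave like ordinary Elo ratings on the usual 400-point logistic scale, which may not match exactly how the leaderboard computes them; the function name is made up for illustration.

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

gpt4o, sonnet = 1287, 1271
p = elo_expected_score(gpt4o, sonnet)
print(f"Expected head-to-head score for the higher-rated model: {p:.3f}")
```

Under that assumption the higher-rated model would be preferred in roughly 52% of head-to-head comparisons, so the gap is real but small.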

8gwern
Yeah, there's a decent amount of debate going on about how good 3.5 Sonnet is vs 4o, or if 4o was badly underperforming its benchmarks + LMsys to begin with. Has 4o been crippled by something post-deployment?* Is this something about long-form interaction with Claude, which is missed by benchmarks and short low-effort LMsys prompts? Are Claude users especially tilting into coding now given the artifact/project features, which seems to be the main strength of Claude-3.5-Sonnet? Every year, it seems like benchmarking powerful generalist AI systems gets substantially harder, and this may be the latest iteration of that difficulty. (Given the level of truesight and increasing level of persistency of account history, we may be approaching the point where different models give different people intrinsically different experiences - eg. something like, Claude genuinely works better for you than for me, while I genuinely find ChatGPT-4o more useful, because you happen to be politer and ask more sensible questions like Claude is a co-worker and that works better with the Claude RLAIF, while the RLHF crushes GPT-4o into submission so while it's a worse model it's more robust to my roughshod treatment of GPT-4o as a slave. Think of it as like Heisenbugs on steroids, or operant conditioning into tacit knowledge: some people just have more mana and mechanical sympathy, and they can't explain how or why.) * I've noticed what seems like some regressions in GPT-4o since the launch, in my Gwern.net scripts, where it seems to have gotten oddly worse at some simple tasks like guessing URLs or picking keywords to bold in abstracts, and is still failing to clean some URL titles despite ~40 few-shot examples collected from previous errors.

I would love to have a checkbox or something next to each post to indicate "I saw this and I don't want to click on it"

3Ruby
Yeah, I think we should do something like this. Maybe the box is "don't show me posts like this"
1ErickBall
Although it would also be nice to distinguish that from "I read this post already somewhere else"

As a counterpoint, take a look at this article: https://peterattiamd.com/protein-anabolic-responses/

The upshot is that the studies saying your body can only use 45g of protein per meal for muscle synthesis are mostly based on fast-acting whey protein shakes. Stretching out the duration of protein metabolism (by switching protein sources and/or combining it with other foods in a gradually-digested meal) can mitigate the problem quite a bit.

Saturated fats are definitely manageable in small amounts. For most of history, and still in many places today, the biggest concern for an infant was getting sufficient calories, and saturated fat is a great choice for that. When you look at modern hunter-gatherer diets, they contain animal products, but in most cases they do not make up the majority of calories (exceptions usually involve lots of seafood), the meats are wild and therefore fairly lean, and BMI stays generally quite low. Under those conditions, heart disease risk is small and whether it is ...

Real can of worms that deserves its own post I would think

I think in this case just spacing them out would help more.

Downvoted because I waded through all those rhetorical shenanigans and I still don't understand why you didn't just say what you mean.

5RHollerith
As a deep-learning novice, I found the post charming and informative.
7abramdemski
To me, the lengthy phrases do in fact get closer to "zack saying what zack meant" than the common terms like 'deep learning' -- but, like you, I didn't really get anything new out of the longer phrases. I believe that people who don't already think of deep learning as function approximation may get something out of it tho. So in consequence I didn't downvote or upvote.

This comment had been apparently deleted by the commenter (the comment display box having a "deleted because it was a little rude, sorry" deletion note in lieu of the comment itself), but the ⋮-menu in the upper-right gave me the option to undelete it, which I did because I don't think my critics are obligated to be polite to me. (I'm surprised that post authors have that power!) I'm sorry you didn't like the post.

Separate clocks would be a pain to manage in a board game, but in principle "the game ends once 50% of players have run out of time" seems like a decent condition.

2mako yass
In practice what I was going to do was just say that each turn is limited to like 40 seconds or whatever.

Oh, good point, I had forgotten about the zero-sum victory points. The extent to which the other parts are zero sum depends a lot on how large the game board is relative to the number of players, so it could be adjusted. I was thinking about having a time limit instead of a round limit, to encourage the play to move quickly, but maybe that's too stressful. If you want the players to choose to end the game, then you'd want to build in a mechanic that works against all of them more and more as the game progresses, so that at some point continuing becomes counterproductive...

3mako yass
I like time limits because time constraints are what make negotiation difficult (imperfect compromise), though just having a single shared time limit lets players filibuster. If players have separate time limits it's basically still a round limit, but good point to remember to impose a time limit.

Would a good solution be to just play Settlers, but instead of saying "the goal is to get more points than anyone else," say "this is a variant where the goal is to get the highest score you can, individually"? That seems like it would change the negotiation dynamics in a potentially interesting way without having to make or teach a brand new game. Does this miss the point somehow?

9mako yass
Solution to what? That would be cohabitive, I'd like to play that at least once, but I wouldn't expect it to work that well. 4 of 10 victory points in Catan come from criteria that're inherently zero sum (having a longer road or bigger army than anyone else) (I wouldn't know how to adapt those). I'm not sure to what extent land scarcity makes the other conditions fairly zero sum as well. I haven't played a lot of Catan. You'd have to replace the end condition with a round limit. P1 (and the other one I'm going to publish soon, Final Autumn) also just ends after a certain number of rounds, and the only way to pace it well is to make it end 'too early', so that every game will be a study of haste. I don't love it. I wonder if we should try for a mechanic where players have to, to some extent, deliberately build the true peace by taking some actions in the world that freeze current conditions in place and end the game. I think that could be pretty interesting.

So, then it seems like the client's best move in this scenario is to lie to you strategically, or at least omit information strategically. They could say "I know for sure you won't find any fingerprints or identifiable face in the camera footage" and "I think my friends will confirm that I was playing video games with them", and as long as they don't actually tell you that's a lie, you can put those friends on the stand, right?

5ymeskhout
Correct, there are indeed potential advantages to lying to your attorney under very specific and narrow circumstances. You also have to consider the risky gamble this presents because you can't predict every aspect of the machinery. Maybe the jury never would've paid attention to the alibi aspect of the case, but if the alibi witnesses get exposed as liars by the prosecution, that alone could swing jurors from acquittal and towards conviction.

You say that lying to you can only hurt them but "There is a kernel of an exception that is almost not worth mentioning" because it is rarely relevant. I find this pretty hard to believe. If your client tells you "yeah I totally robbed that store, but I was wearing a ski mask and gloves so I think a jury will have reasonable doubt assuming my friends say I was playing video games with them the whole time", would you be on board with that plan? There must be plenty of cases where the cops basically know who did it but have trouble proving it. Maybe those just don't get to the point of a public defender getting assigned?

4ymeskhout
If a client tells me they know for sure that their alibi witness will be lying in their favor, then I'm not allowed to elicit the false testimony from that witness. If they admit to me to robbing the store but (truthfully and without omissions) say they were wearing a mask and functional gloves, then that lets me know what facets to focus on and what to avoid. If they're sure enough they left no fingerprints, then I can comfortably ask the investigating detectives if any fingerprints were found. If the circumstances allow it, then I may even get my own expert to dust the entire scene for fingerprints with the aim of presenting their absence as exculpatory evidence to the jury. Keep in mind that my job is not to help the government prosecute my client. And yes, there are plenty of cases where the perpetrator might be obvious from a common-sense perspective, but it would be legally difficult to prove in court.

That's like saying that because we live in a capitalist society, the default plan is to destroy every bit of the environment and fill every inch of the world with high rise housing projects. It's... true in some sense, but only as a hypothetical extreme, a sort of economic spherical cow. In reality, people and societies are more complicated and less single minded than that, and also people just mostly don't want that kind of wholesale destruction.

I didn't think the implication was necessarily that they planned to disassemble every solar system and turn it into probe factories. It's more like... seeing a vast empty desert and deciding to build cities in it. A huge universe, barren of life except for one tiny solar system, seems not depressing exactly but wasteful. I love nature and I would never want all the Earth's wilderness to be paved over. But at the same time I think a lot of the best the world has to offer is people, and if we kept 99.9% of it as a nature preserve then almost nobody would be around to see it. You'd rather watch the unlifted stars, but to do that you have to exist.

2jbash
No, the probes are instrumental and are actually a "cost of doing business". But, as I understand it, the orthodox plan is to get as close as possible to disassembling every solar system and turning it into computronium to run the maximum possible number of "minds". The minds are assumed to experience qualia, and presumably you try to make the qualia positive. Anyway, a joule not used for computation is a joule wasted.

I don't think governments have yet committed to trying to train their own state of the art foundation models for military purposes, probably partly because they (sensibly) guess that they would not be able to keep up with the private sector. That means that government interest/involvement has relatively little effect on the pace of advancement of the bleeding edge.

Fair point, but I can't think of a way to make an enforceable rule to that effect. And even if you could make that rule, a rogue AI would have no problem with breaking it.

1RogerDearnaley
Frontier models are all behind APIs, and the number of companies offering them is currently two, likely to soon be three. If they all agree this is unsafe, it's not that hard to prevent. For anything more than mildly intimate, it's also already blocked by their Terms of Service and their models will refuse. For a rogue, I agree. And one downside of not letting frontier models do this would be leaving unfulfilled demand for a rogue to take advantage of.

I think if you could demonstrably "solve alignment" for any architecture, you'd have a decent chance of convincing people to build it as fast as possible, in lieu of other avenues they had been pursuing.

2Seth Herd
Some people. But it would depend what the prospects were for that type of AGI. Because I don't think you could convince everyone else to stop working on other types of AGI. So it would be a race between the new "more alignable" type and the currently-leading types. If the "more alignable" type seemed guaranteed to lose that race, I'm not sure many people would even try building it.

Since our info doesn't seem to be here already: We meet on Sundays at 7pm, alternating between virtual and in-person in the lobby of the UMBC Performing Arts and Humanities Building. For more info, you can join our Google group (message the author of this post, bookinchwrm).

I found this post interesting, mostly because it illustrates deep flaws in the US tax system that we should really fix. I downvoted it because I think it is a terrible strategy for giving more money to charity. Many other good objections have been raised in the comments, and the post itself admits that lack of effectiveness is a serious problem. One problem I did not see addressed anywhere is reputational risk. The world is not static, and a technique that works for an individual criminal or a few conscientious objectors probably will not work consistently...

I always thought it would be great to have one set of professors do the teaching, and then a different set come in from other schools just for a couple weeks at the end of the year to give the students a set of intensive written and oral exams that determines a big chunk of their academic standing.

Answer by ErickBall10

I can now get real-time transcripts of my Zoom meetings (via a Python wrapper of the OpenAI API) which makes it much easier to track the important parts of a long conversation. I tend to zone out sometimes and miss little pieces otherwise, as well as forget stuff.

That's fair, most of them were probably never great teachers.

ErickBall8-11

You are attributing a lot more deviousness and strategic boldness to the so-called deep state than the US government is organizationally capable of. The CIA may have tried a few things like this in banana republics but there's just no way anybody could pull it off domestically.

5trevor
This is a good point, that much of the data we have comes from leaked operations in South America (e.g. the Church Hearings), and CIA operations are probably much easier there than on American soil. However, there are also different kinds of systems pointed inward which look more like normal power games e.g. FBI informants, or lobbyists forming complex agreements/arrangements (like how their lawyer counterparts develop clever value-handshake-like agreements/arrangements to settle out-of-court). It shouldn't be surprising that domestic ops are more complicated and look like ordinary domestic power plays (possibly occasionally augmented by advanced technology). The profit motive alone could motivate Microsoft execs to leverage their access to advanced technology to get a better outcome for Microsoft. I was pretty surprised by the possibility that silicon valley VCs alone could potentially set up sophisticated operations e.g. using pre-established connections to journalists to leak false information or access to large tech companies with manipulation capabilities (e.g. Andreessen Horowitz's access to Facebook's manipulation research).

Professors being selected for research is part of it. Another part is the tenure you mentioned - some professors feel like once they have tenure they don't need to pay attention to how well they teach. But I think a big factor is another one you already mentioned: salaries. $150k might sound like a lot to a student, but to the kind of person who can become a math or econ professor at a top research university this is... not tiny but not close to optimal. They are not doing it for the money. They are bought in to a culture where the goal is building status ...

2Seth Herd
That's not luck. Non-research universities do select faculty by teaching skill.
0ChrisRumanov
I'm not fully convinced by the salary argument, especially with quality-of-life adjustment. As an example, let's imagine I'm a skilled post-PhD ML engineer, deciding between two options. First, Jane Street Senior ML Engineer: $700-750k, 50-55hrs/week, medium job security, low autonomy. Second, [Harvard/Yale/MIT] Tenured ML Professor: $200-250k, 40-45hrs/week, ultra-high job security, high autonomy. A quick Google search says that my university grants tenure to about 20 people per year. Especially as many professors have kids, side jobs, etc. it seems unlikely that a top university really can't find 20 good people across all fields who are both good teachers and would take the second option (in fact, I would guess that being a good teacher predisposes you to taking the second option). Is there some part of the tradeoff I'm missing?
6Viliam
I imagine that if they taught well before, they would still teach well by the sheer force of habit. Maybe slightly worse because they no longer bother to do it perfectly, but not "consistently present things in unclear or inconsistent ways".

But that sort of singularity seems unlikely to preserve something as delicately balanced as the way that (relatively well-off) humans get a sense of meaning and purpose from the scarcity of desirable things.

I think our world actually has a great track record of creating artificial scarcity for the sake of creating meaning (in terms of enjoyment, striving to achieve a goal, sense of accomplishment). Maybe "purpose" in the most profound sense is tough to do artificially, but I'm not sure that's something most people feel a whole lot of anyway?

I'm pretty opti...

Excellent, I think I will give something like that a try

I know this is an old thread but I think it's interesting to revisit this comment in light of what happened at Twitter. Musk did, in fact, fire a whole lot of people. And he did, in fact, unban a lot of conservatives without much obvious delay or resistance within the company. I'm not sure how much of an implication that has about your views of the justice department, though. Notably, it was pretty obvious that the decisions at Twitter were being made at the top, and that the people farther down in the org chart had to implement those decisions or be fired...

Thanks! I'd love to hear any details you can think of about what you actually do on a daily basis to maintain mental health (when it's already fairly stable). Personally I don't really have a system for this, and I've been lucky that my bad times are usually not that bad in the scheme of things, and they go away eventually.

2Sable
Great question. You've got the basics - eat right, workout, sleep, etc., but just saying that isn't much help. I've gotten a great deal out of habit chaining/trigger-action planning when used consistently; basically you create chains of actions that feed into one another so once you've started the chain, it takes no extra willpower to just keep following it to its conclusion. For instance: Wake up -> make breakfast -> get pills -> turn on sunlamp -> eat is one, that makes sure I take my medication, eat breakfast, and get some light everyday (the latter is especially important in the winter). Another is: Meditate -> Workout -> Shower which, while I mix up both the kinds of meditation and the kinds of workout, ensures all three get done, roughly every other day. Do it consistently, and eventually you can just do them on autopilot. You don't really forget anything and somehow, not doing them becomes the unnatural state. Hope that helps!

I'm not sure how I would work it out. The problem is that presumably you don't value one group more because they chose blue (it's because they're more altruistic in general) or because they chose red (it's because they're better at game theory or something). The choice is just an indicator of how much value you would put on them if you knew more about them. Since you already know a lot about the distribution of types of people in the world and how much you like them, the Bayesian update doesn't really apply in the same way. It only works on what pill they'...

Doesn't "trembling hand" mean it's a stable equilibrium even if there are?

6Richard_Kennaway
Yes, but if someone accidentally picks blue, that's their own fault. The blue-picker injures only themselves, hence the stability against trembling hands. I would care enough to warn them against doing that, but I'm not going to quixotically join in with that fault, just so that I can die as well.

I mean definitely most people will not use a decision procedure like this one, so a smaller update seems very reasonable. But I suspect this reasoning still has something in common with the source of the intuition a lot of people have for blue, that they don't want to contribute to anybody else dying.

Sure, if you don't mind the blue-choosers dying then use the stable NE.

5Richard_Kennaway
There are no blue-choosers in the stable NE, so no, I don't mind at all about zero people dying.
1Roko
well they literally chose it.... maybe they are suicidal?

People are all over the place but definitely not 50/50. The qualitative solution I have will hold no matter how weak the correlation with other people's choices (for large enough values of N).

If you make the very weak assumption that some nonzero number of participants will choose blue (and you prefer to keep them alive), then this problem becomes much more like a prisoner's dilemma where the maximum payoff can be reached by coordinating to avoid the Nash equilibrium.

6Roko
There is also a moral dimension of not wanting to encourage perverse behaviour. This game has a stable, dominant NE with max reward, just use that.