Why do some societies exhibit more antisocial punishment than others? Martin explores both some literature on the subject and his own experience living in a country where "punishment of cooperators" was fairly common.

Martin Sustrik
Author here. In hindsight, I still feel that the phenomenon is an interesting and potentially important topic to look into. I am not aware of any attempt to replicate or dive deeper, though. As for my attempt to explain the psychology underlying the phenomenon, I am not entirely happy with it. It's based only on introspection and lacks sound game-theoretic backing.

By the way, there's one interesting explanation I've read somewhere in the meantime (unfortunately, I don't remember the source): cooperation may incur different costs on different participants. If you are well-off, putting $100 into a common pool is not a terribly important matter. If others fail to cooperate, all you can lose is $100. If you are just barely getting along, putting $100 into a common pool may threaten you in a serious way. Therefore, the rich will be more likely to cooperate than the poor. Now, if the thing is framed in moral terms (those cooperating are "good", those not cooperating are "bad"), the whole thing may feel like a scam providing the rich with a way to buy moral superiority. As a poor person, you may thus resort to anti-social punishment as a way to punish the scam.
Elizabeth
According to a friend of mine in AI, there's a substantial contingent of people who got into AI safety via Friendship is Optimal. They will only reveal this after several drinks. Until then, they will cite HPMOR. Which means we are probably overvaluing HPMOR and undervaluing FIO.
Linch
Shower thought I had a while ago: everybody loves a meritocracy until people realize that they're the ones without merit. I mean, you never hear someone say things like: (I framed the hypothetical this way because I want to exclude senior people very secure in their position who are performatively pushing for meritocracy by saying poor kids are excluded from corporate law or whatever).

In my opinion, if you are serious about meritocracy, you figure out and promote objective tests of competency that (a) have high test-retest reliability, so you know they're measuring something real, (b) have high predictive validity for the outcome you are interested in getting, and (c) have reasonably high accessibility, so you know you're drawing from a wide pool of talent.

For the selection of government officials, the classic Chinese imperial service exam has high (a), low (b), medium (c). For selecting good actors, "whether your parents are good actors" has maximally high (a), medium-high (b), very low (c). "Whether your startup exited successfully" has low (a), medium-high (b), low (c). The SATs have high (a), medium-low (b), very high (c).

If you're trying to make society more meritocratic, your number 1 priority should be the design and validation of tests of skill that push the Pareto frontier for various aspects of society, and your number 2 priority should be trying to push for greater incorporation of such tests. Given that ~no one really does this, I conclude that very few people are serious about moving towards a meritocracy. (X-posted)
orthonormal
How do you formalize the definition of a decision-theoretically fair problem, even when abstracting away the definition of an agent as well as embedded agency? I've failed to find anything in our literature.

It's simple to define a fair environment, given those abstractions: a function E from an array of actions to an array of payoffs, with no reference to any other details of the non-embedded agents that took those actions and received those payoffs.

However, fair problems are more than just fair environments: we want a definition of a fair problem (and fair agents) under which, among other things:

* The classic Newcomb's Problem against Omega, with certainty or with 1% random noise: fair
* Omega puts $1M in the box iff it predicts that the player consciously endorses one-boxing, regardless of what it predicts the player will actually do (e.g. misunderstand the instructions and take a different action than they endorsed): unfair
* Prisoner's Dilemma between two agents who base their actions on not only each others' predicted actions in the current environment, but also their predicted actions in other defined-as-fair dilemmas: fair
  * For example, PrudentBot will cooperate with you if it deduces that you will cooperate with it and also that you would defect against DefectBot (because it wants to exploit CooperateBots).
* Prisoner's Dilemma between two agents who base their actions on each others' predicted actions in defined-as-unfair dilemmas: unfair
  * Counting this as fair would let us smuggle in unfairness from other dilemmas; e.g. if BlueEyedBot only tries Löbian cooperation against agents with blue eyes, and MetaBlueEyedBot only tries Löbian cooperation against agents that predictably cooperate with BlueEyedBot, then the Prisoner's Dilemma against MetaBlueEyedBot should count as unfair.

Modal combat doesn't need to worry about this, because all the agents in it are fair-by-construction.

Yeah, I know, it's about a decade late to be asking this question.
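As a minimal sketch of the "fair environment" half of the definition (my illustration, not the commenter's formalization; the payoff numbers and type signature are assumptions), the point is just that the environment maps the actions to the payoffs and looks at nothing else about the agents:

```python
from typing import Callable, Sequence, Tuple

# A "fair environment" in the sense above: a function from the array of actions
# to the array of payoffs, with no reference to anything else about the agents.
FairEnvironment = Callable[[Sequence[str]], Tuple[float, ...]]

def prisoners_dilemma(actions: Sequence[str]) -> Tuple[float, ...]:
    """Standard PD payoff table (illustrative numbers); 'C' = cooperate, 'D' = defect."""
    payoffs = {
        ("C", "C"): (3.0, 3.0),
        ("C", "D"): (0.0, 5.0),
        ("D", "C"): (5.0, 0.0),
        ("D", "D"): (1.0, 1.0),
    }
    return payoffs[tuple(actions)]

print(prisoners_dilemma(["C", "D"]))  # (0.0, 5.0)
```

Note this only captures fair environments, which the comment calls the easy part; the hard part it asks about, restricting which predictors of those actions count as fair, isn't expressible in this type signature.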
Sam Marks
I've noticed some common misconceptions that people have about OpenAI's recent paper on reward hacking and chain-of-thought (CoT) monitoring. I'll try to correct some of these below. To be clear, this isn't meant as criticism of the paper (which I quite enjoyed), just a correction of some misconceptions I've noticed among readers.

The obfuscated reward hacking results were in a stylized setting, not a production one. Recall that in the paper there are two experimental settings:

1. Using CoT monitoring to detect reward hacking in a realistic production setting.
2. Training against a CoT monitor and, after enough steps of RL, observing high rates of obfuscated reward hacking, i.e. reward hacking that isn't disclosed in the model's CoT.

Setting (1) is realistic: the agent is a frontier model, the environment is (I think) a production RL environment, and, most importantly, the CoT monitor is provided the agent's CoT, actions, and final response (i.e. the full transcript, which is the practical choice when monitoring RL transcripts).

However, I think some readers don't realize that setting (2) is different in a number of ways:

1. The agent is not a frontier model.
2. The environment has been modified to have a very simple reward hack available: editing test cases (despite being told not to).
3. [Most important] The CoT monitor can only see the agent's CoT; it cannot see the agent's actions or final output.

Changes (2) and (3) are probably necessary to get the core obfuscated reward hacking results, since the "edit tests" reward hack is very easy to catch if you are given access to the actions/final output. (I.e. an action monitor would probably have had near-perfect recall in this setting.)

Why does this matter? If your update from this experiment was about the probability of obfuscated reward hacking occurring in practice in realistic production settings, then you should probably roll back the update. Rather, I think this experiment should be viewed as a nice d
Fabien Roger
I ran quick experiments that make me think that it's somewhat hard for LLMs to learn radically new encodings in an unsupervised way, and thus that LLMs probably won't learn to speak new incomprehensible languages as a consequence of big r1-like RL in the next few years.

The experiments

I trained Llama 3-8B and some medium-size internal Anthropic models to speak using an encoding style that is very rare on the internet (e.g. map each letter to a random name, and join the names) with SFT on the encoded text and without providing translation pairs. I find that the resulting models:

* Have relatively high next-token-prediction losses
* Can't speak well (e.g. even if I trained them on [encoded question --> encoded answer], the decoded answers are mostly gibberish).

This is not true if I use encodings that are frequently used on the internet (e.g. base64, random letter permutations, ...), and this is less true if I add translation pairs to the training mix. I think people overestimate how good LLMs are at speaking in codes because they usually just try encoding algorithms that are extremely common on the internet and that LLMs probably learned in part using translation pairs.

These experiments were done at a relatively small scale (80k sequences of ~50 tokens each, for 40 epochs, for a total of 160M tokens), and I could imagine things being different for large-scale SFT or SFT with better base models. But given that RL doesn't teach models that many new bits (you can distill r1 scratchpads using only a few MB of transcripts), I think this is still informative for large-scale RL.

Context

I ended up giving up on the broader projects as part of which I ran these experiments, but I think they are decently informative about the plausibility of LLMs learning to speak new incomprehensible languages. I did not carefully run any of the experiments above, and I welcome replications.

Related observations from the literature

I think there is also a somewhat common miscon
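For concreteness, here is a sketch of the kind of rare encoding described, map each letter to a fixed name and join the names. This is my reconstruction of the general idea; the actual mapping and training setup aren't given in the comment:

```python
import string

# Fixed letter -> name table (made up here; the comment doesn't give the actual mapping).
NAMES = ["Alice", "Bob", "Carol", "Dave", "Erin", "Frank", "Grace", "Heidi",
         "Ivan", "Judy", "Karl", "Liam", "Mona", "Nina", "Oscar", "Peggy",
         "Quinn", "Rita", "Sam", "Tina", "Uma", "Vera", "Walt", "Xena",
         "Yuri", "Zoe"]
LETTER_TO_NAME = dict(zip(string.ascii_lowercase, NAMES))
NAME_TO_LETTER = {name: letter for letter, name in LETTER_TO_NAME.items()}

def encode(text: str) -> str:
    """Map each letter to its name and join the names (non-letters are dropped)."""
    return " ".join(LETTER_TO_NAME[c] for c in text.lower() if c in LETTER_TO_NAME)

def decode(encoded: str) -> str:
    return "".join(NAME_TO_LETTER[name] for name in encoded.split())

print(encode("hello"))          # Heidi Erin Liam Liam Oscar
print(decode(encode("hello")))  # hello
```

Unlike base64 or letter permutations, text encoded this way is essentially absent from internet training data, which is what makes it a test of learning a genuinely new code without translation pairs.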

Recent Discussion

I've been pushing for latent variables to be added to prediction markets, including by making a demo for how it could work. Roughly speaking, reflective latent variables allow you to specify joint probability distributions over a bunch of observed variables. However, this is a very abstract description that people tend to find quite difficult, which probably helps explain why the proposal hasn't gotten much traction.
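As a toy illustration of that joint-distribution claim (my own sketch with made-up names and numbers, not the linked demo): a single latent variable plus per-question conditional probabilities pins down a full joint distribution over the observed questions, including their correlations.

```python
# One hidden variable Z induces a joint distribution over several observed
# questions, assumed conditionally independent given Z. All names and numbers
# here are made up for illustration.
P_Z = {"world_A": 0.7, "world_B": 0.3}
P_YES_GIVEN_Z = {
    "world_A": {"q1": 0.9, "q2": 0.8},
    "world_B": {"q1": 0.2, "q2": 0.1},
}

def joint(outcomes: dict) -> float:
    """P(each question resolves as in `outcomes`), marginalizing over Z."""
    total = 0.0
    for z, p_z in P_Z.items():
        p = p_z
        for question, resolves_yes in outcomes.items():
            p_yes = P_YES_GIVEN_Z[z][question]
            p *= p_yes if resolves_yes else 1 - p_yes
        total += p
    return total

# The latent variable makes the questions correlated:
print(joint({"q1": True, "q2": True}))   # 0.51
print(joint({"q1": True, "q2": False}))  # 0.18
```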

If you are familiar with finance, another way to think of latent variable markets is that they are sort of like an index fund for prediction markets, allowing you to make overall bets across multiple markets. (Though the way I've set them up in this post differs quite a bit from financial index funds.)

Now I've just had a meeting with some people working...

Your linked paper is kind of long - is there a single part of it that summarizes the scoring so I don't have to read all of it?

Either way, yes, it does seem plausible that one could create a market structure that supports latent variables without rewarding people in the way I described.

Thanks to Jesse Richardson for discussion.

Polymarket asks: will Jesus Christ return in 2025?

In the three days since the market opened, traders have wagered over $100,000 on this question. The market traded as high as 5%, and is now stably trading at 3%. Right now, if you wanted to, you could place a bet that Jesus Christ will not return this year, and earn over $13,000 if you're right.

There are two mysteries here: an easy one, and a harder one.

The easy mystery is: if people are willing to bet $13,000 on "Yes", why isn't anyone taking them up on it?

The answer is that, if you wanted to do that, you'd have to put down over $1 million of your own money, locking it up inside Polymarket through the end of...

Linch
I thought about this a bit more, and I'm worried that this is going to be a long-running problem for the reliability of prediction markets for low-probability events. Most of the problems we currently observe seem like "teething issues" that can be solved with higher liquidity, lower transaction costs, and better design (for example, by having bets denominated in S&P 500 or other stock portfolios rather than $s). But if "Yes" positions in many of those markets should be understood as implicit bets on differing variances of the time value of money in the future, it might be hard to construct a good design that gets around these issues and allows the markets to reflect true probabilities, especially for low-probability events. (I'm optimistic that it's possible, unlike some other issues, but this one seems thornier than most.)
Ben

Wouldn't higher liquidity and lower transaction costs sort this out? Say you have some money tied up in "No, Jesus will not return this year", but you really want to bet on some other thing. If transaction costs were completely zero then, even if you had your entire net worth tied up in "No Jesus" bets, you could still go to a bank, point out that you have this more-or-less guaranteed payout on the Jesus market, and say that you want to borrow against it or sell it to the bank. Then you have money now to spend. This would not in any serious way shift the prices of the "... (read more)

Archimedes
Yep. Meme NFTs are an existence proof of such people. https://en.wikipedia.org/wiki/List_of_most_expensive_non-fungible_tokens
ThomasJ
This is not a great No bet at current odds, even if you are certain the event will not happen. The market resolves Dec 31, which means you have to lock up your cash for about 9 months for about a 3% rate of return. The best CDs are currently paying around 4-4.5% for 6mo-1y terms. So even for people who bought No at 96% it seems like a bad trade, since you're getting less than the effective risk-free rate, and you're not getting compensated for the additional idiosyncratic risk (e.g. Polymarket resolves to Yes because of shenanigans, Polymarket gets hacked, etc.).

you should not reject the 'offer' of a field that yields an 'unfair' amount of grain! - Ultimatum Game (Arbital)

 

In this post, I demonstrate a problem in which there is an agent that outperforms Logical Decision Theory, and show how, for any agent, you can construct a problem and a competing agent that outperforms it. Defining rationality as winning, this means that no agent is rational in every problem.

Symmetrical Ultimatum Game

We consider a slight variation on the ultimatum game to make it completely symmetrical. The symmetrical ultimatum game is a two-player game in which each player says how much money they want. The amount is a positive integer number of dollars. If the sum is ≤ $10, both players get the amount of money they choose. Otherwise, they both...
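A sketch of the payoff rule as described so far (the preview is cut off, so the "otherwise" branch below is an assumption: as in the standard ultimatum setup, both players get nothing when the demands exceed the $10 pot):

```python
def symmetric_ultimatum(demand_a: int, demand_b: int) -> tuple:
    """Each player names a positive integer number of dollars."""
    assert demand_a >= 1 and demand_b >= 1
    if demand_a + demand_b <= 10:
        # Both demands fit within the $10 pot: each player gets what they asked for.
        return demand_a, demand_b
    # Assumed rule for the truncated "otherwise" case: both players get nothing.
    return 0, 0

print(symmetric_ultimatum(5, 5))  # (5, 5)
print(symmetric_ultimatum(9, 5))  # (0, 0) under the assumed rule
```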

Christopher King
Towards the end of the post, in the "No agent is rational in every problem" section, I provided a more general argument. I was assuming LDT would fall under case 1, but if not, then case 2 demonstrates it is irrational.
Davidmanheim
"More rational in a given case" isn't more rational! You might as well say it's more rational to buy a given lottery ticket because it's the winning ticket.
Christopher King
I'm assuming the LDT agent knows what the game is and who their opponent is.

But you really aren't assuming that; you're doing something much stranger.

Either the actual opponent is a rock, in which case it gains nothing from "winning" the game, and there's no such thing as being more or less rational than something without preferences, or the actual opponent is the agent who wrote the number on the rock and put it in front of the agent, in which case the example fails because the game actually started with an agent explicitly trying to manipulate the LDT agent into underperforming.

avturchin
Try to put it into Deep Research with the following prompt: "Rewrite in style of Gwern and Godel combined". 

Thank you for the suggestion.

A while ago I tried using AI to suggest writing improvements on a different topic, and I didn't really like any of the suggestions. It felt like the AI didn't understand what I was trying to say. Maybe the topic was too different from its training data.

But maybe it doesn't hurt to try again; I hear the newer AIs are smarter.

If I keep procrastinating maybe AI capabilities will get so good they actually can do it for me :/

Just kidding. I hope.

confidence level: I am a physicist, not a biologist, so don't take this as the account of a domain-level expert. But this is really basic stuff, and is very easy to verify.

Edit: I have added a few revisions and included a fact check of this post by an organic chemist. You can also read the comments on the EA forum to see Yudkowsky's response. 

Recently I encountered a scientific claim about biology, made by Eliezer Yudkowsky. I searched around for the source of the claim, and found that he has been repeating versions of the claim for over a decade and a half, including in “the sequences” and his TED talk. In recent years, this claim has primarily been used as an argument for why an AGI attack...

This post is a stronger argument against Drexlerian nanomachines that outperform biology in general, one which doesn't rely on the straw man.


LessWrong has been receiving an increasing number of posts and comments that look like they might be LLM-written or partially-LLM-written, so we're adopting a policy. This could be changed based on feedback.

Humans Using AI as Writing or Research Assistants

Prompting a language model to write an essay and copy-pasting the result will not typically meet LessWrong's standards. Please do not submit unedited or lightly-edited LLM content. You can use AI as a writing or research assistant when writing content for LessWrong, but you must have added significant value beyond what the AI produced, the result must meet a high quality standard, and you must vouch for everything in the result.

A rough guideline is that if you are using AI for writing assistance, you should spend a minimum of...

So, I've got a question about the policy. My brain is just kind of weird, so I really appreciate being able to have Claude translate my thoughts into normal speak.

The case study is the following comments in the same comment section:

13 upvotes - written with the help of Claude

1 upvote (me) - written with the help of my brain only

I'm honestly quite tightly coupled to Claude at this point; it's around 40-50% of my thinking process (which is kind of weird when I think about it?), so I don't know how to think about this policy change.

habryka
We get easily like 4-5 LLM-written post submissions a day these days. They are very evidently much worse than the non-LLM written submissions. We sometimes fail to catch one, and then people complain: https://www.lesswrong.com/posts/PHJ5NGKQwmAPEioZB/the-unearned-privilege-we-rarely-discuss-cognitive?commentId=tnFoenHqjGQw28FdY 
Chris_Leong
Yeah, but how do you know that no one managed to sneak one past both you and the commenters? Also, there's an art to this.
Chris_Leong
Also, I did not realise that collapsible sections were a thing on LessWrong. They seem really useful. I would like to see these promoted more.
Cole Wyeth
This is called a Hurwicz decision rule / criterion (your t is usually alpha). I think the content of this argument is not that maximin is fundamental, but rather that simplicity priors "look like" or justify Hurwicz-like decision rules. Simple versions of this are easy to prove but (as far as I know) do not appear in the literature.

Thanks for this!

What I was saying up there is not a justification of Hurwicz' decision rule. Rather, it is that if you already accept the Hurwicz rule, it can be reduced to maximin, and for a simplicity prior the reduction is "cheap" (produces another simplicity prior).
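For readers unfamiliar with the rule, a quick sketch of the Hurwicz criterion itself (illustrative payoffs; this is only the rule, not the reduction to maximin or the ambidistribution machinery discussed here):

```python
def hurwicz_score(payoffs, alpha):
    """Hurwicz criterion: alpha weights the best case, (1 - alpha) the worst.
    alpha = 0 recovers pure maximin; alpha = 1 is pure maximax."""
    return alpha * max(payoffs) + (1 - alpha) * min(payoffs)

# Pick the action with the highest Hurwicz score (illustrative numbers).
actions = {"risky": [0.0, 10.0], "safe": [4.0, 5.0]}
alpha = 0.3
best = max(actions, key=lambda a: hurwicz_score(actions[a], alpha))
print(best)  # "safe": 0.3*5 + 0.7*4 = 4.3 beats 0.3*10 + 0.7*0 = 3.0
```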

Why accept the Hurwicz' decision rule? Well, at least you can't be accused of a pessimism bias there. But if you truly want to dig deeper, we can start instead from an agent making decisions according to an ambidistribution, which is a fairly general (assumption-light) way of making decision... (read more)