As is always the case, this person changed their mind because they were made to feel valued. The community treated what they'd done with respect (even though, fundamentally, they were unsuccessful and the actual release of the model would have had no impact on the world), and as a result they capitulated.
While I agree that this is an important factor when modelling people’s decision-making, I think there is some straightforward evidence that this was not the primary factor here.
Firstly, after the person spent an hour talking to friendly and helpful people from the high-status company, they did not change their decision, which is evidence against the most parsimonious status-based motives. (Relatedly, the author did not promise to read feedback from a small set of people, but from literally 100% of respondents, which is over and above what would be useful for getting the attention of key people.)
And secondly, which is more persuasive for me though harder to communicate, I read the extensive reasons they gave for their decisions, which seemed clear and well-reasoned, and the reasons against were important factors that are genuinely nuanced and hard to notice. It seemed to me more a situation where someone actually improves their understanding of the world than one in which they were waiting for certain high-status-to-them people to give them attention. My sense is that writing which explains someone’s decisions but is wholly motivated by status makes less sense than these two posts did.
You might still be right and I might have missed something, or just not have a cynical enough prior. Though I do believe people do sometimes change their actions due to good reasoning about the world and not solely due to immediate status considerations, and I feel very skeptical of any lens on the world that can’t (“As is always the case”) register a positive result on the question “Did this person make their decision due to updating their world model rather than short-sighted status-grabbing?”.
Am interested to hear further thoughts of yours on the broader topic of modelling people’s decision making as primarily status based, if you have more things to add to the discussion.
The phenomenon I was pointing out wasn't exactly that the person's decision was made because of status. It was that a prerequisite for them changing their mind was that they were taken seriously and engaged with respectfully. That said, I do think that it's interesting to understand the way status plays into these events.
First, they started the essay with a personality-focused explanation:
To explain how this all happened, and what we can learn from it, I think it’s important to learn a little bit more about my personality and with what kind of attitude and world model I came into this situation.
and
I have a depressive/paranoid streak, and tend to assume the worst until proven otherwise. At the time I made my first twitter post, it seemed completely plausible in my mind that no one, OpenAI or otherwise, would care or even notice me. Or, even worse, that they would antagonize me.
The narrative that the author themselves is setting up is that they had irrational or emotional reasons for behaving the way they did, then considered it longer and changed their mind. They also specifically call out their perceived lack of status as an influencing factor.
If someone has an irrational, status-focused explanation for their own initial reasoning, and then we see high-status people providing them extensive validation, it doesn't mean that they changed their mind because of the high-status people, but it's suggestive. My real model is that they took those ideas extra seriously because the people were nice and high status.
Imagine a counterfactual world where they posted their model, and all of the responses they received made the same logical argument, but on 4Chan, starting with "hey fuckhead, what are you trying to do, destroy the world?" My priors suggest that this person would have, out of spite, gone ahead and released the model.
The gesture they are making here, not releasing the model, IS purely symbolic. We know the model is not as good as mini-GPT2. Nonetheless, it may be useful to real hackers who aren't being supported by large corporate interests, either for learning or just for understanding ML better. Since releasing the model is not a bona fide risk, part of not releasing it is so they can feel like they are part of history. Note the end where they talk about the precedent they are setting now by not releasing it.
I think the fact that the model doesn't actually work is an important aspect of this. Many hackers would have done it as a cool project and released it without pomp, but this person put together a long essay, explicitly touting the importance of what they'd done and the impact it would have on history. Then, it turned out the model did not work, which must have been very embarrassing. It is fairly reasonable to suggest that the person then took the action that made them feel the best about their legacy and status: writing an essay about why they were not releasing the model for good, rationalist-approved reasons. It is not even necessarily the case that the person is aware that this is influencing the decision; this is fully an Elephant in the Brain situation.
When I read that essay, at least half of it is heavily-laden with status concerns and psychological motivations. But, to reiterate: though pro-social community norms left this person open to having their mind changed by argument, probably the arguments still had to be made.
How you feel about this should probably turn on questions like "Who has the status in this community to have their arguments taken seriously? Do I agree with them?" and "Is it good for only well-funded entities to have access to current state-of-the-art ML models?"
I agree with a lot of the claims in your comment, and I think it's valuable to think through how status plays a role in many situations, including this one.
There is an approach in your comments toward explaining someone's behaviour that I disagree with, though it may just be a question of emphasis. A few examples:
My real model is that they took those ideas extra seriously because the people were nice and high status.
...a prerequisite for them changing their mind was that they were taken seriously and engaged with respectfully
These seem to me definitely true and simultaneously not that important*.
When I read that essay, at least half of it is heavily-laden with status concerns and psychological motivations. But, to reiterate: though pro-social community norms left this person open to having their mind changed by argument, probably the arguments still had to be made. (emphasis added)
The word 'probably' in that sentence feels false to me. It feels somewhat analogous to hearing someone argue that a successful tech startup is 100s of people working together in a company, and that basically running a tech startup is about status and incentives, though "probably code still had to be written" to make it successful. They're both necessary.
More generally, there are two types of games going on. One we're allowed to talk about, and one we're not, or at least not very directly. And we have to coordinate on both levels to succeed. This generally warps how our words relate to reality, because we're also using those words to do things we're pretending to ourselves we're not doing, to let everyone express their preferences and coordinate in the silent games. These silent games have real and crucial implications for how well we can coordinate and where resources must be spent. But once you realise the silent games are being played, it isn't the right move to say that the silent games are the only games, or always the primary games.
I think the fact that the model doesn't actually work is an important aspect of this. Many hackers would have done it as a cool project and released it without pomp, but this person put together a long essay, explicitly touting the importance of what they'd done and the impact it would have on history. Then, it turned out the model did not work, which must have been very embarrassing. It is fairly reasonable to suggest that the person then took the action that made them feel the best about their legacy and status: writing an essay about why they were not releasing the model for good, rationalist-approved reasons. It is not even necessarily the case that the person is aware that this is influencing the decision; this is fully an Elephant in the Brain situation.
Again, I agree that something in this reference class is likely happening. But, for example, the long essay was not only about increasing the perceived importance of the action. It was also a strongly pro-social and cooperative move to the broader AI community to allow counterarguments to be presented, which is what successfully happened. There are multiple motives here, and (I think) it's the case that the motive you point to was not the main one, even while it is a silent motive folks systematically avoid discussing.
--
*Actually I think that Connor in particular would've engaged with arguments even if they'd not been delivered respectfully, given that he responded substantively to many comments on Twitter/HackerNews/Medium, some of which were predominantly snark.
When Robin Hanson is interviewed about The Elephant in the Brain, he is often asked "Are you saying that status accounts for all of our behaviour?". His reply is that he and Kevin Simler aren't arguing that the hidden motives are the only motives, but that they're a far more common motive than we give credit for in our normal discourse. Here's an example of him saying this kind of thing on the 80k podcast:
As we just said the example that, in education, your motive isn’t to learn the material, or when you go to the doctor, your motive isn’t to get well primarily, and the hidden motives are the actual motive. Now, how could I know what the hidden motives are, you might ask? The plan here, that’s where the book is … In each area, we identify the usual story, then we collect a set of puzzles that don’t make sense from the point of view of the usual story, strange empirical patterns, and then we offer an alternative motive that makes a lot more sense of those empirical patterns, and then we suggest that that is a stronger motive than the one we usually say.
Now, just to be clear, almost every area of human life is complicated, and there’s a lot of people with a lot of different details and so, of course, almost every possible motive shows up in almost every area of human life, so we can’t be talking about the only motive, and so the usual motive does actually apply sometimes. Actually, you could think of the analogy to the excuse that the dog ate my homework. It only works because sometimes dogs eat homework. We don’t say the dragon ate my homework. That wouldn’t fly, so the usual story is part of the story. It’s just a smaller part than we like to admit, and what we’re going to call the hidden motive, the real motive is a bigger part of the story, but it’s still not the only part.
it turned out the model did not work... It is fairly reasonable to suggest that the person then took the action that made them feel the best about their legacy and status
Reading this I realise I developed most of my attitudes toward the topic when I believed that the copy was full-strength, and only in writing the post did I find out that it wasn't - in fact it seems that it was weaker than the initial 117M version OpenAI released. You're right that this makes the 'release' option less exciting from the perspective of one's personal status; the status lens would then predict taking whichever alternative action gives more personal status, and this is arguably one of those actions.
Just now I found this comment in the medium comment section, where Connor agrees with you about it being symbolic, and mentions how this affected his thinking.
...I did admit failure as I linked to said failure in the very first paragraph, and I have no intentions of hiding that. In fact, after learning of my failure I was convinced I might as well release, since most safety issues were no longer a threat anyways (though there remains the possibility it could be used as a “warm start” to train a better model). So if anything, my failure encouraged me to dump it, apologize and let history take its course.
My decision not to release is mostly symbolic. I’m doing it to signal good faith cooperation. Even if I failed today, some day someone will succeed, and we should have a default of cooperation before that.
(Meta: Wow, Medium requires you to click twice to go down one step in a comment thread! Turns out there are like 20 comments on the OP.)
Yeah, this is quite important: the attempted copy was weaker than the nerfed model OpenAI initially released. Thanks for emphasising this, 9eB1, I've updated my post in a few places accordingly.
The phenomenon I was pointing out wasn't exactly that the person's decision was made because of status. It was that a prerequisite for them changing their mind was that they were taken seriously and engaged with respectfully.
Yeah, respectful and serious engagement with people’s ideas, even when you’re on the opposite sides of policy/norm disputes, is very important.
On reading that I was genuinely delighted to see such pro-social and cooperative behaviour from the person who believed OpenAI was wrong.
I think the pro-social and cooperative thing to do was to email OpenAI privately rather than issuing a public ultimatum.
I’m imagining here something like a policy of emailing OpenAI and telling them your plan and offering them as much time to talk as possible, and saying that in a week you’ll publicly publish your reasoning too so that other people can respond + potentially change your mind. I also think it would’ve been quite reasonable to not expect any response from a big organisation like OpenAI, and to be doing it only out of courtesy.
It seems from above that talking to OpenAI didn’t change Connor’s mind, and that public discourse was very useful. I expect Buck would not have talked to him if he hadn’t done this publicly (I will ask Buck when I see him) (Added: Buck says this is true). Given the OP I don’t think it would’ve been able to resolve privately, and I think I am quite actively happy that it has resolved the way it has: Someone publicly deciding to not unilaterally break an important new norm, even while they strongly believe this particular application of the norm is redundant/unhelpful.
I’d be interested to know if you think it would’ve been perfectly pro-social to give OpenAI a week’s heads-up and then write your reasoning publicly and read everyone else’s critiques (100% of random people from Hacker News and Twitter, and longer chats with Buck). I have a sense that you wouldn’t, but I’m not fully sure why.
I also think it would’ve been quite reasonable to not expect any response from a big organisation like OpenAI, and to be doing it only out of courtesy.
Yeah, that seems reasonable, but it doesn't seem like you could reasonably have 99% confidence in this.
It seems from above that talking to OpenAI didn’t change Connor’s mind, and that public discourse was very useful. I expect Buck would not have talked to him if he hadn’t done this publicly (I will ask Buck when I see him).
I agree with this, but it's ex-post reasoning; I don't think this was predictable with enough certainty ex-ante.
Given the OP I don’t think it would’ve been able to resolve privately, but if it had I think I’d be less happy than with what actually happened, which is someone publicly deciding to not unilaterally break an important new norm, even while they strongly believe this particular application of the norm is redundant/unhelpful.
It's always possible to publicly post after you've come to the decision privately. (Also, I'm really only talking about what should have been done ex-ante, not ex-post.)
I’d be interested to know if you think it would’ve been perfectly pro-social to give OpenAI a week’s heads-up and then write your reasoning publicly and read everyone else’s critiques (100% of random people from Hacker News and Twitter, and longer chats with Buck). I have a sense that you wouldn’t, but I’m not fully sure why.
That seems fine, and very close to what I would have gone with myself. Maybe I would have first emailed OpenAI, and if I hadn't gotten a response in 2-3 days, then said I would make it public if I didn't hear back in another 2-3 days. (This is all assuming I don't know anyone at OpenAI, to put myself in the author's position.)
As I mentioned above, it's always possible to publicly post after you've come to the decision privately.
If people choose whether to identify with you at your first public statement, switching tribes after that can carry along lurkers.
Agreed that this is a benefit of what actually happened, but I want to note that if you're banking on this ex ante, you're deciding not to cooperate with a group X because you want to publicly signal allegiance to group Y with the expectation that you will then switch to group X and take along some people from group Y.
This is deceptive, and it harms our ability to cooperate. It seems pretty obvious to me that we should not do that under normal circumstances.
(I really do only want to talk about what should be done ex ante, that seems like the only decision-relevant thing here.)
I was coming up with reasons that a nearsighted consequentialist (aka not worried about being manipulative) might use. That said, getting lurkers to identify with you, then gathering evidence that will sway you, and them, one way or the other, is a force multiplier on an asymmetric weapon pointed towards truth. You need only see the possibility of switching sides to use this. He was open about being open to being convinced. It's like preregistering a study.
You're right, it's too harsh to claim that this is deceptive. That does seem more reasonable. I still think it isn't worth it given the harm to your ability to coordinate.
I was coming up with reasons that a nearsighted consequentialist (aka not worried about being manipulative) might use.
Sorry, I thought you were defending the decision. I'm currently only interested in decision-relevant aspects of this, which as far as I can tell means "how the decision should be made ex-ante", so I'm not going to speculate on nearsighted-consequentialist-reasons.
Given that status seems to have been coupled to reproductive success for a very long time, it should not be surprising that evolution wired up humans to be status seekers. This wasn't recognized back in the mid-90s, and I got in considerable trouble for claiming this to be a common motive.
This is a linkpost for some interesting discussions of info security norms in AI. I threw the post below together in 2 hours, just to have a bunch of quotes and links for people, and to have the context in one place for a discussion here on LW (it makes it easier to build common knowledge of what the commenters have and haven't seen). I didn't want to assume people on LW follow any news, so for folks who've read a lot about GPT-2, much of the post is skimmable.
Background on GPT-2
In February, OpenAI wrote a blogpost announcing GPT-2:
This has been a very important release, not least due to it allowing fans to try (and fail) to write better endings to Game of Thrones. Gwern used GPT-2 to write poetry and anime. There have been many Medium posts on GPT-2, some very popular, and at least one Medium post on GPT-2 written by GPT-2. There is a subreddit where all users are copies of GPT-2, and they imitate other subreddits. It got too meta when the subreddit imitated another subreddit about people play-acting robots-pretending-to-be-humans. Stephen Woods has lots of examples, including food recipes.
Here in our rationality community, we created a user called GPT-2, trained on the entire corpus of LessWrong comments and posts, and released it onto the comment section on April 1st (a user whom we warned and then banned). And Nostalgebraist created a tumblr trained on the entire writings of Eliezer Yudkowsky (Sequences + HPMOR), picking their favourites to include on the Tumblr.
There was also very interesting analysis on LessWrong and throughout the community. The post that made me think most on this subject is Sarah Constantin's Humans Who Are Not Concentrating Are Not General Intelligences. Also see SlateStarCodex's Do Neural Nets Dream of Electric Hobbits? and GPT-2 As Step Toward General Intelligence, plus my teammate jimrandomh's Two Small Experiments on GPT-2.
However, these were all using a nerfed version of GPT-2, which only had 117 million parameters, rather than the fully trained model with 1.5 billion parameters. (If you want to see examples of the full model, see the initial announcement post for examples with unicorns and more.)
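(For concreteness, here is a minimal sketch, not from OpenAI's post, of how one might sample from the small public GPT-2 checkpoint using the Hugging Face transformers library. The checkpoint name "gpt2", the prompt, and the sampling parameters are my own illustrative assumptions.)

```python
# Minimal sketch: sampling from the small public GPT-2 checkpoint via the
# Hugging Face `transformers` library. Checkpoint name, prompt, and sampling
# parameters are illustrative assumptions, not anything from OpenAI's post.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # the small public checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "In a shocking finding, scientists discovered a herd of unicorns"
inputs = tokenizer(prompt, return_tensors="pt")

# Top-k sampling, roughly in the spirit of the published samples.
output = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,
    top_k=40,
    pad_token_id=tokenizer.eos_token_id,  # avoids the missing-pad-token warning
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```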
Reasoning for only releasing a nerfed GPT-2 and response
OpenAI writes:
While the post includes some discussion of how specifically GPT-2 could be used maliciously (e.g. automating false clickbait news, automated spam, fake accounts), the key line is here.
Is this out of character for OpenAI - a surprise decision? Not really.
Public response to decision
There has been discussion in the news and on Twitter; see here for an overview of what some people in the field/industry have said, and what the news media has written. The main response that's been selected for by news and Twitter is that OpenAI did this primarily as a publicity stunt.
For a source with a different bias from the news and Twitter (which select heavily for anger and the calling out of norm violations), I've searched through all Medium articles on GPT-2 and copied here any 'most highlighted comments'. Most posts actually didn't have any, which I think means they haven't had many viewers. Here are the three I found, in chronological order.
OpenAI's GPT-2: The Model, The Hype, and the Controversy
OpenAI GPT-2: Understanding Language Generation through Visualization
GPT-2, Counting Consciousness and the Curious Hacker
I wrote this linkpost to discuss the last one. See below.
Can someone else just build another GPT-2 and release the full 1.5B parameter model?
From the initial OpenAI announcement:
Since the release, one researcher has tried to reproduce and publish OpenAI's result. Google has a program called TensorFlow Research Cloud that gives loads of free compute to researchers affiliated with various universities, which let someone train an attempted copy of GPT-2 with 1.5 billion parameters. They say:
That said, it turned out that the copy did not match up in skill level, and was weaker even than the nerfed model OpenAI released. The person who built it says (1) they think they know how to fix it and (2) releasing it as-is may still be a helpful "shortcut" for others interested in building a GPT-2-level system; I don't have the technical knowledge to assess these claims, and am interested to hear from others who do.
During the period when people didn't know that the attempted copy was not successful, the person who made the copy wrote a long and interesting post explaining their decision to release the copy (with multiple links to LW posts). It discussed reasons why this specific technology might cause us to grapple better with the misinformation we encounter on the internet. The author is someone who had a strong object-level disagreement with the policy people at OpenAI, and had thought pretty carefully about it. However, it opened thus:
And they later said
On reading the initial post, I was genuinely delighted to see such pro-social and cooperative behaviour from the person who believed OpenAI was wrong. They considered unilaterally overturning OpenAI's decision but instead chose to spend 11,000 words explaining their views and a month reading others' comments and talking to people. This, I thought, is how one avoids falling prey to Bostrom's unilateralist's curse.
Their next post, The Hacker Learns to Trust, was released 6 days later; in it they decided not to release the model. Note that they did not substantially change their opinions on the object-level decision.
They then talked with Buck from MIRI (author of this great post). Talking with Buck led them to their new view.
The person also came to believe that the AI (and AI safety) community was much more helpful and cooperative than they'd expected.
Overall, the copy turned out not to be strong enough to change malicious actors' ability to automate spam/clickbait, but I am pretty happy with the public dialogue and process that occurred. It was a process whereby, in a genuinely dangerous situation, the AI world would not fall prey to Bostrom's unilateralist's curse. It's encouraging to see that process starting to happen in the field of ML.
I'm interested to know if anyone has any different takes, info to add, or broader thoughts on information-security norms.
Edited: Thanks to 9eB1 for pointing out how nerfed the copy was; I've edited the post to reflect that.