I think you're misrepresenting Gwern's argument. He's arguing that terrorists are not optimizing for killing the most people. He makes no claims about whether terrorists are scientifically incompetent.
Thanks! I like your title more :)
It seems helpful to me if policy discussions can include phrases like "the evidence suggests that if the current ML systems were trying to deceive us, we wouldn't be able to change them not to".
I take this as evidence that TurnTrout's fears about this paper are well-grounded. This claim is not meaningfully supported by the paper, but I expect many people to repeat it as if it is supported by the paper.
We ended up talking about this in DMs, but the gist of it is:
Back in June Hoagy opened a thread in our "community research projects" channel and the work migrated there. Three of the five authors of the [eventual paper](https://arxiv.org/abs/2309.08600) chose to have EleutherAI affiliation (for any work we organize with volunteers, we tell them they're welcome to use an EleutherAI affiliation on the paper if they like) and we now have an entire channel dedicated to future work. I believe Hoagy has two separate paper ideas currently in the works and over a half dozen people working on them.
Oops. It appears that I deleted my comment (deeming it largely off-topic) right as you were replying. I'll reproduce the comment below, and then reply to your question.
...I separately had a very weird experience with them on the Long Term Future Fund where Conor Leahy applied for funding for Eleuther AI. We told him we didn't want to fund Eleuther AI since it sure mostly seemed like capabilities-research but we would be pretty interested in funding AI Alignment research by some of the same people. He then confusingly went around to a lot of people around
I agree that a control group is vital for good science. Nonetheless, I think that such an experiment is valuable and informative, even if it doesn't meet the high standards required by many professional science disciplines. I believe in the necessity of acting under uncertainty. Even with its flaws, this study is sufficient evidence for us to want to enact temporary regulation at the same time as we work to provide more robust evaluations.
But... this study doesn't provide evidence that LLMs increase bioweapon risk.
It doesn't let the government institute prior restraint on speech.
So far, I'm confident that our proposals will not impede the vast majority of AI developers, but if we end up receiving feedback that this isn't true, we'll either rethink our proposals or remove this claim from our advocacy efforts.
It seems to me like you've received this feedback already in this very thread. The fact that you're going to edit the claim to basically say "this doesn't affect most people because most people don't work on LLMs" completely dodges the actual issue here, which is that there's a large non-profit and independent open source LL...
Nora didn't say that this proposal is harmful. Nora said that if Zach's explanation for the disconnect between their rhetoric and their stated policy goals is correct (namely that they don't really know what they're talking about) then their existence is likely net-harmful.
That said, yes, requiring everyone who wants to finetune LLaMA 2 to get a license would be absurd and harmful. la3orn and gallabyres articulate some reasons why in this thread.
Another reason is that it's impossible to enforce, and passing laws or regulations and then not enforcing them is re...
Also, such a regulation seems like it would be illegal in the US. While the government does have wide latitude to regulate commercial activities that impact multiple states, this is rather specifically a proposal that would regulate all activity (even models that never get released!). I'm unaware of any precedent for such an action; can you name one?
Drug regulation, weapons regulation, etc.
As far as I can tell, the commerce clause lets basically everything through.
CAIP is also advised by experts from other organizations and is supported by many volunteers.
Who are the experts that advise you? Are claims like "our proposals will not impede the vast majority of AI developers" vetted by the developers you're looking to avoid impacting?
It’s always interesting to see who has legitimacy in the eyes of mainstream media. The “other companies” mentioned are EleutherAI and Open Future, both of whom co-authored the letter, and LAION who signed it. All three orgs are major players in the open source AI space, and EAI & LAION are arguably bigger than GitHub and CC given that this is specifically about the impact of the EU AI Act on open source large scale AI R&D. Of course, MSN’s target audience hasn’t heard of EleutherAI or LAION.
Note that other orgs have also done blog posts on this top...
It's extremely difficult to create a fraudulent company and get it listed on the NYSE. Additionally, the Exchange can and does stop trading on both individual stocks and the exchange as a whole, though due to the downstream effects on consumer confidence this is only done rarely.
I don't know what lessons one should learn from the stock market regarding MM, but I don't think we should rush to conclude MM shouldn't intervene or shouldn't be blamed for not intervening.
I don’t understand the community obsession with Tao and recruiting him to work on alignment. This is a thing I hear about multiple times a year with no explanation of why it would be desirable other than “he’s famous for being very smart.”
I also don’t see why you’d think there’d be an opportunity to do this… it’s an online event, which heavily limits the ability to corner him in the hallway. It’s not even clear to me that you’d have an opportunity to speak with him… he’s moderating several discussions and panels, but any submitted questions to said events would go to the people actually in the discussions, not the moderator.
Can you elaborate on what you’re actually thinking this would look like?
Red teaming has always been a legitimate academic thing? I don’t know what background you’re coming from but… you’re very far off.
But yes, the event organizers will be writing a paper about it and publishing the data (after it’s been anonymized).
What deployed LLM system does Tesla make that you think should be evaluated alongside ChatGPT, Bard, etc?
Hi, I’m helping support the event. I think that some mistranslation happened by a non-AI person. The event is about having humans get together and do prompt hacking and similar on a variety of models side-by-side. ScaleAI built the app that’s orchestrating the routing of info, model querying, and human interaction. Scale’s platform isn’t doing the evaluation itself. That’s being done by users on-site and then by ML and security researchers analyzing the data after the fact.
I think there's a mistake here which kind of invalidates the whole post. Ice cream is exactly the kind of thing we’ve been trained to like. Liking ice cream is very much the correct response.
Everything outside the training distribution has some value assigned to it. Merely the fact that we like ice cream isn’t evidence that something’s gone wrong.
I agree completely. This is a plausible explanation, but it’s one of many plausible explanations and should not be put forward as a fact without evidence. Unfortunately, said evidence is impossible to obtain due to OpenAI’s policies regarding access to their models. When powerful RLHF models begin to be openly released, people can start testing theories like this meaningfully.
Linear warm-up over the first 10% of training, then cosine decay to a minimum of one-tenth the peak LR which is set to occur at the end of training (300B tokens). Peak LRs vary by model but are roughly consistent with GPT-3 and OPT values. You can find all the config details on GitHub. The main divergence relevant to this conversation from mainstream approaches is that we use a constant batch size (2M) throughout scaling. Prior work uses batch sizes up to 10x smaller for the smallest models, but we find that we can train large batch small models without an...
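A minimal sketch of the schedule described above (a hypothetical reimplementation; parameter names are mine, and the real values live in the configs on GitHub):

```python
import math

def lr_schedule(step, total_steps, peak_lr, warmup_frac=0.1, min_ratio=0.1):
    """Linear warm-up over the first `warmup_frac` of training, then
    cosine decay to `min_ratio * peak_lr` at the final step."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))
    return peak_lr * (min_ratio + (1 - min_ratio) * cosine)
```

At the last training step this lands within rounding error of one-tenth the peak LR, matching the description above.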
This is really exciting work to see, and exactly the kind of thing I was hoping people would do when designing the Pythia model suite. It looks like you're experimenting with the 5 smallest models, but haven't done analysis on the 2.8B, 6.9B, or 12B models. Is that something you're planning on adding, or no?
I am really very surprised that the distributions don't seem to match any standard parameterized distribution. I was fully ready to say "okay, let's retrain some of the smaller Pythia models initialized using the distribution you think the weights come ...
This is excellent work, though I want to generically recommend caution when making assumptions about the success of such attacks based only on blackbox evaluations. Thorough analysis of false positive and false negative rates with ground-truth access (ideally in an adversarially developed setting) is essential for validation. [Sidebar: this reminds me that I really need to write up my analysis in the EleutherAI discord showing why prompt extraction attacks can be untrustworthy]
That said, this is really excellent work and I agree it looks quite promising.
Do you have a reference to the work you’re talking about? I’m doing some stuff involving fitting curves to activation tails currently.
This is very interesting. The OP doesn’t contain any specific evidence of Gaussianness, so it would be helpful if they could provide an elaboration of what evidence lead them to conclude these are Gaussian.
I’m not sure when you developed this work, but the LLM.int8 paper identifies outliers as an essential factor in achieving performance for models larger than 2.7B parameters (see Fig. 1 and Fig. 3 especially). There’s also some follow-up work here and here. Very curiously, the GLM-130B paper reports that they don’t see outlier features at all, nor the negative effects associated with them.
I’ve spoken with Tim (LLM.int8 lead author) about this a bit and some people in EleutherAI, and I’m wondering if there’s some kind of explicit or implicit regularizing e...
I think that the answer is no, and that this reflects a common mental barrier when dealing with gradient descent. You would like different experts to specialize in different things in a human-interpretable way, but Adam doesn’t care what you say you want. Adam only cares about what you actually write down in the loss function.
Generally, a useful check when dealing with lines of thought like this is to ask yourself whether your justification for why something should happen would also justify something that is known to not happen. If so, it’s probably f...
What sources do you have for your claim that “large groups” of people believe this?
Hi! I recently trained a suite of models ranging from 19M to 13B parameters with the goal of promoting research on LLM interpretability. I think it would be awesome to try out these experiments on the model suite and look at how the results change as the models scale. If your code used the HF transformers library it should work more or less out of the box with my new model suite.
You can find out more here: https://twitter.com/AiEleuther/status/1603755161893085184?s=20&t=6xkBsYckPcNZEYG8cDD6Ag
Individual MMLU tasks are extremely noisy. They’re so noisy that the paper actually specifically recommends that you don’t draw conclusions from performance on individual tasks and instead look at four high-level topical categories. The individual tasks also vary enormously in difficulty: some are pretty easy for a college-educated adult, while others have genuine experts scoring less than 80%.
This is compounded by the fact that the sample sizes vary wildly. While many of the tasks have around 100 questions, at the other e...
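For a sense of scale on the noise claim, a quick binomial standard-error calculation (illustrative numbers of mine, not figures from the paper):

```python
import math

# Standard error of an accuracy estimate from n independent questions.
def accuracy_se(p, n):
    return math.sqrt(p * (1 - p) / n)

# With ~100 questions and a true accuracy near 50%, the measured score
# has a standard error of ~5 points, so two tasks can appear to differ
# by ~10 points from sampling noise alone.
se = accuracy_se(0.5, 100)  # → 0.05
```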
I agree with what Gwern said about things being behind-the-scenes, but it's also worth noting that there are many impactful consumer technologies that use DL. In fact, some of the things that you don't think exist actually do exist!
Interesting. Thank you.
To be clear, you now understand that the content of the sentence "I am a transgender man" is more or less "contrary to popular opinion, I am in fact a man and not a woman"? And that pronouns only even come up because they are one of the many ways people convey assessments of gender?
I'm not even going to pretend to address the first half of your comment. You're making extreme jumps of logic that are in no way justified by the conversation.
...So that is the strong-request/demand that it's reasonable for people to get from "society". (If people in power were unambiguously saying "In order to be polite and not be called bad, you must think of these people in a certain way", then I think there would be revolts.) If someone hasn't become emotionally close friends with any trans people, I'd say it's not too surprising if they haven
You're talking as though there is some background you don't share with me, so I shall establish that background.
I tried googling "fired for not using pronouns", and the results page had news articles pointing to several different cases of that—usually teachers—as well as this page, seemingly written by lawyers, titled "What Can Employers Do About Employees Who Refuse To Refer To Transgendered Employees By Their Preferred Names Or Pronouns?".
The page basically recommends firing them; it says "Even if the employee has “for cause” protection through an employ...
What does the word "man" mean in the sentence "contrary to popular opinion, I am in fact a man and not a woman"? Given that popular opinion is, in fact, wrong about this, we should be able to describe some observation or experimental test where the man makes better predictions than the populace, right? What is it, specifically? (I think there are real answers to this, but I'm interested in what you think.)
...As a cis person who has interacted occasionally with trans people for the past ten years, it literally never occurred to me until last year that what trans people were asking me to do was actually reconsider my impression of their gender! I sincerely thought they were just asking me to memorize a different word to call them. I will at least try out a "reconsidering" process the next time I regularly interact with a trans person IRL and see whether it works. (I have also never read about what kind of "reconsidering" processes work for people, but I have som
Basically, my experience went like this:
Not OP, but for what it's worth, I consider it unreasonable to request that other people think of you in a certain way (be it gender, or having personal traits or skills or anything), or at least for there to be any sense of expectation or obligation that they will fulfill such a request. That would be actual thought-policing, and abhorrent to me. It's reasonable to want people to think of you a certain way, to hope that they will, to take actions that will hopefully increase the likelihood of it, and possibly to only be close friends with peop...
To do this, we'll start by offering alignment as a service for more limited AIs. Value extrapolation scales down as well as up: companies value algorithms that won't immediately misbehave in new situations, algorithms that will become conservative and ask for guidance when facing ambiguity.
What are examples of AIs you think you can currently align and how much (order of magnitude, say) would it cost to have you align one for me? If I have a 20B parameter language model, can you align it for me?
The distinction between "large scale era" and the rest of DL looks rather suspicious to me. You don't give a meaningful defense of which points you label "large scale era" in your plot and largely it looks like you took a handful of the most expensive models each year to give a different label to.
On what basis can you conclude that Turing NLG, GPT-J, GShard, and Switch Transformers aren't part of the "large scale era"? The fact that they weren't literally the largest models trained that year?
There's also a lot of research that didn't make your analysis, in...
1: I expect that it's easier for authors to write longer thoughtful things that make sense;
I pretty strongly disagree. The key thing I think you are missing here is parallelism: you don't want one person to write you 100 different 600-page stories, you want one person to organize 100 people to write you one 600-page story each. And it's a lot easier to scale if you set the barrier to entry lower. There are many more people who can write 60-page stories than 600-page stories, and it's easier to find 1,000 people to write 60 pages each than it is to find 10...
Hi! Co-author of the linked “exploration” here. I have some reservations about the exact request (left as a separate comment) but I’m very excited about this idea in general. I’ve been advocating for direct spending on AI research as a place with a huge ROI for alignment research for a while and it’s very exciting to see this happening.
I don’t have the time (or aptitude) to produce a really high quality dataset, but I (and EleutherAI in general) would be happy to help with training the models if that’s desired. We’d be happy to consult on model design or t...
What is the purpose of requesting such extremely long submissions? This comes out to ~600 pages of text per submission, which is extremely far beyond anything that current technology could leverage. Current NLP systems are unable to reason about more than 2048 tokens at a time, and handle longer inputs by splitting them up. Even if we assume that great strides are made in long-range attention over the next year or two, it does not seem plausible to me to anticipate SOTA systems in the near future to be able to use this dataset to its fullest. There’s inher...
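To make the context-length mismatch concrete, here's the kind of chunking current systems would have to apply (a sketch; the tokens-per-page figure is a rough assumption of mine):

```python
def chunk_tokens(tokens, window=2048):
    """Split a token sequence into fixed-size windows, as current LMs
    must do for inputs longer than their context length."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]

# ~600 pages at a guessed ~400 tokens per page ends up spread across
# well over a hundred separate windows, none of which the model can
# attend to jointly.
chunks = chunk_tokens(list(range(600 * 400)))
```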
Answer 1: Longer is easier to write per-step.
Fitting a coherent story with interesting stuff going on into 100 steps is something I expect to be much harder for a human author than fitting that story into 1000 steps. Novels are ...
Also, I'm unclear on what constitutes a "run"... roughly how long does the text have to be, in words, to have a chance at getting $20,000?
Using the stated length estimates per section, a single run would constitute approximately 600 pages of single spaced text. This is a lot of writing.
Interesting… I was busy and wasn’t able to watch the workshop. That’s good to know, thanks!
For Sanh et al. (2021), we were able to negotiate access to preliminary numbers from the BIG Bench project and run the T0 models on it. However the authors of Sanh et al. and the authors of BIG Bench are different groups of people.
What makes you say BIG Bench is a joint Google / OpenAI project? I'm a contributor to it and have seen no evidence of that.
I think that 4 is confused when people talk about "the GPT-3 training data." If someone said "there are strings of words found in the GPT-3 training data that GPT-3 never saw" I would tell them that they don't know what the words in that sentence mean. When an AI researcher speaks of "the GPT-3 training data" they are talking about the data that GPT-3 actually saw. There's data that OpenAI collected which GPT-3 didn't see, but that's not what the words "the GPT-3 training data" refers to.
Or is it "Predict the next word, supposing what you are reading is a random-with-the-following-weights sample from dataset D"? [where D is the dataset used to train GPT-3]
This is the correct answer.
The problem with these last two answers is that they make it undefined how well GPT-3 performs on the base objective on any prompt that wasn't in D, which then rules out pseudo-alignment by definition.
This is correct, but non-problematic in my mind. If data wasn’t in the training dataset, then yes there is no fact of the matter as to what training signal G...
My thinking is that prosaic alignment can also apply to non-super intelligent systems. If multimodal GPT-17 + RL = superintelligence, then whatever techniques are involved with aligning that system would probably apply to multimodal GPT-3 + RL, despite not being superintelligence. Superintelligence is not a prerequisite for being alignable.
If superintelligence is approximately multimodal GPT-17 plus reinforcement learning, then understanding how GPT-3-scale algorithms function is exceptionally important to understanding super-intelligence.
Also, if superintelligence doesn’t happen then prosaic alignment is the only kind of alignment.
Strong upvote.
My original exposure to LW drove me away in large part because of issues you describe. I would also add that (at least circa 2010) you needed to have a near-deistic belief in the anti-messianic emergence of some AGI so powerful that it can barely be described in terms of human notions of “intelligence.”
Yes, new information absolutely exists. Thinking about new information in some kind of absolute sense (“has anyone else ever had this thought?”) is the wrong approach in my mind. What we are really interested in is new information relative to an established set of knowledge. Information theory tells us that there’s a maximum amount of information that can be encoded in k bits, so (at least as long as our system is significantly smaller than the universe) we can find information that’s not encoded in the existing system.
Whether GPT-3 is likely to succeed at doing this is a statistical and empirical question, but at a minimum the answer to the title question is a resounding “yes.”
It’s interesting how Microsoft and NVIDIA are plugging EleutherAI and open source work in general. While they don’t reference EleutherAI by name, the Pile dataset used as the basis for their training data and the LM Evaluation Harness mentioned in the post are both open source efforts by EleutherAI. EleutherAI, in return, is using the Megatron-DS codebase as the core of their GPT-NeoX model architecture.
I think that this is notable because it’s the first time we’ve really seen powerful AI research orgs sharing infra like this. Typically everyone wants to d...
Why is this problem better solved by systematically underpaying everyone as opposed to firing people who act “in favor of what advances their own power” or who promote infighting?
This is one area where I hope the USG will be able to exert coercive force to bring companies to heel. Early access evals, access to base models, and access to training data seem like no-brainers from a regulatory POV.