All of StellaAthena's Comments + Replies

This is one area where I hope the USG will be able to exert coercive force to bring companies to heel. Early access evals, access to base models, and access to training data seem like no-brainers from a regulatory POV.

I think you're misrepresenting Gwern's argument. He's arguing that terrorists are not optimizing for killing the most people. He makes no claims about whether terrorists are scientifically incompetent.

3seed
I agree that it's his main point; however, he's also making an observation that most terrorists are incompetent, impulsive, have poor preparation and planning, and choose difficult forms of attack when better options are available. The post has several anecdotes illustrating that. He believes the incompetence is caused by terrorists acting on social incentives instead of optimizing for their stated goals. However, what if some terrorist group has one earnest terrorist, or what if the chatbot provides the social encouragement needed to spur a terrorist to action while simultaneously suggesting a more effective strategy? There are also lone wolf terrorists who, while more practical, are limited to their own ideas, and so are probably less competent than a whole team of researchers.

It seems helpful to me if policy discussions can include phrases like "the evidence suggests that if the current ML systems were trying to deceive us, we wouldn't be able to change them not to".

I take this as evidence that TurnTrout's fears about this paper are well-grounded. This claim is not meaningfully supported by the paper, but I expect many people to repeat it as if it is supported by the paper.

3evhub
That's not evidence for Alex's claim that people will misinterpret our results, because that's not a misinterpretation—we explicitly claim that our results do in fact provide evidence for the hypothesis that removing (edit: deceptive-alignment-style) deception in ML systems is likely to be difficult.
1kave
Yeah I was fairly sloppy here. I did mean the "like" to include tweaking to be as accurate as possible, but that plausibly didn't bring the comment above some bar. For clarity: I haven't read the paper yet. My current understanding isn't able to guess what your complaint would be though. Ryan's more careful "the evidence suggests that if current ML systems were lying in wait with treacherous plans and instrumentally acting nice for now, we wouldn't be able to train away the treachery" seems reasonable from what I've read, and so does "some evidence suggests that if current ML systems were trying to deceive us, standard methods might well fail to change them not to".

We ended up talking about this in DMs, but the gist of it is:

Back in June Hoagy opened a thread in our "community research projects" channel and the work migrated there. Three of the five authors of the [eventual paper](https://arxiv.org/abs/2309.08600) chose to have EleutherAI affiliation (for any work we organize with volunteers, we tell them they're welcome to use an EleutherAI affiliation on the paper if they like) and we now have an entire channel dedicated to future work. I believe Hoagy has two separate paper ideas currently in the works and over a half dozen people working on them.

Oops. It appears that I deleted my comment (deeming it largely off-topic) right as you were replying. I'll reproduce the comment below, and then reply to your question.

I separately had a very weird experience with them on the Long Term Future Fund where Conor Leahy applied for funding for Eleuther AI. We told him we didn't want to fund Eleuther AI since it sure mostly seemed like capabilities-research but we would be pretty interested in funding AI Alignment research by some of the same people. He then confusingly went around to a lot of people around

... (read more)

I agree that a control group is vital for good science. Nonetheless, I think that such an experiment is valuable and informative, even if it doesn't meet the high standards required by many professional science disciplines. I believe in the necessity of acting under uncertainty. Even with its flaws, this study is sufficient evidence for us to want to enact temporary regulation at the same time as we work to provide more robust evaluations.

But... this study doesn't provide evidence that LLMs increase bioweapon risk.

0Chris_Leong
Define evidence. I'm not asking this just to be pedantic, but because I think it'll make the answer to your objection clearer.

It doesn't let the government institute prior restraint on speech.

So far, I'm confident that our proposals will not impede the vast majority of AI developers, but if we end up receiving feedback that this isn't true, we'll either rethink our proposals or remove this claim from our advocacy efforts.

It seems to me like you've received this feedback already in this very thread. The fact that you're going to edit the claim to basically say "this doesn't affect most people because most people don't work on LLMs" completely dodges the actual issue here, which is that there's a large non-profit and independent open source LL... (read more)

Nora didn't say that this proposal is harmful. Nora said that if Zach's explanation for the disconnect between their rhetoric and their stated policy goals is correct (namely that they don't really know what they're talking about) then their existence is likely net-harmful.

That said, yes, requiring everyone who wants to finetune LLaMA 2 to get a license would be absurd and harmful. la3orn and gallabyres articulate some reasons why in this thread.

Another reason is that it's impossible to enforce, and passing laws or regulations and then not enforcing them is re... (read more)

Also, such a regulation seems like it would be illegal in the US. While the government does have wide latitude to regulate commercial activities that impact multiple states, this is rather specifically a proposal that would regulate all activity (even models that never get released!). I'm unaware of any precedent for such an action, can you name one?

Drug regulation, weapons regulation, etc.

As far as I can tell, the commerce clause lets basically everything through.

CAIP is also advised by experts from other organizations and is supported by many volunteers.

Who are the experts that advise you? Are claims like "our proposals will not impede the vast majority of AI developers" vetted by the developers you're looking to avoid impacting?

4Thomas Larsen
We haven't asked specific individuals if they're comfortable being named publicly yet, but if advisors are comfortable being named, I'll announce that soon. We're also in the process of having conversations with academics, AI ethics folks,  AI developers at small companies, and other civil society groups to discuss policy ideas with them. So far, I'm confident that our proposals will not impede the vast majority of AI developers, but if we end up receiving feedback that this isn't true, we'll either rethink our proposals or remove this claim from our advocacy efforts.  Also, as stated in a comment below:

It’s always interesting to see who has legitimacy in the eyes of mainstream media. The “other companies” mentioned are EleutherAI and Open Future, both of whom co-authored the letter, and LAION who signed it. All three orgs are major players in the open source AI space, and EAI & LAION are arguably bigger than GitHub and CC given that this is specifically about the impact of the EU AI Act on open source large scale AI R&D. Of course, MSN’s target audience hasn’t heard of EleutherAI or LAION.

Note that other orgs have also done blog posts on this top... (read more)

It's extremely difficult to create a fraudulent company and get it listed on the NYSE. Additionally, the Exchange can and does stop trading on both individual stocks and the exchange as a whole, though due to the downstream effects on consumer confidence this is only done rarely.

I don't know what lessons one should learn from the stock market regarding MM, but I don't think we should rush to conclude MM shouldn't intervene or shouldn't be blamed for not intervening.

5Nicholas / Heather Kross
Agreed, most "fraudulent" listed public companies (on places like the NYSE, where they actually check stuff) fulfill weird conditions like:
* being really old, such that their corporate history stretches back before modern high-standards checks
* being acquired/SPAC'd to allow a fraudulent private company to kinda list itself
* being based on a larger fraud that probably isn't accounting/insider related
(Disclaimer: not an expert, not financial advice.)

I don’t understand the community obsession with Tao and recruiting him to work on alignment. This is a thing I hear about multiple times a year with no explanation of why it would be desirable other than “he’s famous for being very smart.”

I also don’t see why you’d think there’d be an opportunity to do this… it’s an online event, which heavily limits the ability to corner him in the hallway. It’s not even clear to me that you’d have an opportunity to speak with him… he’s moderating several discussions and panels, but any submitted questions to said events would go to the people actually in the discussions, not the moderator.

Can you elaborate on what you’re actually thinking this would look like?

Red teaming has always been a legitimate academic thing? I don’t know what background you’re coming from but… you’re very far off.

But yes, the event organizers will be writing a paper about it and publishing the data (after it’s been anonymized).

1VojtaKovarik
I imagine this would primarily be a report from the competition? What I was thinking about was more about how this sort of assessment should be done in general, what are the similarities and differences between cybersecurity, and how to squeeze more utility out of it. For example, a (naive version of) one low-hanging fruit is to withhold 10% of the obtained data (from the AI companies, then test those jailbreak strategies later). This would give us some insight into whether the current "alignment" methods generalise, or whether we are closer to playing whack-a-mole. Similarly to how we use test data in ML. There are many more considerations, and many more things you can do. And I don't claim to have all the answers, nor to be the optimal person to be writing about them. Just that it would be good if somebody was doing that (and wondering whether that is happening :-) ).
1VojtaKovarik
Theoretical CS/AI/game theory, rather than cybersecurity. Given the lack of cybersec background, I acknowledge I might be very far off. To me, it seems that the perception in cybersecurity might be different from the perception outside of it. Also, red teaming in the context of AI models might have important differences from cybersecurity context. Also, red teaming by public seems, to me, different from internal red-teaming or bounties. (Though this might be one of the things where I am far off.)

What deployed LLM system does Tesla make that you think should be evaluated alongside ChatGPT, Bard, etc?

1RomanS
I'm not aware of any LLM systems by Tesla.  But their self-driving AI is definitely worth evaluating. The task of self-driving on a busy city road is extremely hard to solve (if not AGI-complete), yet their AI is surprisingly good at that. It still fails in many circumstances, but is surprisingly good overall. Tesla could be closer to AGI than most people realize.

Hi, I’m helping support the event. I think that some mistranslation happened by a non-AI person. The event is about having humans get together and do prompt hacking and similar on a variety of models side-by-side. ScaleAI built the app that’s orchestrating the routing of info, model querying, and human interaction. Scale’s platform isn’t doing the evaluation itself. That’s being done by users on-site and then by ML and security researchers analyzing the data after the fact.

I think there's a mistake here which kind of invalidates the whole post. Ice cream is exactly the kind of thing we’ve been trained to like. Liking ice cream is very much the correct response.

Everything outside the training distribution has some value assigned to it. Merely the fact that we like ice cream isn’t evidence that something’s gone wrong.

I agree completely. This is a plausible explanation, but it’s one of many plausible explanations and should not be put forward as a fact without evidence. Unfortunately, said evidence is impossible to obtain due to OpenAI’s policies regarding access to their models. When powerful RLHF models begin to be openly released, people can start testing theories like this meaningfully.

Linear warm-up over the first 10% of training, then cosine decay to a minimum of one-tenth the peak LR which is set to occur at the end of training (300B tokens). Peak LRs vary by model but are roughly consistent with GPT-3 and OPT values. You can find all the config details on GitHub. The main divergence relevant to this conversation from mainstream approaches is that we use a constant batch size (2M) throughout scaling. Prior work uses batch sizes up to 10x smaller for the smallest models, but we find that we can train large batch small models without an... (read more)
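For concreteness, the schedule described here can be sketched as follows. This is a minimal illustration, not the actual training code; the function and argument names are mine:

```python
import math

def lr_schedule(step, total_steps, peak_lr, warmup_frac=0.1, min_ratio=0.1):
    """Linear warm-up over the first warmup_frac of training, then cosine
    decay to min_ratio * peak_lr, reached at the end of training."""
    warmup_steps = int(warmup_frac * total_steps)
    if step < warmup_steps:
        # Linear ramp from near zero up to the peak learning rate
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr down to min_ratio * peak_lr
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_ratio + (1.0 - min_ratio) * cosine)
```

With `total_steps` corresponding to 300B tokens at a fixed 2M batch size, the learning rate peaks at the end of warm-up and bottoms out at one-tenth the peak exactly at the final step.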

This is really exciting work to see, and exactly the kind of thing I was hoping people would do when designing the Pythia model suite. It looks like you're experimenting with the 5 smallest models, but haven't done analysis on the 2.8B, 6.9B, or 12B models. Is that something you're planning on adding, or no?

I am really very surprised that the distributions don't seem to match any standard parameterized distribution. I was fully ready to say "okay, let's retrain some of the smaller Pythia models initialized using the distribution you think the weights come ... (read more)

2beren
Also, I meant to ask you, what does the learning rate schedule of these models look like? In a lot of the summary statistics plots we see peaks and asymptotes, and sometimes clear phase transitions, between checkpoints 20 and 40, and I was wondering if this is related to the learning rate schedule somehow (end of warmup?)
2beren
We have done some preliminary analyses on these as well. The primary issue is just that these experiments take longer, since the larger models take longer to instantiate from checkpoint (which adds up when there are 142 checkpoints). I am planning to run the same experiments on the larger models and update the post with them at some point, however.

I agree the distribution thing is weird and not what I was expecting. I have tried fitting Gaussian, power law, and logistic distributions, and none are super close in general. I have also tried general fits to generalised exponential functions of the form exp(kx^\alpha), where k and \alpha are free parameters, but this optimization tends to be numerically unstable and gives bad results whenever I have tried it. Other people at Conjecture, following the PDLT book, have tried fitting the fourth-order perturbative expansion -- i.e. exp(x^2 + \gamma x^4) -- which also runs into numerical issues.

Maybe? I haven't studied Tensor Programs in extreme detail, but my understanding is that they assume Gaussian limits for their proofs. However, afaik muP does work in practice, so maybe this isn't such a big deal?

This is great to have clarified, thanks! I'll tone down the disclaimer and add the note about the new nomenclature.
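For what it's worth, one way to tame the numerical instability of fitting exp(kx^\alpha)-type forms directly is to fit the log-density instead: for each fixed \alpha the remaining problem is linear and solvable in closed form. A rough sketch of that idea (this is my own illustration, not the fitting code used in the post, and the \alpha grid is arbitrary):

```python
import numpy as np

def fit_log_density(x, log_p, alphas=np.linspace(0.5, 4.0, 36)):
    """Fit log p(x) ~= c - k * |x|**alpha by scanning a grid of alpha
    values and solving for (c, k) by linear least squares at each one.
    Working in log space avoids exponentiating large |x|**alpha terms."""
    best = None
    for alpha in alphas:
        # Design matrix for the linear model log_p = c - k * |x|**alpha
        A = np.stack([np.ones_like(x), -np.abs(x) ** alpha], axis=1)
        sol, *_ = np.linalg.lstsq(A, log_p, rcond=None)
        err = float(np.sum((A @ sol - log_p) ** 2))
        if best is None or err < best[0]:
            best = (err, sol[0], sol[1], alpha)
    _, c, k, alpha = best
    return c, k, alpha
```

On an exact Gaussian log-density this recovers alpha = 2; on real weight histograms the fitted alpha would give a crude measure of how far the tails are from Gaussian.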
4thomwolf
The Pythia models are an amazing resource. This is a great tool and great work. One experiment that could maybe help disentangle idiosyncrasies from robust behaviors would be to run these experiments with a pair of seeds for each model size. With the currently trained models, this could maybe just involve plotting the exact same curves comparing the "deduplicated" versus "non-deduplicated" trained models, since dataset deduplication likely has a limited impact on the model-averaged training dynamics of the weights as investigated here. (There are obviously countless more experiments that could be added, but this one is maybe an easy one.)
3gwern
That was my own immediate response: "if these distributions are so universal, why doesn't this show that standard initializations suck, and that you should reverse-engineer the final distribution and initialize that way?" Either the model won't train or will train much slower, which suggests that the understanding or training setup here is totally wrong in some way; or it will train at the same speed, suggesting that the distributions are misleading and more like epiphenomena or side-effects of what is actually training/'doing the work' (which is still going on under the hood & just no longer visible in some crude summary statistics); or it will train much much faster, which is a huge optimization win and also verifies the importance of the initialization distribution being correct with all the theoretical implications thereof. Why doesn't Pythia let you do that? Sure, perhaps they aren't exactly a logistic or familiar power law or a convenient parametric function, but if you want to replicate the initialization distribution elsewhere, just do something nonparametrically like sample from a histogram/cdf or permute the parameters from a finished model, and then maybe train on some equivalent heldout text dataset to reduce any lottery-ticket weirdness. (Verify it does anything useful/interesting, and it won't be hard to find some flexible parametric distribution you can sample from if you need to; if there's one thing I know about the exponential family of distributions, it's that it has an awful lot of wiggly bois you've never heard of.)
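The nonparametric route gwern sketches (reuse the empirical weight distribution of a finished model rather than fit a named distribution) can be as simple as permuting the trained weights into a fresh tensor. A toy illustration, assuming you already have a trained weight tensor in hand:

```python
import numpy as np

def init_like(trained_weights, seed=None):
    """Produce an initialization whose empirical distribution exactly
    matches trained_weights, by randomly permuting its entries."""
    rng = np.random.default_rng(seed)
    flat = np.asarray(trained_weights).ravel()
    # Shuffle all entries, then restore the original tensor shape
    return rng.permutation(flat).reshape(np.shape(trained_weights))
```

Sampling from a histogram/CDF of the trained weights would work similarly, and either variant sidesteps the need to name the distribution at all.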

This is excellent work, though I want to generically recommend caution when making assumptions about the success of such attacks based only on blackbox evaluations. Thorough analysis of false positive and false negative rates with ground-truth access (ideally in an adversarially developed setting) is essential for validation. [Sidebar: this reminds me that I really need to write up my analysis in the EleutherAI discord showing why prompt extraction attacks can be untrustworthy]

That said, this is really excellent work and I agree it looks quite promising.

Do you have a reference to the work you’re talking about? I’m doing some stuff involving fitting curves to activation tails currently.

2ryan_greenblatt
Unpublished and not written up. Sorry.

This is very interesting. The OP doesn’t contain any specific evidence of Gaussianness, so it would be helpful if they could elaborate on what evidence led them to conclude these are Gaussian.

StellaAthenaΩ6129

I’m not sure when you developed this work, but the LLM.int8 paper identifies outliers as an essential factor in achieving performance for models larger than 2.7B parameters (see Fig. 1 and Fig. 3 especially). There’s also some follow-up work here and here. Very curiously, the GLM-130B paper reports that they don’t see outlier features at all, nor any negative effects from their absence.

I’ve spoken with Tim (LLM.int8 lead author) about this a bit and some people in EleutherAI, and I’m wondering if there’s some kind of explicit or implicit regularizing e... (read more)

Answer by StellaAthena3-2

I think that the answer is no, and that this reflects a common mental barrier when dealing with gradient descent. You would like different experts to specialize in different things in a human-interpretable way, but Adam doesn’t care what you say you want. Adam only cares about what you actually write down in the loss function.

Generally, a useful check when dealing with lines of thought like this is to ask yourself whether your justification for why something should happen would equally justify something that is known to not happen. If so, it’s probably f... (read more)

11stuserhere
  Curious whether your high-level thoughts on these topics still hold or have changed.

What sources do you have for your claim that “large groups” of people believe this?

1Yair Halberstadt
I'm trying to find the ones that I saw a few years ago, but now all the results that show up are news articles about states banning various trans-related therapies. It could be that I'm misremembering (this was about 5 years ago), but I believe I found this to be quite a common opinion at the time.

Hi! I recently trained a suite of models ranging from 19M to 13B parameters with the goal of promoting research on LLM interpretability. I think it would be awesome to try out these experiments on the model suite and look at how the results change as the models scale. If your code used the HF transformers library it should work more or less out of the box with my new model suite.

You can find out more here: https://twitter.com/AiEleuther/status/1603755161893085184?s=20&t=6xkBsYckPcNZEYG8cDD6Ag

2Fabien Roger
I launched some experiments. I'll keep you updated.

Individual MMLU tasks are extremely noisy. They’re so noisy that the paper actually specifically recommends that you don’t draw conclusions from performance on individual tasks and instead look at four high-level topical categories. The individual tasks also have extremely large variances in their variance. Some of them are pretty easy for a college-educated adult, while others have genuine experts scoring less than 80%.

This is compounded by the fact that the sample sizes vary wildly. While many of the tasks have around 100 questions, at the other e... (read more)

3alyssavance
See my response to Gwern: https://www.lesswrong.com/posts/G993PFTwqqdQv4eTg/is-ai-progress-impossible-to-predict?commentId=MhnGnBvJjgJ5vi5Mb In particular, extremely noisy data does not explain the results here, unless I've totally missed something. If the data is super noisy, the correlation should be negative, not zero, due to regression-to-mean effects (as indeed we saw for the smallest Gopher models, which are presumably so tiny that performance is essentially random). 
Answer by StellaAthena50

I agree with what Gwern said about things being behind-the-scenes, but it's also worth noting that there are many impactful consumer technologies that use DL. In fact, some of the things that you don't think exist actually do exist!

... (read more)
7Elizabeth
Google search gets less usable every year, even for Scholar, which has a much less adversarial search space. It's better for very common searches like popular tv shows, but approaching worthlessness for long tail stuff. Maybe this is just "search is hard", but improving the common case at the cost of the long tail is exactly what I'd expect AI search to do.

Interesting. Thank you.

To be clear, you now understand that the content of the sentence "I am a transgender man" is more or less "contrary to popular opinion, I am in fact a man and not a woman"? And that pronouns only even come up because they are one of the many ways people convey assessments of gender?

I'm not even going to pretend to address the first half of your comment. You're making extreme jumps of logic that are in no way justified by the conversation.

So that is the strong-request/demand that it's reasonable for people to get from "society".  (If people in power were unambiguously saying "In order to be polite and not be called bad, you must think of these people in a certain way", then I think there would be revolts.)  If someone hasn't become emotionally close friends with any trans people, I'd say it's not too surprising if they haven

... (read more)
1Said Achmiz
What are the extreme jumps of logic? I confess that I can see none, in the post you’re responding to. If you think otherwise, I should like to see you defend that claim.

You're talking as though there is some background you don't share with me, so I shall establish that background.

I tried googling "fired for not using pronouns", and the results page had news articles pointing to several different cases of that—usually teachers—as well as this page, seemingly written by lawyers, titled "What Can Employers Do About Employees Who Refuse To Refer To Transgendered Employees By Their Preferred Names Or Pronouns?".

The page basically recommends firing them; it says "Even if the employee has “for cause” protection through an employ... (read more)

What does the word "man" mean in the sentence "contrary to popular opinion, I am in fact a man and not a woman"? Given that popular opinion is, in fact, wrong about this, we should be able to describe some observation or experimental test where the man makes better predictions than the populace, right? What is it, specifically? (I think there are real answers to this, but I'm interested in what you think.)

As a cis person who has interacted occasionally with trans people for the past ten years, it literally never occurred to me until last year that what trans people were asking me to do was actually reconsider my impression of their gender! I sincerely thought they were just asking me to memorize a different word to call them. I will at least try out a "reconsidering" process the next time I regularly interact with a trans person IRL and see whether it works. (I have also never read about what kind of "reconsidering" processes work for people, but I have som

... (read more)
cata220

Basically, my experience went like this:

  1. I didn't know anyone trans or think about it at all.
  2. I moved to California and hung out with some rationalists and met some trans people IRL and online. I understood that it was polite to try to use whatever pronouns they preferred, decoupled from their physical appearance, so I did my best to do so, and other than that I continued to not think about it at all.
  3. After observing that it's hard to reliably remember to use pronouns that conflict with people's surface appearance to me, I adopted a "default to 'they', es
... (read more)

Not OP, but for what it's worth, I consider it unreasonable to request that other people think of you in a certain way (be it gender, or having personal traits or skills or anything), or at least for there to be any sense of expectation or obligation that they will fulfill such a request.  That would be actual thought-policing, and abhorrent to me.  It's reasonable to want people to think of you a certain way, to hope that they will, to take actions that will hopefully increase the likelihood of it, and possibly to only be close friends with peop... (read more)

To do this, we'll start by offering alignment as a service for more limited AIs. Value extrapolation scales down as well as up: companies value algorithms that won't immediately misbehave in new situations, algorithms that will become conservative and ask for guidance when facing ambiguity.

What are examples of AIs you think you can currently align and how much (order of magnitude, say) would it cost to have you align one for me? If I have a 20B parameter language model, can you align it for me?

2Stuart_Armstrong
Reach out to my cofounder (Rebecca Gorman) on linkedin.
StellaAthenaΩ0110

The distinction between "large scale era" and the rest of DL looks rather suspicious to me. You don't give a meaningful defense of which points you label "large scale era" in your plot and largely it looks like you took a handful of the most expensive models each year to give a different label to.

On what basis can you conclude that Turing NLG, GPT-J, GShard, and Switch Transformers aren't part of the "large scale era"? The fact that they weren't literally the largest models trained that year?

There's also a lot of research that didn't make your analysis, in... (read more)

6Jsevillamol
It is not feasible to do an exhaustive analysis of all milestone models. We necessarily are missing some important ones, either because we are not aware of them, because they did not provide enough information to deduce the training compute, or because we haven't gotten to annotate them yet. Our criteria for inclusion are outlined in appendix A. Essentially it boils down to ML models that have been cited >1000 times, models that have some historical significance, and models that have been deployed in an important context (eg something that was deployed as part of the Bing search engine would count). For models in the last two years we were more subjective, since there hasn't been enough time for the more relevant work to stand the test of time. We also excluded 5 models that have abnormally low compute, see figure 4. We tried playing around with the selection of papers that was excluded and it didn't significantly change our conclusions, though obviously the dataset is biased in many ways. Appendix G discusses the possible biases that may have crept in.
8Jsevillamol
Great questions! I think it is reasonable to be suspicious of the large-scale distinction. I do stand by it - I think the companies discontinuously increased their training budgets around 2016 for some flagship models.[1] If you mix these models with the regular trend, you might believe that the trend was doubling very fast up until 2017 and then slowed down. It is not an entirely unreasonable interpretation, but it explains the discontinuous jumps around 2016 less well. Appendix E discusses this in-depth. The way we selected the large-scale models is half intuition and half convenience. We compare the compute of each model to the log compute of nearby papers (within 2 years), and we call it large scale if its log compute exceeds 0.72 standard deviations of the mean of that sample. I think there is a reasonable case for including NASv3, Libratus, Megatron-LM, T5-3B, OpenAI Five, Turing NLG, iGPT-XL, GShard (dense), Switch, DALL-E, Pangu-α, ProtT5-XXL and HyperClova on either side of this division. Arguably we should have been more transparent about the effects of choosing a different threshold - we will try to look more into this in the next update of the paper.   1. ^ See appendix F for a surface discussion
3A Ray
I think that the authors at least did some amount of work to distinguish the eras, but agree more work could be done. Also I agree w/ Stella here that Turing, GPT-J, GShard, and Switch are probably better fit into the “large scale“ era.

1:  I expect that it's easier for authors to write longer thoughtful things that make sense;

I pretty strongly disagree. The key thing I think you are missing here is parallelism: you don't want one person to write you 100 different 600 page stories, you want one person to organize 100 people to write you one 600 page story each. And it's a lot easier to scale if you set the barrier of entry lower. There are many more people who can write 60 page stories than 600 page stories, and it's easier to find 1,000 people to write 60 pages each than it is to find 10... (read more)

Hi! Co-author of the linked “exploration” here. I have some reservations about the exact request (left as a separate comment) but I’m very excited about this idea in general. I’ve been advocating for direct spending on AI research as a place with a huge ROI for alignment research for a while and it’s very exciting to see this happening.

I don’t have the time (or aptitude) to produce a really high quality dataset, but I (and EleutherAI in general) would be happy to help with training the models if that’s desired. We’d be happy to consult on model design or t... (read more)

9Beth Barnes
IMO Eleuther should probably spend more time doing things like this and less on scaling LMs
3Nicholas / Heather Kross
Can confirm: Eleuther is awesome, I don't know how to do any of this, but keep offering big prizes and I (and others) will follow them.

What is the purpose of requesting such extremely long submissions? This comes out to ~600 pages of text per submission, which is extremely far beyond anything that current technology could leverage. Current NLP systems are unable to reason about more than 2048 tokens at a time, and handle longer inputs by splitting them up. Even if we assume that great strides are made in long-range attention over the next year or two, it does not seem plausible to me to anticipate SOTA systems in the near future to be able to use this dataset to its fullest. There’s inher... (read more)

Chris_Leong
  It's interesting to come across this comment in 2024 given how much things have changed already.
  • 1:  I expect that it's easier for authors to write longer thoughtful things that make sense;
  • 2:  MIRI doesn't just target the AI we have, it targets the AI we're afraid we'll get;
  • 3:  Present-day use-cases for dungeons are a long-range problem even if they're currently addressed with short-range technology.

Answer 1:  Longer is easier to write per-step.

Fitting a coherent story with interesting stuff going on into 100 steps, is something I expect to be much harder for a human author than fitting that story into 1000 steps.  Novels are ... (read more)

plex
Strong upvote. The argument from training diversity seems plausible, but the key point is that when trying to point large amounts of effort at writing content, having it delivered in smaller chunks than a novel would allow many more people to risk putting in time and learn whether they can contribute, and would ultimately raise quality and volume substantially. It would also make it much easier to build a collaborative project around this, as people could submit their work for community review without each review taking an extremely long time and a large amount of effort.

I'd also propose that the bounty be updated relatively soon to allow smaller submissions, for higher visibility. MIRI could allow backward compatibility fairly easily by simply accepting smaller submissions, without needing to reject longer ones. If the concern is the hassle of handing out lots of smaller bounties, MIRI could accept batches of small runs and let some trusted middle-man handle the details of the distribution.
delton137
I think what you're saying makes a lot of sense. When assembling a good training data set, it's all about diversity. 

Also, I'm unclear on what constitutes a "run"... roughly how long does the text have to be, in words, to have a chance at getting $20,000?

Using the stated length estimates per section, a single run would constitute approximately 600 pages of single-spaced text. This is a lot of writing.

Interesting… I was busy and wasn’t able to watch the workshop. That’s good to know, thanks!

For Sanh et al. (2021), we were able to negotiate access to preliminary numbers from the BIG Bench project and run the T0 models on it. However, the authors of Sanh et al. and the authors of BIG Bench are different groups of people.

What makes you say BIG Bench is a joint Google / OpenAI project? I'm a contributor to it and have seen no evidence of that.

RomanS
During the workshop presentation, Jascha said that OpenAI will run their models on the benchmark. This suggests that there is (was?) some collaboration. But that was half a year ago. Just checked: the repo's readme doesn't mention OpenAI anymore. In earlier versions, it was mentioned like this:  So, it seems that OpenAI withdrew from the project, partially or fully.

I think that 4 is confused when people talk about "the GPT-3 training data." If someone said "there are strings of words found in the GPT-3 training data that GPT-3 never saw" I would tell them that they don't know what the words in that sentence mean. When an AI researcher speaks of "the GPT-3 training data" they are talking about the data that GPT-3 actually saw. There's data that OpenAI collected which GPT-3 didn't see, but that's not what the words "the GPT-3 training data" refers to.

Daniel Kokotajlo
Ahhh, OK. Then perhaps I just was using inappropriate words; it sounds like what I meant to refer to by 4 was the same as what you meant to refer to by 3.
Answer by StellaAthena

Or is it "Predict the next word, supposing what you are reading is a random-with-the-following-weights sample from dataset D?" [where D is the dataset used to train GPT-3]

This is the correct answer.

The problem with these last two answers is that they make it undefined how well GPT-3 performs on the base objective on any prompt that wasn't in D, which then rules out pseudo-alignment by definition.

This is correct, but non-problematic in my mind. If data wasn’t in the training dataset, then yes there is no fact of the matter as to what training signal G... (read more)

Daniel Kokotajlo
Why do you choose answer 3 instead of answer 4? In some sense answer 3 is the random weights that the developers intended, but answer 4 is what actually happened.

My thinking is that prosaic alignment can also apply to non-superintelligent systems. If multimodal GPT-17 + RL = superintelligence, then whatever techniques are involved in aligning that system would probably also apply to multimodal GPT-3 + RL, despite the latter not being superintelligent. Superintelligence is not a prerequisite for being alignable.

StellaAthena

If superintelligence is approximately multimodal GPT-17 plus reinforcement learning, then understanding how GPT-3-scale algorithms function is exceptionally important to understanding super-intelligence.

Also, if superintelligence doesn’t happen then prosaic alignment is the only kind of alignment.

Rob Bensinger
Why do you think this? On the original definition of prosaic alignment, I don't see why this would be true. (In case it clarifies anything: my understanding of Paul's research program is that it's all about trying to achieve prosaic alignment for superintelligence. 'Prosaic' was never meant to imply 'dumb', because Paul thinks current techniques will eventually scale to very high capability levels.)

Strong upvote.

My original exposure to LW drove me away in large part because of the issues you describe. I would also add that (at least circa 2010) you needed to have a near-deistic belief in the anti-messianic emergence of some AGI so powerful that it can barely be described in terms of human notions of "intelligence."

Answer by StellaAthena

Yes, new information absolutely exists. Thinking about new information in some kind of absolute sense ("has anyone else ever had this thought?") is the wrong approach in my mind. What we are really interested in is new information relative to an established set of knowledge. Information theory tells us that there's a maximum amount of information that can be encoded in k bits, so (at least as long as our system is significantly smaller than the universe) we can always find information that's not encoded in the existing system.

Whether GPT-3 is likely to succeed at doing this is a statistical and empirical question, but at a minimum the answer to the title question is a resounding “yes.”
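As a toy illustration of the counting argument above (a minimal sketch; the 4-bit message space and the "known" set are invented for the example):

```python
# Sketch of the counting argument: k bits can encode at most 2**k distinct
# messages, so any "existing system" that stores fewer than 2**k of them
# necessarily leaves some k-bit string unencoded, i.e. information that is
# new relative to that system must exist.
from itertools import product

k = 4
all_messages = {"".join(bits) for bits in product("01", repeat=k)}
known = {"0000", "0101", "1111"}  # toy stand-in for the established knowledge base

novel = all_messages - known
print(len(all_messages))  # 16 == 2**4 possible messages
print(len(novel))         # 13 messages the "system" has never encoded
```

The same counting holds for any k and any knowledge base smaller than 2**k, which is the sense in which novelty relative to a finite system is guaranteed.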

It’s interesting how Microsoft and NVIDIA are plugging EleutherAI and open-source work in general. While they don’t reference EleutherAI by name, the Pile dataset used as the basis for their training data and the LM Evaluation Harness mentioned in the post are both open-source efforts by EleutherAI. EleutherAI, in return, is using the Megatron-DS codebase as the core of its GPT-NeoX model architecture.

I think that this is notable because it’s the first time we’ve really seen powerful AI research orgs sharing infra like this. Typically everyone wants to d... (read more)

gwern
It may just be the incentives. "Commoditize your complement". Nvidia wants to sell GPUs, and that's pretty much it; any services they sell are tightly coupled to the GPUs, and they don't sell smartphones or banner ads. And Microsoft wants to sell MS Azure and, to a lesser extent, business SaaS, and while it has many fingers in many pies, those tails do not wag the dog. NV/MS releasing tooling like DeepSpeed, and being pragmatic about using The Pile since it exists (instead of spending scarce engineer time on making their own just to have their own), is consistent with that.

In contrast, Facebook, Google, Apple, AliBaba, and Baidu all sell different things, typically far more integrated into a service/website/platform, like smartphones vertically integrated from the web advertising down to the NN ASICs on their in-house smartphones. Google may be unusually open in terms of releasing research, but they still won't release the actual models trained on JFT-300M/B or web scrapes like their ALIGN, or models touching on the core business vitals like advertising, or their best models like LaMDA* or MUM or Pathways.

Even academics 'sell' very different things than happy end users on Nvidia GPUs / MS cloud VMs: prestige, citations, novelty, secret sauces, moral high grounds. Not necessarily open data and working code.

* The split incentives lead to some strange behavior, like the current situation where there are already like 6 notable Google-authored papers on LaMDA revealing fascinating capabilities like general text style transfer... all of which won't use its name and only refer to it as "a large language model" or something. (Sometimes they'll generously specify that the model in question is O(100b) parameters.)

Why is this problem better solved by systematically underpaying everyone as opposed to firing people who act “in favor of what advances their own power” or who promote infighting?

Joe Collman
I think the essential point is that you're actually not underpaying them in terms of their own utility gain (if they believe in the mission). You're only 'underpaying' them in terms of money. It's still not obviously the correct approach (externalities are an issue too), but [money != utility].