Hi! Co-author of the linked “exploration” here. I have some reservations about the exact request (left as a separate comment) but I’m very excited about this idea in general. I’ve been advocating for direct spending on AI research as a place with a huge ROI for alignment research for a while and it’s very exciting to see this happening.
I don’t have the time (or aptitude) to produce a really high quality dataset, but I (and EleutherAI in general) would be happy to help with training the models if that’s desired. We’d be happy to consult on model design or training set-up, or to simply train the models for you all. No compensation necessary, just excited to contribute to worthwhile alignment research.
What is the purpose of requesting such extremely long submissions? This comes out to ~600 pages of text per submission, which is extremely far beyond anything that current technology could leverage. Current NLP systems are unable to reason about more than 2048 tokens at a time, and handle longer inputs by splitting them up. Even if we assume that great strides are made in long-range attention over the next year or two, it does not seem plausible to me that SOTA systems in the near future will be able to use this dataset to its fullest. There's inherent value in a more diverse set of scenarios, given the strong propensity of language models to overfit on repeated data. While this isn't strictly a matter of repeated data, I am under the strong impression that having more diverse short scripts is going to train a much better model than less diverse long scripts, assuming that the short scripts are still at or beyond the maximum context length a language model can handle.
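(To make the context-window point concrete, here is a rough sketch of what "splitting them up" means in practice; the numbers are illustrative choices of mine, not anyone's actual pipeline.)

```python
# Rough sketch: a very long run gets cut into fixed-size windows, so no single
# training example ever sees more than ~2048 tokens of the story at once.

def split_into_windows(token_ids, max_len=2048, stride=1024):
    """Return overlapping windows of at most max_len tokens."""
    windows = []
    for start in range(0, max(1, len(token_ids) - max_len + 1), stride):
        windows.append(token_ids[start:start + max_len])
    return windows

# A ~600-page run is on the order of 300k words (very roughly 400k tokens),
# i.e. hundreds of windows, each seen by the model independently of the rest.
example_run = list(range(400_000))  # stand-in for a tokenized run
print(len(split_into_windows(example_run)))
```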
For the same reasons it is challenging to leverage, I think that this will also be very challenging to produce. I think that changing the request to 100 different 6 page (10 step) or 10 different 60 p...
Answer 1: Longer is easier to write per-step.
Fitting a coherent story with interesting stuff going on into 100 steps is something I expect to be much harder for a human author than fitting that story into 1000 steps. Novels are famously easier to write on a page-level basis than short stories.
If you take zombies attacking a magical academy for 1000 steps, you might get something that looks like a coherent quest. If you take zombies attacking a magical academy for 100 steps, I think you get something that looks like a quest that was just getting started when the dataset ran out... unless the author has somehow carefully figured out a plot that will, given unknown user actions, get somewhere interesting within 100 steps, which I imagine is much harder for the author; they can't just pick a premise, run with it, and make stuff up as they go along...
1: I expect that it's easier for authors to write longer thoughtful things that make sense;
I pretty strongly disagree. The key thing I think you are missing here is parallelism: you don't want one person to write you 100 different 600-page stories, you want one person to organize 100 people to write you one 600-page story each. And it's a lot easier to scale if you set the barrier to entry lower. There are many more people who can write 60-page stories than 600-page stories, and it's easier to find 1,000 people to write 60 pages each than it is to find 100 people to write 600 pages each. There's also much less risk on both your side and theirs. If someone drops out halfway through writing, you lose 30 pages, not 300.
Based on this comment:
I state: we'd be happy, nay, ecstatic, to get nice coherent complete shorter runs, thereby disproving my concern that short runs won't be possible to complete, and to pay for them proportionally.
I'm now under the impression that you'd be willing to pay out the 20k for 10 runs of 100 steps each (subject to reasonable quality control) and bringing that about was my main goal in commenting.
The other major worry I have about this pitch is the experimen...
In wake of the censorship regime that AI Dungeon implemented on OpenAI's request, most people moved to NovelAI, HoloAI, or the open source KoboldAI run on colab or locally. I've set up KoboldAI locally and while it's not as featureful as the others, this incident is another example of why you need to run code locally and not rely on SaaS.
For background, you could read 4chan /vg/'s /aids/ FAQ ("AI Dynamic Storytelling"). For a play-by-play of Latitude and OpenAI screwing things up, "Remember what they took from you" has the history of them leaking people's personal stories to a third-party platform.
This looks like it may be an existing collection of works annotated in a similar way: https://www.amazon.com/Star-Wars-Screenplays-Laurent-Bouzereau/dp/0345409817
How do you think this project relates to Ought? Seems like the projects share a basic objective (having AI predict human thoughts had in the course of solving a task). Ought has more detailed proposals for how the thoughts are being used to solve the task (in terms of e.g. factoring a problem into smaller problems, so that the internal thoughts are a load-bearing part of the computation rather than an annotation that is predicted but not checked for being relevant).
So we are taking one of the outputs that current AIs seem to have learned best to design, and taking one of the places where human thoughts about how to design it seem most accessible, and trying to produce a dataset which the current or next generation of text predictors might be able to use to learn how to predict thoughts about designing their outputs and not just predict the outputs themselves.
As the proposal stands it seems like the AI's predictions of human thoughts would offer no relevant information about how the AI is predicting the non-thought story content, since the AI could be predicting these different pieces of content through unrelated mechanisms.
This challenge seems incredibly intriguing and well put-together, and I certainly admire how a million dollars is being used to improve DnD-specific AI!
I believe a small team (3-4) of dedicated writers, coordinating with each other online, has a genuine shot at writing a quality Story Arc quickly and bagging $20,000, to be split proportionally to work. I have ideas about how to streamline the process of multiple people working on one single Annotated Dungeon Run, and think we can really expedite this process in a number of significant ways. If interested, please contact me through my account on LessWrong; we can swap drafts, talk about writing commitments, and get a good feel for fit before committing to anything.
Also, to those with the enterprise to attempt to scale the project up: I believe I can find around 10-20 people (given enough notice) with considerable storytelling ability who are willing to write annotated dungeon runs as a part-time occupation. I have unique access to exclusive TTRPG internet communities and artsy university students looking for casual work over the summer, and I would be happy to find and direct them to you. If you want to work out a deal, also message me on LessWrong.
I have some ideas about how to scale up and expedite this project, and am happy to help bounce ideas around.
For anyone who may have the executive function to go for the $1M, I propose myself as a cheap author if I get to play the dungeon master role, or the player role, but not if I have to do both. I recommend taking me for the dungeon master role. This sounds genuinely fun to me. I would happily do it for a dollar per step.
I can also help think about how to scale the operation, but I don’t think I have the executive function, management experience, or slack to pull it off myself.
I am Ronny Fernandez. You can contact me on fb.
I'm setting up a place for writers and organizers to find each other, collaborate, and discuss this; please join the Discord. More details in this comment.
I similarly offer myself as an author, in either the dungeon master or player role. I could possibly get involved in the management or technical side of things, but would likely not be effective in heading a project (for similar reasons to Brangus), and do not have practical experience in machine learning.
I am best reached through direct message or comment reply here on LessWrong, and can provide other contact information if someone wants to work with me.
I think this is an interesting project, and one that (from a very different angle) I’ve spent a bit of time on, so here are a few notes on that, followed by a few suggestions. Stella, in another comment, made several great points that I agree with and that are similar in spirit to my suggestions.
Anyway, based on a fairly similar motivation of wanting to be able to “ask a LM what it’s actually thinking/expecting”, combined with the general tendency to want to do the simplest and cheapest thing possible first… and then try to make it even simpler still before starting… we’ve experimented with including metadata in language pretraining data. Most large language datasets have this information, e.g. books have titles and (maybe) blurbs, websites have titles, URLs, and (maybe) associated subreddit links, etc. This data is obviously much noisier and lower quality than what you get from paying people for annotations, but it’s voluminous, diverse, and ~free.
When inserting this metadata for pretraining, we made sure to do so completely randomly, i.e. a book title might be inserted anywhere within a book (maybe several times in different context windows etc). We added separate <META_START>...
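(To illustrate the idea, here is a minimal sketch; the marker-token names and the insertion scheme are placeholders of mine, not necessarily the ones used in that experiment.)

```python
import random

# Hypothetical marker tokens; the actual token names in the experiment may differ.
META_START, META_END = "<META_START>", "<META_END>"

def insert_metadata(text: str, metadata: str, rng: random.Random) -> str:
    """Insert metadata (e.g. a title, URL, or blurb) at a random word boundary,
    wrapped in marker tokens, before the text is chunked for pretraining."""
    words = text.split()
    pos = rng.randrange(len(words) + 1)
    return " ".join(words[:pos] + [META_START, metadata, META_END] + words[pos:])

rng = random.Random(0)
print(insert_metadata("once upon a time in a dungeon far away ...", "Title: The Dungeon", rng))
```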
I initially tried doing post-hoc annotation and found it much more difficult than thinking my own actual thoughts, putting them down, and writing the prompt that resulted. Most of the work is in writing the thoughts, not the prompts, so adding pregenerated prompts at expense of making the thoughts more difficult is a loss.
Hey, I wanted to clarify my thoughts on the concrete AI problem that is being solved here. No comment on the fantastic grant making/give-away scheme.
I don't have much expertise on the mechanisms of GPT-3-style systems, but I wonder if there is a more efficient way of providing human-comprehensible intermediaries that expose the workings of the algorithm.
My worry is that many of the annotated thoughts imputed by authors are irrelevant to the actual design process the AI goes through to create its output. Asking the machine to produce a line of 'thoughts' alongside its final statement is fair play, although this doesn't seem to solve the problem of creating human-comprehensible intermediaries, but instead gives the AI a pattern-matching/prediction task similar to what it goes through to create the original output. Wouldn't it be the case that the 'thoughts' the machine creates have no more effect on the process of calculation than the original output (prompt)?
This process still seems to serve a rudimentary function of indirectly shedding more light on the processes of calculation, much the same as a bigger prompt would. Yet puzzlingly, we in fact want to "get differe...
I think you're essentially correct - but if I understand you, what you're suggesting is similar to Chris Olah et al's Circuits work (mentioned above in the paragraph starting "This sort of interpretability is distinct..."). If you have a viable approach aiming at that kind of transparency, many people will be eager to provide whatever resources are necessary.
This is being proposed as something different, and almost certainly easier.
One specific thought:
but my intuition suggests this would limit the complexity of the prompt by shackling its creation to an unnecessary component, the thought
To the extent that this is correct, it's more of a feature than a bug. You'd want the thoughts to narrow the probability distribution over outputs. However, I don't think it's quite right: the output can still have just as much complexity; the thoughts only serve to focus that complexity.
E.g. consider [This will be a realist novel about 15th century France] vs [This will be a surrealist space opera]. An output corresponding to either can be similarly complex.
We have now received the first partial run that meets our quality bar. The run was submitted by LessWrong user Vanilla_cabs. Vanilla's team is still expanding the run (and will probably fix some typos, etc. later), but I'm providing a copy of it here with Vanilla's permission, to give others an example of the kind of thing we're looking for:
https://docs.google.com/document/d/1Wsh8L--jtJ6y9ZB35mEbzVZ8lJN6UDd6oiF0_Bta8vM/edit
Vanilla's run is currently 266 steps long. Per the Visible Thoughts Project FAQ, we're willing to pay authors $20 / step for partial runs that meet our quality bar (up to at least the first 5,000 total steps we're sent), so the partial run here will receive $5320 from the prize pool (though the final version will presumably be much longer and receive more; we expect a completed run to be about 1000 steps).
Vanilla_cabs is open to doing paid consultation for anyone who's working on this project. So if you want feedback from someone who understands our quality bar and can demonstrably pass it, contact Vanilla_cabs via their LessWrong profile.
I can't tell if it is purposeful that this is set up in an adversarial, winner-take-all kind of way. It's really off-putting to me, and seems to encourage everyone being out for themselves rather than collaborating, particularly for such an inherently collaborative product. Maybe Nate and Eliezer just expect cooperation to fail?
Anyways, if people DO want to attempt some kind of collaboration... EDIT- Don't join my Facebook group, join plex's Discord linked in the comment below instead
We pay out $20,000 per run for the first 10 runs, as quality runs are received, not necessarily all to one group. If more than one group demonstrates the ability to scale, we might ask more than one group to contribute to the $1M 100-run dataset. Them cooperating with each other would hardly be a problem. That said, a lot of the purpose of the 10-run trial is exactly to locate executives or groups that can scale - and maybe be employed by us again, after the prize ends - so everybody getting together to produce the first 10 runs, and then disbanding, in a process that doesn't scale to produce 100 runs, is not quite what we are hoping for here!
It seems to me that their priority is to find a pipeline that scales. Scaling competitions are frequently long-tailed, which makes them winner-take-all. A winner-take-all system has the bonus benefit of centralized control: they only have to talk to a small number of people. Working through a single distributor is easier than wrangling a hundred different authors directly.
I am very excited about finding scalable ways to collect large volumes of high-quality data on weird, specific tasks. This seems very robustly useful for alignment, and not something we're currently that good at. I'm a bit less convinced that this task itself is particularly useful.
Have you reached out to e.g. https://www.surgehq.ai/ or another one of the companies that does human-data-generation-as-a-service?
Random small note - the 'dungeon' theme is slightly... culturally off-putting? or something, for me, as someone who's never been into this kind of thing or played any of these, and is therefore a bit confused about what exactly this involves, and has vague negative associations (I guess because dungeons sound unpleasant?). I wonder if something a bit blander, like a story, play, or AI assistant setting, could be better?
IDEAS THREAD:
Team up with friends who already play DnD or write glowfic. Less scalable but can grab the $20k.
Similarly, if you're unemployed/ have lots of free time just sit down and write it yourself.
Recruit from a local University. This can be very scalable if you e.g. know the creative writing professor.
Recruit from roleplaying groups or online roleplaying forums. Requires a bit more filtering than the above.
Recruit from Fiverr or similar. Requires lots of initial filtering but can end up with a low price. Create a series of increasingly less automated tasks as a filter (e.g. start with a multiple-choice quiz that's automatically graded).
Ask a person who already does this kind of thing how they would go about it.
I don't want to name names publicly here, but post on BR, or talk to MR to use his team.
Use the volunteers who are posting here.
Show this post to a whole bunch of people who you think might want to grab the $20k as individuals. Let them know that if enough of them make the $20k thing that you will all team up to churn out the $1m thing, split proportionally.
My questions are mostly about the player side, and about how deeply the DM should model the player:
When studying the provided 30-page thought-annotated sample, I thought about the <Yo be real> command a little more. In my opinion it should be applied in the training data a little differently than how it's done. Here are my thoughts:
In the sample, there are some places where the authors carefully tried to construct “AI nonsense” that matches what we regularly see in current-tech AI dungeon prompts. The player then responds with “<Yo be real>” plus some explanation of what the AI did wrong.
(obvious example: page 17 in this sample: https://docs.google.com/document/d/1PosMUaminpsR6_czFXBBlCrzMrsDGomajgLp6Y7q4Yw/edit)
This is probably intended for re-training the current models to accept such “<Yo be real>” sudo commands and deal with them correctly. You can’t include those (in a sensible way) in training data without having the DM first make mistakes.
I see a problem here, though:
A neural network learns every reoccurring feature of the training data set. If the training data often contains erroneous thoughts leading to nonsense prompts, this is what the AI will learn. You probably don’t want that. You want a model that makes such mistakes as rarely as possi...
I find this project very interesting and thought a lot about it in the last 2 weeks. The way I understand the main goal of the project is the following:
Not sure if it was suggested already or not, but one option is to look for “let's play” style videos for some game (gonna be hard to find one that's simple enough, probably) and take the spoken text the YouTuber says as thoughts. Some of them already have the transcript as subtitles.
In the same vein, look for people who explain their choices in games with very clear decisions, like chess. I once saw a booklet of chess games where the actual player explained most of his moves. If there is a way to get many of those, that might work.
The problem with commentary not made by the players themselves is that, as far as I understand it, the project wants the general thoughts of the player and not just the motivations for every specific move. Like, ideally, they want some stream of consciousness commentary style "oh look, that guy looks kind of tough, I'll go see if I can agro him. Oh no! he's way too strong... lets go hide behind this tree it looks kinda safe [...]". That's why I suggested the lets plays and not e-sports in general.
If they're ok with just noise-free motivational analysis, anything with good commentators might work, and chess is indeed a pretty clean option.
I've set up a Discord server for discussing collaborations and thinking about mechanism design for sharing out credit (current top idea is borrowing Rob Miles's Discord eigenkarma system with modifications, but liable to change), please join if you're considering becoming a run author (no commitment to being part of this effort).
I don't need the money and won't be skimming off any funds for my contributions to the project, but am very open to people turning up with a bunch of great ideas and making everything work smoother and taking a management fee as compensation, so please also join if you're interested in becoming a project leader or organizational assistant.
Practical question: Can the DM and players switch roles in the course of one "run" or does the DM have to remain the same individual? What else has to be continuous or uniform about the run? Does there have to be one overarching plot or just continuous narrative?
My coauthor and myself generated the sample run by taking turns on Action, Thought, Prompt. That is, I wrote an Action, she wrote a Thought, I wrote a Prompt, she wrote an Action, I wrote a Thought, she wrote a Prompt. This also helped show up immediately when a Thought underspecified a Prompt, because it meant the Thought and Prompt were never written by the same person.
More coherent overall plot is better - that current systems are terrible at this is all the more reason to try to show a dataset of it being done better. There doesn't necessarily need to be an advance-planned endpoint which gets foreshadowed; that is demanding a bit much of the author when they're dealing with somebody else's Actions or when people are taking turns on the Thoughts.
I have an idea for testing this approach before getting authors to write tens of thousands of pages of annotated dungeon runs.
It's hard to generate explanations of prose, but easy, for a computer, to generate explanations of particular subsets of math. For example, WolframAlpha can explain its reasoning for finding the derivative of a polynomial (click "step by step solution", then "show all steps"): Wolfram Alpha derivative example
There's a wide variety of math problems which we can programmatically solve, and can therefore programmatically generate explanations for:
(Actually, most of these are probably too hard to learn. Should focus on the really simple ones like long division.)
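(As a toy illustration of what a programmatically generated explanation for one of these simple problems might look like, here is a sketch for long division; the wording of the steps is my own invention.)

```python
# Toy generator of step-by-step "explanations" for long division.

def long_division_steps(dividend: int, divisor: int) -> str:
    """Return a natural-language trace of each step of the long division."""
    steps, remainder, quotient_digits = [], 0, []
    for digit in str(dividend):
        remainder = remainder * 10 + int(digit)
        q = remainder // divisor
        steps.append(
            f"Bring down {digit} to make {remainder}; "
            f"{divisor} goes into {remainder} {q} time(s), "
            f"leaving {remainder - q * divisor}."
        )
        quotient_digits.append(str(q))
        remainder -= q * divisor
    steps.append(f"Quotient: {int(''.join(quotient_digits))}, remainder: {remainder}.")
    return "\n".join(steps)

print(long_division_steps(7315, 6))  # quotient 1219, remainder 1
```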
The idea is to:
(cross-posting this comment from E. S. Yudkowsky's Facebook with some edits / elaboration)
Has anyone tried fine-tuning a transformer on small datasets of increasing size to get a sense of how large a dataset would be needed to do this well? I suspect it might have to be very large.
Note this is similar to the "self-explaining AI" idea I explored in early 2020, which I threw together a paper on (I am hesitant to link to it because it's not that great a paper and much of the discussion there is CNN-specific, but here it is). I can see how producing "thoughts" could help us trust/determine how much a model really understands what's going on or how to make a good story.
However I also could see the "thoughts" output misleading people - people might mistake the model's explanations as mapping onto the calculations going on inside the model to produce an output. The way GPT-3 works, I suspect, is very far from how humans think. GPT-3 is very bad at a lot of common sense and physics-based reasoning, for instance, yet based on the thoughts output the user might be misled into thinking the model understands common sense notions or physics since it's spouting off a version of some stuff it...
We're guessing 1000 steps per reasonably-completed run (more or less, doesn't have to be exact) and guessing maybe 300 words per step, mostly 'thought'. Where 'thoughts' can be relatively stream-of-consciousness once accustomed (we hope) and the dungeon run doesn't have to be Hugo quality in its plotting, so it's not like we're asking for a 300,000-word edited novel.
However I also could see the "thoughts" output misleading people - people might mistake the model's explanations as mapping onto the calculations going on inside the model to produce an output.
I think the key point on avoiding this is the intervening-on-the-thoughts part:
"An AI produces thoughts as visible intermediates on the way to story text, allowing us to watch the AI think about how to design its output, and to verify that we can get different sensible outputs by intervening on the thoughts".
So the idea is that you train things in such a way that the thoughts do map onto the calculations going on inside the model.
Came across this today on r/mlscaling and thought I'd put it here since it's relevant: https://arxiv.org/abs/2201.11903#google
This paper explores the ability of language models to generate a coherent chain of thought—a series of short sentences that mimic the reasoning process a person might have when responding to a question. Experiments show that inducing a chain of thought via prompting can enable sufficiently large language models to better perform reasoning tasks that otherwise have flat scaling curves.
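(For concreteness, here is roughly what such a chain-of-thought prompt looks like, adapted from the paper's running arithmetic example as I recall it; the exact wording in the paper may differ.)

```python
# Few-shot chain-of-thought prompt: the exemplar's answer includes intermediate
# reasoning, and the model is expected to imitate that pattern on the new question.
cot_prompt = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
A:"""

# A sufficiently large model, given this prompt, tends to continue with the
# reasoning steps ("23 - 20 = 3; 3 + 6 = 9") before stating the final answer.
```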
This looks exciting! I wonder about the proposed training setup: If one model produces the thoughts, and another one takes those as input to the prompts, are we actually learning anything about the internal state of either model? What is the advantage (beyond scalability) of this training setup vs just using the second model to produce continuations conditional on thoughts?
I know next to nothing about AI, so please correct me if I'm wrong, but it seems like the thought process of a dungeon master is a difficult starting point, since they're balancing out multiple levels of considerations. They're simulating a world, but also trying to shape a story (plus modelling the players & writing decent prose). The data would seem to be simpler to understand if you're dealing with a pure simulationist DM, or player who's 100% roleplayer (or munchkin), as the chains of reasoning would be focused on maximizing a single clear metric.
I ask because if so, some of those options might be also be easier to produce than a true AI Dungeon run.
I think there are a lot of amazing people in the roleplaying games community who could help meet this project's goals. That said, I'm worried this document would be hard to understand for most of that community, which doesn't overlap that much with the AI community. I'd suggest rephrasing the ask in plain English.
"We're looking to pay dungeon masters to submit transcripts of games with a documentation of their thought process, so we can train algorithms to think the way dungeon masters do. Here's the format we need them in and steps for how to apply, and how much you can get paid".
A possible way to scale it: "collaborative fanfic dungeons":
Can we apply for consultation as a team of two? Of the resources you are offering, we only want remote consultation, because we are not based in the Bay Area.
I appreciate this post, though mostly secondhand. It's special to me because it provided me with a way to participate more-or-less directly in an alignment project: one of my glowfic buddies decided to rope me in to write a glowfic thread in this format for the project [here](https://glowfic.com/posts/5726). I'd like to hear more updates about how it's gone in the last year, though!
It seems to me like this should be pretty easy to do and I'm disappointed there hasn't been more action on it yet. Things I'd try:
- reach out to various human-data-as-a-service companies like SurgeHQ, Scale, Samasource
- look for people on upwork
- find people who write fiction on the internet (e.g. post on fanfiction forums) and offer to pay them to annotate their existing stories (not a dungeon run exactly, but I don't see why the dungeon setting is important)
I'd be interested to hear if anyone has tried these things and run into roadblocks.
I'm also ...
Related work:
Show Your Work: Scratchpads for Intermediate Computation with Language Models
https://arxiv.org/abs/2112.00114
(from very surface-level perusal) Prompting the model resulted in
1) Model outputting intermediate thinking "steps"
2) Capability gain
I'm just "importing" my twitter thread and adding some additional thoughts.
If some model could spit out 100 of these annotated adventures, then the challenge would have already been solved.
Not sure about that 300,000-word document idea, though... A word-dump-focused "result" plays into the strengths of LLMs while providing none of the structure that is missing.
The more I work on this, the more I think you want something different. Perhaps use existing choose your own adventure books as a starting point, and work on deconstructing them; expanding...
I think the bounty amount is too low to attract skilled writers. The rate of ~3.3 cents/word is substantially less than the 6-10 cents per word most publications pay. Though it is stated in this post that a run "does not need to be published-novel-quality literature", this project is sufficiently weird that I'd imagine most skilled writers would rather write traditional short fiction, especially when considering that this project wouldn't move writers towards either career development or their passions.
Are you familiar with forum rps (role play)s? A group of people would collectively write a story, each playing the role of one character. (Looking it up now, looks like there are some Choose-Your-Own-Adventure variants akin to AI Dungeon) It was more popular back in the day, but it looks like there are some still extant.
These people are already doing something like what you're asking for, so it might be worth someone dropping in and offering to pay them in exchange for them taking copious notes.
Funnily enough, the first one I found when googlin...
I started with something more "contained" and easier to manage because actual users will go off script every chance they get, and this is basically like playing chess against yourself while reading a book on how to play chess. But, I may have found a kind of working compromise in terms of format and what needs to be captured. Will need a few days to see how it holds up, but right now, this is the basic idea:
Initial PROMPT to get the story started, followed by THOUGHTS that examine them from a gaming perspective, an ACTION, my THOUGHTS, another PROMPT...
Some naive thoughts in case useful:
A) Is the structured annotation format more useful than a gamemaster/writer thinking aloud while recording themselves (possibly with an audience)?
That could be the closest thing to a full transcript of the human process which downstream tasks could condense as needed. An adopted annotation format (prescribed or not) could potentially cause thoughts to be filtered, reinterpreted, or even steer human generation?
One key example against a fixed-format annotation, I think is that human gamemasters and writers do not spend appr...
It seems to me that the comments in code provide "visible thoughts" for what the programmer intends. What do you hope to learn from training language models on thought-annotated dungeons that you couldn't learn from language models that have already been trained on commented code?
silly idea: instead of thought-annotating AI dungeon plays, we could start by annotating thoughts for Akinator game runs.
pros: a much easier and faster way to build a dataset, with less ambiguity
cons: somewhat more restricted than the original idea.
As a photographer, I got excited at first by the inclusion of the word "visible", but I guess today is not my day. Is there any chance for me to participate in training ML models by collecting a dataset of photos? I'm in the process of relocating to Singapore, but getting a work visa takes a while so I have a lot of free time now.
I don't understand why showing the thinking of the DM/Author is important for this problem. To me it feels sufficient to show the thinking of the characters alone?
(Update Jan. 12: We released an FAQ last month, with more details. Last updated Jan. 7.)
(Update Jan. 19: We now have an example of a successful partial run, which you can use to inform how you do your runs. Details.)
We at MIRI are soliciting help with an AI-alignment project centered around building a dataset, described below. We have $200,000 in prizes for building the first fragments of the dataset, plus an additional $1M prize/budget for anyone who demonstrates the ability to build a larger dataset at scale.
If this project goes well, then it may be the first of a series of prizes we offer for various projects.
Below, I’ll say more about the project, and about the payouts and interim support we’re offering.
The Project
Hypothesis: Language models can be made more understandable (and perhaps also more capable, though this is not the goal) by training them to produce visible thoughts.
We’d like to test this hypothesis by fine-tuning/retraining a language model using a dataset composed of thought-annotated dungeon runs. (In the manner of AI dungeon.)
A normal (un-annotated) dungeon run is a sequence of steps in which the player inputs text actions and the dungeon master responds with text describing what happened in the world as a result.
We’d like a collection of such runs, that are annotated with "visible thoughts" (visible to potential operators or programmers of the system, not to players) describing things like what just happened or is about to happen in the world, what sorts of things the player is probably paying attention to, where the current sources of plot tension are, and so on — the sorts of things a human author would think while acting as a dungeon master. (This is distinct from producing thoughts explaining what happened in the dungeon; “visible thoughts” are meant to play an active role in constructing the output.)
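(To make the data format concrete, here is a minimal sketch of one step as structured data; the field names are shorthand inferred from the Action/Thought/Prompt terminology used elsewhere in this thread, not a prescribed schema.)

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    action: str    # the player's text input
    thoughts: str  # the author's visible thoughts: world state, plot tension, player attention, ...
    prompt: str    # the dungeon master's text describing what happened

@dataclass
class Run:
    steps: List[Step]  # ~1,000 steps for a reasonably complete run
```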
Once we have such a dataset, MIRI’s hope is that present or future technology will be able to train a model or models which iteratively produce visible thoughts along with storytelling, based on user actions plus previous history (including previous thoughts). The goal is to transition the state of AI dungeon technology from “An AI outputs story text in response to actions (and we have no idea how)” to “An AI produces thoughts as visible intermediates on the way to story text, allowing us to watch the AI think about how to design its output, and to verify that we can get different sensible outputs by intervening on the thoughts”.
Here’s an example of the first couple of steps of a thought-annotated dungeon run (or “quest”), in the format MIRI currently thinks is worth trying. Some kinds of thoughts are marked with parentheses and/or brackets; see the next section for details on this.
A difficult first step in testing the hypothesis above is generating a sufficiently large dataset (suitable for language model retraining) of thought-annotated dungeon runs. This likely requires at least a moderate degree of introspective and authorial skill from the people creating the dataset. See this sample of a partial run to get a further sense of what we are looking for. More detail on the type of thing we’re looking for can hopefully be inferred from that sample, though applicants will also have a chance to ask clarifying questions.
The project of producing this dataset is open starting immediately, in a hybrid prize/grant format. We will pay $20,000 per run for the first 10 completed runs that meet our quality standard (as decided unilaterally by Eliezer Yudkowsky or his designates), and $1M total for the first batch of 100 runs beyond that.
If we think your attempt is sufficiently promising, we’re willing to cover your expenses (e.g., the costs of paying the authors) upfront, and we may also be willing to compensate you for your time upfront. You’re welcome to write individual runs manually, though note that we’re most enthusiastic about finding solutions that scale well, and then scaling them. More details on the payout process can be found below.
The Machine Learning Experiment
In slightly more detail, the plan is as follows (where the $1.2M prizes/budgets are for help with part 1, and part 2 is what we plan to subsequently do with the dataset):
1. Collect a dataset of 10, then ~100 thought-annotated dungeon runs (each run a self-contained story arc) of ~1,000 steps each, where each step contains:
It’s unclear to us how much skill is required to produce this dataset. The authors likely need to be reasonably introspective about their own writing process, and willing to try things and make changes in response to initial feedback from the project leader and/or from MIRI.
A rough estimate is that a run of 1,000 steps is around 300k words of mostly thoughts, costing around 2 skilled author-months. (A dungeon run does not need to be published-novel-quality literature, only coherent in how the world responds to characters!) A guess as to the necessary database size is ~100 runs, for about 30M words and 20 author-years (though we may test first with fewer/shorter runs).
2. Retrain a large pretrained language model, like GPT-3 or T5
A reasonable guess is that performance more like GPT-3 than GPT-2 (at least) is needed to really make use of the thought-intermediates, but in lieu of a large pretrained language model we could plausibly attempt to train our own smaller one.
Our own initial idea for the ML architecture would be to retrain one mode of the model to take (some suffix window of) the history units and predict thoughts, by minimizing the log loss of the generated thought against the next thought in the run, and to retrain a second mode to take (some suffix window of) the history units plus one thought, and produce a prompt, by minimizing the log loss of the generated prompt against the next prompt in the run.
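(A minimal sketch of this two-mode setup, assuming a generic causal language model and a dataset of (history, thought, prompt) triples; this is an illustration of the idea, not MIRI's actual implementation, and the model name and string placeholders are stand-ins.)

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")              # stand-in for a larger model
thought_model = AutoModelForCausalLM.from_pretrained("gpt2")   # mode 1: history -> thought
prompt_model = AutoModelForCausalLM.from_pretrained("gpt2")    # mode 2: history + thought -> prompt

def continuation_loss(model, context: str, target: str) -> torch.Tensor:
    """Log loss of the target continuation given the context; context tokens
    are masked out of the loss with the label value -100."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    tgt_ids = tokenizer(target, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, tgt_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : ctx_ids.shape[1]] = -100  # only score the target tokens
    return model(input_ids=input_ids, labels=labels).loss

# One training step on a single (history, thought, prompt) triple.
history = "<suffix window of previous prompts, thoughts, and actions>"
thought = "<the author's next thought>"
prompt = "<the next prompt shown to the player>"

loss = continuation_loss(thought_model, history, thought) \
     + continuation_loss(prompt_model, history + "\n" + thought, prompt)
loss.backward()  # in practice: separate optimizers per mode, batching, many steps
```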
Imaginably, this could lead to the creation of dungeon runs that are qualitatively “more coherent” than those generated by existing methods. The primary goal, however, is that the thought-producing fragment of the system gives some qualitative access to the system’s internals that, e.g., allow an untrained observer to accurately predict the local developments of the story, and occasionally answer questions about why things in the story happened; or that, if we don’t like how the story developed, we can intervene on the thoughts and get a different story in a controllable way.
Motivation for this project
Many alignment proposals floating around in the community are based on AIs having human-interpretable thoughts in one form or another (e.g., in Hubinger’s survey article and in work by Christiano, by Olah, and by Leike). For example, this is implicit in the claim that humans will be able to inspect and understand the AI’s thought process well enough to detect early signs of deceptive behavior. Another class of alignment schemes is based on the AI’s thoughts being locally human-esque in some fashion that allows them to be trained against the thoughts of actual humans.
I (Nate) personally don’t have much hope in plans such as these, for a variety of reasons. However, that doesn’t stop Eliezer and me from wanting to rush ahead and start gathering empirical evidence about how possible it is in practice to get modern AI systems to factor their cognition through human-interpretable visible intermediates.
Modern AIs are notably good at crafting English text. Some are currently used to run dungeons (with modest success). If you wanted to look at the place where current AIs excel the most in crafting artifacts, among the artifacts they are best and most impressive at crafting are English paragraphs.
Furthermore, compared to many other things AIs have learned to do, if you consider the task of running a responsive text dungeon, it seems relatively possible to ask a (relatively unusually) introspective human author to write down their thoughts about how and why they would generate the next prompt from the user’s input.
So we are taking one of the outputs that current AIs seem to have learned best to design, and taking one of the places where human thoughts about how to design it seem most accessible, and trying to produce a dataset which the current or next generation of text predictors might be able to use to learn how to predict thoughts about designing their outputs and not just predict the outputs themselves.
This sort of interpretability is distinct from the sort of transparency work in something like Circuits (led by Chris Olah) — while Circuits is trying to “open the black box” of machine learning systems by directly looking at what is happening inside of them, the project proposed here is just attempting the less ambitious task of having black-box models output interpretable intermediates producing explanations for their behavior (but how such black box models might go about doing that internally is left unconstrained). The reason for our focus on this particular project of visible thoughts isn’t because we believe it to be better or more fruitful than Circuits-style transparency (we have said for years that Circuits-style research deserves all possible dollars that can be productively spent on it), but just because it’s a different approach where it might also be possible to push progress forward.
Note that proponents of alignment strategies that involve human-esque thoughts (such as those linked above) do not necessarily endorse this particular experiment as testing any of their key uncertainties or confusions. We welcome suggested tweaks to the experiment (in the comments of the version of this announcement as it occurs on LessWrong) from any such proponents, to render it a better test of your ideas. (Though even if it doesn’t sate your own curiosity, we expect to learn some things ourselves.)
The main thing this project needs is a dataset, so MIRI is starting on producing that dataset. It’s plausible to us that GPT-3 will prove wholly unable to make use of this dataset; even if GPT-3 can’t, perhaps GPT-4 or some other future system will be able to.
There are additional more general reasons to work on this project. Specifically, it seems to me (Nate) and to Eliezer that capacity to execute projects such as this one is the current limiting bottleneck on MIRI. By pursuing this project, we attempt to resolve that bottleneck.
We hope, through this process, to build our capacity to execute on a variety of projects — perhaps by succeeding at the stated objective of building a dataset, or perhaps by learning about what we’re doing wrong and moving on to better methods of acquiring executive talent. I’ll say more about this goal in “Motivation for the public appeal” below.
Notes on Closure
I (Nate) find it plausible that there are capabilities advances to be had from training language models on thought-annotated dungeon runs. Locally these might look like increased coherence of the overall narrative arc, increased maintenance of local story tension, and increased consistency in the described world-state over the course of the run. If successful, the idiom might generalize further; it would have to, in order to play a role in later alignment of AGI.
As a matter of policy, whenever a project like this has plausible capabilities implications, we think the correct response is to try doing it in-house and privately before doing it publicly — and, of course, only then when the alignment benefits outweigh the plausible capability boosts. In this case, we tried to execute this project in a closed way in mid-2021, but work was not proceeding fast enough. Given that slowness, and in light of others publishing related explorations and results, and in light of the relatively modest plausible capability gains, we are moving on relatively quickly past the attempt to do this privately, and are now attempting to do it publicly.
Motivation for the public appeal
I (Nate) don’t know of any plan for achieving a stellar future that I believe has much hope worth speaking of. I consider this one of our key bottlenecks. Offering prizes for small projects such as these doesn’t address that bottleneck directly, and I don’t want to imply that any such projects are going to be world-saving in their own right.
That said, I think an important secondary bottleneck is finding people with a rare combination of executive/leadership/management skill plus a specific kind of vision. While we don’t have any plans that I’m particularly hopeful about, we do have a handful of plans that contain at least a shred of hope, and that I’m enthusiastic about pursuing — partly in pursuit of those shreds of hope, and partly to build the sort of capacity that would let us take advantage of a miracle if we get one.
The specific type of vision we’re looking for is the type that’s compatible with the project at hand. For starters, Eliezer has a handful of ideas that seem to me worth pursuing, but for all of them to be pursued, we need people who can not only lead those projects themselves, but who can understand the hope-containing heart of the idea with relatively little Eliezer-interaction, and develop a vision around it that retains the shred of hope and doesn’t require constant interaction and course-correction on our part. (This is, as far as I can tell, a version of the Hard Problem of finding good founders, but with an additional constraint of filtering for people who have affinity for a particular project, rather than people who have affinity for some project of their own devising.)
We are experimenting with offering healthy bounties in hopes of finding people who have both the leadership/executive capacity needed, and an affinity for some ideas that seem to us to hold a shred of hope.
If you’re good at this, we’re likely to make you an employment offer.
The Payouts
Our total prize budget for this program is $1.2M. We intend to use it to find a person who can build the dataset in a way that scales, presumably by finding and coordinating a pool of sufficiently introspective writers. We would compensate them generously, and we would hope to continue working with that person on future projects (though this is not a requirement in order to receive the payout).
We will pay $20k per run for the first 10 thought-annotated runs that we accept. We are willing to support applicants in producing these runs by providing them with resources up-front, including small salaries and budgets for hiring writers. The up-front costs a participant incurs will be deducted from their prizes, if they receive prizes. An additional $1M then goes to anyone among the applicants who demonstrates the ability to scale their run-creating process to produce 100 runs. Our intent is for participants to use some of that money to produce the 100 runs, and keep the remainder as a prize. If multiple participants demonstrate similar abilities to scale at similar quality-levels and similar times, the money may be split between them. We plan to report prize awards publicly.
In principle, all you need to do to get paid for thought-annotated dungeon runs is send us runs that we like. If your run is one of the first 10 runs, or if you’re the first to provide a batch of 100, you get the corresponding payment.
That said, whether or not we decide to pay for a run is entirely and unilaterally up to Eliezer Yudkowsky or his delegates, and will depend on whether the run hits a minimum quality bar. Also, we are willing to pay out from the $1M prize/budget upon becoming convinced that you can scale your process, which may occur before you produce a full 100 runs. We therefore strongly recommend getting in contact with us and proactively making sure that you’re on the right track, before sinking large amounts of time and energy into this project. Our senior research staff are willing to spend time on initial conversations and occasional check-ins. For more information on our support resources and how to access them, refer to the support and application sections below.
Note that we may tune or refine the bounty in response to feedback in the first week after this post goes live.
Support
We intend to offer various types of support for people attempting this project, including an initial conversation; occasional check-ins; office space; limited operational support; and certain types of funding.
We currently expect to have (a limited number of) slots for initial conversations and weekly check-ins, along with (a limited amount of) office space and desks in Berkeley, California for people working on this project. We are willing to pay expenses, and to give more general compensation, in proportion to how promising we think your attempts are.
If you’d like to take advantage of these resources, follow the application process described below.
Application
You do not need to have sent us an application in order to get payouts, in principle. We will pay for any satisfactory run sent our way. That said, if you would like any of the support listed above (and we strongly recommend at least one check-in to get a better understanding of what counts as success), complete the following process:
If we think your application is sufficiently promising, we’ll schedule a 20 minute video call with some senior MIRI research staff and work from there.