All of sid's Comments + Replies

sid

Are there any plans to repeat this work using the larger models that now exist?

sid

Nice, exercises are a good idea, especially for bite-sized things like einsum. It could also give personalized feedback on your solutions to exercises from a textbook.

Randomized flashcards like you've described would be really really cool. I'm just dipping my toes in the water with having it generate normal flashcards. It has promise, but I'm not sure of the best way to do it yet. One thing I've tried is prompting it with a list of principles the flashcards ought to adhere to, and then having it say for each flashcard which of the principles that card exhi... (read more)
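For concreteness, the prompt I've been experimenting with is shaped roughly like this (call_model is just a placeholder for whichever model API you use, and the principles shown are examples, not my full list):

```python
# Rough sketch of the approach described above: give the model a list of
# principles the cards should follow, then ask it to say, for each card,
# which principles that card exhibits.
PRINCIPLES = [
    "1. Each card tests exactly one atomic fact or idea.",
    "2. The question side is unambiguous: only one reasonable answer.",
    "3. Cards are phrased in your own words, not copied verbatim.",
]

def make_flashcard_prompt(source_text: str) -> str:
    principles = "\n".join(PRINCIPLES)
    return (
        "Write spaced-repetition flashcards for the text below.\n"
        f"Follow these principles:\n{principles}\n\n"
        "After each card, list which of the numbered principles the card "
        "exhibits, so I can spot-check your reasoning.\n\n"
        f"Text:\n{source_text}"
    )

def call_model(prompt: str) -> str:
    # Placeholder: swap in an actual chat-completion call here.
    raise NotImplementedError

# cards = call_model(make_flashcard_prompt(chapter_text))
```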

sid

Can you point to a particular one? I've read Player of Games but I don't think it's relevant.

Dagon
None of them are directly written for this aspect, but Player of Games is on the lower end of human/AI interaction content.  If the universe didn't capture your attention, it may not be worth trying again, but I'll recommend Surface Detail as one that explores the interaction between AIs and humans, and between sim and "real" experiences.
Answer by sid

Worth the Candle by Alexander Wales

Answer by sid

Permutation City by Greg Egan has humans living in a simulation.

Answer by sid

Diaspora by Greg Egan features human-like beings living in a virtual world, similar to the digital people described here.

sid

In the RSA-2048 example, why is it infeasible for the judge to verify every one of the honest player's arguments? (I see why it's infeasible for the judge to check every one of the dishonest player's arguments.)

sid

I was trying to get a clearer picture of how training works in debate so I wrote out the following. It is my guess based on reading the paper, so parts of it could be incorrect (corrections are welcome!), but perhaps it could be helpful to others.

My question was: is the training process model-free or model-based? After looking into it more and writing this up, I'm convinced it's model-based, but I think maybe either could work? (I'd be interested if anyone has a take on that.)

In the model-free case, I think it would not be trained like AlphaGo Zero, but in... (read more)

sid

Do you think it's possible we end up in a world where we're mostly building AIs by fine-tuning powerful base models that are already situationally aware? In this world we'd be skipping right to phase 2 of training (at least on the particular task), thereby losing any of the alignment benefits that are to be gained from phase 1 (at least on the particular task).

Concretely, suppose that GPT-N (N > 3) is situationally aware, and we are fine-tuning it to take actions that maximize nominal GDP. It knows from the get-go that printing loads of money is the bes... (read more)

sid

I think the date is Dec 17 per the Facebook event?

jefftk
That's correct; it's the 17th. Dec 19 would be a Monday. [Taymon has since edited the above]
sid

On page 8 of the paper they say, "our work does not demonstrate or address mesa-optimization". I think it's because none of the agents in their paper has learned an optimization process (i.e. is running something like a search algorithm on the inside).

Lauro Langosco
FWIW I believe I wrote that sentence and I now think this is a matter of definition, and that it’s actually reasonable to think of an agent that e.g. reliably solves a maze as an optimizer even if it does not use explicit search internally.
sid

It says that the first head predicts the next observation. Does this mean that that head is first predicting what action the network itself is going to make, and then predicting the state that will ensue after that action is taken?

(And I guess this means that the action is likely getting determined in the shared portion of the network—not in either of the heads, since they both use the action info—and that the second head would likely just be translating the model's internal representation of the action to whatever output format is needed.)
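To pin down what I mean, here's a rough sketch of the shape I'm imagining (module names and sizes are made up; this is just my guess at the structure, not something from the post):

```python
# Sketch of my reading of the two-headed setup: the shared trunk processes
# the observation and (on my guess) settles on an internal representation
# of the chosen action; the first head predicts the next observation using
# that shared representation, and the second head just translates it into
# the required action output format.
import torch
import torch.nn as nn

class TwoHeadedModel(nn.Module):
    def __init__(self, obs_dim: int = 128, hidden_dim: int = 256, action_dim: int = 16):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # First head: prediction of the next observation.
        self.prediction_head = nn.Linear(hidden_dim, obs_dim)
        # Second head: action output, read off the same shared features.
        self.action_head = nn.Linear(hidden_dim, action_dim)

    def forward(self, obs: torch.Tensor):
        features = self.trunk(obs)  # action info, if any, lives in here
        next_obs_pred = self.prediction_head(features)
        action_logits = self.action_head(features)
        return next_obs_pred, action_logits
```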

Richard_Ngo
Good question. I imagine the first head mostly being trained on existing data (e.g. text, videos) but then when it comes to data gathered by the network itself, my default story is that it'd be trained to output predictions conditional on actions, so that it's not duplicating the learning done by the action head. But this is all fairly speculative and either seems reasonable.
sid

If the predictor AI is in fact imitating what humans would do, why wouldn’t it throw its hands up at an actuator sequence that is too complicated for humans—isn’t that what humans would do? (I'm referring to the protect-the-diamond framing here.)

paulfchristiano
As described in the report, it would say "I'm not sure" when the human wasn't sure (unless you penalized that). That said, often a human who looks at a sequence of actions would say "almost certainly the diamond is there." They might change their answer if you also told them "by the way these actions came from a powerful adversary trying to get you to think the diamond is there." What exactly the reporter says will depend on some details of e.g. how the reporter reasons about provenance. But the main point is that in no case do you get useful information about examples where a human (with AI assistants) couldn't have figured out what was happening on their own.
sid

There is some point at which it’s gaining a given capability for the first time though, right? In earlier training stages I would expect the output to be gobbledygook, and then at some point it starts spelling out actual words. (I realize I’m conflating parameters and training compute, but I would expect a model with few enough parameters to output gobbledygook even when fully trained.)

So my read of the de-noising argument is that at current scaling margins we shouldn’t expect new capabilities—is that correct? Part of the evidence being that GPT-3 doesn’t ... (read more)

nostalgebraist
Not quite. If you define some capability in a binary yes-no way, where it either "has it" or "doesn't have it" -- then yes, there are models that "have it" and those that "don't," and there is some scale where models start "having it." But this apparent "switch flip" is almost always an artifact of the map, not a part of the territory.

Suppose we operationalize "having the capability" as "scoring better than chance on some test of the capability." What we'll find is that models smoothly move from doing no better than chance, to doing 1% better than chance, to doing 2% better... (numbers are meant qualitatively). If we want, we can point at the model that does 1% better than chance and say "it got the capability right here," but (A) this model doesn't have the capability in any colloquial sense of the term, and (B) if we looked harder, we could probably find an intermediate model that does 0.5% better, or 0.25%...

(By the time the model does better than chance at something in a way that is noticeable to a human, it will typically have been undergoing a smooth, continuous increase in performance for a while already.)
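To make that concrete, here's a toy version of the argument in code (the numbers and the curve are entirely made up):

```python
# Toy illustration: accuracy on a 4-way multiple-choice task (chance = 0.25)
# improves smoothly with scale, but a binary "has the capability" cutoff
# turns the smooth curve into an apparent switch flip.
import math

CHANCE = 0.25

def accuracy(log10_params: float) -> float:
    # Smooth, sigmoid-shaped improvement with scale (illustrative only).
    return CHANCE + (1 - CHANCE) / (1 + math.exp(-1.5 * (log10_params - 10)))

CUTOFF = CHANCE + 0.01  # operationalization: "does at least 1% better than chance"

for log10_params in range(6, 13):
    acc = accuracy(log10_params)
    print(f"1e{log10_params} params: accuracy = {acc:.3f}, "
          f"'has the capability' = {acc > CUTOFF}")

# The printed accuracies rise smoothly, yet the boolean flips from False to
# True at one particular scale -- and tightening the cutoff (0.5% better,
# 0.25% better, ...) moves that flip to a smaller model.
```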