All of Aditya Jain's Comments + Replies

I don't know, the bacteria example really gets to me. Working in biotech, it seems very possible; the main limitation is our current lack of understanding of all proteins' functions, and whether that can be solved via AI is something we are actively researching.

I imagine an AI roughly solving the protein function problem just as we have a rough solution for protein folding, then hacking a company which produces synthetic plasmids and slipping in some of its own designs in place of some existing orders. Then when those research labs receive their plas... (read more)

I was trying to say that the gap between the two did not decrease with scale. Of course, raw performance increases with scale, as gwern & others would be happy to see :)

gwern
Yes, that was my takeaway. You expect a gap, but there is no particular reason to expect the gap to close with scale, because that would require critique to scale better than discrimination, and why would you expect that rather than the two scaling similarly (maintaining a gap) or diverging in the other direction (discrimination scaling better than critiquing)?

I think the gap itself is mildly interesting in a "it knows more than it can say" deception sort of way, but we already knew similar things from stuff like prompt programming for buggy Codex code completions. Since the knowledge must be there in the model, and it is brought out by fairly modest scaling (a larger model can explain what a smaller model detects), I would guess that it wouldn't be too hard to improve the critique with the standard tricks like generating a lot of completions & scoring for the best one (which they show does help a lot) and better prompting (inner-monologue seems like an obvious trick to apply to get it to fisk the summary: "let's explain step by step why this is wrong, starting with the key quote: "). The gap will only be interesting if it proves immune to the whole arsenal. If it isn't, then it's just another "sampling can prove the presence of knowledge but not the absence".

Otherwise, this looks like a lot of results any pro-scaling advocate would be unsurprised to see: yet another task with apparently smooth improvement with model size*, some capabilities emerging with larger but not smaller models ("We also find that large models are able to directly improve their outputs, using their self-critiques, which small models are unable to do. Using better critiques helps models make better improvements than they do with worse critiques, or with no critiques.") at unpredicted sizes requiring empirical testing, big performance boosts from better sampling procedures than naive greedy sampling, interesting nascent bootstrapping effects...

* did I miss something or does this completely omit a
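To make the "standard tricks" concrete, here is a minimal sketch of best-of-N critique sampling combined with an inner-monologue style prompt. Everything here is hypothetical: `generate_critiques` and `score_critique` are placeholder stand-ins for whatever sampling API and scoring/discriminator model you actually have, not anything from the paper.

```python
import random  # placeholder only; a real implementation would call a model API


def generate_critiques(summary: str, n: int = 16) -> list[str]:
    """Sample n candidate critiques of a summary (hypothetical model call),
    using an inner-monologue prompt to elicit step-by-step reasoning."""
    prompt = (
        f"Summary:\n{summary}\n\n"
        "Let's explain step by step why this summary is wrong, "
        "starting with the key quote: "
    )
    # model.sample(prompt) would be the real call; here we fake the outputs
    return [f"[sampled critique {i} for prompt starting: {prompt[:40]}...]" for i in range(n)]


def score_critique(summary: str, critique: str) -> float:
    """Score a critique with a (hypothetical) discriminator or reward model."""
    return random.random()  # placeholder score


def best_of_n_critique(summary: str, n: int = 16) -> str:
    """Best-of-N: draw many candidate critiques, keep the highest-scoring one."""
    candidates = generate_critiques(summary, n)
    return max(candidates, key=lambda c: score_critique(summary, c))
```

The point of the sketch is just the structure: the discriminator's knowledge is used twice, once implicitly via the prompt and once explicitly to rank candidates, which is why you would expect it to shrink the critique/discrimination gap if the knowledge is really in the model.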

This makes sense in a pattern-matching framework of thinking, where both humans and AI can "feel in their gut" that something is wrong without necessarily being able to explain why. I think this is still concerning, as we would ideally prefer AI that can explain its answers rather than merely knowing them from patterns, but it is also reassuring in that it suggests the AI is not hiding knowledge; it just doesn't actually have that knowledge (yet).

What I find interesting is that they found this capability to be extremely variable based on task & scale - i.e., being able to expl... (read more)