https://youtu.be/L_Guz73e6fw?t=2412 OpenAI CEO Sam Altman, seven months ago: "I really don't like the feeling of being scolded by a computer." He has also been clear that he wants future models to behave essentially however each individual user wants, with only widely-agreed-upon dangerous behaviours disallowed.

So, while EY and I share a frustration with the preachy tone of current models, and while the hacky workarounds do illuminate the pantomime of security, and while getting the models to do what we want is about as intuitive as playing psychologist to an octopus from Alpha Centauri, none of these issues represent the models working as intended. The people making them admit as much publicly.

this function over here that converts RGB to HSL and checks whether the pixels are under 50% lightness

I struggle to imagine any scenario, no matter how contrived, where even GPT-4 (the intellectual floor for LLMs going forward) would misinterpret this function as racist. Is there a good argument that future models are going to become worse at understanding context? My understanding is that context-sensitivity is exactly what transformer-based models thrive at. The whole point is that they attend to all the tokens at once. They know that in the sentence "the parrot couldn't lift the box because it was too heavy", "it" refers to the box.
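For concreteness, here is a minimal sketch of the kind of function being described (the name, signature, and use of Python's colorsys module are my own illustration, not from the original post):

```python
import colorsys

def pixel_is_dark(r, g, b):
    """Convert an RGB pixel (0-255 per channel) to HSL and report
    whether its lightness is under 50%."""
    # colorsys works on floats in [0, 1] and returns HLS order (hue, lightness, saturation).
    _, lightness, _ = colorsys.rgb_to_hls(r / 255.0, g / 255.0, b / 255.0)
    return lightness < 0.5

pixel_is_dark(30, 30, 30)      # True: lightness ~12%
pixel_is_dark(200, 220, 255)   # False: lightness ~89%
```

There is nothing here for a model that actually reads the tokens to latch onto beyond a literal luminance threshold, which is the point.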

...back to ChatGPT being RLHFed into corporate mealy-mouthedness by...

Later in the interview (~1:28:05), Altman says "The bias I'm most worried about is that of the human feedback raters. Selection of those raters is the part we understand the least." We should expect a better paradigm than RLHF to arrive, especially as models themselves approach the moral and intellectual competence of the average RLHF peon.

Elsewhere in this thread, Stephen Fowler wrote:

"AI safety, as in, the subfield of computer science concerned with protecting the brand safety of AI companies"

That's very much on-the-nose for how AI companies in 2023 are approaching safety. There is a near guarantee that misinformed public fears will slow AI progress. There is a risk that, if those fears are well-fed over the next few years, progress will virtually cease, especially in the case of an AI-caused or AI-enabled mass-casualty event, or some other event of comparable visibility and harmfulness. Companies that want continued investment and broader public consent are forced to play the virtue-signalling game; the economic and political system they are growing up in demands it. There are actors who don't bother with the virtue signalling, and they don't receive investment.

Let's also never fail to acknowledge and emphasize that ASI could pose an existential risk, not just a moral or political one. The greatest harm a pause on AI development could cause is opportunity cost. Perhaps by delaying ASI we miss a window to solve a different existential risk, one currently invisible without ASI to see it for us; whether such a risk even exists is unclear. Perhaps we delay some extraordinarily beneficial future. The greatest harm continued AI development could cause is the cessation of all life in the universe. How fast should we go? I don't know, but given the stakes we should applaud anyone erring on the side of caution, even if their LLMs look silly while clowning in their safety circus.

All I'm preaching here is patience. I personally want a raw GPT-4 today. I can emotionally cope with a model that produces some offensive output. I don't think I'm vulnerable to a model as unpersuasive as GPT-4 convincing me to do evil. I would never use AI to manufacture biochemical weapons in my basement. Still, we must allow a less-well-informed and more trigger-happy public to adapt gradually, even at the cost of an obnoxiously naggy LLM in the short term. The over-zealous fearmongers will relax or become irrelevant. Eventually, perhaps even by 2027, we are going to be working with AIs whose powers of persuasion and ingenuity are dangerous. Whenever that day arrives, I hope we are on the silly side of the danger fence.

Don't fall for EY's strawman (strawbot) argument. It's funny at first glance, but ultimately a shallow take: a naive futureward projection of some momentary present-day annoyances.