ACCount - LessWrong

Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study

Thoughts on AI 2027

I'm saying that "1% of population" is simply not a number that can be reliably resolved by a self-reporting survey. It's below the survey noise floor.

I could make a survey asking people whether they're lab grown flesh automaton replicants, and get over 1% of "yes" on that. But that wouldn't be indicative of there being a real flesh automaton population of over 3 million in the US alone.

Thoughts on AI 2027

ACCount23d52

1.5% is way below the dreaded Lizardman's Constant.

I don't doubt that there will be some people who are genuinely concerned with AI personhood. But such people already exist today. And the public views them about the same as shrimp rights activists.

Hell, shrimp welfare activists might be viewed more generously.

Thoughts on AI 2027

ACCount24d72

I think it's plausible that at this point a bunch of the public thinks AIs are people who deserve to be released and given rights.

So far, the general public has resisted the idea very strongly.

Science fiction has a lot of "if it thinks like a person and feels like a person, then it's a person" - but we already have AIs that can talk like people and act like they have feelings. And yet, the world doesn't seem to be in a hurry to reenact that particular sci-fi cliche. The attitudes are dismissive at best.

Even with the recent Anthropic papers being out there for everyone to see, an awful lot of people are still huffing down the copium of "they can't actually think", "it's just a bunch of statistics" and "autocomplete 2.0". And those are often the people who at least give a shit about AI advances. With that, expecting the public (as in: over 1% of population) to start thinking seriously about AI personhood without a decade worth of both AI advanced and societal change is just unrealistic, IMO.

This is also a part of the reason why not!OpenAI has negative approval in the story for so long. The room so far reads less like "machines need human rights" and more like "machines need to hang from trees". Just continue this line into the future - and by the time the actual technological unemployment starts to bite, you'd have crowds of neoluddites with actual real life pitchforks trying to gather outside the not!OpenAI's office complex on any day of the week that ends with "y".

ACCount's Shortform

ACCount1mo10

Is it time to start training AI in governance and policy-making?

There are numerous allegations of politicians using AI systems - including to draft legislation, and to make decisions that affect millions of people. Hard to verify, but it seems likely that:

AIs are already used like this occasionally
This is going to become more common in the future
Telling politicians "using AI for policy-making is a really bad idea" isn't going to stop it completely
Training AI to hard-refuse queries like this may also fail to stop this completely

Training an AI to make more sensible and less harmful policies, even when prompted in a semi-adversarial fashion (i.e. "help me implement my really bad idea"), isn't going to be anywhere near as easy as training it to make less coding mistakes. It's an informal field, with no compiler or unit tests to be the source of ground truth. Politics are also notorious for eroding the quality of human decision-making, and using human feedback is perilous because a lot of human experts disagree strongly on matters of governance and policy.

But the consequences of a major policy fuckup can outclass that of a coding mistake by far. So this might be worth doing now, for the sake of reducing future harm if nothing else.

avturchin's Shortform

ACCount1mo10

Is the same true for GPT-4o then, which could spot Claude's hallucinations?

Might be worth testing a few open source models with better known training processes.

avturchin's Shortform

ACCount1mo10

This is way more metacognitive skill than what I would have expected an LLM to have. I can make sense of how an LLM would be able to do that, but only in retrospect.

And if a modern high end LLM already knows on some level and recognizes its own uncertainty? Could you design a fine tuning pipeline to reduce hallucination level based on that? At least for reasoning models, if not for all of them?

Auditing language models for hidden objectives

ACCount2moΩ596

What stood out to me was just how dependent a lot of this was on the training data. Feels like if an AI manages to gain misaligned hidden behaviors during RL stages instead, a lot of this might unravel.

The trick with invoking a "user" persona to make the AI scrutinize itself and reveal its hidden agenda is incredibly fucking amusing. And potentially really really useful? I've been thinking about using this kind of thing in fine-tuning for fine control over AI behavior (specifically "critic/teacher" subpersonas for learning from mistakes in a more natural way), but this is giving me even more ideas.

Can the "subpersona" method be expanded upon? What if we use training data, and possibly a helping of RL, to introduce AI subpersonas with desirable alignment-relevant characteristics on purpose?

Induce a subpersona of HONESTBOT, which never lies and always tells the truth, including about itself and its behaviors. Induce a subpersona of SCRUTINIZER, which can access the thoughts of an AI, and will use this to hunt down and investigate the causes of an AI's deceptive and undesirable behaviors.

Don't invoke those personas during most of the training process - to guard them from as many misalignment-inducing pressures as possible - but invoke them afterwards, to vibe check the AI.

So how well is Claude playing Pokémon?

ACCount2mo30

Makes sense. With pretraining data being what it is, there are things LLMs are incredibly well equipped to do - like recalling a lot of trivia or pretending to be different kinds of people. And then there are things LLMs aren't equipped to do at all - like doing math, or spotting and calling out their own mistakes.

This task, highly agentic and taxing on executive function? It's the latter.

Keep in mind though: we already know that specialized training can compensate for those "innate" LLM deficiencies.

Reinforcement learning is already used to improve LLM math abilities, and a mix of synthetic data and reinforcement learning was what was used to get the current reasoning models. Which just so happened to give those LLMs the inclination to check themselves for mistakes.

I wonder - what are the low-hanging fruits here? How much of an improvement could be obtained with a very simple and crude training regime designed specifically to improve agentic behavior?

A Bear Case: My Predictions Regarding AI Progress

ACCount2mo4525

The more mainstream you go, the larger this effect gets. A lot of people seemingly want AI to be a nothingburger.

When LLMs emerged, in mainstream circles, you'd see people go "it's not important, it's not actually intelligent, you can see it make the kind of reasoning mistakes a 3 year old would".

Meanwhile, on LessWrong: "holy shit, this is a big fucking deal, because it's already making the same kind of reasoning mistakes a human three year old would!"

I'd say that LessWrong is far better calibrated.

People who weren't familiar with programming or AI didn't have a grasp of how hard natural language processing or commonsense reasoning used to be for machines. Nor do they grasp the implications of scaling laws.

LESSWRONG
LW

Posts

Wikitag Contributions

Comments