All of Max_He-Ho's Comments + Replies

If you have multiple AI agents in mind here, then yes, though it's not the focus. Otherwise, also yes, though depending on what you mean by "multiple agent systems acting as one", one could also see this as being fulfilled by some agents dominating other agents so that they (have to) act as one. I'd rather put it as: the difficulty of predicting other agents' actions and goals leads to the risks from AI & to the difficulty of the alignment and control problem.

slowing down LLM progress would be dangerous, as other approaches like RL agents would pass them by before appearing dangerous.

This seems misleading to me & might be a false dichotomy. It's not LLMs versus RL agents. I think we'll (unfortunately) build agents on the basis of LLMs & the capabilities they have. Every bit of additional progress on LLMs gives these agents more capabilities faster, with less time for alignment. They will be (and are!) built based on the mere (perceived) incentives of everybody involved & the unilateralist curse. (See esp. Gwern... (read more)

Seth Herd
I was unclear. I meant that basic LLMs are oracles. The rest of what I said was about the agents made from LLMs you refer to. They are most certainly agents and not oracles. But they're way better for alignment than RL agents. See my linked post for more on that.

Neither quite captures it, imo. I think it's mostly an attempt at deconfusion:

  1. We can't hope to solve alignment by sufficiently nudging the relevant AI's utility function, since knowing something about the utility function (as argued here) requires either predicting it (not just tweaking it & crossing your fingers really hard) or predicting the AI's behavior. This is a substantially harder problem than the term "alignment" suggests on the surface, and it seems to be one we cannot avoid. Interpretability (as far as I'm aware) is nowhere near this. T
... (read more)

Our current LLMs like GPT-4 are not, in their base configurations, agents. They do not have goals.


What is the difference between being an agent and not being an agent here? Goals seem like the obvious point, but since GPT-4 also minimized its loss during training (and perhaps still does, as they keep tweaking it), is the implied difference that base GPT-4 is no longer minimizing its loss (which is its goal in some sense), or that it does not minimize it continually? If so, the distinction seems quite fuzzy, since you'd have to concede the same for an AutoGPT where you ... (read more)

Raemon
Part of the answer: an agent reliably steers the world in a particular direction, even when you vary its starting conditions. GPT does a bunch of cool stuff, but if you give it a different starting prompt, it doesn't go out of its way to accomplish the same set of things.
Max_He-Ho

I think most comments regarding the covid analogy miss the point made in the post. Leopold makes the case that there will be a societal moment of realization, not that specific measures regarding covid were good and that this should give us hope.

Right now talking about AI risk is like yelling about covid in Feb 2020.

I agree with this & with there likely being a wake-up moment. This seems important to realize!

I think unless one has both an extremely fast takeoff model and doesn’t expect many more misaligned AI models with increases in capabilities to be ... (read more)