All of agazi's Comments + Replies

agazi

Well written. I find that a large part of my thinking process involves long lookups to some part of my brain that I can't visualize very well. This could be due to my particular split between inner monologue and abstract thought, but I'm now finding it challenging to optimize the speed of the abstract thought.

This feels like LLM interpretability with and without CoT (chain of thought).

agazi

It's fascinating to consider how the costs of undeploying would be weighed in the heat of the moment. Given the current rate of LLM adoption across all parts of the economy over the next few years, one could foresee a lot of pipelines breaking if all GPT-6-level models were removed from the API.

Definitely not a new comparison, but this scenario seems similar to the decision to shut down the economy at the onset of Covid.

agazi

I think we can already see the early innings of this, with large API providers figuring out how to calibrate post-training techniques (RLHF, constitutional AI) between economic usefulness and the "mean" of Western morals. It's tough to go against economic incentives.

Seth Herd

Yes, we do see such "values" now, but that's a separate issue IMO.

There's an interesting thing happening in which we're mixing discussions of AI safety and AGI x-risk. There's no sharp line, but I think they are two importantly different things. This post was intended to be about AGI, as distinct from AI. Most of the economic and other concerns related to the "alignment" of AI are not relevant to the alignment of AGI. This thesis could be right or wrong, but let's keep it distinct from theories about AI in the present and near future.

My thesis here (and a common thesis) is that we should be most concerned about AGI that is an entity with agency and goals, like humans have. AI as a tool is a separate thing. It's very real and we should be concerned with it, but we shouldn't let it blur into categorically distinct, goal-directed, self-aware AGI.

Whether or not we actually get such AGI is an open question that should be debated, not assumed. I think the answer is very clearly that we will, and soon; as soon as tool AI is smart enough, someone will make it agentic, because agents can do useful work, and they're interesting. So I think we'll get AGI with real goals, distinct from the pseudo-goals implicit in current LLMs' behavior.

The post addresses such "real" AGI that is self-aware and agentic. AGI whose sole goal is doing what people want is pretty much a third thing, and a somewhat counterintuitive one.
Yes, we do see such "values" now, but that's a separate issue IMO. There's an interesting thing happening in which we're mixing discussions of AI safety and AGI x-risk. There's no sharp line, but I think they are two importantly different things. This post was intended to be about AGI, as distinct from AI. Most of the economic and other concerns relative to the "alignment" of AI are not relevant to the alignment of AGI. This thesis could be right or wrong, but let's keep it distinct from theories about AI in the present and near future. My thesis here (and a common thesis) is that we should be most concerned about AGI that is an entity with agency and goals, like humans have. AI as a tool is a separate thing. It's very real and we should be concerned with it, but not let it blur into categorically distinct, goal-directed, self-aware AGI. Whether or not we actually get such AGI is an open question that should be debated, not assumed. I think the answer is very clearly that we will, and soon; as soon as tool AI is smart enough, someone will make it agentic, because agents can do useful work, and they're interesting. So I think we'll get AGI with real goals, distinct from the pseudo-goals implicit in current LLMs behavior. The post addresses such "real" AGI that is self-aware and agentic, but that has the sole goal of doing what people want is pretty much a third thing that's somewhat counterintuitive.