Benjy_Forstadt

Comments
There is a difference between the claim that powerful agents are approximately well-described as being expected utility maximizers (which may or may not be true) and the claim that AGI systems will have an explicit utility function the moment they’re turned on, and maximize that function from that moment on.

I think this is the assumption the OP is pointing out: “most of the book's discussion of AI risk frames the AI as having a certain set of goals from the moment it's turned on, and ruthlessly pursuing those to the best of its ability”. “From the moment it’s turned on” is pretty important, because it rules out value learning as a solution.

Edit: Retracted because some of my exegesis of the historical seed AI concept may not be accurate.

There will be future superintelligent AIs that improve themselves. But they will be neural networks, and they will at the very least start out as compute-intensive projects; in the infant stages of their self-improvement cycles they will understand and be motivated by human concepts, rather than being dumb specialized systems that are only good for bootstrapping themselves to superintelligence.

[This comment is no longer endorsed by its author]

How does the question of whether AI outcomes are more predictable than AI trajectories reduce to the (vague) question of whether observations on current AIs generalize to future AIs?

To be blunt, it's not just that Eliezer lacks a positive track record in predicting the nature of AI progress, which might be forgivable if we thought he had really good intuitions about this domain. Empiricism isn't everything; theoretical arguments are important too and shouldn't be dismissed. But:

Eliezer thought AGI would be developed from a recursively self-improving seed AI coded up by a small group, "brain in a box in a basement" style. He dismissed and mocked connectionist approaches to building AI. His writings repeatedly downplayed the importance of compute, and he strawmanned writers like Moravec, who did a better job of predicting when AGI would be developed than he did.

Old MIRI intuition pumps about why alignment should be difficult, like the "Outcome Pump" and the "Sorcerer's Apprentice", are now forgotten; it was a surprise that it would be so easy to create helpful genies like LLMs that basically just do what we want. The remaining arguments for the difficulty of alignment are esoteric considerations about inductive biases, counting arguments, etc. So yes, let's actually look at these arguments and not just dismiss them, but let's not pretend that MIRI has a good track record.