Jiro comments on Safety engineering, target selection, and alignment theory - Less Wrong
Actually, that sounds entirely plausible.
Well, of course, part of the problem is that the best theories of "rational agents" try to assume Homo Economicus into being, and insist on ignoring all the ways in which physically realizable minds fail to fit that model. So we need a definition of rationality that makes sense in a world where agents don't have completed infinities of computational power, can be modified by their environment, and don't come with built-in utility functions that necessarily map physically realizable situations to the real numbers.
Wait wait wait. You're saying that the path between Clippy and a prospective completed FAI is shorter than the path between today's AI state of the art and Clippy? Because it sounds like you're saying that, even though I really don't expect you to say that.
On the upside, I do think we can spell out a research program to get us there: one grounded in the current computational cog-sci and ML literature, one that will also help with Friendliness/alignment engineering, and one that, this time, will not provoke arguments with Jessica over math.
But now for the mandatory remark: you are insane and will kill us all ;-), rabble rabble rabble.
Programming a computer to reliably make lots of diamonds (or paperclips) is not creating Clippy, for the same reason that programming Google Maps to find the shortest route between two locations is not creating Clippy. People program computers all the time to do X, where X takes no account of human welfare. The programming is not really "do X no matter what"; it's "do X using these methods". Google Maps will not start trying to hack the computers of construction equipment in order to build a bridge and shorten the distance it finds between two points.
OK, but that makes Nate's statement very confusing. We already understand, "up to" R&D effort, how to program computers to use various peripherals to perform a task in the physical world without intelligence, using fixed methods. I'm left confused as to what industrial automation has to do with AI alignment research.