Jiro comments on Safety engineering, target selection, and alignment theory - Less Wrong

Post author: So8res 31 December 2015 03:43PM


Comment author: [deleted] 01 January 2016 01:29:31AM 0 points

One common question we hear about alignment research runs analogously to: "If you don't develop calculus, what bad thing happens to your rocket? Do you think the pilot will be struggling to make a course correction, and find that they simply can't add up the tiny vectors fast enough? That scenario just doesn't sound plausible."

Actually, that sounds entirely plausible.

The case is similar with, e.g., attempts to develop theories of logical uncertainty. The problem is not that we visualize a specific AI system encountering a catastrophic failure because it mishandles logical uncertainty; the problem is that all our existing tools for describing the behavior of rational agents assume that those agents are logically omniscient, making our best theories incommensurate with our best practical AI designs.
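To make the omniscience gap concrete, here is a toy sketch (the example and function names are mine, purely illustrative, not anything from MIRI's actual formalisms). A logically omniscient agent assigns probability 0 or 1 to any already-determined mathematical fact, such as the value of a particular digit of pi; a bounded agent that hasn't done the computation can only fall back on something like a prior, which is exactly the kind of "logical probability" our standard theories of rational agents have no room for:

```python
import math

def omniscient_credence(digit_index, guess):
    """Credence of an idealized agent: it just computes the digit.

    (Feasible for early digits of pi; the point is that the ideal
    agent's credence in a determined fact is always exactly 0 or 1.)
    """
    digits = f"{math.pi:.15f}".replace(".", "")  # "3141592653589793"
    return 1.0 if int(digits[digit_index]) == guess else 0.0

def bounded_credence(digit_index, guess):
    """Credence of a bounded agent that has done no computation.

    It can only assign a uniform prior of 1/10 to each possible digit,
    even though the fact in question is already mathematically settled.
    """
    return 0.1
```

The mismatch the comment points at is that our formal theories describe only the first function's kind of agent, while every practical system we can actually build behaves like the second.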

Well, of course, part of the problem is that the best theories of "rational agents" try to assume Homo Economicus into being, insisting on cutting away all the ways in which physically realizable minds can't fit the model. So we need a definition of rationality that makes sense in a world where agents don't have infinite computational power, can be modified by their environment, and don't come with built-in utility functions that necessarily map physically realizable situations to the real numbers.

If we could program that computer to reliably achieve some simple goal (such as producing as much diamond as possible), then a large share of the AI alignment research would be completed.

Wait wait wait. You're saying that the path between Clippy and a prospective completed FAI is shorter than the path between today's AI state-of-the-art and Clippy? Because it sounds like you're saying that, even though I really don't expect you to say that.

On the upside, I do think we can spell out a research program to get us there: one grounded in the current computational cog-sci and ML literature, one that will also help with Friendliness/alignment engineering, and one that won't engender arguments with Jessica over math this time.

But now for the mandatory remark: you are insane and will kill us all ;-), rabble rabble rabble.

Comment author: Jiro 02 January 2016 01:16:48AM 0 points

Programming a computer to reliably make lots of diamonds (or paperclips) is not creating Clippy for the same reason that programming Google Maps to produce the shortest distance between two locations is not creating Clippy. People program computers to do X, where X doesn't consider the welfare of humans, all the time. The programming is not really "do X no matter what", it's "do X using these methods". Google Maps will not start trying to hack the computers of construction equipment in order to build a bridge and shorten the distance it finds between two points.
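A minimal sketch of the "do X using these methods" point (the road graph and names below are made up for illustration): a Dijkstra-style shortest-path routine is an exhaustively specified method, not an open-ended "minimize distance by any means" objective. Nothing in the procedure can reach outside the graph it is handed:

```python
import heapq

def shortest_distance(graph, start, goal):
    """Dijkstra's algorithm over a fixed adjacency dict.

    The method is fully specified: pop the nearest frontier node,
    relax its outgoing edges, repeat. There is no step at which the
    program could "decide" to shorten routes by acting on the world.
    """
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return float("inf")  # goal unreachable in the given graph

# Toy map: the route A -> B -> C (distance 3) beats A -> C (distance 10).
roads = {"A": {"B": 1, "C": 10}, "B": {"C": 2}}
print(shortest_distance(roads, "A", "C"))  # -> 3
```

The answer is always the shortest path *in the supplied graph*; building a new bridge is simply not in the routine's hypothesis space.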

Comment author: [deleted] 02 January 2016 05:16:43AM 1 point

Programming a computer to reliably make lots of diamonds (or paperclips) is not creating Clippy for the same reason that programming Google Maps to produce the shortest distance between two locations is not creating Clippy.

Ok, but that makes Nate's statement very confusing. We already understand, "up to" R&D effort, how to program computers to use various peripherals to perform a task in the physical world without intelligence, using fixed methods. I'm left confused at what industrial automation has to do with AI alignment research.