fubarobfusco comments on MIRI's technical research agenda - LessWrong

Post author: So8res 23 December 2014 06:45PM




Comment author: fubarobfusco 23 December 2014 09:13:09PM 2 points

The same problem applies to any set of interests, though. It's not just that default AI drives will conflict with (say) liberal humanist interests. They'd conflict with "evangelize Christianity and ensure the survival of the traditional family" too.

Comment author: shminux 23 December 2014 10:07:10PM 4 points

"The same problem applies to any set of interests, though."

I assume that you are talking about the problem of AI value drift, or, as OP puts it

"a smarter-than-human system [...] reliably pursues beneficial goals 'aligned with human interests'"

What I am asking is whether OP presumes that the problem of figuring out what "human interests" are to begin with has been solved, at least in some informal way, like "ensure a surviving, thriving, and diverse humanity far into the future", or "comply with the literal word of the scriptures", or "live in harmony with nature". And that is even before we worry about the AI munchkining its way into fulfilling the goal in a way a jackass genie would.

Comment author: Kaj_Sotala 24 December 2014 12:18:40AM 5 points

Section 4 of the document discusses value learning as an open problem involving its own challenges.

Comment author: shminux 24 December 2014 04:39:40AM 5 points

Actually, if I understand it correctly, the value-learning problem there is turning informal values into formal ones, not figuring out the informal values to begin with.

Comment author: shminux 24 December 2014 02:33:40AM 2 points

Thanks!

Comment author: fubarobfusco 24 December 2014 12:03:59AM 2 points

Rather than saying that the authors presume the problem of defining human interests has been solved, I would say that the authors are talking about a problem that also has to be solved, separately from that problem.

If we want to drive to the store, we have to both have a working car and know how to get to the store. If the car is broken, we can fix the car. If we don't know how to get to the store, we can look at a map. We have to do both.

If someone else wants to use the car to drive to church, we may disagree about destinations, but we both want a working car. Fixing the car doesn't "presume" that the destination question has been solved; rather, it's necessary for getting to any destination at all.

(OTOH, if we fix the car and the church person steals it, that would kinda suck.)

Comment author: shminux 24 December 2014 12:10:58AM 2 points

"Rather than saying that the authors presume the problem of defining human interests has been solved, I would say that the authors are talking about a problem that also has to be solved, separately from that problem."

Right, I didn't mean "OP is clueless for assuming that the problem has been solved", but rather "let's assume the problem has been solved, and work on the next step". I probably worded it poorly, given the misunderstanding.