Comment author: TheAncientGeek 14 December 2014 11:13:03AM *  0 points [-]

that a rationally discoverable set of ethics might not be as sensible a notion as it sounds.

That wasn't the point I thought I was making. I thought I was making the point that the idea of a tractable set of moral truths had been sidelined rather than sidestepped... that it had been neglected on the basis of a promised simplification that has never been delivered.

Having said that, I agree that discoverable morality has the potential downside of being inconvenient to, or unfriendly for, humans: the one true morality might be some deep ecology that required a much lower human population, among many other possibilities. That might have been a better argument against discoverable morality than the one actually presented.

But on the other hand human preference satisfaction seems like a really bad goal - many human preferences in the world are awful - take the desire for power over others, for example. Otherwise human society wouldn't have wars, torture, abuse, etc.

Most people have a preference for not being the victims of war or torture. Maybe something could be worked up from that.

CEV is the main accepted approach at MIRI :-( I assumed it was one of many

I've seen comments to the effect that it has been abandoned. The situation is unclear.

Comment author: the-citizen 15 December 2014 05:39:33AM *  0 points [-]

Thanks for the reply. That makes more sense to me now. I agree with a fair amount of what you say. I think you'd have a sense from our previous discussions why I favour physicalist approaches to the morals of a FAI, rather than idealist or dualist ones, regardless of whether physicalism is true or false. So I won't go there. I pretty much agree with the rest.

EDIT> Oh, just on the deep ecology point, I believe that might be solvable by prioritising species based on genetic similarity to humans. So basically weighting humans highest and other species less, according to relatedness. I certainly wouldn't like to see a FAI adopting the "humans are a disease" view that some people hold, so hopefully we can find a way to avoid that sort of thing.

Comment author: TheAncientGeek 13 December 2014 03:55:33PM *  1 point [-]

MIRI makes the methodological proposal that it is simpler to deal with friendliness (or morality, or safety) in terms of the whole of human value than to identify a morally relevant subset. Having done that, it concludes that human morality is extremely complex. In other words, the payoff in terms of methodological simplification never arrives, for all that MIRI relieves itself of the burden of coming up with a theory of morality. Since dealing with human value in total is in absolute terms very complex, the possibility remains open that identifying the morally relevant subset of values is relatively easier (even if still difficult in absolute terms) than designing an AI to be friendly in terms of the totality of value, particularly since philosophy offers a body of work that seeks to identify simple underlying principles of ethics.

The idea of a tractable, rationally discoverable set of ethical principles is a weaker form of, or lead-in to, one of the most common objections to the MIRI approach: "Why doesn't the AI figure out morality itself?".

Comment author: the-citizen 14 December 2014 07:39:07AM *  0 points [-]

Thanks, that's informative. I'm not entirely sure what your own position is from your post, but I agree with what I take your implication to be - that a rationally discoverable set of ethics might not be as sensible a notion as it sounds. But on the other hand human preference satisfaction seems like a really bad goal - many human preferences in the world are awful - take the desire for power over others, for example. Otherwise human society wouldn't have wars, torture, abuse, etc. I haven't read up on CEV in detail, but from what I've seen it seems to suffer from a confused assumption that decent preferences are somehow gained simply by obtaining enough knowledge? I'm not fully up to speed here, so I'm willing to be corrected.

EDIT> Oh... CEV is the main accepted approach at MIRI :-( I assumed it was one of many

Comment author: RobbBB 13 December 2014 10:13:57PM *  1 point [-]

There may be questions in moral philosophy that we need to answer in order to build a Friendly AI, but most MIRI-associated people don't think that the bulk of the difficulty of Friendly AI (over generic AGI) is in generating a sufficiently long or sufficiently basic list of intuitively moral English-language sentences. Eliezer thinks the hard part of Friendly AI is stability under self-modification; I've heard other suggestions to the effect that the hard part is logical uncertainty, or identifying how preference and motivation are implemented in human brains.

The problems you need to solve in order to convince a hostile human being to become a better person, or to organize a society, or to motivate yourself to do the right thing, aren't necessarily the same as the problems you need to solve to build the brain of a value-conducive agent from scratch.

Comment author: the-citizen 14 December 2014 07:29:54AM *  0 points [-]

Stability under self-modification is a core problem of AGI generally, isn't it? So isn't that an effort to solve AGI rather than safety/friendliness (which would be fairly depressing given MIRI's stated goals)? Does MIRI have a way to define safety/friendliness that isn't derivative of moral philosophy?

Additionally, many human preferences are almost certainly not moral... surely a key part of the project would be to find some way to separate the two. Preference satisfaction seems like a potentially very unfriendly goal...

Comment author: RobbBB 13 December 2014 10:03:04PM 2 points [-]

the-citizen is replying to this thing I said:

We're trying to avoid names like "friendly" and "normative" that could reinforce someone's impression that we think of AI risk in anthropomorphic terms, that we're AI-hating technophobes, or that we're moral philosophers.

Those are just three things we don't necessarily want to be perceived as; they don't necessarily share anything else in common. However, because the second one is pejorative and the first is sometimes treated as pejorative, the-citizen was wondering if I'm anti-moral-philosophy. I replied that highly anthropomorphic AI and moral philosophy are both perfectly good fields of study, and overlap at least a little with MIRI's work; but the typical newcomer is likely to think these are more central to AGI safety work than they are.

Comment author: the-citizen 14 December 2014 07:27:18AM 0 points [-]

For the record, my current position is that if MIRI doesn't think it's central, then it's probably doing it wrong.

Comment author: RobbBB 13 December 2014 09:19:06AM 1 point [-]

It's appropriate to anthropomorphize when you're dealing with actual humans, or relevantly human-like things. Someone could legitimately research issues surrounding whole brain emulations, or minor variations on whole brain emulations. Likewise, moral philosophy is a legitimate and important topic. But the bulk of MIRI's attention doesn't go to ems or moral philosophy.

Comment author: the-citizen 13 December 2014 01:56:35PM *  0 points [-]

But perhaps moral philosophy is important for a FAI? Like for knowing right and wrong so we can teach/build it into the FAI? Understanding right and wrong in some form seems really central to FAI?

Comment author: the-citizen 13 December 2014 08:12:03AM 0 points [-]

What do you feel is bad about moral philosophy? It looks like you dislike it, because you place it next to anthropomorphic thinking and technophobia.

Comment author: KatjaGrace 02 December 2014 02:22:57AM 1 point [-]

Can you think of goals that would lead an agent to make a set number of paperclips (or whatever) then do nothing?

Comment author: the-citizen 10 December 2014 12:37:44PM 0 points [-]

I'll leave these two half-baked ideas here in case they're somehow useful:

DO UNTIL <Failsafe mechanism> - Construct an AI that pursues its utility function until an undesirable failsafe condition is met. (Somehow) make the utility function not take the failsafes into account when calculating utility (can it be made blind to the failsafes somehow? Force the utility function to exclude their existence? Make lack of knowledge about the failsafes part of the utility function?). Failsafes could be every undesirable outcome we can think of: human death rate exceeds X, biomass reduction, quantified human thought declines by X, mammalian species extinctions, quantified human suffering exceeds X, or whatever. One problem is how to objectively attribute these triggers causally to the AI (what if some unrelated event trips a failsafe and shuts down an AI we have come to rely on?).
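
Just to make the shape of that first idea concrete, here's a toy sketch (the condition names, thresholds, and the observe_world/agent_step callables are all invented for illustration; it says nothing about the genuinely hard parts of blinding the utility function to the failsafes or attributing the triggers to the AI):

```python
# Toy illustration only: every name, threshold, and condition below is invented.
FAILSAFES = {
    "death_rate_exceeds_x": lambda world: world["death_rate"] > 0.01,
    "biomass_reduction": lambda world: world["biomass"] < 0.9 * world["initial_biomass"],
    "suffering_exceeds_x": lambda world: world["suffering_index"] > 100.0,
}

def failsafe_triggered(world_state):
    """Checked by an overseer process, outside the agent's own utility calculation."""
    return any(check(world_state) for check in FAILSAFES.values())

def run_until_failsafe(agent_step, observe_world, max_steps=1_000_000):
    """DO UNTIL <failsafe>: let the agent act, halt the moment any condition trips."""
    for _ in range(max_steps):
        if failsafe_triggered(observe_world()):
            return "halted by failsafe"
        agent_step()  # the agent pursues its utility function, (hopefully) blind to the checks above
    return "finished without tripping a failsafe"
```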

Energy limit - Limit the AI's activities (through its own utility function?) via an unambiguous, quantifiable resource - matter moved or energy expended. The energy expended would (somehow) have to include all activity under its control. Alternatively this could be a rate rather than a total limit, though I think a rate would be more likely to go wrong. The idea would be to let the AGI go foom, but not give it the energy for other stuff like a paperclip universe. I'm not sure how much safety this idea actually buys, but here it is.
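
A similarly rough sketch of the energy limit, pictured here as an external budget rather than a term inside the utility function (whether it could live in the utility function is the open question); the budget number, the action objects, and the assumption that every action's cost can be metered are all placeholders:

```python
# Rough sketch of the energy-limit idea. Costing *all* activity under the
# AI's control is exactly the part that is hand-waved here.
class EnergyBudget:
    def __init__(self, budget_joules: float):
        self.remaining = budget_joules

    def spend(self, joules: float):
        if joules > self.remaining:
            # Hard external stop, not a term the agent optimises over.
            raise RuntimeError("energy budget exhausted")
        self.remaining -= joules

def run_with_budget(actions, budget_joules: float):
    budget = EnergyBudget(budget_joules)
    for action in actions:
        budget.spend(action.estimated_energy_cost)  # assumes every action's cost is measurable
        action.execute()
```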

I don't know if an intelligence explosion will truly be possible, but plenty of people smarter than I am seem to think so... good luck in this field of work!

Comment author: SteveG 02 December 2014 03:14:46PM 1 point [-]

On infrastructure profusion:

What idiot is going to give an AGI a goal which completely disrespects human property rights from the moment it is built?

Meanwhile, an AGI that figured out property rights from the internet would have some idea that if it ignored property rights, people would want to turn it off. If it had goals that were not achievable once turned off, it would respect property rights for a very long time as an instrumental goal.

And I do believe we should be able to turn off an off-the-grid AGI running on a limited amount of computing resources whose behavior was previously simulated many times.

So, we seem to be getting closer to being willing to test disabled AGIs and AGI components, if we can avoid people misusing them.

Comment author: the-citizen 10 December 2014 12:09:32PM 1 point [-]

Wouldn't most AGI goals disregard property rights unless respect for them was explicitly built in? And if it was built in, wouldn't an AGI just create a situation (e.g. progressive blackmail or deception or something) where we wanted to sell it the universe for a dollar?

Comment author: KatjaGrace 02 December 2014 02:26:46AM 5 points [-]

Do you think you could see would-be malignant failure modes more than once, and so have time to learn about them, by setting an AI up in a limited, simulated environment?

Comment author: the-citizen 10 December 2014 11:55:25AM *  0 points [-]

The risk, of course, is the AI predicting that it's nested in this sort of environment and finding a way to signal to observers. Even if it's blind to the outer layers, it might try just in case. What you want is to develop a way for the simulated world environment to detect a harmful intelligence explosion and send a single-bit communication "out of the box" to indicate that it has occurred. Then you can shut that instance down and keep trying new instances until you get a success at this level of safety. I guess you can then slowly expand the amount of information that can come "out of the box" while staying safe. I have no idea how the detection process could be implemented, though... perhaps the difficulty of implementing it would defeat the usefulness of this idea?

EDIT> Interaction between the AGI and the detection mechanism could be problematic... it could infer the mechanism's existence and find a way to deceive it?
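
To be concrete about the single-bit interface I'm picturing, here's a toy sketch (build_simulation, sim.tick, sim.state and the detector are all hypothetical, and the detector is exactly the part I said I can't implement):

```python
# Toy sketch of the single-bit channel: only one boolean ever leaves the box.
def harmful_explosion_detected(sim_state) -> bool:
    raise NotImplementedError("the unsolved part: detecting a harmful intelligence explosion")

def run_boxed_instance(build_simulation, max_ticks: int) -> bool:
    sim = build_simulation()
    for _ in range(max_ticks):
        sim.tick()
        if harmful_explosion_detected(sim.state):
            return True   # the single bit that is allowed out of the box
    return False          # nothing else (logs, transcripts, outputs) ever leaves
```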

Comment author: [deleted] 30 November 2014 12:03:33PM -1 points [-]

Answer: clearly, no. If you know all the ways things can go wrong, but don't know how to make them go right, then your knowledge is useless for anything except worrying.

Comment author: the-citizen 02 December 2014 08:05:17AM 0 points [-]

Thanks for the comment. I will reply as follows:

  • Knowing how things could go wrong gives useful knowledge about scenarios/pathways to avoid
  • Our knowledge of how to make things go right is not zero

My intention with the article is to draw attention to some broader non-technical difficulties in implementing FAI. One worrying theme in the responses I've gotten is a conflation between knowledge of AGI risk and building a FAI. I think they are separate projects, and that the success of the second relies on comprehensive prior knowledge of the first. Apparently MIRI's approach doesn't really acknowledge the two as separate.
