Comment Permalink

the-citizen10y00

Additionally, many human preferences are almost certainly not moral... surely a key part of the project would be to find some way to separate the two. Preference satisfaction seems like a potentially very unfriendly goal...

Rob Bensinger10y50

If you want to build an unfriendly AI, you probably don't need to solve the stability problem. If you have a consistently self-improving agent with unstable goals, it should eventually (a) reach an intelligence level where it could solve the stability problem if it wanted to, then (b) randomly arrive at goals that entail their own preservation, then (c) implement the stability solution before the self-preserving goals can get overwritten. You can delegate the stability problem to the AI itself. The reason this doesn't generalize to friendly AI is that this process doesn't provide any obvious way for humans to determine which goals the agent has at step (b).

See in context

21 A forum for researchers to publicly discuss safety issues in advanced AI

by Rob Bensinger

13th Dec 2014

2 min read

21

MIRI has an organizational goal of putting a wider variety of mathematically proficient people in a position to advance our understanding of beneficial smarter-than-human AI. The MIRIx workshops, our new research guide, and our more detailed in-the-works technical agenda are intended to further that goal.

To encourage the growth of a larger research community where people can easily collaborate and get up to speed on each other's new ideas, we're also going to roll out an online discussion forum that's specifically focused on resolving technical problems in Friendly AI. MIRI researchers and other interested parties will be able to have more open exchanges there, and get rapid feedback on their ideas and drafts. A relatively small group of people with relevant mathematical backgrounds will be authorized to post on the forum, but all discussion on the site will be publicly visible to visitors.

Topics will run the gamut from logical uncertainty in formal agents to cognitive models of concept generation. The exact range of discussion topics is likely to evolve over time as researchers' priorities change and new researchers join the forum.

We're currently tossing around possible names for the forum, and I wanted to solicit LessWrong's input, since you've been helpful here in the past. (We're also getting input from non-LW mathematicians and computer scientists.) We want to know how confusing, apt, etc. you perceive these variants on 'forum for doing exploratory engineering research in AI' to be:

1. AI Exploratory Research Forum (AIXRF)

2. Forum for Exploratory Engineering in AI (FEEAI)

3. Forum for Exploratory Research in AI (FERAI, or FXRAI)

4. Exploratory AI Research Forum (XAIRF, or EAIRF)

We're also looking at other name possibilities, including:

5. AI Foundations Forum (AIFF)

6. Intelligent Agent Foundations Forum (IAFF)

7. Reflective Agents Research Forum (RARF)

We're trying to avoid names like "friendly" and "normative" that could reinforce someone's impression that we think of AI risk in anthropomorphic terms, that we're AI-hating technophobes, or that we're moral philosophers.

Feedback on the above ideas is welcome, as are new ideas. Feel free to post separate ideas in separate comments, so they can be upvoted individually. We're especially looking for feedback along the lines of: 'I'm a grad student in theoretical computer science and I feel that the name [X] would look bad in a comp sci bibliography or C.V.' or 'I'm friends with a lot of topologists, and I'm pretty sure they'd find the name [Y] unobjectionable and mildly intriguing; I don't know how well that generalizes to mathematical logicians.'

Personal Blog

21

A forum for researchers to publicly discuss safety issues in advanced AI

New Comment

72 comments, sorted by

top scoring

Click to highlight new comments since: Today at 12:35 AM

Some comments are truncated due to high volume. (⌘F to expand all)Change truncation settings

[-]matheist10y190

I'm a postdoc in differential geometry, working in pure math (not applied). The word "engineering" in a title of a forum would turn me away and lead me to suspect that the contents were far from my area of expertise. I suspect (low confidence) that many other mathematicians (in non-applied fields) would feel the same way.

[-]Sarunas10y150

A relatively small group of people with relevant mathematical backgrounds will be authorized to post on the forum, but all discussion on the site will be publicly visible to visitors.

You should note that this policy is different from the policy of perhaps the largest and most successful internet mathematics forum Mathoverflow. Maybe you have already thought about this and decided that this policy will be better. I simply wanted to make a friendly reminder that whenever you want to do things differently from the "industry leader" it is often a good idea to have a clear idea exactly why.

7Rob Bensinger10y

Thanks, Sarunas. We've been thinking about this issue; multiple people have brought up the example of MathOverflow in previous discussions. It's a very relevant data point, though the intent of this forum differs from the intent behind MO in a number of ways.

7solipsist10y

Video recommendation: Joel Spolsky, cofounder of Stack Exchange, examining consequences of various forum designs: The Cultural Anthropology of Stack Exchange.

6Adele_L10y

It might also be worth talking to David Zureick-Brown (co-founder of MO) about this (and maybe other things). He's already interested in MIRI's work.

[-]Kawoomba10y70

Why not another subforum on LW, next to Main and Discussion, say, Technical Discussion? Probably because you want to avoid the "friendliness" nomenclature, but it would be nice to find some way around that, otherwise it's yet another raison d'être of this forum being outsourced.

[-]Kaj_Sotala10y150

LW seems to have a rather mixed reputation: if you want to attract mainstream researchers, trying to separate the research forum from the weirder stuff discussed on LW seems like a good idea.

0dxu10y

This interests me. I haven't been around here for very long, so if there are any particular incidents that have occurred in the past, I wouldn't be aware of them (sans the basilisk, of course, because that whole thing just blew up). Why does LW have such a mixed reputation? I would chalk it up to the "Internet forum" effect, because most mainstream researchers probably don't trust Internet forums, but MIRI seems to have the same thing going on, so it can't (just) be that. Is it just due to the weirdness, possibly causing LW/MIRI to be viewed as crankish? Or something else?

3ctintera10y

Many people (specifically, people over at RationalWiki, and probably elsewhere as well) see the community as being insular, or as being a Yudkowsky Personality Cult, or think that some of the weirder-sounding ideas widely espoused here (cryonics, FAI, etc) "might benefit from a better grounding in reality". Still others reflexively write LW off based on the use of fanfiction (a word of dread and derision in many circles) to recruit members. Even the jargon derived from the Sequences may put some people off. Despite the staunch avoidance of hot-button politics, they still import a few lesser controversies. For example, there still exist people who outright reject Bayesian probability, and there are many more who see Bayes' theorem as a tool that is valid only in a very narrow domain. Brazenly disregarding their opinion can be seen as haughty, even if the maths are on your side.

2hesperidia10y

Out on my parts of the internet, a major reason to reject LWisms is because they are perceived as coming from a "Silicon Valley tribe" that does not share values with the majority of people (i.e. similar to the attitude of the newsblog (?) Pando, which regularly skewers tech startups). The libertarians claiming to be "apolitical", and the neoreactionaries, do not help this perception at all. (Although discussing more of this is probably unwise because politics SPIDERS.)

-1Lumifer10y

Mutant and proud! :-)

2Viliam_Bur10y

I wonder how much of that negative view comes from the two or three people on RW who in the past have invested a lot of time and energy describing LW in the most uncharitable way, successfully priming many readers. There are many websites on the internet with a dominant author, specific slang, or weird ideas. People usually ignore them, if they don't like them. I am not saying that LW is flawless, only that it is difficult to distinguish between (a) genuine flaws of LW and (b) successfuly anti-LW memes which started for random reasons. Both of them are something people will complain about, but in one case they had to be taught to complain.

4Kawoomba10y

If this is true, or a major factor, then creating a new website is unlikely to be the solution. There is no reason to assume the anti-fans won't just write the same content about the new website, highlighting "the connection" to LW. Far removed from starting with a "clean slate", such a migration could even provide for a new negative spin on the old narrative and it could be perceived as the anti-fans "winning", and nothing galvanizes like the (perceived) taste of blood.

5Viliam_Bur10y

Yep. At this moment, we need a strategy, not just how to make a good impression in general (and we have already not optimized for this), but also how to prevent active character assassination. I am not an expert on this topic. And it probably shouldn't be debated in public, because, obviously, selective quoting from such debate would be another weapon for the anti-fans. The mere fact that you care about your impression and debate other people's biases can be spinned very easily. It's important to realize that we not only have to make a good impression on Joe the Rational Internet Reader, but also to keep social costs of cooperating with us reasonable low for Joe. At the end, we care not only about Joe's opinion, but also about opinions of people around him.

0ChristianKl10y

Giving the moderation track record of LW, there's also a case for having a new place with decent leadership.

0dxu10y

Are you referring to the basilisk? Other than that, I can't think of any real moderation disasters off the top of my head, and given the general quality of discourse here, I'm having a hard time seeing any real reason for zealous moderation, anyway.

4ChristianKl10y

When folks of this forum had an issue with mass downvoting, it took very long to get a response from the moderating team about the issue. Most of the moderation was pretty intransparent.

2Sarunas10y

I don't know what is the best way to design a forum for technical discussion. I think that your suggestion is worth consideration. But I guess that some people like to keep their work and their play strictly separate. If you invite them to post on LessWrong, then they aren't sure which mental folder - work or play - they should put it in, because you can find many things on LessWrong that cannot be described as "work", many people come here to play. Perhaps it is hard for one place to straddle both work and play. Whether making things strictly separate is the most productive approach is a different question. Perhaps it depends on an individual person, the nature of their work and their hobbies, etc.

0Kawoomba10y

Yea, but it kind of worked in the past. There was plenty of technical discussion on LW, and I doubt the limiting factor was a work/play confusion. Especially since most people who participate won't get paid to do so, so technically it'll also be "play time" in any case.

[-]Lumifer10y60

Forum for Exploratory Research in AI Logic (FERAL) :-D

7Luke_A_Somers10y

Discussions on Exploratory Reserach in AI Logic (DERAIL) Expect much topic drift.

0dxu10y

I like the acronym, but it suffers a bit from abbreviating an already-abbreviated name. (First "artificial intelligence", then "AI", now just "A"?)

3Lumifer10y

Would you prefer Forum for Research in AI Logic (FRAIL)?

0Ben Pace10y

Yes I was about to suggest this one!

[-]Manfred10y40

I like Intelligent Agent Foundations Forum, because I chronically overuse the word 'foundations,' and would like to see my deviancy validated. (preference ordering 6,5,1,4)

Also, I'm somewhat sad about the restricted posting model - guess I'll just have to keep spamming up Discussion :P

[-]Shmi10y40

I'd expect MIRI to run a forum called MIRF, but it has a negative connotation on urbandictionary. How about Safety in Machine Intelligence, or SMIRF? :)

5Rob Bensinger10y

I actually did consider 'Self-Modifying Intelligence Research Forum' at one point...

[-]Kaj_Sotala10y130

I initially parsed that as (Self-Modifying (Intelligence Research Forum)), and took it to indicate that the forum's effectively a self-modifying system with the participants' comments shifting each other's beliefs, as well as changing the forum consensus.

[-]RyanCarey10y20

Good initiative!

For people who haven't read the LM/Hibbard paper, I can't imagine it would be clear why 'exploratory' is a word that should apply to this kind of research as compared to other AI research. 5-7 seem more timeless. 5 seems clearest and most direct.

[-]polymathwannabe10y20

Layman suggestion here...

Future INtelligence DIscussioN Group (FINDING)

[-]blogospheroid10y10

Forum for Exploratory Research in General AI

[-]turchin10y10

Safe AI Forum

-1Ben Pace10y

(SAIF)

[-]Gondolinian10y00

Something from the tired mind of someone with no technical background:

Selective Forum for Exploratory AI Research (SFEAR)

Cool acronym, plus the "Selective" emphasizes the fact that only highly competent people would be allowed, which I imagine would be desirable for CV appearance.

[-]TheAncientGeek10y00

MIRI has an organizational goal of putting a wider variety of mathematically proficient people in a position to advance our understanding of beneficial smarter-than-human AI.

Sure does, There remains the question of whether it should be emphasising mathematical proficiency so much. MIRI isn't very interested in people who are proficient in actual computer science, or AI, which might explain why spends a lot of time on the maths of computationally untractable systems like AIXI. MIRI isn't interested in people who are proficient in philosophy, leaving it unable to either sidestep the ethical issues that are part of AI safety, .ir to say anything very cogent about them.

5Rob Bensinger10y

My background is in philosophy, and I agree with MIRI's decision to focus on more technical questions. Luke explains MIRI's perspective in From Philosophy to Mathematics to Engineering. Friendly AI work is currently somewhere in between 'philosophy' and 'mathematics', and if we can move more of it into mathematics (by formalizing more of the intuitive problems and unknowns surrounding AGI), it will be much easier to get the AI and larger computer science community talking about these issues. People who work for and with MIRI have a good mix of backgrounds in mathematics, computer science, and philosophy. You don't have to be a professional mathematician to contribute to a workshop or to the research forum; but you do need to be able to communicate and innovate concisely and precisely, and 'mathematics' is the name we use for concision and precision at its most general. A lot of good contemporary philosophy also relies heavily on mathematics and logic.

-6TheAncientGeek10y

[-]the-citizen10y00

What do you feel is bad about moral philosophy? It looks like you dislike it because place it next to anthropormorphic thinking and technophobia.

1Rob Bensinger10y

It's appropriate to anthropomorphize when you're dealing with actual humans, or relevantly human-like things. Someone could legitimately research issues surrounding whole brain emulations, or minor variations on whole brain emulations. Likewise, moral philosophy is a legitimate and important topic. But the bulk of MIRI's attention doesn't go to ems or moral philosophy.

1TheAncientGeek10y

The appropriate degree of anthropomorphisation when dealing with an AI made by humans, with human limitations, for human purposes is not zero. Are those claims supposed to be linked? ?..we don't need to deal with moral philosophy if we are not dealing with WBEs?

3Rob Bensinger10y

the-citizen is replying to this thing I said: Those are just three things we don't necessarily want to be perceived as; they don't necessarily share anything else in common. However, because the second one is pejorative and the first is sometimes treated as pejorative, the-citizen was wondering if I'm anti-moral-philosophy. I replied that highly anthropomorphic AI and moral philosophy are both perfectly good fields of study, and overlap at least a little with MIRI's work; but the typical newcomer is likely to think these are more central to AGI safety work than they are.

0the-citizen10y

For the record, my current position is that if MIRI doesn't think it's central, then it's probably doing it wrong.

0the-citizen10y

But perhaps moral philosophy is important for a FAI? Like for knowing right and wrong so we can teach/build it into the FAI? Understanding right and wrong in some form seems really central to FAI?

2Rob Bensinger10y

There may be questions in moral philosophy that we need to answer in order to build a Friendly AI, but most MIRI-associated people don't think that the bulk of the difficulty of Friendly AI (over generic AGI) is in generating a sufficiently long or sufficiently basic list of intuitively moral English-language sentences. Eliezer thinks the hard part of Friendly AI is stability under self-modification; I've heard other suggestions to the effect that the hard part is logical uncertainty, or identifying how preference and motivation are implemented in human brains. The problems you need to solve in order to convince a hostile human being to become a better person, or to organize a society, or to motivate yourself to do the right thing, aren't necessarily the same as the problems you need to solve to build the brain of a value-conducive agent from scratch.

0the-citizen10y

The stability under self-modification is a core problem of AGI generally, isn't it? So isn't that an effort to solve AGI, not safety/friendliness (which would be fairly depressing given its stated goals)? Does MIRI have a way to define safety/friendliness that isn't derivative of moral philosophy? Additionally, many human preferences are almost certainly not moral... surely a key part of the project would be to find some way to separate the two. Preference satisfaction seems like a potentially very unfriendly goal...

5Rob Bensinger10y

2the-citizen10y

Cheers thanks for the informative reply.

1TheAncientGeek10y

MIRI makes the methodological proposal that it simplifies the issue of friendliness (or morality or safety) to deal with the whole of human value, rather than identifying a morally relevant subset. Having done that, it concludes that human morality is extremely complex. In other words, the payoff in terms of methodological simplification never arrives, for all that MIRI relieves itself of the burden of coming up with a theory of morality. Since dealing with human value in total is in absolute terms very complex, the possibility remains open that identifying the morally relevant subset of values is relatively easier (even if still difficult in absolute terms) than designing an AI to be friendly in terms of the totality of value, particularly since philosophy offers a body of work that seeks to identify simple underlying principles of ethics. The idea of a tractable, rationally discoverable , set of ethical principles is a weaker form of, or lead into, one of the most common objections to the MIRI approach: "Why doesn't the AI figure out morality itself?".

0the-citizen10y

Thanks that's informative. Not entirely sure your own position is from your post, but I agree with what I take your implication to be - that a rationally discoverable set of ethics might not be as sensible notion as it sounds. But on the other hand human preference satisfaction seems a really bad goal - many human preferences in the world are awful - take a desire for power over others for example. Otherwise human society wouldn't have wars, torture, abuse etc etc. I haven't read up on CEV in detail, but from what I've seen it suffers from a confusion that somehow decent preferences are gained simply by obtaining enough knowledge? I'm not fully up to speed here so I'm willing to be corrected. EDIT> Oh... CEV is the main accepted approach at MIRI :-( I assumed it was one of many

0TheAncientGeek10y

That wasn't the point I thought I was making. I thought I was making the point that the idea of tractable sets of moral truths had been sidelined rather than sidestepped...that it had been neglected on the basis of a simplification that has not been delivered. Having said that, I agree that discoverable morality has the potential downside of being inconvenient to, or unfriendly for , humans: the one true morality might be some deep ecology that required a much lower human population, among many other possibilities. That might have been a better argument against discoverable morality ethics than the one actually presented. Most people have a preference for not being the victims of war or torture. Maybe something could be worked up from that. I've seen comments to the effect that to the effect that it has been abandoned. The situation is unclear.

0the-citizen10y

Thanks for reply. That makes more sense to me now. I agree with a fair amount of what you say. I think you'd have a sense from our previous discussions why I favour physicalist approaches to the morals of a FAI, rather than idealist or dualist, regardless of whether physicalism is true or false. So I won't go there. I pretty much agree with the rest. EDIT> Oh just on the deep ecology point, I believe that might be solvable by prioritising species based on genetic similarity to humans. So basically weighting humans highest and other species less so based on relatedness. I certainly wouldn't like to see a FAI adopting the view that people have of "humans are a disease" and other such views, so hopefully we can find a way to avoid that sort of thing.

0TheAncientGeek10y

I think you have an idea from our previous discussions why I don't think you physicalism, etc, is relevant to ethics.

0the-citizen10y

Indeed I do! :-)

0ChristianKl10y

Or simply extremly smart AI's > human minds.

0the-citizen10y

Yes some humans seem to have adopted this view where intelligence moves from being a tool and having instrumental value to being instrinsically/terminally valuable. I find often the justifcation for this to be pretty flimsy, though quite a few people seem to have this view. Let's hope a AGI doesn't lol.

Moderation Log