
Comment author: lukeprog 31 August 2014 01:57:54PM 0 points [-]

Great, thanks!

Comment author: Sean_o_h 30 August 2014 03:16:57PM *  13 points [-]

Hi,

I'd be interested in LW's thoughts on this. I was quite involved in the piece, though I suggested to the journalist it would be more appropriate to focus on the high-profile names involved. We've been lucky at FHI/Cambridge with a series of very sophisticated, tech-savvy journalists with whom the inferential distance has been very low (see e.g. Ross Andersen's Aeon/Atlantic pieces); this wasn't the case here, and although the journalist was conscientious and requested reading material beforehand, I found communicating these concepts more difficult than expected.

In my view the interview material turned out better than expected, given the clear inferential gap. I am less happy with the 'catastrophic scenarios' which I was asked for. The text I sent (which I circulated to FHI/CSER members) was distinctly less sensational, and contained a lot more qualifiers. E.g. for geoengineering I had: "Scientific consensus is against adopting it without in-depth study and broader societal involvement in the decisions made, but there may be very strong pressure to adopt it once the impacts of climate change become more severe", and my pathogen modification example did not go nearly as far. While qualifiers can seem like unnecessary padding to editors, removing them can really change the tone of a piece. Similarly, in a pre-emptive line to ward off sensationalism, I included: "I hope you can make it clear these are 'worst-case possibilities that currently appear worthy of study' rather than 'high-likelihood events'. Each of these may only have e.g. a 1% likelihood of occurring. But in the same way an aeroplane passenger shouldn't accept a 1% possibility of a crash, society should not accept a 1% possibility of catastrophe. I see our role as (like airline safety analysts) figuring out which risks are plausible, and for those, working to reduce the 1% to 0.00001%"; this was sort-of addressed, but not really.

That said, the basic premises - that a virus could be modified for greater infectivity and released by a malicious actor, 'termination risk' for atmospheric aerosol geoengineering, future capabilities of additive manufacturing for more dangerous weapons - are intact.

Re: 'paperclip maximiser'. I mentioned this briefly in conversation, after we'd struggled for a while with inferential gaps on AI (and why we couldn't just outsmart something smarter than us, etc.), presenting it as a 'toy example' used in research papers on AI goals, meant to encapsulate the idea that seemingly harmless or trivial but poorly-thought-through goals can result in unforeseen and catastrophic consequences when paired with the kind of advanced resource utilisation and problem-solving ability a future AI might have. I didn't expect it to be taken as a literal doomsday concern - it wasn't presented that way in the text I sent - and to my mind it looks very silly in there, possibly deliberately so. However, I feel that Huw and Jaan's explanations were very good, and quite well presented.

We've been considering whether we should limit ourselves to media opportunities where we can write the material ourselves, or have the opportunity to view and edit the final material before publishing. MIRI has significantly cut back on its media engagement, and this seems on the whole sensible (FHI is still doing a lot; some of it turns out very good, some not so good).

Lessons to take away: 1) This stuff can be really, really hard. 2) Getting used to very sophisticated, science/tech-savvy journalists and academics can leave you unprepared. 3) Things that are very reasonable with qualifiers can become very unreasonable if you remove the qualifiers - and editors often just see qualifiers as unnecessary verbosity (or want the piece to make stronger, more sensational claims).

Right now, I'm leaning fairly strongly towards 'ignore and let quietly slip away' (the Guardian has a small UK readership, so how much we 'push' this will probably make a difference), but I'd be interested in whether LW sees this as net positive or net negative, on balance, for public perception of existential risk. However, I'm open to updating. I asked a couple of friends unfamiliar with the area what their takeaway impression was, and it was more positive than I'd anticipated.

Comment author: lukeprog 30 August 2014 08:34:21PM 1 point [-]

Is the paper mentioned and apparently quoted in the Guardian article, which Dasgupta described as "somewhat informal," available anywhere?

Comment author: lukeprog 26 August 2014 04:27:44AM *  15 points [-]

I think it makes sense to keep the term "Friendly AI" for "stably self-modifying AGI which optimizes for humane values."

Narrow AI safety work — for extant systems and near-term future systems — goes by many different names. We (at MIRI) have touched on that work in several posts and interviews, e.g.:

Comment author: lukeprog 25 August 2014 11:41:39PM 9 points [-]

I won't be able to put much effort into the new forum, probably, but I'll express a few hopes for it anyway.

I hope the "Well-kept gardens die by pacifism" warning will be taken very seriously at the EA forum.

I hope the forum's moderators will take care to squash unproductive and divisive conversations about race, gender, social justice, etc., which seem to have been invading and hindering nearby communities like the atheism/secularism world and the rationality world.

I hope the forum's participants end up discussing a well-balanced set of topics in EA, so that e.g. we don't end up with 10% of conversations being about AGI risk mitigation while 1% of conversations are about policy interventions.

Comment author: lukeprog 24 August 2014 06:06:49AM 0 points [-]

Nice!

Comment author: Wei_Dai 18 August 2014 04:55:38AM 3 points [-]

See also, positional good, which pre-dates Robert Frank's "positional arms race".

Comment author: TheAncientGeek 21 August 2014 10:16:21AM 0 points [-]

Otherwise compared to nothing, or otherwise compared to informal methods?

Are you taking into account that the formal/provable/unupdateable approach has a drawback in the AI domain that it doesn't have in the non-AI domain, namely that you lose the potential to tell an AI "stop doing that, it isn't nice"?

Comment author: lukeprog 21 August 2014 03:46:04PM 0 points [-]

you lose the potential to tell an AI "stop doing that, it isn't nice"

How so?

Comment author: TheAncientGeek 20 August 2014 09:27:20PM 0 points [-]

Why talk about unupdateable UFs and "solving morality" if you are not going for that approach?

Comment author: lukeprog 20 August 2014 09:54:40PM 2 points [-]

Again, a simplification, but: we want a sufficient guarantee of stably friendly behavior before we risk pushing things past a point of no return. A sufficient guarantee plausibly requires having robust solutions for indirect normativity, stable self-modification, reflectively consistent decision theory, etc. But that doesn't mean we expect to ever have a definite "proof" that a system will be stably friendly.

Formal methods work for today's safety-critical software systems never results in a definite proof that a system will be safe, either, but ceteris paribus formal proofs of particular internal properties of the system give you more assurance that the system will behave as intended than you would otherwise have.

Comment author: V_V 20 August 2014 08:02:09PM *  0 points [-]

This paragraph is a simplification rather than the whole story, but: Our research tends to be focused on mathematical logic and proof systems these days because those are expressive frameworks with which to build toy models that can give researchers some general insight into the shape of the novel problems of AGI control. Methods like testing and probabilistic failure analysis require more knowledge of the target system than we now have for AGI.

When somebody says they are doing A for reason X, then reason X is criticized and they claim they were actually doing A for reason Y all along, I tend to be wary.

In this case A is "research on mathematical logic and formal proof systems",
X is "self-improving AI is unboxable and untestable, we need to get it provably right on the first try"
and Y is "Our research tends to be focused on mathematical logic and proof systems these days because those are expressive frameworks with which to build toy models that can give researchers some general insight into the shape of the novel problems of AGI control".

If Y is better than X, as it seems to me in this case, this is indeed an improvement, but when you modify your reasons and somehow conclude that your previously chosen course of action is still optimal, then I doubt your judgment.

as far as I can tell, we've never claimed that.

Well... (trigger wa-...)

"And if Novamente should ever cross the finish line, we all die. That is what I believe or I would be working for Ben this instant."
"I intend to plunge into the decision theory of self-modifying decision systems and never look back. (And finish the decision theory and implement it and run the AI, at which point, if all goes well, we Win.)"
"Take metaethics, a solved problem: what are the odds that someone who still thought metaethics was a Deep Mystery could write an AI algorithm that could come up with a correct metaethics? I tried that, you know, and in retrospect it didn’t work."
"Find whatever you’re best at; if that thing that you’re best at is inventing new math[s] of artificial intelligence, then come work for the Singularity Institute. [ ... ] Aside from that, though, I think that saving the human species eventually comes down to, metaphorically speaking, nine people and a brain in a box in a basement, and everything else feeds into that."

Comment author: lukeprog 20 August 2014 09:12:48PM 2 points [-]

X is "self-improving AI is unboxable and untestable, we need to get it provably right on the first try"

But where did somebody from MIRI say "we need to get it provably right on the first try"? Also, what would that even mean? You can't write a formal specification that includes the entire universe and then formally verify an AI against that formal specification. I couldn't find any Yudkowsky quotes about "getting it provably right on the first try" at the link you provided.

Comment author: V_V 20 August 2014 08:12:20AM *  0 points [-]

And yet, all the publicly known MIRI research seems to be devoted to formal proof systems, not to testing, "boxing", fail-safe mechanisms, defense in depth, probabilistic failure analysis, and so on.

Motte and bailey?

Comment author: lukeprog 20 August 2014 05:15:27PM *  2 points [-]

This paragraph is a simplification rather than the whole story, but: Our research tends to be focused on mathematical logic and proof systems these days because those are expressive frameworks with which to build toy models that can give researchers some general insight into the shape of the novel problems of AGI control. Methods like testing and probabilistic failure analysis require more knowledge of the target system than we now have for AGI.

And we do try to be clear about the role that proof plays in our research. E.g. see the tiling agents LW post:

The paper uses first-order logic (FOL) because FOL has a lot of useful standard machinery for reflection which we can then invoke; in real life, FOL is of course a poor representational fit to most real-world environments outside a human-constructed computer chip with thermodynamically expensive crisp variable states.

As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip. This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals).

And later, in an Eliezer comment:

Reply to: "My previous understanding had been that MIRI staff think that by default, one should expect to need to solve the Lob problem in order to build a Friendly AI."

By default, if you can build a Friendly AI you were not troubled by the Lob problem. That working on the Lob Problem gets you closer to being able to build FAI is neither obvious nor certain (perhaps it is shallow to work on directly, and those who can build AI resolve it as a side effect of doing something else) but everything has to start somewhere. Being able to state crisp difficulties to work on is itself rare and valuable, and the more you engage with a problem like stable self-modification, the more you end up knowing about it. Engagement in a form where you can figure out whether or not your proof goes through is more valuable than engagement in the form of pure verbal arguments and intuition, although the latter is significantly more valuable than not thinking about something at all.

My guess is that people hear the words "proof" and "Friendliness" in the same sentence but (quite understandably!) don't take time to read the actual papers, and end up with the impression that MIRI is working on "provably Friendly AI" even though, as far as I can tell, we've never claimed that.
