
Comment author: TheAncientGeek 21 August 2014 10:16:21AM 0 points [-]

Otherwise compared to nothing, or otherwise compared to informal methods?

Are you taking into account that the formal/provable/unupdateable approach has a drawback in the AI domain that it doesn't have in the non-AI domain, namely that you lose the potential to tell an AI "stop doing that, it isn't nice"?

Comment author: lukeprog 21 August 2014 03:46:04PM 0 points [-]

you lose the potential to tell an AI "stop doing that, it isn't nice"

How so?

Comment author: TheAncientGeek 20 August 2014 09:27:20PM 0 points [-]

Why talk about unupdateable UFs and "solving morality" if you are not going for that approach?

Comment author: lukeprog 20 August 2014 09:54:40PM 1 point [-]

Again, a simplification, but: we want a sufficient guarantee of stably friendly behavior before we risk pushing things past a point of no return. A sufficient guarantee plausibly requires having robust solutions for indirect normativity, stable self-modification, reflectively consistent decision theory, etc. But that doesn't mean we expect to ever have a definite "proof" that a system will be stably friendly.

Formal methods work on today's safety-critical software systems never results in a definite proof that a system will be safe, either, but, ceteris paribus, formal proofs of particular internal properties of the system give you more assurance that the system will behave as intended than you would otherwise have.

Comment author: V_V 20 August 2014 08:02:09PM *  0 points [-]

This paragraph is a simplification rather than the whole story, but: Our research tends to be focused on mathematical logic and proof systems these days because those are expressive frameworks with which to build toy models that can give researchers some general insight into the shape of the novel problems of AGI control. Methods like testing and probabilistic failure analysis require more knowledge of the target system than we now have for AGI.

When somebody says they are doing A for reason X, and then, when reason X is criticized, they claim they have actually been doing A for reason Y all along, I tend to be wary.

In this case A is "research on mathematical logic and formal proof systems",
X is "self-improving AI is unboxable and untestable, we need to get it provably right on the first try"
and Y is "Our research tends to be focused on mathematical logic and proof systems these days because those are expressive frameworks with which to build toy models that can give researchers some general insight into the shape of the novel problems of AGI control".

If Y is better than X, as it seems to me in this case, this is indeed an improvement, but when you modify your reasons and somehow conclude that your previously chosen course of action is still optimal, then I doubt your judgment.

as far as I can tell, we've never claimed that.

Well... (trigger wa-...)

"And if Novamente should ever cross the finish line, we all die. That is what I believe or I would be working for Ben this instant."
"I intend to plunge into the decision theory of self-modifying decision systems and never look back. (And finish the decision theory and implement it and run the AI, at which point, if all goes well, we Win.)"
"Take metaethics, a solved problem: what are the odds that someone who still thought metaethics was a Deep Mystery could write an AI algorithm that could come up with a correct metaethics? I tried that, you know, and in retrospect it didn’t work."
"Find whatever you’re best at; if that thing that you’re best at is inventing new math[s] of artificial intelligence, then come work for the Singularity Institute. [ ... ] Aside from that, though, I think that saving the human species eventually comes down to, metaphorically speaking, nine people and a brain in a box in a basement, and everything else feeds into that."

Comment author: lukeprog 20 August 2014 09:12:48PM 2 points [-]

X is "self-improving AI is unboxable and untestable, we need to get it provably right on the first try"

But where did somebody from MIRI say "we need to get it provably right on the first try"? Also, what would that even mean? You can't write a formal specification that includes the entire universe and then formally verify an AI against that formal specification. I couldn't find any Yudkowsky quotes about "getting it provably right on the first try" at the link you provided.

Comment author: V_V 20 August 2014 08:12:20AM *  0 points [-]

And yet, all the publicly known MIRI research seems to be devoted to formal proof systems, not to testing, "boxing", fail-safe mechanisms, defense in depth, probabilistic failure analysis, and so on.

Motte and bailey?

Comment author: lukeprog 20 August 2014 05:15:27PM *  1 point [-]

This paragraph is a simplification rather than the whole story, but: Our research tends to be focused on mathematical logic and proof systems these days because those are expressive frameworks with which to build toy models that can give researchers some general insight into the shape of the novel problems of AGI control. Methods like testing and probabilistic failure analysis require more knowledge of the target system than we now have for AGI.

And we do try to be clear about the role that proof plays in our research. E.g. see the tiling agents LW post:

The paper uses first-order logic (FOL) because FOL has a lot of useful standard machinery for reflection which we can then invoke; in real life, FOL is of course a poor representational fit to most real-world environments outside a human-constructed computer chip with thermodynamically expensive crisp variable states.

As further background, the idea that something-like-proof might be relevant to Friendly AI is not about achieving some chimera of absolute safety-feeling, but rather about the idea that the total probability of catastrophic failure should not have a significant conditionally independent component on each self-modification, and that self-modification will (at least in initial stages) take place within the highly deterministic environment of a computer chip. This means that statistical testing methods (e.g. an evolutionary algorithm's evaluation of average fitness on a set of test problems) are not suitable for self-modifications which can potentially induce catastrophic failure (e.g. of parts of code that can affect the representation or interpretation of the goals).
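(A rough illustration of why independent per-modification failure chances are unacceptable; the numbers here are my own and purely illustrative. If each self-modification carried a conditionally independent probability $p$ of catastrophic failure, then after $n$ self-modifications the probability of avoiding catastrophe would be

$$(1 - p)^n \approx e^{-np},$$

so even $p = 10^{-6}$ per modification gives a survival probability of roughly $e^{-1000} \approx 0$ over $10^9$ modifications. This is the sense in which the total failure probability must not accumulate a significant independent component at each step.)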

And later, in an Eliezer comment:

Reply to: "My previous understanding had been that MIRI staff think that by default, one should expect to need to solve the Löb problem in order to build a Friendly AI."

By default, if you can build a Friendly AI you were not troubled by the Löb problem. That working on the Löb Problem gets you closer to being able to build FAI is neither obvious nor certain (perhaps it is shallow to work on directly, and those who can build AI resolve it as a side effect of doing something else) but everything has to start somewhere. Being able to state crisp difficulties to work on is itself rare and valuable, and the more you engage with a problem like stable self-modification, the more you end up knowing about it. Engagement in a form where you can figure out whether or not your proof goes through is more valuable than engagement in the form of pure verbal arguments and intuition, although the latter is significantly more valuable than not thinking about something at all.

My guess is that people hear the words "proof" and "Friendliness" in the same sentence but (quite understandably!) don't take time to read the actual papers, and end up with the impression that MIRI is working on "provably Friendly AI" even though, as far as I can tell, we've never claimed that.

Comment author: DanielLC 19 August 2014 04:02:21PM *  1 point [-]

You admit that friendliness is not guaranteed. That means that you're not wrong, which is a good sign, but it doesn't fix the problem that friendliness isn't guaranteed. You have as many tries as you want for intelligence, but only one for friendliness. How do you expect to manage it on the first try?

It also doesn't seem to be clear to me that this is the best strategy. In order to get that provably friendly thing to work, you have to deal with an explicit, unchanging utility function, which means that friendliness has to be right from the beginning. If you deal with an implicit utility function that will change as the AI comes to understand itself better, you could program an AI to recognise pictures of smiles, then let it learn that the smiles correspond to happy humans and update its utility function accordingly, until it (hopefully) decides on "do what we mean".

It seems to me that part of the friendliness proof would require proving that the AI will follow its explicit utility function. This would be impossible. The AI is not capable of perfect Solomonoff induction, and will always have some bias, no matter how small. This means that its implicit utility function will never quite match its explicit utility function. Am I missing something here?

Comment author: lukeprog 19 August 2014 04:35:45PM 2 points [-]

You admit that friendliness is guaranteed.

Typo?

In order to get that provably friendly thing to work

Again, I think "provably friendly thing" mischaracterizes what MIRI thinks will be possible.

I'm not sure exactly what you're saying in the rest of your comment. Have you read the section on indirect normativity in Superintelligence? I'd start there.

Comment author: DanielLC 19 August 2014 04:14:35AM 2 points [-]

MIRI intends to make an AI that is provably friendly. This would require having a formal definition of friendliness that means exactly what it's supposed to mean, and then proving it. Either of those steps seems highly unlikely to be completed without error.

Comment author: lukeprog 19 August 2014 05:10:19AM 13 points [-]

MIRI intends to make an AI that is provably friendly.

I really wish people would stop repeating this claim. Mathematical Proofs Improve But Don’t Guarantee Security, Safety, and Friendliness.

Comment author: shminux 18 August 2014 06:36:56AM 3 points [-]

It's much more likely that I misunderstand something basic about what MIRI does.

Comment author: lukeprog 18 August 2014 06:44:51AM 5 points [-]

Okay, fair enough. To explain briefly:

I disagree with (3) because the Löbian obstacle is just an obstacle to a certain kind of stable self-modification in a particular toy model, and can't say anything about what kinds of safety guarantees you can have for superintelligences in general.
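(For readers who haven't run into the term: the obstacle comes from Löb's theorem. Roughly, for a theory $T$ extending Peano Arithmetic, with $\Box_T$ its standard provability predicate,

$$T \vdash \Box_T(\ulcorner \varphi \urcorner) \rightarrow \varphi \quad \Longrightarrow \quad T \vdash \varphi,$$

so an agent reasoning in $T$ cannot prove the blanket schema "whatever $T$ proves is true" without thereby proving every sentence, which blocks the naive argument for trusting a successor agent that reasons in the same $T$. It is a result about a specific formal model of self-trust, not a general claim about safety guarantees.)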

I disagree with (4) because MIRI hasn't shown that there are ways to make a superintelligence 90% or more likely (in a subjective Bayesian sense) to be stably friendly, and I don't expect us to have shown that in another 20 years, and plausibly not ever.

Comment author: satt 18 August 2014 02:04:14AM *  2 points [-]

everyone makes a sacrifice to optimize for a zero-sum competition, ends up with the same relative status, but worse absolute status.

I'm a bit surprised I haven't seen this particular incentives problem named in the academic literature.

Robert H. Frank has called it a "positional arms race". In a relatively recent article on higher education he gives this summary:

Participants in virtually all winner-take-all markets face strong incentives to invest in performance enhancement, thereby to increase their chances of coming out ahead. As in the classic military arms race, however, many such investments prove mutually offsetting in the end. When each nation spends more on bombs, the balance of power is no different than if none had spent more. Yet that fact alone provides no escape for individual participants. Countries may find it burdensome to spend a lot on bombs, but the alternative—to be less well-armed than their rivals—is even worse.

In light of the growing importance of rank in the education marketplace, universities face increasing pressure to bid for the various resources that facilitate the quest for high rank. These pressures have spawned a positional arms race that already has proved extremely costly, and promises to become more so.

Comment author: lukeprog 18 August 2014 03:16:50AM 1 point [-]

Cool, thanks for the pointer!

Comment author: Quinn 11 August 2014 02:50:06AM 13 points [-]

Because the length of Scott's Moloch post greatly exceeds my working memory (to the extent that I had trouble remembering what the point was by the end), I made these notes. I hope this is the right place to share them.

Notes on Moloch (ancient god of child sacrifice)

http://slatestarcodex.com/2014/07/30/meditations-on-moloch/

  1. Intro - no real content.

  2. Moloch as coordination failure: everyone makes a sacrifice to optimize for a zero-sum competition, ends up with the same relative status, but worse absolute status.

    • 10 examples: Prisoner's Dilemma, dollar auctions, fish-farming story (tragedy of the commons), Malthusian trap, ruthless/exploitative Capitalist markets, the two-income trap, agriculture, arms races, cancer, political race to the bottom (lowering taxes to attract business)
    • 4 partial examples: inefficient education, inefficient science, government corruption (corporate welfare), Congress (representatives voting against good of nation for good of constituency)
  3. Existing systems are created by incentive structures, not agents, e.g. Las Vegas caused by a known bias in human reward circuitry, not optimization for human values.

  4. But sometimes we move uphill anyway. Possible explanations:

    • Excess resources / we are in the dream time and can afford non-competitive behavior.
    • Physical limitations to what can be sacrificed
    • Economic competition actually producing positive utility for consumers (but this is fragile)
    • Coordination, e.g. via governments, guilds, friendships, etc.
  5. Technology/ingenuity creates new opportunities to fall into such traps. Technology overcomes physical limitations, consumes excess resources. Automation further decouples economic activity from human values. Technology can improve coordination, but can also exacerbate existing conflicts by giving all sides more power.

     AGI opens up whole new worlds of traps: Yudkowsky's paperclipper, Hanson's subsistence-level ems, Bostrom's Disneyland with no children.

  6 & 7. Gnon - basically the god of the conservative scarcity mindset. Nick Land advocates compliance; Nyan wants to capture Gnon and build a walled garden. Scott warns that Moloch is far more terrifying than Gnon and will kill both of them anyway.

  8 & 9. So we have to kill this Moloch guy, by lifting a better God to Heaven (Elua).

Comment author: lukeprog 17 August 2014 03:45:52PM 3 points [-]

everyone makes a sacrifice to optimize for a zero-sum competition, ends up with the same relative status, but worse absolute status.

I'm a bit surprised I haven't seen this particular incentives problem named in the academic literature. It is related in different ways to economic concepts like tragedy of the commons, social trap, tyranny of small decisions, and information asymmetry, but isn't identical with or fully captured by any of them.
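A minimal sketch of that incentive structure, with hypothetical payoff numbers chosen only to exhibit the trap (this is my own toy model, not something from Scott's post):

```python
# Toy positional arms race: status depends only on relative performance, so
# when everyone pays the cost to compete, relative status is unchanged but
# absolute payoffs are lower. All numbers are hypothetical.

def payoffs(invest, cost=1.0, prize=8.0):
    """invest: list of 0/1 choices, one per agent. The prize (status) is
    split equally among the top performers; investing costs `cost`."""
    top = max(invest)
    winners = [i for i, x in enumerate(invest) if x == top]
    share = prize / len(winners)
    return [(share if i in winners else 0.0) - cost * x
            for i, x in enumerate(invest)]

n = 4
print(payoffs([0] * n))              # nobody invests:    [2.0, 2.0, 2.0, 2.0]
print(payoffs([1] * n))              # everybody invests: [1.0, 1.0, 1.0, 1.0]
print(payoffs([1] + [0] * (n - 1)))  # lone investor takes the prize: [7.0, 0.0, 0.0, 0.0]
print(payoffs([0] + [1] * (n - 1)))  # lone holdout gets nothing: ~[0.0, 1.67, 1.67, 1.67]
```

Investing strictly dominates (7.0 beats 2.0 when nobody else invests, 1.0 beats 0.0 when everyone else does), yet the all-invest equilibrium leaves every agent with the same rank and a strictly lower payoff than universal restraint: same relative status, worse absolute status.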

Comment author: Dr_Manhattan 04 August 2014 04:44:39PM *  3 points [-]

[My original question: Ok, registered with Stellar, they only gave 500 units. How much can I send to MIRI and still get a full refund?]

Answer: OK, Stellar added 400 for giving them my email, then allowed me to send 800 to MIRI, receiving 800 back.

Update: Stellar only gave me 100 back.

So:

Got 500 for sign-up + FB login

Got 400 for email

Gave 800 to MIRI expecting 800 back, per Stellar's suggestion

Only got 100 back (guess a bug report is in order?)

Final acct. 200 S.

Comment author: lukeprog 16 August 2014 09:30:25PM 2 points [-]

Thanks for your stellar donation!
