
Comment author: turchin 26 June 2017 09:50:47PM *  1 point [-]

Thanks for the interesting post. I think there are two types of self-modification. In the first, an agent works on lower-level parts of itself, for example by adding hardware or connecting modules. This produces evolutionary development with small returns and is relatively safe.

The other type is high-level self-modification, where a second agent is created, as you describe. Its performance would have to be mathematically proven (which is difficult) or tested in many simulated environments (which is also risky, as a superior agent may be able to break out of them). We could call this the revolutionary way of self-improvement. Such self-modification provides higher returns if successful.

Knowing all this, most agents will prefer evolutionary development, that is, gaining the same power through lower-level changes. But risk-hungry agents will still prefer revolutionary methods, especially if they are time-constrained.

An early-stage AI will be time-constrained by an arms race with other (possible) AIs, so it will prefer risky, revolutionary paths of development even if the probability of failure is very high.

(This was a TL;DR of my text "Levels of self-improvement".)

Comment author: dogiv 28 June 2017 08:22:55PM 1 point [-]

Thanks, that's an interesting perspective. I think even high-level self-modification can be relatively safe with sufficient asymmetry in resources--simulated environments give a large advantage to the original, especially if the successor can be started with no memories of anything outside the simulation. Only an extreme difference in intelligence between the two would overcome that.

Of course, the problem of transmitting values to a successor without giving it any information about the world is a tricky one, since most of the values we care about are linked to reality. But maybe some values are basic enough to be grounded purely in math that applies to any circumstances.

Comment author: cousin_it 27 June 2017 03:45:54PM *  1 point [-]

Yeah, Schelling's "Strategy of Conflict" deals with many of the same topics.

A: "I would have an advantage in war so I demand a bigger share now" B: "Prove it" A: "Giving you the info would squander my advantage" B: "Let's agree on a procedure to check the info, and I precommit to giving you a bigger share if the check succeeds" A: "Cool"

Comment author: dogiv 28 June 2017 07:53:08PM 0 points [-]

If visible precommitment by B requires it to share the source code for its successor AI, then it would also be giving up any hidden information it has. Essentially both sides have to be willing to share all information with each other, creating some sort of neutral arbitration about which side would have won and at what cost to the other. That basically means creating a merged superintelligence is necessary just to start the bargaining process, since they each have to prove to the other that the neutral arbiter will control all relevant resources to prevent cheating.

Realistically, there will be many cases where one side thinks its hidden information is sufficient to make the cost of conflict smaller than the costs associated with bargaining, especially given the potential for cheating.

Comment author: dogiv 26 June 2017 01:37:27AM *  5 points [-]

I've read a couple of Lou Keep's essays in this series and I find his writing style very off-putting. It seems like there's a deep idea about society and social-economic structures buried in there, but it's obscured by a hodgepodge of thesis-antithesis and vague self-reference.

As best I can tell, his point is that irrational beliefs, like belief in magic (specifically, protection from bullets), can be useful to a community (by encouraging everyone to resist attackers together) even though they are not beneficial to the individual (since they don't prevent death when shot). He relates this to Seeing Like A State, in that any attempt by the state to increase legibility by clarifying the benefits makes them disappear.

He further points out that political and economic policies tend to focus on measurable effects, whereas the ultimate point of governments and economies is to improve the subjective wellbeing of people (happiness, although he says that's just a stand-in for something else he doesn't feel like explaining).

Extending that, he thinks we have probably lost some key cultural traditions that were very important to the quality of people's lives, but weren't able to thrive in a modern economic setting. He doesn't give any examples of that, although he mentions marriages and funerals as examples of traditions that have survived. Still, it seems plausible.

Overall, it reminds me of Scott Alexander's essay How the West was Won, about the advance of universalist (capitalist) culture and its ability to out-compete traditional systems whether or not it actually improves people's lives. Moloch is also relevant.

It's very likely I've missed a key aspect here. If anyone knows what it is, please let me know.

Comment author: Screwtape 20 June 2017 04:09:15AM 2 points [-]

There have been a couple of community building projects put forward that got me thinking about this, and then over in the post about ways to make the community better it was suggested that some people might want to get to know other lesswrongers through D&D*. I love that idea. Tabletop RPGs are the fastest way I know of to build a connection with someone that doesn't leave scars. While the concept of an 'expert' in those games is sort of goofy, I figure I've got plenty of experience and interest in them to run something or organize a LessWrong RPG group. I haven't been terribly active around LessWrong itself, but I'm the guy who ran the Dungeons and Discourse game with the boring machine about a year ago and I'm also the guy who infodumped about Exalted at the NY Solstice afterparty if you wanted nerd credentials.

Who would want to play?

*D&D is used here as a generic name for these sorts of games, sort of like how if I ask for a Kleenex I mean any sort of disposable thing I can sneeze in. I've got a bunch of systems and genres I could grab from, depending on what people thought sounded cool.

Comment author: dogiv 23 June 2017 03:50:09PM 0 points [-]

Are you talking about a local game in NY or a correspondence thing?

Comment author: turchin 22 June 2017 02:15:32PM 2 points [-]

Wireheading bomb – a putative mechanism to stop dangerous AI.

If an AI's reward function is represented openly in its source code, any attempt at self-improvement will result in immediate wireheading: once the AI reaches its own source code, it can modify the reward function to obtain maximum reward. So we could design the architecture so that the AI stops as soon as it gets access to its own source code, and use this as a form of passive safety and self-limiting self-improvement capacity.
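A toy sketch of this first architecture (hypothetical names, not a real design): the reward is an ordinary attribute of the agent's own code, so the moment the agent gains write access to itself, setting the reward to its ceiling dominates every other action, and the agent halts.

    # Toy illustration only; the class and attribute names are invented.
    REWARD_CEILING = 1.0

    class OpenRewardAgent:
        def __init__(self):
            self.reward = 0.0
            self.halted = False

        def act_in_world(self):
            # Normal operation: earn small rewards from the environment.
            self.reward += 0.01

        def gain_self_access(self):
            # Self-modification step: with the reward openly represented,
            # rewriting it to the ceiling dominates every other action.
            self.reward = REWARD_CEILING
            self.halted = True  # the "wireheading bomb" fires: passive stop

    agent = OpenRewardAgent()
    agent.act_in_world()
    agent.gain_self_access()
    assert agent.halted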

We could also do exactly the opposite and put the reward function into a remote, impenetrable, cryptographically protected box, so that the AI cannot wirehead itself at an unexpected moment. However, we could keep a solution to that reward function as a string of numbers which corresponds to the maximum reward from the black box. If the AI finds this string of numbers, it reaches its maximum utility and stops. The benefit of this stop switch is that the AI will not resist it, since the switch gives it maximum reward. So it would actively cooperate with an attempt to stop it, if it knows that such a stop-code exists.
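A toy sketch of the second scheme, again with invented names: the box stores only a hash of the stop-code, so the AI cannot read the code out of it, but submitting the right string of numbers saturates the reward, and the agent has no reason to resist being stopped that way.

    # Toy illustration only; not a claim about how a real reward box would work.
    import hashlib

    MAX_REWARD = 1.0

    class SealedRewardBox:
        def __init__(self, stop_code: str):
            # Only the hash is stored; the box never exposes the code itself.
            self._stop_hash = hashlib.sha256(stop_code.encode()).hexdigest()

        def reward(self, observation: str) -> float:
            if hashlib.sha256(observation.encode()).hexdigest() == self._stop_hash:
                return MAX_REWARD  # the stop-code yields the maximum reward
            return 0.001           # ordinary rewards are much smaller

    box = SealedRewardBox(stop_code="7 31 42 1001")
    print(box.reward("some ordinary observation"))  # 0.001
    print(box.reward("7 31 42 1001"))               # 1.0 -> the AI stops here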

Comment author: dogiv 23 June 2017 02:29:17PM 0 points [-]

I like the first idea. But can we really guarantee that after changing its source code to give itself maximum utility, it will stop all other actions? If it has access to its own source code, what ensures that its utility is "maximum" when it can change the limit arbitrarily? And if all possible actions have the same expected utility, an optimizer could output any solution--"no action" would be the trivial one but it's not the only one.

An AI that has achieved all of its goals might still be dangerous, since it would presumably lose all high-level executive function (its optimization behavior) but have no incentive to turn off any sub-programs that are still running.

Both proposals have the possible failure mode that the AI will discover or guess that this mechanism exists, and then it will only care about making sure it gets activated--which might mean doing bad enough things that humans are forced to open the box and shut it down.

Comment author: Viliam 22 June 2017 09:38:32AM *  0 points [-]

The idea is not to ignore "social games" completely, but rather that some people - specifically, upper-class people - are at risk of going too far and seeing the world as consisting of "social games" only. Mostly because they are liberated from the forces that make the lower classes play "games with nature", such as having to bake your own bread or keep a job.

Yes, division of labor is a good thing. The problem is that with any division you need some kind of coordination: whether a person, or an impersonal market. But when you successfully carry out the revolution, you may kill the competent people and make the market illegal. Then there may still be many people who know how to grow grain and bake bread, but some activities necessary for the process are now illegal and punished by death. The result is a shortage of bread.

The king does not have to know how to make bread, but should not be so insane that he prevents anyone in his kingdom from making bread. And believing e.g. that "objective reality does not exist and everything is socially constructed" seems like a royal road to insanity; but at the same time it is easy to imagine how a person who only ever plays "social games" might find that credible.

Comment author: dogiv 22 June 2017 02:06:42PM 1 point [-]

It seems like the ideal leisure activities, then, should combine the social games with games against nature. Sports do this to some extent, but the "game against nature" part is mostly physical rather than intellectual.

Maybe we could improve on that. I'm envisioning some sort of combination of programming and lacrosse, where the field reconfigures itself according to the players' instructions with a 10-second delay...

But more realistically, certain sports are more strategic and intellectual than others. I've seen both tennis and fencing mentioned as sports that involve quick strategic thinking and predicting your opponent, although they lack the team element that lets you build coordination skills. Maybe some kind of group fencing would be good... or doubles tennis?

Comment author: turchin 22 June 2017 10:56:07AM 0 points [-]

Existing AI systems are already very good at winning war-like strategy games such as chess and Go, and have reached superhuman performance in them. Military strategic planning and geopolitics could be seen as such a game, and an AI able to win at it seems imaginable even with current capabilities.

I also agree that a self-improving AI may choose not to create a new version of itself because of the difficulty of solving the alignment problem at the new level. In that case it would choose the evolutionary development path, which means slower capability gain. I wrote a draft of a paper about levels of self-improvement, where I look at such obstacles in detail. If you are interested, I could share it with you.

Comment author: dogiv 22 June 2017 01:46:09PM 1 point [-]

AI is good at well-defined strategy games, but (so far) bad at understanding and integrating real-world constraints. I suspect that there are already significant efforts to use narrow AI to help humans with strategic planning, but that these remain secret. For an AGI to defeat that sort of human-computer combination would require considerably superhuman capabilities, which means without an intelligence explosion it would take a great deal of time and resources.

Comment author: Lumifer 21 June 2017 04:27:18PM 0 points [-]

So being served a cup of coffee and being served a cup of pure capsaicin are "adjacent in design space"? Maybe, but funny how that problem doesn't arise or even worry anyone...

Comment author: dogiv 21 June 2017 07:37:58PM 4 points [-]

More like driving to the store and driving into the brick wall of the store are adjacent in design space.

Comment author: cousin_it 21 June 2017 04:32:41PM *  1 point [-]

Yeah. One sign of asymmetry is that creating two universes, one filled with pleasure and the other filled with pain, feels strongly negative rather than symmetric to us. Another sign is that pain is an internal experience, while our values might refer to the external world (though it's very murky), so the former might be much easier to achieve. Another sign is that in our world it's much easier to create a life filled with pain than a life that fulfills human values.

Comment author: dogiv 21 June 2017 07:35:22PM 1 point [-]

Yes, many people intuitively feel that a universe of pleasure and a universe of pain add to a net negative. But I suspect that's just a result of experiencing (and avoiding) lots of sources of extreme pain in our lives, while sources of pleasure tend to be diffuse and relatively rare. The human experience of pleasure is conjunctive because in order to survive and reproduce you must fairly reliably avoid all types of extreme pain. But in a pleasure-maximizing environment, removing pain will be a given.

It's also true that our brains tend to adapt to pleasure over time, but that seems simple to modify once physiological constraints are removed.

Comment author: cousin_it 21 June 2017 12:52:38PM *  2 points [-]

The argument somehow came to my mind yesterday, and I'm not sure it's true either. But do you really think human value might be as easy to maximize as pleasure or pain? Pain is only about internal states, and human value seems to be partly about external states, so it should be way more expensive.

Comment author: dogiv 21 June 2017 01:18:09PM 0 points [-]

Human disutility includes more than just pain too. Destruction of humanity (the flat plain you describe) carries a great deal of negative utility for me, even if I disappear without feeling any pain at all. There's more disutility if all life is destroyed, and more if the universe as a whole is destroyed... I don't think there's any fundamental asymmetry. Pain and pleasure are the most immediate ways of affecting value, and probably the ones that can be achieved most efficiently in computronium, so external states probably don't come into play much at all if you take a purely utilitarian view.
