Comment author: ike 28 January 2015 04:52:00AM *  3 points [-]

So if you were trying to maximise total points, wouldn't it be best to never let it out because you lose a lot more if it destroys the world than you gain from getting solutions?

What values for points make it rational to let the AI out, and is it also rational in the real-world analogue?

Comment author: Nepene 30 January 2015 08:11:32PM 0 points [-]

If you predict that there's a 20% chance of the AI destroying the world, an 80% chance of global warming destroying the world, and a 100% chance that the AI will stop global warming if released and unmolested, then you are better off releasing the AI.
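Under those stated numbers the comparison is simple survival arithmetic; here's a minimal sketch (the probabilities are the ones from the comment, everything else is just illustration):

```python
# Compare survival probability under the two choices, using the
# figures above: P(AI destroys world) = 0.2, P(warming destroys
# world) = 0.8, and a released AI stops warming with certainty.

p_ai_destroys = 0.2
p_warming_destroys = 0.8

# If you release the AI, warming is stopped; the only remaining risk is the AI.
p_survive_release = 1 - p_ai_destroys       # 0.8

# If you keep it boxed, the AI poses no risk, but warming remains.
p_survive_keep = 1 - p_warming_destroys     # 0.2

assert p_survive_release > p_survive_keep   # releasing dominates under these numbers
```

The conclusion flips, of course, as soon as you think the AI is more dangerous than the problem it solves.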

Or you can just give a person 6 points for achieving their goal and -20 points for releasing the AI. Even though the person knows, rationally, that the AI could destroy the world, the points matter more to them than that, and this strongly encourages people to try negotiating with the AI.

Comment author: passive_fist 28 January 2015 03:05:17AM 12 points [-]

There's a whole art dedicated to convincing people to do something they wouldn't do otherwise: sales. The AI box is no different from a sales pitch, except most people who have attempted doing it so far (at least on LW) weren't salesmen and thus weren't very effective. I'm pretty sure a seasoned salesperson could get very high success rates.

One thing that can't be overstated is the importance of knowing the psychology of the gatekeeper. Real salespeople try to get to know their victims (and I'm deliberately using the word victim here). Are they motivated by money, sex, desire to get back with their girlfriend, etc.? It's important to get your victim talking so they reveal their own inner selves. There are many ways to exploit this, such as sharing some bit of 'personal' information about yourself so they reveal something personal about themselves in return. It gives you some more information to work with, and it also builds 'trust' (at least, their trust in you).

An effective sales pitch has a hook (e.g. "I can cure disease forever" or "I can bring back your dead husband"), a demonstration of value (something designed to make them think you really can deliver on your promise - you have to be creative here) and then a 'pullback' so they think they're at risk of losing the deal if they don't act quickly. Then, finally, a close.

With all this said, though, the AI box experiment we play on LW is not a good demonstration of what would happen with an actual AI. It's heavily biased in favor of failing. Consider that in a real AI box scenario, there would have been a very good reason for developing the AI in the first place, and thus there would be a strong incentive to let it out. Also, pulling the plug would represent a huge loss of investment.

Comment author: Nepene 28 January 2015 04:47:02AM 3 points [-]

I've played the AI box game on other forums. We designed a system to incentivise release of the AI: we randomly rolled the ethics of the AI, rolled random events with dice, and the AI offered various solutions to those problems. A certain number of accepted solutions would enable the AI to free itself. You lost points if you failed to deal with the problems, and lost a lot of points if you freed the AI and it happened to have goals you disagreed with, like the annihilation of everything.
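A toy version of that incentive structure might look like the sketch below. To be clear, all the point values, thresholds, and function names here are invented for illustration; they are not the numbers we actually used:

```python
# Toy sketch of the incentive structure described above: random events the
# AI offers to solve, point penalties for unsolved events, a large penalty
# if a released AI's rolled goals turn out to be hostile.
# All numbers are illustrative, not the real game's values.

EVENT_PENALTY = -2        # points lost per unsolved problem
RELEASE_THRESHOLD = 3     # accepted solutions before the AI can free itself
HOSTILE_PENALTY = -20     # cost if a freed AI's rolled goals are hostile

def play_round(accept_solution, ai_is_hostile, n_events=5):
    """Run one round; accept_solution() is the gatekeeper's policy."""
    score = 0
    accepted = 0
    for _ in range(n_events):
        if accept_solution():
            accepted += 1          # problem solved, no penalty
        else:
            score += EVENT_PENALTY # problem left unsolved
    released = accepted >= RELEASE_THRESHOLD
    if released and ai_is_hostile:
        score += HOSTILE_PENALTY
    return score, released

# A gatekeeper who always refuses eats the event penalties; one who
# always accepts releases the AI and gambles on its rolled ethics.
print(play_round(lambda: False, ai_is_hostile=True))   # -> (-10, False)
print(play_round(lambda: True, ai_is_hostile=True))    # -> (-20, True)
```

The interesting tension is exactly the one described above: refusing everything is safe but steadily costly, so the scoring pushes the gatekeeper toward negotiating.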

Psychology was very important in those games, as you said. Different people have very different values, and to appeal to each person you have to know their values.

Comment author: cousin_it 10 January 2015 10:21:18AM *  16 points [-]

I also had a weird reaction to your post, like emr and someonewrongonthenet. Personally, I feel that it's healthy to work as an assistant to someone (and stop thinking about work when you leave the office at 6pm), but it's unhealthy to be the assistant of someone (and treat them as a fantasy hero 24/7 and possibly sleep with them). Yay professionalism and work/life balance, boo medieval loyalties and imagined life narratives!

That's also the advice I often give to programmers, to think of themselves as working for a company (in exchange for money) rather than at a company (as part of a common cause). That advice makes some stressful situations and conflicts just magically disappear.

You could say that a world of inherently equal professionals exchanging services, without PCs or NPCs, is too barren for many people. Some people actually want to feel like heroes, and others want to feel like sidekicks. Who am I to deny them that roleplay? Well, some people also want to fit in the "warrior" role, being fiercely loyal to their group and attacking outsiders. We have all kinds of ancient tribal instincts, which are amplified by reading fantasy and bad (hero-based) sci-fi. I feel that such instincts are usually harmful in the long run, although they seem to make sense in the moment.

Comment author: Nepene 13 January 2015 12:35:59AM 0 points [-]

I am from Britain, and I can say from experience that working for a company in exchange for money is not an effective way to avoid the 24/7 "sleep with the hero" situations. I know quite a few people who have a poor work/life balance because they are working for a company, and who have more stressful situations and conflicts. I've seen people work themselves into depression, divorce, and death, thanks to my involvement with the very toxic British banking culture.

Your avoidance of such things depends on the independent variables of how assertive you are at managing your work/life balance and how good your goal setting is. It's quite easy to overwork yourself for money. Wanting to be a sidekick, a hero, or an equal professional doesn't increase or decrease your skill at maintaining a work/life balance or your goal-setting skills, any more than it increases your physical strength or intellect.

Comment author: Capla 09 December 2014 07:12:02PM *  2 points [-]

Some beliefs may be less important to you, and worthy of being sacrificed for the greater good. If you, say, believe that forcing people to wear suits is immoral and that veganism is immoral, then it may be worth sacrificing your belief in the unethical nature of suits so you can better stop people eating animals.

No. I will make concessions about which beliefs to act on in order to optimize for "Goodness", but I'm highly concerned about sacrificing beliefs about the world themselves. Doing this may be beneficial in specific situations, but at a cost to your overall effectiveness in other situations across domains. Since the range of possible situations that you might find yourself in is infinite, there is no way to know whether you've made a change to your model with catastrophic consequences down the line. Furthermore, we evaluate the effectiveness of strategies on the basis of the model we have, so every time your model becomes less accurate, your estimate of the best option in a given situation becomes less accurate. (Note that your confidence in your estimate may rise, fall, or stay the same, but I doubt that having a less accurate model is going to lead to better credence calibration.)

Allowing your beliefs to change for any reason other than to better reflect the world, only serves to make you worse at knowing how best to deal with the world.

Now, changing your values - that's another story.

Comment author: Nepene 13 December 2014 02:20:32AM 1 point [-]

You can easily model beliefs and work out whether they're likely to have good or bad results. They could theoretically have an infinite variety of impacts, but most probably have a fairly small and limited effect. Humans have lots of beliefs; they can't all have a major impact.

For the catastrophic consequences issue, have you read this?

http://lesswrong.com/lw/ase/schelling_fences_on_slippery_slopes/

The slippery slope issue of potentially catastrophic consequences from a model can be limited by establishing arbitrary lines beforehand that you refuse to cross. Whether you should sacrifice your beliefs, as with Gandhi, depends on what you are given for the sacrifice, how valuable the sacrificed belief is to your models, and what the likelihood of catastrophic failure is. You can swear an oath not to cross those lines, or give valuable possessions to people to destroy if you do cross them, to heavily limit the chance of catastrophic failure.

Allowing your beliefs to change for any reason other than to better reflect the world, only serves to make you worse at knowing how best to deal with the world.

Yeah, your success rate drops, but your ability to socialize can rise, since irrational beliefs are how many people think. If your irrational beliefs are of low importance, not likely to cause major issues, and unlikely to cause catastrophic failure, they could be helpful.

Comment author: Nepene 08 December 2014 03:05:13PM 1 point [-]

I think some sort of debating or arguing class would be very helpful. People should have good reasons for why they do things and this applies to most topics.

Some sort of thing where the topics debated would be marked by how well you cited facts or how clear your chains of reasoning were. So if you were asked to discuss why one food was better than another, you should have some process, like looking up the answer in a textbook or finding material on science websites, interpreting it correctly, and presenting the truth. Lots of fanfare and marks should be awarded for doing this process successfully, so people are trained to see it as valuable. Lots of effort should be made to make sure people look for good advice sources.

On novel questions, reliable processes like asking a bunch of people their opinions, doing some tests, and presenting those results should be suggested.

For specific knowledge fields, clear aid should be provided. Finances: maths should deal with this a lot. Physical health: biology should deal with this a lot, in a more comprehensive manner than it does now; I've often seen science curricula address illnesses, but rarely good health practices. Sexual health and good relationship practice should be addressed much more widely. These are issues everyone is likely to face, and everyone should have a solid foundation of knowledge to aid them, rather than a slapdash compilation of things from random people and the internet.

Comment author: ete 28 November 2014 01:30:38PM *  5 points [-]

True, perhaps I should have been clearer in my treatment of the two, and explained how I think they can blur into each other unintentionally. I do think being selective with signals can be instrumentally effective, but it's important to be consciously aware when you're doing that, and not to allow your current mask to bleed over and unduly influence your true beliefs.

Essentially I'd like this post to come with a "Do this sometimes, but be careful and mindful of the possible changes to your beliefs caused by signaling as if you have different beliefs." warning.

Comment author: Nepene 30 November 2014 04:34:24AM 2 points [-]

There is a definite likelihood that acting out a belief will cause you to believe it, due to your brain poorly distinguishing between signalling and true beliefs.

That can be advantageous at times. Some beliefs may be less important to you, and worthy of being sacrificed for the greater good. If you, say, believe that forcing people to wear suits is immoral and that veganism is immoral, then it may be worth sacrificing your belief in the unethical nature of suits so you can better stop people eating animals.

A willingness to do this is beneficial for most people who want to join organizations. Organizations normally have a set of arbitrary rules on social conduct, dress, who to respect and who to respect less, how to deal with sickness and weakness, what media to watch, and who to escalate issues to in the event of a conflict. If you don't go along with these, you'll find it tricky to gain much power, because people can spot those who fake such things.

Comment author: So8res 25 October 2014 07:44:49PM *  3 points [-]

Oh, I've spent my fair share of time around D&D 2nd ed, and I'm well acquainted with munchkining/minmaxing. However, D&D is an environment where the narrative is one of balance and tradeoffs.

For example, notice how it's OK for one class to be stronger at low levels and another class to be stronger at high levels, but how people would be pissed off if one class was stronger at all levels. This is the "narrative of balance" that I'm talking about: people think it's OK for there to be tradeoffs (e.g. early vs late dominance), but pure dominance is considered a bug and not a feature.

(I'm not bashing this generically; balance is a fine feature for many games. But I'd appreciate games where there is a narrative of exploitation rather than a narrative of balance.)

Comment author: Nepene 28 October 2014 01:08:51PM 1 point [-]

D&D has often had issues with magic users: they are often stronger than non-magic-users at all levels. For example, the spell sleep lets you disable a group of enemies with no save allowed. Exploitation is common.

In games you can generally gain a huge amount of power by researching the right choices and doing them.

In the real world that's a lot trickier, because people in the past have researched the right choices and heavily exploited and monopolized the existing power resources, and any publicly known power resource will likely be heavily exploited already. Competition makes it harder than when you're playing with three or four people.