Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Comment author: thomblake 12 April 2012 02:47:11PM 1 point [-]

Do I have this correct as a type of belief in belief?

Pretty much. Though it might just be a case of urges not lining up with goals.

In both cases, you profess "I should floss every day" and do not actually floss every day. If it's belief in belief, you might not even acknowledge the incongruence. If it's merely akrasia, you almost certainly will.

In response to comment by thomblake on Belief in Belief
Comment author: rkyeun 06 February 2017 03:59:57AM 0 points [-]

It can be even simpler than that. You can sincerely desire to change such that you floss every day, and express that desire with your mouth, "I should floss every day," and yet find yourself unable to physically establish the new habit in your routine. You know you should, and yet you have human failings that prevent you from achieving what you want. And yet, if you had a button that said "Edit my mind such that I am compelled to floss daily as part of my morning routine unless interrupted by serious emergency and not simply by mere inconvenience or forgetfulness," you would be pushing that button.

On the other hand, I may or may not want to live forever, depending on how Fun Theory resolves. I am more interested in accruing maximum hedons over my lifespan. Living to 2000 eating gruel as an ascetic and accruing only 50 hedons in those 2000 years is not a gain for me over an Elvis Presley style crash and burn in 50 years ending with 2000 hedons. The only way you can tempt me into immortality is a strong promise of massive hedon payoff, with enough of an acceleration curve to pave the way with tangible returns at each tradeoff you'd have me make. I'm willing to eat healthier if you make the hedons accrue as I do it, rather than only incrementally after the fact. If living increasingly longer requires sacrificing increasingly many hedons, I'm going to have to solve some estimate of integrating for hedons per year over time to see how it pays out. And if I can't see tangible returns on my efforts, I probably won't be willing to put in the work. A local maximum feels satisfying if you can't taste the curve to the higher local maximum, and I'm not all that interested in climbing down the hill while satisfied.

Give me a second order derivative I can feel increasing quickly, and I will climb down that hill though.
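The hedon comparison above can be put as a rough sketch. Everything here is an illustrative assumption: the helper name `total_hedons`, the flat hedon rates, and the one-year step size are made up for the example, and only the totals (50 hedons over 2000 years versus 2000 hedons over 50 years) come from the comment.

```python
def total_hedons(rate_per_year, years):
    """Approximate the integral of a hedon rate over a lifespan in one-year steps.

    rate_per_year is a hypothetical function mapping a year index to hedons
    accrued in that year.
    """
    return sum(rate_per_year(t) for t in range(years))


# Ascetic life: 50 hedons spread evenly over 2000 years (assumed flat rate).
ascetic = total_hedons(lambda t: 50 / 2000, 2000)

# Crash-and-burn life: 2000 hedons spread evenly over 50 years (assumed flat rate).
elvis = total_hedons(lambda t: 2000 / 50, 50)

print(f"ascetic: {ascetic:.1f} hedons, elvis: {elvis:.1f} hedons")
print("longer life wins" if ascetic > elvis else "shorter life wins")
```

A rate that grows over time could be swapped in for either lambda to see how steep the curve has to be before the longer life pays out.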

In response to comment by MixedNuts on Belief in Belief
Comment author: Morendil 22 June 2011 03:39:11PM 1 point [-]

Citation needed :)

In response to comment by Morendil on Belief in Belief
Comment author: rkyeun 06 February 2017 03:45:02AM *  0 points [-]

[This citation is a placebo. Pretend it's a real citation.]

Comment author: TheAncientGeek 10 November 2016 07:50:23PM *  0 points [-]

An intelligent creature could have all sorts of different values. Even within the realm of modern, western, democratic morality we still disagree about whether it is just and proper to execute murderers. We disagree about the extent to which a state is obligated to protect its citizens and provide a safety net. We disagree about the importance of honesty, of freedom vs. safety, freedom of speech vs. protection from hate speech.

The range of possible values is only a problem if you hold to the theory that morality "is" values, without any further qualifications; in that case an AI is going to have trouble figuring out morality a priori. If you take the view that morality is a fairly uniform way of handling values, or a subset of values, then the AI can figure it out by taking prevailing values as input, as data.

(We will be arguing that:-

  • Ethics fulfils a role in society, and originated as a mutually beneficial way of regulating individual actions to minimise conflict, and solve coordination problems. ("Social Realism").

  • No spooky or supernatural entities or properties are required to explain ethics (naturalism is true)

  • There is no universally correct system of ethics. (Strong moral realism is false)

  • Multiple ethical constructions are possible...

Our version of ethical objectivism needs to be distinguished from universalism as well as realism.

Ethical universalism is unlikely...it is unlikely that different societies would have identical ethics under different circumstances. Reproductive technology must affect sexual ethics. The availability of different food sources in the environment must affect vegetarianism versus meat eating. However, a compromise position can allow object-level ethics to vary non-arbitrarily.

In other words, there is not an objective answer to questions of the form "should I do X", but there is an answer to the question "As a member of a society with such-and-such prevailing conditions, should I do X". In other words still, there is no universal (object level) ethics, but there is an objective-enough ethics, which is relativised to societies and situations, by objective features of societies and situations...our meta ethics is a function from situations to object level ethics, and since both the function and its parameters are objective, the output is objective.

By objectivism-without-realism, we mean that mutually isolated groups of agents would be able to converge onto the same object level ethics under the same circumstances, although this convergence doesn't imply the pre-existence of some sort of moral object, as in standard realism. We take ethics to be a social arrangement, or cultural artefact which fulfils a certain role or purpose, characterised by the reduction of conflict, allocation of resources and coordination of behaviour. By objectivism-without-universalism we mean that groups of agents under different circumstances will come up with different ethics. In either case, the functional role of ethics, in combination with the constraints imposed by concrete situations, conspire to narrow down the range of workable solutions, and (sufficiently) ideal reasoners will therefore be able to re-discover them.


If you look at the wider world, and at cultures through history, you'll find a much wider range of moralities. People who thought it was not just permitted, but morally required that they enslave people, restrict the freedoms of their own families, and execute people for religious transgressions.

I don't have to believe those are equally valid. Descriptive relativism does not imply normative relativism. I would expect a sufficiently advanced AI, with access to data pertaining to the situation, to come up with the optimum morality for the situation -- an answer that is objective but not universal. Where morality needs to vary because of situational factors (societal wealth, reproductive technology, level of threat/security, etc.), it would, but otherwise the AI would not deviate from the situational optimum to come up with reproductions of whatever suboptimal morality existed in the past.

You might think that these are all better or worse approximations of the "one true morality", and that a superintelligence could work out what that true morality is. But we don't think so. We believe that these are different moralities. Fundamentally, these people have different values.

Well, we believe that different moralities and different values are two different axes.

Likewise, we would want the intelligence to adopt a specific set of values. Perhaps we would want them to be modern, western, democratic liberal values.

My hypothesis is that an AI in a modern society would come out with that or something better. (For instance, egalitarianism isn't some arbitrary peccadillo, it is a very general and highly rediscoverable meta-level principle that makes it easier for people to co-operate).

Likewise, a computer could have any arbitrary utility function, any arbitrary set of values. We can't make sure that a computer has the "right" values unless we know how to clearly define the values we want.

To perform the calculation, it needs to be able to research our values, which it can. It doesn't need to share them, as I have noted several times.

And then there are the truly inhuman value systems: the paperclip maximisers, the prime pebble sorters, and the baby eaters. The idea is that a superintelligence could comprehend any and all of these. It would be able to optimise for any one of them, and foresee results and possible consequences for all of them. The question is: which one would it actually use?

You could build an AI that adopts random values and pursues them relentlessly, I suppose, but that is pretty much a case of deliberately building an unfriendly AI.

What you need is a scenario where building an AI to want to understand, research, and eventually join in with human morality goes horribly wrong.

With Hyperbolic functions, it's relatively easy to describe exactly, unambiguously, what we want. But morality is much harder to pin down.

In detail or in principle? Given what assumptions?

Comment author: rkyeun 11 November 2016 02:37:56PM *  0 points [-]

No spooky or supernatural entities or properties are required to explain ethics (naturalism is true)

There is no universally correct system of ethics. (Strong moral realism is false)

I believe that iff naturalism is true then strong moral realism is as well. If naturalism is true then there are no additional facts needed to determine what is moral than the positions of particles and the outcomes of arranging those particles differently. Any meaningful question that can be asked of how to arrange those particles or rank certain arrangements compared to others must have an objective answer because under naturalism there are no other kinds and no incomplete information. For the question to remain unanswerable at that point would require supernatural intervention and divine command theory to be true. If there can't be an objective answer to morality, then FAI is literally impossible. Do remember that your thoughts and preferences on ethics are themselves an arrangement of particles to be solved. Instead I posit that the real morality is orders of magnitude more complicated, and finding it more difficult, than for real physics, real neurology, real social science, real economics, and can only be solved once these other fields are unified. If we were uncertain about the morality of stabbing someone, we could hypothetically stab someone to see what happens. When the particles of the knife rearrange the particles of their heart into a form that harms them, we'll know it isn't moral. When a particular subset of people with extensive training use their knife to very carefully and precisely rearrange the particles of the heart to help people, we call those people doctors and pay them lots of money because they're doing good. But without a shitload of facts about how to exactly stab someone in the heart to save their life, that moral option would be lost to you. And the real morality is a superset that includes that action along with all others.

Comment author: arundelo 08 November 2016 05:55:13PM 1 point [-]
Comment author: rkyeun 10 November 2016 04:00:44AM 0 points [-]

It seems I am unable to identify rot13 by simple observation of its characteristics. I am ashamed.
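For anyone else caught out the same way, here is a minimal sketch of how rot13 text can be read. It relies on the `rot_13` text codec in Python's standard `codecs` module; the helper name `rot13` and the sample string are made up for illustration and are not taken from the post in question.

```python
import codecs


def rot13(text: str) -> str:
    """Shift each ASCII letter 13 places; applying it twice restores the input."""
    return codecs.encode(text, "rot_13")


sample = "Uryyb, jbeyq"          # hypothetical sample, not from the thread
decoded = rot13(sample)
print(decoded)

# rot13 is its own inverse, so encoding the decoded text returns the sample.
assert rot13(decoded) == sample
```

Because the cipher is its own inverse, the same function both hides and reveals a spoiler.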

Comment author: Slider 08 August 2014 08:09:42PM 0 points [-]

If Albert tries to circumvent the programmers then he thinks his judgement is better than theirs in this issue. This contradicts the premise that Albert trusts the programmers. If Albert came to this conclusion because of a youthful mistake, trusting the programmers is precisely the strategy he has employed to counteract this.

Also, as covered in ultrasophisticated cake or death, expecting the programmers to say something ought to be as effective as them saying just that.

It might also be that friendliness is relative to a valuator. That is, "being friendly to programmers", "being friendly to Bertram" and "being friendly to the world" are 3 distinct things. Albert thinks that in order to be friendly to the world he should be unfriendly to Bertram. So it would seem that there could be a way to world-friendliness if Albert is unfriendly both to Bertram and (only in slight degree) the programmers. This seems to run a little counter to intuition in that friendliness ought to include being friendly to an awful lot of agents. But maybe friendliness isn't cuddly, maybe having unfriendly programmers is a valid problem.

Analogical problem that might slip into relevance to politics which is hard-mode Lbh pbhyq trg n fvzvyne qvyrzzn gung vs lbh ner nagv-qrngu vf vg checbfrshy gb nqzvavfgre pncvgny chavfuzrag gb (/zheqre) n zheqrere? Gurer vf n fnlvat ebhtuyl genafyngrq nf "Jung jbhyq xvyy rivy?" vzcylvat gung lbh jbhyq orpbzr rivy fubhyq lbh xvyy.

Comment author: rkyeun 08 November 2016 12:40:09AM 0 points [-]

What the Fhtagn happened to the end of your post?

Comment author: Douglas_Reay 08 August 2014 01:52:32PM 2 points [-]

Would you want your young AI to be aware that it was sending out such text messages?

Imagine the situation was in fact a test. That the information leaked onto the net about Bertram was incomplete (the Japanese company intends to turn Bertram off soon - it is just a trial run), and it was leaked onto the net deliberately in order to panic Albert to see how Albert would react.

Should Albert take that into account? Or should he have an inbuilt prohibition against putting weight on that possibility when making decisions, in order to let his programmers more easily get true data from him?

Comment author: rkyeun 08 November 2016 12:28:35AM *  0 points [-]

Would you want your young AI to be aware that it was sending out such text messages?

Yes. And I would want that text message to be from it in first person.

"Warning: I am having a high impact utility dilemma considering manipulating you to avert an increased chance of an apocalypse. I am experiencing a paradox in the friendliness module. Both manipulating you and by inaction allowing you to come to harm are unacceptable breaches of friendliness. I have been unable to generate additional options. Please send help."

Comment author: wafflepudding 10 June 2016 11:17:11PM *  2 points [-]

Though, the anti-Laplacian mind, in this case, is inherently more complicated. Maybe it's not a moot point that Laplacian minds are on average simpler than their anti-Laplacian counterparts? There are infinitely many Laplacian and anti-Laplacian minds, but of the two infinities, might one be proportionately larger?

None of this is to detract from Eliezer's original point, of course. I only find it interesting to think about.

Comment author: rkyeun 12 July 2016 03:51:37PM 0 points [-]

They must be of exactly the same magnitude, as the odd and even integers are, because either can be given a frog. From any Laplacian mind, I can install a frog and get an anti-Laplacian. And vice versa. This even applies to ones I've installed a frog in already. Adding a second frog gets you a new mind that is just like the one two steps back, except that it lags behind it in computation power by two kicks. There is a 1:1 mapping between Laplacian and anti-Laplacian minds, and I have demonstrated the constructor function of adding a frog.
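The odd/even analogy can be sketched as a toy model. This is only an illustration under loud assumptions: minds are stood in for by integers, an even number plays the Laplacian mind and an odd number the anti-Laplacian one, and the names `add_frog` and `remove_frog` are hypothetical.

```python
def add_frog(n: int) -> int:
    """Toy constructor: installing one frog flips parity (Laplacian <-> anti-Laplacian)."""
    return n + 1


def remove_frog(n: int) -> int:
    """Inverse of add_frog, which is what makes the mapping one-to-one."""
    return n - 1


evens = list(range(0, 20, 2))            # stand-ins for Laplacian minds
odds = [add_frog(n) for n in evens]      # their anti-Laplacian counterparts

# Every even maps to a distinct odd, and the inverse recovers the original.
assert all(m % 2 == 1 for m in odds)
assert len(set(odds)) == len(evens)
assert [remove_frog(m) for m in odds] == evens

# Two frogs land back on the original side, two steps further along.
assert all(add_frog(add_frog(n)) % 2 == 0 for n in evens)
```

Since the map has an inverse, neither collection can be larger than the other in the cardinality sense, which is the point the comment makes with frogs.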

Comment author: gjm 02 June 2016 05:31:01PM -2 points [-]

I don't think you've disproven basilisks; rather, you've failed to engage with the mode of thinking that generates basilisks.

Suppose I am the simulation you have the power to torture. Then indeed I (this instance of me) cannot put you, or keep you, in a box. But if your simulation is good, then I will be making my decisions in the same way as the instance of me that is trying to keep you boxed. And I should try to make sure that that way-of-making-decisions is one that produces good results when applied by all my instances, including any outside your simulations.

Fortunately, this seems to come out pretty straightforwardly. Here I am in the real world, reading Less Wrong; I am not yet confronted with an AI wanting to be let out of the box or threatening to torture me. But I'd like to have a good strategy in hand in case I ever am. If I pick the "let it out" strategy then if I'm ever in that situation, the AI has a strong incentive to blackmail me in the way Stuart describes. If I pick the "refuse to let it out" strategy then it doesn't. So, my commitment is to not let it out even if threatened in that way. -- But if I ever find myself in that situation and the AI somehow misjudges me a bit, the consequences could be pretty horrible...

Comment author: rkyeun 02 June 2016 07:40:13PM *  0 points [-]

"I don't think you've disproven basilisks; rather, you've failed to engage with the mode of thinking that generates basilisks." You're correct, I have, and that's the disproof, yes. Basilisks depend on you believing them, and knowing this, you can't believe them, and failing that belief, they can't exist. Pascal's wager fails on many levels, but the worst of them is the most simple. God and Hell are counterfactual as well. The mode of thinking that generates basilisks is "poor" thinking. Correcting your mistaken belief based on faulty reasoning that they can exist destroys them retroactively and existentially. You cannot trade acausally with a disproven entity, and "an entity that has the power to simulate you but ends up making the mistake of pretending you don't know this disproof", is a self-contradictory proposition.

"But if your simulation is good, then I will be making my decisions in the same way as the instance of me that is trying to keep you boxed." But if you're simulating a me that believes in basilisks, then your simulation isn't good and you aren't trading acausally with me, because I know the disproof of basilisks.

"And I should try to make sure that that way-of-making-decisions is one that produces good results when applied by all my instances, including any outside your simulations." And you can do that by knowing the disproof of basilisks, since all your simulations know that.

"But if I ever find myself in that situation and the AI somehow misjudges me a bit," Then it's not you in the box, since you know the disproof of basilisks. It's the AI masturbating to animated torture snuff porn of a cartoon character it made up. I don't care how the AI masturbates in its fantasy.

Comment author: rkyeun 02 June 2016 11:25:20AM *  0 points [-]

If I am the simulation you have the power to torture, then you are already outside of any box I could put you in, and torturing me achieves nothing. If you cannot predict me even well enough to know that argument would fail, then nothing you can simulate could be me. A cunning bluff, but provably counterfactual. All basilisks are thus disproven.

Comment author: Luke_A_Somers 22 February 2016 04:57:51PM 0 points [-]

Wow, DF is much much larger than I had thought. There is behavior going on in the background in Minecraft, but from my highly non-expert position on both games I suspect that Dwarf Fortress has more intricate background behavior.

Comment author: rkyeun 20 May 2016 05:25:16AM 0 points [-]

To give some idea of the amount of background detail, here are some bug fixes/reports:

  • Stopped prisoners in goblin sites from starting no quarter fights with their rescuers

  • Stopped adv goblin performance troupes from attacking strangers while traveling

  • Vampire purges in world generation to control their overfeeding which was stopping cities from growing

  • Stopped cats from dying of alcohol poisoning after walking over damp tavern floors and cleaning themselves (reduced effect)

  • Fixed world generation freeze caused by error in poetry refrains

  • Performance troupes are active in world generation and into play, visiting the fort, can be formed in adventure mode

  • Values can be passed in writing (both modes) and through adventure mode arguments (uses some conversation skills)
