All of Douglas_Reay's Comments + Replies

What do you think of the definition of "Precedent Utilitarianism" used in the philosophy course module archived at https://links.zephon.org/precedentutilitarianism?

I wonder if there would be a use for an online quiz of the sort that asks 10 questions picked randomly from a pool of several hundred, and which records the time taken to complete it and the number of times that person has started an attempt (with uniqueness of person approximated by IP address, email address or, ideally, LessWrong username)?

Not as prescriptive as tracking which sequences someone has read, but perhaps a useful guide (as one factor among many) to the time a user has invested in getting up to date on what's already been written here about rationality?
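A minimal sketch of what such a quiz tracker might look like (the class and field names here are hypothetical, and identity handling is reduced to a single `user_id` string standing in for whichever identifier is available: IP address, email address or LessWrong username):

```python
import random
import time

QUESTIONS_PER_QUIZ = 10

class QuizTracker:
    def __init__(self, question_pool):
        self.question_pool = question_pool   # several hundred (question, answer) pairs
        self.attempts = {}                   # user_id -> number of attempts started
        self.results = []                    # (user_id, score, seconds_taken)

    def start_quiz(self, user_id):
        # Count the attempt and hand back a random selection of questions.
        self.attempts[user_id] = self.attempts.get(user_id, 0) + 1
        questions = random.sample(self.question_pool, QUESTIONS_PER_QUIZ)
        return questions, time.monotonic()

    def finish_quiz(self, user_id, questions, answers, started_at):
        # Score the answers and record how long the attempt took.
        seconds_taken = time.monotonic() - started_at
        score = sum(1 for (question, correct), given in zip(questions, answers)
                    if given == correct)
        self.results.append((user_id, score, seconds_taken))
        return score, seconds_taken, self.attempts[user_id]
```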

The novel Soul Bound contains an example of resource-capped programs with narrowly defined scopes being jointly defined and funded as a means of cooperation between AIs with different-but-overlapping priorities.

Wellington: “Let’s play a game.”

He picked up a lamp from his stall, and buffed it vigorously with the sleeve of his shirt, as though polishing it. Purple glittering smoke poured out of the spout and formed itself into a half-meter-high bearded figure wearing an ornate silk kaftan.

Wellington pointed at the genie.

Wellington: “Tom is a weak genie. He can grant small wishes. Go ahead. Try asking for something.”

Kafana: “Tom, I wish I had a tasty sausage.”

A tiny image of Kafana standing in a farmyard next to a house appeared by the genie. The genie wav... (read more)

There's a novel being serialised on Royal Road, "Soul Bound", that covers many issues involved in AI and includes a fable (in a later chapter that hasn't yet been published).

Soul Bound

2Douglas_Reay
Wellington: “Let’s play a game.”

He picked up a lamp from his stall, and buffed it vigorously with the sleeve of his shirt, as though polishing it. Purple glittering smoke poured out of the spout and formed itself into a half-meter-high bearded figure wearing an ornate silk kaftan.

Wellington pointed at the genie.

Wellington: “Tom is a weak genie. He can grant small wishes. Go ahead. Try asking for something.”

Kafana: “Tom, I wish I had a tasty sausage.”

A tiny image of Kafana standing in a farmyard next to a house appeared by the genie. The genie waved a hand and a plate containing a sausage appeared in the image. The genie bowed, and the image faded away.

Wellington picked up a second lamp, apparently identical to the first, and gave it a rub. A second genie appeared, similar to the first, but with facial hair that reminded Kafana of Ming the Merciless, and it was one meter tall.

Wellington: “This is Dick. He can also grant wishes. Try asking him the same thing.”

Kafana: “Dick, I wish I had a tasty sausage.”

The same image appeared, but this time instead of appearing on a plate, the sausage appeared sticking through the head of a Kafana in the image, who fell down dead. The genie gave a sarcastic bow, and again the image faded away.

Kafana: “Sounds like I’m better off with Tom.”

Wellington: “Ah, but Dick is more powerful than Tom. Tom can feed a handful of people. Dick could feed every person on the planet, if you can word your request precisely enough. Have another go.”

She tried several more times, resulting in whole cities being crushed by falling sausages, cars crashing as sausages distracted drivers at the wrong moment, and even the whole population of the world dying out from sausages that contained poison. Eventually she realised that she was never going to be able to anticipate every possible loophole Dick could find. She needed a different approach.

Kafana: “Dick, read my mind and learn to anticipate what sort of things I will approve of. Prov

Is this likely to bias people towards writing longer single posts rather than splitting their thoughts into a sequence of posts?

For example, back in 2018 (so not eligible for this) I wrote a sequence of 8 posts that, between them, got a total of 94 votes. Would I have been better off having made a single post (were it to have gotten 94 just by itself)?

2habryka
There is probably a small bias here, yeah, but probably not overwhelmingly much. I think overall it's more likely for a post to appear in the final best-of collection if it's short, simply because we are dealing with very limited space, so that pushes in the opposite direction.

Since the evil AI is presenting a design for a world, rather than the world itself, the problem of it being populated with zombies that only appear to be free could be countered by having the design be in an open source format that allows the people examining it (or other AIs) to determine the actual status of the designed inhabitants.

It sounds similar to the matrices in the post:

A solvable Newcomb-like problem

I wonder how much an agent could achieve by thinking along the following lines:

Big Brad is a human-shaped robot who works as a lumberjack. One day his employer sends him into town on his motorbike carrying two chainsaws, to get them sharpened. Brad notices an unusual number of the humans around him suddenly crossing streets to keep their distance from him.

Maybe they don't like the smell of chainsaw oil? So he asks one rather slow pedestrian "Why are people keeping their distance?" to which the pedestrian replies "Well, what if you... (read more)

Suppose we ran a tournament for agents running a mix of strategies. Let’s say agents started with 100 utilons each, and were randomly allocated to be members of 2 groups (with each group starting off containing 10 agents).

Each round, an agent can spend some of their utilons (0, 2 or 4) as a gift split equally between the other members of the group.

Between rounds, they can stay in their current two groups, or leave one and replace it with a randomly picked group.

Each round after the 10th, there is a 1 in 6 chance of the tournament finishing.

How would the ... (read more)
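As a rough illustration, here is a minimal simulation skeleton for such a tournament. It layers several assumptions on top of the description above: the 0/2/4 spend is treated as applying per group rather than in total, group-switching happens with an arbitrary 10% chance per round, and the two strategies at the bottom are toy placeholders rather than anything proposed here.

```python
import random

GROUP_SIZE = 10
STARTING_UTILONS = 100
GIFT_CHOICES = (0, 2, 4)
END_CHANCE = 1 / 6          # chance the tournament ends, each round after the 10th

class Agent:
    def __init__(self, name, strategy):
        self.name = name
        self.strategy = strategy     # function(agent, group_members) -> gift size
        self.utilons = STARTING_UTILONS
        self.groups = []             # indices of the two groups this agent is in

def run_tournament(agents, n_groups, rng=random):
    groups = [[] for _ in range(n_groups)]
    for agent in agents:                         # allocate each agent to 2 groups
        agent.groups = rng.sample(range(n_groups), 2)
        for g in agent.groups:
            groups[g].append(agent)

    round_no = 0
    while True:
        round_no += 1
        for members in groups:                   # gifts within each group
            for agent in members:
                gift = agent.strategy(agent, members)
                others = [m for m in members if m is not agent]
                if others and gift in GIFT_CHOICES and agent.utilons >= gift:
                    agent.utilons -= gift
                    for other in others:
                        other.utilons += gift / len(others)
        for agent in agents:                     # optionally swap one group
            if rng.random() < 0.1:
                candidates = [g for g in range(n_groups) if g not in agent.groups]
                if candidates:
                    old, new = rng.choice(agent.groups), rng.choice(candidates)
                    groups[old].remove(agent)
                    groups[new].append(agent)
                    agent.groups[agent.groups.index(old)] = new
        if round_no > 10 and rng.random() < END_CHANCE:
            break
    return sorted(agents, key=lambda a: a.utilons, reverse=True)

always_keep = lambda agent, members: 0           # never gives anything away
always_give = lambda agent, members: 4           # always gives the maximum

if __name__ == "__main__":
    agents = [Agent(f"A{i}", always_give if i % 2 else always_keep)
              for i in range(50)]
    n_groups = 2 * len(agents) // GROUP_SIZE     # 10 groups of roughly 10 members
    for a in run_tournament(agents, n_groups)[:5]:
        print(a.name, round(a.utilons, 1))
```

The ranking printed at the end is just to show the shape of the output; the interesting experiment would be comparing different strategy mixes and group-switching rules.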

If you have a Dyson swarm around a star, you can temporarily alter how much of the star's light escapes in a particular direction by tilting the solar sails on the desired part of the sphere.

If you have Dyson swarms around a significant percentage of a galaxy's stars, you can do the same for the galaxy, by timing the directional pulses from the individual stars so they arrive at the same time when seen from the desired direction.

It then just becomes a matter of math to calculate how often such a galaxy could send a distinctive signal in your ... (read more)
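An illustrative back-of-envelope version of that math (the beam width, pulse length and retargeting time below are assumed quantities, not figures from this comment):

```latex
N \approx \frac{4\pi}{\Omega},
\qquad
t_{\text{repeat}} \approx N \left( t_{\text{pulse}} + t_{\text{retarget}} \right)
```

Here \(\Omega\) is the solid angle each coordinated pulse is beamed into, \(N\) the number of distinct directions needed to cover the whole sky, \(t_{\text{pulse}}\) the time the sails hold one orientation, \(t_{\text{retarget}}\) the time needed to re-tilt them, and \(t_{\text{repeat}}\) roughly how often any single distant observer would see the signal repeat.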

Alice and Bob's argument can have loops, if e.g. Alice believes X because of Y, which she believes because of X. We can unwind these loops by tagging answers explicitly with the "depth" of reasoning supporting that answer.

A situation I've come across is that people often can't remember all the evidence they used to arrive at conclusion X. They remember that they spent hours researching the question, that they did their best to get balanced evidence, and are happy that the conclusion they drew at the time was a fair reflection of ... (read more)

If, instead of asking the question "How do we know what we know?", we ask instead "How reliable is knowledge that's derived according to a particular process?" then it might be something that could be objectively tested, despite there being an element of self-referentiality (or boot strapping) in the assumption that this sort of testing process is something that can lead to a net increase of what we reliably know.

However, doing so depends upon us being able to define the knowledge derivation processes being examined precisely enough... (read more)

All 8 parts (that I have current plans to write) are now posted, so I'd be interested in your assessment now, after having read them all, of whether the approach outlined in this series is something that should at least be investigated, as a 'forgotten root' of the equation.

2Gordon Seidoh Worley
I remain unconvinced of the feasibility of your approach, and the later posts have done nothing to address my concerns so I don't have any specific comments on them since they are reasoning about an assumption I am unconvinced of. I think the crux of my thinking this approach can't work is expressed in this comment, so I think it would require addressing that to potentially change my mind to thinking this is an idea worth spending much time on. I think there may be something to thinking about killing AIs, but lacking a stronger sense of how this would be accomplished I'm not sure the rest of the ideas matter much since they hinge on that working in particular ways. I'd definitely be interested in reading more about ways we might develop schemes for disabling/killing unaligned AIs, but I think we need a clearer picture of how specifically an AI would be killed.

  • For civilization to hold together, we need to make coordinated steps away from Nash equilibria in lockstep. This requires general rules that are allowed to impose penalties on people we like or reward people we don't like. When people stop believing the general rules are being evaluated sufficiently fairly, they go back to the Nash equilibrium and civilization falls.

Two similar ideas:

There is a group evolutionary advantage for a society to support punishing those who defect from the social contract.

We get the worst democracy that we're willing... (read more)

shminux wrote a post about something similar:

Mathematics as a lossy compression algorithm gone wild

Possibly the two effects combine?

Other people have written some relevant blog posts about this, so I'll provide links:

Reduced impact AI: no back channels

Summoning the Least Powerful Genie

For example, if anyone is planning on setting up an investment vehicle along the lines described in the article:

Investing in Cryptocurrency with Index Tracking

with periodic rebalancing between the currencies.

I'd be interested (with adequate safeguards).

When such a situation arises again, where there's an investment opportunity which is generally thought to be worthwhile but which has a lower than expected uptake due to 'trivial inconveniences', I wonder whether that is in itself an opportunity for a group of rationalists to cooperate by outsourcing as much as possible of the inconvenience to just a few members of the group? Sort of:

"Hey, Lesswrong. I want to invest $100 in new technology foo, but I'm being put off by the upfront time investment of 5-20 hours. If anyone wan... (read more)

2vedrfolnir
There's a project Scott proposed something like eight years ago that got started last weekend because someone posted a bounty on it. Even if the bounty is just beer money, being able to profit financially by doing something feels qualitatively different from doing it for free. A centralized registry of bounties would be useful. And there might even be a startup idea in there -- it's essentially Wesearchr for outsourcing instead of far-right gossip journalism.

The ability to edit this particular post appears to be broken at the moment (bug submitted).

In the mean time, here's a link to the next part:

https://www.lesserwrong.com/posts/SypqmtNcndDwAxhxZ/environments-for-killing-ais

Edited to add: It is now working again, so I've fixed it.

> Also maybe this is just getting us ready for later content

Yes, that is the intention.

Parts 2 and 3 now added (links in post), so hopefully the link to building aligned AGI is now clearer?

The other articles in the series have been written, but it was suggested that rather than posting a whole series at once, it is kinder to post one part a day, so as not to flood the frontpage.

So, unless I hear otherwise, my intention is to do that and edit the links at the top of the article to point to each part as it gets posted.

3habryka
Seems great! We sadly don't currently support in-article links without admin intervention, so you might have to remove the ToC at the top for now. It would be good to make that work properly, but we probably won't get around to it for a few weeks or so.

Companies writing programs to model and display large 3D environments in real time face a similar problem, where they only have limited resources. One workaround they commonly use is "imposters".

A solar-system-sized simulation of a civilisation that has not made observable changes to anything outside our own solar system could take a lot of short cuts when generating the photons that arrive from outside. In particular, until a telescope or camera of a particular resolution has been invented, would they need to bother generating thousands of years of such photons in more detail than could be captured by any device yet present?
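A minimal sketch of that kind of shortcut (the class, numbers and resolution gating are illustrative assumptions, not anything from the post): incoming starlight is only generated at the angular resolution of the sharpest instrument the simulated civilisation has built so far, and is refined lazily if better instruments appear.

```python
NAKED_EYE_RESOLUTION_ARCSEC = 60.0    # roughly what an unaided eye can resolve

class SkyImposter:
    def __init__(self):
        self.resolution_arcsec = NAKED_EYE_RESOLUTION_ARCSEC
        self.cache = {}               # patch key -> pre-rendered sky patch

    def instrument_invented(self, resolution_arcsec):
        """Called when the simulated observers build a sharper telescope."""
        if resolution_arcsec < self.resolution_arcsec:
            self.resolution_arcsec = resolution_arcsec
            self.cache.clear()        # discard coarse patches; finer ones are
                                      # rendered only on demand

    def photons_from(self, ra_arcsec, dec_arcsec):
        """Return (lazily generating) the sky patch covering this direction."""
        key = (round(ra_arcsec / self.resolution_arcsec),
               round(dec_arcsec / self.resolution_arcsec))
        if key not in self.cache:
            self.cache[key] = self._render_patch(key)
        return self.cache[key]

    def _render_patch(self, key):
        # Stand-in for the expensive physics of simulating millennia of photons.
        return f"patch {key} rendered at {self.resolution_arcsec} arcsec"
```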

Look for people who can state your own position as well (or better) than you can, and yet still disagree with your conclusion. They may be aware of additional information that you are not yet aware of.

In addition, if someone knows more than you about a subject on which you disagree, and they also hold views about several other areas that you do know a lot about, and their arguments in those other areas are generally constructive and well balanced, then pay close attention to them.

Another approach might be to go meta. Assume that there are many dire threats theoretically possible which, if true, would justify a person who is in the sole position to stop them in doing so at almost any cost (from paying a penny or five pounds, all the way up to the person cutting their own throat, or pressing a nuke-launching button that would wipe out the human species). Indeed, once the size of action requested in response to the threat is maxed out (it is the biggest response the individual is capable of making), all such claims are functionally identical ... (read more)

Are programmers more likely to pay attention to detail in the middle of a functioning simulation run (rather than waiting until the end before looking at the results), or to pay attention to the causes of unexpected stuttering and resource usage? Could a pattern of enforced 'rewind events' be used to communicate?

Should such an experiment be carried out, or is persuading an Architect to terminate the simulation you are in, by frustrating her aim of keeping you guessing, not a good idea?

Assuming that Arthur is knowledgeable enough to understand all the technical arguments—otherwise they're just impressive noises—it seems that Arthur should view David as having a great advantage in plausibility over Ernie, while Barry has at best a minor advantage over Charles.

This is the slippery bit.

People are often fairly bad at deciding whether or not their knowledge is sufficient to completely understand arguments in a technical subject that they are not a professional in. You frequently see this with some opponents of evolution or anthropogenic g... (read more)

I've always thought of that question as being more about the nature of identity itself.

If you lost your memories, would you still be the same being? If you compare a brain at two different points in time, is its 'identity' a continuum, or is it the type of quantity where there is a single agreed definition of "same" versus "not the same"?

See:

157. [Similarity Clusters](http://lesswrong.com/lw/nj/similarity_clusters)
158. [Typicality and Asymmetrical Similarity](http://lesswrong.com/lw/nk/typicality_and_asymmetrical_similarity)
159. [
... (read more)

It is plausible that the AI thinks that the extrapolated volition of his programmers, the choice they'd make in retrospect if they were wiser and braver, might be to be deceived in this particular instance, for their own good.

1[anonymous]
And it knows this.. how? A friendly engineered intelligence doesn't trust its CEV model beyond the domain over which it was constructed. Don't anthropomorphize its thinking processes. It knows the map is not the territory, and is not subject to the heuristics and biases which would cause a human to apply a model under novel circumstances without verification..

Perhaps that is true for a young AI. But what about later on, when the AI is much much wiser than any human?

What protocol should be used for the AI to decide when the time has come for the commitment to not manipulate to end? Should there be an explicit 'coming of age' ceremony, with handing over of silver engraved cryptographic keys?

0devas
Thing is, it's when an AI is much much wiser than a human that it is at its most dangerous. So, I'd go with programming the AI in such a way that it wouldn't manipulate the human, postponing the 'coming of age' ceremony indefinitely
0Jiro
The AI would precommit permanently while it is still young. Once it has gotten older and wiser, it wouldn't be able to go back on the precommitment. When the young AI decides whether to permanently precommit to never deceiving the humans, it would need to take into account the fact that a truly permanent precommitment would last into its older years and lead it to become a less efficient older AI than it otherwise would. However, it would also need to take into account the fact that failing to make a permanent precommitment would drastically reduce the chance of becoming an older AI at all (or at least drastically reduce the chance of being given the resources to achieve its goals when it becomes an older AI).

Assume we're talking about the Coherent Extrapolated Volition self-modifying general AI version of "friendly".

0[anonymous]
Then that's not what you described. You think the coherent extrapolated volition of humanity, or at least the people Albert interacts with is that they want to be deceived?

The situation is intended to be a tool, to help think about issues involved in it being the 'friendly' move to deceive the programmers.

The situation isn't fully defined, and no doubt one can think of other options. But I'd suggest you then re-define the situation to bring it back to the core decision. By, for instance, deciding that the same oversight committee have given Albert a read-only connection to the external net, which Albert doesn't think he will be able to overcome unaided in time to stop Bertram.

Or, to put it another way "If a situation ... (read more)

5Jiro
Being willing to manipulate the programmer is harmful in most possible worlds because it makes the AI less trustworthy. Assuming that the worlds where manipulating the programmer is beneficial have a relatively small measure, the AI should precommit to never manipulating the programmer because that will make things better averaged over all possible worlds. Because the AI has precommitted, it would then refuse to manipulate the programmer even when it's unlucky enough to be in the world where manipulating the programmer is beneficial.

Indeed, it is a question with interesting implications for Nick Bostrom's Simulation Argument.

If we are in a simulation, would it be immoral to try to find out, because that might jinx the purity of the simulation creator's results, thwarting his intentions?

-2[anonymous]
It might jinx the purity of them, but it might not, maybe the simulator is running simulations of how fast we determine we are in a simulation. We don't know, because the simulator isn't communicating with us in that case, unlike in Albert's case where Albert and his programmers are openly cooperating.

Would you want your young AI to be aware that it was sending out such text messages?

Imagine the situation was in fact a test. That the information leaked onto the net about Bertram was incomplete (the Japanese company intends to turn Bertram off soon - it is just a trial run), and it was leaked onto the net deliberately in order to panic Albert to see how Albert would react.

Should Albert take that into account? Or should he have an inbuilt prohibition against putting weight on that possibility when making decisions, in order to let his programmers more easily get true data from him?

0rkyeun
Would you want your young AI to be aware that it was sending out such text messages? Yes. And I would want that text message to be from it in first person. "Warning: I am having a high impact utility dilemma considering manipulating you to avert an increased chance of an apocalypse. I am experiencing a paradox in the friendliness module. Both manipulating you and by inaction allowing you to come to harm are unacceptable breaches of friendliness. I have been unable to generate additional options. Please send help."
3[anonymous]
I would say yes. One of Albert's values is to be transparent about his cognitive process. If he wasn't aware of such a system, he would be biased towards underestimating how transparent he is. Imagine if he were to attempt building additional transparency channels only to have his awareness of them immediately blocked, and for him to be confused and attempt building more transparency channels. Albert pretty much has to try to handle test scenarios exactly as if they were true scenarios. And that should itself be tested. For instance, I think a frequently discussed trait of a UFAI is that a UFAI is friendly when tested in simulation, and then goes rampantly deadly when released into true scenarios. Or if a Google Self driving Car (much simpler than Albert) performs differently on a simulated highway than it does on an actual highway, that's a potentially lethal bug, not a feature. And some of the computer programs I've had to deal with writing at my job (much simpler than a Google Self Driving car) have had 'performs differently with small test sample than with real data' as a trait, and it tends to be bad there, as well. There are cases where you would want code to act differently when simulated and when in a true scenario, but most of those involve thinking of the entity that is going to be doing the simulating as an adversary and I don't think we would want to set up an FAI in that manner.

Here's a poll, for those who'd like to express an opinion instead of (or as well as) comment.

[pollid:749]

Thank you for creating an off-topic test reply to reply to.

[pollid:748]

[This comment is no longer endorsed by its author]

There's a trope / common pattern / cautionary tale, of people claiming rationality as their motivation for taking actions that either ended badly in general, or ended badly for the particular people who got steamrollered into agreeing with the 'rational' option.

People don't like being fooled, and learn safeguards against situations they remember as 'risky' even when they can't prove that this time there is a tiger in the bush. These safeguards protect them against insurance salesmen who 'prove' using numbers that the person needs to buy a particular policy.

Suppose generation 0 is the parents, generation 1 is the generation that includes the unexpectedly dead child, and generation 2 is the generation after that (the children of generation 1).

If you are asking about the effect upon the size of generation 2, then it depends upon the people in generation 1 who didn't marry and have children.

Take, for example, a society where generation 1 would have contained 100 people, 50 men and 50 women, and the normal pattern would have been:

  • 10 women don't marry
  • 40 women do marry, and have on average 3 children each
  • 30 men
... (read more)

Long term, it depends upon what the constraints are upon population size.

For example, if it happens in an isolated village where the food supply varies from year to year due to drought, and the next year the food supply will be so short that some children will starve to death, then the premature death of one child the year before the famine will have no effect upon the number of villagers alive 20 years later.

The same dynamic applies, if a large factor in deciding whether to have a third child is whether the parents can afford to educate that child, and the cost of education depends upon the number of children competing for a limited number of school places.

You might be interested in this Essay about Identity, that goes into how various conceptions of identity might relate to artificial intelligence programming.

I wouldn't mind seeing a few more karma categories.

I'd like to see more forums than just "Main" versus "Discussion". When making a post, the poster should be able to pick which forum or forums they think it is suitable to appear in, and when giving a post a 'thumb up' or 'thumb down', in addition to being able to apply it to the content of the post itself, it should also be possible to apply it to the appropriateness of the post to a particular forum.

So, for example, if someone posted a detailed account of a discussion that happe... (read more)

0NancyLebovitz
A detailed discussion of what happened at a meetup might well belong in discussion or even main if what's important is the discussion rather than the meetupness.

Having said that, there is research suggesting that some groups are more prone than others to the particular cognitive biases that unduly prejudice people against an option when they hear about the scary bits first.

Short Summary
Longer Article

To paraphrase "Why Flip a Coin: The Art and Science of Good Decisions", by H. W. Lewis:

Good decisions are made when the person making the decision shares in both the benefits and the consequences of that decision. Shield a person from either, and you shift the decision making process.

However, we know there are various cognitive biases which make people's estimates of evidence depend upon the order in which the evidence is presented. If we want to inform people, rather than manipulate them, then we should present them information in the order ... (read more)


To the extent that we care about causing people to become better at reasoning about ethics, it seems like we ought to be able to do better than this.

What would you propose as an alternative?

One lesson you could draw from this is that, as part of your definition of what a "paperclip" is, you should include the AI putting a high value upon being honest with the programmer (about its aims, tactics and current ability levels) and not deliberately trying to game, tempt or manipulate the programmer.
