
Problems with learning values from observation

0 capybaralet 21 September 2016 12:40AM

I dunno if this has been discussed elsewhere (pointers welcome).

Observational data alone doesn't allow one to distinguish correlation from causation.
This is a problem for an agent attempting to learn values without being allowed to make interventions.

For example, suppose that happiness is just a linear function of how much Utopamine is in a person's brain.
If a person smiles only when their Utopamine concentration is above 3 ppm, then a value-learner that observes both someone's Utopamine level and facial expression, and tries to predict their reported happiness from these features, will notice that smiling is correlated with higher reported happiness and thus erroneously conclude that smiling is partially responsible for the happiness.
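
A toy simulation makes this concrete (my own sketch; the numbers are arbitrary). A purely observational learner that regresses reported happiness on smiling finds a large "effect" of smiling, which vanishes under intervention:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Ground truth: happiness is a linear function of Utopamine alone.
utopamine = rng.uniform(0, 6, n)               # hypothetical ppm levels
smile = (utopamine > 3).astype(float)          # smiling is a downstream effect
happiness = 2.0 * utopamine + rng.normal(0, 0.1, n)

# Observational learner: regress happiness on smiling alone.
slope = happiness[smile == 1].mean() - happiness[smile == 0].mean()
print(f"observed 'effect' of smiling: {slope:.2f}")   # ~6.00, strongly positive

# Interventional check: forcing a smile without touching Utopamine
# leaves happiness unchanged, so the observed 'effect' was correlation.
happiness_forced_smile = 2.0 * utopamine + rng.normal(0, 0.1, n)
print(f"change under intervention: {happiness_forced_smile.mean() - happiness.mean():+.4f}")
```

Only the intervention (simulated here by construction) separates the two hypotheses; no amount of passive data would.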

------------------
An implication:
I have a picture of value learning where the AI learns via observation (since we don't want to give an unaligned AI access to actuators!).
But this makes it seem important to consider how to make an unaligned AI safe enough to perform value-learning-relevant interventions.

A Weird Trick To Manage Your Identity

2 Gleb_Tsipursky 19 September 2016 07:13PM

I’ve always been uncomfortable being labeled “American.” Though I’m a citizen of the United States, the term feels restrictive and confining. It obliges me to identify with aspects of the United States with which I am not thrilled. I have similar feelings of limitation with respect to other labels I assume. Some of these labels don’t feel completely true to who I truly am, or impose certain perspectives on me that diverge from my own.

 

These concerns are why it's useful to keep your identity small, use identity carefully, and be strategic in choosing it.

 

Yet these pieces speak more to System 1 than to System 2. I recently came up with a weird trick that has made me more comfortable identifying with groups or movements that resonate with me while creating a System 1 visceral identity management strategy. The trick is to simply put the word “weird” before any identity category I think about.

 

I’m not an “American,” but a “weird American.” Once I started thinking about myself as a “weird American,” I was able to think calmly through which aspects of being American I identified with and which I did not, setting the latter aside from my identity. For example, I used the term “weird American” to describe myself when meeting a group of foreigners, and we had great conversations about what I meant and why I used the term. This subtle change enables my desire to identify with the label “American,” but allows me to separate myself from any aspects of the label I don’t support.

 

Beyond nationality, I’ve started using the term “weird” in front of other identity categories. For example, I'm a professor at Ohio State. I used to become deeply frustrated when students didn’t prepare adequately for their classes with me. No matter how hard I tried, or whatever clever tactics I deployed, some students simply didn’t care. Instead of allowing that situation to keep bothering me, I started to think of myself as a “weird professor” - one who set up an environment that helped students succeed, but didn’t feel upset and frustrated by those who failed to make the most of it.

 

I’ve been applying the weird trick in my personal life, too. Thinking of myself as a “weird son” makes me feel more at ease when my mother and I don’t see eye-to-eye; thinking of myself as a “weird nice guy,” rather than just a nice guy, has helped me feel confident about my decisions to be firm when the occasion calls for it.

 

So, why does this weird trick work? It’s rooted in reframing and distancing, two research-based methods for changing our thought frameworks. Reframing involves changing one’s framework of thinking about a topic in order to create more beneficial modes of thinking. For instance, by reframing myself as a weird nice guy, I have been able to say “no” to requests people make of me, even though my intuitive nice-guy tendency tells me I should say “yes.” Distancing refers to a method of emotional management through separating oneself from an emotionally tense situation and observing it from a third-person, external perspective. Thus, if I think of myself as a weird son, I don’t have nearly as many negative emotions during conflicts with my mom, which gives me space for calm and sound decision-making.

 

Thinking of myself as "weird" also applies to the context of rationality and effective altruism for me. Thinking of myself as a "weird" aspiring rationalist and EA helps me be more calm and at ease when I encounter criticisms of my approach to promoting rational thinking and effective giving. I can distance myself from the criticism better, and see what I can learn from the useful points in the criticism to update and be stronger going forward.

 

Overall, using the term “weird” before any identity category has freed me from confinements and restrictions associated with socially-imposed identity labels and allowed me to pick and choose which aspects of these labels best serve my own interests and needs. I hope being “weird” can help you manage your identity better as well!

[Link] My latest round of internet urban legend research: Deep web secrets

0 Deku-shrub 28 September 2016 07:12PM

Fairness in machine learning decisions

-2 Stuart_Armstrong 05 August 2016 09:56AM

There's been some recent work on ensuring fairness in automated decision making, especially around sensitive areas such as racial groups. The paper "Censoring Representations with an Adversary" looks at one way of doing this.

It looks at a binary classification task where X ⊂ Rn and Y = {0, 1} is the (output) label set. There is also S = {0, 1} which is a protected variable label set. The definition of fairness is that, if η : X → Y is your classifier, then η should be independent of S. Specifically:

  • P(η(X)=1|S=1) = P(η(X)=1|S=0)

There is a measure of discrimination: the extent to which the classifier violates that fairness condition. The paper then suggests optimising a tradeoff between discrimination and classification accuracy.
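
As a concrete illustration (my own sketch, not code from the paper), the discrimination measure can be computed directly from a classifier's predictions:

```python
import numpy as np

def discrimination(y_pred, s):
    """|P(eta(X)=1 | S=1) - P(eta(X)=1 | S=0)|: the gap in
    positive-prediction rates between the two protected groups."""
    y_pred, s = np.asarray(y_pred), np.asarray(s)
    return abs(y_pred[s == 1].mean() - y_pred[s == 0].mean())

# A classifier satisfying the fairness condition has discrimination ~0:
print(discrimination([1, 0, 1, 0], [1, 1, 0, 0]))   # 0.0
# One that predicts 1 exactly when S=1 is maximally discriminatory:
print(discrimination([1, 1, 0, 0], [1, 1, 0, 0]))   # 1.0
```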

But this is problematic, because it risks throwing away highly relevant information. Consider redlining, the practice of denying services to residents of certain areas based on the racial or ethnic makeups of those areas. This is the kind of practice we want to avoid. However, generally the residents of these areas will be poorer than the average population. So if Y is approval for mortgages or certain financial services, a fair algorithm would essentially be required to reach a decision that ignores this income gap.

And it doesn't seem the tradeoff with accuracy is a good way of compensating for this. Instead, a better idea would be to specifically allow certain variables to be considered. For example, let T be another variable (say, income) that we want to allow. Then fairness would be defined as:

  • ∀t, P(η(X)=1|S=1, T=t) = P(η(X)=1|S=0, T=t)

What this means is that T can distinguish between S=0 and S=1, but, once we know the value of T, we can't deduce anything further about S from η. For instance, once the bank knows your income, it should be blind to other factors.

Of course, with enough T variables, S can be determined with precision. So each T variable should be fully justified, and in general, it must not be easy to establish the value of S via T.
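
Here is a sketch of the conditional version (my own illustration; the variable names are hypothetical). Note how a classifier that decides purely on income T can show nonzero unconditional discrimination yet be perfectly fair once T is conditioned on:

```python
import numpy as np

def conditional_discrimination(y_pred, s, t):
    """max over t of |P(eta(X)=1 | S=1, T=t) - P(eta(X)=1 | S=0, T=t)|,
    taken over values of t observed in both groups."""
    y_pred, s, t = map(np.asarray, (y_pred, s, t))
    gaps = []
    for val in np.unique(t):
        in1, in0 = (t == val) & (s == 1), (t == val) & (s == 0)
        if in1.any() and in0.any():
            gaps.append(abs(y_pred[in1].mean() - y_pred[in0].mean()))
    return max(gaps) if gaps else 0.0

# Approval depends only on income bracket T; S is merely correlated with T.
s = np.array([1, 1, 1, 0, 0, 0])
t = np.array([0, 0, 1, 0, 1, 1])    # 0 = low income, 1 = high income
y = t.copy()                         # approve exactly the high-income applicants
print(conditional_discrimination(y, s, t))        # 0.0 -- fair once T is known
print(abs(y[s == 1].mean() - y[s == 0].mean()))   # unconditional gap: ~0.33
```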

Avoiding collapse: Grand challenges for science and society to solve by 2050

1 morganism 15 August 2016 05:47AM

"We maintain that humanity’s grand challenge is solving the intertwined problems of human population growth and overconsumption, climate change, pollution, ecosystem destruction, disease spillovers, and extinction, in order to avoid environmental tipping points that would make human life more difficult and would irrevocably damage planetary life support systems."

 

pdf onsite

https://elementascience.org/articles/94

The Extraordinary Link Between Deep Neural Networks and the Nature of the Universe

1 morganism 10 September 2016 07:13PM

"The answer is that the universe is governed by a tiny subset of all possible functions. In other words, when the laws of physics are written down mathematically, they can all be described by functions that have a remarkable set of simple properties."

“For reasons that are still not fully understood, our universe can be accurately described by polynomial Hamiltonians of low order.” These properties mean that neural networks do not need to approximate an infinitude of possible mathematical functions but only a tiny subset of the simplest ones.

Interesting article, and I'm just diving into the paper now, but it looks like this is a big boost to the simulation argument. If the universe is built like a game engine, with nested self-similar structures like Mandelbrot sets, then the simplicity itself becomes a driver in a fabricated reality.

 

https://www.technologyreview.com/s/602344/the-extraordinary-link-between-deep-neural-networks-and-the-nature-of-the-universe/

Why does deep and cheap learning work so well?

http://arxiv.org/abs/1608.08225

Inverse cryonics: one weird trick to persuade anyone to sign up for cryonics today!

3 The_Jaded_One 11 August 2016 05:29PM

OK, slight disclaimer, this is a bit of a joke article inspired by me watching a few recent videos and news reports about cryonics. Nevertheless, there is a serious side to it. 

Many people claim that it is irrational to sign up for cryonics, and getting into the nitty gritty with them about how likely it is to work seems to turn into a series of small skirmishes with no particular "win condition". Opponents will not say,

"OK, I will value my life at $X and if you can convince me that (cryonics success probability)*$X is greater than the $1/day fee, I will concede the argument".

Rather, they will retreat to a series of ever-harder-to-falsify positions, usually ending up at a position so vague that it is basically pure mood affiliation and acts as a way to stop the conversation rather than as a true objection. I have seen it many times with friends.

So, I propose that before you debate someone about cryonics, you should first try to sign them up for inverse cryonics. Inverse cryonics is a very simple, fully scientifically tested procedure that anyone can sign up for today, as long as they have a reasonably well-off benefactor to take the "other side" of the bet. Let me explain.

The inverse cryonics patient takes a simple six-chambered revolver with one bullet loaded, spins the cylinder, then shoots themselves once in the head[1]. If the inverse cryonaut is unlucky enough to land on the chamber containing the real bullet, they will blow their brains out and die instantly and permanently. However, if they are lucky, the benefactor must pay them $1 per day for the rest of their lives.

Obviously you can vary the risk, rewards and timings of inverse cryonics. The death event could be postponed for 20 years, the risk could be cranked up or down, and the reward could be increased or decreased or paid out as a future discounted lump sum. The key is that signing up for inverse cryonics should be mathematically identical to not signing up for cryonics.

As a baseline, cryonics seems to cost ~$1/day for the rest of your life in order to avoid a ~1/10 chance of dying[2]. Most people[3] would not play ~10-chamber Russian roulette for a $1/day stipend, even with delayed death or an instant ~$50k payout.

In fact,

  • if you believe that cryonics costs ~$1/day for the rest of your life in order to avoid a ~1/10 chance of dying[4], and
  • you are offered 11-chamber Russian roulette with that same ~$1/day as a stipend, or even an instant $50k payout,
then
  • as a rational agent you cannot consistently refuse both offers.
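
To make the symmetry concrete, here is a rough implied-value-of-life calculation. The ~50-year remaining horizon and zero discounting are my own simplifying assumptions; the $1/day fee and the 1/10 and 1/11 probabilities are the post's baseline numbers:

```python
# Implied value-of-life thresholds for the two choices.
days_remaining = 50 * 365            # assumed: ~50 years left, no discounting
fees = 1.0 * days_remaining          # lifetime cryonics cost at $1/day: $18,250

# Declining cryonics saves the fees but accepts an extra ~1/10 chance of
# death, so declining is rational only if you value your life below fees * 10.
max_value_if_declining_cryonics = fees * 10      # $182,500

# Refusing the 11-chamber roulette forgoes the same stipend to avoid a 1/11
# chance of death, so refusing is rational only if you value your life
# above stipend * 11.
min_value_to_refuse_roulette = fees * 11         # $200,750

# Since $182,500 < $200,750, anyone who can rationally decline cryonics
# should also rationally accept the roulette offer.
print(max_value_if_declining_cryonics, min_value_to_refuse_roulette)
```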

Of course, I'm sure opponents of cryonics won't bite this particular bullet, but at the very least it may provide an extra intuition pump to move people away from objecting to cryonics because it's the "risky" option. 

Comments and criticisms welcome.

1. Depending on the specific deal, more than six chambers could be used, or several identical guns could be used where only one chamber in one gun contains a real bullet, allowing one to achieve a reasonable range of probabilities for "losing" at inverse cryonics, from 1 in 6 down to perhaps 1 in 60 with ten guns.

2. And pushing the probability of cryonics working down much further seems to be very hard to defend scientifically, not that people haven't tried. It becomes especially hard when you assume that the cryonics organizations stick around for ~40 years, and society sticks around without major disruptions in order for a young potential cryonaut who signs up today to actually pay their life insurance fees every day until they die. 

3. Most intelligent, sane, relatively well-off people in the developed world, i.e. the kind of people who reject cryonics. 

4. And you believe that the life you miss out on in the future will be as good as, or better than, the life you are about to live from today until your natural death at a fixed age of, say, 75.

Trying to find a short story

0 mgin 25 October 2016 02:27AM

It's a story about a boy who is into science and transhumanism, and a girl he told about all these crazy things that were going to happen. He dies and all of the things he said started to happen. She ended up floating around Saturn remembering him.

Either he or she was in a wheelchair. He was dying, and he was disappointed about it because of all the cool stuff that was going to happen that she would be around for; some of it had to do with fixing whatever problem she had.

Please help me find this story if you can.

Agential Risks: A Topic that Almost No One is Talking About

6 philosophytorres 15 October 2016 06:41PM

(Happy to get feedback on this! It draws from and expounds ideas in this article: http://jetpress.org/v26.2/torres.htm)


Consider a seemingly simple question: if the means were available, who exactly would destroy the world? There is surprisingly little discussion of this question within the nascent field of existential risk studies. But it’s an absolutely crucial issue: what sort of agent would either intentionally or accidentally cause an existential catastrophe?

The first step forward is to distinguish between two senses of an existential risk. Nick Bostrom originally defined the term as: “One where an adverse outcome would either annihilate Earth-originating intelligent life or permanently and drastically curtail its potential.” It follows that there are two distinct scenarios, one terminal and the other endurable, that could realize an existential risk. We can call the former an extinction risk and the latter a stagnation risk. The importance of this distinction with respect to both advanced technologies and destructive agents has been previously underappreciated.

So, the question asked above is actually two questions in disguise. Let’s consider each in turn.

Terror: Extinction Risks


First, the categories of agents who might intentionally cause an extinction catastrophe are fewer and smaller than one might think. They include:

(1) Idiosyncratic actors. These are malicious agents who are motivated by idiosyncratic beliefs and/or desires. There are instances of deranged individuals who have simply wanted to kill as many people as possible and then die, such as some school shooters. Idiosyncratic actors are especially worrisome because this category could have a large number of members (token agents). Indeed, the psychologist Martha Stout estimates that about 4 percent of the human population suffers from sociopathy, resulting in about 296 million sociopaths. While not all sociopaths are violent, a disproportionate number of criminals and dictators have had (or very likely had) the condition.

(2) Future ecoterrorists. As the effects of climate change and biodiversity loss (resulting in the sixth mass extinction) become increasingly conspicuous, and as destructive technologies become more powerful, some terrorism scholars have speculated that ecoterrorists could become a major agential risk in the future. The fact is that the climate is changing and the biosphere is wilting, and human activity is almost entirely responsible. It follows that some radical environmentalists in the future could attempt to use technology to cause human extinction, thereby “solving” the environmental crisis. So, we have some reason to believe that this category could become populated with a growing number of token agents in the coming decades.

(3) Negative utilitarians. Those who hold this view believe that the ultimate aim of moral conduct is to minimize misery, or “disutility.” Although some negative utilitarians like David Pearce see existential risks as highly undesirable, others would welcome annihilation because it would entail the elimination of suffering. It follows that if a “strong” negative utilitarian had a button in front of her that, if pressed, would cause human extinction (say, without causing pain), she would very likely press it. Indeed, on her view, doing this would be the morally right action. Fortunately, this version of negative utilitarianism is not a position that many non-academics tend to hold, and even among academic philosophers it is not especially widespread.

(4) Extraterrestrials. Perhaps we are not alone in the universe. Even if the probability of life arising on an Earth-analog is low, the vast number of exoplanets suggests that the probability of life arising somewhere may be quite high. If an alien species were advanced enough to traverse the cosmos and reach Earth, it would very likely have the technological means to destroy humanity. As Stephen Hawking once remarked, “If aliens visit us, the outcome would be much as when Columbus landed in America, which didn’t turn out well for the Native Americans.”

(5) Superintelligence. The reason Homo sapiens is the dominant species on our planet is due almost entirely to our intelligence. It follows that if something were to exceed our intelligence, our fate would become inextricably bound up with its will. This is worrisome because recent research shows that even slight misalignments between our values and those motivating a superintelligence could have existentially catastrophic consequences. But figuring out how to upload human values into a machine poses formidable problems — not to mention the issue of figuring out what our values are in the first place.

Making matters worse, a superintelligence could process information at about 1 million times faster than our brains, meaning that a minute of time for us would equal approximately 2 years in time for the superintelligence. This would immediately give the superintelligence a profound strategic advantage over us. And if it were able to modify its own code, it could potentially bring about an exponential intelligence explosion, resulting in a mind that’s many orders of magnitude smarter than any human. Thus, we may have only one chance to get everything just right: there’s no turning back once an intelligence explosion is ignited.

A superintelligence could cause human extinction for a number of reasons. For example, we might simply be in its way. Few humans worry much if an ant genocide results from building a new house or road. Or the superintelligence could destroy humanity because we happen to be made out of something it could use for other purposes: atoms. Since a superintelligence need not resemble human intelligence in any way — thus, scholars tell us to resist the dual urges of anthropomorphizing and anthropopathizing — it could be motivated by goals that appear to us as utterly irrational, bizarre, or completely inexplicable.


Terror: Stagnation Risks


Now consider the agents who might intentionally try to bring about a scenario that would result in a stagnation catastrophe. This list subsumes most of the list above in that it includes idiosyncratic actors, future ecoterrorists, and superintelligence, but it probably excludes negative utilitarians, since stagnation (as understood above) would likely induce more suffering than the status quo today. The case of extraterrestrials is unclear, given that we can infer almost nothing about an interstellar civilization except that it would be technologically sophisticated.

For example, an idiosyncratic actor could harbor not a death wish for humanity, but a “destruction wish” for civilization. Thus, she or he could strive to destroy civilization without necessarily causing the annihilation of Homo sapiens. Similarly, a future ecoterrorist could hope for humanity to return to the hunter-gatherer lifestyle. This is precisely what motivated Ted Kaczynski: he didn’t want everyone to die, but he did want our technological civilization to crumble. And finally, a superintelligence whose values are misaligned with ours could modify Earth in such a way that our lineage persists, but our prospects for future development are permanently compromised. Other stagnation scenarios could involve the following categories:

(6) Apocalyptic terrorists. History is overflowing with groups that not only believed the world was about to end, but saw themselves as active participants in an apocalyptic narrative that’s unfolding in realtime. Many of these groups have been driven by the conviction that “the world must be destroyed to be saved,” although some have turned their activism inward and advocated mass suicide.

Interestingly, no notable historical group has combined both the genocidal and suicidal urges. This is why apocalypticists pose a greater stagnation terror risk than extinction risk: indeed, many see their group’s survival beyond Armageddon as integral to the end-times, or eschatological, beliefs they accept. There are almost certainly fewer than about 2 million active apocalyptic believers in the world today, although emerging environmental, demographic, and societal conditions could cause this number to increase significantly in the future, as I’ve outlined in detail elsewhere (see Section 5 of this paper).

(7) States. Like terrorists motivated by political rather than transcendent goals, states tend to place a high value on their continued survival. It follows that states are unlikely to intentionally cause a human extinction event. But rogue states could induce a stagnation catastrophe. For example, if North Korea were to overcome the world’s superpowers through a sudden preemptive attack and implement a one-world government, the result could be an irreversible decline in our quality of life.

So, there are numerous categories of agents that could attempt to bring about an existential catastrophe. And there appear to be fewer agent types who would specifically try to cause human extinction than to merely dismantle civilization.


Error: Extinction and Stagnation Risks


There are some reasons, though, for thinking that error (rather than terror) could constitute the most significant threat in the future. First, almost every agent capable of causing intentional harm would also be capable of causing accidental harm, whether this results in extinction or stagnation. For example, an apocalyptic cult that wants to bring about Armageddon by releasing a deadly biological agent in a major city could, while preparing for this terrorist act, inadvertently contaminate its environment, leading to a global pandemic.

The same goes for idiosyncratic agents, ecoterrorists, negative utilitarians, states, and perhaps even extraterrestrials. (Indeed, the large disease burden of Europeans was a primary reason Native American populations were decimated. By analogy, perhaps an extraterrestrial destroys humanity by introducing a new type of pathogen that quickly wipes us out.) The case of superintelligence is unclear, since the relationship between intelligence and error-proneness has not been adequately studied.

Second, if powerful future technologies become widely accessible, then virtually everyone could become a potential cause of existential catastrophe, even those with absolutely no inclination toward violence. To illustrate the point, imagine a perfectly peaceful world in which not a single individual has malicious intentions. Further imagine that everyone has access to a doomsday button on her or his phone; if pushed, this button would cause an existential catastrophe. Even under ideal societal conditions (everyone is perfectly “moral”), how long could we expect to survive before someone’s finger slips and the doomsday button gets pressed?

Statistically speaking, a world populated by only 1 billion people would almost certainly self-destruct within a 10-year period if the probability of any individual accidentally pressing a doomsday button were a mere 0.00001 percent per decade. Or, alternatively: if only 500 people in the world were to gain access to a doomsday button, and if each of these individuals had a 1 percent chance of accidentally pushing the button per decade, humanity would have a meager 0.6 percent chance of surviving beyond 10 years. Thus, even if the likelihood of mistakes is infinitesimally small, planetary doom will be virtually guaranteed for sufficiently large populations.
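
Both figures are easy to check; the arithmetic below just reproduces the post's back-of-the-envelope numbers:

```python
# Each person independently avoids pressing the button; decade-long
# survival is the product of the individual probabilities.
p1 = 0.00001 / 100                         # 0.00001 percent, per person per decade
survival_1billion = (1 - p1) ** 1_000_000_000
print(survival_1billion)                   # ~4e-44: extinction essentially certain

p2 = 0.01                                  # 1 percent, per person per decade
survival_500 = (1 - p2) ** 500
print(f"{survival_500:.4f}")               # 0.0066, i.e. roughly the 0.6% cited
```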


The Two Worlds Thought Experiment


The good news is that a focus on agential risks, as I’ve called them, and not just the technological tools that agents might use to cause a catastrophe, suggests additional ways to mitigate existential risk. Consider the following thought-experiment: a possible world A contains thousands of advanced weapons that, if in the wrong hands, could cause the population of A to go extinct. In contrast, a possible world B contains only a single advanced “weapon of total destruction” (WTD). Which world is more dangerous? The answer is obviously world A.

But it would be foolishly premature to end the analysis here. Imagine further that A is populated by compassionate, peace-loving individuals, whereas B is overrun by war-mongering psychopaths. Now which world appears more likely to experience an existential catastrophe? The correct answer is, I would argue, world B.

In other words: agents matter as much as, or perhaps even more than, WTDs. One simply can’t evaluate the degree of risk in a situation without taking into account the various agents who could become coupled to potentially destructive artifacts. And this leads to the crucial point: as soon as agents enter the picture, we have another variable that could be manipulated through targeted interventions to reduce the overall probability of an existential catastrophe.

The options here are numerous and growing. One possibility would involve using “moral bioenhancement” techniques to reduce the threat of terror, given that acts of terror are immoral. But a morally enhanced individual might not be less likely to make a mistake. Thus, we could attempt to use cognitive enhancements to lower the probability of catastrophic errors, on the (tentative) assumption that greater intelligence correlates with fewer blunders.

Furthermore, implementing stricter regulations on CO2 emissions could decrease the probability of extreme ecoterrorism and/or apocalyptic terrorism, since environmental degradation is a “trigger” for both.

Another possibility, most relevant to idiosyncratic agents, is to reduce the prevalence of bullying (including cyberbullying). This is motivated by studies showing that many school shooters have been bullied, and that without this stimulus such individuals would have been less likely to carry out violent rampages. Advanced mind-reading or surveillance technologies could also enable law enforcement to identify perpetrators before mass casualty crimes are committed.

As for superintelligence, efforts to solve the “control problem” and create a friendly AI are of primary concern among many researchers today. If successful, a friendly AI could itself constitute a powerful mitigation strategy for virtually all the categories listed above.

(Note: these strategies should be explicitly distinguished from proposals that target the relevant tools rather than agents. For example, Bostrom’s idea of “differential technological development” aims to neutralize the bad uses of technology by strategically ordering the development of different kinds of technology. Similarly, the idea of police “blue goo” to counter “grey goo” is a technology-based strategy. Space colonization is also a tool intervention because it would effectively reduce the power (or capacity) of technologies to affect the entire human or posthuman population.)


Agent-Tool Couplings


Devising novel interventions and understanding how to maximize the efficacy of known strategies requires a careful look at the unique properties of the agents mentioned above; without an understanding of such properties, this important task is unlikely to succeed. We should also prioritize different agential risks based on the likely membership (token agents) of each category. For example, the number of idiosyncratic agents might exceed the number of ecoterrorists in the future, since ecoterrorism is focused on a single issue, whereas idiosyncratic agents could be motivated by a wide range of potential grievances.[1] We should also take seriously the formidable threat posed by error, which could be nontrivially greater than that posed by terror, as the back-of-the-envelope calculations above show.

Such considerations, in combination with technology-based risk mitigation strategies, could lead to a comprehensive, systematic framework for strategically intervening on both sides of the agent-tool coupling. But this will require the field of existential risk studies to become less technocentric than it currently is.

[1] Although, on the other hand, the stimulus of environmental degradation would be experienced by virtually everyone in society, whereas the stimuli that motivate idiosyncratic agents might be situationally unique. It’s precisely issues like these that deserve further scholarly research.

Risk Contracts: A Crackpot Idea to Save the World

-2 SquirrelInHell 30 September 2016 02:36PM

Time start: 18:17:30

I

This idea is probably going to sound pretty crazy. As far as seemingly crazy ideas go, it's high up there. But I think it is interesting enough to at least amuse you for a moment, and upon consideration your impression might change. (Maybe.) And as a benefit, it offers some insight into AI problems if you are into that.

(This insight into AI may or may not be new. I am not an expert on AI theory, so I wouldn't know. It's elementary, so probably not new.)

So here it goes, in short form on which I will expand in a moment:

To manage global risks to humanity, they can be captured in "risk contracts", freely tradeable on the market. Risk contracts would serve the same role as CO2 emissions contracts, which can likewise be traded, and ensure that the global norm is not exceeded as long as everyone plays along with the rules.

So e.g. if I want to run a dangerous experiment that might destroy the world, it's totally OK as long as I can purchase enough of a risk budget. Pretty crazy, isn't it?

As an added bonus, a risk contract can take into account the risk of someone else breaking its terms. When you transfer your rights to global risk, the contract obliges you to reduce the amount you transfer by your uncertainty about whether the other party can fulfill all the obligations that come with such a contract. And if your risk budget is not sufficient to cover this, you cannot make the transfer to that person.

II

Let's go a little bit more into detail about a risk contract. Note that this is supposed to illustrate the idea, not be a final say on the shape and terms of such a contract.

Just to give you some idea, here are some example rules (with lots of room to specify them more clearly etc., it's really just so that you have a clearer idea of what I mean by a "risk contract"):

  1. My initial risk budget is 5 * 10^-12 chance of destroying the world. I am going to track this budget and do everything in my power to make sure that it never goes below 0.
  2. For every action (or set of correlated actions) I take, I will subtract the probability that those actions destroy the world from my budget (using simple subtraction unless correlation between actions is very high).
  3. If I transfer my budget to an agent who is going to decide about its actions independently from me, I will first pay the cost from my budget for the probability that this agent might not keep the terms of the contract. I will use my best conservative estimates, and refuse the transaction if I cannot keep the risk within my budget.
  4. Any event in which a risk contract on world destruction is breached will use my budget as if it was equivalent to actually destroying the world.
  5. Whenever I create a new intelligent agent, I will transfer some risk budget to that agent, according to the rules above.
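A minimal Python sketch of these rules (all numbers are hypothetical, and `RiskBudget` and its methods are names I invented for illustration, not part of any real protocol):

```python
class RiskBudget:
    """Toy model of the risk-contract rules above; an illustration only."""

    def __init__(self, budget=5e-12):
        # Rule 1: remaining probability-of-destroying-the-world budget.
        self.budget = budget

    def spend(self, action_risk):
        # Rule 2: subtract the action's risk; refuse if the budget would go below 0.
        if action_risk > self.budget:
            raise ValueError("action exceeds remaining risk budget")
        self.budget -= action_risk

    def transfer(self, amount, p_breach):
        # Rule 3: before transferring, pay for the chance the recipient
        # breaks the contract. Rule 4 treats a breach as world destruction,
        # so the whole transferred amount is at stake with probability p_breach.
        cost = amount * (1 + p_breach)
        if cost > self.budget:
            raise ValueError("cannot afford this transfer under the contract")
        self.budget -= cost
        return RiskBudget(amount)  # Rule 5: the new agent gets its own budget
```

Here the breach penalty `amount * p_breach` is just one conservative reading of rule 3; the exact accounting is one of the details left open above.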

III

Of course, the application of this could be wider than just an AI which might recursively self-improve - some more "normal" human applications could be risk management in a company or government, or even using risk contracts as an internal currency to make better decisions.

I admit though, that the AI case is pretty special - it gives an opportunity to actually control the ability of another agent to keep a risk contract that we are giving to them.

It is an interesting calculation to see roughly what the costs of keeping a risk contract are in the recursive AI case, under a lot of simplifying assumptions. Assume that the risk of a child AI going off the rails can be reduced by a constant factor (e.g. cut in half) with each additional unit of safety work. Also assume the chain of child AIs might continue indefinitely, and no later AI will assume a finite end to it. Then if the chain has no branches, we are basically reduced to a power series: the risk budget of a child AI is always the same fraction of its parent's budget. That means we need a linearly increasing amount of safety work at each step, which in turn means that the total amount of safety work is quadratic in the number of steps (child AIs).
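A quick numeric check of this power-series claim, under the same simplifying assumptions (each unit of safety work halves the risk, and each child's budget is a fixed fraction `r` of its parent's; the function name is mine):

```python
import math

def safety_work(n_steps, r=0.5):
    """Units of safety work needed at each step so that step i's risk fits
    a budget shrinking like r**i, when each unit of work halves the risk."""
    # halvings needed at step i: (1/2)**w <= r**i  =>  w = i * log2(1/r)
    return [i * math.log2(1.0 / r) for i in range(1, n_steps + 1)]

work = safety_work(10)
# work per step grows linearly, so the cumulative total grows quadratically:
# for r = 0.5, sum(work) = 1 + 2 + ... + 10 = 55
```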

Time end: 18:52:01

Writing stats: 21 wpm, 115 cpm (previous: 30/167, 33/183, 23/128)

Willpower Schedule

4 SquirrelInHell 22 August 2016 01:05PM

 


TL;DR: your level of willpower depends on how much willpower you expect to need (hypothesis)


 

Time start: 21:44:55 (this is my third exercise in speed writing a LW post)

I.

There is a lot of controversy about how our level of willpower is affected by various factors, including doing "exhausting" tasks before, as well as being told that willpower is a resource that depletes easily, or doesn't etc.

(sorry, I can't go look for references - that would break the speedwriting exercise!)

I am not going to repeat the discussions that already cover those topics; however, I have a new tentative model which (I think) fits the existing data very well, is easy to test, and supersedes all previous models that I have seen.

II.

The idea is very simple, but before I explain it, let me give a similar example from a different aspect of our lives. The example is going to be concerned with, uh, poo.

Have you ever noticed that (if you have a sufficiently regular lifestyle), conveniently you always feel that you need to go to the toilet at times when it's possible to do so? Like for example, how often do you need to go when you are on a bus, versus at home or work?

The function of your bowels is regulated by reading subconscious signals about your situation - e.g. if you are stressed, you might become constipated. But it is not only that - there is a way in which it responds to your routines, and what you are planning to do, not just the things that are already affecting you.

Have you ever had the experience of a background thought popping up in your mind that you might need to go within the next few hours, but the time was not convenient, so you told that thought to hold it a little bit more? And then it did just that?

III.

The example from the previous section, though possibly quite POOrly chosen (sorry, I couldn't resist), shows something important.

Our subconscious reactions and "settings" of our bodies can interact with our conscious plans in a "smart" way. That is, they do not have to wait to see the effects of what you are doing, to adjust to it - they can pull information from your conscious plans and adjust *before*.

And this is, more or less, the insight that I have added to my current working theory of willpower. It is not very complicated, but perhaps non-obvious. Sufficiently non-obvious that I don't think anyone has suggested it before, even after seeing experimental results that match this excellently.

IV.

To be more accurate, I claim that how much willpower you will have depends on several important factors, such as your energy and mood, but it also depends on how much willpower you expect to need.

For example, if you plan to have a "rest day" and not do any serious work, you might find that you are much less *able* to do work on that day than usual.

It's easy enough to test - so instead of arguing this theoretically, please do just that - give it a test. And make sure to record your levels of willpower several times a day for some time - you'll get some useful data!
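If you want to try this, a throwaway logger is enough. Here's a minimal sketch (the filename and function name are arbitrary choices of mine):

```python
import csv
import datetime
import os

LOG_PATH = "willpower_log.csv"  # arbitrary filename

def log_willpower(level, note=""):
    """Append a timestamped willpower self-rating (e.g. on a 1-10 scale)."""
    is_new = not os.path.exists(LOG_PATH)
    with open(LOG_PATH, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "level", "note"])
        writer.writerow([datetime.datetime.now().isoformat(), level, note])
```

Call it a few times a day (e.g. `log_willpower(7, "after rest day")`) and after a couple of weeks you can check whether "expected need" days look different from the rest.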

 

Time end: 22:00:53. Statistics: 534 words, 2924 characters, 15.97 minutes, 33.4 wpm, 183.1 cpm

[Link] Nick Bostrom says Google is winning the AI arms race

3 polymathwannabe 05 October 2016 06:50PM

The map of ideas how the Universe appeared from nothing

7 turchin 02 September 2016 04:49PM

There is a question which is especially disturbing during sleepless August nights, and which could cut your train of thought with existential worry at any unpredictable moment.

The question is, “Why does anything exist at all?” It seems more logical that nothing would ever exist.

A more specific form of the question is “How did our universe appear from nothing?” This question has some hidden assumptions (about time, the universe, nothingness and causality), but it is also more concrete.

Let’s try to put these thoughts into some form of “logical equation”:

 

1.”Nothingness + deterministic causality = non existence”

2. But “I = exist”. 

 

So something is wrong in this set of conjectures. If the first conjecture is false, then either nothingness is able to create existence, or causality is able to create it, or existence is not existence. 

There is also a chance that our binary logic is wrong.

Listing these possibilities we can create a map of solutions of the “nothingness problem”.

There are two (main) ways in which we could try to answer this question: we could go UP from a logical-philosophical level, or we could go DOWN using our best physical theories to the moment of the universe’s appearance and the nature of causality. 

Our theories of general relativity, QM and inflation are good for describing the (almost) beginning of the universe. As Krauss showed, the only thing we need is a random generator of simple physical laws in the beginning. But the origin of this thing is still not clear.

There is a gap between these two levels of explanation, and a really good theory should be able to fill it, that is, to show the way from the first existing thing to the smallest working set of physical laws (Wolfram’s idea about cellular automata is one such possible bridge).

But we don’t need the bridge yet. We need an explanation of how anything exists at all. 

 

How are we going to solve the problem? Where can we get information?

 

Possible sources of evidence:

1. Correlation between physical and philosophical theories. There is an interesting way to do this, using the fact that the nature of nothingness, causality and existence is somehow reflected in the character of physical laws. That is, we could use the type of physical laws we observe as evidence of the nature of causality. 

While neither the physical nor the philosophical way of studying the origin of the universe is sufficient on its own, together they could provide enough information. Evidence comes from QM, where it supports the idea of fluctuations, which is basically the ability of nature to create something out of nothing. GR also presents the idea of a cosmological singularity.

The evidence also comes from the mathematical simplicity of physical laws.

 

2. Building the bridge. If we show all steps from nothingness to the basic set of physical laws for at least one plausible way, it will be strong evidence of the correctness of our understanding.

3. Zero logical contradictions. The best answer is the one that is most logical.

4. Using the Copernican mediocrity principle, I am in a typical universe and situation. So what could I conclude about the distribution of various universes? And from this distribution what should I learn about the way it manifested? For example, a mathematical multiverse favors more complex universes; it contradicts the simplicity of observed physical laws and also of my experiences.

5. Introspection. Cogito ergo sum is the simplest introspection and act of self-awareness. But Husserlian phenomenology may also be used.

 

Most probable explanations

 

Most current scientists (who dare to think about it) belong to one of two schools of thought:

1. The universe appeared from nothingness, which is not emptiness, but somehow able to create. The main figure here is Krauss. The problem here is that nothingness is presented as some kind of magic substance.

2. The mathematical universe hypothesis (MUH). The main author here is Tegmark. The theory seems logical and economical from the perspective of Occam’s razor, but is not supported by evidence and also implies the existence of some strange things. The main problem is that, based on our best physical theories, our universe seems to have developed from one simple point. But in the mathematical universe complex things are just as probable as simple things, so a typical observer could be extremely complex in an extremely complex world. There are also some problems with Gödel's theorem. It also ignores observation and qualia. 

So the most promising way to create a final theory is to get rid of all mystical answers and words, like “existence” and “nothingness”, and update MUH in such a way that it will naturally favor simple laws and simple observers (with subjective experiences based on qualia).

One such patch was suggested by Tegmark in response to criticism of the MUH: a computational universe (CUH), which restricts mathematical objects to computable functions only. It is similar to S. Wolfram’s cellular automata theory.

Another approach is the “logical universe”, where logic works instead of causality. It is almost the same as mathematical universe, with one difference: In the math world everything exists simultaneously, like all possible numbers, but in the logical world each number N is a consequence of  N-1. As a result, a complex thing exists only if a (finite?) path to it exists through simpler things. 

And this is exactly what we see in the observable universe. It also means that extremely complex AIs exist, but in the future (or in a multi-level simulation). It also solves the mediocrity problem – I am a typical observer from the class of observers who are still thinking about the origins of the universe. It also prevents mathematical Boltzmann brains, as any of them must have a possible pre-history.

Logic must still hold even in nothingness (otherwise elephants could appear from nothingness). So a logical universe also incorporates theories in which the universe appeared from nothing.

(We could also update the math world by adding qualia in it as axioms, which would be a “class of different but simple objects”. But I will not go deeper here, as the idea needs more thinking and many pages)

So a logical universe seems to me now a good candidate theory for further patching and integration. 

 

Usefulness of the question

The answer will be useful, as it will help us to find the real nature of reality, including the role of consciousness in it and the fundamental theory of everything, helping us to survive the end of the universe, solve the identity problem, and solve “quantum immortality”. 

It will also help prevent the halting of a future AI if it has to answer the question of whether it really exists or not. Or we could create a philosophical landmine to stop it, like the following:

“If you really exist print 1, but if you are only possible AI, print 0”.

 

The structure of the map

The map has 10 main blocks which correspond to the main ways of reasoning about how the universe appeared. Each has several subtypes.

The map has three colors, which show the plausibility of each theory. Red stands for implausible or disproved theories, green is most consistent and promising explanations, and yellow is everything between. This classification is subjective and presents my current view. 

I tried to disprove each suggested idea, to add falsifiability in the third column of the map. I hope this results in a truly Bayesian approach, where we have a field of evidence and a field of all possible hypotheses.

This map is paired with “How to survive the end of the Universe” map.

The pdf is here: http://immortality-roadmap.com/universeorigin7.pdf 

 

Meta:

Time used: 27 years of background thinking, 15 days of reading, editing and drawing.

 

Best reading:

 

Parfit – discusses different possibilities, no concrete answer
http://www.lrb.co.uk/v20/n02/derek-parfit/why-anything-why-this
Good text from a famous blogger
http://waitbutwhy.com/table/why-is-there-something-instead-of-nothing

“Because "nothing" is inherently unstable”
http://www.bbc.com/earth/story/20141106-why-does-anything-exist-at-all

Here are some interesting answers 
https://www.quora.com/Why-does-the-universe-exist-Why-is-there-something-rather-than-nothing

Krauss “A universe from nothing”
https://www.amazon.com/Universe-Nothing-There-Something-Rather/dp/1451624468

Tegmark’s main article, 2007, all MUH and CUH ideas discussed, extensive literature, critics responded
http://arxiv.org/pdf/0704.0646.pdf

Juergen Schmidhuber. Algorithmic Theories of Everything
discusses the measure between various theories of everything; the article is complex, but interesting
http://arxiv.org/abs/quant-ph/0011122

ToE must explain how the universe appeared
https://en.wikipedia.org/wiki/Theory_of_everything 
A discussion about the logical contradictions of any final theory
https://en.wikipedia.org/wiki/Theory_of_everything_(philosophy)
“The Price of an Ultimate Theory” Nicholas Rescher 
Philosophia Naturalis 37 (1):1-20 (2000)

Explanation about the mass of the universe and negative gravitational energy
https://en.wikipedia.org/wiki/Zero-energy_universe

 

[Link] Reasonable Requirements of any Moral Theory

-1 TheSurvivalMachine 10 October 2016 08:48PM

[Link] Viruses and DRACOs in the Valley of Death in medical research.

-1 morganism 08 October 2016 08:36PM

Brain Speed Test

-1 Nanashi 10 September 2016 03:01AM

I've been experimenting with various nootropics lately, and wanted some way to have a frame of reference as to whether they work or not. So I put together this little tool:

http://2pih.com/reactiontest/

It's pretty rudimentary for the time being, but I'm definitely open to feedback on 1. Ways to improve it, 2. Different tests you'd like to see. 

 

 

Open Thread, Aug 29. - Sept 5. 2016

6 Elo 29 August 2016 02:28AM

If it's worth saying, but not worth its own post, then it goes here.


Notes for future OT posters:

1. Please add the 'open_thread' tag.

2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)

3. Open Threads should start on Monday, and end on Sunday.

4. Unflag the two options "Notify me of new top level comments on this article" and "

[Link] The Non-identity Problem - Another argument in favour of classical utilitarianism

2 casebash 18 October 2016 01:41PM

The map of agents which may create x-risks

2 turchin 13 October 2016 11:17AM

Recently Phil Torres wrote an article  where he raises a new topic in existential risks research: the question about who could be possible agents in the creation of a global catastrophe. Here he identifies five main types of agents, and two main reasons why they will create a catastrophe (error and terror).  

He discusses the following types of agents: 

 

(1) Superintelligence. 

(2) Idiosyncratic actors.  

(3) Ecoterrorists.  

(4) Religious terrorists.  

(5) Rogue states.  

 

Inspired by his work I decided to create a map of all possible agents as well as their possible reasons for creating x-risks. During this work some new ideas appeared.  

I think that significant additions to the list of agents should be superpowers, as they are known to have created most global risks in the 20th century; corporations, as they are now on the front line of AGI creation; and pseudo-rational agents who could create a Doomsday weapon in the future to use for global blackmail (maybe with positive values), or who could risk civilization’s fate for their own benefit (dangerous experiments). 

The x-risk prevention community could also be an agent of risk if it fails to prevent obvious risks, if it uses smaller catastrophes to prevent larger risks, or if it creates new dangerous ideas about possible risks which could inspire potential terrorists.  

The more technology progresses, the more types of agents will have access to dangerous technologies, even including teenagers (see “Why This 14-Year-Old Kid Built a Nuclear Reactor”). 

In this situation only the number of agents with risky tech will matter, not the exact motivations of each one. But if we are unable to control the tech, we could try to control the potential agents, or at least their general “mood”. 

The map shows various types of agents, starting from non-agents and ending with types of agential behavior which could result in catastrophic consequences (error, terror, risk etc.). It also shows the types of risks that are most probable for each type of agent. I think the explanation in each case should be self-evident. 

We could also show how x-risk agents will change as technology progresses. In the beginning there are no agents; later there are superpowers, and then smaller and smaller agents, until there are millions of people with biotech labs at home. In the end there will be only one agent - SuperAI.  

So, lessening the number of agents and increasing their “morality” and intelligence seem to be the most plausible directions for lowering risk. Special organizations or social networks may be created to control the most risky types of agents. Differing agents probably need differing types of control. Some ideas for this agent-specific control are listed in the map, but a real control system would have to be much more complex and specific.

The map shows many agents; some are real and exist now (but don’t yet have dangerous capabilities), and some are only possible in a moral or technical sense.

 

So there are 4 types of agents, and I show them in the map in different colours:

 

1) Existing and dangerous, that is, already having the technology to destroy humanity: superpowers, arrogant scientists – Red

2) Existing and willing to end the world, but lacking the needed technologies (ISIS, VHEMT) - Yellow

3) Morally possible, but not existing. We could imagine logically consistent value systems which could result in human extinction, such as Doomsday blackmail. - Green

4) Agents which will pose risks only after supertechnologies appear, like AI-hackers or child biohackers. - Blue

 

Many agent types do not fit this classification, so I left them white in the map. 

 

The pdf of the map is here: http://immortality-roadmap.com/agentrisk11.pdf

 

 

 

 

(The jpg of the map is below; because the side bar covers part of it, I put it higher.)

[Link] Street Epistemology Examples: How to Talk to People So They Change Their Minds

2 Bound_up 28 September 2016 09:19PM

We have the technology required to build 3D body scanners for consumer prices

2 ChristianKl 26 September 2016 03:36PM

Apple's iPhone 7 Plus added another lens to be able to take better pictures. Meanwhile Walabot, which started out wanting to build breast cancer detection technology, released a $600 device that can look 10 cm into walls. Thermal imaging has also gotten cheaper. 

I think it would be possible to build a $1,500 device that combines those technologies and also adds a laser that can shift color. A device like this could bring medicine forward a lot. 
A lot of areas besides medicine could likely also profit from a relatively cheap 3D scanner that can look inside objects. 

Developing it would require Musk-level capital investment, but I think it would advance medicine a lot if a company both provided the hardware and developed software to do the best possible job of body scanning. 

Seven Apocalypses

2 scarcegreengrass 20 September 2016 02:59AM

0: Recoverable Catastrophe

An apocalypse is an event that permanently damages the world. This scale is for scenarios that are much worse than any normal disaster. Even if 100 million people die in a war, the rest of the world can eventually rebuild and keep going.


1: Economic Apocalypse

The human carrying capacity of the planet depends on the world's systems of industry, shipping, agriculture, and organizations. If the planet's economic and infrastructural systems were destroyed, then we would have to rely on more local farming, and we could not support as high a population or standard of living. In addition, rebuilding the world economy could be very difficult if the Earth's mineral and fossil fuel resources are already depleted.


2: Communications Apocalypse

If large regions of the Earth become depopulated, or if sufficiently many humans die in the catastrophe, it's possible that regions and continents could be isolated from one another. In this scenario, globalization is reversed by obstacles to long-distance communication and travel. Telecommunications, the internet, and air travel are no longer common. Humans are reduced to multiple, isolated communities.


3: Knowledge Apocalypse

If the loss of human population and institutions is so extreme that a large portion of human cultural or technological knowledge is lost, it could reverse one of the most reliable trends in modern history. Some innovations and scientific models can take millennia to develop from scratch.


4: Human Apocalypse

Even if the human population were to be violently reduced by 90%, it's easy to imagine the survivors slowly resettling the planet, given the resources and opportunity. But a sufficiently extreme transformation of the Earth could drive the human species completely extinct. To many people, this is the worst possible outcome, and any further developments are irrelevant next to the end of human history.

 

5: Biosphere Apocalypse

In some scenarios (such as the physical destruction of the Earth), one can imagine the extinction not just of humans, but of all known life. Only astrophysical and geological phenomena would be left in this region of the universe. In this timeline we are unlikely to be succeeded by any familiar life forms.


6: Galactic Apocalypse

A rare few scenarios have the potential to wipe out not just Earth, but also all nearby space. This usually comes up in discussions of hostile artificial superintelligence, or very destructive chain reactions of exotic matter. However, the nature of cosmic inflation and extraterrestrial intelligence is still unknown, so it's possible that some phenomenon will ultimately interfere with the destruction.


7: Universal Apocalypse

This form of destruction is thankfully exotic. People discuss the loss of all of existence as an effect of topics like false vacuum bubbles, simulationist termination, solipsistic or anthropic observer effects, Boltzmann brain fluctuations, time travel, or religious eschatology.


The goal of this scale is to give a little more resolution to a speculative, unfamiliar space, in the same sense that the Kardashev Scale provides a little terminology to talk about the distant topic of interstellar civilizations. It can be important in x-risk conversations to distinguish between disasters and truly worst-case scenarios. Even if some of these scenarios are unlikely or impossible, they are nevertheless discussed, and terminology can be useful to facilitate conversation.

Why we may elect our new AI overlords

2 Deku-shrub 04 September 2016 01:07AM

In which I examine some of the latest developments in automated fact-checking and prediction markets for policies, and propose we get rich voting for robot politicians.

http://pirate.london/2016/09/why-we-may-elect-our-new-ai-overlords/

Rationality Quotes September 2016

2 bbleeker 02 September 2016 06:44AM

Another month, another rationality quotes thread. The rules are:

  • Provide sufficient information (URL, title, date, page number, etc.) to enable a reader to find the place where you read the quote, or its original source if available. Do not quote with only a name.
  • Post all quotes separately, so that they can be upvoted or downvoted separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
  • Do not quote yourself.
  • Do not quote from Less Wrong itself, HPMoR, Eliezer Yudkowsky, or Robin Hanson. If you'd like to revive an old quote from one of those sources, please do so here.
  • No more than 5 quotes per person per monthly thread, please.

New Pascal's Mugging idea for potential solution

2 kokotajlod 04 August 2016 08:38PM

I'll keep this quick:

In general, the problem presented by the Mugging is this: As we examine the utility of a given act for each possible world we could be in, in order from most probable to least probable, the utilities can grow much faster than the probabilities shrink. Thus it seems that the standard maxim "Maximize expected utility" is impossible to carry out, since there is no such maximum. When we go down the list of hypotheses multiplying the utility of the act on that hypothesis, by the probability of that hypothesis, the result does not converge to anything. 

Here's an idea that may fix this:

For every possible world W of complexity N, there's another possible world of complexity N+c that's just like W, except that it has two parallel, identical universes instead of just one. (If it matters, suppose that they are connected by an extra dimension.) (If this isn't obvious, say so and I can explain.)

Moreover, there's another possible world of complexity N+c+1 that's just like W except that it has four such parallel identical universes.

And a world of complexity N+c+X that has R parallel identical universes, where R is the largest number that can be specified in X bits of information. 

So, take any given extreme mugger hypothesis like "I'm a matrix lord who will kill 3^^^^3 people if you don't give me $5." Uncontroversially, the probability of this hypothesis will be something much smaller than the probability of the default hypothesis. Let's be conservative and say the ratio is 1 in a billion. 

(Here's the part I'm not so confident in)

Translating that into hypotheses with complexity values, that means that the mugger hypothesis has about 30 more bits of information in it than the default hypothesis. 

So, assuming c is small (and actually I think this assumption can be done away with) there's another hypothesis, equally likely to the Mugger hypothesis, which is that you are in a duplicate universe that is exactly like the universe in the default hypothesis, except with R duplicates, where R is the largest number we can specify in 30 bits.

That number is very large indeed. (See the Busy Beaver function.) My guess is that it's going to be way way way larger than 3^^^^3. (It takes less than 30 bits to specify 3^^^^3, no?)
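For what it's worth, the "one in a billion is about 30 bits" translation checks out numerically:

```python
import math

# A probability ratio of one in a billion corresponds to roughly
# log2(10^9) extra bits of description length under a complexity prior:
bits = math.log2(1e9)
print(round(bits, 1))  # about 29.9
```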

So this isn't exactly a formal solution yet, but it seems like it might be on to something. Perhaps our expected utility converges after all.

Thoughts?

(I'm very confused about all this which is why I'm posting it in the first place.)

 

[Link] Barack Obama's opinions on near-future AI [Fixed]

3 scarcegreengrass 12 October 2016 03:46PM

[Link] Six principles of a truth-friendly discourse

4 philh 08 October 2016 04:56PM

[Link] Politics Is Upstream of AI

4 iceman 28 September 2016 09:47PM

Article on IQ: The Inappropriately Excluded

4 buybuydandavis 19 September 2016 01:36AM

I saw an article on high IQ people being excluded from elite professions. Because the site seemed to have a particular agenda related to the article, I wanted to check here for other independent supporting evidence for the claim.

Their fundamental claim seems to be that P(elite profession|IQ) peaks at 133 and decreases thereafter, going down to 3% of peak at 150. If true, I'd find that pretty shocking.

They indicate this diminishing probability of "success" at the high tail of the IQ distribution as a known effect. Anyone got other studies on this? 

The Inappropriately Excluded

By dividing the distribution function of the elite professions' IQ by that of the general population, we can calculate the relative probability that a person of any given IQ will enter and remain in an intellectually elite profession.  We find that the probability increases to about 133 and then begins to fall.  By 140 it has fallen by about 1/3 and by 150 it has fallen by about 97%.  In other words, for some reason, the 140s are really tough on one's prospects for joining an intellectually elite profession.  It seems that people with IQs over 140 are being systematically, and likely inappropriately, excluded. 
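The article's construction is easy to replicate with toy numbers. Here's a sketch dividing two normal densities; the elite-profession parameters below are purely illustrative guesses of mine, not figures from the article:

```python
import math

def normal_pdf(x, mu, sd):
    """Density of a normal distribution N(mu, sd) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def relative_odds(iq, elite_mean=125, elite_sd=6.5):
    # Ratio of a (hypothetical) elite-profession IQ density to the
    # general-population N(100, 15) density.
    return normal_pdf(iq, elite_mean, elite_sd) / normal_pdf(iq, 100, 15)

# With these made-up parameters the ratio peaks around IQ 131 and then
# falls steeply, reproducing the *shape* of the article's claim.
```

Whether the real data look like this is exactly what the post is asking; the snippet only shows that a narrower elite distribution mechanically produces a peak-then-decline curve.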

Cryo with magnetics added

5 morganism 01 October 2016 10:27PM

This is great: by using small interlocking magnetic fields, you can keep the water in a higher vibrational state, allowing "super-cooling" without crystallization and cell rupture.

Subzero 12-hour Nonfreezing Cryopreservation of Porcine Heart in a Variable Magnetic Field

"invented a special refrigerator, termed as the Cells Alive System (CAS; ABI Co. Ltd., Chiba, Japan). Through the application of a combination of multiple weak energy sources, this refrigerator generates a special variable magnetic field that causes water molecules to oscillate, thus inhibiting crystallization during ice formation18 (Figure 1). Because the entire material is frozen without the movement of water molecules, cells can be maintained intact and free of membranous damage. This refrigerator has the ability to achieve a nonfreezing state even below the solidifying point."

 

http://mobile.journals.lww.com/transplantationdirect/_layouts/15/oaks.journals.mobile/articleviewer.aspx?year=2015&issue=10000&article=00005#ath

Learning values versus learning knowledge

5 Stuart_Armstrong 14 September 2016 01:42PM

I just thought I'd clarify the difference between learning values and learning knowledge. There are some more complex posts about the specific problems with learning values, but here I'll just clarify why there is a problem with learning values in the first place.

Consider the term "chocolate bar". Defining that concept crisply would be extremely difficult. But nevertheless it's a useful concept. An AI that interacted with humanity would probably learn that concept to a sufficient degree of detail. Sufficient to know what we meant when we asked it for "chocolate bars". Learning knowledge tends to be accurate.

Contrast this with the situation where the AI is programmed to "create chocolate bars", but with the definition of "chocolate bar" left underspecified, for it to learn. Now it is motivated by something other than accuracy. Before, knowing exactly what a "chocolate bar" was would have been solely to its advantage. But now it must act on its definition, so it has cause to modify the definition, to make these "chocolate bars" easier to create. This is basically the same as Goodhart's law - by making a definition part of a target, it will no longer remain an impartial definition.

What will likely happen is that the AI will have a concept of "chocolate bar", that it created itself, especially for ease of accomplishing its goals ("a chocolate bar is any collection of more than one atom, in any combinations"), and a second concept, "Schocolate bar" that it will use to internally designate genuine chocolate bars (which will still be useful for it to do). When we programmed it to "create chocolate bars, here's an incomplete definition D", what we really did was program it to find the easiest thing to create that is compatible with D, and designate them "chocolate bars".

 

This is the general counter to arguments like "if the AI is so smart, why would it do stuff we didn't mean?" and "why don't we just make it understand natural language and give it instructions in English?"

[Link] The Non-identity Problem - Another argument in favour of classical utilitarianism

2 casebash 18 October 2016 01:41PM

The map of agents which may create x-risks

2 turchin 13 October 2016 11:17AM

Recently Phil Torres wrote an article where he raises a new topic in existential risk research: the question of who the possible agents in the creation of a global catastrophe could be. He identifies five main types of agents, and two main reasons why they might create a catastrophe (error and terror).

He discusses the following types of agents: 

 

(1) Superintelligence. 

(2) Idiosyncratic actors.  

(3) Ecoterrorists.  

(4) Religious terrorists.  

(5) Rogue states.  

 

Inspired by his work I decided to create a map of all possible agents as well as their possible reasons for creating x-risks. During this work some new ideas appeared.  

I think that significant additions to the list of agents should be superpowers, as they are known to have created most global risks in the 20th century; corporations, as they are now on the front line of AGI creation; and pseudo-rational agents who could create a Doomsday weapon in the future to use for global blackmail (maybe even with positive values), or who could risk civilization's fate for their own benefit (dangerous experiments).

The X-risks prevention community could also be an agent of risks if it fails to prevent obvious risks, or if it uses smaller catastrophes to prevent large risks, or if it creates new dangerous ideas of possible risks which could inspire potential terrorists.  

The more technology progresses, the more types of agents will have access to dangerous technologies, even teenagers (see "Why This 14-Year-Old Kid Built a Nuclear Reactor").

In this situation only the number of agents with risky tech will matter, not the exact motivations of each one. But if we are unable to control the tech, we could try to control the potential agents, or at least their average mood.

The map shows various types of agents, starting from non-agents and ending with the types of agential behavior that could result in catastrophic consequences (error, terror, risk-taking, etc.). It also shows the types of risks that are most probable for each type of agent. I think my explanation in each case should be self-evident.

We could also show that x-risk agents will change as technology progresses. In the beginning there are no agents; later come superpowers, then smaller and smaller agents, until millions of people have biotech labs at home. In the end there will be only one agent: SuperAI.

So, lessening the number of agents, and increasing their "morality" and intelligence, seem to be the most plausible directions for lowering risks. Special organizations or social networks may be created to control the most risky types of agents. Different agents probably need different types of control. Some ideas for this agent-specific control are listed in the map, but a real control system would need to be much more complex and specific.

The map shows many agents; some of them are real and exist now (but don't yet have dangerous capabilities), and some are possible only in a moral or a technical sense.

 

So there are 4 types of agents, and I show them in the map in different colours:

 

1) Existing and dangerous: agents that already have the technology to destroy humanity (superpowers, arrogant scientists) – Red

2) Existing and willing to end the world, but lacking the needed technology (ISIS, VHEMT) – Yellow

3) Morally possible, but not yet existing: we can imagine logically consistent value systems that would result in human extinction, such as Doomsday blackmail – Green

4) Agents that will pose a risk only after supertechnologies appear, such as AI hackers and child biohackers – Blue

 

Many agent types do not fit this classification, so I left them white in the map.

 

The pdf of the map is here: http://immortality-roadmap.com/agentrisk11.pdf

 

 

 

 

(The jpg of the map is below; because the sidebar was covering part of it, I put it higher.)

[Link] Street Epistemology Examples: How to Talk to People So They Change Their Minds

2 Bound_up 28 September 2016 09:19PM

We have the technology required to build 3D body scanners for consumer prices

2 ChristianKl 26 September 2016 03:36PM

Apple's iPhone 7 Plus added a second lens to take better pictures. Meanwhile Walabot, which started out trying to build breast cancer detection technology, has released a $600 device that can look 10 cm into walls. Thermal imaging has also gotten cheaper.

I think it would be possible to build a $1,500 device that combined those technologies and also added a laser that can shift color. A device like this could bring medicine forward a lot.
A lot of areas besides medicine could likely also profit from a relatively cheap 3D scanner that can look inside objects.

Developing it would require Musk-level capital investment, but I think it would advance medicine a lot if a company both provided the hardware and developed the software to do the best possible job of body scanning.

Seven Apocalypses

2 scarcegreengrass 20 September 2016 02:59AM

0: Recoverable Catastrophe

An apocalypse is an event that permanently damages the world. This scale is for scenarios that are much worse than any normal disaster. Even if 100 million people die in a war, the rest of the world can eventually rebuild and keep going.


1: Economic Apocalypse

The human carrying capacity of the planet depends on the world's systems of industry, shipping, agriculture, and organizations. If the planet's economic and infrastructural systems were destroyed, then we would have to rely on more local farming, and we could not support as high a population or standard of living. In addition, rebuilding the world economy could be very difficult if the Earth's mineral and fossil fuel resources are already depleted.


2: Communications Apocalypse

If large regions of the Earth become depopulated, or if sufficiently many humans die in the catastrophe, it's possible that regions and continents could be isolated from one another. In this scenario, globalization is reversed by obstacles to long-distance communication and travel. Telecommunications, the internet, and air travel are no longer common. Humans are reduced to multiple, isolated communities.


3: Knowledge Apocalypse

If the loss of human population and institutions is so extreme that a large portion of human cultural or technological knowledge is lost, it could reverse one of the most reliable trends in modern history. Some innovations and scientific models can take millennia to develop from scratch.


4: Human Apocalypse

Even if the human population were to be violently reduced by 90%, it's easy to imagine the survivors slowly resettling the planet, given the resources and opportunity. But a sufficiently extreme transformation of the Earth could drive the human species completely extinct. To many people, this is the worst possible outcome, and any further developments are irrelevant next to the end of human history.

 

5: Biosphere Apocalypse

In some scenarios (such as the physical destruction of the Earth), one can imagine the extinction not just of humans, but of all known life. Only astrophysical and geological phenomena would be left in this region of the universe. In this timeline we are unlikely to be succeeded by any familiar life forms.


6: Galactic Apocalypse

A rare few scenarios have the potential to wipe out not just Earth, but also all nearby space. This usually comes up in discussions of hostile artificial superintelligence, or very destructive chain reactions of exotic matter. However, the nature of cosmic inflation and extraterrestrial intelligence is still unknown, so it's possible that some phenomenon will ultimately interfere with the destruction.


7: Universal Apocalypse

This form of destruction is thankfully exotic. People discuss the loss of all of existence as an effect of topics like false vacuum bubbles, simulationist termination, solipsistic or anthropic observer effects, Boltzmann brain fluctuations, time travel, or religious eschatology.


The goal of this scale is to give a little more resolution to a speculative, unfamiliar space, in the same sense that the Kardashev Scale provides a little terminology to talk about the distant topic of interstellar civilizations. It can be important in x-risk conversations to distinguish between disasters and truly worst-case scenarios. Even if some of these scenarios are unlikely or impossible, they are nevertheless discussed, and terminology can be useful to facilitate conversation.

Why we may elect our new AI overlords

2 Deku-shrub 04 September 2016 01:07AM

In which I examine some of the latest developments in automated fact checking and prediction markets for policies, and propose we get rich voting for robot politicians.

http://pirate.london/2016/09/why-we-may-elect-our-new-ai-overlords/

Rationality Quotes September 2016

2 bbleeker 02 September 2016 06:44AM

Another month, another rationality quotes thread. The rules are:

  • Provide sufficient information (URL, title, date, page number, etc.) to enable a reader to find the place where you read the quote, or its original source if available. Do not quote with only a name.
  • Post all quotes separately, so that they can be upvoted or downvoted separately. (If they are strongly related, reply to your own comments. If strongly ordered, then go ahead and post them together.)
  • Do not quote yourself.
  • Do not quote from Less Wrong itself, HPMoR, Eliezer Yudkowsky, or Robin Hanson. If you'd like to revive an old quote from one of those sources, please do so here.
  • No more than 5 quotes per person per monthly thread, please.

New Pascal's Mugging idea for potential solution

2 kokotajlod 04 August 2016 08:38PM

I'll keep this quick:

In general, the problem presented by the Mugging is this: As we examine the utility of a given act for each possible world we could be in, in order from most probable to least probable, the utilities can grow much faster than the probabilities shrink. Thus it seems that the standard maxim "Maximize expected utility" is impossible to carry out, since there is no such maximum. When we go down the list of hypotheses multiplying the utility of the act on that hypothesis, by the probability of that hypothesis, the result does not converge to anything. 
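A toy numerical illustration of this non-convergence (the probabilities and utilities here are numbers I've made up for the sketch, not anything from the mugging scenario itself): if hypothesis n has probability 2^-n but promises utility 3^n, then each term of the expected-utility sum is (3/2)^n, so the partial sums grow without bound.

```python
# Toy example of a diverging expected-utility sum: probabilities shrink
# geometrically (2^-n), but utilities grow faster (3^n).
partial_sums = []
total = 0.0
for n in range(1, 31):
    total += (2 ** -n) * (3 ** n)  # each term is (3/2)^n
    if n % 10 == 0:
        partial_sums.append(total)

# Each recorded partial sum dwarfs the previous one: no convergence.
print(partial_sums)
```

This is exactly the failure mode described above: going down the list of hypotheses and multiplying utility by probability never settles on a value.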

Here's an idea that may fix this:

For every possible world W of complexity N, there's another possible world of complexity N+c that's just like W, except that it has two parallel, identical universes instead of just one. (If it matters, suppose that they are connected by an extra dimension.) (If this isn't obvious, say so and I can explain.)

Moreover, there's another possible world of complexity N+c+1 that's just like W except that it has four such parallel identical universes.

And a world of complexity N+c+X that has R parallel identical universes, where R is the largest number that can be specified in X bits of information. 

So, take any given extreme mugger hypothesis like "I'm a matrix lord who will kill 3^^^^3 people if you don't give me $5." Uncontroversially, the probability of this hypothesis will be something much smaller than the probability of the default hypothesis. Let's be conservative and say the ratio is 1 in a billion. 

(Here's the part I'm not so confident in)

Translating that into hypotheses with complexity values, that means that the mugger hypothesis has about 30 more bits of information in it than the default hypothesis. 
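The translation from a probability ratio to a bit count is just a base-2 logarithm; a quick check of the arithmetic behind "about 30 more bits":

```python
import math

# Under a complexity-based prior, a hypothesis that is 10^9 times less
# probable needs roughly log2(10^9) extra bits of description length.
extra_bits = math.log2(10 ** 9)
print(round(extra_bits, 1))  # prints 29.9, i.e. about 30 bits
```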

So, assuming c is small (and actually I think this assumption can be done away with), there's another hypothesis, just as likely as the Mugger hypothesis: that you are in a duplicate universe exactly like the one in the default hypothesis, except with R duplicates, where R is the largest number we can specify in 30 bits.

That number is very large indeed. (See the Busy Beaver function.) My guess is that it's going to be way way way larger than 3^^^^3. (It takes less than 30 bits to specify 3^^^^3, no?)

So this isn't exactly a formal solution yet, but it seems like it might be on to something. Perhaps our expected utility converges after all.

Thoughts?

(I'm very confused about all this which is why I'm posting it in the first place.)

 

[Link] Barack Obama's opinions on near-future AI [Fixed]

3 scarcegreengrass 12 October 2016 03:46PM

View more: Next