I'm currently reading through some relevant literature for preparing my FLI grant proposal on the topic of concept learning and AI safety. I figured that I might as well write down the research ideas I get while doing so, so as to get some feedback and clarify my thoughts. I will be posting these in a series of "Concept Safety"-titled articles.
A frequently-raised worry about AI is that it may reason in ways which are very different from us, and understand the world in a very alien manner. For example, Armstrong, Sandberg & Bostrom (2012) consider the possibility of restricting an AI via "rule-based motivational control" and programming it to follow restrictions like "stay within this lead box here", but they raise worries about the difficulty of rigorously defining "this lead box here". To address this, they go on to consider the possibility of making an AI internalize human concepts via feedback, with the AI being told whether or not some behavior is good or bad and then constructing a corresponding world-model based on that. The authors are however worried that this may fail, because
Humans seem quite adept at constructing the correct generalisations – most of us have correctly deduced what we should/should not be doing in general situations (whether or not we follow those rules). But humans share a common genetic design, which the OAI would likely not have. Sharing, for instance, derives partially from genetic predisposition to reciprocal altruism: the OAI may not integrate the same concept as a human child would. Though reinforcement learning has a good track record, it is neither a panacea nor a guarantee that the OAI's generalisations agree with ours.
Addressing this, a possibility that I raised in Sotala (2015) was that the concept-learning mechanisms in the human brain might actually be relatively simple, and that we could replicate the human concept-learning process by replicating those mechanisms. I'll start this post by discussing a closely related hypothesis: that given a specific learning or reasoning task and a certain kind of data, there is an optimal way to organize the data that will naturally emerge. If this were the case, then AI and human reasoning might naturally tend to learn the same kinds of concepts, even if they were using very different mechanisms. Later on in the post, I will discuss how one might try to verify that similar representations had in fact been learned, and how to set up a system to make them even more similar.
A particularly fascinating branch of recent research relates to the learning of word embeddings, which are mappings of words to very high-dimensional vectors. It turns out that if you train a system on one of several kinds of tasks, such as being able to classify sentences as valid or invalid, this builds up a space of word vectors that reflects the relationships between the words. For example, there seems to be a male/female dimension to words, so that there's a "female vector" that we can add to the word "man" to get "woman" - or, equivalently, which we can subtract from "woman" to get "man". And it so happens (Mikolov, Yih & Zweig 2013) that we can also get from the word "king" to the word "queen" by adding the same vector to "king". In general, we can (roughly) get to the male/female version of any word vector by adding or subtracting this one difference vector!
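This vector arithmetic is easy to play with. Below is a toy sketch of the offset idea, using tiny hand-made three-dimensional vectors purely for illustration; real embeddings are learned from data and have hundreds of dimensions:

```python
import math

# Toy "word vectors", invented for illustration only.
vectors = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [0.2, 0.0, 1.0],
    "queen": [0.2, 1.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# The "female vector" is the offset woman - man.
female = [w - m for w, m in zip(vectors["woman"], vectors["man"])]

# Adding it to "king" should land near "queen".
candidate = [k + f for k, f in zip(vectors["king"], female)]
best = max(vectors, key=lambda w: cosine(candidate, vectors[w]))
print(best)  # -> queen
```

In a real trained embedding the sum would only land *near* the target word rather than exactly on it, but the nearest-neighbor lookup works the same way.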
Why would this happen? Well, a learner that needs to classify sentences as valid or invalid needs to classify the sentence "the king sat on his throne" as valid while classifying the sentence "the king sat on her throne" as invalid. So including a gender dimension on the built-up representation makes sense.
But gender isn't the only kind of relationship that gets reflected in the geometry of the word space. The same kind of offset vectors show up for relationships such as singular/plural forms and countries and their capitals.
It turns out (Mikolov et al. 2013) that with the right kind of training mechanism, a lot of relationships that we're intuitively aware of become automatically learned and represented in the concept geometry. And as Olah (2014) comments:
It’s important to appreciate that all of these properties of W are side effects. We didn’t try to have similar words be close together. We didn’t try to have analogies encoded with difference vectors. All we tried to do was perform a simple task, like predicting whether a sentence was valid. These properties more or less popped out of the optimization process.
This seems to be a great strength of neural networks: they learn better ways to represent data, automatically. Representing data well, in turn, seems to be essential to success at many machine learning problems. Word embeddings are just a particularly striking example of learning a representation.
It gets even more interesting, for we can use these for translation. Since Olah has already written an excellent exposition of this, I'll just quote him:
We can learn to embed words from two different languages in a single, shared space. In this case, we learn to embed English and Mandarin Chinese words in the same space.
We train two word embeddings, Wen and Wzh in a manner similar to how we did above. However, we know that certain English words and Chinese words have similar meanings. So, we optimize for an additional property: words that we know are close translations should be close together.
Of course, we observe that the words we knew had similar meanings end up close together. Since we optimized for that, it’s not surprising. More interesting is that words we didn’t know were translations end up close together.
In light of our previous experiences with word embeddings, this may not seem too surprising. Word embeddings pull similar words together, so if an English and Chinese word we know to mean similar things are near each other, their synonyms will also end up near each other. We also know that things like gender differences tend to end up being represented with a constant difference vector. It seems like forcing enough points to line up should force these difference vectors to be the same in both the English and Chinese embeddings. A result of this would be that if we know that two male versions of words translate to each other, we should also get the female words to translate to each other.
Intuitively, it feels a bit like the two languages have a similar ‘shape’ and that by forcing them to line up at different points, they overlap and other points get pulled into the right positions.
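Olah's "similar shape" intuition can be sketched in miniature. In the toy example below, the "Chinese" space is just the "English" space rotated; we fit a linear map from two known translation pairs, and a word held out from training then translates correctly. The vectors and romanized labels are invented for illustration:

```python
# Toy 2-d "English" embeddings, invented for illustration.
en = {"man": [1.0, 0.0], "woman": [1.0, 1.0], "king": [0.0, 1.0]}

# A pretend "Chinese" space: the same shape, rotated 90 degrees.
rotate = lambda v: [-v[1], v[0]]
zh = {"nanren": rotate(en["man"]),
      "nvren": rotate(en["woman"]),
      "guowang": rotate(en["king"])}

def matvec(W, v):
    return [sum(W[i][j] * v[j] for j in range(2)) for i in range(2)]

# Fit a linear map W from the two known translation pairs by
# simple gradient descent on the squared error.
W = [[0.0, 0.0], [0.0, 0.0]]
pairs = [(en["man"], zh["nanren"]), (en["woman"], zh["nvren"])]
for _ in range(2000):
    for x, y in pairs:
        pred = matvec(W, x)
        for i in range(2):
            for j in range(2):
                W[i][j] -= 0.1 * (pred[i] - y[i]) * x[j]

# "king" was never used in training, but W still maps it to the
# neighborhood of its translation.
dist = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v))
mapped = matvec(W, en["king"])
best = min(zh, key=lambda w: dist(mapped, zh[w]))
print(best)  # -> guowang
```

The real bilingual-embedding work optimizes the embeddings themselves rather than fitting a map afterwards, but the underlying idea is the same: forcing known translation pairs to line up drags the rest of the geometry into alignment.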
After this, it gets even more interesting. Suppose you had this space of word vectors, and then you also had a system which translated images into vectors in the same space. If you have images of dogs, you put them near the word vector for "dog". If you have images of Clippy, you put them near the word vector for "paperclip". And so on.
You do that, and then you take some class of images the image-classifier was never trained on, like images of cats. You ask it to place the cat-image somewhere in the vector space. Where does it end up?
You guessed it: in the rough region of the "cat" words. Olah once more:
This was done by members of the Stanford group with only 8 known classes (and 2 unknown classes). The results are already quite impressive. But with so few known classes, there are very few points to interpolate the relationship between images and semantic space off of.
The Google group did a much larger version – instead of 8 categories, they used 1,000 – around the same time (Frome et al. (2013)) and has followed up with a new variation (Norouzi et al. (2014)). Both are based on a very powerful image classification model (from Krizhevsky et al. (2012)), but embed images into the word embedding space in different ways.
The results are impressive. While they may not get images of unknown classes to the precise vector representing that class, they are able to get to the right neighborhood. So, if you ask it to classify images of unknown classes and the classes are fairly different, it can distinguish between the different classes.
Even though I’ve never seen an Aesculapian snake or an Armadillo before, if you show me a picture of one and a picture of the other, I can tell you which is which because I have a general idea of what sort of animal is associated with each word. These networks can accomplish the same thing.
These algorithms made no attempt to be biologically realistic in any way. They didn't try classifying data the way the brain does it: they just tried classifying data using whatever worked. And it turned out that this was enough to start constructing a multimodal representation space where a lot of the relationships between entities were similar to the way humans understand the world.
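The zero-shot classification step can be sketched as a nearest-neighbor lookup in the shared space (with made-up numbers standing in for a real image model's output):

```python
import math

# Toy shared word-vector space, invented for illustration.
words = {"dog": [0.9, 0.1], "cat": [0.8, 0.3], "paperclip": [0.0, 1.0]}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Suppose an image model, trained only on dogs and paperclips, maps a
# photo of a cat to this point in the shared space:
cat_image = [0.75, 0.35]

# The unseen class is recovered by finding the nearest word vector.
label = max(words, key=lambda w: cosine(cat_image, words[w]))
print(label)  # -> cat
```

The hard part in the real systems is of course training the image embedder so that novel images land in sensible regions; once they do, the classification itself is just this lookup.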
How useful is this?
"Well, that's cool", you might now say. "But those word spaces were constructed from human linguistic data, for the purpose of predicting human sentences. Of course they're going to classify the world in the same way as humans do: they're basically learning the human representation of the world. That doesn't mean that an autonomously learning AI, with its own learning faculties and systems, is necessarily going to learn a similar internal representation, or to have similar concepts."
This is a fair criticism. But it is mildly suggestive of the possibility that an AI trained to understand the world via feedback from human operators would end up building a similar conceptual space, at least assuming that we chose the right learning algorithms.
When we train a language model to classify sentences by labeling some of them as valid and others as invalid, there's a hidden structure implicit in our answers: the structure of how we understand the world, and of how we think of the meaning of words. The language model extracts that hidden structure and begins to classify previously unseen things in terms of those implicit reasoning patterns. Similarly, if we gave an AI feedback about what kinds of actions counted as "leaving the box" and which ones didn't, there would be a certain way of viewing and conceptualizing the world implied by that feedback, one which the AI could learn.
"Hmm, maaaaaaaaybe", is your skeptical answer. "But how would you ever know? Like, you can test the AI in your training situation, but how do you know that it's actually acquired a similar-enough representation and not something wildly off? And it's one thing to look at those vector spaces and claim that there are human-like relationships among the different items, but that's still a little hand-wavy. We don't actually know that the human brain does anything remotely similar to represent concepts."
Here we turn, for a moment, to neuroscience.
Multivariate Cross-Classification (MVCC) is a clever neuroscience methodology used for figuring out whether different neural representations of the same thing have something in common. For example, we may be interested in whether the visual and tactile representation of a banana have something in common.
We can test this by having several test subjects look at pictures of objects such as apples and bananas while sitting in a brain scanner. We then feed the scans of their brains into a machine learning classifier and teach it to distinguish between the neural activity of looking at an apple, versus the neural activity of looking at a banana. Next we have our test subjects (still sitting in the brain scanners) touch some bananas and apples, and ask our machine learning classifier to guess whether the resulting neural activity is the result of touching a banana or an apple. If the classifier - which has not been trained on the "touch" representations, only on the "sight" representations - manages to achieve a better-than-chance performance on this latter task, then we can conclude that the neural representation for e.g. "the sight of a banana" has something in common with the neural representation for "the touch of a banana".
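The logic of the method can be illustrated with synthetic data: train a simple classifier on "sight" trials only, then test it on "touch" trials it has never seen. Everything below is invented for illustration; real MVCC works on fMRI voxel patterns and typically uses stronger classifiers.

```python
import random
random.seed(0)

# Synthetic "neural activity": each concept has a shared component plus
# a modality-specific shift and noise. Purely illustrative numbers.
def trial(concept, modality):
    base = {"apple": [1.0, 0.0], "banana": [0.0, 1.0]}[concept]
    shift = {"sight": 0.1, "touch": -0.1}[modality]
    return [x + shift + random.gauss(0, 0.2) for x in base]

sight = [(trial(c, "sight"), c) for c in ("apple", "banana") for _ in range(20)]
touch = [(trial(c, "touch"), c) for c in ("apple", "banana") for _ in range(20)]

# Train a nearest-centroid classifier on the sight trials only...
centroids = {}
for c in ("apple", "banana"):
    vs = [v for v, label in sight if label == c]
    centroids[c] = [sum(col) / len(vs) for col in zip(*vs)]

def classify(v):
    d = lambda u: sum((a - b) ** 2 for a, b in zip(v, u))
    return min(centroids, key=lambda c: d(centroids[c]))

# ...then test it on the touch trials it was never trained on.
accuracy = sum(classify(v) == c for v, c in touch) / len(touch)
print(accuracy)  # well above the 0.5 chance level
```

Above-chance cross-modal accuracy is exactly the signature the method looks for: it means the sight and touch representations share structure.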
A particularly fascinating experiment of this type is that of Shinkareva et al. (2011), who showed their test subjects both the written words for different tools and dwellings, and, separately, line-drawing images of the same tools and dwellings. A machine-learning classifier was both trained on image-evoked activity and made to predict word-evoked activity and vice versa, and achieved a high accuracy on category classification for both tasks. Even more interestingly, the representations seemed to be similar between subjects. Training the classifier on the word representations of all but one participant, and then having it classify the image representation of the left-out participant, also achieved a reliable (p<0.05) category classification for 8 out of 12 participants. This suggests a relatively similar concept space between humans of a similar background.
We can now hypothesize some ways of testing the similarity of the AI's concept space with that of humans. Possibly the most interesting one might be to develop a translation between a human's and an AI's internal representations of concepts. Take a human's neural activation when they're thinking of some concept, and then take the AI's internal activation when it is thinking of the same concept, and plot them in a shared space similar to the English-Mandarin translation. To what extent do the two concept geometries have similar shapes, allowing one to take a human's neural activation of the word "cat" to find the AI's internal representation of the word "cat"? To the extent that this is possible, one could probably establish that the two share highly similar concept systems.
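One crude way to quantify "similar shape", borrowed from what neuroscientists call representational similarity analysis, is to correlate the two systems' patterns of pairwise concept distances. A toy sketch with invented numbers:

```python
import math

# Toy internal representations of the same five concepts in two systems.
# Hypothetical numbers: real data would come from brain scans and AI activations.
human = {"cat": [1.0, 0.0], "dog": [0.9, 0.2], "car": [0.0, 1.0],
         "bus": [0.1, 0.9], "tree": [0.5, 0.5]}
ai    = {"cat": [2.0, 0.0], "dog": [1.8, 0.4], "car": [0.0, 2.0],
         "bus": [0.2, 1.8], "tree": [1.0, 1.0]}

concepts = sorted(human)

def dists(space):
    # All pairwise distances, in a fixed order shared by both systems.
    return [math.dist(space[a], space[b])
            for i, a in enumerate(concepts) for b in concepts[i + 1:]]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Similar concept geometries produce strongly correlated distance patterns.
similarity = pearson(dists(human), dists(ai))
print(round(similarity, 3))  # -> 1.0 (the AI space is just a scaled copy here)
```

A high correlation doesn't yet give you a word-by-word translation, but it is a first check that a translation of the English-Mandarin kind could exist at all.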
One could also try to more explicitly optimize for such a similarity. For instance, one could train the AI to make predictions of different concepts, with the additional constraint that its internal representation must be such that a machine-learning classifier trained on a human's neural representations will correctly identify concept-clusters within the AI. This might force internal similarities on the representation beyond the ones that would already be formed from similarities in the data.
Next post in series: The problem of alien concepts.
(Cross-posted from my blog.)
Sometimes at LW meetups, I'll want to raise a topic for discussion. But we're currently already talking about something, so I'll wait for a lull in the current conversation. But it feels like the duration of lull needed before I can bring up something totally unrelated, is longer than the duration of lull before someone else will bring up something marginally related. And so we can go for a long time, with the topic frequently changing incidentally, but without me ever having a chance to change it deliberately.
Which is fine. I shouldn't expect people to want to talk about something just because I want to talk about it, and it's not as if I find the actual conversation boring. But it's not necessarily optimal. People might in fact want to talk about the same thing as me, and following the path of least resistance in a conversation is unlikely to result in the best possible conversation.
At the last meetup I had two topics that I wanted to raise, and realized that I had no way of raising them, which was a third topic worth raising. So when an interruption occurred in the middle of someone's thought - a new person arrived, and we did the "hi, welcome, join us" thing - I jumped in. "Before you start again, I have three things I'd like to talk about at some point, but not now. Carry on." Then he started again, and when that topic was reasonably well-trodden, he prompted me to transition.
Then someone else said that he also had two things he wanted to talk about, and could I just list my topics and then he'd list his? (It turns out that no I couldn't. You can't dangle an interesting train of thought in front of the London LW group and expect them not to follow it. But we did manage to initially discuss them only briefly.)
This worked pretty well. Someone more conversationally assertive than me might have been able to take advantage of a less solid interruption than the one I used. Someone less assertive might not have been able to use that one.
What else could we do to solve this problem?
Someone suggested a hand signal: if you think of something that you'd like to raise for discussion later, make the signal. I don't think this is ideal, because it's not continuous. You make it once, and then it would be easy for people to forget, or just to not notice.
I think what I'm going to do is bring some poker chips to the next meetup. I'll put a bunch in the middle, and if you have a topic that you want to raise at some future point, you take one and put it in front of you. Then if a topic seems to be dying out, someone can say "<person>, what did you want to talk about?"
I guess this still needs at least one person assertive enough to do that. I imagine it would be difficult for me. But the person who wants to raise the topic doesn't need to be assertive, they just need to grab a poker chip. It's a fairly obvious gesture, so probably people will notice, and it's easy to just look and see for a reminder of whether anyone wants to raise anything. (Assuming the table isn't too messy, which might be a problem.)
I don't know how well this will work, but it seems worth experimenting with.
(I'll also take a moment to advocate another conversation-signal that we adopted, via CFAR. If someone says something and you want to tell people that you agree with them, instead of saying that out loud, you can just raise your hands a little and wiggle your fingers. Reduces interruptions, gives positive feedback to the speaker, and it's kind of fun.)
Astronomical research has what may be an under-appreciated role in helping us understand and possibly avoid the Great Filter. This post will examine how astronomy may be helpful for identifying potential future filters. The primary upshot is that we may have an advantage due to our somewhat late arrival: if we can observe what other civilizations have done wrong, we can get a leg up.
This post is not arguing that colonization is a route to remove some existential risks. There is no question that colonization will reduce the risk of many forms of Filters, but the vast majority of astronomical work has no substantial connection to colonization. Moreover, the case for colonization has been made strongly by many others already, such as Robert Zubrin's book "The Case for Mars" or this essay by Nick Bostrom.
Note: those already familiar with the Great Filter and proposed explanations may wish to skip to the section "How can we substantially improve astronomy in the short to medium term?"
What is the Great Filter?
There is a worrying lack of signs of intelligent life in the universe. The only intelligent life we have detected has been that on Earth. While planets are apparently numerous, there have been no signs of other life. There are three possible lines of evidence we would expect to see if civilizations were common in the universe: radio signals, direct contact, and large-scale constructions. The first two of these are well-known, but the most serious problem arises from the lack of large-scale constructions: as far as we can tell, the universe looks natural. The vast majority of matter and energy in the universe appears to be unused. The Great Filter is one possible explanation for this lack of life, namely that some phenomenon prevents intelligent life from passing into the interstellar, large-scale phase. Variants of the idea have been floating around for a long time; the term was first coined by Robin Hanson in this essay. There are two fundamental versions of the Filter: filtration which has occurred in our past, and filtration which will occur in our future. For obvious reasons the second of the two is more of a concern. Moreover, as our technological level increases, the chance that we are approaching the last point of serious filtration gets higher, since once a civilization has spread out to multiple stars, further filtration becomes much more difficult.
Evidence for the Great Filter and alternative explanations:
Over the last few years, there have been two major updates to the situation involving the Filter since Hanson's essay:
First, we have confirmed that planets are very common, so a lack of Earth-sized planets, or of planets in the habitable zone, is not likely to be a major filter.
Second, we have found that planet formation occurred early in the universe. (For example see this article about this paper.) Early planet formation weakens the common explanation of the Fermi paradox that some species had to be the first intelligent species, and that we're simply lucky. Early planet formation, along with the apparent speed at which life arose on Earth after the heavy bombardment ended, as well as the apparent speed with which complex life developed from simple life, strongly undermines this explanation. The response has been made that early filtration may be so common that if life does not arise early in its star's lifespan, a planet will have no chance to reach civilization. However, if this were the case, we'd expect to have found ourselves orbiting a longer-lived star like a red dwarf. Red dwarfs are more common than sun-like stars and have lifespans that are longer by multiple orders of magnitude. While attempts to understand the habitable zone of red dwarfs are still ongoing, the current consensus is that many red dwarfs contain habitable planets.
These two observations, together with further evidence that the universe looks natural, make future filtration seem likely. If advanced civilizations existed, we would expect them to make use of the large amounts of matter and energy available. We see no signs of such use. We've seen no indication of ring-worlds, Dyson spheres, or other megascale engineering projects. While such searches have so far been confined to around 300 parsecs and some candidates were hard to rule out, if a substantial fraction of stars in a galaxy had Dyson spheres or swarms we would notice the unusually high infrared spectrum. Note that this sort of evidence is distinct from arguments about contact or about detecting radio signals. There's a very recent proposal for mini-Dyson spheres around white dwarfs which would be much easier to engineer and harder to detect, but they would not reduce the desirability of other large-scale structures, and they would likely be detectable if there were a large number of them present in a small region. One recent study looked for signs of large-scale modification to the radiation profile of galaxies in a way that should show the presence of large-scale civilizations. They looked at 100,000 galaxies and found no major sign of technologically advanced civilizations (for more detail see here).
We will not discuss all possible rebuttals to the case for a Great Filter, but will note some of the more interesting ones:
There have been attempts to argue that the universe only became habitable more recently. There are two primary avenues for this argument. First, there is the point that early stars had very low metallicity (that is, they had low concentrations of elements other than hydrogen and helium) and thus the universe would have had too low a metal level for complex life. The presence of old rocky planets makes this argument less viable, and it only works for the first few billion years of history. Second, there's an argument that until recently galaxies were more likely to have frequent gamma bursts. In that case, life would have been wiped out too frequently to evolve in a complex fashion. However, even the strongest version of this argument still leaves billions of years of time unexplained.
There have been attempts to argue that space travel may be very difficult. For example, Geoffrey Landis proposed that a percolation model, together with the idea that interstellar travel is very difficult, may explain the apparent rarity of large-scale civilizations. However, at this point, there's no strong reason to think that interstellar travel is so difficult as to limit colonization to that extent. Moreover, the discoveries made in the last 20 years that brown dwarfs are very common and that most stars do contain planets are evidence in the opposite direction: these brown dwarfs, as well as common planets, would make travel easier because there are more potential refueling and resupply locations even if they are not used for full colonization. Others have argued that even without such considerations, colonization should not be that difficult. Moreover, if colonization is difficult and civilizations end up restricted to small numbers of nearby stars, then it becomes more, not less, likely that civilizations will attempt the large-scale engineering projects that we would notice.
Another possibility is that we are overestimating the general growth rate of the resources used by civilizations. Extrapolating from current growth rates makes it plausible that large-scale projects and endeavors will occur, but with slower growth it becomes substantially more difficult to engage in very energy-intensive projects like colonization. Rather than continual exponential or close-to-exponential growth, we may expect long periods of slow growth or stagnation. This cannot be ruled out, but even if growth continues at only a slightly higher than linear rate, the energy expenditures available in a few thousand years will still be very large.
Another possibility that has been proposed consists of variants of the simulation hypothesis: the idea that we exist in a simulated reality. The most common variant of this in a Great Filter context suggests that we are in an ancestor simulation, that is, a simulation by the future descendants of humanity of what early humans would have been like.
The simulation hypothesis runs into serious problems, both in general and as an explanation of the Great Filter in particular. First, if our understanding of the laws of physics is approximately correct, then there are strong restrictions on what computations can be done with a given amount of resources. For example, BQP, the set of problems which can be solved efficiently by quantum computers, is contained in PSPACE, the set of problems which can be solved when one has a polynomial amount of space available and no time limit. Thus, even a close-to-classical simulation would still need about as many resources as the system being simulated, so the resources required for a detailed simulation would likely be large. There are other results, such as Holevo's theorem, which place similar restrictions. The upshot of these results is that one cannot make a detailed simulation of an object without using at least as many resources as the object itself. There may be potential ways of getting around this: for example, consider a simulator interested primarily in what life on Earth is doing. The simulation would not need to do a detailed simulation of the inside of planet Earth and other large bodies in the solar system. However, even then, the resources involved would be very large.
The primary problem with the simulation hypothesis as an explanation is that it requires the future of humanity to have actually already passed through the Great Filter and to have found their own success sufficiently unlikely that they've devoted large amounts of resources to actually finding out how they managed to survive. Moreover, there are strong limits on how accurately one can reconstruct any given quantum state which means an ancestry simulation will be at best a rough approximation. In this context, while there are interesting anthropic considerations here, it is more likely that the simulation hypothesis is wishful thinking.
Variants of the "Prime Directive" have also been proposed. The essential idea is that advanced civilizations would deliberately avoid interacting with less advanced civilizations. This hypothesis runs into two serious problems: first, it does not explain the apparent naturalness, only the lack of direct contact by alien life. Second, it assumes a solution to a massive coordination problem between multiple species with potentially radically different ethical systems. In a similar vein, Hanson in his original essay on the Great Filter raised the possibility of a single very early species with some form of faster-than-light travel and a commitment to keeping the universe close to natural-looking. Since all proposed forms of faster-than-light travel are highly speculative and would involve causality violations, this hypothesis cannot be assigned a substantial probability.
People have also suggested that civilizations move outside galaxies to the cold of space where they can do efficient reversible computing using cold dark matter. Jacob Cannell has been one of the most vocal proponents of this idea. This hypothesis suffers from at least three problems. First, it fails to explain why those entities have not used the conventional matter to any substantial extent in addition to the cold dark matter. Second, this hypothesis would either require dark matter composed of cold conventional matter (which at this point seems to be only a small fraction of all dark matter), or would require dark matter which interacts with itself using some force other than gravity. While there is some evidence for such interaction, it is, at this point, slim. Third, even if some species had taken over a large fraction of dark matter to use for their own computations, one would then expect later species to use the conventional matter, since they would not have the option of using the now-monopolized dark matter.
Other exotic non-Filter explanations have been proposed but they suffer from similar or even more severe flaws.
It is possible that future information will change this situation. One of the more plausible explanations of the Great Filter is that there is no single Great Filter in the past, but rather a large number of small filters which together drastically filter out civilizations. The evidence for such a viewpoint is at this point slim, but astronomy may be able to help answer this question.
For example, one commonly cited aspect of past filtration is the origin of life. There are at least three locations, other than Earth, where life could have formed: Europa, Titan and Mars. Finding life on one, or all of them, would be a strong indication that the origin of life is not the filter. Similarly, while it is highly unlikely that Mars has multicellular life, finding such life would indicate that the development of multicellular life is not the filter. However, none of them is nearly as hospitable as Earth, so determining whether there is life will require substantial use of probes. We might also look for signs of life in the atmospheres of extrasolar planets, which would require substantially more advanced telescopes.
Another possible early filter is that planets like Earth frequently get locked into a "snowball" state which planets have difficulty exiting. This is an unlikely filter, since Earth has likely been in near-snowball conditions multiple times: once very early on during the Huronian glaciation and later, about 650 million years ago. This is an example of an early partial filter where astronomical observation may be of assistance in finding evidence of the filter. The snowball Earth filter does have one strong virtue: if many planets never escape a snowball situation, then this explains in part why we are not around a red dwarf: planets do not escape their snowball state unless their home star is somewhat variable, and red dwarfs are too stable.
It should be clear that none of these explanations are satisfactory and thus we must take seriously the possibility of future Filtration.
How can we substantially improve astronomy in the short to medium term?
Before we examine the potential for further astronomical research to understand a future filter, we should note that there are many avenues in which we can improve our astronomical instruments. The most basic way is to simply make better conventional optical, near-optical, and radio telescopes. That work is ongoing. Examples include the European Extremely Large Telescope and the Thirty Meter Telescope. Unfortunately, increasing the size of ground-based telescopes, especially the size of the aperture, is running into substantial engineering challenges. However, in the last 30 years, the advent of adaptive optics, speckle imaging, and other techniques has substantially increased the resolution of ground-based optical and near-optical telescopes. At the same time, improved data processing and related methods have improved radio telescopes. Already, optical and near-optical telescopes have advanced to the point where we can gain information about the atmospheres of extrasolar planets, although we cannot yet do so for rocky planets.
Increasingly, the highest resolution comes from space-based telescopes. Space-based telescopes also allow one to gather information from types of radiation that are blocked by the Earth's atmosphere or magnetosphere; two important examples are x-ray and gamma ray telescopes. They also avoid many of the issues the atmosphere creates for optical telescopes. Hubble is the most striking example, but from the standpoint of the Great Filter, the most relevant space telescope (and the most relevant instrument in general for all Great Filter related astronomy) is the planet-detecting Kepler spacecraft, which is responsible for most of the identified extrasolar planets.
Another type of instrument is the neutrino detector. Neutrino detectors are generally very large bodies of a transparent material (generally water) kept deep underground so that minimal amounts of light and cosmic rays hit the device. Neutrinos are then detected when one hits a particle, resulting in a flash of light. In the last few years, improvements in optics, increases in the scale of the detectors, and the development of detectors like IceCube, which uses a naturally occurring volume of Antarctic ice as its detection medium, have drastically increased the sensitivity of neutrino detectors.
There are proposals for larger-scale, more innovative telescope designs, but they are all highly speculative. On the ground-based optical front, for example, there is a suggestion to build liquid mirror telescopes with ferrofluid mirrors, which would give the advantages of liquid mirror telescopes while permitting adaptive optics, something normally only possible with solid mirrors. An example of a potential space-based telescope is the Aragoscope, which would take advantage of diffraction to achieve a resolution at least an order of magnitude greater than Hubble's. Other proposals involve placing telescopes very far apart in the solar system to create, in effect, telescopes with very large apertures. The most ambitious and speculative of these proposals involve projects so advanced and large-scale that one might as well presume they will only happen if we have already passed through the Great Filter.
What are the major identified future potential contributions to the filter and what can astronomy tell us?
One threat type where more astronomical observation can help is natural threats, such as asteroid collisions, supernovas, gamma ray bursts, rogue high-gravity bodies, and as yet unidentified astronomical dangers. Careful mapping of asteroids and comets is ongoing and requires continued funding more than any intrinsic improvement in technology. Right now, most of our mapping looks at objects at or near the plane of the ecliptic, so some focus off the plane may be helpful. Unfortunately, there is very little money available to actually deal with such problems if they arise. It might be possible to have a few wealthy individuals agree to set up accounts in escrow which would be used if an asteroid or similar threat arose.
Supernovas are unlikely to be a serious threat at this time. Some stars close to our solar system are large enough that they will go supernova; Betelgeuse is the most famous of these, with a supernova projected to occur sometime in the next 100,000 years. However, at its current distance, Betelgeuse is unlikely to pose much of a problem unless our models of supernovas are very far off. Further conventional observation of supernovas is needed to understand this better, and improved neutrino observations will also help, but right now supernovas do not seem to be a large risk. Gamma ray bursts are in a similar situation. Note also that if an imminent gamma ray burst or supernova were likely, there is at present very little we could do about it. In general, back-of-the-envelope calculations establish that supernovas are highly unlikely to be a substantial part of the Great Filter.
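A back-of-the-envelope calculation of this kind can be sketched in a few lines. All of the figures below (supernova rate, lethal radius, disk dimensions) are rough, commonly quoted round numbers chosen for illustration, not precise astrophysical values:

```python
import math

# Rough Fermi estimate: how often does a supernova occur close enough
# to threaten a given Earth-like planet? All inputs are rough assumptions.
SN_RATE_PER_YEAR = 0.02    # ~2 core-collapse supernovae per century, galaxy-wide
LETHAL_RADIUS_LY = 30      # distance inside which a supernova is plausibly dangerous
DISK_RADIUS_LY = 50_000    # rough radius of the galactic disk
DISK_THICKNESS_LY = 1_000  # rough thickness of the disk

disk_volume = math.pi * DISK_RADIUS_LY**2 * DISK_THICKNESS_LY
lethal_volume = (4 / 3) * math.pi * LETHAL_RADIUS_LY**3

# Probability that a random supernova lands within the lethal radius of a
# given planet, assuming supernovae are spread evenly through the disk.
p_close = lethal_volume / disk_volume

years_between_close_events = 1 / (SN_RATE_PER_YEAR * p_close)
print(f"~1 dangerous supernova per {years_between_close_events:.1e} years")
```

With these assumptions, a given planet sees a dangerous supernova only once every few billion years, which is consistent with the claim that supernovas cannot be a substantial part of the Great Filter.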
Rogue planets, brown dwarfs, and other small high-gravity bodies such as wandering black holes can be detected, and further improvements will allow faster detection. However, the scale of havoc created by such objects is such that it is not at all clear that detection would help: the entire planetary nuclear arsenal would not even begin to move their orbits to a substantial extent.
Note also that natural events are unlikely to be a large fraction of the Great Filter. Unlike most of the other threat types, however, this is one where radio astronomy and neutrino information may be comparatively likely to identify problems.
Biological threats take two primary forms: pandemics and deliberately engineered diseases. The first is more likely to be a serious contribution to the filter than one might naively expect, since modern transport allows infected individuals to move quickly and come into contact with a large number of people. For example, trucking has been a major cause of the spread of HIV in Africa, and it is likely that the recent Ebola epidemic had similar contributing factors. Moreover, keeping chickens and other animals in very large quantities in dense areas near human populations makes it easier for novel variants of viruses to jump species. Astronomy does not seem to provide any relevant assistance here; the only plausible way of getting such information would be to see other species that were destroyed by disease, and even with telescopes improved by many orders of magnitude this is not doable.
For reasons similar to those in the biological threats category, astronomy is unlikely to help us detect whether nuclear war is a substantial part of the Filter. It is possible that more advanced telescopes could detect an extremely large nuclear detonation in a very nearby star system: next-generation telescopes may be able to detect a nearby planet's advanced civilization purely from the light it gives off, and a sufficiently large detonation would reach a comparable light level. However, such devices would need to be multiple orders of magnitude larger than the largest current nuclear devices. Moreover, unless a telescope happened to be looking at exactly the right moment, it would see nothing at all, and the probability that another civilization wipes itself out at just the instant we are looking is vanishingly small.
Unexpected physics:
This category is one of the most difficult to discuss because it is so open. The most common examples people point to involve high-energy physics. Aside from theoretical considerations, cosmic rays of very high energies continually hit the upper atmosphere, frequently with energies multiple orders of magnitude higher than the particles in our accelerators. Thus high-energy events seem unlikely to cause any serious filtration unless and until humans develop particle accelerators whose energy levels are orders of magnitude higher than those of most cosmic rays. Cosmic rays with energies beyond what is known as the GZK limit are rare. We have observed occasional particles beyond the GZK limit, but they are rare enough that we cannot rule out a risk from many collisions between such high-energy particles in a small region. Since our best accelerators are nowhere near the GZK limit, this is not an immediate problem.
There is an argument that if we should worry about unexpected physics at all, it is on the very low energy end. In particular, humans have made objects substantially colder than the cosmic background temperature of about 3 K, with temperatures on the order of 10^-9 K. There is an argument that because nature provides no prior examples of such conditions, the chance that something can go badly wrong should be estimated as higher than one might otherwise think (see here). While this particular class of scenario seems unlikely, it illustrates that it may not be obvious which situations could bring unexpected, novel physics into play. Moreover, while the flashy, expensive particle accelerators get the attention, they may not be a serious source of danger compared to other physics experiments.
Three of the more plausible catastrophic forms of unexpected physics involving high-energy events are false vacuum collapse, black hole formation, and the formation of strange matter that is more stable than regular matter.
False vacuum collapse would occur if our universe is not in its true lowest energy state and some event causes it to transition to the true lowest state (or simply a lower one). Such an event would almost certainly be fatal for all life. False vacuum collapse cannot be guarded against by astronomical observation, since once initiated it would expand at the speed of light. Note that the indiscriminately destructive nature of false vacuum collapse makes it an unlikely filter: if false vacuum collapse were easy to trigger, we would not expect to see any life this late in the universe's lifespan, since there would have been a large number of prior opportunities for collapse. Essentially, we would not expect to find ourselves this late in a universe's history if that universe could easily undergo false vacuum collapse. While false vacuum collapse and similar problems raise issues of observer selection effects, careful work has been done to estimate their probability.
People have raised the idea of an event similar to a false vacuum collapse but which propagates slower than the speed of light; Greg Egan used it as a major premise in his novel "Schild's Ladder." I'm not aware of any reason to believe such events are at all plausible; the primary motivation seems to be the interesting literary scenarios which arise rather than any scientific consideration. If such a situation can occur, then we might detect it astronomically: in particular, if the wavefront of the event is fast enough to reach the nearest star or nearby stars, we might notice odd behavior by that star or group of stars. We can be confident that no such event has a speed much beyond a few hundredths of the speed of light, or we would already notice galaxies behaving abnormally. There is only a very narrow range where such expansions could be quick enough to devastate the planet they arise on but too slow to reach the parent star in a reasonable amount of time. For example, the distance from the Earth to the Sun is on the order of 10,000 times the diameter of the Earth, so any event which expands to destroy the Earth would reach the Sun in only about 10,000 times as long. Thus, to destroy one's home planet while never reaching the parent star in a reasonable time, the expansion would need to be extremely slow.
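The distance argument can be made concrete with a short sketch. The only inputs are two well-known lengths; the one-day engulfment time is an arbitrary illustrative choice:

```python
# Sketch of the distance-ratio argument for a sub-lightspeed "slow collapse".
AU_KM = 1.496e8           # mean Earth-Sun distance
EARTH_DIAMETER_KM = 1.274e4

ratio = AU_KM / EARTH_DIAMETER_KM
print(f"Earth-Sun distance is ~{ratio:,.0f} Earth diameters")

# If a constant-speed wavefront needs time T to engulf its home planet,
# it reaches the parent star after roughly ratio * T.
hours_to_engulf_earth = 24                      # illustrative: one day
years_to_reach_sun = ratio * hours_to_engulf_earth / (24 * 365)
print(f"Engulfs Earth in a day -> reaches the Sun in ~{years_to_reach_sun:.0f} years")
```

So even an event slow enough to take a full day to destroy Earth would reach the Sun within a few decades, which is why the window for a planet-destroying but star-sparing expansion is so narrow.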
The creation of artificial black holes is unlikely to be a substantial part of the filter: we expect that small black holes will quickly pop out of existence due to Hawking radiation. Even if such a black hole persisted, it would likely fall quickly to the center of the planet and eat matter very slowly, on a timescale too long to constitute a serious threat. However, it is possible that small black holes do not evaporate; the fact that we have not detected the evaporation of any primordial black holes is weak evidence that the behavior of small black holes is not well understood. It is also possible that such a hole would eat much faster than we expect, but this does not seem likely. If this is a major part of the filter, then better telescopes should be able to detect it by finding very dark objects with the approximate mass and orbit of habitable planets. We may also be able to detect such black holes via other observations, such as their gamma or radio signatures.
The conversion of regular matter into strange matter, unlike a false vacuum collapse or similar event, might be naturally limited to the planet where the conversion started. In that case, the only hope for observation would be to notice planets formed of strange matter via changes in the behavior of their light. Without actual samples of strange matter this may be very difficult, unless we simply treat abnormal-looking planets as suggestive evidence; without substantially better telescopes and a good idea of the normal range for rocky planets, this would be tough. On the other hand, neutron stars which have been converted into strange matter may be more easily detectable.
Global warming and related damage to biosphere:
Astronomy is unlikely to help here. It is possible that climates are more sensitive than we realize and that comparatively small changes can result in Venus-like situations. This seems unlikely given the level of climate variation over human history and the fact that current geological models strongly suggest that any substantial problem would eventually correct itself. But if we saw many Venus-like planets in the middle of their habitable zones, this would be a reason to worry. Note that this would require an ability to analyze the atmospheres of such planets well beyond current capability. Even if it is possible to Venus-ify a planet, it is not clear that the Venusification would last long, so there may be very few planets in this state at any given time. Since stars become brighter as they age, high greenhouse gas levels have more of an impact on climate when the parent star is old. If civilizations are more likely to arise late in their home star's lifespan, global warming becomes a more plausible filter, but even given such considerations, global warming does not seem sufficient as a filter. It is also possible that the Great Filter is not global warming by itself but a more general disruption of the biosphere, including global warming, reduction in species diversity, and other problems. There is some evidence that human behavior is collectively causing enough damage to leave an unstable biosphere.
A change in overall planetary temperature of 10 °C would likely be enough to collapse civilization without leaving any signal observable to a telescope. Similarly, substantial disruption to a biosphere may be very unlikely to be detected.
AI is a complicated existential risk from the standpoint of the Great Filter. Considering simply the Fermi paradox, AI is not likely to be the Great Filter. The essential problem has been brought up independently by a few people (see for example Katja Grace's remark here and my blog here). The central issue is that if an AI takes over, it is likely to attempt to control all resources in its future light-cone, and if the AI spreads out at a substantial fraction of the speed of light, we would notice the result. The argument has been made that we would not see such an AI if it expanded its radius of control at very close to the speed of light, but this requires expansion at 99% of the speed of light or greater, and it is highly questionable that such velocities are practically possible, due to collisions with the interstellar medium and the need to slow down if one is going to use the resources in a given star system. Another objection is that an AI may expand at a large fraction of light speed but do so stealthily. It is not likely that all AIs would favor stealth over speed. Moreover, one must then ask what happens when multiple slowly expanding, stealthy AIs run into each other; it is likely that such collisions would have results catastrophic enough to be visible even with comparatively primitive telescopes.
While these astronomical considerations make AI unlikely to be the Great Filter, it is important to note that they do not apply if the Great Filter is largely in our past. Thus, any discovery which pushes more of the filter into the past makes AI a larger fraction of total expected existential risk, since the absence of observable AI becomes much weaker evidence against strong AI if there are no major civilizations out there to hatch such intelligence explosions.
Note also that AI as a risk cannot be discounted if one assigns a high probability to existential risk based on non-Fermi concerns, such as the Doomsday Argument.
Resource depletion:
Astronomy is unlikely to provide direct help here, for reasons similar to those for nuclear exchange, biological problems, and global warming. This connects to the problem of civilization bootstrapping: to reach our current technology level, we used a large number of non-renewable resources, especially energy sources. On the other hand, large amounts of difficult-to-mine-and-refine resources (especially aluminum and titanium) will be much more accessible to a future civilization. While large amounts of fossil fuels remain, the technology required to obtain the deeper sources is substantially more advanced than that needed for the easily accessed oil and coal, and the energy return on investment (how much energy one gets out for a given amount of energy put in) is lower. Nick Bostrom has raised the possibility that the depletion of easy-to-access resources may contribute to civilization-collapsing problems that, while not full-scale existential risks by themselves, prevent civilizations from recovering. Others have begun to investigate the problem of rebuilding without fossil fuels, such as here.
Resource depletion is unlikely to be the Great Filter, because small changes to human behavior in the 1970s would have drastically reduced the current resource problems. Resource depletion may still contribute to existential threats if it leads to societal collapse, global nuclear exchange, or riskier experimentation. It may also combine with other risks such as global warming, where the combined problems may be much greater than either individually. There is, however, a risk that large-scale use of resources for astronomy research would itself contribute to the resource depletion problem.
A simple exercise in rationality: rephrase an objective statement as subjective and explore the caveats
"This book is awful" => "I dislike this book" => "I dislike this book because it is shallow and is full of run-on sentences." => I dislike this book because I prefer reading books I find deep and clearly written."
"The sky is blue" => ... => "When I look at the sky, the visual sensation I get is very similar to when I look at a bunch of other objects I've been taught to associate with the color blue."
"Team X lost but deserved to win" => ...
"Being selfish is immoral"
"The Universe is infinite, so anything imaginable happens somewhere"
In general, consider quickly checking whether, in a given context, replacing "is" with "appears to be" leads to something you find non-trivial.
Why? Because it exposes the multiple levels of maps we normally skip. One might find it illuminating to occasionally walk through the levels and make sure they are still connected as firmly as the last time, and maybe figure out where people who hold a different opinion construct a different chain of maps. It also helps make sure you don't mistake a map for the territory.
That is all. ( => "I think that I have said enough for one short post and adding more would lead to diminishing returns, though I could be wrong here, but I am too lazy to spend more time looking for links and quotes and better arguments without being sure that they would improve the post.")
I'm currently reading through some relevant literature for preparing my FLI grant proposal on the topic of concept learning and AI safety. I figured that I might as well write down the research ideas I get while doing so, so as to get some feedback and clarify my thoughts. I will be posting these in a series of "Concept Safety"-titled articles.
In the previous post in this series, I talked about how one might get an AI to have similar concepts as humans. However, one would intuitively assume that a superintelligent AI might eventually develop the capability to entertain far more sophisticated concepts than humans would ever be capable of having. Is that a problem?
Just what are concepts, anyway?
To answer the question, we first need to define what exactly it is that we mean by a "concept", and why exactly more sophisticated concepts would be a problem.
Unfortunately, there isn't really any standard definition of this in the literature, with different theorists having different definitions. Machery even argues that the term "concept" doesn't refer to a natural kind, and that we should just get rid of the whole term. If nothing else, this definition from Kruschke (2008) is at least amusing:
Models of categorization are usually designed to address data from laboratory experiments, so “categorization” might be best defined as the class of behavioral data generated by experiments that ostensibly study categorization.
Because I don't really have the time to survey the whole literature and try to come up with one grand theory of the subject, I will for now limit my scope and only consider two compatible definitions of the term.
Definition 1: Concepts as multimodal neural representations. I touched upon this definition in the last post, where I mentioned studies indicating that the brain seems to have shared neural representations for e.g. the touch and sight of a banana. Current neuroscience seems to indicate the existence of brain areas where representations from several different senses are combined together into higher-level representations, and where the activation of any such higher-level representation will also end up activating the lower sense modalities in turn. As summarized by Man et al. (2013):
Briefly, the Damasio framework proposes an architecture of convergence-divergence zones (CDZ) and a mechanism of time-locked retroactivation. Convergence-divergence zones are arranged in a multi-level hierarchy, with higher-level CDZs being both sensitive to, and capable of reinstating, specific patterns of activity in lower-level CDZs. Successive levels of CDZs are tuned to detect increasingly complex features. Each more-complex feature is defined by the conjunction and configuration of multiple less-complex features detected by the preceding level. CDZs at the highest levels of the hierarchy achieve the highest level of semantic and contextual integration, across all sensory modalities. At the foundations of the hierarchy lie the early sensory cortices, each containing a mapped (i.e., retinotopic, tonotopic, or somatotopic) representation of sensory space. When a CDZ is activated by an input pattern that resembles the template for which it has been tuned, it retro-activates the template pattern of lower-level CDZs. This continues down the hierarchy of CDZs, resulting in an ensemble of well-specified and time-locked activity extending to the early sensory cortices.
On this account, my mental concept for "dog" consists of a neural activation pattern making up the sight, sound, etc. of some dog, either a generic prototypical dog or some more specific dog. The pattern is likely not limited to sensory information, either, but may be associated with e.g. motor programs related to dogs, such as the program for throwing a ball for the dog to fetch. One version of this hypothesis, the Perceptual Symbol Systems account, calls such multimodal representations simulators, and describes them as follows (Niedenthal et al. 2005):
A simulator integrates the modality-specific content of a category across instances and provides the ability to identify items encountered subsequently as instances of the same category. Consider a simulator for the social category, politician. Following exposure to different politicians, visual information about how typical politicians look (i.e., based on their typical age, sex, and role constraints on their dress and their facial expressions) becomes integrated in the simulator, along with auditory information for how they typically sound when they talk (or scream or grovel), motor programs for interacting with them, typical emotional responses induced in interactions or exposures to them, and so forth. The consequence is a system distributed throughout the brain’s feature and association areas that essentially represents knowledge of the social category, politician.
The inclusion of such "extra-sensory" features helps us understand how even abstract concepts could fit this framework: for example, one's understanding of the concept of a derivative might be partially linked to the procedural programs one has developed while solving derivatives. For a more detailed hypothesis of how abstract mathematics may emerge from basic sensory and motor programs and concepts, I recommend Lakoff & Nuñez (2001).
Definition 2: Concepts as areas in a psychological space. This definition, while being compatible with the previous one, looks at concepts more "from the inside". Gärdenfors (2000) defines the basic building blocks of a psychological conceptual space to be various quality dimensions, such as temperature, weight, brightness, pitch, and the spatial dimensions of height, width, and depth. These are psychological in the sense of being derived from our phenomenal experience of certain kinds of properties, rather than the way in which they might exist in some objective reality.
For example, one way of modeling the psychological sense of color is via a color space defined by the quality dimensions of hue (represented by the familiar color circle), chromaticness (saturation), and brightness.
The second phenomenal dimension of color is chromaticness (saturation), which ranges from grey (zero color intensity) to increasingly greater intensities. This dimension is isomorphic to an interval of the real line. The third dimension is brightness which varies from white to black and is thus a linear dimension with two end points. The two latter dimensions are not totally independent, since the possible variation of the chromaticness dimension decreases as the values of the brightness dimension approaches the extreme points of black and white, respectively. In other words, for an almost white or almost black color, there can be very little variation in its chromaticness. This is modeled by letting that chromaticness and brightness dimension together generate a triangular representation ... Together these three dimensions, one with circular structure and two with linear, make up the color space. This space is often illustrated by the so called color spindle
This kind of a representation is different from the physical wavelength representation of color, where e.g. the hue is mostly related to the wavelength of the color. The wavelength representation of hue would be linear, but due to the properties of the human visual system, the psychological representation of hue is circular.
Gärdenfors defines two quality dimensions to be integral if a value cannot be given for an object on one dimension without also giving it a value for the other dimension: for example, an object cannot be given a hue value without also giving it a brightness value. Dimensions that are not integral with each other are separable. A conceptual domain is a set of integral dimensions that are separable from all other dimensions: for example, the three color-dimensions form the domain of color.
From these definitions, Gärdenfors develops a theory of concepts where more complicated conceptual spaces can be formed by combining lower-level domains. Concepts, then, are particular regions in these conceptual spaces: for example, the concept of "blue" can be defined as a particular region in the domain of color. Notice that the notion of various combinations of basic perceptual domains making more complicated conceptual spaces possible fits well together with the models discussed under our previous definition, where more complicated concepts were made possible by combining basic neural representations for e.g. different sensory modalities.
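As a toy illustration of this definition, one can sketch a concept as a region in a space of quality dimensions. The dimension names below follow Gärdenfors' color space, but the numeric ranges and the boundaries chosen for "blue" are invented purely for the example:

```python
# Toy model: the concept "blue" as a region in the color domain.
# Hue is a circular dimension (0-360 degrees); chromaticness and
# brightness are linear dimensions scaled to 0-1. All boundary
# values below are made up for illustration.

def in_blue_region(hue_deg, chromaticness, brightness):
    """Is this point of the color space inside the (made-up) 'blue' region?"""
    hue_ok = 200 <= hue_deg % 360 <= 260      # bluish hues
    chroma_ok = chromaticness > 0.2           # not too washed out
    bright_ok = 0.2 < brightness < 0.9        # not near black or white
    return hue_ok and chroma_ok and bright_ok

print(in_blue_region(230, 0.7, 0.5))   # a clear sky blue -> True
print(in_blue_region(30, 0.7, 0.5))    # an orange hue -> False
```

The point is only that once the quality dimensions exist, a concept reduces to a membership test over a region of the space; dividing the domain up is the easy part, just as Gärdenfors suggests below.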
The origin of the different quality dimensions could also emerge from the specific properties of the different simulators, as in PSS theory.
Thus definition #1 allows us to talk about what a concept might "look like from the outside", with definition #2 talking about what the same concept might "look like from the inside".
Interestingly, Gärdenfors hypothesizes that much of the work involved with learning new concepts has to do with learning new quality dimensions to fit into one's conceptual space, and that once this is done, all that remains is the comparatively much simpler task of just dividing up the new domain to match seen examples.
For example, consider the (phenomenal) dimension of volume. The experiments on "conservation" performed by Piaget and his followers indicate that small children have no separate representation of volume; they confuse the volume of a liquid with the height of the liquid in its container. It is only at about the age of five years that they learn to represent the two dimensions separately. Similarly, three- and four-year-olds confuse high with tall, big with bright, and so forth (Carey 1978).
The problem of alien concepts
With these definitions for concepts, we can now consider what problems would follow if we started off with a very human-like AI that had the same concepts as we did, but then expanded its conceptual space to allow for entirely new kinds of concepts. This could happen if it self-modified to have new kinds of sensory or thought modalities that it could associate its existing concepts with, thus developing new kinds of quality dimensions.
An analogy helps demonstrate this problem. Suppose that you're an inhabitant of Flatland, operating in a two-dimensional space where a rectangle has been drawn to mark a certain area as "forbidden" or "allowed". But then you suddenly become aware that the world is actually three-dimensional, and has a height dimension as well! That raises the question: how should the "forbidden" or "allowed" area be understood in this new three-dimensional world? Do the walls of the rectangle extend infinitely in the height dimension, or perhaps just some certain distance? If just a certain distance, does the rectangle have a "roof" or "floor", or can you enter (or leave) it from the top or the bottom? There doesn't seem to be any clear way to tell.
As a historical curiosity, a version of this dilemma actually arose when airplanes were invented: could landowners forbid airplanes from flying over their land, or was the ownership of the land limited to some specific height, above which the landowners had no control? Courts and legislation eventually settled on the latter answer. A more AI-relevant example: suppose one were trying to limit the AI with rules such as "stay within this box here", and the AI then gained an intuitive understanding of quantum mechanics, which might allow it to escape from the box without violating the rule in terms of its new concept space.
More generally, if your concepts previously had N dimensions and now have N+1, you might find something that fulfills all the previous criteria while still differing from what we would prefer, had we known about the (N+1)th dimension.
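The Flatland analogy can be made concrete with a small sketch. The rectangle, the test point, and the two candidate extensions below are all invented for illustration:

```python
# A 2D "allowed" rectangle, and two incompatible ways of extending it to 3D.

def allowed_2d(x, y):
    return 0 <= x <= 10 and 0 <= y <= 10

def allowed_3d_prism(x, y, z):
    # Extension 1: the walls extend infinitely in the height dimension.
    return allowed_2d(x, y)

def allowed_3d_box(x, y, z):
    # Extension 2: the region also has a floor and a roof.
    return allowed_2d(x, y) and 0 <= z <= 10

point = (5, 5, 100)   # inside the old rectangle, but high up

# Both extensions agree with the original rule on the old dimensions...
print(allowed_2d(point[0], point[1]))   # True
# ...yet disagree about the new one: the 2D rule underdetermines the 3D region.
print(allowed_3d_prism(*point))         # True
print(allowed_3d_box(*point))           # False
```

Both extensions are perfectly consistent with every judgment the original rule ever made, which is exactly why the new dimension leaves the intended boundary undefined.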
In the next post, I will present some (very preliminary and probably wrong) ideas for solving this problem.
Next post in series: What are concepts for, and how to deal with alien concepts.
Our sun appears to be a typical star: unremarkable in age, composition, galactic orbit, or even in its possession of many planets. Billions of other stars in the Milky Way have similar general parameters and orbits that place them in the galactic habitable zone. Extrapolations of recent exoplanet surveys reveal that most stars have planets, removing yet another potentially unique dimension for a great filter in the past.
According to Google, there are 20 billion Earth-like planets in the Galaxy.
A paradox indicates a flaw in our reasoning or our knowledge, which upon resolution, may cause some large update in our beliefs.
Ideally we could resolve this through massive multiscale Monte Carlo computer simulations to approximate Solomonoff induction on our current observational data. If we survive and create superintelligence, we will probably do just that.
In the meantime, we are limited to constrained simulations, Fermi estimates, and other shortcuts to approximate the ideal Bayesian inference.
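As a minimal illustration of what such a Fermi-estimate shortcut looks like, here is a toy calculation in the spirit of the Drake equation. The planet count is the ~20 billion figure cited above; every fraction below is an illustrative placeholder, not a measured value.

```python
# Toy Fermi estimate of the number of technological civilizations
# in the galaxy. All fractions are illustrative assumptions.
earthlike_planets = 20e9   # ~20 billion Earth-like planets (figure cited above)
f_life = 0.1               # fraction that develop life (assumption)
f_intelligence = 0.01      # fraction of those that develop intelligence (assumption)
f_civilization = 0.1       # fraction of those that produce a tech civ (assumption)

civs = earthlike_planets * f_life * f_intelligence * f_civilization
print(f"{civs:,.0f} civilizations")  # 2,000,000 under these made-up inputs
```

The point of such an estimate is not the output number, which is only as good as its made-up inputs, but that it makes each assumption explicit and separately updatable as evidence comes in.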
While there is still obvious uncertainty concerning the likelihood of the series of transitions along the path from the formation of an Earth-like planet around a Sun-like star up to an early tech civilization, the general direction of the recent evidence flow favours a strong Mediocrity Principle.
Here are a few highlight developments from the last few decades relating to an early filter:
- The time window between the formation of the Earth and the earliest life has been narrowed to a brief interval. Panspermia has also gained ground, with some recent complexity arguments favoring a common origin of life around 9 billion years ago.
- The discovery of various extremophiles indicates that life is robust across a wider range of environments than the norm on Earth today.
- Advances in neuroscience and studies of animal intelligence lead to the conclusion that the human brain is not nearly as unique as once thought. It is just an ordinary scaled-up primate brain, with a cortex enlarged to 4x the size of a chimpanzee's. Elephants and some cetaceans have cortical neuron counts similar to the chimpanzee's, and demonstrate similar or greater levels of intelligence in terms of rituals, problem solving, tool use, communication, and even understanding rudimentary human language. Elephants, cetaceans, and primates are widely separated lineages, indicating robustness and inevitability in the evolution of intelligence.
When modelling the future development of civilization, we must recognize that the future is a vast cloud of uncertainty compared to the past. The best approach is to focus on the most key general features of future postbiological civilizations, categorize the full space of models, and then update on our observations to determine what ranges of the parameter space are excluded and which regions remain open.
An abridged taxonomy of future civilization trajectories:
Civilization is wiped out due to an existential catastrophe that sterilizes the planet sufficiently to kill most large multicellular organisms, essentially resetting the evolutionary clock by a billion years. Given the potential dangers of nanotech/AI/nuclear weapons - and then aliens - I believe this possibility is significant, i.e. in the 1% to 50% range.
This is the old-skool sci-fi scenario. Humans or our biological descendants expand into space. AI is developed but limited to human intelligence, like C-3PO. No or limited uploading.
This leads eventually to slow colonization, terraforming, perhaps eventually dyson spheres etc.
This scenario is almost not worth mentioning: prior < 1%. Unfortunately, SETI in its current form is still predicated on a world model that assigns a high prior to these futures.
PostBiological Warm-tech AI Civilization:
This is Kurzweil/Moravec's sci-fi scenario. Humans become postbiological, merging with AI through uploading. We become a computational civilization that then spreads out at some fraction of the speed of light to turn the galaxy into computronium. This particular scenario is based on the assumption that energy is a key constraint, and that civilizations are essentially stellavores which harvest the energy of stars.
One of the very few reasonable assumptions we can make about any superintelligent postbiological civilization is that higher intelligence involves increased computational efficiency. Advanced civs will upgrade into physical configurations that maximize computation capabilities given the local resources.
Thus to understand the physical form of future civs, we need to understand the physical limits of computation.
One key constraint is the Landauer limit, which states that the erasure (or cloning) of one bit of information requires a minimum of kT ln 2 joules. At room temperature (293 K), this corresponds to a minimum of about 0.017 eV to erase one bit. "Minimum" is however the keyword here: according to the principle, at the limit the probability of the erasure succeeding is only 50%. Reliable erasure requires some multiple of the minimal expenditure - a reasonable estimate being about 100 kT, or roughly 1 eV, for bit erasures at today's levels of reliability.
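The kT ln 2 figure is easy to check numerically. A quick sketch, using the standard CODATA values for the Boltzmann constant and the electronvolt:

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
EV = 1.602176634e-19  # joules per electronvolt

def landauer_limit_ev(temp_kelvin):
    """Minimum energy (in eV) to erase one bit at the given temperature."""
    return K_B * temp_kelvin * math.log(2) / EV

room = landauer_limit_ev(293)   # room temperature: ~0.0175 eV
cmb = landauer_limit_ev(2.7)    # cosmic microwave background temperature
cold = landauer_limit_ev(0.01)  # ~the current achievable limit for large objects
print(f"293 K: {room:.4f} eV, 2.7 K: {cmb:.2e} eV, 0.01 K: {cold:.2e} eV")
```

Since the limit scales linearly with temperature, the ratio between any two temperatures gives the relative erasure cost directly, which is what drives the cold-tech argument below.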
Now, the second key consideration is that Landauer's Limit does not include the cost of interconnect, which is already now dominating the energy cost in modern computing. Just moving bits around dissipates energy.
Moore's Law is approaching its asymptotic end in a decade or so due to these hard physical energy constraints and the related miniaturization limits.
I assign a prior to the warm-tech scenario that is about the same as my estimate of the probability that the more advanced cold-tech (reversible quantum computing, described next) is impossible: < 10%.
From Warm-tech to Cold-tech
There is a way forward to vastly increased energy efficiency, but it requires reversible computing (to increase the ratio of computations per bit erasures), and full superconducting to reduce the interconnect loss down to near zero.
The path to enormously more powerful computational systems necessarily involves transitioning to very low temperatures, and the lower the better, for several key reasons:
- There is the obvious immediate gain that one gets from lowering the cost of bit erasures: a bit erasure at room temperature costs about a hundred times more than a bit erasure at the cosmic background temperature, and roughly thirty thousand times more than an erasure at 0.01 K (the current achievable limit for large objects)
- Low temperatures are required for most superconducting materials regardless.
- The delicate coherence required for practical quantum computation requires or works best at ultra low temperatures.
Assuming large scale quantum computing is possible, then the ultimate computer is thus a reversible massively entangled quantum device operating at absolute zero. Unfortunately, such a device would be delicate to a degree that is hard to imagine - even a single misplaced high energy particle could cause enormous damage.
Stellar Escape Trajectories
The Great Game
If two civs discover each other's locations around the same time, then MAD (mutually assured destruction) dynamics take over and cooperation has stronger benefits. The vast distances involved suggest that one-sided discoveries are more likely.
Spheres of Influence
Conditioning on our Observational Data
Observational Selection Effects
All advanced civs will have strong instrumental reasons to employ deep simulations to understand and model developmental trajectories for the galaxy as a whole and for civilizations in particular. A very likely consequence is the production of large numbers of simulated conscious observers, as in the Simulation Argument. Universes containing the more advanced low-temperature reversible/quantum computing civilizations will tend to produce many more simulated observer moments, and are thus intrinsically more likely than one would otherwise expect - perhaps massively so.
We estimate that there may be up to ∼10^5 compact objects in the mass range 10^−8 to 10^−2 M⊙ per main sequence star that are unbound to a host star in the Galaxy. We refer to these objects as nomads; in the literature a subset of these are sometimes called free-floating or rogue planets.
Although the error range is still large, it appears that free floating planets outnumber planets bound to stars, and perhaps by a rather large margin.
Assuming the galaxy is colonized: It could be that rogue planets form naturally outside of stars and then are colonized. It could be they form around stars and then are ejected naturally (and colonized). Artificial ejection - even if true - may be a rare event. Or not. But at least a few of these options could potentially be differentiated with future observations - for example if we find an interesting discrepancy in the rogue planet distribution predicted by simulations (which obviously do not yet include aliens!) and actual observations.
Also: if rogue planets outnumber stars by a large margin, then it follows that rogue planet flybys are more common in proportion.
SETI to date allows us to exclude some regions of the parameter space for alien civs, but the regions excluded correspond to low prior probability models anyway, based on the postbiological perspective on the future of life. The most interesting regions of the parameter space probably involve advanced stealthy aliens in the form of small compact cold objects floating in the interstellar medium.
The upcoming WFIRST telescope should shed more light on dark matter and enhance our microlensing detection abilities significantly. Sadly, its planned launch date isn't until 2024. Space development is slow.
Welcome to the Rationality reading group. This week we discuss the Preface by primary author Eliezer Yudkowsky, Introduction by editor & co-author Rob Bensinger, and the first sequence: Predictably Wrong. This sequence introduces the methods of rationality, including its two major applications: the search for truth and the art of winning. The desire to seek truth is motivated, and a few obstacles to seeking truth--systematic errors, or biases--are discussed in detail.
This post summarizes each article of the sequence, linking to the original LessWrong posting where available, and offers a few relevant notes, thoughts, and ideas for further investigation. My own thoughts and questions for discussion are in the comments.
Reading: Preface, Biases: An Introduction, and Sequence A: Predictably Wrong (pi-xxxv and p1-42)
Preface. Introduction to the ebook compilation by Eliezer Yudkowsky. Retrospectively identifies mistakes of the text as originally presented. Some have been corrected in the ebook, others stand as-is. Most notably the book focuses too much on belief, and too little on practical actions, especially with respect to our everyday lives. Establishes that the goal of the project is to teach rationality, those ways of thinking which are common among practicing scientists and the foundation of the Enlightenment, yet not systematically organized or taught in schools (yet).
Biases: An Introduction. Editor & co-author Rob Bensinger motivates the subject of rationality by explaining the dangers of systematic errors caused by *cognitive biases*, which the arts of rationality are intended to de-bias. Rationality is not about Spock-like stoicism -- it is about simply "doing the best you can with what you've got." The System 1 / System 2 dual-process dichotomy is explained: if our errors are systematic and predictable, then we can instil behaviors and habits to correct them. A number of exemplar biases are presented. However, a warning: it is difficult to recognize biases in your own thinking even after learning of them, and knowing about a bias may grant unjustified overconfidence that you yourself do not fall prey to such mistakes in your thinking. To develop as a rationalist, actual experience is required, not just learned expertise / knowledge. Ends with an introduction of the editor and an overview of the organization of the book.
A. Predictably Wrong
1. What do I mean by "rationality"? Rationality is a systematic means of forming true beliefs and making winning decisions. Probability theory is the set of laws underlying rational belief, "epistemic rationality": it describes how to process evidence and observations to revise ("update") one's beliefs. Decision theory is the set of laws underlying rational action, "instrumental rationality", independent of what one's goals and available options are. (p7-11)
2. Feeling rational. Becoming more rational can diminish feelings or intensify them. If one cares about the state of the world, it is expected that he or she should have an emotional response to the acquisition of truth. "That which can be destroyed by the truth should be," but also "that which the truth nourishes should thrive." The commonly perceived dichotomy between emotions and "rationality" [sic] is more often about fast perceptual judgements (System 1, emotional) vs slow deliberative judgements (System 2, "rational" [sic]). But both systems can serve the goal of truth, or defeat it, depending on how they are used. (p12-14)
3. Why truth? and... Why seek the truth? Curiosity: to satisfy an emotional need to know. Pragmatism: to accomplish some specific real-world goal. Morality: to be virtuous, or fulfill a duty to truth. Curiosity motivates a search for the most intriguing truths, pragmatism the most useful, and morality the most important. But be wary of the moral justification: "To make rationality into a moral duty is to give it all the dreadful degrees of freedom of an arbitrary tribal custom. People arrive at the wrong answer, and then indignantly protest that they acted with propriety, rather than learning from their mistake." (p15-18)
4. ...what's a bias, again? A bias is an obstacle to truth, specifically one of those obstacles which are produced by our own thinking processes. We describe biases as failure modes which systematically prevent typical human beings from determining truth or selecting actions that would have best achieved their goals. Biases are distinguished from mistakes which originate from false beliefs or brain injury. To better seek truth and achieve our goals, we must identify our biases and do what we can to correct for or eliminate them. (p19-22)
5. Availability. The availability heuristic is judging the frequency or probability of an event by the ease with which examples of the event come to mind. If you think you've heard about murders twice as much as suicides then you might suppose that murder is twice as common as suicide, when in fact the opposite is true. Use of the availability heuristic gives rise to the absurdity bias: events that have never happened are not recalled, and hence deemed to have no probability of occurring. In general, memory is not always a good guide to probabilities in the past, let alone to the future. (p23-25)
6. Burdensome details. The conjunction fallacy is when humans rate the probability of two events as higher than the probability of either event alone: adding detail can make a scenario sound more plausible, even though the event as described necessarily becomes less probable. Possible fixes include training yourself to notice the addition of details and discount appropriately, thinking about other reasons why the central idea could be true other than the added detail, or training oneself to hold a preference for simpler explanations -- to feel every added detail as a burden. (p26-29)
7. Planning fallacy. The planning fallacy is the mistaken belief that human beings are capable of making accurate plans. The source of the error is that we tend to imagine how things will turn out if everything goes according to plan, and do not appropriately account for possible troubles or difficulties along the way. The typically adequate solution is to compare the new project to broadly similar previous projects undertaken in the past, and ask how long those took to complete. (p30-33)
8. Illusion of transparency: why no one understands you. The illusion of transparency is our bias to assume that others will understand the intent behind our attempts to communicate. The source of the error is that we do not sufficiently consider alternative frames of mind or personal histories, which might lead the recipient to alternative interpretations. Be not too quick to blame those who misunderstand your perfectly clear sentences, spoken or written. Chances are, your words are more ambiguous than you think. (p34-36)
9. Expecting short inferential distances. Human beings are generally capable of processing only one piece of new information at a time. Worse, we instinctively judge that someone who says something with no obvious support is a liar or an idiot, and that if we say something blatantly obvious and the other person doesn't see it, they're the idiot. This is our bias towards explanations of short inferential distance. A clear argument has to lay out an inferential pathway, starting from what the audience already knows or accepts. If at any point you make a statement without obvious justification in arguments you've previously supported, the audience just thinks you're crazy. (p37-39)
10. The lens that sees its own flaws. We humans have the ability to introspect our own thinking processes, a seemingly unique skill among life on Earth. As consequence, a human brain is able to understand its own flaws--its systematic errors, its biases--and apply second-order corrections to them. (p40-42)
It is at this point that I would generally like to present an opposing viewpoint. However I must say that this first introductory sequence is not very controversial! Educational, yes, but not controversial. If anyone can provide a link or citation to one or more decent non-strawman arguments which oppose any of the ideas of this introduction and first sequence, please do so in the comments. I certainly encourage awarding karma to anyone that can do a reasonable job steel-manning an opposing viewpoint.
This has been a collection of notes on the assigned sequence for this week. The most important part of the reading group though is discussion, which is in the comments section. I pose some questions for you there, and I invite you to add your own. Please remember that this group contains a variety of levels of expertise: if a line of discussion seems too basic or too incomprehensible, look around for one that suits you better!
The next reading will cover Sequence B: Fake Beliefs (p43-77). The discussion will go live on Wednesday, 6 May 2015 at or around 6pm PDT, right here on the discussion forum of LessWrong.
For those of you unfamiliar with churning, it's the practice of signing up for a rewards credit card, spending enough with your everyday purchases to get the (usually significant) reward, and then cancelling it. Many of these cards have annual fees (which are commonly waived, and/or which the one-time reward will pay for). For a nominal amount of work, you can churn cards for significant bonuses.
Ordinarily I wouldn't come close to spending enough money to qualify for many of these rewards, but I recently made the Giving What You Can pledge. I now have a steady stream of predictable expenses, and conveniently, GiveWell allows donations via most any credit card. I've started using new rewards cards to pay for these expenses each time, resulting in free flights (this is how I'm paying to fly to NYC this summer), Amazon gift cards, or sometimes just straight cash.
Since the first of the year (total expenses $4000, including some personal expenses) I've churned $700 worth of bonuses (from a Delta American Express Gold and a Capital One Venture card). This money can be redonated, saved, spent, or whatever.
1. I hope it goes without saying that you should pay off your balance in full each month, just like you should with any other card.
2. This has some negative impact on your credit, in the short term.
3. It should be noted that credit card companies make at least some money (I think around 3%) off of your transactions, so if you're trying to hit a target of X% to charity, you would need to donate X/0.97 - e.g. 10.31% for a 10% target - to account for that 3%. The reward should more than cover it.
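The adjustment is a simple gross-up (the 3% figure is the author's rough estimate of the card fee, not a verified rate):

```python
def gross_up(target_fraction, fee=0.03):
    """How much to charge so the charity nets target_fraction after the card fee."""
    return target_fraction / (1 - fee)

# For a 10% giving target with an assumed 3% fee:
print(round(gross_up(0.10) * 100, 2))  # 10.31 (% of income to donate)
```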
4. Read more about this, including the pros and cons, from multiple sources before you try it. It's not something that should be done lightly, but does synergize very nicely with charity donations.
I'm currently reading through some relevant literature for preparing my FLI grant proposal on the topic of concept learning and AI safety. I figured that I might as well write down the research ideas I get while doing so, so as to get some feedback and clarify my thoughts. I will be posting these in a series of "Concept Safety"-titled articles.
In The Problem of Alien Concepts, I posed the following question: if your concepts (defined as either multimodal representations or as areas in a psychological space) previously had N dimensions and then they suddenly have N+1, how does that affect (moral) values that were previously only defined in terms of N dimensions?
I gave some (more or less) concrete examples of this kind of a "conceptual expansion":
- Children learn to represent dimensions such as "height" and "volume", as well as "big" and "bright", separately at around age 5.
- As an inhabitant of the Earth, you've been used to people being unable to fly and landowners being able to forbid others from using their land. Then someone goes and invents an airplane, leaving open the question of the height to which the landowner's control extends. Similarly for satellites and nation-states.
- As an inhabitant of Flatland, you've been told that the inside of a certain rectangle is a forbidden territory. Then you learn that the world is actually three-dimensional, leaving open the question of the height to which the forbidden territory extends.
- An AI has previously been reasoning in terms of classical physics and been told that it can't leave a box, which it previously defined in terms of classical physics. Then it learns about quantum physics, which allows for definitions of "location" that are substantially different from the classical ones.
As a hint of the direction where I'll be going, let's first take a look at how humans solve these kinds of dilemmas, and consider examples #1 and #2.
The first example - children realizing that items have a volume that's separate from their height - rarely causes any particular crises. Few children have values that would be seriously undermined or otherwise affected by this discovery. We might say that it's a non-issue because none of the children's values have been defined in terms of the affected conceptual domain.
As for the second example, I don't know the exact cognitive process by which it was decided that you didn't need the landowner's permission to fly over their land. But I'm guessing that it involved reasoning like: if the plane flies at a sufficient height, then that doesn't harm the landowner in any way. Flying would become impossibly difficult if you had to get separate permission from every person whose land you were going to fly over. And, especially before the invention of radar, a ban on unauthorized flyovers would be next to impossible to enforce anyway.
We might say that after an option became available which forced us to include a new dimension in our existing concept of landownership, we solved the issue by considering it in terms of our existing values.
Concepts, values, and reinforcement learning
Before we go on, we need to talk a bit about why we have concepts and values in the first place.
From an evolutionary perspective, creatures that are better capable of harvesting resources (such as food and mates) and avoiding dangers (such as other creatures who think you're food or after their mates) tend to survive and have offspring at better rates than otherwise comparable creatures who are worse at those things. If a creature is to be flexible and capable of responding to novel situations, it can't just have a pre-programmed set of responses to different things. Instead, it needs to be able to learn how to harvest resources and avoid danger even when things are different from before.
How did evolution achieve that? Essentially, by creating a brain architecture that can, as a very very rough approximation, be seen as consisting of two different parts. One part, which a machine learning researcher might call the reward function, has the task of figuring out when various criteria - such as being hungry or getting food - are met, and issuing the rest of the system either a positive or negative reward based on those conditions. The other part, the learner, then "only" needs to find out how to best optimize for the maximum reward. (And then there is the third part, which includes any region of the brain that's neither of the above, but we don't care about those regions now.)
The mathematical theory of how to learn to optimize for rewards when your environment and reward function are unknown is reinforcement learning (RL), which recent neuroscience indicates is implemented by the brain. An RL agent learns a mapping from states of the world to rewards, as well as a mapping from actions to world-states, and then uses that information to maximize the amount of lifetime rewards it will get.
There are two major reasons why an RL agent, like a human, should learn high-level concepts:
- They make learning massively easier. Instead of having to separately learn that "in the world-state where I'm sitting naked in my cave and have berries in my hand, putting them in my mouth enables me to eat them" and that "in the world-state where I'm standing fully-clothed in the rain outside and have fish in my hand, putting it in my mouth enables me to eat it" and so on, the agent can learn to identify the world-states that correspond to the abstract concept of having food available, and then learn the appropriate action to take in all those states.
- There are useful behaviors that need to be bootstrapped from lower-level concepts to higher-level ones in order to be learned. For example, newborns have an innate preference for looking at roughly face-shaped things (Farroni et al. 2005), which develops into a more consistent preference for looking at faces over the first year of life (Frank, Vul & Johnson 2009). One hypothesis is that this bias towards paying attention to the relatively-easy-to-encode-in-genes concept of "face-like things" helps direct attention towards learning valuable but much more complicated concepts, such as ones involved in a basic theory of mind (Gopnik, Slaughter & Meltzoff 1994) and the social skills involved with it.
Viewed in this light, concepts are cognitive tools that are used for getting rewards. At the most primitive level, we should expect a creature to develop concepts that abstract over situations that are similar with regards to the kind of reward that one can gain from taking a certain action in those states. Suppose that a certain action in state s1 gives you a reward, and that there are also states s2 - s5 in which taking some specific action causes you to end up in s1. Then we should expect the creature to develop a common concept for being in the states s2 - s5, and we should expect that concept to be "more similar" to the concept of being in state s1 than to the concept of being in some state that was many actions away.
"More similar" how?
In reinforcement learning theory, reward and value are two different concepts. The reward of a state is the actual reward that the reward function gives you when you're in that state or perform some action in that state. Meanwhile, the value of a state is the maximum total reward that you can expect to get from moving from that state to others (times some discount factor). So a state A with reward 0 might have value 5 if you could move from it to state B, which had a reward of 5.
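To illustrate how states near a rewarding state inherit value from it, here is a minimal value-iteration sketch of the earlier s1-s5 example. The transition structure, reward magnitude, and discount factor are all illustrative assumptions, not anything from RL practice in particular.

```python
# Taking an action in s1 yields a reward; s2..s5 each have an action
# leading to s1. Rewards and discount factor are illustrative.
gamma = 0.9
reward = {"s1": 10, "s2": 0, "s3": 0, "s4": 0, "s5": 0}
next_state = {"s2": "s1", "s3": "s1", "s4": "s1", "s5": "s1"}

value = {s: 0.0 for s in reward}
for _ in range(50):  # iterate until the values converge
    for s in reward:
        if s == "s1":
            value[s] = reward[s]  # treat s1 as terminal for simplicity
        else:
            value[s] = reward[s] + gamma * value[next_state[s]]

print(value)  # s2..s5 all converge to 9.0: zero reward, but high value
```

Even though s2-s5 give no reward themselves, they all end up with the same high value by virtue of leading to s1, which is exactly the sense in which an agent might lump them together under one concept.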
Below is a figure from DeepMind's recent Nature paper, which presented a deep reinforcement learner that was capable of achieving human-level performance or above on 29 of 49 Atari 2600 games (Mnih et al. 2015). The figure is a visualization of the representations that the learning agent has developed for different game-states in Space Invaders. The representations are color-coded depending on the value of the game-state that the representation corresponds to, with red indicating a higher value and blue a lower one.
As can be seen (and is noted in the caption), representations with similar values are mapped closer to each other in the representation space. Also, some game-states which are visually dissimilar to each other but have a similar value are mapped to nearby representations. Likewise, states that are visually similar but have a differing value are mapped away from each other. We could say that the Atari-playing agent has learned a primitive concept space, where the relationships between the concepts (representing game-states) depend on their value and the ease of moving from one game-state to another.
In most artificial RL agents, reward and value are kept strictly separate. In humans (and mammals in general), this doesn't seem to work quite the same way. Rather, if there are things or behaviors which have once given us rewards, we tend to eventually start valuing them for their own sake. If you teach a child to be generous by praising them when they share their toys with others, you don't have to keep doing it all the way to your grave. Eventually they'll internalize the behavior, and start wanting to do it. One might say that the positive feedback actually modifies their reward function, so that they will start getting some amount of pleasure from generous behavior without needing to get external praise for it. In general, behaviors which are learned strongly enough don't need to be reinforced anymore (Pryor 2006).
Why does the human reward function change as well? Possibly because of the bootstrapping problem: there are things such as social status that are very complicated and hard to directly encode as "rewarding" in an infant mind, but which can be learned by associating them with rewards. One researcher I spoke with commented that he "wouldn't be at all surprised" if it turned out that sexual orientation was learned by men and women having slightly different smells, and sexual interest bootstrapping from an innate reward for being in the presence of the right kind of a smell, which the brain then associated with the features usually co-occurring with it. His point wasn't so much that he expected this to be the particular mechanism, but that he wouldn't find it particularly surprising if a core part of the mechanism was something that simple. Remember that incest avoidance seems to bootstrap from the simple cue of "don't be sexually interested in the people you grew up with".
This is, in essence, how I expect human values and human concepts to develop. We have some innate reward function which gives us various kinds of rewards for different kinds of things. Over time we develop various concepts for the purpose of letting us maximize our rewards, and lived experiences also modify our reward function. Our values are concepts which abstract over situations in which we have previously obtained rewards, and which have become intrinsically rewarding as a result.
Getting back to conceptual expansion
Having defined these things, let's take another look at the two examples we discussed above. As a reminder, they were:
- Children learn to represent dimensions such as "height" and "volume", as well as "big" and "bright", separately at around age 5.
- As an inhabitant of the Earth, you've been used to people being unable to fly and landowners being able to forbid others from using their land. Then someone goes and invents an airplane, leaving open the question of the height to which the landowner's control extends.
I summarized my first attempt at describing the consequences of #1 as "it's a non-issue because none of the children's values have been defined in terms of the affected conceptual domain". We can now reframe it as "it's a non-issue because the [concepts that abstract over the world-states which give the child rewards] mostly do not make use of the dimension that's now been split into 'height' and 'volume'".
Admittedly, this new conceptual distinction might be relevant for estimating the value of a few things. A more accurate estimate of the volume of a glass leads to a more accurate estimate of which glass of juice to prefer, for instance. With children, there probably is some intuitive physics module that figures out how to apply this new dimension for that purpose. Even if there wasn't, and it was unclear whether it was the "tall glass" or the "high-volume glass" concept that needed to be mapped closer to high-value glasses, this could be easily determined by simple experimentation.
As for the airplane example, I summarized my description of it by saying that "after an option became available which forced us to include a new dimension in our existing concept of landownership, we solved the issue by considering it in terms of our existing values". We can similarly reframe this as "after the feature of 'height' suddenly became relevant for the concept of landownership, when it hadn't been a relevant feature dimension for landownership before, we redefined landownership by considering which kind of redefinition would give us the largest amounts of rewarding things". "Rewarding things", here, shouldn't be understood only in terms of concrete physical rewards like money, but also anything else that people have ended up valuing, including abstract concepts like right to ownership.
Note also that different people, having different experiences, ended up making different redefinitions. No doubt some landowners felt that "being in total control of my land and everything above it" was a more important value than "the convenience of people who get to use airplanes"... unless, perhaps, they got to see first-hand the value of flying, in which case the new information could have repositioned the different concepts in their value-space.
As an aside, this also works as a possible partial explanation for e.g. someone being strongly against gay rights until their child comes out of the closet. Someone they care about suddenly benefiting from the concept of "gay rights", which previously had no positive value for them, may end up changing the value of that concept. In essence, they gain new information about the value of the world-states that the concept of "my nation having strong gay rights" abstracts over. (Of course, things don't always go this well, if their concept of homosexuality is too strongly negative to start with.)
The Flatland case follows a similar principle: the Flatlanders have some values that declare the inside of the rectangle a forbidden space. Maybe the inside of the rectangle contains monsters which tend to eat Flatlanders. Once they learn about 3D space, they can rethink it in terms of their existing values.
Dealing with the AI in the box
This leaves us with the AI case. We have, via various examples, taught the AI to stay in the box, which was defined in terms of classical physics. In other words, the AI has obtained the concept of a box, and has come to associate staying in the box with some reward, or possibly leaving it with a lack of a reward.
Then the AI learns about quantum mechanics. It learns that in the QM formulation of the universe, "location" is not a fundamental or well-defined concept anymore - and in some theories, even the concept of "space" is no longer fundamental or well-defined. What happens?
Let's look at the human equivalent for this example: a physicist who learns about quantum mechanics. Do they start thinking that since location is no longer well-defined, they can now safely jump out of the window on the sixth floor?
Maybe some do. But I would wager that most don't. Why not?
The physicist cares about QM concepts to the extent that the said concepts are linked to things that the physicist values. Maybe the physicist finds it rewarding to develop a better understanding of QM, to gain social status by making important discoveries, and to pay their rent by understanding the concepts well enough to continue to do research. These are some of the things that the QM concepts are useful for. Likely the brain has some kind of causal model indicating that the QM concepts are relevant tools for achieving those particular rewards. At the same time, the physicist also has various other things they care about, like being healthy and hanging out with their friends. These are values that can be better furthered by modeling the world in terms of classical physics.
In some sense, the physicist knows that if they started thinking "location is ill-defined, so I can safely jump out of the window", then that would be changing the map, not the territory. It wouldn't help them get the rewards of being healthy and getting to hang out with friends - even if a hypothetical physicist who did make that redefinition would think otherwise. It all adds up to normality.
A part of this comes from the fact that the physicist's reward function remains defined over immediate sensory experiences, as well as values which are linked to those. Even if you convince yourself that the location of food is ill-defined and you thus don't need to eat, you will still suffer the negative reward of being hungry. The physicist knows that no matter how they change their definition of the world, that won't affect their actual sensory experience and the rewards they get from that.
So to prevent the AI from leaving the box by suitably redefining reality, we have to somehow find a way for the same reasoning to apply to it. I haven't worked out a rigorous definition for this, but it needs to somehow learn to care about being in the box in classical terms, and realize that no redefinition of "location" or "space" is going to alter what happens in the classical model. Also, its rewards need to be defined over models to a sufficient extent to avoid wireheading (Hibbard 2011), so that it will think that trying to leave the box by redefining things would count as self-delusion, and not accomplish the things it really cared about. This way, the AI's concept for "being in the box" should remain firmly linked to the classical interpretation of physics, not the QM interpretation of physics, because it's acting in terms of the classical model that has always given it the most reward.
It is my hope that this could also be made to extend to cases where the AI learns to think in terms of concepts that are totally dissimilar to ours. If it learns a new conceptual dimension, how should that affect its existing concepts? Well, it can figure out how to reclassify the existing concepts that are affected by that change, based on what kind of a classification ends up producing the most reward... when the reward function is defined over the old model.
I'm reading Thinking, Fast and Slow. In appendix B I came across the following comment. Emphasis mine:
Studies of language comprehension indicate that people quickly recode much of what they hear into an abstract representation that no longer distinguishes whether the idea was expressed in an active or in a passive form and no longer discriminates what was actually said from what was implied, presupposed, or implicated (Clark and Clark 1977).
My first thought on seeing this is: holy crap, this explains why people insist on seeing relevance claims in my statements that I didn't put there. If the brain doesn't distinguish statement from implicature, and my conversational partner believes that A implies B when I don't, then of course I'm going to be continually running into situations where people model me as saying and believing B when I actually only said A. At a minimum this will happen any time I discuss any question of seemingly-morally-relevant fact with someone who hasn't trained themselves to make the is-ought distinction. Which is most people.
The next thought my brain jumped to: This process might explain the failure to make the is-ought distinction in the first place. That seems like much more of a leap, though. I looked up the Clark and Clark cite. Unfortunately it's a fairly long book that I'm not entirely sure I want to wade through. Has anyone else read it? Can someone offer more details about exactly what findings Kahneman is referencing?
This post contains no new insights; it just puts together some old insights in a format I hope is clearer.
Most satisficers are unoptimised (above the satisficing level): they have a limited drive to optimise and transform the universe. They may still end up optimising the universe anyway: they have no penalty for doing so (and sometimes it's a good idea for them). But if they can lazily achieve their goal, then they're ok with that too. So they simply have low optimisation pressure.
A safe "satisficer" design (or a reduced impact AI design) needs to be not only un-optimised, but specifically anti-optimised. It has to be set up so that "go out and optimise the universe" scores worse than "be lazy and achieve your goal". The problem is that these terms are undefined (as usual), that there are many minor actions that can optimise the universe (such as creating a subagent), and the approach has to be safe against all possible ways of optimising the universe - not just the "maximise u" for a specific and known u.
That's why the reduced impact/safe satisficer/anti-optimised designs are so hard: you have to add a very precise yet general (anti-)optimising pressure, rather than simply removing the current optimising pressure.
Part of the problem with a reduced impact AI is that it will, by definition, only have a reduced impact.
Some of the designs try and get around the problem by allowing a special "output channel" on which impact can be large. But that feels like cheating. Here is a design that accomplishes the same without using that kind of hack.
Imagine there is an asteroid that will hit the Earth, and we have a laser that could destroy it. But we need to aim the laser properly, so we need coordinates. There is a reduced impact AI that is motivated to give the coordinates correctly, but also motivated to have reduced impact - and saving the planet from an asteroid with certainty is not reduced impact.
Now imagine that instead there are two AIs, X and Y. By abuse of notation, let ¬X refer to the event that the output signal from X is scrambled away from the original output.
Then we ask X to give us the x-coordinates for the laser, under the assumption of ¬Y (that AI Y's signal will be scrambled). Similarly, we ask Y to give us the y-coordinates of the laser, under the assumption of ¬X.
Then X will reason "since ¬Y, the laser will certainly miss its target, as the y-coordinates will be wrong. Therefore it is reduced impact to output the correct x-coordinates, so I shall." Similarly, Y will output the right y-coordinates, the laser will fire and destroy the asteroid, having a huge impact, hooray!
The approach is not fully general yet, because we can have "subagent problems". X could create an agent that behaves nicely given ¬Y (the assumption it was given), but completely crazily given Y (the reality). But it shows how we could get high impact from slight tweaks to reduced impact.
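The reasoning the two AIs go through can be illustrated with a toy simulation. This is only a sketch under made-up assumptions: the target coordinates, the `fire_laser` success condition, and the scrambling model are all invented for illustration, not part of the original design.

```python
import random

def fire_laser(x_coord, y_coord, target=(3, 7)):
    """The asteroid is destroyed only if both coordinates are correct."""
    return (x_coord, y_coord) == target

def impact_as_seen_by_X(trials=10_000):
    """X evaluates the impact of outputting the correct x-coordinate
    under its given assumption ¬Y: Y's output is scrambled to a random
    wrong value, so the shot misses regardless of what X answers."""
    hits = 0
    for _ in range(trials):
        scrambled_y = random.choice([y for y in range(10) if y != 7])
        if fire_laser(3, scrambled_y):
            hits += 1
    return hits / trials

# Under ¬Y, X predicts zero impact from answering truthfully...
assert impact_as_seen_by_X() == 0.0
# ...but in reality both AIs answer correctly and the asteroid is destroyed.
assert fire_laser(3, 7)
```

The asymmetry is the whole trick: each AI evaluates its impact in a counterfactual world where the other's answer is useless, so honesty looks harmless to both, yet the actual world combines two honest answers.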
EDIT: For those worried about lying to the AIs, do recall http://lesswrong.com/r/discussion/lw/lyh/utility_vs_probability_idea_synthesis/ and http://lesswrong.com/lw/ltf/false_thermodynamic_miracles/
I've recently read the decision theory FAQ, as well as Eliezer's TDT paper. When reading the TDT paper, a simple decision procedure occurred to me which, as far as I can tell, gets the correct answer to every tricky decision problem I've seen. As discussed in the FAQ above, evidential decision theory gets the chewing gum problem wrong, causal decision theory gets Newcomb's problem wrong, and TDT gets counterfactual mugging wrong.
In the TDT paper, Eliezer postulates an agent named Gloria (page 29), who is defined as an agent who maximizes decision-determined problems. He describes how a CDT-agent named Reena would want to transform herself into Gloria. Eliezer writes
By Gloria’s nature, she always already has the decision-type causal agents wish they had, without need of precommitment.
Eliezer then later goes on to develop TDT, which is supposed to construct Gloria as a byproduct.
Gloria, as we have defined her, is defined only over completely decision-determined problems of which she has full knowledge. However, the agenda of this manuscript is to introduce a formal, general decision theory which reduces to Gloria as a special case.
Why can't we instead construct Gloria directly, using the idea of the thing that CDT agents wished they were? Obviously we can't just postulate a decision algorithm that we don't know how to execute, and then note that a CDT agent would wish they had that decision algorithm, and pretend we had solved the problem. We need to be able to describe the ideal decision algorithm to a level of detail that we could theoretically program into an AI.
Consider this decision algorithm, which I'll temporarily call Nameless Decision Theory (NDT) until I get feedback about whether it deserves a name: you should always make the decision that a CDT-agent would have wished he had pre-committed to, if he had previously known he'd be in his current situation and had the opportunity to precommit to a decision.
In effect, you are making a general precommitment to behave as if you made all specific precommitments that would ever be advantageous to you.
NDT is so simple, and Eliezer comes so close to stating it in his discussion of Gloria, that I assume there is some flaw with it that I'm not seeing. Perhaps NDT does not count as a "real"/"well defined" decision procedure, or can't be formalized for some reason? Even so, it does seem like it'd be possible to program an AI to behave in this way.
Can someone give an example of a decision problem for which this decision procedure fails? Or for which there are multiple possible precommitments that you would have wished you'd made and it's not clear which one is best?
EDIT: I now think this definition of NDT better captures what I was trying to express: You should always make the decision that a CDT-agent would have wished he had precommitted to, if he had previously considered the possibility of his current situation and had the opportunity to costlessly precommit to a decision.
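To make the proposed procedure concrete, here is a toy sketch of how an NDT agent would handle Newcomb's problem. The assumptions are all mine: a perfectly accurate predictor, the standard $1,000 / $1,000,000 payoffs, and the hypothetical helper names `newcomb_payoff` and `ndt_choice`.

```python
def newcomb_payoff(action, predicted_action):
    """Box A always holds $1,000; the opaque box B holds $1,000,000
    iff the predictor foresaw one-boxing."""
    box_b = 1_000_000 if predicted_action == "one-box" else 0
    return box_b if action == "one-box" else box_b + 1_000

def ndt_choice(actions):
    """NDT sketch: pick the action a CDT agent would have wished to
    precommit to. Since an accurate predictor observes the (hypothetical)
    precommitment, each action is evaluated assuming it was predicted."""
    return max(actions, key=lambda a: newcomb_payoff(a, predicted_action=a))

assert ndt_choice(["one-box", "two-box"]) == "one-box"
```

The key move is in `ndt_choice`: each candidate action is scored as if it had been precommitted to (and hence predicted), which is exactly the evaluation a CDT agent performs when deciding what to precommit to in advance.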
I posted a stupid question a couple of weeks ago and got some good feedback.
Runnable Code - fork it and mess around with it:
I'd love some more feedback and opinions.
A couple of other things for context:
hypercapital.info - all about hypercapitalism
This isn't a trick question, nor do I have a particular answer in mind.
Tomorrow, all of your memories are going to be wiped. There is a crucial piece of information that you need to make sure you remember, and more specifically, you need to be very confident you were the one that sent this message and not a third party pretending to be you.
How do you go about transmitting, "signing", and verifying such a message*?
--edit: I should have clarified that one of the assumptions is that some malicious third party can/will be attempting to send you false information from "yourself" and you need to distinguish between that and what's really you.
--edit2: this may be formally impossible, I don't actually know. If anyone can demonstrate this I'd be very appreciative.
--edit3: I don't have a particular universal definition for the term "memory wipe" in mind, mainly because I didn't want to pigeonhole the discussion. I think this pretty closely mimics reality. So I think it's totally fine to say, "If you retain this type of memory, then I'd do X."
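One conventional framing of the problem: if some secret can survive the wipe outside your head (written down, in a password manager, etc.), the verification part reduces to a standard message authentication code. This sketch uses Python's `hmac` module; the passphrase and messages are invented for illustration, and of course the hard part of the puzzle is exactly whether the malicious third party can be kept away from that stored secret.

```python
import hashlib
import hmac

# Assumption: this secret survives the wipe somewhere the attacker can't reach.
SECRET = b"passphrase-stored-outside-your-head"

def sign(message: bytes) -> str:
    """Produce an authentication tag for a message using the shared secret."""
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str) -> bool:
    """Check a (message, tag) pair in constant time."""
    return hmac.compare_digest(sign(message), tag)

note = b"Trust the lawyer, not the doctor."
tag = sign(note)
assert verify(note, tag)                       # the genuine message checks out
assert not verify(b"Trust the doctor.", tag)   # a forged message fails
```

This only relocates the trust problem rather than solving it, which is consistent with the suspicion in edit2 that a fully general solution may be impossible: post-wipe you must still trust that the stored secret is yours.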
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.
following on from this thread:
User Algon asked:
I don't drink alcohol, but is it really all that? I just assumed that most people have alcoholic beverages for the 'buzz'/intoxication.
I related my experience:
I have come to the conclusion that I taste things differently to a large subset of the population. I have a very sweet tooth and am very sensitive to bitter flavours.
I don't eat olives, most alcohol only tastes like the alcoholic aftertaste (which apparently some people don't taste) - imagine the strongest burning taste of the purest alcohol you have tasted, some people never taste that, I taste it with nearly every alcoholic beverage. Beer is usually awfully bitter too.
The only wine I could ever bother to drink is dessert wine (it's very sweet) and only slowly. (Or also a half shot of rum and maple syrup.)
Having said all this - yes; some people love their alcoholic beverages for their flavours.
I am wondering what the sensory experience of other LW users is of alcohol. Do you drink (and if not, why not)? Do you have specific preferences? Do you have a particular palate for foods (probably relevant)?
I hypothesise a lower proportion of drinkers than the rest of the population. (subject of course to cultural norms where you come from)
Edit: I will make another post in a week about taste preferences because (as we probably already know) human tastes vary. I did want to mention that I avoid spicy things except for sweet chilli which is not spicy at all. And I don't drink coffee (because it tastes bad and I am always very awake and never need caffeine to wake me up). I am also quite sure I am a super-taster but wanted to not use that word for concern that the jargon might confuse people who don't yet know about it.
Thanks for all the responses! This has been really interesting and exactly what I expected (number of posts)!
In regards to experiences, I would mention that heavy drinking is linked with nearly every health problem you could think of, and I am surprised we had a selection of several heavy drinkers (to those who are heavy drinkers, I would suggest reading about the health implications and reconsidering the lifestyle; it sounds like most of you are not addicted). About the heavy drinkers - I suspect that this is not representative of the average, but rather that the people who feel they are outliers decided to mention their cases (of the people who did not reply, there are probably none or very few heavy drinkers, whereas there are probably some who did not reply and are light drinkers or don't drink).
I hope to reply to a bunch of the comments and should get to it in the next few days.
Thank you again! Maybe this should be included on the next survey...
It occurred to me that the anti-Pascaline agent design could be used as part of a satisficer approach.
The obvious thing to reduce dangerous optimisation pressure is to make a bounded utility function, with an easily achievable bound. Such as giving them a utility linear in paperclips that maxes out at 10.
The problem with this is that, if the entity is a maximiser (which it might become), it can never be sure that it's achieved its goals. Even after building 10 paperclips, and an extra 2 to be sure, and an extra 20 to be really sure, and an extra 3^^^3 to be really really sure, and extra cameras to count them, with redundant robots patrolling the cameras to make sure that they're all behaving well, etc... There's still an ε chance that it might have just dreamed this, say, or that its memory is faulty. So it has a current utility of (1-ε)10, and can increase this by reducing ε - hence by building even more paperclips.
Hum... ε, you say? This seems a place where the anti-Pascaline design could help. Here we would use it at the lower bound of utility. The agent currently has probability ε of having utility < 10 (i.e. it has not built 10 paperclips) and probability (1-ε) of having utility = 10. Therefore an anti-Pascaline agent with ε lower bound would round this off to 10, discounting the unlikely event that it has been deluded, and thus it has no need to build more paperclips or paperclip counting devices.
Note that this is an un-optimising approach, not an anti-optimising one, so the agent may still build more paperclips anyway - it just has no pressure to do so.
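The difference between the two agents can be made concrete with a small sketch. The numbers and the threshold mechanism here are my own illustrative assumptions about how the anti-Pascaline rounding would work, not a spec from the original anti-Pascaline post.

```python
def expected_utility(p_deluded, u_success=10, u_failure=0):
    """A plain maximiser's expected utility: always strictly below the
    bound while any delusion probability remains."""
    return (1 - p_deluded) * u_success + p_deluded * u_failure

def anti_pascaline_utility(p_deluded, eps, u_success=10, u_failure=0):
    """Sketch of the anti-Pascaline lower bound: outcomes whose total
    probability falls below eps are rounded away entirely."""
    if p_deluded < eps:
        return u_success
    return expected_utility(p_deluded, u_success, u_failure)

# The maximiser still sees a gap to close by piling on more paperclips...
assert expected_utility(0.001) < 10
# ...while the anti-Pascaline agent rounds the 0.1% delusion risk to zero
# and feels no pressure to build more.
assert anti_pascaline_utility(0.001, eps=0.01) == 10
```

The point of the sketch is that the anti-Pascaline agent's utility actually reaches the bound, so there is no residual gradient pushing it toward ever more paperclip-counting - matching the "un-optimising, not anti-optimising" distinction above.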
This summary was posted to LW main on April 10th. The following week's summary is here.
Irregularly scheduled Less Wrong meetups are taking place in:
- Atlanta Meetup: Game Night: 11 April 2015 02:05PM
- Australian Less Wrong Mega Meetup #2: 17 July 2015 07:00PM
- Australia-wide Mega-Camp!: 17 July 2015 07:00PM
- Cologne meetup: 11 April 2015 05:00PM
- [Edinburgh] LessWrong Scotland: 12 April 2015 02:00PM
- European Community Weekend 2015: 12 June 2015 12:00PM
- Munich April Meetup: 18 April 2015 02:00PM
- [Netherlands] Effective Altruism Netherlands: Small actions, big impact (part 2): 12 April 2015 02:19PM
- Saint Petersburg meetup. Habits and spring.: 18 April 2015 06:00PM
The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:
- Austin, TX - Caffe Medici - Why People Fail: 11 April 2015 01:30PM
- Canberra: A Sequence Post You Disagreed With + Discussion: 11 April 2015 06:00PM
- Moscow meetup: lectures and tournaments: 12 April 2015 02:00PM
- Sydney Rationality Dojo - Skill Training Part 2: 12 April 2015 05:00PM
- [Vienna] Rationality Meetup Vienna: 18 April 2015 02:00PM
- [Vienna] Rationality Meetup Vienna: 09 May 2015 02:00PM
- Washington, D.C.: Cherry Blossoms: 12 April 2015 03:00PM
Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Cambridge UK, Canberra, Columbus, London, Madison WI, Melbourne, Moscow, Mountain View, New York, Philadelphia, Research Triangle NC, Seattle, Sydney, Tel Aviv, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers.
Did you know about "The surprising downsides of being clever"? Is "Happiness And Intelligence: Rare Combination?" right? There are longitudinal studies which seem to imply this: "Being Labeled as Gifted, Self-appraisal, and Psychological Well-being: A Life Span Developmental Perspective".
I found these via slashdot.
As LessWrong is a harbor for unusually high-IQ people (see section B in here), I wonder how our happiness compares to the mean. What are your thoughts?