New Alzheimer’s treatment fully restores memory function in mice
The team reports fully restoring the memory function of 75 percent of the mice they tested it on, with zero damage to the surrounding brain tissue.
"We’re extremely excited by this innovation of treating Alzheimer’s without using drug therapeutics."
The team says they’re planning on starting trials with higher animal models, such as sheep, and hope to get their human trials underway in 2017.
http://www.sciencealert.com/new-alzheimer-s-treatment-fully-restores-memory-function
Does the Utility Function Halt?
Suppose, for a moment, that somebody has written the Utility Function. It takes, as its input, some Universe State, runs it through a Morality Modeling Language, and outputs a number indicating the desirability of that state relative to some baseline, and more importantly, other Universe States which we might care to compare it to.
Can I feed the Utility Function the state of my computer right now, as it is executing a program I have written? And is a universe in which my program halts superior to one in which my program wastes energy executing an endless loop?
If you're inclined to argue that's not what the Utility Function is supposed to be evaluating, I have to ask what, exactly, it -is- supposed to be evaluating? We can reframe the question in terms of the series of keys I press as I write the program, if that is an easier problem to solve than what my computer is going to do.
An extended class of utility functions
This is a technical result that I wanted to check before writing up a major piece on value loading.
The purpose of a utility function is to give an agent criteria with which to make a decision. If two utility functions always give the same decisions, they're generally considered the same utility function. So, for instance, the utility function u always gives the same decisions as u+C for some constant C, or Du for some positive constant D. Thus we can say that utility functions are equivalent if they are related by a positive affine transformation.
For specific utility functions, and specific agents, the class of functions that give the same decisions is quite a bit larger. For instance, imagine that v is a utility function with the property v("any universe which contains humans")=constant. Then any human who attempts to follow u could equivalently follow u+v (neglecting acausal trade) - it makes no difference. In general, if no action the agent could ever take would change the value of v, then u and u+v give the same decisions.
More subtly, if the agent can change v but cannot change the expectation of v, then u and u+v still give the same decisions. This is because for any actions a and b the agent could take:
E(u+v | a) = E(u | a) + E(v | a) = E(u | a) + E(v | b).
Hence E(u+v | a) > E(u+v | b) if and only if E(u | a) > E(u | b), and so the decision hasn't changed.
Note that E(v | a) need not be constant for all actions: simply that for every pair of actions a and b that the agent could take at a particular decision point, E(v | a) = E(v | b). It's perfectly possible for the expectation of v to be different at different moments, or conditional on different decisions made at different times.
Finally, as long as v obeys the above properties, there is no reason for it to be a utility function in the classical sense - it could be constructed any way we want.
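As a quick illustration (not part of the original argument), here is a minimal Python sketch with made-up numbers: v's expectation is the same under both available actions, and the ranking of actions under u and under u+v comes out identical.

```python
# Sanity check of the claim above: if E(v | a) = E(v | b) for every pair of
# available actions, then u and u+v rank the actions identically.
# All numbers here are invented purely for illustration.

outcomes = ["x", "y", "z"]
p = {"a": [0.2, 0.5, 0.3], "b": [0.6, 0.1, 0.3]}  # P(outcome | action)

u = {"x": 10.0, "y": 0.0, "z": 5.0}   # the original utility function
v = {"x": 5.0,  "y": 5.0, "z": 2.0}   # chosen so that E(v|a) == E(v|b)

def expect(f, action):
    """Expected value of f given the action's outcome distribution."""
    return sum(prob * f[o] for prob, o in zip(p[action], outcomes))

assert abs(expect(v, "a") - expect(v, "b")) < 1e-12  # the premise holds

best_u  = max(p, key=lambda a: expect(u, a))
best_uv = max(p, key=lambda a: expect(u, a) + expect(v, a))
print(best_u, best_uv)   # both pick "b": the decision is unchanged
```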
An example: suffer not from probability, nor benefit from it
The preceding seems rather abstract, but here is the motivating example. It's a correction term T that adds or subtracts utility, as external evidence comes in (it's important that the evidence is external - the agent gets no correction from knowing what its own actions are/were). If the AI knows evidence e, and new (external) evidence f comes in, then its utility gets adjusted by T(e,f) which is defined as
T(e,f) = E(u | e) - E(u | e, f)
In other words, the agent's utility gets adjusted by the difference between the old expected utility and the new - and hence the agent's expected utility is unchanged by new external evidence.
Consider for instance an agent with a utility u linear in money. It must choose between a bet that goes 50-50 on $0 (heads) or $100 (tails), versus a sure $49. It correctly chooses the bet, which has an expected utility of $50 - in other words, E(u | bet)=$50. But now imagine that the coin comes out heads. The utility u plunges to $0 (in other words, E(u | bet, heads)=$0). But the correction term cancels that out:
u(bet, heads) + T(bet, heads) = $0 + E(u | bet) - E(u | bet, heads) = $0 + $50 - $0 = $50.
A similar effect leaves utility unchanged if the coin comes out tails, cancelling the increase. In other words, adding the T correction term removes the impact of stochastic effects on utility.
But the agent will still make the same decisions. This is because, before seeing evidence f, it cannot predict f's impact on E(u). In other words, summing over all possible evidences f:
E(u | e) = Σ p(f)E(u | e, f),
which is another way of phrasing "conservation of expected evidence". This implies that
E(T(e,-)) = Σ p(f)T(e,f)
= Σ p(f)(E(u | e) - E(u | e, f))
= E(u | e) - Σ p(f)E(u | e, f)
= 0,
and hence that adding the T term does not change the agent's decisions. All the various corrections add on to the utility as the agent continues making decisions, but none of them make the agent change what it does.
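Here is a small Python sketch of the coin-flip example, with the $0/$100 bet and the sure $49 hard-coded; it just checks numerically that the expected correction is zero and that u+T equals the pre-flip expectation whichever way the coin lands.

```python
# A hedged sketch of the coin-flip example above, checking two things:
# (1) the expected correction E(T | bet) is zero, so the decision (bet vs
#     sure $49) is unchanged, and
# (2) after the coin lands, u + T equals the pre-flip expectation ($50).

def E_u(action, evidence=None):
    """Expected utility (in dollars) of each action, before/after the flip."""
    if action == "sure":
        return 49.0
    if evidence is None:               # before the coin flip
        return 0.5 * 0.0 + 0.5 * 100.0
    return 0.0 if evidence == "heads" else 100.0

def T(action, evidence):
    """Correction term: old expected utility minus new expected utility."""
    return E_u(action) - E_u(action, evidence)

exp_T_bet = sum(0.5 * T("bet", f) for f in ("heads", "tails"))
assert abs(exp_T_bet) < 1e-12   # conservation of expected evidence

# After the flip, stochastic swings in u are cancelled out by T:
for f in ("heads", "tails"):
    print(f, E_u("bet", f) + T("bet", f))   # prints 50.0 both times
```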
The relevance of this will be explained in a subsequent post (unless someone finds an error here).
Military Rationalities and Irrationalities
In response to the question
"Does anyone happen to know of reliable ways for increasing one's supply of executive function, by the way? I seem to run out of it very quickly in general."
(Kaj_Sotala)
I posted that my military experience seems effectively designed to increase executive function. Some examples of this from myself and metastable are
- Uniforms - not having to think about your wardrobe, ever, saves a lot of time, mental effort, and money. Steve Jobs and President Obama are known for also using uniforms specifically for this purpose.
- PT - daily, routinized exercise, done in a way that very few people are deciding what comes next.
- Maximum use of daylight hours.
- Med Group and Force Support - minimized high-risk projects outside of the workplace (paternalistic health care, insurance, and in many cases, housing and continuing education).
After a moment's thought it occurred to me that there are some double-edged swords in Military Rationality as well, some of which lead to classic jokes like 'Military Intelligence is an oxymoron.'
Regulations- A select few 'experts' create policies which everyone else is required to follow at all times. Unfortunately these experts are never (never ever) encouraged to consider knock-on effects. Ugh.
Anybody else have insights on the military they want to share here? I feel a couple of good posts on increasing executive function might come out of a discussion on the rationalities and irrationalities of the armed forces.
From Capuchins to AIs: Setting an Agenda for the Study of Cultural Cooperation (Part 2)
This is a multi-purpose essay in the making, written with the following goals: 1) mandatory essay writing at the end of a semester studying "Cognitive Ethology: Culture in Human and Non-Human Animals"; 2) drafting something that can later be published in a journal that deals with cultural evolution, hopefully inclining people in the area to glance at future-oriented research, i.e. FAI and global coordination; 3) publishing it on Lesswrong; and 4) ultimately Saving the World, as everything should - if it's worth doing, it's worth doing in the way most likely to save the World. Since many of my writings are frequently too long for Lesswrong, I'll publish this in a sequence-like form made of self-contained chunks. My deadline is Sunday, so I'll probably post daily, editing and creating the new sections based on previous commentary.
Abstract: The study of cultural evolution has drawn much of its momentum from academic areas far removed from human and animal psychology, especially regarding the evolution of cooperation. Game-theoretic results and parental investment theory come from economics, kin selection models from biology, and an ever-growing number of models describing the process of cultural evolution in general, and the evolution of altruism in particular, come from mathematics. Even Artificial Intelligence has taken an interest in how to create agents that can communicate, imitate and cooperate. In this article I begin to tackle the 'why?' question. By trying to retrospectively make sense of the convergence of all these fields, I contend that further refinements in these fields should be directed towards understanding how to create environmental incentives fostering cooperation.
We need systems that are wiser than we are. We need institutions and cultural norms that make us better than we tend to be. It seems to me that the greatest challenge we now face is to build them. - Sam Harris, 2013, The Power Of Bad Incentives
1) Introduction
2) Cultures evolve
Culture is perhaps the most remarkable outcome of the evolutionary algorithm (Dennett, 1996) so far. It is the cradle of most things we consider humane - that is, typically human and valuable - and it surrounds our lives to the point that we may be thought of as creatures made of culture even more than creatures of bone and flesh (Hofstadter, 2007; Dennett, 1992). The appearance of our cultural complexity has relied on many associated capacities, among them:
1) The ability to observe, be interested in, and approach an individual doing something interesting, an ability we share with Norway rats, crows, and even lemurs (Galef & Laland, 2005).
2) Ability to learn from and scrounge the food of whoever knows how to get food, shared by capuchin monkeys (Ottoni et al, 2005).
3) Ability to tolerate learners, to accept learners, and to socially learn, probably shared by animals as diverse as fish, finches and Fins (Galef & Laland, 2005).
4) Understanding and emulating other minds - Theory of Mind - empathizing, relating, perhaps re-framing an experience as one's own, shared by chimpanzees, dogs, and at least some cetaceans (Rendell & Whitehead, 2001).
5) Learning the program level description of the action of others, for which the evidence among other animals is controversial (but see Cantor & Whitehead, 2013). And finally...
6) Sharing intentions. Intricate understanding of how two minds can collaborate with complementary tasks to achieve a mutually agreed goal (Tomasello et al, 2005).
Irrespective of definitional disputes around the true meaning of the word "culture" (which doesn't exist, see e.g. Pinker, 2007 p. 115; Yudkowsky 2008A), each of these is more cognitively complex than its predecessor, and even (1) is sufficient for intra-specific non-environmental, non-genetic behavioral variation, which I will call "culture" here, whomever it may harm.
By transitivity, (2-6) allow the development of culture. It is interesting to notice that tool use, frequently but falsely cited as the hallmark of culture, is distributed across the animal kingdom in a surprisingly family-independent way. A graph showing, per biological family, which species show tool use gives us a power-law distribution, whose similarity with the universal prior helps in understanding that belonging to a family in which one species uses tools tells us very little about another species' own tool use (Michael Haslam, personal conversation).
Once some of those abilities are available, and given some amount of environmental facilitation, need, and randomness, cultures begin to form. Occasionally, so do more developed traditions. Be it by imitation, program-level imitation, goal emulation or intention sharing, information is transmitted between agents, giving rise to elements sufficient to constitute a primeval Darwinian soup. That is, entities form such that they exhibit 1) variation, 2) heredity or replication, and 3) differential fitness (Dennett, 1996). In light of the article Five Misunderstandings About Cultural Evolution (Henrich, Boyd & Richerson, 2008) we can refine Dennett's conditions for the evolutionary algorithm as 1) discrete or continuous variation, 2) heredity, replication, or less faithful replication plus content attractors, and 3) differential fitness. Once this set of conditions is met, an evolutionary algorithm, or many, begins to carve its optimizing paws into whatever surpassed the threshold for long enough. Cultures, therefore, evolve.
The intricacies of cultural evolution, and the mathematical and computational models of how cultures evolve, have been the subject of much interdisciplinary research. For an extensive account of human culture see Not By Genes Alone (Richerson & Boyd, 2005). For computational models of social evolution, there is work by Mesoudi, Nowak, and others, e.g. (Hauert et al, 2007). For mathematical models, the aptly named Mathematical Models of Social Evolution: A Guide for the Perplexed by McElreath and Boyd (2007) provides a textbook-style walkthrough. For animal culture, see (Laland & Galef, 2009).
Cultural evolution satisfies David Deutsch's criterion for existence - it kicks back. It satisfies the evolutionary equivalent of the condition posed by the Quine-Putnam indispensability argument in mathematics, i.e. it is a sine qua non condition for understanding how the World works nomologically. It is falsifiable to the Popperian's content, and it inflates the World's ontology a little, by inserting a new kind of "replicator", the meme. Contrary to what happened on the internet, the name 'meme' has lost much of its appeal among cultural evolution theorists, and "memetics" is considered by some to refer only to the study of memes as monolithic, atomic, high-fidelity replicators, which would make the theory obsolete. This has created the following conundrum: the name 'meme' remains by far the best-known way to speak of "that which evolves culturally" within, and especially outside, the specialist arena. Further, the niche occupied by the word 'meme' is so conceptually necessary within the area for communication and explanation that it is frequently put under scare quotes, or some other informal excuse. In fact, as argued by Tim Tyler - who frequently posts here - in the very sharp Memetics (2011), there are nearly no reasons to try to abandon the 'meme' meme, and nearly all reasons (practicality, Qwerty reasons, mnemonics) to keep it.

To avoid contradicting the evidence accumulated since Dawkins first coined the term, I suggest we redefine Meme as an attractor in cultural evolution (dual-inheritance) whose development over time structurally mimics, to a significant extent, the discrete behavior of genes, frequently coinciding with the smallest unit of cultural replication. The definition is long, but the idea is simple: memes are not the best analogues of genes because they are discrete units that replicate just like genes, but because they are continuous conceptual clusters being attracted to a point in conceptual space whose replication is just like that of genes. Even more simply, memes are the mathematically closest things to genes in cultural evolution. So the suggestion here is for researchers of dual-inheritance and cultural evolution to take the scare quotes off our memes and keep business as usual.
The evolutionary algorithm has created a new attractor-replicator, the meme; it didn't privilege any specific families in the biological trees with it, and it ended up creating a process of cultural-genetic coevolution known as dual inheritance. This process has been studied in ever more quantified ways by primatologists, behavioral ecologists, population biologists, anthropologists, ethologists, sociologists, neuroscientists and even philosophers. I've shown at least six distinct abilities which helped scaffold our astounding level of cultural intricacy, and some animals who share them with us. We will now take a look at the evolution of cooperation, collaboration, altruism, and moral behavior - a sub-area of cultural evolution that saw an explosion of interest and research during the last decade, with publications (most from the last 4 years) such as The Origins of Morality, Supercooperators, Good and Real, The Better Angels of Our Nature, Non-Zero, The Moral Animal, Primates and Philosophers, The Age of Empathy, Origins of Altruism and Cooperation, The Altruism Equation, Altruism in Humans, Cooperation and Its Evolution, Moral Tribes, The Expanding Circle, The Moral Landscape.
3) Cooperation evolves
Despite the selfish nature of genes (Dawkins, 1999) and other units of Darwinian transmission (Jablonka & Lamb, 2007), altruism at the individual level (cost to self for benefit to other) can and does arise because of several intertwined factors.
1) Alleles (the molecular biologist's word for what less specialized areas call genes) under normal conditions optimize for there being more copies of themselves in the future. This happens regardless of whether it is that physical instantiation - also known as a token - that is present in the future.
2) Copies of alleles are spread over space, individuals, groups, species and time, but they only "care" about the time and quantity dimensions. In the long run alleles don't thrive if they are doing better than their neighbors; they thrive if they are doing better than the average allele. A token (instantiation) of an allele that codes for cancer, multiplying itself uncontrollably, could, had it a mind, think it is doing great, but if the mutation that gave rise to it only happened in somatic cells (which do not go through the germ line), it would be in for a surprise. This is one reason why biologists say natural selection is short-sighted.
3) The above reasoning applies exactly equally, and for the same reasons, to an allele that codes for individual-selfish behavior in a species in which more altruistic groups tend to outlive more egotistic ones. The allele for individual selfishness, and the selfish individual, may think they are doing great compared to their neighbors, when all of a sudden, with high probability, their group dies. Altruism wins in this case not because there is a new spooky unit of selection that reverses reductionism and applies downward causation originating in groups. Altruism thrives because the average long-term fitness of each allele that coded for it was higher than that of genes that code for individual-selfish behavior. Group selection_c - as well as superorganism selection, somatic-cell selection, species selection and individual selection - only happens when the selective forces operating on that level coincide with the allele's fitness increasing in relation to all the competing alleles. (Group selection_c is selection for altruist genes at the group level, the only definition under which the entire discussion was dealing with a controversy of substance instead of talking past each other, as brilliantly explained in this post by PhilGoetz, 2010; please read the case study section in that post to get a more precise understanding than the above short definition.) See also the excursus on what a fitness function is below.
4) Completely independently from the reasons in (3), alleles, epigenetics, and learning can program individuals to be cooperative if they "expect" (consciously or not) the interaction with another individual, say, Malou, to: (a) begin a cycle of reciprocation with Malou in the future whose benefit exceeds the current cost being paid; (b) counterfactually increase their reputation with sufficiently many individuals that those will award more benefit than the current cost; (c) avoid punishment by third parties; (d) conform to, or help enforce, by setting an example, social norms and rules upon which selection pressures act (Tomasello, 2005). A key notion in all these mechanisms based on this encoded "expectation" is that uncertainty must be present. In the absence of uncertainty - a state that doesn't exist in nature - an agent in a Prisoner's Dilemma-like interaction would be required to defect instead of cooperating from round one, predicting the backwards-in-time cascade of defection from whichever was the last round of interaction, in which by definition cooperating is worse. The problems that on Lesswrong people are trying to solve using Timeless Decision Theory, Updateless Decision Theory, PrudentBot, and other IQ140+ gimmicks, evolution solved by inserting stupidity! More precisely, by embracing higher-level uncertainty about how many future interactions there will be (see the sketch after this list). Kissing, saying "I love you", becoming engaged, and getting married are all increasingly honest ways in which the computer program programmed by your alleles informs Malou that there will be more cooperation and less defection in the future.
5) Finally, altruism only poses paradoxes of the "group selection_c" kind when we are trying to explain why a replicator that codes for altruism emerged, and we are trying to explain it at that replicator's level. It is no mystery why a composition of the phenotypic effects of a gene (replicator) and two memes (attractor-replicators) in all individuals who possess the three of them makes them altruistic, if it does. Each gene and meme in that composition may be fending for itself, but as things turn out, they do make some really nice people (or bonobos) once their extended phenotypes are clustered within those people. If we trust Jablonka & Lamb (2007), there are four streams of heredity flowing concomitantly: genetic, epigenetic, behavioral and symbolic. Some of the flowing hereditary entities are not even attractor-replicators (niche construction, for instance); they don't exhibit replicator dynamics, and any altruism that spreads through them requires no special explanation at all!
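The sketch promised in point (4): a toy iterated Prisoner's Dilemma in Python, with made-up payoffs and a per-round continuation probability standing in for the uncertainty about how many future interactions there will be. It is only meant to illustrate the qualitative point, not any particular model from the literature.

```python
import random
random.seed(1)

# Standard Prisoner's Dilemma payoffs (row player); the values are illustrative.
R, S, T, P = 3, 0, 5, 1   # reward, sucker, temptation, punishment

def payoff(my, their):
    return {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}[(my, their)]

def tit_for_tat(opponent_history):
    return "C" if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return "D"

def play(strat1, strat2, cont_prob, trials=10000):
    """Average total payoff to strat1 when the game ends with prob 1-cont_prob each round."""
    total = 0.0
    for _ in range(trials):
        h1, h2 = [], []   # each player's record of the *opponent's* past moves
        while True:
            m1, m2 = strat1(h1), strat2(h2)
            total += payoff(m1, m2)
            h1.append(m2)
            h2.append(m1)
            if random.random() > cont_prob:
                break
    return total / trials

for delta in (0.2, 0.9):
    coop = play(tit_for_tat, tit_for_tat, delta)
    defect = play(always_defect, tit_for_tat, delta)
    print(delta, round(coop, 2), round(defect, 2))
# With a low continuation probability, defecting against reciprocators pays;
# with a high one, mutual reciprocation pays more. Uncertainty about "the last
# round" is what keeps cooperation stable.
```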
To the best of my knowledge, none of the five factors above - all of which play a role in the existence and maintenance of altruism - requires a revision of Neodarwinism of the Dawkins, Dennett, Trivers, Pinker sort. None of them challenges the validity of our models of replicator dynamics as replicator dynamics. None of them challenges the metaphysically fundamental notion of Darwinism as Universal Acid (Dennett, 1996). None of them compromises the claim that everything in the universe that has complex design of which we are aware can be traced back to Darwinian mindless processes operating, by and large, on replicator-like entities (Dennett, op. cit.). None of them poses an obstacle to physicalist reductionism - which in this biology-laden context is the claim that all macrophysical facts, including biological facts, are materially determined by the microphysical facts.
Cooperation evolves, and altruism evolves. They evolve for natural, non-mysterious reasons, and before any more shaking of the edifice of Darwinism is done, and its constitutive reductionism or universal corrosive powers are contested, any counteracting evidence must first be shown not to be explainable by any of the factors above, or a combination of them, and not to be simply the result of one of the many confusions clarified in the excursus below. Despite many people's attempts to look for Skyhooks that would cast away the all-too-natural demons of Neodarwinism and reductionism, things remain as they were before: Cranes all the way up. I will be listening attentively for a case of altruism found in the biological world, or in mathematical simulations based on it, that can pierce through these many layers of epistemic explanatory ability, but I won't be holding my breath.
Excursus: What is a fitness function?
It is worth pointing out here not only that the altruism and group selection confusion happens, but why it does. PhilGoetz did half of the explanatory job already. The other half is noticing that the fitness function is a many-place function (there is a newer and better post on Lesswrong explaining many-place functions/words, but I didn't find it in 12 minutes; please point to it if you can). The complicated description of "what the fitness function is", in David Lewis's manner of speaking, would be that it is a function from things to functions from functions to functions. More understandably, with e.g. the specific "thing" being a token of an altruistic allele of kind "Aallele", call it "Aallele334":
Aallele334 --1--> ((number of Aalleles --3--> total number of alleles) --2--> (amplitude configuration slice --4--> simplest ordering))
Here arrow 4 is the function we call time, from a timeless-physics, quantum-physics perspective. Just substitute "time" for the whole parenthesis if you haven't read the Quantum Physics sequence. Arrow 3 is how well Aalleles are doing, i.e. how many of them there are in relation to the total number of competing alleles. Arrow 2 is how this relation between Aalleles and the total varies over time. The fitness function is arrow 1: once you are given a specific token of an allele, it is the function that describes how well copies of that token do over time in relation to all the competing alleles. Needless to say, not many biologists are aware of that complex computation.
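For readers who think better in code, here is one rough Python rendering of that higher-order structure; the trajectory, the numbers and the allele names are all invented for illustration.

```python
# A rough rendering of the "function from things to functions from functions
# to functions" idea: fitness(allele) returns a functional that takes a whole
# population trajectory and reports how the allele's relative share changes.

from typing import Callable, Dict

Population = Dict[str, int]                  # allele name -> number of tokens
Trajectory = Callable[[float], Population]   # arrows 2/4: population state over "time"

def relative_frequency(allele: str, pop: Population) -> float:
    """Arrow 3: copies of this allele relative to all competing alleles."""
    return pop[allele] / sum(pop.values())

def fitness(allele: str) -> Callable[[Trajectory], float]:
    """Arrow 1: given an allele token, return a functional over trajectories."""
    def over(traj: Trajectory) -> float:
        return relative_frequency(allele, traj(1.0)) - relative_frequency(allele, traj(0.0))
    return over

# Example: an "altruistic" allele that is a minority but whose groups grow fast.
def toy_trajectory(t: float) -> Population:
    return {"Aallele334": int(100 + 400 * t), "selfish": int(100 + 100 * t)}

print(fitness("Aallele334")(toy_trajectory))  # positive: its share of the total is rising
```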
The reason why the unexplained half of the controversies happens is that the fitness of an allele at a given point will appear very different when you factor it against the competing alleles of other cells, of other individuals, of other groups, or of other species. Fitness is what philosophers call an externalist concept: if you increase the amount of contextually relevant surroundings, the output number changes significantly. It will also appear very different when you factor it up to final time T1 or T2. The fitness of an allele coding for a species-specific characteristic of T-Rex's large bodies will be very high if the final time is 65 million years ago, but very low if it is 64.
I remember Feynman saying, I believe in this interview, that it is amazing what the eye does. We are in the 3D equivalent of the situation of an insect bobbing up and down on the 2D surface of a swimming pool: we manage to abstract away all the waves going through the space between us and a seen object, and still capture enough information to locate it, interact with it, and admire it. It is as if the insect could tell, only from its vertical oscillations, how many children were in the pool, where they were located, and so on. The state of knowledge in many fields, adaptive fitness included, strikes me as similarly amazing. If this many-place function underlies what biologists should be talking about to avoid talking past each other, how can many of them be aware of only one or two of the many variables that should be input, and still be making good science? Or are they?
If you fail to see hidden variables, you can fall prey to anomalies like Simpson's paradox, which is exactly the mistake described in PhilGoetz's post on group/species selection.
The function above also works for things other than alleles, like individuals with a characteristic, in which case it will be calculating the fitness of having that characteristic at the individual level.
4) The complexity of cultural items doesn't undermine the validity of mathematical models.
4.1) Cognitive attractors and biases substitute for memes discreteness
The math becomes equivalent.
4.2) Despite the Unilateralist Curse and the Tragedy of the Commons, dyadic interaction models help us understand large scale cooperation
Once we know these two failure modes, dyadic iterated (or reputation-sensitive) interaction is close enough.
5) From Monkeys to Apes to Humans to Transhumans to AIs, the ranges of achievable altruistic skill.
Possible modes of being altruistic. Graph like Bostrom's. Second and third order punishment and cooperation. Newcomb-like signaling problems within AI.
6) Unfit for the Future: the need for greater altruism.
We fail and will remain failing in Tragedy of the Commons problems unless we change our nature.
7) From Science, through Philosophy, towards Engineering: the future of studies of altruism.
Philosophy: Existential Risk prevention through global coordination and cooperation prior to technical maturity. Engineering Humans: creating enhancements and changing incentives. Engineering AI's: making them better and realer.
8) A different kind of Moral Landscape
Like Sam Harris's one, except comparing not how much a society approaches The Good Life (Moral Landscape pg15), but how much it fosters altruistic behavior.
9) Conclusions
Not yet.
Bibliography (Only of the parts already written, obviously):
Boyd, R., Gintis, H., Bowles, S., & Richerson, P. J. (2003). The evolution of altruistic punishment. Proceedings of the National Academy of Sciences, 100(6), 3531-3535.
Cantor, M., & Whitehead, H. (2013). The interplay between social networks and culture: theoretically and among whales and dolphins. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1618).
Dawkins, R. (1999). The extended phenotype: The long reach of the gene. Oxford University Press, USA.
Dennett, D. C. (1996). Darwin's dangerous idea: Evolution and the meanings of life (No. 39). Simon & Schuster.
Dennett, D. C. (1992). The self as a center of narrative gravity. Self and consciousness: Multiple perspectives.
Galef Jr, B. G., & Laland, K. N. (2005). Social learning in animals: empirical studies and theoretical models. Bioscience, 55(6), 489-499.
Hauert, C., Traulsen, A., Brandt, H., Nowak, M. A., & Sigmund, K. (2007). Via freedom to coercion: the emergence of costly punishment. science, 316(5833), 1905-1907.
Henrich, J., Boyd, R., & Richerson, P. J. (2008). Five misunderstandings about cultural evolution. Human Nature, 19(2), 119-137.
Hofstadter, D. R. (2007). I am a Strange Loop. Basic Books
Jablonka, E., & Lamb, M. J. (2007). Precis of evolution in four dimensions. Behavioral and Brain Sciences, 30(4), 353-364.
McElreath, R., & Boyd, R. (2007). Mathematical models of social evolution: A guide for the perplexed. University of Chicago Press.
Ottoni, E. B., de Resende, B. D., & Izar, P. (2005). Watching the best nutcrackers: what capuchin monkeys (Cebus apella) know about others’ tool-using skills. Animal cognition, 8(4), 215-219.
Persson, I., & Savulescu, J. (2012). Unfit for the Future: The Need for Moral Enhancement. Oxford: Oxford University Press.
PhilGoetz. (2010), Group selection update. Available at http://lesswrong.com/lw/300/group_selection_update/
Pinker, S. (2007). The stuff of thought: Language as a window into human nature. Viking Adult.
Rendell, L., & Whitehead, H. (2001). Culture in whales and dolphins. Behavioral and Brain Sciences, 24, 309-382.
Richerson, P. J., & Boyd, R. (2005). Not by genes alone. University of Chicago Press.
Tyler, T. (2011). Memetics: Memes and the Science of Cultural Evolution. Tim Tyler.
Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28(5), 675-690.
Yudkowsky, E. (2008A). 37 ways words can be wrong. Available at http://lesswrong.com/lw/od/37_ways_that_words_can_be_wrong/
Inferring Values from Imperfect Optimizers
One approach to constructing a Friendly artificial intelligence is to create a piece of software that looks at large amounts of evidence about humans, and attempts to infer their values. I've been doing some thinking about this problem, and I'm going to talk about some approaches and problems that have occurred to me.
In a naive approach, we might define the problem like this: take some unknown utility function, U, and plug it into a mathematically clean optimization process (like AIXI) O. Then, look at your data set and take the information about the inputs and outputs of humans, and find the simplest U that best explains human behavior.
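To make the naive setup concrete, here is a toy Python sketch, with everything (features, situations, the candidate space, the complexity penalty) invented for illustration: candidate utility functions are scored by how many observed choices a clean optimizer using them would reproduce, minus a penalty for complexity.

```python
import itertools

# Toy version of the naive setup: candidate utility functions are linear in a
# few features, a *perfect* optimizer picks the best option, and candidates
# are scored by fit to observed choices minus a crude complexity penalty.
# All names and numbers here are made up for illustration.

FEATURES = ("tasty", "healthy", "cheap")

# Each situation is a set of options; each option is a feature vector.
situations = [
    {"cake":  (4, 0, 1), "salad": (1, 3, 2)},
    {"chips": (2, 0, 3), "fruit": (1, 2, 2)},
]
observed = ["cake", "fruit"]   # what the "human" actually chose

def best_option(weights, options):
    """What a clean optimizer with this utility function would pick."""
    return max(options, key=lambda o: sum(w * f for w, f in zip(weights, options[o])))

def score(weights):
    fit = sum(best_option(weights, s) == c for s, c in zip(situations, observed))
    complexity = sum(w != 0 for w in weights)   # stand-in for "simplest U"
    return fit - 0.1 * complexity

best = max(itertools.product((-1, 0, 1), repeat=len(FEATURES)), key=score)
print(best)   # the weight vector that best "explains" the observed behavior
```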
Unfortunately, this won't work. The best possible match for U is one that models not just those elements of human utility we're interested in, but also all the details of our broken, contradictory optimization process. The U we derive through this process will optimize for confirmation bias, scope insensitivity, hindsight bias, the halo effect, our own limited intelligence and inefficient use of evidence, and just about everything else that's wrong with us. Not what we're looking for.
Okay, so let's try putting a bandaid on it - let's go back to our original problem setup. However, we'll take our original O, and use all of the science on cognitive biases at our disposal to handicap it. We'll limit its search space, saddle it with a laundry list of cognitive biases, cripple its ability to use evidence, and in general make it as human-like as we possibly can. We could even give it akrasia by implementing hyperbolic discounting of reward. Then we'll repeat the original process to produce U'.
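As one concrete example of such a handicap, here is a hedged sketch of the hyperbolic-discounting piece: the 1/(1 + k*t) form is the standard one, but wiring it into O as a simple reweighting of future rewards is just my illustrative choice, not a claim about how a real value-learner would do it.

```python
# One concrete handicap from the list above: hyperbolic discounting, compared
# with ordinary exponential discounting on the same reward streams.

def exponential_discount(t, gamma=0.95):
    return gamma ** t

def hyperbolic_discount(t, k=1.0):
    return 1.0 / (1.0 + k * t)

def discounted_value(rewards, discount):
    """Value the model optimizer assigns to a stream of future rewards."""
    return sum(discount(t) * r for t, r in enumerate(rewards))

small_soon  = [10, 0, 0, 0, 0, 0]    # $10 now
large_later = [0, 0, 0, 0, 0, 30]    # $30 in five steps

for d in (exponential_discount, hyperbolic_discount):
    print(d.__name__,
          discounted_value(small_soon, d),
          discounted_value(large_later, d))
# The hyperbolic agent flips its preference toward the immediate reward,
# while the exponential one still waits -- the "akrasia" the post suggests
# building into the modified O.
```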
If we plug U' into our AI, the result will be that it will optimize like a human who had suddenly been stripped of all the kinds of stupidity that we programmed into our modified O. This is good! Plugged into a solid CEV infrastructure, this might even be good enough to produce a future that's a nice place to live. However, it's not quite ideal. If we miss a cognitive bias, then it'll be incorporated into the learned utility functions, and we may never be rid of it. What would be nice would be if we could get the AI to learn about cognitive biases, exhaustively, and update in the future if it ever discovered a new one.
If we had enough time and money, we could do this the hard way: acquire a representative sample of the human population, and pay them to perform tasks with simple goals under tremendous surveillance, and have the AI derive the human optimization process from the actions taken towards a known goal. However, if we assume that the human optimization process can be defined as a function over the state of the human brain, we should not trust the completeness of any such process learned from less data than the entropy of the human brain, which is on the order of tens of petabytes of extremely high quality evidence. If we want to be confident in the completeness of our model, we may need more experimental evidence than it is really practical to accumulate. Which isn't to say that this approach is useless - if we can hit close enough to the mark, then the AI may be able to run more exhaustive experimentation later and refine its own understanding of human brains to be closer to the ideal.
But it'd really be nice if our AI could do unsupervised learning to figure out the details of human optimization. Then we could simply dump the internet into it, and let it grind away at the data and spit out a detailed, complete model of human decision-making, from which our utility function could be derived. Unfortunately, this does not seem to be a tractable problem. It's possible that some insight could be gleaned by examining outliers with normal intelligence, but deviant utility functions (I am thinking specifically of sociopaths), but it's unclear how much insight can be produced by these methods. If anyone has suggestions for a more efficient way of going about it, I'd love to hear it. As it stands, it might be possible to get enough information from this to supplement a supervised learning approach - the closer we get to a perfectly accurate model, the higher the probability of Things Going Well.
Anyways, that's where I am right now. I just thought I'd put up my thoughts and see if some fresh eyes see anything I've been missing.
Cheers,
Niger
Universal agents and utility functions
I'm Anja Heinisch, the new visiting fellow at SI. I've been researching replacing AIXI's reward system with a proper utility function. Here I will describe my AIXI+utility function model, address concerns about restricting the model to bounded or finite utility, and analyze some of the implications of modifiable utility functions, e.g. wireheading and dynamic consistency. Comments, questions and advice (especially about related research and material) will be highly appreciated.
Introduction to AIXI
Marcus Hutter's (2003) universal agent AIXI addresses the problem of rational action in a (partially) unknown computable universe, given infinite computing power and a halting oracle. The agent interacts with its environment in discrete time cycles, producing an action-perception sequence with actions (agent outputs) $a_k$ and perceptions (environment outputs) $x_k$ chosen from finite sets $\mathcal{A}$ and $\mathcal{X}$. The perceptions are pairs $x_k = (o_k, r_k)$, where $o_k$ is the observation part and $r_k$ denotes a reward. At time $k$ the agent chooses its next action $a_k$ according to the expectimax principle:

$$\dot{a}_k = \arg\max_{a_k} \sum_{x_k} \cdots \max_{a_m} \sum_{x_m} \left( r_k + \dots + r_m \right) M\!\left(\underline{x}_{k:m} \,\middle|\, \dot{a}_1 \dot{x}_1 \dots \dot{a}_{k-1} \dot{x}_{k-1} a_{k:m}\right),$$

where $m$ is the horizon. Here $M$ denotes the updated Solomonoff prior, summing over all programs $q$ that are consistent with the history $\dot{a}_1 \dot{x}_1 \dots \dot{a}_{k-1} \dot{x}_{k-1}$ [1] and which will, when run on the universal Turing machine $T$ with successive inputs $a_1, \dots, a_m$, compute outputs $x_1, \dots, x_m$, i.e.

$$M\!\left(\underline{x}_{k:m} \,\middle|\, \dot{a}_1 \dot{x}_1 \dots \dot{a}_{k-1} \dot{x}_{k-1} a_{k:m}\right) = \sum_{q \,:\, T(q,\, a_{1:m}) \,=\, x_{1:m}} 2^{-\ell(q)}.$$
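As a computable caricature of the expectimax expression above (emphatically not AIXI itself: the Solomonoff mixture is replaced by a hand-picked prior over two toy environments, and the horizon is tiny), a brute-force sketch might look like this:

```python
# Brute-force expectimax over a toy mixture of two deterministic environments.
# Everything here (the environments, prior weights, horizon) is invented;
# real AIXI sums over *all* programs, which is uncomputable.

ACTIONS = ("a0", "a1")
HORIZON = 3

def env_flip(actions):   # rewards switching actions
    r = 1.0 if len(actions) < 2 or actions[-1] != actions[-2] else 0.0
    return ("o", r)

def env_stay(actions):   # rewards repeating the previous action
    r = 1.0 if len(actions) >= 2 and actions[-1] == actions[-2] else 0.0
    return ("o", r)

prior = {env_flip: 0.5, env_stay: 0.5}

def posterior(past_actions, past_percepts):
    """Renormalized prior over environments consistent with the history."""
    w = {}
    for env, p in prior.items():
        consistent = all(env(past_actions[:t + 1]) == past_percepts[t]
                         for t in range(len(past_percepts)))
        if consistent:
            w[env] = p
    z = sum(w.values())
    return {e: p / z for e, p in w.items()}

def expectimax(past_actions, past_percepts, depth):
    """Best achievable expected sum of future rewards, and the action achieving it."""
    if depth == 0:
        return 0.0, None
    post = posterior(past_actions, past_percepts)
    best_val, best_act = float("-inf"), None
    for a in ACTIONS:
        val = 0.0
        for env, p in post.items():   # deterministic envs: expectation = weighted sum
            percept = env(past_actions + [a])
            future, _ = expectimax(past_actions + [a], past_percepts + [percept],
                                   depth - 1)
            val += p * (percept[1] + future)
        if val > best_val:
            best_val, best_act = val, a
    return best_val, best_act

print(expectimax([], [], HORIZON))
```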
AIXI is a dualistic framework in the sense that the algorithm that constitutes the agent is not part of the environment, since it is not computable. Even considering that any running implementation of AIXI would have to be computable, AIXI accurately simulating AIXI accurately simulating AIXI ad infinitum doesn't really seem feasible. Potential consequences of this separation of mind and matter include difficulties the agent may have predicting the effects of its actions on the world.
Utility vs rewards
So, why is it a bad idea to work with a reward system? Say the AIXI agent is rewarded whenever a human called Bob pushes a button. Then a sufficiently smart AIXI will figure out that instead of furthering Bob’s goals it can also threaten or deceive Bob into pushing the button, or get another human to replace Bob. On the other hand, if the reward is computed in a little box somewhere and then displayed on a screen, it might still be possible to reprogram the box or find a side channel attack. Intuitively you probably wouldn't even blame the agent for doing that -- people try to game the system all the time.
You can visualize AIXI's computation as maximizing bars displayed on this screen; the agent is unable to connect the bars to any pattern in the environment, they are just there. It wants them to be as high as possible and it will utilize any means at its disposal. For a more detailed analysis of the problems arising through reinforcement learning, see Dewey (2011).
Is there a way to bind the optimization process to actual patterns in the environment? To design a framework in which the screen informs the agent about the patterns it should optimize for? The answer is, yes, we can just define a utility function $u$ that assigns a value to every possible future history $\dot{a}_1 \dot{x}_1 \dots a_m x_m$ and use it to replace the reward system in the agent specification:

$$\dot{a}_k = \arg\max_{a_k} \sum_{x_k} \cdots \max_{a_m} \sum_{x_m} u\!\left(\dot{a}_1 \dot{x}_1 \dots a_m x_m\right) M\!\left(\underline{x}_{k:m} \,\middle|\, \dot{a}_1 \dot{x}_1 \dots \dot{a}_{k-1} \dot{x}_{k-1} a_{k:m}\right).$$

When I say "we can just define" I am actually referring to the really hard question of how to recognize and describe the patterns we value in the universe. Contrasted with the necessity to specify rewards in the original AIXI framework, this is a strictly harder problem, because the utility function has to be known ahead of time, whereas the reward system can always be represented in the framework of utility functions by setting

$$u\!\left(\dot{a}_1 \dot{x}_1 \dots a_m x_m\right) = \sum_{j=1}^{m} r_j.$$
For the same reasons, this is also a strictly safer approach.
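Continuing the toy sketch from the previous section, the difference between the two frameworks is just which function of the history gets maximized; the "pattern" utility below is invented purely for illustration.

```python
# The reward-based criterion is the special case where the utility of a
# history is the sum of its rewards; a pattern-based utility function looks
# at the history itself. (In the toy expectimax above, the utility variant
# would evaluate u on the completed history at the horizon instead of
# adding rewards step by step.)

def reward_utility(actions, percepts):
    return sum(r for (_o, r) in percepts)

def pattern_utility(actions, percepts):
    # An invented utility that cares about a pattern ("never repeat an
    # action") rather than about the reward channel.
    return sum(1.0 for i in range(1, len(actions)) if actions[i] != actions[i - 1])

acts  = ["a0", "a1", "a1"]
percs = [("o", 1.0), ("o", 1.0), ("o", 0.0)]
print(reward_utility(acts, percs), pattern_utility(acts, percs))  # 2.0 1.0
```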
Infinite utility
The original AIXI framework must necessarily place an upper and lower bound on the rewards that are achievable, because the rewards are part of the perceptions and $\mathcal{X}$ is finite. The utility function approach does not have this problem, as the expected utility

$$\sum_{x_k} \cdots \sum_{x_m} u\!\left(\dot{a}_1 \dot{x}_1 \dots a_m x_m\right) M\!\left(\underline{x}_{k:m} \,\middle|\, \dot{a}_1 \dot{x}_1 \dots \dot{a}_{k-1} \dot{x}_{k-1} a_{k:m}\right)$$

is always finite as long as we stick to a finite set of possible perceptions, even if the utility function is not bounded. Relaxing this constraint and allowing $\mathcal{X}$ to be infinite and the utility to be unbounded creates divergence of expected utility (for a proof see de Blanc 2008). This closely corresponds to the question of how to be a consequentialist in an infinite universe, discussed by Bostrom (2011). The underlying problem here is that (using the standard approach to infinities) these expected utilities will become incomparable. One possible solution to this problem could be to use a larger subfield of the surreal numbers than $\mathbb{R}$, my favorite[2] so far being the Levi-Civita field $\mathcal{R}$ generated by the infinitesimal $\varepsilon$:

$$\mathcal{R} = \left\{ \sum_{q \in \mathbb{Q}} a_q \varepsilon^{q} \;\middle|\; a_q \in \mathbb{R},\ \{q : a_q \neq 0\} \text{ is left-finite} \right\},$$

with the usual power-series addition and multiplication. Levi-Civita numbers can be written and approximated as

$$\sum_{i=1}^{N} a_{q_i} \varepsilon^{q_i}$$

(see Berz 1996), which makes them suitable for representation on a computer using floating point arithmetic. If we allow the range of our utility function to be $\mathcal{R}$, we gain the possibility of generalizing the framework to work with an infinite set of possible perceptions, therefore allowing for continuous parameters. We also allow for a much broader set of utility functions, no longer excluding the assignment of infinite (or infinitesimal) utility to a single event. I recently met someone who argued convincingly that his (ideal) utility function assigns infinite negative utility to every time instance that he is not alive, therefore making him prefer life to any finite but huge amount of suffering.
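To make the floating-point remark concrete, here is a minimal (and decidedly non-rigorous) sketch of truncated Levi-Civita arithmetic in Python; the representation and the comparison rule are just the obvious ones, not Berz's actual implementation.

```python
from fractions import Fraction

# A Levi-Civita number with finite support: {rational exponent of eps: real
# coefficient}. eps is the infinitesimal, so eps**-1 is an infinite quantity.

def lc(terms):
    return {Fraction(q): float(c) for q, c in terms.items() if c != 0}

def add(x, y):
    out = dict(x)
    for q, c in y.items():
        out[q] = out.get(q, 0.0) + c
    return {q: c for q, c in out.items() if c != 0}

def mul(x, y):
    out = {}
    for qx, cx in x.items():
        for qy, cy in y.items():
            out[qx + qy] = out.get(qx + qy, 0.0) + cx * cy
    return {q: c for q, c in out.items() if c != 0}

def less(x, y):
    """x < y iff the leading (lowest-exponent) term of y - x is positive."""
    diff = add(y, {q: -c for q, c in x.items()})
    if not diff:
        return False
    q = min(diff)          # smallest exponent dominates
    return diff[q] > 0

# Example: an outcome worth "minus infinity" is worse than any finite loss,
# yet infinite quantities of the same order still compare sensibly.
minus_inf = lc({-1: -1.0})            # -eps**-1
huge_finite_loss = lc({0: -1e12})
print(less(minus_inf, huge_finite_loss))              # True: -infinity is worse
print(less(add(minus_inf, lc({0: 5.0})), minus_inf))  # False: -infinity + 5 > -infinity
```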
Note that finiteness of the action set $\mathcal{A}$ is still needed to guarantee the existence of actions with maximal expected utility, and the finite (but dynamic) horizon $m$ remains a very problematic assumption, as described in Legg (2008).
Modifiable utility functions
Any implementable approximation of AIXI implies a weakening of the underlying dualism. Now the agent's hardware is part of the environment and at least in the case of a powerful agent, it can no longer afford to neglect the effect its actions may have on its source code and data. One question that has been asked is whether AIXI can protect itself from harm. Hibbard (2012) shows that an agent similar to the one described above, equipped with the ability to modify its policy responsible for choosing future actions, would not do so, given that it starts out with the (meta-)policy to always use the optimal policy, and the additional constraint to change only if that leads to a strict improvement. Ring and Orseau (2011) study under which circumstances a universal agent would try to tamper with the sensory information it receives. They introduce the concept of a delusion box, a device that filters and distorts the perception data before it is written into the part of the memory that is read during the calculation of utility.
A further complication to take into account is the possibility that the part of memory that contains the utility function may get rewritten, either by accident, by deliberate choice (programmers trying to correct a mistake), or in an attempt to wirehead. To analyze this further we will now consider what can happen if the screen flashes different goals in different time cycles. Let $u_k$ denote the utility function the agent will have at time $k$. Even though we will only analyze instances in which the agent knows at time $k$ which utility function it will have at future times $j > k$ (possibly depending on the actions taken before that), we note that for every fixed future history the agent knows the utility function $u_j$ that is displayed on the screen, because the screen is part of its perception data $x_j$.
This leads to three different agent models worthy of further investigation:
- Agent 1 will optimize for the goals that are displayed on the screen right now and act as if it would continue to do so in the future. We describe this with the utility function $u_k(\dot{a}_1 \dot{x}_1 \dots a_m x_m)$.
- Agent 2 will try to anticipate future changes to its utility function and maximize the utility it experiences at every time cycle as shown on the screen at that time. This is captured by $\sum_{j=k}^{m} u_j(\dot{a}_1 \dot{x}_1 \dots a_j x_j)$.
- Agent 3 will, at time k, try to maximize the utility it derives in hindsight, displayed on the screen at the time horizon: $u_m(\dot{a}_1 \dot{x}_1 \dots a_m x_m)$.
Of course arbitrary mixtures of these are possible.
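Under my reconstruction of the three (missing) formulas above, the difference between the agents can be written out in a few lines of toy Python; the per-cycle utility functions and the history encoding are invented for illustration only.

```python
# Toy encoding of the three agent models, under my reconstruction above:
# Agent 1 scores a future with the function currently on the screen, Agent 2
# sums, cycle by cycle, the utility assigned by whatever function is on the
# screen at that cycle, and Agent 3 scores the whole future with the final one.

def agent1_value(u_list, k, history):
    return u_list[k](history)

def agent2_value(u_list, k, history):
    return sum(u_list[j](history[: j + 1]) for j in range(k, len(history)))

def agent3_value(u_list, k, history):
    return u_list[-1](history)

# Tiny example: the screen switches from "count the 1s" to "count the 0s" at j=2.
count_ones  = lambda h: float(sum(h))
count_zeros = lambda h: float(len(h) - sum(h))
u_list = [count_ones, count_ones, count_zeros, count_zeros]

history = [1, 1, 0, 0]   # one symbol "perceived" per time cycle
for f in (agent1_value, agent2_value, agent3_value):
    print(f.__name__, f(u_list, 0, history))
```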
The type of wireheading that is of interest here is captured by the Simpleton Gambit described by Orseau and Ring (2011), a Faustian deal that offers the agent maximal utility in exchange for its willingness to be turned into a Simpleton that always takes the same default action at all future times. We will first consider a simplified version of this scenario: the Simpleton future, where the agent knows for certain that it will be turned into a Simpleton at time $k+1$, no matter what it does in the remaining time cycle. Assume that for all possible action-perception combinations the utility given by the current utility function is not maximal, i.e. $u_k < u^{\max}$ holds for all possible futures. Assume further that the agent's actions influence the future outcomes, at least from its current perspective: for every action there exists another action leading to a different expected utility. Let $u^{S}$ be the Simpleton utility function, assigning equal but maximal utility $u^{\max}$ to all possible futures. While Agent 1 will optimize as before, not adapting its behavior to the knowledge that its utility function will change, Agent 3 will be paralyzed, having to rely on whatever method its implementation uses to break ties. Agent 2 on the other hand will try to maximize only the utility it gets, according to $u_k$, in the remaining time cycle.
Now consider the actual Simpleton Gambit: at time $k$ the agent gets to choose between changing, $c$, resulting in $u_j = u^{S}$ for all $j > k$, and not changing, $\neg c$, leading to $u_j = u_k$ for all $j > k$. We assume that this choice has no further effects on the environment. As before, Agent 1 will optimize for business as usual; whether or not it chooses to change depends entirely on whether the screen specifically mentions the memory pointer to the utility function or not.
Agent 2 will change if and only if the utility of changing compared to not changing, according to what the screen currently says, is strictly smaller than the comparative advantage of always having maximal utility in the future. That is, it changes exactly when its current-cycle loss from choosing $c$ rather than $\neg c$, as evaluated by $u_k$, is strictly less than the gain from receiving $u^{\max}$ rather than the expected value of $u_k$ at every future time cycle. This seems quite analogous to humans, who sometimes tend to choose maximal bliss over future optimization power, especially if the optimization opportunities are meager anyhow. Many people do seem to choose their goals so as to maximize the happiness felt by achieving them, at least some of the time; this is also advice that I have frequently encountered in self-help literature, e.g. here. Agent 3 will definitely change, as it only evaluates situations using its final utility function.
Comparing the three proposed agents, we notice that Agent 1 is dynamically inconsistent: it will optimize for future opportunities that it predictably will not take later. Agent 3, on the other hand, will wirehead whenever possible (and we can reasonably assume that opportunities to do so will exist in even moderately complex environments). This leaves us with Agent model 2, and I invite everyone to point out its flaws.
[1] Dotted actions/perceptions, like $\dot{a}_j, \dot{x}_j$, denote past events; underlined perceptions $\underline{x}_j$ denote random variables to be observed at future times.
[2] Bostrom (2011) proposes using hyperreal numbers, which rely heavily on the axiom of choice for the ultrafilter to be used, and I don't see how those could be implemented.