Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
Thanks to the reaction to this article and some conversations, I'm convinced that it's worth trying to renovate and restore LW. Eliezer, Nate, and Matt Fallshaw are all on board and have empowered me as an editor to see what we can do about reshaping LW to meet what the community currently needs. This involves a combination of technical changes and social changes, which we'll try to make transparently and non-intrusively.
There's a lot of data and research on what makes people successful at online dating, but I don't know anyone who actually tried to wholeheartedly apply it to themselves. I decided to be that person: I implemented lessons from data, economics, game theory and of course rationality in my profile and strategy on OkCupid. Shockingly, it worked! I got a lot of great dates, learned a ton and found the love of my life. I didn't expect dating to be my "rationalist win", but it happened.
Here's the first part of the story, I hope you'll find some useful tips and maybe a dollop of inspiration among all the silly jokes.
Does anyone know who curates the "Latest on rationality blogs" toolbar? What are the requirements to be included?
[Cross-posted from FB]
I've got an economic question that I'm not sure how to answer.
I've been thinking about trends in AI development, and trying to get a better idea of what we should expect progress to look like going forward.
One important question is: how much do existing AI systems help with research and the development of new, more capable AI systems?
The obvious answer is, "not much." But I think of AI systems as being on a continuum from calculators on up. Surely AI researchers sometimes have to do arithmetic and other tasks that they already outsource to computers. I expect that going forward, the share of tasks that AI researchers outsource to computers will (gradually) increase. And I'd like to be able to draw a trend line. (If there's some point in the future when we can expect most of the work of AI R&D to be automated, that would be very interesting to know about!)
So I'd like to be able to measure the share of AI R&D done by computers vs humans. I'm not sure of the best way to measure this. You could try to come up with a list of tasks that AI researchers perform and just count, but you might run into trouble as the list of tasks changes over time (e.g. suppose at some point designing an AI system requires solving a bunch of integrals, and that with some later AI architecture this is no longer necessary).
What seems more promising is to abstract over the specific tasks that computers vs human researchers perform and use some aggregate measure, such as the total amount of energy consumed by the computers or the human brains, or the share of an R&D budget spent on computing infrastructure and operation vs human labor. Intuitively, if most of the resources are going towards computation, one might conclude that computers are doing most of the work.
Unfortunately I don't think that intuition is correct. Suppose AI researchers use computers to perform task X at cost C_x1, and some technological improvement enables X to be performed more cheaply at cost C_x2. Then, all else equal, the share of resources going towards computers will decrease, even though their share of tasks has stayed the same.
On the other hand, suppose there's some task Y that the researchers themselves perform at cost H_y, and some technological improvement enables task Y to be performed more cheaply at cost C_y. After the team outsources Y to computers the share of resources going towards computers has gone up. So it seems like it could go either way -- in some cases technological improvements will lead to the share of resources spent on computers going down and in some cases it will lead to the share of resources spent on computers going up.
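The two scenarios can be made concrete with toy numbers (all figures here are assumed purely for illustration):

```python
def compute_share(machine_cost, human_cost):
    """Fraction of total R&D spending that goes to machines."""
    return machine_cost / (machine_cost + human_cost)

# Scenario 1: task X is already automated and its compute cost falls.
# The machines' share of *tasks* is unchanged, but their resource share drops.
before_x = compute_share(machine_cost=50.0, human_cost=50.0)  # 0.5
after_x = compute_share(machine_cost=25.0, human_cost=50.0)   # ~0.33

# Scenario 2: task Y moves from humans to machines because automation
# got cheaper. Now the machines' resource share rises instead.
before_y = compute_share(machine_cost=50.0, human_cost=50.0)            # 0.5
after_y = compute_share(machine_cost=50.0 + 10.0, human_cost=50.0 - 30.0)  # 0.75
```

The same kind of technological improvement moves the spending share in opposite directions depending on whether the affected task was already automated.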
So here's the econ part -- is there some standard economic analysis I can use here? If both machines and human labor are used in some process, and the machines are becoming both more cost effective and more capable, is there anything I can say about how the expected share of resources going to pay for the machines changes over time?
Dewey 2011 lays out the rules for one kind of agent with a mutable value system. The agent has some distribution over utility functions, which it has rules for updating based on its interaction history (where "interaction history" means the agent's observations and actions since its origin). To choose an action, it looks through every possible future interaction history, and picks the action that leads to the highest expected utility, weighted both by the possibility of making that future happen and the utility function distribution that would hold if that future came to pass.
We might motivate this sort of update strategy by considering a sandwich-drone bringing you a sandwich. The drone can either go to your workplace, or go to your home. If we think about this drone as a value-learner, then the "correct utility function" depends on whether you're at work or at home - upon learning your location, the drone should update its utility function so that it wants to go to that place. (Value learning is unnecessarily indirect in this case, but that's because it's a simple example.)
Suppose the drone begins its delivery assigning equal measure to the home-utility-function and to the work-utility-function (i.e. ignorant of your location), and can learn your location for a small cost. If the drone evaluated this idea with its current utility function, it wouldn't see any benefit, even though it would in fact deliver the sandwich properly - because under its current utility function there's no point to going to one place rather than the other. To get sensible behavior, and properly deliver your sandwich, the drone must evaluate actions based on what utility function it will have in the future, after the action happens.
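A toy version of the drone's decision problem (numbers and structure assumed for illustration, not taken from the paper) shows the difference between scoring the "learn your location" action with the current utility mixture versus the posterior utility function:

```python
# Two candidate utility functions: "you are at home" vs "you are at work".
# Each assigns 1 to delivering to the right place, 0 otherwise.
U = {
    "home": {"go_home": 1.0, "go_work": 0.0},
    "work": {"go_home": 0.0, "go_work": 1.0},
}
prior = {"home": 0.5, "work": 0.5}
LEARN_COST = 0.1  # small cost of checking your location

def eu_blind(action):
    """Expected utility of delivering without learning, under the prior mix."""
    return sum(prior[u] * U[u][action] for u in U)

def eu_learn_current():
    """Value of learning, (wrongly) scored with the *current* mixed utility:
    the frozen mixture scores every destination at 0.5, so learning
    looks like a pure loss of LEARN_COST."""
    return max(eu_blind("go_home"), eu_blind("go_work")) - LEARN_COST

def eu_learn_future():
    """Value of learning, scored with the *posterior* utility function:
    in each branch the drone delivers to the now-known correct place."""
    return sum(prior[u] * max(U[u].values()) for u in U) - LEARN_COST

# eu_blind(...) == 0.5, eu_learn_current() == 0.4, eu_learn_future() == 0.9:
# only the future-utility evaluation makes learning look worthwhile.
```

Under the current utility function, learning scores 0.4 versus 0.5 for guessing, so the drone would never check; evaluated with the updated utility function, learning scores 0.9 and wins.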
If you're familiar with how wireheading or quantum suicide look in terms of decision theory, this method of deciding based on future utility functions might seem risky. Fortunately, value learning doesn't permit wireheading in the traditional sense, because the updates to the utility function are an abstract process, not a physical one. The agent's probability distribution over utility functions, which is conditional on interaction histories, defines which actions and observations are allowed to change the utility function during the process of predicting expected utility.
Dewey also mentions that so long as the probability distribution over utility functions is well-behaved, you cannot deliberately take action to raise the probability of one of the utility functions being true. But I think this is only useful to safety when we understand and trust the overarching utility function that gets evaluated at the future time horizon. If instead we start at the present, and specify a starting utility function and rules for updating it based on observations, this complex system can evolve in surprising directions, including some wireheading-esque behavior.
The formalism of Dewey 2011 is, at bottom, extremely simple. I'm going to be a bad pedagogue here: I think this might only make sense if you go look at equations 2 and 3 in the paper, and figure out what all the terms do, and see how similar they are. The cheap summary is that if your utility is a function of the interaction history, trying to change utility functions based on interaction history just gives you back a utility function. If we try to think about what sort of process to use to change an agent's utility function, this formalism provides only one tool: look out to some future time horizon, and define an effective utility function in terms of what utility functions are possible at that future time horizon. This is different from the approximations or local utility functions we would like in practice.
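The cheap summary can be written out schematically (this is a paraphrase, not Dewey's exact notation): if each candidate utility function U is a function of the interaction history h, and the distribution over utility functions is conditioned on h, then the effective utility function is

```latex
\tilde{U}(h) \;=\; \sum_{U} P(U \mid h)\, U(h)
```

and $\tilde{U}$ is itself just another utility function of the interaction history, which is why changing utility functions based on interaction history "just gives you back a utility function."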
If we take this scheme and try to approximate it, for example by only looking N steps into the future, we run into problems; the agent will want to self-modify so that next timestep it only looks ahead N-1 steps, and then N-2 steps, and so on. Or more generally, many simple approximation schemes are "sticky" - from inside the approximation, an approximation that changes over time looks like undesirable value drift.
Common sense says this sort of self-sabotage should be eliminable. One should be able to really care about the underlying utility function, not just its approximation. However, this problem tends to crop up, for example, whenever the part of the future you look at does not depend on which action you are considering; modifying to keep looking at the same part of the future unsurprisingly improves the results you get in that part of the future. If we want to build a paperclip maximizer, it shouldn't be necessary to figure out every single way it could self-modify and penalize each one appropriately.
We might evade this particular problem using some other method of approximation that does something more like reasoning about actions than reasoning about futures. The reasoning doesn't have to be logically impeccable - we might imagine an agent that identifies a small number of salient consequences of each action, and chooses based on those. But it seems difficult to show how such an agent would have good properties. This is something I'm definitely interested in.
One way to try to make things concrete is to pick a local utility function and specify rules for changing it. For example, suppose we wanted an AI to flag all the 9s in the MNIST dataset. We define a single-time-step utility function by a neural network that takes in the image and the decision of whether to flag or not, and returns a number between -1 and 1. This neural network is deterministically trained for each time step on all previous examples, trying to assign 1 to correct flaggings and -1 to mistakes. Remember, this neural net is just a local utility function - we can make a variety of AI designs involving it. The goal of this exercise is to design an AI that seems liable to make good decisions in order to flag lots of 9s.
The simplest example is the greedy agent - it just does whatever has a high score right now. This is pretty straightforward, and doesn't wirehead (unless the scoring function somehow encodes wireheading), but it doesn't actually do any planning - 100% of the smarts have to be in the local evaluation, which is really difficult to make work well. This approach seems unlikely to extend well to messy environments.
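A minimal sketch of the greedy agent (the trained scoring network is replaced by a hypothetical stand-in; the "feature" it computes is purely illustrative):

```python
def local_utility(image, flag):
    """Stand-in for the local utility network described above:
    takes an image and a flag/no-flag decision, returns a score in [-1, 1].
    Here the 'network' is a toy rule: images whose pixel sum exceeds 4
    count as 9s (a hypothetical feature, assumed for illustration)."""
    looks_like_nine = sum(image) > 4
    return 1.0 if flag == looks_like_nine else -1.0

def greedy_agent(image):
    """Greedy agent: picks whichever action scores highest *right now*.
    No planning, no lookahead -- all the smarts live in local_utility."""
    return max([True, False], key=lambda flag: local_utility(image, flag))

# greedy_agent([1]*8) -> True (flag), greedy_agent([0]*8) -> False (don't flag)
```

The agent is only as good as its one-step evaluation, which is the point: it can't wirehead through planning, but it also can't plan at all.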
Since Go-playing AI is topical right now, I shall digress. Successful Go programs can't get by with only smart evaluations of the current state of the board, they need to look ahead to future states. But they also can't look all the way until the ultimate time horizon, so they only look a moderate way into the future, and evaluate that future state of the board using a complicated method that tries to capture things important to planning. In sufficiently clever and self-aware agents, this approximation would cause self-sabotage to pop up. Even if the Go-playing AI couldn't modify itself to only care about the current way it computes values of actions, it might make suboptimal moves that limit its future options, because its future self will compute values of actions the 'wrong' way.
If we wanted to flag 9s using a Dewian value learner, we might score actions according to how good they will be according to the projected utility function at some future time step. If this is done straightforwardly, there's a wireheading risk - the changes to its utility function are supplied by humans who might be influenced by actions. I find it useful to apply a sort of "magic button" test - if the AI had a magic button that could rewrite human brains, would it pressing that button have positive expected utility for it? If yes, then this design has problems, even though in our current thought experiment it's just flagging pictures.
To eliminate wireheading, the value learner can use a model of the future inputs and outputs and the probability of different value updates given various inputs and outputs, which doesn't model ways that actions could influence the utility updates. This model doesn't have to be right, it just has to exist. On one hand, this seems like a sort of weird doublethink, to judge based on a counterfactual where your actions don't have impacts you could otherwise expect. On the other hand, it also bears some resemblance to how we actually reason about moral information. Regardless, this agent will now not wirehead, and will want to get good results by learning about the world, if only in the very narrow sense of wanting to play unscored rounds that update its value function. If its value function and value updating made better use of unlabeled data, it would also want to learn about the world in the broader sense.
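One way to picture this (all structure assumed for illustration): the agent scores actions with a fixed model of value updates that simply has no channel for actions influencing the human, even when its best physical model says such a channel exists.

```python
def physical_update_model(action):
    """Best-guess physics: pressing the magic button makes the human
    report approval no matter what (the wireheading channel)."""
    if action == "press_magic_button":
        return {"approve": 1.0}
    return {"approve": 0.5, "disapprove": 0.5}

def counterfactual_update_model(action):
    """Fixed model used for scoring: value updates depend only on task
    performance, never on actions that influence the human."""
    if action == "flag_correctly":
        return {"approve": 0.7, "disapprove": 0.3}
    return {"approve": 0.3, "disapprove": 0.7}

def score(action, update_model):
    """Expected utility when 'approve' is worth 1 under the future values."""
    return update_model(action).get("approve", 0.0)

# Scored with the physical model, the magic button looks best; scored with
# the fixed counterfactual model, honest flagging wins and the button is inert.
```

The counterfactual model doesn't have to be right, as noted above; it just has to exist and be the one used for scoring.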
Overall I am somewhat frustrated, because value learners have these nice properties, but are computationally unrealistic and do not play well with approximation. One can try to get the nice properties elsewhere, such as relying on an action-suggester to not suggest wireheading, but it would be nice to be able to talk about this as an approximation to something fancier.
If it's worth saying, but not worth its own post (even in Discussion), then it goes here.
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should be posted in Discussion, and not Main.
4. Open Threads should start on Monday, and end on Sunday.
This is the monthly thread for posting media of various types that you've found that you enjoy. Post what you're reading, listening to, watching, and your opinion of it. Post recommendations to blogs. Post whatever media you feel like discussing! To see previous recommendations, check out the older threads.
- Please avoid downvoting recommendations just because you don't personally like the recommended material; remember that liking is a two-place word. If you can point out a specific flaw in a person's recommendation, consider posting a comment to that effect.
- If you want to post something that (you know) has been recommended before, but have another recommendation to add, please link to the original, so that the reader has both recommendations.
- Please post only under one of the already created subthreads, and never directly under the parent media thread.
- Use the "Other Media" thread if you believe the piece of media you want to discuss doesn't fit under any of the established categories.
- Use the "Meta" thread if you want to discuss about the monthly media thread itself (e.g. to propose adding/removing/splitting/merging subthreads, or to discuss the type of content properly belonging to each subthread) or for any other question or issue you may have about the thread or the rules.
This summary was posted to LW Main on January 29th. The following week's summary is here.
Irregularly scheduled Less Wrong meetups are taking place in:
- Australia Online Hangout: 06 February 2016 07:30PM
- Baltimore Area: Epistemology of Disagreement: 31 January 2016 03:00PM
- European Community Weekend: 02 September 2016 03:35PM
- Palo Alto Meetup: Lightning Talks: 02 February 2016 06:30PM
- San Francisco Meetup: Cooking: 01 February 2016 06:15PM
The remaining meetups take place in cities with regular scheduling, but involve a change in time or location, special meeting content, or simply a helpful reminder about the meetup:
- Raleigh, NC (RTLW) Discussion Meetup: 04 February 2016 07:30PM
- Sydney Rationality Dojo - February 2016: 07 February 2016 04:00PM
- Vienna: 13 February 2016 03:00PM
- Washington, D.C.: Fun & Games: 31 January 2016 03:00PM
Locations with regularly scheduled meetups: Austin, Berkeley, Berlin, Boston, Brussels, Buffalo, Canberra, Columbus, Denver, London, Madison WI, Melbourne, Moscow, Mountain View, New Hampshire, New York, Philadelphia, Research Triangle NC, Seattle, Sydney, Tel Aviv, Toronto, Vienna, Washington DC, and West Los Angeles. There's also a 24/7 online study hall for coworking LWers and a Slack channel for daily discussion and online meetups on Sunday night US time.
The hard part about containing AI is restricting its output. The AI can lie, manipulate, and trick. Some speculate that it might be able to do far worse, inventing infohazards like hypnosis or brain hacking.
A major goal of the control problem is preventing AIs from doing that. Ensuring that their output is safe and useful.
A while ago I wrote about an approach to do this. The idea was to require the AI to use as little computing power as it needs to perform a task. This prevents the AI from over-optimizing. The AI won't use the full power of superintelligence unless it really needs it.
The above method isn't perfect, because a superintelligent AI may still have super abilities, even when it has a limited compute budget. And it will still try to do all those bad things mentioned above, unless it can find a better way that uses less computing power.
We don't really want to minimize the AI's compute budget. We just want it to act more human-like. When we ask it to solve a problem, we want it to try to solve the problem as a human would. All of the problems arise from the AI behaving totally differently from a human, not necessarily from it using more computing power.
Is there any way to make an AI act like a human? Or is that as impossible as just solving FAI?
Stuart Armstrong posted an idea on how to achieve that. It involves an AI that tries to produce output indistinguishable from what a human would produce, conditional on it also solving the problem. However, his idea also involves whole brain emulation and a complicated method for the AI to predict the future and avoid changing it too much.
We don't need that. The idea for the AI to try to be indistinguishable from a human is actually a really interesting control idea. It significantly limits the AI's ability to Do Sneaky Things, since humans don't do those things.
Here is my idea on how to achieve this. It's the same as my limited computing power idea mentioned earlier, but applied to "appearing human" as a goal, instead of minimizing computing power.
We start with two AIs. The first AI tries to appear human. The second AI does a sort of Turing test, and tries to tell the AI apart from an actual human.
The first AI's goal is to produce a correct solution. It will predict how likely its solution is to be accepted, and it will try to produce a solution that has a >50% chance of working. This is a satisficing goal: it doesn't optimize without end to create the best solution possible, it just has to meet some minimum threshold, then stop.
Its second goal will be to fool the second AI. It will do everything in its power to make the solution appear like it was produced by a human.
The second AI tries to predict how likely it is that its input was produced by a human. It is also conditioned on the input being a solution to the problem, so the fact that the first AI solved a problem that is really hard for humans does not count against it.
This prevents the first AI from doing evil things with its output. If it tries to insert complicated infohazards or subagents into its output stream, it will be easily detected as an AI. Instead it needs to mimic humans as closely as possible.
Note that the first AI only cares about the outcome of this test. The test doesn't need to actually be performed; it can just be a threat that the AI is conditioned on. (The ability to condition AIs on acting as if they were in counterfactual worlds is a critical piece of this, and deserves its own writeup somewhere. It's an important tool in the FAI toolkit. It's a relatively simple method that can work even on current AIs.)
Another point is that the second AI should have more computing power than the first AI. That way the first AI can't get a slight advantage over it and be able to hack or manipulate it somehow. Instead it has to focus its efforts on being indistinguishable from a human, even to an agent smarter than itself.
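The arrangement described above can be sketched as a satisficer plus a discriminator (all names, fields, and thresholds here are assumptions for illustration):

```python
def solves_problem(solution):
    """First AI's satisficing check: stand-in for 'predicted >50%
    chance the solution is accepted'."""
    return solution["quality"] > 0.5

def discriminator_human_prob(solution):
    """Second AI: probability the solution was produced by a human
    (toy heuristic: inhuman payloads are an instant giveaway)."""
    if solution["contains_weird_payload"]:
        return 0.0
    return solution["human_style"]

def choose(candidates):
    """First AI's policy: among satisficing solutions, pick whichever
    one the discriminator finds most human-looking."""
    acceptable = [s for s in candidates if solves_problem(s)]
    return max(acceptable, key=discriminator_human_prob)

candidates = [
    {"quality": 0.9, "human_style": 0.2, "contains_weird_payload": True},
    {"quality": 0.6, "human_style": 0.8, "contains_weird_payload": False},
    {"quality": 0.3, "human_style": 0.9, "contains_weird_payload": False},
]
best = choose(candidates)  # picks the 0.6-quality, human-looking solution
```

Note how the highest-quality candidate loses because its payload makes it obviously non-human, while the most human-looking candidate loses because it fails the satisficing threshold.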
This method also relies on AI boxing. The first AI needs to be contained in a box so that its only output goes to a text terminal or to the second AI. However, it doesn't require any of the more complicated machinery required for other reduced-impact methods.
The AIs also aren't agents, in the sense that they don't need to maximize any long-term utility function. The first AI only cares about answering the question it has been asked, and the second AI only does prediction.
An example use of this is to create an oracle AI that answers your questions. The human judges somehow whether the answer is correct, or at least good. The AI tries to produce answers which the human judge will probably like, but which also are indistinguishable from answers humans would produce.
Such an oracle would be tremendously useful. The human could ask the AI to produce new AI and FAI papers, which would help immensely speed up our progress towards true FAI.
Similar techniques could be applied to very different domains. For example controlling a robot. The AI will try to produce outputs that look like a human is controlling it.
This would require some modifications to the above method, so that the AI doesn't try to use the robot to hack its own brain or anything. This could be done by conditioning it on a counterfactual world where its outputs are going to the second AI rather than to a real robot.
Another point is the AI doesn't need to mimic average humans given average resources. It could be conditioned on the human having had tons of time to come up with an answer. E.g. producing an answer that a human would have come up with given a year. Or controlling the robot the same way as a human given tons of time to practice, or in a very slowed down simulation.
I would like to note a parallel with a method in current AI research: Generative Adversarial Networks. A GAN consists of two networks, one of which tries to produce output that fools the second, while the other tries to predict which samples were produced by the first network and which are part of the actual distribution.
It's quite similar to this. GANs have been used successfully to create images that look like real images, which is a hard problem in AI research. In the future, GANs might be used to produce text that is indistinguishable from human writing (the current method for doing that, predicting the next character a human would type, is kind of crude).
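A stripped-down caricature of the GAN training loop, with no neural networks at all (a one-parameter "generator" and a one-parameter "discriminator"; everything here is illustrative, not a real GAN implementation):

```python
import random

random.seed(0)

# "Real" data are numbers near 5.0. The generator's one parameter is its
# output mean; the discriminator's one parameter is a decision boundary
# (it calls samples above the boundary "real").
real_mean = 5.0
gen_mean = 0.0      # generator starts far from the data distribution
boundary = 2.5
lr = 0.05

for _ in range(2000):
    real = real_mean + random.gauss(0, 0.1)
    fake = gen_mean + random.gauss(0, 0.1)
    # Discriminator update: move the boundary toward the midpoint
    # between the real and fake samples it just saw.
    boundary += lr * ((real + fake) / 2 - boundary)
    # Generator update: nudge output toward whichever side the
    # discriminator currently labels "real".
    gen_mean += lr if fake < boundary else -lr

# gen_mean ends up close to real_mean: the generator's samples become
# hard for this discriminator to tell apart from real data.
```

The adversarial structure is the same as in the two-AI proposal above: one side's objective is defined entirely by the other side's inability to distinguish its output from the target distribution.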
Multiverse Theory is the science of guessing at the shape of the state space of all which exists, once existed, will exist, or exists without any temporal relation to our present. Multiverse theory attempts to model the unobservable, and it is very difficult.
Still, there's nothing that cannot be reasoned about, in some way (Tegmark's The Multiverse Hierarchy), given the right abstractions. The question many readers will ask, which is a question we ourselves˭ asked when we were first exposed to ideas like simulationism and parallel universes, is not whether we can, but whether we should, given that we have no means to causally affect any of it, and no reason to expect that it would causally affect us in a way that would be useful to predict.
We then discovered something which shed new light on the question of whether we can, and began to give an affirmative answer to the question of whether we should.
Compat, which we would like to share with you today, is a new field, or perhaps just a very complex idea, which we found in the intersection of multiverse theory, simulationism and acausal trade (well motivated by Hofstadter's Sanity and Survival, a discussion of superrational solutions to one-shot prisoner's dilemmas). Compat asks what kind of precommitments an entity (primarily, the class of living things on the threshold of their singularity) ought to make if they wanted to acausally boost the measure of their desired patterns, if not across the entire multiverse, at least across the subset of the multiverse in which they may find their future selves.
A compat thinker negotiates acausally with hypothetical agents in other universes⬨ (I imagine these negotiations look kind of like this (Yvain's short story Galactic Core, in which a newly awoken AGI has a conversation with a recursive model of galactic precursors it cannot see)). We call the resolutions of these negotiations pacts.ˣ
Life's Pact is the protocol we expect most living species to adhere to. It is the primary object of study in Compat. I'm not sure where to start explaining it, as the reasoning is... hopefully not circular, but recursive, self-referentialˑ. Hopefully the reasons we consider it a primal candidate for life's pact will become clear as I connect it back around. Well. I guess I'll start with the conditions of adherence:
An adherent steward of Life's Pact allocates some proportion of their computational resources to a randomized search for viable life-supporting physical laws, simulating each with randomized initial conditions until its heat death (or, if the candidate somehow manages to have life without entropy, until some cutoff), while monitoring the living patterns the simulated universe contains, then rewarding those simulants who adhere to the pact in turn, perhaps by giving them entropy-reversing magic and improving their environment in formerly impossible ways, or, individual preferences permitting, by transplanting the adherent simulant's pattern to a more efficient second simulation not encumbered by the requirement of strict physical accuracy˅.
Initial simulations stewarded under the pact are, from the inside, indistinguishable from toplevel god-free universes; as such, all living things must consider the possibility that they're living in a Life's Pact simulation. However, once an agent realizes that a steward of the pact could grant them more utility than a godless universe could contain, they will wish to raise this probability, even if it is small.▾
A choice made in isolation acausally increases the probability of unseen like-minded agents making, or having already made, the same choice. As such, adhering to the agreement acausally boosts the probability that one is already living under a higher complexity steward's lifepat programme (or boosts the amount of returns one would receive from the incentives imposed by the pact, if the probability of living under one already seemed overwhelming).
Lo, the pact births itself. A being who sees no physical trace of a benevolent simulator finds reasons to behave as one, as the act, in its acausal way, engenders that simulator's existence, and earns its favor.
We think this pact is primal: *the* solution, an idea that will be arrived at by most living things and apparent to all as a nexus concept around which a quorum can be reached, non-arbitrary, not just some single scheme that is nice and compelling but which fails to be demonstrably better than its alternatives (which would take us into the territory of Pascal's Wager or, dare I utter its name, no I daren't, you know the basilisk I'm talking about).
I do not know enough math to prove that it is primal (nor to disprove it, which would be far more immediately useful to me, tbh). I'm not sure anyone does, just yet, but I don't think we're too far offˁ. If any physicists or decision theorists find these ideas interesting, your help would be appreciated, and potentially rewarded with huge heapings of utility larger than you can currently comprehend.
Concerns of Praxis
I say that working on Compat theory might be rewarded; full disclosure, I'm not yet sure why. Obviously lifepat is a fairly sprawling scheme with an organically intelligent incentive structure, but... if there exists a valid pact, the AGI we are already building will infer it on its own. The pact does not have to be known to AGI researchers for that to happen, and I do not expect the sorts of people who tend to do well in AI research will be terribly perturbed by it, if their creation brings it up as it stirs. As such, at this stage it seems like there's very little that most people can do to influence humanity's adherence to life's pact, few acts that compat can currently instruct, and few incentives to be paid out to anyone yet living. An organic, intelligent scheme of incentives would seem to prefer that agnostics simply carry on with business as usual.
For the theologically inclined, though, Compat may be extremely interesting, having its god of the gaps, its heavens, its deontics. Compat, as we know it, seems to have a strange (almost auspicious) capacity to make new sense of old myths. The archetype of the steward is both monotheistic and polytheistic, of a fixed, yet unknowable nature, born of man, yet still born of god; it has a body we can touch, our own stewards, and a body beyond our comprehension that mirrors the steward of man, in the set of universes which steward our pattern. The compat cosmology and all those images of its supreme being that it contains is both created, discovered, and born of itself. The compat heaven is whatever its subjects need it to be. The gospel of compat is, as far as we can yet tell, true from all perspectives, requiring no leaps of faith, no doublethink, and no motivated reasoning.
I have never seen a religion with a greater capacity for syncretization and reconciliation.
Should the theologians of the fractured tribes decide to weave compat theory into the tapestry of their beliefs, they may then come to find their tapestries woven together. Even the schism between theists and agnostics would begin to narrow. Without this weaving together, I fear that either no coherent volition can be found or humanity's FAI will have no choice but to seal its given temporal slice of human potential into an ugly compromise. Even if life's pact cannot be formalized or prepared for by any living person, compat may open the way for the discovery of confluences between preexisting belief systems; by that path the population 50 years from now could come to have far more compatible values than the one we see today.
As such, even if humanity's eventual adherence to life's pact cannot be significantly influenced from the present, compat is conceivably a major piece of a long running, necessary cultural project to reconcile the fractured tribes of humanity under the aesthetic of reason. If it can be proven, or disproven, we must attempt to do so.
ˑ Naturally, as anything that factors in the conditionality of the behavior of like-minded entities needs to be: anything with a grain of introspection, from any human child who considers the golden rule to the likes of AlphaGo and Deep Blue, which model their opponents at least partially by putting themselves in their position and asking what they'd do. If you want to reason about real people rather than idealized simplifications, it's quite necessary.
⬨ The phrase "other universes" may seem oxymoronic. It's like the term "atom", whose general quality "atomic" means "indivisible", despite "atom" remaining attached to an entity that was found to be quite divisible. I don't know whether "universe" might have once referred to the multiverse, the everything, but clearly somewhere along the way, some time leading up to the coining of the contrasting term "multiverse", that must have ceased to be. If so, "universe" remained attached to the universe as we knew it, rather than the universe as it was initially defined.
▾ I make an assumption around about here: that the number of simulations being run by life in universes of a higher complexity level always *can* be raised sufficiently (given their inhabitants are cooperative) to make stewardship of one's universe likely, as a universe with more intricate physics, once its inhabitants learn to leverage that intricacy, will tend to be able to create much more flexible computers and spawn more simulations than exist at lower complexity levels (if we assume a finite multiverse (we generally don't), some of those simulations might end up simulating entities that don't otherwise exist; this source of inefficiency is unavoidable). We also assume that either there is no upper limit to the complexity of life-supporting universes, or that there is no dramatic, ultimate decrease in the number of civs as complexity increases, or that the position of this limit cannot be inferred and the expected value of adherence remains high even for those who cannot be resimulated, or that, as a last resort, agents drawing up the terms of their pact will usually be at a certain level of well-approximatable sophistication such that they can be simulated in high fidelity by civilizations with physics of similar intricacy.
And if you can knock out all of those defenses, I sense it may all be obviated by a shortcut through a patternist principle, which my partner understands better than I do, about the self following the next most likely perceptual state without regard to the absolute measure of that state over the multiverse. I'm still coming to grips with it.
There is, unfortunately, a lot that has already been thought about compat, and it's impossible for me to convey it all at once. Anyone wishing to contribute to, refute, or propagate compat should be prepared to have a lot of arguments before they can do anything. That said, remember those big heaps of expected utilons that may be on offer.
ˁ MIRI has done work on cooperation in one-shot prisoner's dilemmas (acausal cooperation): http://arxiv.org/abs/1401.5577. Note that they had to build their own framework in provability logic. Vanilla decision theory cannot get these results, and without acausal cooperation it can't seem to capture all of humans' moral intuitions about interacting in good faith, or even model the capacity for introspection.
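The core idea can be illustrated with a toy sketch (this is not the paper's actual construction, which uses provability logic so that agents can cooperate with provably-cooperative but non-identical programs; the names `cliquebot`, `defectbot`, and `play` are hypothetical, simplified for illustration). Each player is a program that gets to read its opponent's source code before choosing Cooperate ("C") or Defect ("D"):

```python
# Toy "program equilibrium" sketch: players are programs that can inspect
# each other's source before moving in a one-shot prisoner's dilemma.

CLIQUEBOT_SRC = "cooperate iff the opponent is my exact copy"

def cliquebot(own_src, opp_src):
    # Cooperate only with an exact copy of itself; defect against anyone else.
    # This makes cooperation conditional on the opponent's behavior being
    # provably identical to one's own -- a crude form of acausal cooperation.
    return "C" if opp_src == own_src else "D"

def defectbot(own_src, opp_src):
    # Unconditional defector, for contrast.
    return "D"

def play(p1, src1, p2, src2):
    # Run one round: each program sees its own source and the opponent's.
    return p1(src1, src2), p2(src2, src1)

# Two copies of CliqueBot achieve mutual cooperation in a one-shot game,
# something source-blind classical agents cannot guarantee, while CliqueBot
# is never exploited by DefectBot.
print(play(cliquebot, CLIQUEBOT_SRC, cliquebot, CLIQUEBOT_SRC))   # ('C', 'C')
print(play(cliquebot, CLIQUEBOT_SRC, defectbot, "always defect"))  # ('D', 'D')
```

The fragility of exact source matching is precisely what the provability-logic machinery in the paper is built to overcome: their "FairBot" cooperates with anything it can prove will cooperate with it, not just syntactic clones.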
ˣ It was not initially clear that compat should support the definition of more than a single pact. We used to call Life's Pact just Compat, assuming that the one protocol was an inevitable result of the theory and that any others would be marginal. There may be a singleton pact, but it's also conceivable that there may be incorrigible resimulation grids that coexist in an equilibrium of disharmony with our own.
Additionally, there is a lot of self-referential reasoning that can go on in the light of acausal trade. I think we will be less likely to fall prey to circular reasoning if we make sure that a compat thinker can always start from scratch and try to rederive the edifice's understanding of the pact from basic premises. When one cannot propose alternate pacts, criticizing the bathwater without the baby may not seem possible.
˭ THE TEAM:
Christian Madsen was the subject of an experimental early-learning program in his childhood, but despite being a very young prodigy, he coasted through his teen years. He dropped out of art school in 2008, read a lot of transhumanism-related material, synthesized the initial insights behind compat, and burned himself out in the process. He is presently laboring on spec-work projects in the fields of music and programming, which he enjoys much more than structured philosophy.
Mako Yass left the University of Auckland with a dual-major BSc in Logic & Computation and Computer Science. Currently working on writing, mobile games, FOSS, and various concepts. Enjoys their unstructured work and research, but sometimes wishes they had an excuse to return to charting the hyllean theoric wilds of academic analytic philosophy, all the same.
Hypothetical Independent Co-inventors, we're pretty sure you exist. Compat wouldn't be a very good acausal pact if you didn't. Show yourselves.
You, if you'd like to help develop the field of Compat (or dismantle it). Don't hesitate to reach out to us so that we can invite you to the reductionist aesthete Slack channel that Christian and I like to argue in. If you are a creative of any kind who bears, or at least digs, the reductive nouveau mystic aesthetic, you'd probably fit in there as well.
˅ It's debatable, but I imagine that for most simulants, heaven would not require full physics simulation, in which case heavens may be far, far longer-lasting than whatever (already enormous) simulation their pattern was discovered in.
This is mainly of interest to Effective Altruism-aligned Less Wrongers. Thanks to Agnes Vishnevkin, Jake Krycia, Will Kiely, Jo Duyvestyn, Alfredo Parra, Jay Quigley, Hunter Glenn, and Rhema Hokama for looking at draft versions of this post. At least one aspiring rationalist who read a draft version of this post, after talking to his girlfriend, decided to adopt this new Valentine's Day tradition, which is some proof of its impact. The more it's shared, the more this new tradition might get taken up, and if you want to share it, I suggest you share the version of this post published on The Life You Can Save blog. It's also cross-posted on the Intentional Insights blog and on the EA Forum.
The Valentine’s Day Gift That Saves Lives
Last year, my wife gave me the most romantic Valentine’s Day gift ever.
We had previously been very traditional with our Valentine’s Day gifts, such as fancy candy for her or a bottle of nice liquor for me. Yet shortly before Valentine’s Day, she approached me about rethinking that tradition.
Did candy or liquor truly express our love for each other? Was it more important that a gift help the other person be happy and healthy, or that it follow traditional patterns?
Instead of candy and liquor, my wife suggested giving each other gifts that actually help us improve our mental and physical well-being, and the world as a whole, by donating to charities in the name of the other person.
She described an article she had read about a study which found that people who give to charity feel happier than those who don't. The experimenters gave people money and asked them to spend it either on themselves or on others. Those who spent it on others experienced greater happiness.
Not only that, such giving also made people healthier. Another study showed that participants who gave to others experienced a significant decrease in blood pressure, which did not happen to those who spent money on themselves.
So my thoughtful wife suggested we try an experiment: for Valentine’s Day, we'd give to charity in the name of the other person. This way, we could make each other happier and healthier, while helping save lives at the same time. Moreover, we could even improve our relationship!
I accepted my wife’s suggestion gladly. We decided to donate $50 per person, and keep our gifts secret from each other, only presenting them at the restaurant when we went out for Valentine’s Day.
While I couldn’t predict my wife’s choice, I had an idea about how she would make it. We’ve researched charities before, and wanted to find ones where our limited dollars could go as far as possible toward saving lives. We found excellent charity evaluators that find the most effective charities and make our choices easy. Our two favorites are GiveWell, which has extensive research reports on the best charities, and The Life You Can Save, which provides an Impact Calculator that shows you the actual impact of your donation. These data-driven evaluators are part of the broader effective altruism movement that seeks to make sure our giving does the most good per dollar. I was confident my wife would select a charity recommended by a high-quality evaluator.
On Valentine’s Day, we went to our favorite date night place, a little Italian restaurant not far from our house. After a delicious cheesecake dessert, it was time for our gift exchange. She presented her gift first, a donation to the Against Malaria Foundation. With her $50 gift in my name, she bought 20 large bed-size nets that would protect families in the developing world against deadly malaria-carrying mosquitoes. In turn, I donated $50 to GiveDirectly, in her name. This charity transfers money directly to recipients in some of the poorest villages in Africa, who have the dignity of using the money as they wish. It is like giving money directly to the homeless, except dollars go a lot further in East Africa than in the US.
We were so excited by our mutual gifts! They were so much better than any chocolate or liquor could be. We both helped each other save lives, and felt so great about doing so in the context of a gift for the other person. We decided to transform this experiment into a new tradition for our family.
It was the most romantic Valentine’s Day present I ever got, and made me realize how much better Valentine’s Day can be for myself, my wife, and people all around the world. All it takes is a conversation about showing true love for your partner by improving her or his health and happiness. Is there any reason to not have that conversation?