Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.
Rationalists like to live in group houses. We are also as a subculture moving more and more into a child-having phase of our lives. These things don't cooperate super well - I live in a four bedroom house because we like having roommates and guests, but if we have three kids and don't make them share we will in a few years have no spare rooms at all. This is frustrating in part because amenable roommates are incredibly useful as alloparents if you value things like "going to the bathroom unaccompanied" and "eating food without being screamed at", neither of which are reasonable "get a friend to drive for ten minutes to spell me" situations. Meanwhile there are also people we like living around who don't want to cohabit with a small child, which is completely reasonable, small children are not for everyone.
For this and other complaints ("househunting sucks", "I can't drive and need private space but want friends accessible", whatever) the ideal solution seems to be somewhere along the spectrum between "a street with a lot of rationalists living on it" (no rationalist-friendly entity controls all those houses and it's easy for minor fluctuations to wreck the intentional community thing) and "a dorm" (sorta hard to get access to those once you're out of college, usually not enough kitchens or space for adult life). There's a name for a thing halfway between those, at least in German - "baugruppe" - buuuuut this would require community or sympathetic-individual control of a space and the money to convert it if it's not already baugruppe-shaped.
Maybe if I complain about this in public a millionaire will step forward or we'll be able to come up with a coherent enough vision to crowdfund it or something. I think there is easily enough demand for a couple of ten-to-twenty-adult baugruppen (one in the east bay and one in the south bay) or even more/larger, if the structures materialized. Here are some bulleted lists.
- Units that it is really easy for people to communicate across and flow between during the day - to my mind this would be ideally to the point where a family who had more kids than fit in their unit could move the older ones into a kid unit with some friends for permanent sleepover, but still easily supervise them. The units can be smaller and more modular the more this desideratum is accomplished.
- A pricing structure such that the gamut of rationalist financial situations (including but not limited to rent-payment-constraining things like "impoverished app academy student", "frugal Google engineer effective altruist", "NEET with a Patreon", "CfAR staffperson", "not-even-ramen-profitable entrepreneur", etc.) could live there. One thing I really like about my house is that Spouse can pay for it himself and would by default anyway, and we can evaluate roommates solely on their charming company (or contribution to childcare) even if their financial situation is "no". However, this does require some serious participation from people whose financial situation is "yes" and a way to balance the two so arbitrary numbers of charity cases don't bankrupt the project.
- Variance in amenities suited to a mix of Soylent-eating restaurant-going takeout-ordering folks who only need a fridge and a microwave and maybe a dishwasher, and neighbors who are not that, ideally such that it's easy for the latter to feed neighbors as convenient.
- Some arrangement to get repairs done, ideally some compromise between "you can't do anything to your living space, even paint your bedroom, because you don't own the place and the landlord doesn't trust you" and "you have to personally know how to fix a toilet".
- I bet if this were pulled off at all it would be pretty easy to have car-sharing bundled in, like in Benton House That Was which had several people's personal cars more or less borrowable at will. (Benton House That Was may be considered a sort of proof of concept of "20 rationalists living together" but I am imagining fewer bunk beds in the baugruppe.) Other things that could be shared include longish-term storage and irregularly used appliances.
- Dispute resolution plans and resident- and guest-vetting plans which thread the needle between "have to ask a dozen people before you let your brother crash on the couch, let alone a guest unit" and "cannot expel missing stairs". I think there are some rationalist community Facebook groups that have medium-trust networks of the right caution level and experiment with ways to maintain them.
- Bikeshedding. Not that it isn't reasonable to bikeshed a little about a would-be permanent community edifice that you can't benefit from or won't benefit from much unless it has X trait - I sympathize with this entirely - but too much from too many corners means no baugruppen go up at all even if everything goes well, and that's already dicey enough, so please think hard on how necessary it is for the place to be blue or whatever.
- Location. The only really viable place to do this for rationalist population critical mass is the Bay Area, which has, uh, problems, with new construction. Existing structures are likely to be unsuited to the project both architecturally and zoningwise, although I would not be wholly pessimistic about one of those little two-story hotels with rooms that open to the outdoors or something like that.
- Principal-agent problems. I do not know how to build a dormpartment building and probably neither do you.
- Community norm development with buy-in and a good match for typical conscientiousness levels even though we are rules-lawyery contrarians.
Please share this wherever rationalists may be looking; it's definitely the sort of thing better done with more eyes on it.
In the deep dark lurks of the internet, several proactive lesswrong and diaspora leaders have been meeting each day. If we could have cloaks and silly hats, we would.
We have been discussing the great diversification, and noticed some major hubs starting to pop up. The ones that have been working together include:
- Lesswrong slack
- SlateStarCodex Discord
- Reddit/Rational Discord
- Lesswrong Discord
- Exegesis (unofficial rationalist tumblr)
The ones that we hope to bring together in the future include (on the willingness of those servers):
- Lesswrong IRC (led by Gwern)
- Slate Star Codex IRC
- AGI slack
- Transhumanism Discord
- Artificial Intelligence Discord
How will this work?
About a year ago, the lesswrong slack tried to bridge across to the lesswrong IRC. That was bad. From that experience we learnt a lot that can go wrong, and have worked out how to avoid those mistakes. So here is the general setup.
Each server currently has its own set of channels, each with its own style of talking, addressing problems, sharing details and engaging with each other. We definitely don't want to do anything that will harm those existing cultures. In light of this, taking the main channel from one server and mashing it into the main channel of another server is going to turn into HELL ON EARTH and generally leave both sides with the sentiment that "<the other side> is wrecking up <our> beautiful paradise". Some servers may have a low volume buzz at all times, while other servers may become active in bursts; it's not good to try to marry those things.
I am in <exegesis, D/LW, R/R, SSC> what does this mean?
If you want to peek into the lesswrong slack and see what happens in their #open channel, you can join or unmute your respective channel and listen in, or contribute (two way relay) to their chat. Obviously if everyone does this at once we end up spamming the other chat, and probably after a week we cut the bridge off because it didn't work. So while it's favourable to grow the community, be mindful of what goes on across the divide and try not to anger our friends.
I am in Lesswrong-Slack, what does this mean?
We have new friends! Posts in #open will be relayed to all 4 child rooms, where others can contribute if they choose. Mostly they have their own servers to chat on, and if they are not on an info-diet already, then maybe they should be. We don't anticipate invasion or noise.
Why do they get to see our server and we don't get to see them?
So glad you asked - we do. There is an identical setup for their server into our bridge channels. In fact the whole diagram looks something like this:
Pretty right? No it's not. But that's in the backend.
For extra clarification, the rows are the channels that are linked. Which is to say that Discord-SSC is linked to a child channel in each of the other servers. The last thing we want to do is impact these existing channels in a negative way.
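The linking described above can be sketched in a few lines. This is only an illustration of the topology, not the actual bridge backend: the server names come from this post, while the `#main` and `#bridge-...` channel names and the `build_bridges` helper are hypothetical.

```python
from itertools import permutations

# Server names are taken from the post; the channel names are hypothetical.
SERVERS = [
    "lesswrong-slack",
    "ssc-discord",
    "rational-discord",
    "lesswrong-discord",
    "exegesis",
]


def build_bridges(servers):
    """Map each server's main channel to one child channel on every other server."""
    bridges = {}
    for source, target in permutations(servers, 2):
        main = f"{source}#main"
        child = f"{target}#bridge-{source}"
        bridges.setdefault(main, []).append(child)
    return bridges


bridges = build_bridges(SERVERS)
for main, children in bridges.items():
    print(main, "->", ", ".join(children))
```

With five servers, each main channel ends up with four child rooms, which matches the "all 4" relay count mentioned for #open.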
But what if we don't want to share our open and we just want to see the other side's open? (/our talk is private, what about confidentiality and security?)
Oh you mean like the prisoner's dilemma? Where you can defect (not share) and still be rewarded (get to see other servers). Yeah, it's a problem. It tends to be that when one group defects, others also defect. There is a chance that the bridge doesn't work, that this all slides, that we do spam each other, and that we end up giving up on the whole project. If it weren't worth taking the risk we wouldn't have tried.
We have not rushed into this bridge thing, we have been talking about it calmly and slowly and patiently for what seems like forever. We are all excited to be taking a leap, and keen to see it take off.
Yes, security is a valid concern, walled gardens being bridged into is a valid concern, we are trying our best. We are just as hesitant as you, and being very careful about the process. We want to get it right.
So if I am in <server1> and I want to talk to <server3> I can just post in the <bridge-to-server2> room and have the message relayed around to server 3 right?
Whilst that is correct, please don't do that. You wouldn't like people relaying through your main to talk to other people. Also it's pretty silly: you can just post in your <server1> main and let other people see it if they want to.
This seems complicated, why not just have one room where everyone can go and hang out?
- How do you think we ended up with so many separate rooms
- Why don't we all just leave <your-favourite server> and go to <that other server>? It's not going to happen
Why don't all you kids get off my lawn and stay in your own damn servers?
Thanks, grandpa. No one is coming to invade. We all have our own servers and stuff to do, and we don't NEED to be on your lawn, but sometimes it's nice to know we have friends.
<server2> shitposted our server, what do we do now?
This is why we have mods, why we have mute and why we have ban. It might happen but here's a deal; don't shit on other people and they won't shit on you. Also if asked nicely to leave people alone, please leave people alone. Remember anyone can tap out of any discussion at any time.
I need a picture to understand all this.
Great! Friends on exegesis made one for us.
Who are our new friends:
Lesswrong slack has been active since 2015, and has a core community. The slack has 50 channels for various conversations on specific topics. The #open channel is for general topics and has all kinds of interesting discoveries shared in it.
Discord-Exegesis (private, entry via tumblr)
Exegesis is a discord set up by a tumblr rationalist for all his friends (not just rats). It took off so well and became such a hive in such a short time that it's now a regular hub.
Following Exegesis's growth, a discord was set up for lesswrong. It's not as active yet, but it has the advantage of a low barrier to entry and it's filled with lesswrongers.
Scott posted a link on an open thread to the SSC discord and now it holds activity from users that hail from the SSC comment section. It probably has more conversation about politics than other servers but also has every topic relevant to his subscribers.
The reddit rational discord grew from the rationality and rational fiction subreddit. It's quite busy and covers all topics.
As of the publishing of this post, the bridge is not live, but it will go live when we flip the switch.
Meta: this took 1 hour to write (actual time writing) and halfway through I had to stop and have a voice conference about it with the channels we were bridging.
Cross posted to lesswrong: http://lesswrong.com/lw/oqz
In this post, I'll argue that Joyce's equilibrium CDT (eCDT) can be made into FDT (functional decision theory) with the addition of an intermediate step - a step that should have no causal consequences. This would show that eCDT is unstable under causally irrelevant changes, and is in fact a partial version of FDT.
Joyce's principle is:
Full Information. You should act on your time-t utility assessments only if those assessments are based on beliefs that incorporate all the evidence that is both freely available to you at t and relevant to the question about what your acts are likely to cause.
When confronted by a problem with a predictor (such as Death in Damascus or the Newcomb problem), this allows eCDT to recursively update its probabilities for the behaviour of the predictor, based on its own estimates of its own actions, until this process reaches equilibrium. This allows it to behave like FDT/UDT/TDT on some (but not all) problems. I'll argue that you can modify the setup to make eCDT into a full FDT.
Death in Damascus
In this problem, Death has predicted whether the agent will stay in Damascus (S) tomorrow, or flee to Aleppo (F). And Death has promised to be in the same city as the agent (D or A), to kill them. Having made its prediction, Death then travels to that city to wait for the agent. Death is known to be a perfect predictor, and the agent values survival at $1000, while fleeing costs $1.
Then eCDT recommends fleeing to Aleppo with probability 999/2000. To check this, let x be the probability of fleeing to Aleppo (F), and y the probability of Death being there (A). The expected utility is then
1000(x(1-y) + (1-x)y) - x    (1)
Differentiating this with respect to x gives 999-2000y, which is zero for y=999/2000. Since Death is a perfect predictor, y=x and eCDT's expected utility is 499.5.
The true expected utility, however, is -999/2000, since Death will get the agent anyway, and the only cost is the trip to Aleppo.
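The arithmetic above can be checked with exact fractions. This is a minimal sketch: the function and constant names are my own labels for the quantities in equation (1), not anything from Joyce's formalism.

```python
from fractions import Fraction

SURVIVAL = 1000  # value the agent places on surviving (from the problem statement)
FLEE_COST = 1    # cost of the trip to Aleppo


def ecdt_utility(x, y):
    """Equation (1): x = P(agent flees), y = P(Death in Aleppo), treated as independent."""
    return SURVIVAL * (x * (1 - y) + (1 - x) * y) - FLEE_COST * x


def d_utility_dx(y):
    """Partial derivative of equation (1) with respect to x, holding y fixed."""
    return SURVIVAL * (1 - 2 * y) - FLEE_COST


# The derivative 999 - 2000y vanishes at y = 999/2000; perfect prediction then sets x = y.
y_star = Fraction(SURVIVAL - FLEE_COST, 2 * SURVIVAL)
x_star = y_star

believed = ecdt_utility(x_star, y_star)  # what eCDT expects under independence
actual = -FLEE_COST * x_star             # Death always finds the agent; only the trip cost remains

print(y_star, believed, actual)  # 999/2000 999/2 -999/2000
```

The gap between `believed` and `actual` is the whole problem: the equilibrium treats x and y as equal in value but independent as events.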
The eCDT decision process seems rather peculiar. It seems to allow updating of the value of y dependent on the value of x - hence allow acausal factors to be considered - but only in a narrow way. Specifically, it requires that the probability of F and A be equal, but that those two events remain independent. And it then differentiates utility according to the probability of F only, leaving that of A fixed. So, in a sense, x correlates with y, but small changes in x don't correlate with small changes in y.
That's somewhat unsatisfactory, so consider the problem now with an extra step. The eCDT agent no longer considers whether to stay or flee; instead, it outputs X, a value between 0 and 1. There is a uniform random process Z, also valued between 0 and 1. If Z<X, then the agent flees to Aleppo; if not, it stays in Damascus.
This seems identical to the original setup, for the agent. Instead of outputting a decision as to whether to flee or stay, it outputs the probability of fleeing. This has moved the randomness in the agent's decision from inside the agent to outside it, but this shouldn't make any causal difference, because the agent knows the distribution of Z.
Death remains a perfect predictor, which means that it can predict X and Z, and will move to Aleppo if and only if Z<X.
Now let the eCDT agent consider outputting X=x for some x. In that case, it updates its opinion of Death's behaviour, expecting that Death will be in Aleppo if and only if Z<x. Then it can calculate the expected utility of setting X=x, which is simply 0 (Death will always find the agent) minus x (the expected cost of fleeing to Aleppo), hence -x. Among the "pure" strategies, X=0 is clearly the best.
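A quick simulation shows the -x value for pure strategies. It is a sketch under the assumptions stated in the text (Death sees both X and Z); the function names, sample count and seed are arbitrary choices of mine.

```python
import random


def outcome(x, z):
    """Payoff of one round when the agent outputs X = x and the random draw is Z = z."""
    flees = z < x
    death_in_aleppo = z < x              # a perfect predictor knows both X and Z
    survives = flees != death_in_aleppo  # never true: Death is always in the same city
    return (1000 if survives else 0) - (1 if flees else 0)


def estimate_utility(x, samples=100_000, seed=0):
    """Monte Carlo estimate of the expected utility of the pure strategy X = x."""
    rng = random.Random(seed)
    return sum(outcome(x, rng.random()) for _ in range(samples)) / samples


for x in (0.0, 0.3, 0.5, 1.0):
    print(x, estimate_utility(x))
```

Each estimate lands close to -x, so the lowest X on any grid is the best one, in line with X=0 being the best pure strategy.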
Now let's consider mixed strategies, where the eCDT agent can consider a distribution PX over values of X (this is a sort of second order randomness, since X and Z already give randomness over the decision to move to Aleppo). If we wanted the agent to remain consistent with the previous version, the agent then models Death as sampling from PX, independently of the agent. The probability of fleeing is just the expectation of PX; but the higher the variance of PX, the harder it is for Death to predict where the agent will go. The best option is as before: PX will set X=0 with probability 1001/2000, and X=1 with probability 999/2000.
But is this a fair way of estimating mixed strategies?
Average Death in Aleppo
Consider a weaker form of Death, Average Death. Average Death cannot predict X, but can predict PX, and will use that to determine its location, sampling independently from it. Then, from eCDT's perspective, the mixed-strategy behaviour described above is the correct way of dealing with Average Death.
But that means that the agent above is incapable of distinguishing between Death and Average Death. Joyce argues strongly for considering all the relevant information, and the distinction between Death and Average Death is relevant. Thus it seems when considering mixed strategies, the eCDT agent must instead look at the pure strategies, compute their value (-x in this case) and then look at the distribution over them.
One might object that this is no longer causal, but the whole equilibrium approach undermines the strictly causal aspect anyway. It feels daft to be allowed to update on Average Death predicting PX, but not on Death predicting X. Especially since moving from PX to X is simply some random process Z' that samples from the distribution PX. So Death is allowed to predict PX (which depends on the agent's reasoning) but not Z'. It's worse than that, in fact: Death can predict PX and Z', and the agent can know this, but the agent isn't allowed to make use of this knowledge.
Given all that, it seems that in this situation, the eCDT agent must be able to compute the mixed strategies correctly and realise (like FDT) that staying in Damascus (X=0 with certainty) is the right decision.
Let's recurse again, like we did last summer
This deals with Death, but not with Average Death. Ironically, the "X=0 with probability 1001/2000..." solution is not the correct solution for Average Death. To get that, we need to take equation (1), set x=y first, and then differentiate with respect to x. This gives x=1999/4000, so setting "X=0 with probability 2001/4000 and X=1 with probability 1999/4000" is actually the FDT solution for Average Death.
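The two candidate answers for Average Death can be compared directly. In this sketch, `average_death_utility` is my name for equation (1) with y set equal to x before differentiating; everything else follows from the numbers in the text.

```python
from fractions import Fraction


def average_death_utility(x):
    """Equation (1) with y = x: Average Death samples its city independently from the same distribution."""
    return 1000 * (x * (1 - x) + (1 - x) * x) - x


# d/dx [2000x(1-x) - x] = 1999 - 4000x, which is zero at x = 1999/4000.
x_fdt = Fraction(1999, 4000)
x_ecdt = Fraction(999, 2000)  # the equilibrium found by differentiating before setting y = x

print(average_death_utility(x_fdt))   # 3996001/8000
print(average_death_utility(x_ecdt))  # 999/2
```

The difference is tiny (499.500125 against 499.5), but it is strictly positive, which is enough to show the equilibrium answer is not optimal here.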
And we can make the eCDT agent reach that. Simply recurse to the next level, and have the agent choose PX directly, via a distribution PPX over possible PX.
But these towers of recursion are clunky and unnecessary. It's simpler to state that eCDT is unstable under recursion, and that it's a partial version of FDT.
Could utility functions be for narrow AI only, and downright antithetical to AGI? That's a quite fundamental question and I'm kind of afraid there's an obvious answer that I'm just too uninformed to know about. But I did give this some thought and I can't find the fault in the following argument, so maybe you can?
Eliezer Yudkowsky says that when AGI exists, it will have a utility function. For a long time I didn't understand why, but he gives an explanation in AI Alignment: Why It's Hard, and Where to Start. You can look it up there, but the gist of the argument I got from it is:
1. (explicit) If an agent's decisions are incoherent, the agent is behaving foolishly.
  - Example 1: If an agent's preferences aren't ordered, the agent prefers A to B, B to C but also C to A, it behaves foolishly.
  - Example 2: If an agent allocates resources incoherently, it behaves foolishly.
  - Example 3: If an agent's preferences depend on the probability of the choice even having to be made, it behaves foolishly.
2. (implicit) An AGI shouldn't behave foolishly, so its decisions have to be coherent.
3. (explicit) Making coherent decisions is the same thing as having a utility function.
I accept that if all of these were true, AGI should have a utility function. I also accept points 1 and 3. I doubt point 2.
Before I get to why, I should state my suspicion why discussions of AGI really focus on utility functions so much. Utility functions are fundamental to many problems of narrow AI. If you're trying to win a game, or to provide a service using scarce computational resources, a well-designed utility function is exactly what you need. Utility functions are essential in narrow AI, so it seems reasonable to assume they should be essential in AGI because... we don't know what AGI will look like but it sounds similar to narrow AI, right?
So that's my motivation. I hope to point out that maybe we're confused about AGI because we took a wrong turn way back when we decided it should have a utility function. But I'm aware it is more likely I'm just too dumb to see the wisdom of that decision.
The reasons for my doubt are the following.
- Humans don't have a utility function and make very incoherent decisions. Humans are also the most intelligent organisms on the planet. In fact, it seems to me that the less intelligent an organism is, the more easily its behavior can be approximated with a model that has a utility function!
- Apes behave more coherently than humans. They have a far smaller range of behaviors. They switch between them relatively predictably. They do have culture - one troop of chimps will fish for termites using a twig, while another will do something like a rain dance - but their cultural specifics number in the dozens, while those of humans are innumerable.
- Cats behave more coherently than apes. There are shy cats and bold ones, playful ones and lazy ones, but once you know a cat, you can predict fairly precisely what kind of thing it is going to do on a random day.
- Earthworms behave more coherently than cats. There aren't playful earthworms and lazy ones, they basically all follow the nutrients that they sense around them and occasionally mate.
- And single-celled organisms are so coherent we think we can even model them entirely on standard computing hardware. Which, if it succeeds, means we actually know E. coli's utility function to the last decimal point.
- The randomness of human decisions seems essential to human success (on top of other essentials such as speech and cooking). Humans seem to have a knack for sacrificing precious lifetime for fool's errands that very occasionally create benefit for the entire species.
A few occasions where such fool's errands happen to work out will later look like the most intelligent things people ever did - after hindsight bias kicks in. Before Einstein revolutionized physics, he was not obviously more sane than those contemporaries of his who spent their lives doing earnest work in phrenology and theology.
And many people trying many different things, most of them forgotten and a few seeming really smart in hindsight - that isn't a special case that is only really true for Einstein, it is the typical way humans have randomly stumbled into the innovations that accumulate into our technological superiority. You don't get to epistemology without a bunch of people deciding to spend decades of their lives thinking about why a stick looks bent when it goes through a water surface. You don't settle every little island in the Pacific without a lot of people deciding to go beyond the horizon in a canoe, and most of them dying like the fools that they are. You don't invent rocketry without a mad obsession with finding new ways to kill each other.
- An AI whose behavior is determined by a utility function has a couple of problems that human (or squid or dolphin) intelligence doesn't have, and they seem to be fairly intrinsic to having a utility function in the first place. Namely, the vast majority of possible utility functions lead directly into conflict with all other agents.
To define a utility function is to define a (direction towards a) goal. So a discussion of an AI with one, single, unchanging utility function is a discussion of an AI with one, single, unchanging goal. That isn't just unlike the intelligent organisms we know, it isn't even a failure mode of intelligent organisms we know. The nearest approximations we have are the least intelligent members of our species.
- Two agents with identical utility functions are arguably functionally identical to a single agent that exists in two instances. Two agents with utility functions that are not identical are at best irrelevant to each other and at worst implacable enemies.
This enormously limits the interactions between agents and is again very different from the intelligent organisms we know, which frequently display intelligent behavior in exactly those instances where they interact with each other. We know communicating groups (or "hive minds") are smarter than their members, that's why we have institutions. AIs with utility functions as imagined by e.g. Yudkowsky cannot form these.
They can presumably create copies of themselves instead, which might be as good or even better, but we don't know that, because we don't really understand whatever it is exactly that makes institutions more intelligent than their members. It doesn't seem to be purely multiplied brainpower, because a person thinking for ten hours often doesn't find solutions that ten persons thinking together find in an hour. So if an AGI can multiply its own brainpower, that doesn't necessarily achieve the same result as thinking with others.
Now I'm not proposing an AGI should have nothing like a utility function, or that it couldn't temporarily adopt one. Utility functions are great for evaluating progress towards particular goals. Within well-defined areas of activity (such as playing Chess), even humans can temporarily behave as if they had utility functions, and I don't see why AGI shouldn't.
I'm also not saying that something like a paperclip maximizer couldn't be built, or that it could be stopped once underway. The AI alignment problem remains real.
I do contend that the paperclip maximizer wouldn't be an AGI, it would be narrow AI. It would have a goal, it would work towards it, but it would lack what we look for when we look for AGI. And whatever that is, I propose we don't find it within the space of things that can be described with (single, unchanging) utility functions.
And there are other places we could look. Maybe some of it is in whatever it is exactly that makes institutions more intelligent than their members. Maybe some of it is in why organisms (especially learning ones) play - playfulness and intelligence seem correlated, and playfulness has that incoherence that may be protective against paperclip-maximizer-like failure modes. I don't know.
Attractor Theory: A Model of Minds and Motivation
[Epistemic status: Moderately strong. Attractor Theory is a model based on the well-researched concept of time-inconsistent preferences combined with anecdotal evidence that extends the theory to how actions affect our preferences in general. See the Caveats at the end for a longer discussion on what this model is and isn’t.]
<Cross-posted from mindlevelup>
I’ve been thinking about minds and motivation on and off for about a year now, and I think I now have a model that merges some related ideas together into something useful. The model is called Attractor Theory, and it brings together ideas from Optimizing Your Mindstate, behavioral economics, and flow.
Attractor Theory is my attempt to provide a way of looking at the world that hybridizes ideas from the Resolve paradigm (where humans Actually Try and exert their will) and the “click-whirr” paradigm (where humans are driven by “if-then” loops and proceduralized habits).
As a brief summary, Attractor Theory basically states that you should consider any action you take as being easier to continue than to start, as well as having meta-level effects on changing your perception of which actions feel desirable.
Here’s a metaphor that provides most of the intuitions behind Attractor Theory:
Imagine that you are in a hamster ball:
As a human inside this ball, you can kinda roll around by exerting energy. But it’s hard to do so all of the time — you’d likely get tired. Still, if you really wanted to, you could push the ball and move.
These are Utilons. They represent productivity hours, lives saved, HPMOR fanfictions written, or anything else you care about maximizing. You are trying to roll around and collect as many Utilons as possible.
But the terrain isn’t actually smooth. Instead, there are all these Attractors that pull you towards them. Attractors are like valleys, or magnets, or point charges. Or maybe electrically charged magnetic valleys. (I’m probably going to Physics Hell for that.)
The point is that they draw you towards them, and it’s hard to resist their pull.
Also, Attractors have an interesting property: Once you’re being pulled in by one, this actually modifies other Attractors. This usually manifests by changing how strongly other ones are pulling you in. Sometimes, though, this even means that some Attractors will disappear, and new ones may appear.
As a human, your goal is to navigate this tangle of Utilons and Attractors from your hamster ball, trying to collect Utilons.
Now you could just try to take a direct path to all the nearest Utilons, but that would mean exerting a lot of energy to fight the pull of Attractors that pull you in Utilon-sparse directions.
Instead, given that you can’t avoid Attractors (they’re everywhere!) and that you want to get as many Utilons as possible, the best thing to do seems to be to strategically choose which Attractors you’re drawn to and selectively choose when to exert energy to move from one to another to maximize your overall trajectory.
In the above metaphor, actions and situations serve as Attractors, which are like slippery slopes that pull you in. Your agency is represented by the “meta-human” that inhabits the ball, which has some limited control when it comes to choosing which Attractor-loops to dive into and which ones to pop out of.
So the default view of humans and decisions seems to be something like viewing actions as time-chunks that we can just slot into our schedule. Attractor Theory attempts to present a model that moves away from that and shifts our intuitions to:
a) think less about our actions in a vacuum / individually
b) consider starting / stopping costs more
c) see our preferences in a more mutable light
It’s my hope that thinking about actions as “things that draw you in” can improve our intuitions about global optimization:
My point here is that, phenomenologically, it feels like our actions change the sorts of things we might want. Every time we take an action, this will, in turn, prime how we view other actions, often in somewhat predictable ways. I might not know exactly how they’ll change, but we can get good, rough ideas from past experience and our imaginations.
For example, the set of things that feel desirable to me after running a marathon may differ greatly from the set of things after I read a book on governmental corruption.
(I may still have core values, like wanting everyone to be happy, which I place higher up in my sense of self, which aren’t affected by these, but I’m mainly focusing on how object-level actions feel for this discussion. There’s a longer decision-theoretic discussion here that I’ll save for a later post.)
I think it is useful to start seeing your actions in terms of, not just their direct effects, but also their effects on which further actions you can take. It changes your decision algorithm to be something like:
“Choose actions such that their meta-level effects on me by my taking them allow me to take more actions of this type in the future and maximize the number of Utilons I can earn in the long run.”
Phrasing it this way makes it clearer that most things in life are longer-term endeavors that involve trying to optimize globally, rather than locally. It also provides a model for evaluating actions on a new axis: the extent to which an action influences your future, which seems like an important thing to consider.
(While it’s arguable that a naive view of maximization should by default take this into account from a consequentialist lens, I think making it explicitly clear, as the above formulation does, is a useful distinction.)
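As a toy sketch of how this decision rule differs from moment-by-moment maximization, here is a small Python example. Everything in it is hypothetical: the two states, the three actions, and the Utilon numbers are made up purely for illustration and are not part of Attractor Theory itself.

```python
from functools import lru_cache

# Hypothetical toy numbers: (state, action) -> (immediate Utilons, next state).
# Watching a video pays well right now but leaves you "distracted",
# a state in which every action pays less.
TRANSITIONS = {
    ("focused", "work"): (3, "focused"),
    ("focused", "walk"): (1, "focused"),
    ("focused", "video"): (4, "distracted"),
    ("distracted", "work"): (1, "focused"),
    ("distracted", "walk"): (0, "focused"),
    ("distracted", "video"): (2, "distracted"),
}
ACTIONS = ("work", "walk", "video")


def myopic_total(state, steps):
    """Always take the action with the highest immediate payoff."""
    total = 0
    for _ in range(steps):
        action = max(ACTIONS, key=lambda a: TRANSITIONS[(state, a)][0])
        reward, state = TRANSITIONS[(state, action)]
        total += reward
    return total


@lru_cache(maxsize=None)
def best_total(state, steps):
    """Best achievable total when each action's effect on the next state is counted."""
    if steps == 0:
        return 0
    return max(
        TRANSITIONS[(state, a)][0] + best_total(TRANSITIONS[(state, a)][1], steps - 1)
        for a in ACTIONS
    )


print(myopic_total("focused", 5))  # 12: one video, then stuck in the video loop
print(best_total("focused", 5))    # 16: work four times, then watch a video
```

With these made-up numbers, the myopic chooser gets pulled into the video loop on the first step, while the chooser that accounts for how each action changes later options ends up with more Utilons overall.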
This allows us to better evaluate actions which, by themselves, might not be too useful, but do a good job of reorienting ourselves into a better state of mind. For example, spending a few minutes outside to get some air might not be directly useful, but it’ll likely help clear my mind, which has good benefits down the line.
Along the same lines, you want to view actions, not as one-time deals, but as a sort of process that actively changes how you perceive other actions. In fact, these effects should sometimes perhaps be as important a consideration as time or effort when looking at a task.
Attractor Theory also conceptually models the idea of precommitment:
Humans often face situations where we fall prey to “in the moment” urges, which soon turn to regret. These are known as time-inconsistent preferences, where what we want quickly shifts, often because we are in the presence of something that really tempts us.
An example of this is the dieter who proclaims “I’ll just give in a little today” when seeing a delicious cake on the restaurant menu, and then feels “I wish I hadn’t done that” right after gorging themselves.
Precommitment is the idea that you can often “lock in” your choices beforehand, such that you will literally be unable to give in to temptation when the actual choice comes before you, or will entirely avoid the opportunity to even face the choice.
An example from the above would be something like having a trustworthy friend bring food over instead of eating out, so you can’t stuff yourself on cake because you weren’t even the one who ordered food.
There seems to be a general principle here of going “upstream”: you try to target the places where you have the most control, so that you can improve your experiences later down the line. This seems to be a useful idea, whether the question is about finding leverage or self-control.
Attractor Theory views all actions and situations as self-reinforcing slippery slopes. As such, it more realistically models the act of taking certain actions as leading you to other Attractors, so you’re not just looking at things in isolation.
In this model, we can reasonably predict, for example, that any video on YouTube will likely lead to more videos because the “sucked-in-craving-more-videos Future You” will have different preferences than “needing-some-sort-of-break Present You”.
This view allows you to better see certain “traps”, where an action will lead you deeper and deeper down an addiction/reward cycle, like a huge bag of chips or a webcomic. These are situations where, after the initial buy-in, it becomes incredibly attractive to continue down the same path, as these actions reinforce themselves, making it easy to continue on and on…
Under the Attractor metaphor, your goal, then, is to focus on finding ways of being drawn to certain actions and avoiding others. You want to find ways to avoid specific actions which could lead you down bad spirals, even if the initial actions themselves may not be that distracting.
The result is chaining together actions and their effects on how you perceive things in an upstream way, like precommitment.
Exploring, Starting, and Stopping:
Local optima are also visually represented by this model: We can get caught in certain chains of actions that do a good job of netting Utilons. Similar to the above traps, it can be hard to try new things once we’ve found an effective route already.
Chances are, though, that there are even more Utilons to be had elsewhere. In which case, being able to break out to explore new areas could be useful.
Attractor Theory also does a good job of modeling how actions seem much harder to start than to stop. Moving from one Attractor to a disparate one can be costly in terms of energy, as you need to move against the pull of the current Attractor.
Once you’re pulled in, though, it’s usually easier to keep going with the flow. So using this model ascribes costs to starting and places less of a cost on continuing actions.
By “pulled in”, I mean making it feel effortless or desirable to continue with the action. I’m thinking of the feeling you get when you have a decent album playing music, and you feel sort of tempted to switch it to a better album, except that, given that this good song is already playing, you don’t really feel like switching.
Given the costs of switching, you want to invest your efforts and agency, perhaps not in always choosing the immediate Utilon-maximizing action moment-by-moment, but in choosing the actions / situations whose Attractors pull you in desirable directions, or that make other desirable paths easier to take.
Summary and Usefulness:
Attractor Theory attempts to retain willpower as a coherent idea, while also hopefully more realistically modeling how actions can affect our preferences with regards to other actions.
It can serve as an additional intuition pump behind using willpower in certain situations. Thinking about “activation energy” in terms of putting in some energy to slide into positive Attractors removes the mental block I’ve recently had on using willpower. (I’d been stuck in the “motivation should come from internal cooperation” mindset.)
The meta-level considerations when looking at how Attractors affect how other Attractors affect us provide a clearer mental image of why you might want to precommit to avoid certain actions.
For example, when thinking about taking breaks, I now think about which actions can help me relax without strongly modifying my preferences. This means I see things like going outside, eating a snack, and drawing as far better break-time activities than playing an MMO or watching Netflix.
This is because the latter are powerful self-reinforcing Attractors that also pull me towards more reward-seeking directions, which might distract me from my task at hand. The former activities can also serve as breaks, but they don’t do much to alter your preferences, and thus, help keep you focused.
I see Attractor Theory as being useful when it comes to thinking upstream and providing an alternative view of motivation that isn’t exactly internally based.
Hopefully, this model can be useful when you look at your schedule to identify potential choke-points / bottlenecks that can arise as a result of factors you hadn’t previously considered when evaluating actions.
Attractor Theory assumes that different things can feel desirable depending on the situation. It relinquishes some agency by assuming that you can’t always choose what you “want” because of external changes to how you perceive actions. It also doesn’t try to explain internal disagreements, so it’s still largely at odds with the Internal Double Crux model.
I think this is fine. The goal here isn’t exactly to create a wholly complete prescriptive model or a descriptive one. Rather, it’s an attempt to condense a simplified model of humans, behavior, and motivation into a concise, appealing form around which your intuitions can crystallize, similar to the System 1 and System 2 distinction.
I admit that if you tend to use an alternate ontology when it comes to viewing how your actions relate to the concept of “you”, this model might be less useful. I think that’s also fine.
This is not an attempt to capture all of the nuances / considerations in decision-making. It’s simply an attempt to partially take a few pieces and put them together in a more coherent framework. Attractor Theory merely takes a few pieces that I’d previously had as disparate nodes and chunks them together into a more unified model of how we think about doing things.
Note: This post is in error, I've put up a corrected version of it here. I'm leaving the text in place, as historical record. The source of the error is that I set Pa(S)=Pe(D) and then differentiated with respect to Pa(S), while I should have differentiated first and then set the two values to be the same.
Nate Soares and Ben Levinstein have a new paper out on "Functional Decision Theory", the most recent development of UDT and TDT.
It's good. Go read it.
This post is about further analysing the "Death in Damascus" problem, and showing that Joyce's "equilibrium" version of CDT (causal decision theory) is in a certain sense intermediate between CDT and FDT. If eCDT is this equilibrium theory, then it can deal with a certain class of predictors, which I'll call distribution predictors.
Death in Damascus
In the original Death in Damascus problem, Death is a perfect predictor. It finds you in Damascus, and says that it has already planned its trip for tomorrow - and it'll be in the same place you will be.
You value surviving at $1000, and can flee to Aleppo for $1.
Classical CDT will put some prior P over Death being in Damascus (D) or Aleppo (A) tomorrow. And then, if P(A)>999/2000, you should stay (S) in Damascus, while if P(A)<999/2000, you should flee (F) to Aleppo.
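The 999/2000 threshold can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the function names are mine, and exact fractions are used to avoid rounding at the boundary.

```python
from fractions import Fraction

SURVIVE = 1000  # value of surviving, in dollars
FLEE_COST = 1   # cost of fleeing to Aleppo, in dollars


def cdt_expected_utilities(p_aleppo):
    """Return (EU of staying, EU of fleeing) for a fixed prior P(A)."""
    p_damascus = 1 - p_aleppo
    stay = SURVIVE * p_aleppo                # staying works if Death goes to Aleppo
    flee = SURVIVE * p_damascus - FLEE_COST  # fleeing works if Death stays in Damascus
    return stay, flee


def cdt_choice(p_aleppo):
    """Pick S (stay) or F (flee) by comparing the two expected utilities."""
    stay, flee = cdt_expected_utilities(p_aleppo)
    if stay > flee:
        return "S"
    if flee > stay:
        return "F"
    return "indifferent"


print(cdt_choice(Fraction(1, 2)))       # S
print(cdt_choice(Fraction(499, 1000)))  # F
print(cdt_choice(Fraction(999, 2000)))  # indifferent
```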
FDT estimates that Death will be wherever you will, and thus there's no point in F, as that will just cost you $1 for no reason.
But it's interesting what eCDT produces. This decision theory requires that Pe (the equilibrium probability of A and D) be consistent with the action distribution that eCDT computes. Let Pa(S) be the action probability of S. Since Death knows what you will do, Pa(S)=Pe(D).
The expected utility is 1000.Pa(S)Pe(A)+1000.Pa(F)Pe(D)-Pa(F). At equilibrium, this is 2000.Pe(A)(1-Pe(A))-Pe(A). And that quantity is maximised when Pe(A)=1999/4000 (and thus the probability of you fleeing is also 1999/4000).
This is still the wrong decision, as paying the extra $1 is pointless, even if it's not a certainty to do so.
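As a check on the arithmetic only, the maximiser of the equilibrium expression can be computed exactly. This sketch reproduces the calculation as stated in the post, with p standing for Pe(A) = Pa(F); it says nothing about whether substituting before maximising is the right method, which is what the note at the top flags.

```python
from fractions import Fraction


def equilibrium_utility(p):
    """Expected utility 2000*p*(1-p) - p, where p = Pe(A) = Pa(F)."""
    return 2000 * p * (1 - p) - p


# The expression is the quadratic -2000*p**2 + 1999*p,
# so its maximum sits at the vertex -b / (2a).
a, b = Fraction(-2000), Fraction(1999)
p_star = -b / (2 * a)

print(p_star)                       # 1999/4000
print(equilibrium_utility(p_star))  # 3996001/8000
```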
So far, nothing interesting: both CDT and eCDT fail. But consider the next example, on which eCDT does not fail.
Statistical Death in Damascus
Let's assume now that Death has an assistant, Statistical Death, that is not a perfect predictor, but is a perfect distribution predictor. It can predict the distribution of your actions, but not your actual decision. Essentially, you have access to a source of true randomness that it cannot predict.
It informs you that its probability over whether to be in Damascus or Aleppo will follow exactly the same distribution as yours.
Classical CDT follows the same reasoning as before. So does eCDT: Pa(S)=Pe(D) still holds, because Statistical Death follows the same distribution as you do.
But what about FDT? Well, note that FDT will reach the same conclusion as eCDT. This is because 1000.Pa(S)Pe(A)+1000.Pa(F)Pe(D)-Pa(F) is the correct expected utility, the Pa(S)=Pe(D) assumption is correct for Statistical Death, and (S,F) is independent of (A,D) once the action probabilities have been fixed.
So on the Statistical Death problem, eCDT and FDT say the same thing.
Factored joint distribution versus full joint distributions
What's happening is that there is a joint distribution over (S,F) (your actions) and (D,A) (Death's actions). FDT is capable of reasoning over all types of joint distributions, and fully assessing how its choice of Pa acausally affects Death's choice of Pe.
But eCDT is only capable of reasoning over ones where the joint distribution factors into a distribution over (S,F) times a distribution over (D,A). Within the confines of that limitation, it is capable of (acausally) changing Pe via its choice of Pa.
Death in Damascus does not factor into two distributions, so eCDT fails on it. Statistical Death in Damascus does so factor, so eCDT succeeds on it. Thus eCDT seems to be best conceived of as a version of FDT that is strangely limited in terms of which joint distributions it's allowed to consider.
Home appliances, such as washing machines, are apparently much less durable now than they were decades ago.
Perhaps this is a kind of mirror image of "cost disease". In many sectors (education, medicine), we pay much more now for a product that is no better than what we got decades ago at a far lower cost, even accounting for inflation. It takes more money to buy the same level of quality. Scott Alexander (Yvain) argues that the cause of cost disease is a mystery. There are several plausible accounts, but they don't cover all the cases in a satisfying way. (See the link for more on the mystery of cost disease.)
Now, what if the mysterious cause of cost disease were to set to work in a sector where price can't go up, for whatever reason? Then you would expect quality to take a nosedive. If price per unit quality goes up, but total price can't go up, then quality must go down. So maybe the mystery of crappy appliances is just cost disease in another guise.
In the spirit of inadequate accounts of cost disease, I offer this inadequate account of crappy appliances:
As things get better globally, they get worse locally.
Global goodness provides a buffer against local badness. This makes greater local badness tolerable. That is, the cheapest tolerable thing gets worse. Thus, worse and worse things dominate locally as things get better globally.
This principle applies in at least two ways to washing machines:
Greater global wealth: Consumers have more money, so they can afford to replace washing machines more frequently. Thus, manufacturers can sell machines that require frequent replacement.
Manufacturers couldn't get away with this if people were poorer and could buy only one machine every few decades. If you're poor, you prioritize durability more. In the aggregate, the market will reward durability more. But a rich market accepts less durability.
Better materials science: Globally, materials science has improved. Hence, at the local level, manufacturers can get away with making worse materials.
Rich people might tolerate a washer that lasts 3 years, give or take. But even they don't want a washer that breaks in one month. If you build washers, you need to be sure that nearly every single one lasts a full month, at least. But, with poor materials science, you have to overshoot by a lot to be sure of that. Maybe you have to aim for a mean duration of decades to guarantee that the minimum duration doesn't fall below one month. On the other hand, with better materials science, you can get the distribution of duration to cluster tightly around 3 years. You still have very few washers lasting only one month, but the vast majority of your washers are far less durable than they used to be.
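Here is a rough numerical sketch of that argument. It assumes, purely for illustration, that washer lifetimes are normally distributed and that "nearly every single one" means at most 1 in 1000 fails within the first month; the two spread values are made up, and real lifetimes are certainly not normal.

```python
from statistics import NormalDist

ONE_MONTH = 1 / 12  # years
MAX_EARLY_FAILURES = 0.001  # assumed tolerance: 1 washer in 1000


def required_mean_lifetime(std_dev_years):
    """Smallest mean lifetime (years) keeping early failures within tolerance."""
    # Number of standard deviations between the one-month floor and the mean.
    z = NormalDist().inv_cdf(1 - MAX_EARLY_FAILURES)
    return ONE_MONTH + z * std_dev_years


# Hypothetical spreads: loose process control versus tight process control.
print(round(required_mean_lifetime(8.0), 1))  # about 24.8 years
print(round(required_mean_lifetime(0.9), 1))  # about 2.9 years
```

Under these assumptions, the manufacturer with the wide spread has to aim for decades, while the one with the narrow spread can aim just under 3 years and still meet the same one-month guarantee.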
Maybe this is just Nassim Taleb's notion of antifragility. I haven't read the book, but I gather that the idea is that individuals grow stronger in environments that contain more stressors (within limits). Conversely, if you take away the stressors (i.e., make the environment globally better), then you get more fragile individuals (i.e., things are locally worse).
You should always cooperate with an identical copy of yourself in the prisoner's dilemma. This is obvious, because you and the copy will reach the same decision.
That justification implicitly assumes that you and your copy are (somewhat) antagonistic: that you have opposite aims. But the conclusion doesn't require that at all. Suppose that you and your copy were instead trying to ensure that one of you got maximal reward (it doesn't matter which). Then you should still jointly cooperate because (C,C) is possible, while (C,D) and (D,C) are not (I'm ignoring randomising strategies for the moment).
Now look at the Newcomb problem. Your decision enters twice: once when you decide how many boxes to take, and once when Omega is simulating or estimating you to decide how much money to put in box B. You would dearly like your two "copies" (one of which may just be an estimate) to be out of sync - for the estimate to 1-box while the real you two-boxes. But without any way of distinguishing between the two, you're stuck with taking the same action - (1-box,1-box). Or, seeing it another way, (C,C).
This also makes the Newcomb problem into an anti-coordination game, where you and your copy/estimate try to pick different options. But, since this is not possible, you have to stick to the diagonal. This is why the Newcomb problem can be seen both as an anti-coordination game and a prisoner's dilemma - the differences only occur in the off-diagonal terms that can't be reached.
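To make "sticking to the diagonal" concrete, here is a small sketch. The payoff numbers are assumptions: the prisoner's dilemma values are arbitrary, and the Newcomb values are the conventional $1,000,000 / $1,000 amounts, which this post does not itself specify. Each table maps (your move, copy's or estimate's move) to your payoff.

```python
# Hypothetical payoffs: (your move, copy's move) -> your payoff.
PRISONERS_DILEMMA = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

# Conventional Newcomb amounts (an assumption):
# (your action, Omega's estimate of your action) -> dollars.
NEWCOMB = {
    ("1-box", "1-box"): 1_000_000, ("1-box", "2-box"): 0,
    ("2-box", "1-box"): 1_001_000, ("2-box", "2-box"): 1_000,
}


def best_unrestricted(payoffs):
    """Best cell if you and your copy could act independently."""
    return max(payoffs, key=payoffs.get)


def best_on_diagonal(payoffs):
    """Best move when you and your copy must take the same action."""
    moves = {mine for mine, _ in payoffs}
    return max(moves, key=lambda m: payoffs[(m, m)])


print(best_unrestricted(PRISONERS_DILEMMA))  # ('D', 'C'), unreachable
print(best_on_diagonal(PRISONERS_DILEMMA))   # C
print(best_unrestricted(NEWCOMB))            # ('2-box', '1-box'), unreachable
print(best_on_diagonal(NEWCOMB))             # 1-box
```

In both tables the most attractive cell is off the diagonal, and in both the best reachable cell is the "cooperative" one.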
Notes for future OT posters:
1. Please add the 'open_thread' tag.
2. Check if there is an active Open Thread before posting a new one. (Immediately before; refresh the list-of-threads page before posting.)
3. Open Threads should start on Monday, and end on Sunday.
4. Unflag the two options "Notify me of new top level comments on this article" and "
I've posted about this once before, but here's a more developed version of the idea. Does this pose a serious problem for the simulation hypothesis, or does it merely complicate the idea?
1. Which room am I in?
Imagine two rooms, A and B. At a timeslice t2, there are exactly 1000 people in room B and only 1 person in room A. Neither room contains any clues as to which it is; i.e., no one can see anyone else in room B. If you were placed in one of these rooms with only the information above, which would you guess that you were in? The correct answer appears to be room B. After all, if everyone were to bet that they are in room B, almost everyone would win, whereas if everyone were to bet that they are in room A, almost everyone would lose.
Now imagine that you are told that during a time segment t1 to t2, a total of 100 trillion people had sojourned in room A and only 1 billion in room B. How does this extra information influence your response? The question posed above is not which room you are likely to have been in, all things considered, but which room you are currently in at t2. Insofar as betting odds guide rational belief, it still follows that if everyone at t2 were to bet that they are in room A, almost everyone would lose. This differs from what appears to be the correct conclusion if one reasons across time, from t1 to t2. Thus, we can imagine that at some future moment t3 everyone who ever sojourned in either room A or B is herded into another room C and then asked whether their journeys from t1 to t3 took them through room A or B. In this case, most people would win the bet if they were to point at room A rather than room B.
Let’s complicate this situation. Since more people in total pass through room A than room B, imagine that people are swapped in and out of room A faster than room B. Once in either room, a blindfold is removed and the occupant is asked which room they are in. After they answer, the blindfold is put back on. Thus, there are more total instances of removing blindfolds in room A than room B between t1 and t2. Should this fact change your mind about where you are at exactly t2? Surely one could argue that the directly relevant information is that pertaining to each individual timeslice, rather than the historical details of occupants being swapped in and out of rooms. After all, the bet is being made at a particular timeslice about a particular timeslice, and the fact is that most people who bet at t2 that they are in room B at t2 will win some cash, whereas those who bet that they are in room A will lose.
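The two ways of counting in this section can be put side by side in a short Python sketch, using only the head counts given above (the variable and function names are mine).

```python
from fractions import Fraction


def share(count, other):
    """Fraction of everyone counted who falls into the first group."""
    return Fraction(count, count + other)


# Occupants at the single timeslice t2.
at_t2 = {"A": 1, "B": 1000}
# Everyone who passed through each room between t1 and t2.
over_time = {"A": 100 * 10**12, "B": 10**9}

p_in_b_at_t2 = share(at_t2["B"], at_t2["A"])
p_passed_through_a = share(over_time["A"], over_time["B"])

print(p_in_b_at_t2)        # 1000/1001
print(p_passed_through_a)  # 100000/100001
```

Betting on room B wins for 1000 out of 1001 people at t2, while betting on room A wins for 100000 out of every 100001 people when everyone who ever passed through is counted.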
2. The simulation argument
Nick Bostrom (2003) argues that at least one of the following disjuncts is true: (1) civilizations like ours tend to self-destruct before reaching technological maturity, (2) civilizations like ours tend to reach technological maturity but refrain from running a large number of ancestral simulations, or (3) we are almost certainly in a simulation. The third disjunct corresponds to the “simulation hypothesis.” It is based on the following premises: first, assume the truth of functionalism, i.e., that physical systems that exhibit the right functional organization will give rise to conscious mental states like ours. Second, consider the computational power that could be available to future humans. Bostrom provides a convincing analysis that future humans will have at least the capacity to run a large number of ancestral simulations—or, more generally, simulations in which minds sufficiently “like ours” exist.
The final step of the argument proceeds as follows: if (1) and (2) are false, then we do not self-destruct before reaching a state of technological maturity and do not refrain from running a large number of ancestral simulations. It follows that we run a large number of ancestral simulations. If so, we have no independent knowledge of whether we exist in vivo or in machina. A “bland” version of the principle of indifference thus tells us to distribute our probabilities equally among all the possibilities. Since the number of sims would far exceed the number of non-sims in this scenario, we should infer that we are almost certainly simulated. As Bostrom writes, “it may also be worth to ponder that if everybody were to place a bet on whether they are in a simulation or not, then if people use the bland principle of indifference, and consequently place their money on being in a simulation if they know that that’s where almost all people are, then almost everyone will win their bets. If they bet on not being in a simulation, then almost everyone will lose. It seems better that the bland indifference principle be heeded” (Bostrom 2003).
Now, let us superimpose the scenario of Section 1 onto the simulation argument. Imagine that our posthuman descendants colonize the galaxy and their population grows to 100 billion individuals in total. Imagine further that at t2 they are running 100 trillion simulations, each of which contains 100 billion individuals. Thus, the total number of sims equals 10^25. If one of our posthuman descendants were asked whether she is a sim or non-sim, she should therefore answer that she is almost certainly a sim. Alternatively, imagine that at t2 our posthuman descendants decide to run only a single simulation in the universe that contains a mere 1 billion sims, ceteris paribus. Given this situation: if one of our posthuman descendants were asked whether she is a sim given this information, she should quite clearly answer that she is most likely a non-sim.
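The head counts in these two scenarios can be checked directly. This sketch uses only the figures given above and applies the same bland indifference over all individuals existing at t2.

```python
from fractions import Fraction

non_sims = 100 * 10**9  # 100 billion posthumans

# Scenario 1: 100 trillion simulations of 100 billion individuals each.
sims_parallel = (100 * 10**12) * (100 * 10**9)
p_sim_parallel = Fraction(sims_parallel, sims_parallel + non_sims)

# Scenario 2: a single simulation containing 1 billion sims.
sims_single = 10**9
p_non_sim_single = Fraction(non_sims, non_sims + sims_single)

print(sims_parallel == 10**25)  # True
print(float(p_sim_parallel))    # 1.0 to floating-point precision
print(p_non_sim_single)         # 100/101
```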
With this in mind, consider a final possible scenario: our posthuman descendants decide to run simulations with relatively small populations in a serial fashion, that is, one at a time. These simulations could be sped up a million times to enable complete recapitulations of our evolutionary history (as per Bostrom). The result is that at any given timeslice the total number of non-sims will far exceed the total number of sims—yet across time the total number of sims will accumulate and eventually far exceed the total number of non-sims. The result is that if one takes a bird’s-eye view of our posthuman civilization from its inception to its decline (say, because of the entropy death of the cosmos), and if one were asked whether she is more likely to have existed in vivo or in machina, it appears that she should answer “I was a sim.”
But this might not be the right way to reason about the situation. Consider that history is nothing more than a series of timeslices, one after the other. Since the ratio of non-sims to sims favors the former at every possible timeslice, one might argue that one should always answer the question, “Are you right now more likely to exist in vivo or in machina?” with “I probably exist in vivo.” Again, the difficulty that skeptics of this answer must overcome is the ostensible fact that if everyone were to bet on being simulated at any given timeslice—even billions of years after the first serial simulation is run—then nearly everyone would lose, whereas if everyone were to bet that they are a non-sim, then almost everyone would win.
The tension here emerges from the difference between timeslice reasoning and the sort of “atemporal” reasoning that Bostrom employs. If the former is epistemically robust, then Bostrom’s tripartite argument fails because none of the disjuncts are true. This is because the scenario above entails (a) we survive to reach technological maturity, and (b) we run a large number of ancestor simulations, yet (c) we do not have reason to believe that we are in a simulation at any particular moment. The latter proposition depends, of course, upon how we run the simulations (serially versus in parallel) and, relatedly, how we decide to reason about our metaphysical status at each moment in time.
In conclusion, I am unsure about whether this constitutes a refutation of Bostrom or merely complicates the picture. At the very least, I believe it does the latter, requiring more work on the topic.
Bostrom, Nick. 2003. Are You Living in a Computer Simulation? Philosophical Quarterly. 53(211): 243-255.