Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

More marbles and Sleeping Beauty

1 Manfred 23 November 2014 02:00AM


Previously I talked about an entirely uncontroversial marble game: I flip a coin, and if Tails I give you a black marble, if Heads I flip another coin to either give you a white or a black marble.

The probabilities of seeing the two marble colors are 3/4 and 1/4, and the probabilities of Heads and Tails are 1/2 each.

The marble game is analogous to how a 'halfer' would think of the Sleeping Beauty problem - the claim that Sleeping Beauty should assign probability 1/2 to Heads relies on the claim that your information for the Sleeping Beauty problem is the same as your information for the marble game - same possible events, same causal information, same mutual exclusivity and exhaustiveness relations.

So what's analogous to the 'thirder' position, after we take into account that we have this causal information? Is it some difference in causal structure, or some non-causal anthropic modification, or something even stranger?

As it turns out, nope, it's the same exact game, just re-labeled.

In the re-labeled marble game you still have two unknown variables (represented by flipping coins), and you still have a 1/2 chance of black and Tails, a 1/4 chance of black and Heads, and a 1/4 chance of white and Heads.

And then to get the thirds, you ask the question "If I get a black marble, what is the probability of the faces of the first coin?" Now you update to P(Heads|black)=1/3 and P(Tails|black)=2/3.
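To make the re-labeling concrete, here is a minimal sketch that writes the marble game down as one joint table and then asks both questions of it. The names (`joint`, `marginal`, `conditional`) are my own for illustration, and both coins are assumed fair.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Joint distribution over (first coin, marble colour), built node by node:
# Tails always gives black; Heads triggers a second (assumed fair) coin.
joint = {
    ("Tails", "black"): half,
    ("Heads", "black"): half * half,
    ("Heads", "white"): half * half,
}

def marginal(index, value):
    """Sum the joint over every outcome whose index-th entry equals value."""
    return sum(p for outcome, p in joint.items() if outcome[index] == value)

def conditional(coin, marble):
    """P(coin | marble), computed from the joint table."""
    return joint.get((coin, marble), Fraction(0)) / marginal(1, marble)

p_heads = marginal(0, "Heads")                        # the "halfer" question
p_black = marginal(1, "black")
p_heads_given_black = conditional("Heads", "black")   # the "thirder" question
p_tails_given_black = conditional("Tails", "black")
```

The table is the same in both cases; only the question asked of it changes.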


Okay, enough analogies. What's going on with these two positions in the Sleeping Beauty problem?

[Diagrams 1 and 2 appeared here as images.]

Here are two different diagrams, which are really re-labelings of the same diagram. The first labeling is the problem where P(Heads|Wake) = 1/2. The second labeling is the problem where P(Heads|Wake) = 1/3. The question at hand is really - which of these two math problems corresponds to the word problem / real world situation?

As a refresher, here's the text of the Sleeping Beauty problem that I'll use: Sleeping Beauty goes to sleep in a special room on Sunday, having signed up for an experiment. A coin is flipped - if the coin lands Heads, she will only be woken up on Monday. If the coin lands Tails, she will be woken up on both Monday and Tuesday, but with memories erased in between. Upon waking up, she then assigns some probability to the coin landing Heads, P(Heads|Wake).

Diagram 1:  First a coin is flipped to get Heads or Tails. There are two possible things that could be happening to her, Wake on Monday or Wake on Tuesday. If the coin landed Heads, then she gets Wake on Monday. If the coin landed Tails, then she could either get Wake on Monday or Wake on Tuesday (in the marble game, this was mediated by flipping a second coin, but in this case it's some unspecified process, so I've labeled it [???]).  Because all the events already assume she Wakes, P(Heads|Wake) evaluates to P(Heads), which just as in the marble game is 1/2.

This [???] node here is odd, can we identify it as something natural? Well, it's not Monday/Tuesday, like in diagram 2 - there's no option that even corresponds to Heads & Tuesday. I'm leaning towards the opinion that this node is somewhat magical / acausal, just hanging around because of analogy to the marble game. So I think we can take it out. A better causal diagram with the halfer answer, then, might merely be Coin -> (Wake on Monday / Wake on Tuesday), where Monday versus Tuesday is not determined at all by a causal node, merely informed probabilistically to be mutually exclusive and exhaustive.

Diagram 2:  A coin is flipped, Heads or Tails, and also it could be either Monday or Tuesday. Together, these have a causal effect on her waking or not waking - if Heads and Monday, she Wakes, but if Heads and Tuesday, she Doesn't wake. If Tails, she Wakes. Her pre-Waking prior for Heads is 1/2, but upon waking, the event Heads, Tuesday, Don't Wake gets eliminated, and after updating P(Heads|Wake)=1/3.
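As a sanity check on Diagram 2, here is a small sketch that enumerates the four coin/day combinations and conditions on waking. It assumes the coin and the day are independent and each uniform before she wakes; the text above only fixes the prior for Heads at 1/2, so the uniform day prior is my assumption, as are the names used.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

def wakes(coin, day):
    """Diagram 2's rule: only Heads & Tuesday leads to not waking."""
    return not (coin == "Heads" and day == "Tuesday")

# Assumption: coin and day are independent and uniform before conditioning.
prior = {(coin, day): half * half
         for coin, day in product(["Heads", "Tails"], ["Monday", "Tuesday"])}

p_wake = sum(p for (coin, day), p in prior.items() if wakes(coin, day))
p_heads_and_wake = sum(p for (coin, day), p in prior.items()
                       if coin == "Heads" and wakes(coin, day))
p_heads_given_wake = p_heads_and_wake / p_wake
```

Eliminating the single Heads, Tuesday, Don't Wake cell and renormalizing is all the update amounts to.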

There's a neat asymmetry here. In diagram 1, when the coin was Heads she got the same outcome no matter the value of [???], and only when the coin was Tails were there really two options. In Diagram 2, when the coin is Heads, two different things happen for different values of the day, while if the coin is Tails the same thing happens no matter the day.


Do these seem like accurate depictions of what's going on in these two different math problems? If so, I'll probably move on to looking closer at what makes the math problem correspond to the word problem.

Moderate Dark Arts to Promote Rationality: Ends Justify Means?

3 Gleb_Tsipursky 17 November 2014 07:00PM

I'd like the opinion of Less Wrongers on the extent to which it is appropriate to use Dark Arts as a means of promoting rationality.

I and other fellow aspiring rationalists in the Columbus, OH Less Wrong meetup have started up a new nonprofit organization, Intentional Insights, and we're trying to optimize ways to convey rational thinking strategies widely and thus raise the sanity waterline. BTW, we also do some original research, as you can see in this Less Wrong article on "Agency and Life Domains," but our primary focus is promoting rational thinking widely, and all of our research is meant to accomplish that goal.

To promote rationality as widely as possible, we decided it's appropriate to speak the language of System 1, and use graphics, narrative, metaphors, and orientation toward pragmatic strategies to communicate about rationality to a broad audience. Some examples are our blog posts about gaining agency, about research-based ways to find purpose and meaning, about dual process theory and other blog posts, as well as content such as videos on evaluating reality and on finding meaning and purpose in life.

Our reasoning is that speaking the language of System 1 would help us to reach a broad audience who are currently not much engaged in rationality, but could become engaged if instrumental and epistemic rationality strategies are presented in such a way as to create cognitive ease. We think the ends of promoting rationality justify the means of using such moderate Dark Arts - although the methods we use do not convey 100% epistemic rationality, we believe the ends of spreading rationality are worthwhile, and that once broad audiences who engage with our content realize the benefits of rationality, they can be oriented to pursue more epistemic accuracy over time. However, some Less Wrongers disagreed with this method of promoting rationality, as you can see in some of the comments on this discussion post introducing the new nonprofit. Some commentators expressed the belief that it is not appropriate to use methods that speak to System 1.

So I wanted to bring up this issue for a broader discussion on Less Wrong, and get a variety of opinions. What are your thoughts about the utility of using moderate Dark Arts of the type I described above if the goal is to promote rationality - do the ends justify the means? How much Dark Arts, if any, is it appropriate to use to promote rationality?


I just increased my Altruistic Effectiveness and you should too

8 AABoyles 17 November 2014 03:45PM

I was looking at the marketing materials for a charity (which I'll call X) over the weekend, when I saw something odd at the bottom of their donation form:

Check here to increase your donation by 3% to defray the cost of credit card processing.

It's not news to me that credit card companies charge merchants a cut of every transaction. But the ramifications of this for charitable contributions had never sunk in. I use my credit card for all of the purchases I can (I get pretty good cash-back rates). Automatically drafting from my checking account (like a check, only without the check) costs X nothing. So by switching to automatic drafts I've increased the effectiveness of my charitable contributions by a small (<3%) amount, by performing what amounts to a paperwork tweak.

If you use a credit card for donations, please think about making this tweak as well!

The Atheist's Tithe

6 Alsadius 13 November 2014 05:22PM

I made a comment on another site a week or two ago, and I just realized that the line of thought is one that LW would appreciate, so here's a somewhat expanded version. 

There's a lot of discussion around here about how to best give to charities, and I'm all for this. Ensuring donations are used well is important, and organizations like GiveWell that figure out how to get the most bang for your buck are doing very good work. An old article on LW (that I found while searching to make sure I wasn't being redundant by posting this) makes the claim that the difference between a decent charity and an optimal one can be two orders of magnitude, and I believe that. But the problem with this is, effective altruism only helps if people are actually giving money. 

People today don't tend to give very much to charity. They'll buy a chocolate bar for the school play or throw a few bucks in at work, but less than 2% of national income is donated even in the US, and the US is incredibly charitable by developed-world standards (the corresponding rate in Germany is about 0.1%, for example). And this isn't something that can be solved with math, because the general public doesn't speak math; it needs to be solved with social pressure.

The social pressure needs to be chosen well. Folks like Jeff Kaufman and Julia Wise giving a massive chunk of their income to charity are of course laudable, but 99%+ of people will regard the thought of doing so with disbelief and a bit of horror - it's simply not going to happen on a large scale, because people put themselves first, and don't think they could possibly part with so much of their income. We need to settle for a goal that is not only attainable by the majority of people, but that the majority of people know in their guts is something they could do if they wanted. Not everyone will follow through, but it should be set at a level that inspires guilt if they don't, not laughter. 

Since we're trying to make it something people can live up to, it has to be proportional giving, not absolute - Bill Gates and Warren Buffett telling each other to donate everything over a billion is wonderful, but doesn't affect many other people. Conversely, telling people that everything over $50k should be donated will get the laugh reaction from ordinary-wealthy folks like doctors and accountants, who are the people we most want to tie into this system. Also, even if it was workable, it creates some terrible disincentives to working extra-hard, which is a bad way to structure a system - we want to maximize donations, not merely ask people to suffer for its own sake. 

Also, the rule needs to be memorable - we can't give out The Income Tax Act 2: Electric Boogaloo as our charitable donation manual, because people won't read it, won't remember it, and certainly won't pressure anyone else into following it. Ideally it should be extremely simple. And it'd be an added bonus if the amount chosen didn't seem arbitrary, if there was already a pre-existing belief that the number is generally appropriate for what part of your income should be given away. 

There's only one system that meets all these criteria - the tithe. Give away 10% of your income to worthy causes (not generally religion, though the religious folk of the world can certainly do so), keep 90% for yourself. It's practical, it's simple, it's guilt-able, it scales to income, it preserves incentives to work hard and thereby increase the total base of donations, and it's got a millennia-long tradition (which means both that it's proven to work and that people will believe it's a reasonable thing to expect).

Encouraging people to give more than that, or to give better than the default, are both worthwhile, but just like saving for retirement, the first thing to do is put enough money in, and only *then* worry about marginal changes in effectiveness. After all, putting Germany on the tithe rule is just as much of an improvement to charitable effectiveness as going from a decent charity to an excellent one, and it scales in a completely different way, so they can be worked on in parallel. 

This is a rule that I try to follow myself, and sometimes encourage others to do while I'm wearing my financial-advisor hat. (And speaking with that hat: If you're a person who will actually follow through on this, avoid chipping in a few dollars here and there when people ask, and save up for bigger donations. That way you get tax receipts, which lower your effective cost of donation, as well as letting you pick better charities). 

Deriving probabilities from causal diagrams

4 Manfred 13 November 2014 12:28AM

What this is: an attempt to examine how causal knowledge gets turned into probabilistic predictions.

I'm not really a fan of any view of probability that involves black boxes. I want my probabilities (or more practically, the probabilities of toy agents in toy problems I consider) to be derivable from what I know in a nice clear way, following some desideratum of probability theory at every step.

Causal knowledge sometimes looks like a black box, when it comes to assigning probabilities, and I would like to crack open that box and distribute the candy inside to smiling children.

What this is not: an attempt to get causal diagrams from constraints on probabilities.

That would be silly - see Pearl's article that was recently up here. Our reasonable desire is the reverse: getting the constraints on probabilities from the causal diagrams.


The Marble Game

Consider marbles. First, I use some coin-related process to get either Heads or Tails. If Tails, I give you a black marble. If Heads, I use some other process to choose between giving you a black marble or a white marble.

Causality is an important part of the marble game. If I manually interfere with the process that gives Heads or Tails, this can change the probability you should assign of getting a black marble. But if I manually interfere with the process that gives you white or black marbles, this won't change your probability of seeing Heads or Tails.
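Here is a rough sketch of that asymmetry. The function `marble_game` and the specific rigged values (9/10) are made up for illustration, and the baseline assumes both processes are fair.

```python
from fractions import Fraction

def marble_game(p_heads, p_black_given_heads):
    """Return (P(Heads), P(black)) for the marble game's causal structure."""
    p_tails = 1 - p_heads
    # Tails always yields black; Heads yields black with p_black_given_heads.
    p_black = p_tails + p_heads * p_black_given_heads
    return p_heads, p_black

half = Fraction(1, 2)
baseline = marble_game(half, half)
rigged_coin = marble_game(Fraction(9, 10), half)     # interfere with the coin
rigged_marble = marble_game(half, Fraction(9, 10))   # interfere with the marble process
```

Rigging the coin moves P(black), while rigging the marble process leaves P(Heads) exactly where it was: influence only flows downstream.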


What I'd like versus what is

The fundamental principle of putting numbers to beliefs, that always applies, is to not make up information. If I don't know of any functional differences between two events, I shouldn't give them different probabilities. But going even further - if I learn a little information, it should only change my probabilities a little.

The general formulation of this is to make your probability distribution consistent with what you know, in the way that contains the very least information possible (or conversely, the maximum entropy). This is how to not make up information.

I like this procedure; if we write down pieces of knowledge as mathematical constraints, we can find the correct distribution by solving a single optimization problem. Very elegant. Which is why it's a shame that this isn't at all what we do for causal problems.

Take the marble game. To get our probabilities, we start with the first causal node, figure out the probability of Heads without thinking about marbles at all (that's easy, it's 1/2), and then move on to the marbles while taking the coin as given (3/4 for black and 1/4 for white).

One cannot do this problem without using causal information. If we neglect the causal diagram, our information is the following: A: We know that Heads and Tails are mutually exclusive and exhaustive (MEE), B: we know that getting a black marble and getting a white marble are MEE, and C: we know that if the coin is Tails, you'll get a black marble.

This leaves three MEE options: Tails and Black (TB), HB, and HW. Maximizing entropy, they all get probability 1/3.
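If you'd like to see the 1/3 fall out numerically rather than take it on symmetry, here is a brute-force sketch: it scans a grid of distributions over (TB, HB, HW) and keeps the one with the highest Shannon entropy. The grid search is just an illustrative stand-in for solving the optimization properly, and the function names are my own.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; terms with p == 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def best_on_grid(steps):
    """Search distributions (TB, HB, HW) on a grid with spacing 1/steps."""
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            probs = (i / steps, j / steps, k / steps)
            if best is None or entropy(probs) > entropy(best):
                best = probs
    return best

# 30 is divisible by 3, so the uniform distribution lies on the grid.
best = best_on_grid(30)
```

With no causal information, nothing distinguishes the three options, and the search lands on the uniform distribution.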

One could alternately think of it like this: if we don't have the causal part of the problem statement (the causal diagram D), we don't know whether the coin causes the marble choice, or the marble causes the coin choice - why not pick a marble first, and if it's W we give you an H coin, but if it's B we flip the coin? Heck, why have one cause the other at all? Indeed, you should recover the 1/3 result if you average over all the consistent causal diagrams.

So my question is - what causal constraints is our distribution subject to, and what is it optimizing? Not piece by piece, but all at once?


Rephrasing the usual process

One method is to just do the same steps as usual, but to think of the rationale in terms of knowledge / constraints and maximum entropy.

We start with the coin, and we say "because the coin's result isn't caused by the marbles, no information pertaining to marbles matters here. Therefore, P(H|ABCD) is just P(H|A) = 1/2" (First application of maximum entropy). Then we move on to the marbles, and applying information B and C, plus maximum entropy a second time, we learn that P(black|ABCD) = 3/4. All that our causal knowledge really meant for our probabilities was the equation P(H|ABCD)=P(H|A).

Alternatively, what if we only wanted to maximize something once, but let causal knowledge change the thing we were maximizing? We can say something like "we want to minimize the amount of information about the state of the coin, since that's the first causal node, and then minimize the amount of information about its descendant node, the marble." Although this could be represented as one equation using linear multipliers, it's clearly the same process just with different labels.


Is it even possible to be more elegant?

Both of these approaches are... functional. I like the first one a lot better, because I don't want to even come close to messing with the principle of maximum entropy / minimal information. But I don't like that we never get to apply this principle all at once. Can we break our knowledge down more so that everything happens nicely and elegantly?

The way we stated our knowledge above was as P(H|ABCD) = P(H|A). But this is equivalent to the statement that there's a symmetry between the left and right branches coming out of the causal node. We can express this symmetry using the equivalence principle as P(H)=P(T), or as P(HB)+P(HW)=P(TB).

But note that this is just hiding what's going on, because the equivalence principle is just a special case of the maximum entropy principle - we might as well just require that P(H)=1/2 but still say that at the end we're "maximizing entropy subject to this constraint."
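As a sketch of "maximizing entropy subject to this constraint": once P(TB) is pinned at 1/2, the only freedom left is how to split the other half between HB and HW, so a one-dimensional scan is enough. The scan and its names are illustrative assumptions, not a serious optimizer.

```python
import math

def entropy(probs):
    """Shannon entropy in nats; terms with p == 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def constrained_best(steps):
    """Maximise entropy over (TB, HB, HW) with P(TB) fixed at 1/2."""
    candidates = []
    for i in range(steps + 1):
        hb = 0.5 * i / steps   # P(HB) ranges over [0, 1/2]
        hw = 0.5 - hb          # constraint: P(HB) + P(HW) = P(TB) = 1/2
        candidates.append((0.5, hb, hw))
    return max(candidates, key=entropy)

tb, hb, hw = constrained_best(100)
p_black = tb + hb
```

The constrained maximum recovers the usual causal answer, with black at 3/4, but only because the constraint P(H)=1/2 was put in by hand.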


Answer: Probably not

The general algorithm followed above is, for each causal node, to insert the condition that the probabilities of outputs of that node, given the starting information including the causal diagram, are equal to the probabilities given only the starting information related to that node or its parents - information about the descendants does not help determine probabilities of the parents.
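Here is one way the node-by-node procedure might be sketched in code. The encoding of a causal diagram as a list of (name, parents, allowed-outputs function) triples is hypothetical, and the sketch assumes the only knowledge about each node is which outputs are possible given its parents, so maximum entropy at each node is just a uniform split.

```python
from fractions import Fraction

# Hypothetical encoding: each node lists its parents and a function giving the
# outputs our knowledge allows, given the parents' values. Nodes are listed in
# causal order (parents before children).
nodes = [
    ("coin", [], lambda: ["Heads", "Tails"]),
    ("marble", ["coin"],
     lambda coin: ["black"] if coin == "Tails" else ["black", "white"]),
]

def joint_distribution(nodes):
    """Walk nodes in causal order, spreading probability uniformly
    (maximum entropy) over each node's allowed outputs given its parents."""
    partial = {(): Fraction(1)}   # tuple of values so far -> probability
    names = []
    for name, parents, allowed in nodes:
        extended = {}
        for assignment, p in partial.items():
            values = dict(zip(names, assignment))
            options = allowed(*(values[parent] for parent in parents))
            for option in options:
                extended[assignment + (option,)] = p / len(options)
        partial = extended
        names.append(name)
    return partial

joint = joint_distribution(nodes)
p_black = sum(p for (coin, marble), p in joint.items() if marble == "black")
```

Note that the coin's probabilities are fixed before the marble node is ever looked at, which is exactly the "descendants don't inform parents" condition.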

Link: Snowden interviewed

5 polymathwannabe 12 November 2014 06:48PM

"The atomic bomb was the moral moment for physicists. Mass surveillance is the same moment for computer scientists, when they realize that the things they produce can be used to harm a tremendous number of people."


The Truth and Instrumental Rationality

9 the-citizen 01 November 2014 11:05AM

One of the central focuses of LW is instrumental rationality. It's been suggested, rather famously, that this isn't about having true beliefs, but rather it's about "winning". Systematized winning. True beliefs are often useful to this goal, but an obsession with "truthiness" is seen as counter-productive. The brilliant scientist or philosopher may know the truth, yet be ineffective. This is seen as unacceptable to many who see instrumental rationality as the critical path to achieving one's goals. Should we all discard our philosophical obsession with the truth and become "winners"?

The River Instrumentus

You are leading a group of five people away from a deadly threat which is slowly advancing behind you. You come to a river. It looks too dangerous to wade through, but through the spray of the water you see a number of stones. They are dotted across the river in a way that might allow you to cross. However, the five people you are helping are extremely nervous and in order to convince them to cross, you will not only have to show them it's possible to cross, you will also need to look calm enough after doing it to convince them that it's safe. All five of them must cross, as they insist on living or dying together.

Just as you are about to step out onto the first stone it splutters and moves in the mist of the spraying water. It looks a little different from the others, now you think about it. After a moment you realise it's actually a person, struggling to keep their head above water. Your best guess is that this person would probably drown if they got stepped on by five more people. You think for a moment, and decide that, being a consequentialist concerned primarily with the preservation of life, it is ultimately better that this person dies so the others waiting to cross might live. After all, what is one life compared with five?

However, given your need for calm and the horror of their imminent death at your hands (or feet), you decide it is better not to think of them as a person, and so you instead imagine them being simply a stone. You know you'll have to be really convincingly calm about this, so you look at the top of the head for a full hour until you utterly convince yourself that the shape you see before you is factually indicative not of a person, but of a stone. In your mind, tops of heads aren't people - now they're stones. This is instrumentally rational - when you weigh things up the self-deception ultimately increases the number of people who will likely live, and there is no specific harm you can identify as a result.

After you have finished convincing yourself you step out onto the per... stone... and start crossing. However, as you step out onto the subsequent stones, you notice they all shift a little under your feet. You look down and see the stones spluttering and struggling. You think to yourself "lucky those stones are stones and not people, otherwise I'd be really upset". You lead the five very grateful people over the stones and across the river. Twenty dead stones drift silently downstream.

When we weigh situations on pure instrumentality, small self-deception makes sense. The only problem is, in an ambiguous and complex world, self-deceptions have a notorious way of compounding each other, and leave a gaping hole for cognitive bias to work its magic. Many false but deeply-held beliefs throughout human history have been quite justifiable on these grounds. Yet when we forget the value of truth, we can be instrumental, but we are not instrumentally rational. Rationality implies, or ought to imply, a value of the truth.

Winning and survival

In the jungle of our evolutionary childhood, humanity formed groups to survive. In these groups there was a hierarchy of importance, status and power. Predators, starvation, rival groups and disease all took the weak on a regular basis, but the groups afforded a partial protection. However, a violent or unpleasant death still remained a constant threat. It was of particular threat to the lowest and weakest members of the group. Sometimes these individuals were weak because they were physically weak. However, over time groups that allowed and rewarded things other than physical strength became more successful. In these groups, discussion played a much greater role in power and status. The truly strong individuals, the winners in this new arena, were ones that could direct conversation in their favour - conversations about who will do what, about who got what, and about who would be punished for what. Debates were fought with words, but they could end in death all the same.

In this environment, one's social status is intertwined with one's ability to win. In a debate, it was not so much a matter of what was true, but of what facts and beliefs achieved one's goals. Supporting the factual position that suited one's own goals was most important. Even where the stakes were low or irrelevant, it paid to prevail socially, because one's reputation guided others' limited cognition about who was best to listen to. Winning didn't mean knowing the most, it meant social victory. So when competition bubbled to the surface, it paid to ignore what one's opponent said and instead focus on appearing superior in any way possible. Sure, truth sometimes helped, but for the charismatic it was strictly optional. Politics was born.

Yet as groups got larger, and as technology began to advance for the first time, there appeared a new phenomenon. Where a group's power dynamics meant that it systematically had false beliefs, it became more likely to fail. The group that believed that fire spirits guided a fire's advancement fared poorly compared with those who checked the wind and planned their means of escape accordingly. The truth finally came into its own. Yet truth, as opposed to simple belief by politics, could not be so easily manipulated for personal gain. The truth had no master. In this way it was both dangerous and liberating. And so slowly but surely the capacity for complex truth-pursuit became evolutionarily impressed upon the human blueprint.

However, in evolutionary terms there was little time for the completion of this new mental state. Some people had it more than others. It also required the right circumstances for it to rise to the forefront of human thought. And other conditions could easily destroy it. For example, should a person's thoughts be primed with an environment of competition, the old ways came bubbling up to the surface. When a person's environment is highly competitive, their mind reverts to its primitive state. Learning and updating of views becomes increasingly difficult, because to the more primitive aspects of a person's social brain, updating one's views is a social defeat.

When we focus an organisation's culture on winning, there can be many benefits. It can create an air of achievement, to a degree. Hard work and the challenging of norms can be increased. However, we also prime the brain for social conflict. We create an environment where complexity and subtlety in conversation, and consequently in thought, is greatly reduced. In organisations where the goals and means are largely intellectual, a competitive environment creates useless conversations, meaningless debates, pointless tribalism, and little meaningful learning. There are many great examples, but I think you'd be best served watching our elected representatives at work to gain a real insight.

Rationality and truth

Rationality ought to contain an implication of truthfulness. Without it, our little self-deceptions start to gather and compound one another. Slowly but surely, they start to reinforce, join, and form an unbreakable, unchallengeable yet utterly false belief system. I need not point out the more obvious examples, for in human society, there are many. To avoid this on LW and elsewhere, truthfulness of belief ought to inform all our rational decisions, methods and goals. Of course true beliefs do not guarantee influence or power or achievement, or anything really. In a world of half-evolved truth-seeking equipment, why would we expect that? What we can expect is that, if our goals are anything to do with the modern world in all its complexity, the truth isn't sufficient, but it is necessary.

Instrumental rationality is about achieving one's goals, but in our complex world goals manifest in many ways - and we can never really predict how a false belief will distort our actions to utterly destroy our actual achievements. In the end, without truth, we never really see the stones floating down the river for what they are.

Cross-temporal dependency, value bounds and superintelligence

6 joaolkf 28 October 2014 03:26PM

In this short post I will attempt to put forth some potential concerns that should be relevant when developing superintelligences, if certain meta-ethical effects exist. I do not claim they exist, only that it might be worth looking for them since their existence would mean some currently irrelevant concerns are in fact relevant. 


These meta-ethical effects would be a certain kind of cross-temporal dependency on moral value. First, let me explain what I mean by cross-temporal dependency. If value is cross-temporal dependent it means that value at t2 could be affected by t1, independently of any causal role t1 has on t2. The same event X at t2 could have more or less moral value depending on whether Z or Y happened at t1. For instance, this could be the case on matters of survival. If we kill someone and replace her with a slightly more valuable person some would argue there was a loss rather than a gain of moral value; whereas if a new person with moral value equal to the difference of the previous two is created where there was none, most would consider it an absolute gain. Furthermore, some might consider that small, gradual and continual improvements are better than abrupt and big ones. For example, a person who forms an intention and a careful, detailed plan to become better, and becomes better through forceful effort of their own, could acquire more value than a person who simply happens to take a pill and instantly becomes a better person - even if they become that exact same person. This is not because effort is intrinsically valuable, but because of personal continuity. There are more intentions, deliberations and desires connecting the two time-slices of the person who changed through effort than there are connecting the two time-slices of the person who changed by taking a pill. Even though both persons become equally morally valuable in isolated terms, they do so by different paths that differently affect their final value.

More examples. You live now in t1. If suddenly in t2 you were replaced by an alien individual with the same amount of value as you would otherwise have in t2, then t2 may not have the exact same amount of value as it would otherwise have, simply in virtue of the fact that in t1 you were alive and the alien's previous time slice was not. 365 individuals with a 1 day life do not amount to the same value as a single individual living through 365 days. Slice history into 1 day periods: each day the universe contains one unique advanced civilization with the same overall total moral value, each civilization is completely alien and ineffable to the others, and each civilization lives for only one day and is then gone forever. This universe does not seem to hold the same moral value as the one where only one of those civilizations flourishes for eternity. In all these examples the value of a period of time seems to be affected by the existence or not of certain events at other periods. They indicate that there is, at least, some cross-temporal dependency.


Now consider another type of effect, bounds on value. There could be a physical bound - transfinite or not - on the total amount of moral value that can be present per instant. For instance, if moral value rests mainly on sentient well-being, which can be categorized as a particular kind of computation, and there is a bound on the total amount of such computation which can be performed per instant, then there is a bound on the amount of value per instant. If, arguably, we are currently extremely far from such a bound, and this bound will eventually be reached by a superintelligence (or any other structure), then the total moral value of the universe would be dominated by the value of this physical bound, given that regions where the physical bound wasn't reached would make negligible contributions. The faster the bound can be reached, the more negligible pre-bound values are.


Finally, if there is a form of cross-temporal dependence of value where preceding events leading to a superintelligence could alter the value of this physical bound, then we not only ought to make sure we safely construct a superintelligence, but also that we do so following the path that maximizes such a bound. It might be the case that an overly abrupt superintelligence would decrease the bound, so that all future moral value would be diminished by the fact that there was a huge discontinuity in the past in the events leading to this future. Even small decreases in such a bound would have dramatic effects. Although I do not know of any plausible cross-temporal effect of this kind, it seems this question deserves at least a minimal amount of thought. Both cross-temporal dependency and bounds on value seem plausible (in fact I believe some form of each is true), so it is not at all prima facie inconceivable that we could have cross-temporal effects changing the bound up or down.

[Link] "Neural Turing Machines"

16 Prankster 31 October 2014 08:54AM

The paper.

Discusses the technical aspects of one of Google's AI projects. According to a PCWorld article, the system "apes human memory and programming skills" (the article seems pretty solid and also contains a link to the paper).

The abstract:

We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.


(First post here, feedback on the appropriateness of the post appreciated)

Why is the A-Theory of Time Attractive?

6 Tyrrell_McAllister 31 October 2014 11:11PM

I've always been puzzled by why so many people have such strong intuitions about whether the A-theory or the B-theory1 of time is true.  [ETA: I've written "A-theory" and "B-theory" as code for "presentism" and "eternalism", but see the first footnote.]  It seems like nothing psychologically important turns on this question.  And yet, people often have a very strong intuition supporting one theory over the other.  Moreover, this intuition seems to be remarkably primitive.  That is, whichever theory you prefer, you probably felt an immediate affinity for that conception of time as soon as you started thinking about time at all.  The intuition that time is A-theoretic or B-theoretic seems pre-philosophical, whichever intuition you have.  This intuition will then shape your subsequent theoretical speculations about time, rather than vice versa.

Consider, by way of contrast, intuitions about God.  People often have a strong pre-theoretical intuition about whether God exists.  But it is easy to imagine how someone could form a strong emotional attachment to the existence of God early in life.  Can emotional significance explain why people have deeply felt intuitions about time?  It seems like the nature of time should be emotionally neutral2.

Now, strong intuitions about emotionally neutral topics aren't so uncommon.  For example, we have strong intuitions about how addition behaves for large integers.  But usually, it seems, such intuitions are nearly unanimous and can be attributed to our common biological or cultural heritage.  Strong disagreeing intuitions about neutral topics seem rarer.

Speaking for myself, the B-theory has always seemed just obviously true.  I can't really make coherent sense out of the A-theory.  If I had never encountered the A-theory, the idea that time might work like that would not have occurred to me.  Nonetheless, at the risk of being rude, I am going to speculate about how A-theorists got that way.  (B-theorists, of course, just follow the evidence ;).)

I wonder if the real psycho-philosophical root of the A-theory is the following. If you feel strongly committed to the A-theory, maybe you are being pushed into that position by two conflicting intuitions about your own personal identity.

Intuition 1: On the one hand, you have a notion of personal identity according to which you are just whatever is accessible to your self-awareness right now, plus maybe whatever metaphysical "supporting machinery" allows you to have this kind of self-awareness.

Intuition 2: On the other hand, you feel that you must identify yourself, in some sense, with you-tomorrow.  Otherwise, you can give no "rational" account of the particular way in which you care about and feel responsible for this particular tomorrow-person, as opposed to Britney-Spears-tomorrow, say.

But now you have a problem.  It seems that if you take this second intuition seriously, then the first intuition implies that the experiences of you-tomorrow should be accessible to you-now.  Obviously, this is not the case.  You-tomorrow will have some particular contents of self-awareness, but those contents aren't accessible to you-now.  Indeed, entirely different contents completely fill your awareness now — contents which will not be accessible in this direct and immediate way to you-tomorrow.

So, to hold onto both intuitions, you must somehow block the inference made in the previous paragraph.  One way to do this is to go through the following sequence:

  1. Take the first intuition on board without reservation.
  2. Take the second intuition on board in a modified way: "identify" you-now with you-tomorrow, but don't stop there.  If you left things at this point, the relationship of "identity" would entail a conduit through which all of your tomorrow-awareness should explode into your now, overlaying or crowding out your now-awareness.  You must somehow forestall this inference, so...
  3. Deny that you-tomorrow exists!  At least, deny that it exists in the full sense of the word.  Thus, metaphorically, you put up a "veil of nonexistence" between you-tomorrow and you-now.  This veil of nonexistence explains the absence of the tomorrow-awareness from your present awareness. The tomorrow-awareness is absent because it simply doesn't exist!  (—yet!)  Thus, in step (2), you may safely identify you-now with you-tomorrow.  You can go ahead and open that conduit to the future, without any fear of what would pour through into the now, because there simply is nothing on the other side.

One potential problem with this psychological explanation is that it doesn't explain the significance of "becoming".  Some A-theorists report that a particular basic experience of "becoming" is the immediate reason for their attachment to the A-theory.  But the story above doesn't really have anything to do with "becoming", at least not obviously.  (This is because I can't make heads or tails of "becoming".)

A second potential problem is that intuitions about time, even in their primitive pre-reflective state, are intuitions about everything in time.  Yet the story above is exclusively about oneself in time.  It seems that it would require something more to pass from intuitions about oneself in time to intuitions about how the entire universe is in time.


1 [ETA: In this post, I use the words "A-theory" and "B-theory" as a sloppy shorthand for "presentism" and "eternalism", respectively.  The point is that these are theories of ontology ("Does the future exist?"), and not just theories about how we should talk about time.  This shouldn't seem like merely a semantic or vacuous dispute unless, as in certain caricatures of logical positivism, you think that the question of whether X exists is always just the question of whether X can be directly experienced.]

2 Some people do seem to be attached to the A-theory because they think that the B-theory takes away their free will by implying that what they will choose is already the case right now.  This might explain the emotional significance of the A-theory of time for some people.  But many A-theorists are happy to grant, say, that God already knows what they will do.  I'm trying to understand those A-theorists who aren't bothered by the implications of the B-theory for free will.
