## The Bay Area Solstice

18 03 December 2014 10:33PM

As the holiday season approaches, we continue our tradition of celebrating the winter solstice.

This event is the offspring of Raemon's New York Solstice. The core of the event is a collection of songs old and new, silly and profound, led by the well-calibrated Bayesian choir. There will be bean bag chairs and candles. There will be campfire and chocolates (in case of dementors).

When: The Bay Area Solstice will be held on 13 December at 7:00 PM.

Where: We've rented the Humanist Hall, at 390 27th St, Oakland, CA 94612.

All humanists or transhumanists are welcome. We'll be plunging our minds into the nature of the universe, both good and bad. We'll stare into the abyss of death, and into the radiance of our ability to remove it. We will recognize each other as allies and agents.

We're glad to provide aspiring rationalists with an alternative or addition to any holiday celebrations. There is an expected attendance of around 80 people.

Get your tickets here! And if you'd like to help us put it together, PM me.

## Mathematical Measures of Optimization Power

3 24 November 2012 10:55AM

In explorations of AI risk, it is helpful to formalize concepts. One particularly important concept is intelligence. How can we formalize it, or better yet, measure it? “Intelligence” is often considered mysterious or is anthropomorphized. One way to taboo “intelligence” is to talk instead about optimization processes. An optimization process (OP, also optimization power) selects some futures from a space of possible futures. It does so according to some criterion; that is, it optimizes for something. Eliezer Yudkowsky spends a few of the sequence posts discussing the nature and importance of this concept for understanding AI risk. In them, he informally describes a way to measure the power of an OP. We consider mathematical formalizations of this measure.

Here's EY's original description of his measure of OP.

Put a measure on the state space - if it's discrete, you can just count. Then collect all the states which are equal to or greater than the observed outcome, in that optimization process's implicit or explicit preference ordering. Sum or integrate over the total size of all such states. Divide by the total volume of the state space. This gives you the power of the optimization process measured in terms of the improbabilities that it can produce - that is, improbability of a random selection producing an equally good result, relative to a measure and a preference ordering.

If you prefer, you can take the reciprocal of this improbability (1/1000 becomes 1000) and then take the logarithm base 2. This gives you the power of the optimization process in bits.
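As a rough illustration of the counting version for a discrete state space, the sketch below scores a made-up set of states and reports the power in bits. The function name `optimization_power_bits` and the example scores are assumptions for illustration, not part of EY's description.

```python
import math

def optimization_power_bits(scores, achieved):
    """Bits of optimization under a uniform (counting) measure.

    scores: preference score for every state in the space (higher is better).
    achieved: score of the outcome the process actually produced.
    """
    # Collect all states equal to or better than the observed outcome.
    at_least_as_good = sum(1 for s in scores if s >= achieved)
    if at_least_as_good == 0:
        raise ValueError("achieved outcome is better than every listed state")
    # Divide by the total size of the state space, then take log2 of the reciprocal.
    improbability = at_least_as_good / len(scores)
    return math.log2(1 / improbability)

# Hypothetical state space: 1024 states scored 0..1023.
scores = list(range(1024))
bits_top = optimization_power_bits(scores, 1023)    # only 1 state is this good
bits_median = optimization_power_bits(scores, 512)  # half the states are this good
```

Hitting the single best state out of 1024 comes out at 10 bits, while landing in the top half is worth only 1 bit.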

Let's say that at time $t=0$ we have a formalism to specify all possible world states $w \in W$ at some future time $t=1$. Perhaps it is a list of particle locations and velocities, or perhaps it is a list of all possible universal wave functions. Or maybe we're working in a limited domain, and it's a list of all possible next-move chess boards. Let's also assume that we have a well-justified prior $p(w)$ over these states being the next ones to occur in the absence of an OP (more on that later).

We order $W$ according to the OP's preferences. For the moment, we actually don't care about the density, or “measure”, of our ordering. Now we have a probability distribution over $W$. The integral of $p(w)$ from $w=a$ to $w=b$ represents the probability that the worldstate at $t=1$ will be better than $a$ and worse than $b$. As time continues and the OP acts to bring about some worldstate $\omega$, we can calculate the probability of an equal or better outcome occurring:

$\int_\omega^\infty p(w)dw$
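
Combining this with the reciprocal-and-logarithm step from the quote, one way to write the power in bits is the following. The symbol $\mathrm{OP}(\omega)$ is notation introduced here for convenience, not taken from the original description.

$\mathrm{OP}(\omega) = \log_2 \frac{1}{\int_\omega^\infty p(w)\,dw} = -\log_2 \int_\omega^\infty p(w)\,dw$

For instance, if only $1/1000$ of the prior probability mass lies at or above $\omega$, then $\mathrm{OP}(\omega) = \log_2 1000 \approx 9.97$ bits.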

This is a simple generalization of what EY describes above. Here are some things I am confused about.

Finding a specification for all possible worldstates is hard, but it's been done before. There are many ways to reasonably represent this. What I can't figure out is how to specify possible worldstates “in the absence of an OP”. This phrase hides tons of complexity. How can we formally construct this counterfactual? Is the matter that composes the OP no longer present? Is it present but “not acting”? What constitutes a null action? Are we considering the expected worldstate distribution as if the OP never existed? If the OP is some kind of black-box AI agent, it's easier to imagine this. But if the OP is evolution, or a forest fire, it's harder to imagine. Furthermore, is the specification dualist, or is the agent part of the worldstates? If it's dualist, this is a fundamental falseness which can have lots of bad implications. If the agent is part of the worldstates, how do we represent them “in absence of an OP”?

But for the rest of this article, let's pretend we have such a specification. There's also a loss from ignoring the cardinal utility of the worldstates. Let's say you have two distributions of utility over sets $W$, representing two different OPs. In both, the OP chooses a $w$ with the same utility $u$. The distributions are the same on the left side of $u$, and the second distribution has a longer tail on the right. It seems like the OP in distribution 1 was more impressive; the second OP missed all the available higher utility. We could make the expected utility of the second distribution arbitrarily high, while maintaining the same fraction of probability mass above the achieved worldstate. Conversely, we could instead extend the left tail of the second distribution, and say that the second OP was more impressive because it managed to avoid all the bad worlds.
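
To make the loss concrete, here is a small sketch with two made-up discrete utility distributions. Both put the same probability mass at or above the achieved utility $u$, so the ordinal measure scores the two OPs identically, while the expected utilities differ widely. All numbers and names are hypothetical.

```python
def tail_mass(dist, u):
    """Probability of an outcome with utility at least u."""
    return sum(p for utility, p in dist.items() if utility >= u)

def expected_utility(dist):
    """Probability-weighted average utility of the distribution."""
    return sum(utility * p for utility, p in dist.items())

# Each dict maps utility -> prior probability; the two agree below u = 10.
dist_1 = {0: 0.5, 5: 0.4, 10: 0.05, 11: 0.05}
dist_2 = {0: 0.5, 5: 0.4, 10: 0.05, 1000: 0.05}  # longer right tail

u = 10
same_rank = abs(tail_mass(dist_1, u) - tail_mass(dist_2, u)) < 1e-12
eu_1 = expected_utility(dist_1)
eu_2 = expected_utility(dist_2)
```

Here `same_rank` is true even though `eu_2` is far larger than `eu_1`, which is the information the ordinal measure throws away.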

Perhaps it is more natural to consider two distributions; the distribution of utility over entire world futures assuming the OP isn't present, versus the distribution after the OP takes its action. So instead of selecting a single possibility with certainty, the probabilities have just shifted.

How should we reduce this distribution shift to a single number which we call OP? Any shift of probability mass upwards in utility should increase the measure of OP, and vice versa. I think also that an increase in the expected utility (EU) of these distributions should be measured as a positive OP, and vice versa. EU seems like the critical metric to use. Let's generalize a little further, and say that instead of measuring OP between two points in time, we let the time difference go to zero, and measure instantaneous OP. Therefore we're interested in some equation which has the same sign as

$\frac{dEU}{dt} = EU'$.

Besides that, I'm not exactly sure which specific equation should equal OP. I seem to have two contradicting desires:

1a) The sign of $EU'$ should be the sign of the OP.

1b) Negative $EU$ and $EU'$ should be possible.

2) Constant positive OP should imply exponentially increasing $EU$.

Criterion 1) feels pretty obvious. Criterion 2) feels like a recognition of what is “natural” for OPs; to improve upon themselves, so that they can get better and better returns. The simplest differential equation that represents positive feedback yields exponentials, and is used across many domains because of its universal nature.

$y' = cy$

This intuition certainly isn't anthropocentric, but it might be this-universe biased. I'd be interested in seeing if it is natural in other computable environments.

If we just use $OP = EU'$, then criterion 2) is not satisfied. If we use $OP = \log EU'$, then decreases in EU are not defined, and constant EU is negative infinite OP, violating 1). If we use $OP = EU'/EU$, then 2) is satisfied, but negative and decreasing EU give positive OP, violating 1a). If we use $OP = EU''/EU'$, then 2) is still satisfied, but $EU = at + b$ gives $OP = 0$, violating 1a). Perhaps the only consistent equation would be $OP = EU'/|EU|$. But seriously, who uses absolute values? I can't recall a fundamental equation that relied on them. They feel totally ad hoc. Plus, there's this weird singularity at $EU = 0$. What's up with that?
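
As a quick check of these claims, consider the exponential family $EU(t) = Ae^{ct}$ with $c > 0$. This is a worked example added as an illustration; $A$ and $c$ are arbitrary constants. Then $EU' = Ace^{ct}$ and $EU'' = Ac^2e^{ct}$, so the candidates evaluate to:

$EU'/EU = c, \qquad EU''/EU' = c, \qquad EU'/|EU| = c \cdot \mathrm{sgn}(A)$

For $A > 0$ all three give the constant positive value $c$, as criterion 2) asks. For $A < 0$, EU is negative and decreasing, so $EU' < 0$; of the three, only $EU'/|EU| = -c$ has the matching sign that 1a) requires.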

Classically, utility is invariant up to positive affine transformations. Criterion 1) respects this because the derivative removes the additive constant, but 2) doesn't. It is still scale invariant, but it has an intrinsic zero. This made me consider the nature of “zero utility”. At least for humans, there is an intuitive sign to utility. We wouldn't say that stubbing your toe is 1,000,000 utils, and getting a car is 1,002,000 utils. It seems to me, especially after reading Omohundro's “Basic AI Drives”, that there is in some sense an intrinsic zero utility for all OPs.

All OPs need certain initial conditions to even exist. After that, they need resources. AIs need computer hardware and energy. Evolution needed certain chemicals and energy. Having no resources makes it impossible, in general, to do anything. If you have literally zero resources, you are not a "thing" which "does". So that is a type of intrinsic zero utility. Then what would having negative utility mean? It would mean the OP anti-exists. It's making it even less likely for it to be able to start working toward its utility function. What would exponentially decreasing utility mean? It would mean that it is a constant OP for the negative of the utility function that we are considering. So, it doesn't really have negative optimization power; if that's the result of our calculation, we should negate the utility function, and say it has positive OP. And that singularity at $EU = 0$? When you go from the positive side, getting closer and closer to 0 is really bad, because you're destroying the last bits of your resources; your last chance of doing any optimization. And going from negative utility to positive is infinitely impressive, because you bootstrapped from optimizing away from your goal to optimizing toward your goal.

So perhaps we should drop the part of 1b) that says negative EU can exist. Certainly world-states can exist that are terrible for a given utility function, but if an OP with that utility function exists, then the expected utility of the future is positive.

If this is true, then it seems there is more to the concept of utility than the von Neumann-Morgenstern axioms.

How do people feel about criterion 2), and my proposal that $OP = EU'/|EU|$ ?

## Modifying Universal Intelligence Measure

2 18 September 2012 11:44PM

In 2007, Legg and Hutter wrote a paper using the AIXI model to define a measure of intelligence. It's pretty great, but I can think of some directions of improvement.

• Reinforcement learning. I think this term and formalism are historically from much simpler agent models which actually depended on being reinforced to learn. In its present form (Hutter 2005 section 4.1) it seems arbitrarily general, but it still feels kinda gross to me. Can we formalize AIXI and the intelligence measure in terms of utility functions, instead? And perhaps prove them equivalent?
• Choice of Horizon. AIXI discounts the future by requiring that total future reward is bounded, and therefore so does the intelligence measure. This seems to me like a constraint that does not reflect reality, and possibly an infinitely important one. How could we remove this requirement? (Much discussion on the "Choice of the Horizon" in Hutter 2005 section 5.7).
• Unknown utility function. When we reformulate it in terms of utility functions, let's make sure we can measure its intelligence/optimization power without having to know its utility function. Perhaps by using an average of utility functions weighted by their K-complexity.
• AI orientation. Finally, and least importantly, it tests agents across all possible programs, even those which are known to be inconsistent with our universe. This might be okay if your agent is playing arbitrary games on a computer, but if you are trying to determine how powerful an agent will be in this universe, you probably want to replace the Solomonoff prior with the posterior resulting from updating the Solomonoff prior with data from our universe.

Any thought or research on this by others? I imagine lots of discussion has occurred over these topics; any referencing would be appreciated.

## An Intuitive Explanation of Solomonoff Induction

54 11 July 2012 08:05AM

This is the completed article that Luke wrote the first half of. My thanks go to the following for reading, editing, and commenting: Luke Muehlhauser, Louie Helm, Benjamin Noble, and Francelle Wax.

People disagree about things. Some say that television makes you dumber; others say it makes you smarter. Some scientists believe life must exist elsewhere in the universe; others believe it must not. Some say that complicated financial derivatives are essential to a modern competitive economy; others think a nation's economy will do better without them. It's hard to know what is true.

And it's hard to know how to figure out what is true. Some argue that you should assume the things you are most certain about and then deduce all other beliefs from your original beliefs. Others think you should accept at face value the most intuitive explanations of personal experience. Still others think you should generally agree with the scientific consensus until it is disproved.

Wouldn't it be nice if determining what is true was like baking a cake? What if there was a recipe for finding out what is true? All you'd have to do is follow the written directions exactly, and after the last instruction you'd inevitably find yourself with some sweet, tasty truth!

In this tutorial, we'll explain the closest thing we’ve found so far to a recipe for finding truth: Solomonoff induction.

There are some qualifications to make. To describe just one: roughly speaking, you don't have time to follow the recipe. To find the truth to even a simple question using this recipe would require you to follow one step after another until long after the heat death of the universe, and you can't do that.

But we can find shortcuts. Suppose you know that the exact recipe for baking a cake asks you to count out one molecule of H2O at a time until you have exactly 0.5 cups of water. If you did that, you might not finish the cake before the heat death of the universe. But you could approximate that part of the recipe by measuring out something very close to 0.5 cups of water, and you'd probably still end up with a pretty good cake.

Similarly, once we know the exact recipe for finding truth, we can try to approximate it in a way that allows us to finish all the steps sometime before the sun burns out.

This tutorial explains that best-we've-got-so-far recipe for finding truth, Solomonoff induction. Don’t worry, we won’t be using any equations, just qualitative descriptions.

Like Eliezer Yudkowsky's Intuitive Explanation of Bayes' Theorem and Luke Muehlhauser's Crash Course in the Neuroscience of Human Motivation, this tutorial is long. You may not have time to read it; that's fine. But if you do read it, we recommend that you read it in sections.

## Should LW have a separate AI section?

9 10 July 2012 01:42AM

LessWrong seems to have two main topics for discussion: rationality and AI. This, of course, is caused by Eliezer Yudkowsky's sequences (and interests), which are mostly about rationality, but also include a lot of writing about AI. LessWrong currently has two sections, Main and Discussion. This is meant to separate posts by purpose and quality. I think usability of the site has improved greatly since adding the Discussion section. Should we split the Discussion section, and have one for rationality and one for AI?

• It would make LW more pleasant for rationality enthusiasts who don't want to sort through lots of AI discussions.
• It would make LW more pleasant for AI enthusiasts who don't want to sort through lots of rationality discussions.
• It would not split up the community as much as a new AI community site would.
• It would increase LW's capacity for discussion.
• Newcomers could learn about rationality, while people who want higher quality of discussion on AI topics could have it.

• Many posts are highly relevant to both subjects.
• It would make LW less pleasant for enthusiasts of both subjects.
• It would split up the community more than doing nothing.
• Everybody has their ideas for separate sections, and we can't do them all.

What do you guys think?

## How Bayes' theorem is consistent with Solomonoff induction

9 09 July 2012 10:16PM

You've read the introduction to Bayes' theorem. You've read the introduction to Solomonoff induction. Both describe fundamental theories of epistemic rationality. But how do they fit together?

It turns out that it’s pretty simple. Let’s take a look at Bayes’ theorem.

$P(H|E)=\frac{P(E|H)P(H)}{\sum_{i}P(E|H_i)P(H_i)}$

For a review:

• H is the particular hypothesis in question.
• E is the evidence, or data that we have observed.
• P(H) is the “prior” probability of the hypothesis alone.
• P(E|H) is the probability of seeing the evidence given that the hypothesis is true.
• P(H|E) is the probability that the hypothesis is true, given that you’ve seen the evidence.
• $H_i$ is an arbitrary element in the set of all hypotheses.

In terms of Solomonoff induction:

• H is a particular binary sequence (a program) which we input to the universal Turing machine; the hypotheses range over all such sequences.
• E is the binary sequence of data that we are given to match.
• P(H) is $2^{-l(H)}$ where l(H) is the length of the binary sequence H. This is a basic premise of Solomonoff induction.
• P(E|H) is the probability that we will see data sequence E, given that we run program H on the universal Turing machine. Because this is deterministic it is either 1, if H outputs E, or 0, if H does not output E.
• P(H|E) is still what we’re looking for. Of course, if H does not output E, then P(E|H) = 0, which means P(H|E) = 0. This makes sense; if a program does not output the data you have, it cannot be the true program which output the data you have.
• $H_i$ is an arbitrary binary sequence.

The denominator has the same meaning as the numerator, except that it is summed over every possible hypothesis. This essentially normalizes the probability in the numerator. Any hypothesis that does not match the data E exactly will have $P(E|H_i) = 0$, and therefore that term will contribute nothing to the sum. If the hypothesis does output E exactly, then $P(E|H_i) = 1$, and the matching hypothesis contributes its weight to the renormalizing sum in the denominator.

Let's see an example with these things substituted. Here, the set of $H_i$ is the set of hypotheses that match.

$P(H|E)=\frac{2^{-l(H)}}{\sum_{i}2^{-l(H_i)}}$

In summary: Bayes’ theorem says that once we find all matching hypotheses, we can find the probability of each one by dividing its individual weight of $2^{-l(H)}$ by the sum of the weights of all the matching hypotheses.

This is intuitive, and matches Bayes’ theorem both mathematically and philosophically. Updating will occur when you get more bits of evidence E. This will eliminate some of the hypotheses $H_i$, which will cause the renormalizing sum in the denominator to get smaller.
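
The normalization and update can be sketched in a few lines. A real universal Turing machine is out of reach here, so the code swaps in a toy "machine" that simply prints a program's bits on repeat. That stand-in, the function names, and the cap on program length are all assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def toy_output(program, n):
    """Toy stand-in for a universal Turing machine (an assumption):
    the 'machine' prints the program's bits on repeat; return the first n."""
    reps = n // len(program) + 1
    return (program * reps)[:n]

def posterior(evidence, max_len):
    """Return (posterior over matching programs, normalizing denominator)."""
    weights = {}
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            h = "".join(bits)
            # P(E|H) is 1 if H outputs E and 0 otherwise.
            if toy_output(h, len(evidence)) == evidence:
                weights[h] = Fraction(1, 2 ** length)  # prior 2^-l(H)
    denominator = sum(weights.values())
    return {h: w / denominator for h, w in weights.items()}, denominator

post_2, denom_2 = posterior("01", max_len=3)   # two bits of evidence
post_3, denom_3 = posterior("010", max_len=3)  # one more bit arrives
```

With evidence `01`, the programs `01`, `010` and `011` all match. The third bit eliminates `011`, so the denominator shrinks and the surviving hypotheses gain probability.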

## Computation Hazards

15 13 June 2012 09:49PM
##### This is a summary of material from various posts and discussions. My thanks to Eliezer Yudkowsky, Daniel Dewey, Paul Christiano, Nick Beckstead, and several others.

Several ideas have been floating around LessWrong that can be organized under one concept, relating to a subset of AI safety problems. I’d like to gather these ideas in one place so they can be discussed as a unified concept. To give a definition:

A computation hazard is a large negative consequence that may arise merely from vast amounts of computation, such as in a future supercomputer.

For example, suppose a computer program needs to model people very accurately to make some predictions, and it models those people so accurately that the "simulated" people can experience conscious suffering. In a very large computation of this type, millions of people could be created, suffer for some time, and then be destroyed when they are no longer needed for making the predictions desired by the program. This idea was first mentioned by Eliezer Yudkowsky in Nonperson Predicates.

There are other hazards that may arise in the course of running large-scale computations. In general, we might say that:

Large amounts of computation will likely consist in running many diverse algorithms. Many algorithms are computation hazards. Therefore, all else equal, the larger the computation, the more likely it is to produce a computation hazard.

Of course, most algorithms may be morally neutral. Furthermore, algorithms must be somewhat complex before they could possibly be a hazard. For instance, it is intuitively clear that no eight-bit program could possibly be a computation hazard on a normal computer. Worrying computations therefore fall into two categories: computations that run most algorithms, and computations that are particularly likely to run algorithms that are computation hazards.

An example of a computation that runs most algorithms is a mathematical formalism called Solomonoff induction. First published in 1964, it is an attempt to formalize the scientific process of induction using the theory of Turing machines. It is a brute-force method that finds hypotheses to explain data by testing all possible hypotheses. Many of these hypotheses may be algorithms that describe the functioning of people. At a sufficient precision, these algorithms themselves may experience consciousness and suffering. Taken literally, Solomonoff induction runs all algorithms; therefore it produces all possible computation hazards. If we are to avoid computation hazards, any implemented approximations of Solomonoff induction will need to determine ahead of time which algorithms are computation hazards.

Computations that run most algorithms could also hide in other places. Imagine a supercomputer’s power is being tested on a simple game, like chess or Go. The testing program simply tries all possible strategies, according to some enumeration. The best strategy that the supercomputer finds would be a measure of how many computations it could perform, compared to other computers that ran the same program. If the rules of the game are complex enough to be Turing complete (a surprisingly easy achievement) then this game-playing program would eventually simulate all algorithms, including ones with moral status.

Of course, running most algorithms is quite infeasible simply because of the vast number of possible algorithms. Depending on the fraction of algorithms that are computation hazards, it may be enough for a computation to run an enormous number of algorithms that act as a random sample of all algorithms. Computations of this type might include evolutionary programs, which are blind to the types of algorithms they run until the results are evaluated for fitness. Or they may be Monte Carlo approximations of massive computations.

But if computation hazards are relatively rare, then it will still be unlikely for large-scale computations to stumble across them unguided. Several computations may fall into the second category of computations that are particularly likely to run algorithms that are computation hazards. Here we focus on three types of computations in particular: agents, predictors and oracles. The last two types are especially important because they are often considered safer types of AI than agent-based AI architectures. First I will stipulate definitions for these three types of computations, and then I will discuss the types of computation hazards they may produce.

#### Agents

An agent is a computation which decides between possible actions based on the consequences of those actions. They can be thought of as “steering” the future towards some target, or as selecting a future from the set of possible futures. Therefore they can also be thought of as having a goal, or as maximizing a utility function.

Sufficiently capable agents are extremely powerful because they constitute a feedback loop. As is well known from physics, feedback loops often change their surroundings incredibly quickly and dramatically. Examples include the growth of biological populations, and nuclear reactions. Feedback loops are dangerous if their target is undesirable. Agents will be feedback loops as soon as they are able to improve their ability to improve their ability to move towards their goal. For example, humans can improve their ability to move towards their goal by using their intelligence to make decisions. A student aiming to create cures can use her intelligence to learn chemistry, thereby improving her ability to decide what to study next. But presently, humans cannot improve their intelligence, which would improve their ability to improve their ability to make decisions. The student cannot yet learn how to modify her brain so that she learns subjects more quickly.

#### Predictors

A predictor is a computation which takes data as input, and predicts what data will come next. An example would be certain types of trained neural networks, or any approximation of Solomonoff induction. Intuitively, this feels safer than an agent AI because predictors do not seem to have goals or take actions; they just report predictions as requested by humans.

#### Oracles

An oracle is a computation which takes questions as input, and returns answers. They are broader than predictors in that one could ask an oracle about predictions. Similar to a predictor, oracles do not seem to have goals or take actions. (Some material summarized here.)

### Examples of hazards

Agent-like computations are the most clearly dangerous computation hazards. If any large computation starts running the beginning of a self-improving agent computation, it is difficult to say how far the agent may safely be run before it is a computation hazard. As soon as the agent is sufficiently intelligent, it will attempt to acquire more resources like computing substrate and energy. It may also attempt to free itself from control of the parent computation.

Another major concern is that, because people are an important part of the surroundings, even non-agent predictors or oracles will simulate people in order to make predictions or give answers respectively. Someone could ask a predictor, “What will this engineer do if we give him a contract?” It may be that the easiest way for the predictor to determine the answer is to simulate the internal workings of the given engineer's mind. If these simulations are sufficiently precise, then they will be people in and of themselves. The simulations could cause those people to suffer, and will likely kill them by ending the simulation when the prediction or answer is given.

Similarly, one can imagine that a predictor or oracle might simulate powerful agents; that is, algorithms which efficiently maximize some utility function. Agents may be simulated because many agent-like entities exist in the real world, and their behavior would need to be modeled. Or, perhaps oracles would investigate agents for the purpose of answering questions better. These agents, while being simulated, may have goals that require acting independently of the oracle. These agents may also be more powerful than the oracles, especially since the oracles were not designed with self-improvement behavior in mind. Therefore these agents may attempt to “unbox” themselves from the simulation and begin controlling the rest of the universe. For instance, the agents may use previous questions given to the oracle to deduce the nature of the universe and the psychology of the oracle-creators. (For a fictional example, see That Alien Message.) Or, the agent might somehow distort the output of the predictor or oracle, so that what it predicts causes us to unbox the agent.

Predictors also have the problem of self-fulfilling prophecies (first suggested here). An arbitrarily accurate predictor will know that its prediction will affect the future. Therefore, to be a correct prediction, it must make sure that delivering its prediction doesn’t cause the receiver to act in a way that negates the prediction. Therefore, the predictor may have to choose between predictions which cause the receiver to act in a way that fulfills the prediction. This is a type of control over the user. Since the predictor is super-intelligent, any control may rapidly optimize the universe towards some unknown goal.

Overall, there is a large worry that sufficiently intelligent oracles or predictors may become agents. Besides the above possibilities, some are worried that intelligence is inherently an optimization process, and therefore oracles and predictors are inherently satisfying some utility function. This, combined with the fact that nothing can be causally isolated from the rest of the universe, seems to invite an eventual AI-takeoff.

### Methods for avoiding computational hazards

It is often thought that, while no proposal has yet been shown safe from computational hazards, oracles and predictors are safer than deliberately agent-based AGI. Other methods have been proposed to make these even safer. Armstrong et al. describe many AI safety measures in general. Below we review some possible techniques for avoiding computational hazards specifically.

One obvious safety practice is to limit the complexity, or the size, of computations. In general, this will also limit the algorithm below general intelligence, but it is a good step while progressing towards FAI. Indeed, it is clear that all current prediction or AI systems are too simple either to be general intelligences or to pose a computational hazard.

A proposal for regulating complex oracles or predictors is to develop safety indicators. That is, develop some function that will evaluate the proposed algorithm or model, and return whether it is potentially dangerous. For instance, one could write a simple program that rejects running an algorithm if any part of it is isomorphic to the human genome (since DNA clearly creates general intelligence and people under the right circumstances). Or, to measure the impact of an action suggested by an oracle, one could ask how many humans would be alive one year after the action was taken.

But one should run an algorithm only if one is sure it is not a person. A function that could evaluate an algorithm and return 0 only if it is not a person is called a nonperson predicate. Some algorithms are obviously not people. For example, squaring the numbers from 1 to 100 will not simulate people. Any algorithm whose behavior is periodic with a short period is unlikely to be a person, as is nearly any presently constructed software. But in general this seems extremely difficult to verify. It could be that writing nonperson predicates or other safety indicators is FAI-complete in the sense that if we solve them, we will have discovered friendliness theory. Furthermore, it may be that some attempts to evaluate whether an algorithm is a person actually cause a simulation of a person, by running parts of the algorithm, by modeling a person for comparison, or by other means. Similarly, it may be that attempts to investigate the friendliness of a particular agent cause that agent to unbox itself.
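
As a toy illustration of the "short period" heuristic only, the sketch below flags an observed trace of program states as periodic. The function names, the trace representation, and the period threshold are all assumptions made for this example. Passing such a check is at best weak evidence and falls far short of a real nonperson predicate.

```python
def shortest_period(trace):
    """Smallest k such that trace[i] == trace[i + k] for all valid i.

    Returns len(trace) when no shorter period fits the whole trace.
    """
    n = len(trace)
    for k in range(1, n):
        if all(trace[i] == trace[i + k] for i in range(n - k)):
            return k
    return n

def toy_nonperson_predicate(trace, max_period=8):
    """Return 0 ('not a person') only when the trace repeats with a short
    period over at least two full repetitions; otherwise return 1
    ('don't know'). The threshold of 8 is arbitrary."""
    if not trace:
        return 1  # nothing observed, so nothing can be ruled out
    k = shortest_period(trace)
    if k <= max_period and len(trace) >= 2 * k:
        return 0
    return 1

verdict_loop = toy_nonperson_predicate([1, 2, 3] * 10)      # short cycle
verdict_unknown = toy_nonperson_predicate(list(range(30)))  # no repetition seen
```

The predicate is deliberately one-sided: it answers 0 for the short cycle and falls back to 1 whenever it cannot rule anything out. Note that it inspects a recorded trace, so the algorithm has already been run by the time the check applies.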

Predictors seem to be one of the most goal-agnostic forms of AGI. This makes them a very attractive model in which to perfect safety. Some ideas for avoiding self-fulfilling predictions suggest that we ask the predictor to tell us what it would have predicted if we hadn’t asked (first suggested here). This frees the predictor from requiring itself to make predictions consistent with our behavior. Whether this will work depends on the exact process of the predictor; it may be so accurate that it cannot deal with counterfactuals, and will simply report that it would have predicted that we would have asked anyway. It is also problematic that the prediction is now inaccurate; because it has told us, we will act, possibly voiding any part of the prediction.

A very plausible but non-formal solution is to aim for a soft takeoff. For example, we could build a predictor that is not generally intelligent, and use it to investigate safe ways to advance the situation. Perhaps we could use a sub-general intelligence to safely improve our own intelligence.

Have I missed any major examples in this post? Does “computation hazards” seem like a valid concept as distinct from other types of AI-risks?

### References

Armstrong, S., Sandberg, A., Bostrom, N. (2012). "Thinking inside the box: using and controlling an Oracle AI". Minds and Machines, forthcoming.

Solomonoff, R. (1964). "A Formal Theory of Inductive Inference, Part I". Information and Control, Vol. 7, No. 1, pp. 1-22, March 1964.

Solomonoff, R. (1964). "A Formal Theory of Inductive Inference, Part II". Information and Control, Vol. 7, No. 2, pp. 224-254, June 1964.

## How do you notice when you're procrastinating?

4 02 March 2012 09:25AM

I'm going to steal Anna's idea and change it to the instrumental side of rationality. In Luke's algorithm for beating procrastination, Step 1 is to Notice You Are Procrastinating. I'm not so sure this is easy. For me, the knowledge sort of fades in and out without being explicitly grabbed by my consciousness. If I actually held onto that fact, the moment that I was evading a task, and made it clear to myself that I was doing the sub-optimal, and the consequences involved, I think it would go a long way towards getting me to actually get things done.

What do you use to catch it? How do you notice you're procrastinating? Leave your ideas below (one idea per comment), and upvote the comments that you either: (a) use; or (b) will now try using.

## [LINK] The NYT on Everyday Habits

6 18 February 2012 08:23AM

The New York Times just published this article on how companies use data mining and the psychology of habit formation to effectively target ads.

The process within our brains that creates habits is a three-step loop. First, there is a cue, a trigger that tells your brain to go into automatic mode and which habit to use. Then there is the routine, which can be physical or mental or emotional. Finally, there is a reward, which helps your brain figure out if this particular loop is worth remembering for the future. Over time, this loop — cue, routine, reward; cue, routine, reward — becomes more and more automatic. The cue and reward become neurologically intertwined until a sense of craving emerges.

It has some decent depth of discussion, including an example of the author actually using the concepts to stop a bad habit. The article is based on an upcoming book by the same author titled The Power of Habit.

I haven't seen this particular phenomenon (habits consisting of a cue, routine, and reward) emphasized on LessWrong. Do people think it's a valid, scientifically supported phenomenon? The article gives this impression but, of course, doesn't cite specific academic work on it. It ties in to the System 1/System 2 theory easily as a System 1 process. How much of the whole System 1 can be explained as an implementation of this cue, routine, reward process?

And most importantly, how can this fit into the procrastination equation as a tool to subvert akrasia and establish good habits?

Let's look at each of the four factors. If you've formed a habit, it means that the reward happened consistently, which means you have high expectancy. Given that it is a reward, the value is at least positive, but probably not large. Since habits mostly work on small time scales, delay is probably very small. And maybe increased habit formation means your impulsiveness is low. Each of these effects would increase motivation. In addition, because it's part of System 1, there is little energy cost to performing the habit, like there would be with many other conscious actions.
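As a rough check on that reasoning, here is a small Python sketch. It assumes the commonly quoted form of the procrastination equation, Motivation = (Expectancy × Value) / (Impulsiveness × Delay); the function name `motivation` and all of the numbers are made up for illustration, and only the direction of each effect matters.

```python
def motivation(expectancy: float, value: float,
               impulsiveness: float, delay: float) -> float:
    """Assumed form: (expectancy * value) / (impulsiveness * delay)."""
    if impulsiveness <= 0 or delay <= 0:
        raise ValueError("impulsiveness and delay must be positive")
    return (expectancy * value) / (impulsiveness * delay)


# Hypothetical habit: reliable but small reward, arriving almost at once.
habit = motivation(expectancy=0.9, value=2.0, impulsiveness=0.5, delay=0.1)

# Hypothetical deliberate task: bigger payoff, less certain, far away.
deliberate = motivation(expectancy=0.5, value=5.0, impulsiveness=1.0, delay=10.0)

# Each habit-related change, taken alone, moves motivation upward.
baseline = motivation(0.5, 2.0, 1.0, 1.0)
higher_expectancy = motivation(0.9, 2.0, 1.0, 1.0)
shorter_delay = motivation(0.5, 2.0, 1.0, 0.1)
lower_impulsiveness = motivation(0.5, 2.0, 0.5, 1.0)
```

Under these made-up numbers the habit comes out well ahead of the deliberate task even though its value is smaller, mostly because the delay sits in the denominator. The sketch says nothing about whether habit formation really lowers impulsiveness, which is the most speculative step in the paragraph above.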

Does this explanation sound legitimate, or like an argument for the bottom line?

Personally, I can tell that context is a strong cue for behavior at work, school, and home. When I go into work, I'm automatically motivated to perform well, and that motivation remains for several hours. When I go into class, I'm automatically ready to focus on difficult material, or even enthusiastically take a test. Yet when I go home, something about the context switches that off, and I can't seem to get anything done at all. It might be worth significant experimentation to find out what cues trigger both modes, and change my contexts to induce what I want.

What do you think?

Edit: this phenomenon has been covered on LW in the form of operant conditioning in posts by Yvain.

## [LINK] Learning enhancement using "transcranial direct current stimulation"

7 26 January 2012 04:18PM

Article here:

http://www.ox.ac.uk/media/science_blog/brainboosting.html

Recent research in Oxford and elsewhere has shown that one type of brain stimulation in particular, called transcranial direct current stimulation or TDCS, can be used to improve language and maths abilities, memory, problem solving, attention, even movement.

Critically, this is not just helping to restore function in those with impaired abilities. TDCS can be used to enhance healthy people’s mental capacities. Indeed, most of the research so far has been carried out in healthy adults.

The article goes on to discuss the ethics of the technique.
