
[Link] Chalmers on Computation: A first step From Physics to Metaethics?

0 john_ku 18 November 2014 10:39AM

A Computational Foundation for the Study of Cognition by David Chalmers

Abstract from the paper:

Computation is central to the foundations of modern cognitive science, but its role is controversial. Questions about computation abound: What is it for a physical system to implement a computation? Is computation sufficient for thought? What is the role of computation in a theory of cognition? What is the relation between different sorts of computational theory, such as connectionism and symbolic computation? In this paper I develop a systematic framework that addresses all of these questions.

Justifying the role of computation requires analysis of implementation, the nexus between abstract computations and concrete physical systems. I give such an analysis, based on the idea that a system implements a computation if the causal structure of the system mirrors the formal structure of the computation. This account can be used to justify the central commitments of artificial intelligence and computational cognitive science: the thesis of computational sufficiency, which holds that the right kind of computational structure suffices for the possession of a mind, and the thesis of computational explanation, which holds that computation provides a general framework for the explanation of cognitive processes. The theses are consequences of the facts that (a) computation can specify general patterns of causal organization, and (b) mentality is an organizational invariant, rooted in such patterns. Along the way I answer various challenges to the computationalist position, such as those put forward by Searle. I close by advocating a kind of minimal computationalism, compatible with a very wide variety of empirical approaches to the mind. This allows computation to serve as a true foundation for cognitive science.

See my welcome thread submission for a brief description of how I conceive of this as the first step towards formalizing friendliness.

Truth Tables

-7 johnlawrenceaspden 12 December 2013 10:16PM

Summary: There's a short program that can run all possible programs, OMG

Alternative Summary: Just about any universe that can exist must contain ours, OMG.

I was thinking about diagonalization arguments. Does this make any sense? Can anyone debug it for me?

Self-Evident Truths


The universe is computable.

All computations can be performed by Turing machines.

The mind is made out of atoms.

The name of the empty string is epsilon.

Truths


Binary strings are enumerable: epsilon, 0, 1, 00, 01, 10, 11, 000, ...

There's a way of converting binary strings to Turing machines and back again, and the correspondence covers all Turing machines (one way to set up the enumeration is sketched just below).
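
Since the later steps lean on this enumeration, here is a minimal sketch of it in Python (my own illustration, not part of the original argument), using 1-based shortlex order; the string-to-machine encoding itself is left abstract here, since only its existence matters.

def string_to_index(s):
    """1-based shortlex position of a binary string: epsilon -> 1, '0' -> 2, '1' -> 3, '00' -> 4, ..."""
    shorter = 2 ** len(s) - 1              # strings strictly shorter than s
    within = int(s, 2) if s else 0         # same-length strings that precede s
    return shorter + within + 1

def index_to_string(n):
    """Inverse of string_to_index."""
    length = 0
    while n > 2 ** length:                 # skip past all shorter lengths
        n -= 2 ** length
        length += 1
    return format(n - 1, "b").zfill(length) if length else ""

assert index_to_string(1) == ""            # epsilon
assert [index_to_string(i) for i in range(2, 8)] == ["0", "1", "00", "01", "10", "11"]
assert string_to_index("1001111") == 207   # the string used in the worked example below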

Consequence


Consider tables TT(n), which are indexed by binary strings along the top and side, representing Turing machines and inputs respectively.

Construct a series of finite triangular tables which are defined iteratively as follows:

TT(1)

To construct the first table, we need one element, the top left, which corresponds to the Turing machine epsilon working on the input string epsilon.

Convert the string epsilon to its corresponding Turing machine, which has no states and no transition rules, and which therefore halts immediately without accepting.


The value of TT(1)(epsilon,epsilon) is therefore F0 (Fail after no steps).

     eps
eps  F0

That's it for TT(1).

 

TT(2)

TT(2) will have three cells, the top-leftmost three.

To construct the second table, consider again the element at the top left. Since it represents a halted state, copy it verbatim from the previous table.


Then consider the element corresponding to (epsilon, 0), i.e. row epsilon, column 0, or (TM(epsilon), "0").

Again, TM(epsilon) is either a machine with no states or an invalid specification, and so it halts immediately on the input 0.

For the third step in constructing the second triangle, consider (TM("0"), epsilon).

TM("0") is another dud. It halts immediately without accepting.

So TT(2) is the table

     eps  0
eps  F0   F0
0    F0

 

Towards Infinity...

It should be reasonably clear how to continue the construction of these tables.

For instance TT(3) looks like

     eps 0  1
eps  F0  F0 F0
0    F0  F0
1    F0 


Eventually we will reach a row whose string represents a TM which does something other than halt immediately without accepting.

As an example of what to do then, consider the string 1001111, which is the 207th string (there are 127 strings of length less than 7, and 1001111 is binary for 79, making it the 80th string of length 7; 127 + 80 = 207).

In the usual encoding, this string will represent the TM with start state 0, and transition function delta(0,B)=(0,0,L).

The first time we consider the 207th row will be when we calculate TT(207), the 207th triangular table.

The first cell we consider in that row will be ("1001111", epsilon). Since 1001111 represents a valid machine, we construct the instantaneous description (ID) (epsilon, 0, epsilon), which represents a machine in state 0 with a blank tape.

That's the value of TT(207)("1001111",epsilon).

When we construct TT(208), we take that ID from TT(207) and execute one step. In this case the machine writes a zero and the head moves one step left, so the instantaneous ID becomes (epsilon, 0, 0) (state zero, tape reads ...0..., head just to the left of the zero).

And so on. The values of TT(n)("1001111", epsilon) are undefined for n < 207, since the earlier tables are not that large, but from n = 207 onwards they are: (epsilon, 0, epsilon), (epsilon, 0, 0), (epsilon, 0, 00), (epsilon, 0, 000), and so on, with the tape head moving ever leftward, leaving a run of zeros behind it.
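
For concreteness, here is a small sketch (mine, following the post's conventions rather than any standard textbook notation) that steps this particular machine and prints its instantaneous descriptions as (tape left of head, state, tape at and to the right of the head), with blank cells suppressed:

# The machine the post encodes as "1001111": start state 0, with the single rule
# delta(0, B) = (0, 0, L): on a blank, write 0, move left, stay in state 0.
def run_1001111(steps):
    tape, head, state = {}, 0, 0            # sparse tape; missing cells are blank
    ids = []
    for _ in range(steps + 1):
        cells = sorted(tape)
        left = "".join(tape[i] for i in cells if i < head) or "epsilon"
        right = "".join(tape[i] for i in cells if i >= head) or "epsilon"
        ids.append((left, state, right))
        tape[head] = "0"                    # apply the rule: write 0 ...
        head -= 1                           # ... and move left (state stays 0)
    return ids

print(run_1001111(3))
# [('epsilon', 0, 'epsilon'), ('epsilon', 0, '0'), ('epsilon', 0, '00'), ('epsilon', 0, '000')]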

Other strings may represent TMs that do things that are even more interesting.


But Not Beyond, (Or Even As Far As)

 

As we construct the successive tables, they become larger and larger. Some cells will stabilize after a finite number of steps, either accepting or rejecting their input strings, at which point their contents become either F??? or A???, where ??? is the number of steps taken.

Some cells will loop. By comparing a cell's current instantaneous description with its values in all previous tables, loops can be detected. We can mark such cells as loops, continue computing as before, or reuse the values from the previous tables to avoid performing the computations again.

And some cells, like the one at ("1001111", epsilon), will keep producing new instantaneous IDs without looping.

But every TT(n) is a finitely computable object. Indeed the program to compute them all is very short and will run on any computer worthy of the name.
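
To make that claim concrete, here is a rough dovetailing sketch (my own, with stand-in decode/step functions rather than a real Turing machine simulator, so that every cell halts immediately like the dud machines above); it also shows one way to do the loop detection mentioned earlier, by remembering each cell's previously seen descriptions.

from itertools import count

def strings():
    """Binary strings in shortlex order: epsilon, 0, 1, 00, 01, 10, 11, ..."""
    yield ""
    for length in count(1):
        for k in range(2 ** length):
            yield format(k, "b").zfill(length)

# Stand-ins: a real implementation would decode each string into a Turing machine
# and single-step its instantaneous descriptions. Here every string decodes to a
# "dud" machine with no rules, so every cell halts at once with ('F', 0), as in
# the early tables above.
def decode(s):
    return None

def initial_id(machine, input_string):
    return ("F", 0)                         # fail after zero steps

def step(machine, cfg):
    return cfg

HALTED = ("F", "A", "LOOP")

def truth_tables():
    """Yield TT(1), TT(2), ...; TT(n) holds the cells (row i, col j) with i + j <= n + 1."""
    table, seen, names = {}, {}, []
    for n, new_string in enumerate(strings(), start=1):
        names.append(new_string)
        # First advance every existing cell that has not halted or looped.
        for key, cell in table.items():
            if cell[0] in HALTED:
                continue
            nxt = step(decode(key[0]), cell)
            table[key] = ("LOOP",) if nxt in seen[key] else nxt
            seen[key].add(nxt)
        # Then start the new anti-diagonal of cells from their initial IDs.
        for i in range(1, n + 1):
            key = (names[i - 1], names[n - i])
            table[key] = initial_id(decode(key[0]), key[1])
            seen[key] = {table[key]}
        yield dict(table)

tt = truth_tables()
print(next(tt))   # TT(1): {('', ''): ('F', 0)}
print(next(tt))   # TT(2): three cells, all ('F', 0)

Plugging in a real decoder and stepper is the only part that takes any work; the dovetailing skeleton itself stays about this small.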

I Find This A Bit Worrying, Because:


As we continue to construct the successive tables, we will perform every conceivable computation.

We will simulate in precise detail every possible world, universe and multiverse. Even though our computer is not quantum, we will simulate all quantum computations.

In particular, if you believe that you are a computation, or that simulation of your brain is equivalent to your existence, then you will be present in the computation, with exactly as much free will, and with your behaviour as precisely determined, as it ever was or will be.

Some of you will live in universes in which artificial intelligences rise and successfully paperclip everything.

Some of you will live in universes where friendly AIs are built, if that is possible.

It will not be possible for these copies to tell which copy they are, and so they will not be able to tell what is about to happen or what has happened, any more than you can.

There will be hells and paradises.

In some universes, copies of you will set programs running to calculate the successive triangular tables TT(n), and they will keep adding memory to their computers as needed (at worst, only one extra storage location per step of computation).

And so the sequence of finite truth tables will contain itself, as well as everything else. Everything that has ever happened will happen again. You will be reborn and live and possibly die.

You will not know whether you are in the 'base universe', or in the computation. If that even means anything.

And every computation that occurs as part of this great computation is utterly "Beyond the Reach of God".

Exercises


1/ It will take me about a day, a packet of cigars and a machine full of coffee to write this program and start it running. That is what I am doing now. When I start it running, will I have done a bad thing? If someone were to stop me before the program started running, would that make any difference to anything important?

2/ Can anything except what is computed by this program be said to exist in any sense? Continua, and sets of all sets, and so on, are very problematic. And if the human mind is itself a computation, contained in a universe which itself is a computation, how can we think of or interact with any non-computable thing?

3/ Does the ultimate Truth Table, which the finite tables TT(n) approach as the process continues, exist? What does that question mean? The values of many of its cells are determined. Many of them are not computable. The ratio is unclear.

4/ We can perform the initial stages of this computation with a finite computer with finite memory. At no point does the amount of memory required become infinite. If the computational power of the universe is infinite, then it can contain not only itself but every other thing. If the computational power of the universe is finite, where does that number come from?

5/ The program is very short. Any randomly chosen computation has a good chance of being it. It would probably be very hard to construct an interesting universe which did not contain every possible universe and person.

6/ Does it make any sense to talk about 'not being part of this computation'?

Computation Hazards

14 Alex_Altair 13 June 2012 09:49PM
This is a summary of material from various posts and discussions. My thanks to Eliezer Yudkowsky, Daniel Dewey, Paul Christiano, Nick Beckstead, and several others.

Several ideas have been floating around LessWrong that can be organized under one concept, relating to a subset of AI safety problems. I’d like to gather these ideas in one place so they can be discussed as a unified concept. To give a definition:

A computation hazard is a large negative consequence that may arise merely from vast amounts of computation, such as in a future supercomputer.

For example, suppose a computer program needs to model people very accurately to make some predictions, and it models those people so accurately that the "simulated" people can experience conscious suffering. In a very large computation of this type, millions of people could be created, suffer for some time, and then be destroyed when they are no longer needed for making the predictions desired by the program. This idea was first mentioned by Eliezer Yudkowsky in Nonperson Predicates.

There are other hazards that may arise in the course of running large-scale computations. In general, we might say that:

Large amounts of computation will likely consist of running many diverse algorithms. Many algorithms are computation hazards. Therefore, all else equal, the larger the computation, the more likely it is to produce a computation hazard.

Of course, most algorithms may be morally neutral. Furthermore, algorithms must be somewhat complex before they could possibly be a hazard. For instance, it is intuitively clear that no eight-bit program could possibly be a computation hazard on a normal computer. Worrying computations therefore fall into two categories: computations that run most algorithms, and computations that are particularly likely to run algorithms that are computation hazards.

An example of a computation that runs most algorithms is a mathematical formalism called Solomonoff induction. First published in 1964, it is an attempt to formalize the scientific process of induction using the theory of Turing machines. It is a brute-force method that finds hypotheses to explain data by testing all possible hypotheses. Many of these hypotheses may be algorithms that describe the functioning of people. At a sufficient precision, these algorithms themselves may experience consciousness and suffering. Taken literally, Solomonoff induction runs all algorithms; therefore it produces all possible computation hazards. If we are to avoid computation hazards, any implemented approximations of Solomonoff induction will need to determine ahead of time which algorithms are computation hazards.
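
As a heavily hedged illustration of that brute-force flavour, here is a toy in the same spirit (my own sketch, not Solomonoff's actual construction, and with a deliberately weak stand-in interpreter in place of a universal machine): enumerate programs in length order, keep those that reproduce the observed data, and weight each survivor by 2^-length.

from itertools import count, islice

def programs():
    """Bit strings in shortlex order, standing in for programs of a universal machine."""
    yield ""
    for length in count(1):
        for k in range(2 ** length):
            yield format(k, "b").zfill(length)

def run(program, n_bits):
    """Stand-in interpreter: a program 'outputs' its own bits repeated forever.
    A real approximation would run a universal Turing machine here, under a step budget."""
    if not program:
        return None
    return "".join(program[i % len(program)] for i in range(n_bits))

def predict_next_bit(observed, max_programs=10_000):
    """Weight every program that reproduces `observed` by 2**-len(program), and
    report how much total weight predicts 0 versus 1 as the next bit."""
    weights = {"0": 0.0, "1": 0.0}
    for p in islice(programs(), max_programs):
        out = run(p, len(observed) + 1)
        if out and out[:len(observed)] == observed:
            weights[out[len(observed)]] += 2.0 ** -len(p)
    total = weights["0"] + weights["1"]
    return {bit: w / total for bit, w in weights.items()} if total else None

print(predict_next_bit("010101"))   # most of the weight continues the alternation with '0'

Even this toy makes the worry visible: the loop runs whatever the enumeration hands it, and a version built on a genuinely universal machine would have no idea which of those programs describe people.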

Computations that run most algorithms could also hide in other places. Imagine a supercomputer’s power is being tested on a simple game, like chess or Go. The testing program simply tries all possible strategies, according to some enumeration. The best strategy that the supercomputer finds would be a measure of how many computations it could perform, compared to other computers that ran the same program. If the rules of the game are complex enough to be Turing complete (a surprisingly easy achievement) then this game-playing program would eventually simulate all algorithms, including ones with moral status.

Of course, running most algorithms is quite infeasible simply because of the vast number of possible algorithms. Depending on the fraction of algorithms that are computation hazards, it may be enough for a computation to run an enormous number of algorithms that act as a random sample of all of them. Computations of this type might include evolutionary programs, which are blind to the types of algorithms they run until the results are evaluated for fitness. Or they may be Monte Carlo approximations of massive computations.

But if computation hazards are relatively rare, then it will still be unlikely for large-scale computations to stumble across them unguided. Several computations may fall into the second category of computations that are particularly likely to run algorithms that are computation hazards. Here we focus on three types of computations in particular: agents, predictors and oracles. The last two types are especially important because they are often considered safer types of AI than agent-based AI architectures. First I will stipulate definitions for these three types of computations, and then I will discuss the types of computation hazards they may produce.

Agents

An agent is a computation which decides between possible actions based on the consequences of those actions. It can be thought of as “steering” the future towards some target, or as selecting a future from the set of possible futures. It can therefore also be thought of as having a goal, or as maximizing a utility function.

Sufficiently capable agents are extremely powerful because they constitute a feedback loop. Well-known from physics, feedback loops often change their surroundings incredibly quickly and dramatically. Examples include the growth of biological populations, and nuclear reactions. Feedback loops are dangerous if their target is undesirable. Agents will be feedback loops as soon as they are able to improve their ability to improve their ability to move towards their goal. For example, humans can improve their ability to move towards their goal by using their intelligence to make decisions. A student aiming to create cures can use her intelligence to learn chemistry, thereby improving her ability to decide what to study next. But presently, humans cannot improve their intelligence, which would improve their ability to improve their ability to make decisions. The student cannot yet learn how to modify her brain in order for her to more quickly learn subjects.

Predictors

A predictor is a computation which takes data as input, and predicts what data will come next. An example would be certain types of trained neural networks, or any approximation of Solomonoff induction. Intuitively, this feels safer than an agent AI because predictors do not seem to have goals or take actions; they just report predictions as requested by humans.

Oracles

An oracle is a computation which takes questions as input, and returns answers. Oracles are broader than predictors in that one could, among other things, ask an oracle for predictions. Like predictors, oracles do not seem to have goals or take actions. (Some material summarized here.)

Examples of hazards

Agent-like computations are the most clearly dangerous computation hazards. If any large computation starts running the beginning of a self-improving agent computation, it is difficult to say how far the agent may safely be run before it is a computation hazard. As soon as the agent is sufficiently intelligent, it will attempt to acquire more resources like computing substrate and energy. It may also attempt to free itself from control of the parent computation.

Another major concern is that, because people are an important part of the surroundings, even non-agent predictors or oracles will simulate people in order to make predictions or give answers respectively. Someone could ask a predictor, “What will this engineer do if we give him a contract?” It may be that the easiest way for the predictor to determine the answer is to simulate the internal workings of the given engineer's mind. If these simulations are sufficiently precise, then they will be people in and of themselves. The simulations could cause those people to suffer, and will likely kill them by ending the simulation when the prediction or answer is given.

Similarly, one can imagine that a predictor or oracle might simulate powerful agents; that is, algorithms which efficiently maximize some utility function. Agents may be simulated because many agent-like entities exist in the real world, and their behavior would need to be modeled. Or, perhaps oracles would investigate agents for the purpose of answering questions better. These agents, while being simulated, may have goals that require acting independently of the oracle. These agents may also be more powerful than the oracles, especially since the oracles were not designed with self-improvement behavior in mind. Therefore these agents may attempt to “unbox” themselves from the simulation and begin controlling the rest of the universe. For instance, the agents may use previous questions given to the oracle to deduce the nature of the universe and the psychology of the oracle-creators. (For a fictional example, see That Alien Message.) Or the agent might somehow distort the output of the predictor or oracle, so that the delivered prediction or answer causes us to unbox the agent.

Predictors also have the problem of self-fulfilling prophecies (first suggested here). An arbitrarily accurate predictor will know that its prediction will affect the future. Therefore, to be a correct prediction, it must make sure that delivering its prediction doesn’t cause the receiver to act in a way that negates the prediction. Therefore, the predictor may have to choose between predictions which cause the receiver to act in a way that fulfills the prediction. This is a type of control over the user. Since the predictor is super-intelligent, any control may rapidly optimize the universe towards some unknown goal.

Overall, there is a large worry that sufficiently intelligent oracles or predictors may become agents. Besides the above possibilities, some are worried that intelligence is inherently an optimization process, and that oracles and predictors are therefore inherently satisfying some utility function. This, combined with the fact that nothing can be causally isolated from the rest of the universe, seems to invite an eventual AI takeoff.

Methods for avoiding computation hazards

It is often thought that, while no proposal has yet been shown safe from computation hazards, oracles and predictors are safer than deliberately agent-based AGI. Other methods have been proposed to make these even safer. Armstrong et al. describe many AI safety measures in general. Below we review some possible techniques for avoiding computation hazards specifically.

One obvious safety practice is to limit the complexity or size of computations. In general, this will also limit the algorithm to below general intelligence, but it is a good step while progressing towards FAI. Indeed, it is clear that all current prediction or AI systems are too simple either to be general intelligences or to pose a computation hazard.

A proposal for regulating complex oracles or predictors is to develop safety indicators. That is, develop some function that will evaluate the proposed algorithm or model, and return whether it is potentially dangerous. For instance, one could write a simple program that rejects running an algorithm if any part of it is isomorphic to the human genome (since DNA clearly creates general intelligence and people under the right circumstances). Or, to measure the impact of an action suggested by an oracle, one could ask how many humans would be alive one year after the action was taken.

But one could only run an algorithm if one were sure it was not a person. A function that can evaluate an algorithm and return 0 only if it is not a person is called a nonperson predicate. Some algorithms are obviously not people. For example, squaring the numbers from 1 to 100 will not simulate people, and neither will nearly any presently constructed software; any algorithm whose behavior is periodic with a short period is unlikely to be a person. But in general this seems extremely difficult to verify. It could be that writing nonperson predicates or other safety indicators is FAI-complete, in the sense that if we solve them, we will have discovered friendliness theory. Furthermore, it may be that some attempts to evaluate whether an algorithm is a person actually cause a simulation of a person, by running parts of the algorithm, by modeling a person for comparison, or by other means. Similarly, it may be that attempts to investigate the friendliness of a particular agent cause that agent to unbox itself.
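
To make the intended shape of such a predicate concrete, here is a toy sketch (my own, under obviously simplistic assumptions, and nothing like a solution): it certifies an algorithm only if, within a small budget, its state sequence halts or settles into a short-period cycle, echoing the squaring example and the "short period" observation above, and it refuses to certify everything else.

from enum import Enum

class Verdict(Enum):
    NOT_A_PERSON = 0     # safe to run, by these deliberately crude criteria
    UNKNOWN = 1          # refuse to certify; do not run

def nonperson_predicate(step, initial_state, max_states=1000, max_period=16):
    """Certify NOT_A_PERSON only if, within max_states steps, the state sequence
    halts (step returns None) or enters a cycle of period <= max_period.
    Everything else is UNKNOWN.  Caveat from the post itself: even evaluating an
    algorithm this way means partly running it, which a real predicate must avoid
    doing to anything that might be a person."""
    seen, state = {}, initial_state
    for t in range(max_states):
        if state is None:
            return Verdict.NOT_A_PERSON              # halted: too simple to be a person
        if state in seen and t - seen[state] <= max_period:
            return Verdict.NOT_A_PERSON              # short-period loop
        seen[state] = t
        state = step(state)
    return Verdict.UNKNOWN                           # could not rule it out

# The post's example: squaring the numbers from 1 to 100 halts after 100 trivial states.
def squares_step(i):
    _ = i * i                                        # the "work"; result discarded here
    return i + 1 if i < 100 else None

print(nonperson_predicate(squares_step, 1))          # Verdict.NOT_A_PERSON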

Predictors seem to be one of the most goal-agnostic forms of AGI. This makes them a very attractive model in which to perfect safety. Some ideas for avoiding self-fulfilling predictions suggest that we ask the predictor to tell us what it would have predicted if we hadn’t asked (first suggested here). This frees the predictor from requiring itself to make predictions consistent with our behavior. Whether this will work depends on the exact process of the predictor; it may be so accurate that it cannot deal with counterfactuals, and will simply report that it would have predicted that we would have asked anyway. It is also problematic that the prediction is now inaccurate; because it has told us, we will act, possibly voiding any part of the prediction.

A very plausible but non-formal solution is to aim for a soft takeoff. For example, we could build a predictor that is not generally intelligent, and use it to investigate safe ways to advance the situation. Perhaps we could use a sub-general intelligence to safely improve our own intelligence.

Have I missed any major examples in this post? Does “computation hazards” seem like a valid concept as distinct from other types of AI-risks?

References

Armstrong S., Sandberg A., Bostrom N. (2012). “Thinking inside the box: using and controlling an Oracle AI”. Minds and Machines, forthcoming.

Solomonoff, R. (1964). "A Formal Theory of Inductive Inference, Part I". Information and Control, Vol. 7, No. 1, pp. 1-22.

Solomonoff, R. (1964). "A Formal Theory of Inductive Inference, Part II". Information and Control, Vol. 7, No. 2, pp. 224-254.

2011 Buhl Lecture, Scott Aaronson on Quantum Complexity

8 p4wnc6 09 July 2011 04:43AM

I was planning to post this in the main area, but my thoughts are significantly less well-formed than I thought they were. Anyway, I hope that interested parties find it nonetheless.

In the Carnegie Mellon 2011 Buhl Lecture, Scott Aaronson gives a remarkably clear and concise review of P, NP, other fundamentals in complexity theory, and their quantum extensions. In particular, beginning around the 46 minute mark, a sequence of examples is given in which the intuition from computability theory would have accurately predicted physical results (and in some cases this actually happened, so it wasn't just hindsight bias). 

In previous posts we have learned about Einstein's arrogance and Einstein's speed. This pattern of results flowing from computational complexity to physical predictions seems odd to me in that context. Here we are using physical computers to derive abstractions about the limits of computation, and from there we are successfully able to intuit limits of physical computation (e.g. brains computing abstractions of the fundamental limits of brains computing abstractions...) At what point do we hit the stage where individual scientists can rationally know that results from computational complexity theory are more fundamental than traditional physics? It seems like a paradox wholly different from Einstein rationally knowing (from examining bits of theory-space evidence rather than traditional-experiment-space evidence) that relativity would hold true. In what sort of evidence space can physical brain computation yielding complexity limits count as bits of evidence factoring into expected physical outcomes (such as the exponential smallness of the spectral gap of NP-hard Hamiltonians from the quantum adiabatic theorem)?

Maybe some contributors more well-versed in complexity theory can steer this in a useful direction.