
[Link] Chalmers on Computation: A First Step from Physics to Metaethics?

0 john_ku 18 November 2014 10:39AM

A Computational Foundation for the Study of Cognition by David Chalmers

Abstract from the paper:

Computation is central to the foundations of modern cognitive science, but its role is controversial. Questions about computation abound: What is it for a physical system to implement a computation? Is computation sufficient for thought? What is the role of computation in a theory of cognition? What is the relation between different sorts of computational theory, such as connectionism and symbolic computation? In this paper I develop a systematic framework that addresses all of these questions.

Justifying the role of computation requires analysis of implementation, the nexus between abstract computations and concrete physical systems. I give such an analysis, based on the idea that a system implements a computation if the causal structure of the system mirrors the formal structure of the computation. This account can be used to justify the central commitments of artificial intelligence and computational cognitive science: the thesis of computational sufficiency, which holds that the right kind of computational structure suffices for the possession of a mind, and the thesis of computational explanation, which holds that computation provides a general framework for the explanation of cognitive processes. The theses are consequences of the facts that (a) computation can specify general patterns of causal organization, and (b) mentality is an organizational invariant, rooted in such patterns. Along the way I answer various challenges to the computationalist position, such as those put forward by Searle. I close by advocating a kind of minimal computationalism, compatible with a very wide variety of empirical approaches to the mind. This allows computation to serve as a true foundation for cognitive science.

See my welcome thread submission for a brief description of how I conceive of this as the first step towards formalizing friendliness.

Why Politics are Important to Less Wrong...

6 OrphanWilde 21 February 2013 04:24PM

...and no, it's not because of potential political impact on its goals.  Although that's also a thing.

The Politics problem is, at its root, about forming a workable set of rules by which society can operate, which society can agree with.

The Friendliness Problem is, at its root, about forming a workable set of values which are acceptable to society.

Politics as a process (I will use "politics" to refer to the process of politics henceforth) doesn't generate values; values are strictly an input. Politics converts the values of society into rules intended to maximize those values.  And it is value-agnostic; it doesn't care what the values are, or where they come from.  Which is to say, provided you solve the Friendliness Problem, its solution provides a valuable input into politics.

Politics is also an intelligence.  Not in the "self-aware" sense, or even in the "capable of making good judgments" sense, but in the sense of an optimization process.  We're each nodes in this alien intelligence, and we form what looks, to me, suspiciously like a neural network.

The Friendliness Problem is as applicable to Politics as it is to any other intelligence.  Indeed, provided we can provably solve the Friendliness Problem, we should be capable of creating Friendly Politics.  Friendliness should, in principle, be equally applicable to both.  Now, there are some issues with this - politics is composed of unpredictable hardware, namely, people.  And it may be that the neural architecture is fundamentally incompatible with Friendliness.  But that is discussing the -output- of the process.  Friendliness is first an input, before it can be an output.

Moreover, we already have various political formations, and we can assess their Friendliness levels merely in terms of the values that went -into- them.

Which is where I think politics offers a pretty strong hint to the possibility that the Friendliness Problem has no resolution:

We can't agree on which political formations are more Friendly.  That's what "Politics is the Mindkiller" is all about: our inability to come to an agreement on political matters.  It's not merely a matter of the rules - which is to say, it's not a matter of the output: We can't even come to an agreement about which values should be used to form the rules.

This is why I think political discussion is valuable here, incidentally.  Less Wrong, by and large, has been avoiding the hard problem of Friendliness, by labeling its primary functional outlet in reality as a mindkiller, not to be discussed.

Either we can agree on what constitutes Friendly Politics, or not.  If we can't, I don't see much hope of arriving at a Friendliness solution more broadly.  Friendly to -whom- becomes the question, if it was ever anything else.  Which suggests a division in types of Friendliness: Strong Friendliness, which is a fully generalized set of human values, acceptable to just about everyone; and Weak Friendliness, which isn't fully generalized, and perhaps acceptable merely to a plurality.  Weak Friendliness survives the political question.  I do not see that Strong Friendliness can.

(Exemplified: When I imagine a Friendly AI, I imagine a hands-off benefactor who permits people to do anything they wish that won't result in harm to others.  Why, look, a libertarian/libertine dictator.  Does anybody envisage a Friendly AI which doesn't correspond more or less directly with their own political beliefs?)

Evaluating the feasibility of SI's plan

25 JoshuaFox 10 January 2013 08:17AM

(With Kaj Sotala)

SI's current R&D plan seems to go as follows: 

1. Develop the perfect theory.
2. Implement this as a safe, working Artificial General Intelligence -- and do so before anyone else builds an AGI.

The Singularity Institute is almost the only group working on friendliness theory (although with very few researchers). So, they have the lead on Friendliness. But there is no reason to think that they will be ahead of anyone else on the implementation.

The few AGI designs we can look at today, like OpenCog, are big, messy systems which intentionally attempt to exploit various cognitive dynamics that might combine in unexpected ways, and which have various human-like drives rather than the sort of supergoal-driven, utility-maximizing goal hierarchies that Eliezer talks about, or which a mathematical abstraction like AIXI employs.

A team which is ready to adopt a variety of imperfect heuristic techniques will have a decisive lead over approaches based on pure theory. Without the constraint of safety, one of them will beat SI in the race to AGI. SI cannot ignore this. Real-world, imperfect safety measures for real-world, imperfect AGIs are needed.  These may involve mechanisms for ensuring that we can avoid undesirable dynamics in heuristic systems, or AI-boxing toolkits usable in the pre-explosion stage, or something else entirely.

SI’s hoped-for theory will include a reflexively consistent decision theory, something like a greatly refined Timeless Decision Theory.  It will also describe human value as formally as possible, or at least describe a way to pin it down precisely, something like an improved Coherent Extrapolated Volition.

The hoped-for theory is intended to  provide not only safety features, but also a description of the implementation, as some sort of ideal Bayesian mechanism, a theoretically perfect intelligence.

SIers have said to me that SI's design will have a decisive implementation advantage. The idea is that because strap-on safety can’t work, Friendliness research necessarily involves more fundamental architectural design decisions, which also happen to be general AGI design decisions that some other AGI builder could grab and save themselves a lot of effort. The assumption seems to be that all other designs are based on hopelessly misguided design principles. SIers, the idea seems to go, are so smart that they'll build AGI long before anyone else. Others will succeed only when hardware capabilities allow crude near-brute-force methods to work.

Yet even if the Friendliness theory provides the basis for intelligence, the nitty-gritty of SI’s implementation will still be far away, and will involve real-world heuristics and other compromises.

We can compare SI’s future AI design to AIXI, another mathematically perfect AI formalism (though it has some critical reflexivity issues). Schmidhuber, Hutter, and colleagues think that their AIXI can be scaled down into a feasible implementation, and have implemented some toy systems. Similarly, any actual AGI based on SI's future theories will have to stray far from its mathematically perfected origins.

Moreover, SI's future Friendliness proof may simply be wrong. Eliezer writes a lot about logical uncertainty, the idea that you must treat even purely mathematical ideas with the same probabilistic techniques as any ordinary uncertain belief. He pursues this mostly so that his AI can reason about itself, but the same principle applies to Friendliness proofs as well.

Perhaps Eliezer thinks that a heuristic AGI is absolutely doomed to failure; that a hard takeoff soon after the creation of the first AGI is so overwhelmingly likely that a mathematically designed AGI is the only one that could stay Friendly. In that case, we have to work on a pure-theory approach, even if it has a low chance of being finished first. Otherwise we'll be dead anyway. If an embryonic AGI will necessarily undergo an intelligence explosion, we have no choice but to "shut up and do the impossible."

I am all in favor of gung-ho knife-between-the-teeth projects. But when you think that your strategy is impossible, then you should also look for a strategy which is possible, if only as a fallback. Thinking about safety theory until drops of blood appear on your forehead (as Eliezer puts it, quoting Gene Fowler) is all well and good. But if there is only a 10% chance of achieving 100% safety (not that there really is any such thing), then I'd rather go for a strategy that provides only a 40% promise of safety, but with a 40% chance of achieving it. OpenCog and the like are going to be developed regardless, and probably before SI's own provably friendly AGI. So, even an imperfect safety measure is better than nothing.
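
The arithmetic behind that preference, using the paragraph's own numbers (treating "an X promise of safety with a Y chance of achieving it" as a simple product is, of course, a toy model):

```python
# Expected safety of each strategy, as a simple product of the two factors.
pure_theory = 0.10 * 1.00   # 10% chance of achieving 100% safety -> 0.10
imperfect   = 0.40 * 0.40   # 40% chance of achieving  40% safety -> 0.16
assert imperfect > pure_theory   # 0.16 > 0.10
```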

Heuristic approaches may well have a 99% chance of an immediate unfriendly explosion, but that estimate might be wrong. SI, better than anyone, should know that any intuition-based probability estimate of “99%” really means “70%”. Even if other approaches are long-shots, we should not put all our eggs in one basket. Theoretical perfection and stopgap safety measures can be developed in parallel.

Given what we know about human overconfidence and the general unreliability of predictions, the actual outcome will to a large extent be something that none of us ever expected or could have predicted. Progress on safety mechanisms for heuristic AGI will improve our chances even when something entirely unexpected happens.

What impossible thing should SI be shutting up and doing? For Eliezer, it’s Friendliness theory. To him, safety for heuristic AGI is impossible, and we shouldn't direct our efforts in that direction. But why shouldn't safety for heuristic AGI be another impossible thing to do?

(Two impossible things before breakfast … and maybe a few more? Eliezer seems to be rebuilding logic, set theory, ontology, epistemology, axiology, decision theory, and more, mostly from scratch. That's a lot of impossibles.)

And even if safety for heuristic AGIs is really impossible for us to figure out now, there is some chance of an extended soft takeoff that will allow us to develop heuristic AGIs which can help in figuring out AGI safety, whether because we can use them for our tests, or because they can help by applying their embryonic general intelligence to the problem. Goertzel and Pitt have urged this approach.

Yet resources are limited. Perhaps the folks who are actually building their own heuristic AGIs are in a better position than SI to develop safety mechanisms for them, while SI is the only organization which is really working on a formal theory of Friendliness, and so should concentrate on that. It could be better to focus SI's resources on areas in which it has a relative advantage, or which have a greater expected impact.

Even if so, SI should evangelize AGI safety to other researchers, not only as a general principle, but also by offering theoretical insights that may help them as they work on their own safety mechanisms.

In summary:

1. AGI development which is unconstrained by a friendliness requirement is likely to beat a provably-friendly design in a race to implementation, and some effort should be expended on dealing with this scenario.

2. Pursuing a provably-friendly AGI, even if very unlikely to succeed, could still be the right thing to do if it were certain that we’ll have a hard takeoff very soon after the creation of the first AGIs. However, we do not know whether or not this is true.

3. Even the provably friendly design will face real-world compromises and errors in its  implementation, so the implementation will not itself be provably friendly. Thus, safety protections of the sort needed for heuristic design are needed even for a theoretically Friendly design.

Bootstrapping to Friendliness

2 [deleted] 26 April 2012 09:54PM

"All that is necessary for evil to triumph is that good men do nothing."

155,000 people are dying, on average, every day.  For those of us who are preference utilitarians, and who also believe that a Friendly singularity is possible and capable of ending this state of affairs, that fact puts a great deal of pressure on us.  It doesn't give us leave to be sloppy (because human extinction, even multiplied by a low probability, is a massive negative utility).  But if we see a way to achieve similar results in a shorter time frame, the cost in human life of not taking it is simply unacceptable.
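
To see the tension concretely, here is a toy expected-value comparison. Every number except the 155,000/day figure is an illustrative assumption, not a claim from this post:

```python
DAILY_DEATHS = 155_000
lives_per_year_of_delay = DAILY_DEATHS * 365   # ~56.6 million

WORLD_POPULATION = 7e9        # assumption (roughly, circa 2012)
added_extinction_risk = 0.01  # assumption: 1% extra risk from sloppiness

expected_loss = added_extinction_risk * WORLD_POPULATION   # 70 million
# 70e6 > 56.6e6: even a 1% bump in extinction risk outweighs a full year
# of averted deaths, so haste doesn't excuse sloppiness -- and yet a
# year's delay still costs tens of millions of lives.
```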

I have some concerns about CEV on a conceptual level, but I'm leaving those aside for the time being.  My concern is that most of the organizations concerned with first-mover X-risk are not in a position to be that first mover -- and, furthermore, they're not moving in that direction.  That includes the Singularity Institute.  Trying to operationalize CEV seems like a good way to get an awful lot of smart people bashing their heads against a wall while clever idiots trundle ahead with their own experiments.  I'm not saying that we should be hasty, but I am suggesting that we need to be careful of getting stuck in dark intellectual forests, full of things that are fun to talk about, until an idiot with a tinderbox burns the whole thing down.

My point, in short, is that we need to be looking for better ways to do things, and to do them extremely quickly.  We are working on a very, very existentially tight schedule.

So, if we're looking for quicker paths to a Friendly, first-mover singularity, I'd like to talk about one that seems attractive to me.  Maybe it's a useful idea.  If not, then at least I won't waste any more time thinking about it.  Either way, I'm going to lay it out and you guys can see what you think.  

So, Friendliness is a hard problem.  Exactly how hard, we don't know, but a lot of smart people have radically different ideas of how to attack it, and they've all put a lot of thought into it, and that's not a good sign.  However, designing a strongly superhuman AI is also a hard problem.  Probably much harder than a human can solve.  The good news is, we don't expect that we'll have to.  If we can build something just a little bit smarter than we are, we expect that bootstrapping process to take off without obvious limit.

So let's apply the same methodology to Friendliness.  General goal optimizers are tools, after all.  Probably the most powerful tools that have ever existed, for that matter.  Let's say we build something that's not Friendly.  Not something we want running the universe -- but Friendly enough.  Friendly enough that it's not going to kill us all.  Friendly enough not to succumb to the pedantic genie problem.  Friendly enough we can use it to build what we really want, be it CEV or something else.

I'm going to sketch out an architecture of what such a system might look like.  Do bear in mind this is just a sketch, and in no way a formal, safe, foolproof design spec.  

So, let's say we have an agent with the ability to convert unstructured data into symbolic relationships that represent the world, with explicitly demarcated levels of abstraction.  Let's say the system has the ability to build Bayesian causal relationships out of its data points over time, and construct efficient, predictive models of the behavior of the concepts in the world.  Let's also say that the system has the ability to take a symbolic representation of a desired future distribution of universes, a symbolic representation of the current universe, and map between them, finding valid chains of causality leading from now to then, probably using a solid decision theory background.  These are all hard problems to solve, but they're the same problems everyone else is solving too.  

This system, if you just specify parameters about the future and turn it loose, is not even a little bit Friendly.  But let's say you do this: first, provide it with a tremendous amount of data, up to and including the entire available internet, if necessary.  Everything it needs to build extremely effective models of human beings, with strongly generalized predictive power.  Then you incorporate one or more of those models (say, of a group of trusted people) as functional components: the system uses them to generalize natural language instructions first into a symbolic graph, and then into something actionable, working out the details of what is meant, rather than what is said.  Then, when the system is finding valid paths of causality, it takes its model of the state of the universe at the end of each course of action, feeds it into its human-models, and gives them a veto vote.  Think of it as the emergency regret button, iterated computationally for each possibility considered by the genie.  Any possibility that any of the person-models finds unacceptable is disregarded.
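
To make that veto loop concrete, here's a minimal sketch.  All the names (`planner`, `human_models`, and their methods) are hypothetical stand-ins for the hard components described above, not a real design:

```python
def friendly_enough_plan(planner, human_models, instruction):
    """Sketch: find causal chains to the instructed goal, discarding any
    whose predicted end state is vetoed by any trusted person-model."""
    goal = planner.interpret(instruction)      # what is meant, not what is said
    candidates = planner.causal_chains(goal)   # valid paths from now to the goal

    acceptable = []
    for chain in candidates:
        outcome = planner.predict_end_state(chain)
        # The iterated "emergency regret button": a single veto from any
        # person-model disqualifies the whole course of action.
        if all(model.accepts(outcome) for model in human_models):
            acceptable.append(chain)
    return acceptable
```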

(Small side note: as described here, the models would probably eventually be indistinguishable from uploaded minds, and would be created, simulated for a short time, and destroyed uncountable trillions of times -- you'd either need to drastically limit the simulation depth of the models, or ensure that everyone who signed up to be one of the models knew the sacrifice they were making.)

So, what you've got, plus or minus some spit and polish, is a very powerful optimization engine that understands what you mean, and disregards obviously unacceptable possibilities.  If you ask it for a truly Friendly AI, it will help you first figure out what you mean by that, then help you build it, then help you formally prove that it's safe.  It would turn itself off if you asked it to, and meant it.  It would also exterminate the human species if you asked it to, and meant it.  Not Friendly, but Friendly enough to build something better.

With this approach, the position of the Friendly AI researcher changes.  Instead of being in an arms race with the rest of the AI field with a massive handicap (having to solve two incredibly hard problems against opponents who only have to solve one), we only have to solve a simpler problem (building a Friendly-enough AI), which we can then instruct to sabotage unFriendly AI projects and buy some time to develop the real deal.  It turns it into a fair fight, one that we might actually win.

Anyone have any thoughts on this idea?        


Limits on self-optimisation

6 RolfAndreassen 20 January 2012 09:58PM

Disclaimer: I am a physicist, and in the field of computer science my scholarship is weak. It may be that what I suggest here is well known, or perhaps just wrong.

Abstract: Building a Turing machine capable of deciding whether two arbitrary Turing machines have the same output for all inputs would be equivalent to solving the Halting Problem. To optimise a function it is necessary to prove that the optimised version always has the same output as the unoptimised version, which is impossible in general for Turing machines. However, real computers have finite input spaces.

Context: FOOM, Friendliness, optimisation processes.

Consider a computer program which modifies itself in an attempt to optimise for speed. A modification to some algorithm is *proper* if it results, for all inputs, in the same output; it is an *optimisation* if it results in a shorter running time on average for typical inputs, and a *strict* optimisation if it results in a shorter running time for all inputs.
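
Stated symbolically, writing $f'$ for the modified version of $f$ and $t_f(x)$ for the running time of $f$ on input $x$:

$$
\begin{aligned}
\textit{proper:} &\quad \forall x:\ f'(x) = f(x) \\
\textit{optimisation:} &\quad \text{proper, and } \mathbb{E}_x\!\left[t_{f'}(x)\right] < \mathbb{E}_x\!\left[t_f(x)\right] \text{ over typical inputs} \\
\textit{strict optimisation:} &\quad \text{proper, and } \forall x:\ t_{f'}(x) < t_f(x)
\end{aligned}
$$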

A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification; it follows that it can only make proper modifications. (When calculating a CEV it may make improper modifications, since the final answer for "How do we deal with X" may change in the course of extrapolating; but for plain optimisations the answer cannot change.)

For simplicity we may consider that the output of a function can be expressed as a single bit; the extension to many bits is obvious. However, in addition to '0' and '1' we must consider that the response to some input can be "does not terminate". The task is to prove that two functions, which we may consider as Turing machines, have the same output for all inputs.

Now, suppose you have a Turing machine that takes as input two arbitrary Turing machines and their respective tapes, and outputs "1" if the two input machines have the same output, and "0" otherwise. Then, by having one of the inputs be a Turing machine which is known not to terminate - one that executes an infinite loop - you can solve the Halting Problem. Therefore, such a machine cannot exist: You cannot build a Turing machine to prove, for arbitrary input machines, that they have the same output.
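In code, the reduction looks like this (a sketch: `same_output` is the hypothetical comparator just described, with Python functions standing in for Turing machines):

```python
def same_output(machine_a, tape_a, machine_b, tape_b) -> bool:
    """Hypothetical decider: True iff the two machine/tape pairs yield the
    same output, counting "does not terminate" as a possible output.
    The argument above shows no such decider can exist."""
    ...

def loop_forever(_tape):
    while True:      # a machine known never to terminate, on any input
        pass

def halts(machine, tape) -> bool:
    """Would solve the Halting Problem, given same_output."""
    # `machine` run on `tape` matches loop_forever exactly when it, too,
    # never produces an output, i.e. when it does not halt.
    return not same_output(machine, tape, loop_forever, tape)
```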

It seems to follow that you cannot build a fully general proper-optimisation detector.

However, "arbitrary Turing machines" is a strong claim, in fact stronger than we require. No physically realisable computer is a true Turing machine, because it cannot have infinite storage space, as the definition requires. The problem is actually the slightly easier (that is, not *provably* impossible) one of making a proper-optimisation detector for the space of possible inputs to an actual computer, which is finite though very large. In practice we may limit the input space still further by considering, say, optimisations to functions whose input is two 64-bit numbers, or something. Even so, the brute-force solution of running the functions on all possible inputs and comparing is already rather impractical.

Friendly to the Extreme: Kids and Adults With Williams Syndrome (Link)

6 XiXiDu 16 June 2011 05:37PM

Williams Syndrome is a rare genetic condition [...].

"Little babies will come up to you, they will stare into your face, and it will be hard to actually disengage from that stare," explained Helen Tager-Flusberg whose lab at Boston University studies the social behavior of children with Williams. "When they're four or five...whether they know you or not, within about five minutes you're their new best friend."

One fascinating study by researchers in Europe found that children with Williams show no racial bias whatsoever. While children, even babies, prefer people of their own race, the neural pathway that imprints for race bias is somehow lacking in children with Williams.

Williams Syndrome is caused by the deletion of roughly 25 genes on chromosome 7. The deletion can occur randomly during the production of a sperm or egg cell. Though there are 20,000 to 25,000 genes in the human genome, even the loss of just 25 genes can have profound effects on a person's physical, behavioral and cognitive make-up.

[...]

In one experiment to test empathy, the adult experimenter bangs her knee on a table and expresses a great deal of pain. In many runs the lab recorded, the typically-developing child just watched and expressed no empathy or concern. But children with Williams often went right over to the experimenter to rub her knee and ask, "What's wrong?"

[...]

What do most children with Williams Syndrome do when presented with the creepy spider? They pet it.

Link: abcnews.go.com/Health/friendly-extreme-meet-kids-adults-williams-syndrome/

Kinect self-awareness-hack (why Friendliness is crucial)

14 ArisKatsaris 25 March 2011 11:15AM

A hilarious sketch about AI from CollegeHumor at http://bit.ly/i96EzL

The Friendly AI Game

38 bentarm 15 March 2011 04:45PM

At the recent London meet-up someone (I'm afraid I can't remember who) suggested that one might be able to solve the Friendly AI problem by building an AI whose concerns are limited to some small geographical area, and which doesn't give two hoots about what happens outside that area. Ciphergoth pointed out that this would probably result in the AI converting the rest of the universe into a factory to make its small area more awesome. In the process, he mentioned that you can make a "fun game" out of figuring out ways in which proposed utility functions for Friendly AIs can go horribly wrong. I propose that we play.
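
As a warm-up, the meet-up proposal itself, sketched with hypothetical names (`REGION`, `awesomeness`, and `restrict_to` are placeholders for illustration, not anyone's actual design):

```python
REGION = "one small geographical area"   # all the AI cares about

def awesomeness(region_state) -> float:
    """Hypothetical stand-in for 'how awesome is this region?'."""
    ...

def utility(world) -> float:
    # The world is scored ONLY through its restriction to REGION, so
    # everything outside it is a free parameter of the optimum: tiling
    # the rest of the universe with factories that feed REGION can only
    # raise the score, which is exactly Ciphergoth's failure mode.
    return awesomeness(world.restrict_to(REGION))
```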

Here's the game: reply to this post with proposed utility functions, stated as formally or, at least, as accurately as you can manage; follow-up comments explain why a super-human intelligence built with that particular utility function would do things that turn out to be hideously undesirable.

There are three reasons I suggest playing this game. In descending order of importance, they are:

  1. It sounds like fun.
  2. It might help to convince people that the Friendly AI problem is hard(*).
  3. We might actually come up with something that's better than anything anyone's thought of before, or something where the proof of Friendliness is within grasp - the solutions to difficult mathematical problems often look obvious in hindsight, and it surely can't hurt to try.

DISCLAIMER (probably unnecessary, given the audience): I think it is unlikely that anyone will manage to come up with a formally stated utility function for which none of us can figure out a way in which it could go hideously wrong. However, if they do so, this does NOT constitute a proof of Friendliness and I 100% do not endorse any attempt to implement an AI with said utility function.
(*) I'm slightly worried that it might have the opposite effect, as people build more and more complicated conjunctions of desires to overcome the objections that we've already seen, and start to think the problem comes down to nothing more than writing a long list of special cases. But, on balance, I think that's likely to have less of an effect than just seeing how naive suggestions for Friendliness can be hideously broken.

Which are the useful areas of AI study?

8 PeterisP 15 January 2011 11:03PM

I've been stuck on a peculiar question lately: which are the useful areas of AI study? What got me thinking is the opinion occasionally stated (or implied) by Eliezer here that performing general AI research might well have negative utility, due to indirectly facilitating a chance of unfriendly AI being developed. I've been chewing on the implications of this for quite a while, as acceptance of these arguments would require quite a change in my behavior.

 

I'm about to start my CompSci PhD studies, and had initially planned to focus on unsupervised domain-specific knowledge extraction from the internet, as my current research background is mostly in narrow AI issues in computational linguistics, such as machine learning, formation of concepts, and semantics extraction. However, in the last year my expectations of a singularity and of the existential risks of unfriendly AI have led me to believe that focusing my efforts on Friendly AI concepts would be a more valuable choice, as a few years of studies in the area would increase the chance of my making some positive contribution later on.

 

What is your opinion?

Do studies of general AI topics and research in the area carry positive or negative utility? What are the research topics that would be of use to Friendly AI, but are still narrow and shallow enough for a single individual or tiny team to make measurable progress on in the course of a few years of PhD thesis preparation? Are there specific research areas that should be avoided until more progress has been made on Friendliness research?

The value of preserving reality

-1 lukstafi 08 November 2010 11:51PM

A comment on http://singinst.org/blog/2010/10/27/presentation-by-joshua-foxcarl-shulman-at-ecap-2010-super-intelligence-does-not-imply-benevolence/: Given that, as in the naive reinforcement learning framework (which can approximate some more complex notions of value), the value is in the environment, you don't want to be too hasty in reshaping the environment, lest you destroy a higher value you haven't yet discovered. In particular, you wouldn't replace high-complexity systems like humans with low-entropy systems like computer chips without first analyzing them.