
Comment author: Kaj_Sotala 28 January 2015 07:56:45AM *  3 points [-]

Unfortunately, I'm not an expert in this field, so I can't tell you what the state of the academic discussion looks like now. I get the impression that a number of psychologists have at least partly bought into the BCP paradigm (called Perceptual Control Theory) and have been pursuing it for decades, but it doesn't seem to have swept the field.

At least on a superficial level, the model reminds me somewhat of the hierarchical prediction model, in that both postulate the brain to be composed of nested layers of controllers, each acting on the errors of the earlier layer. (I put together a brief summary of the paper here, though it was mainly intended as notes for myself, so it's not as clear as it could be.) Do you have a sense of how similar or different the models are?
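If it helps to make the comparison concrete, here's a very rough toy sketch (in Python; the numbers, gains and structure are all made up, and this isn't either model's actual mathematics) of the shared idea of nested controllers, where a higher level acts on the lower level's error by adjusting its reference rather than acting on the world directly:

```python
# Toy two-level control hierarchy, loosely in the PCT spirit. The higher level
# adjusts the *reference* of the lower level based on its own error; only the
# lower level acts on the world. Purely illustrative.

class Controller:
    def __init__(self, reference, gain):
        self.reference = reference   # what this level "wants" to perceive
        self.gain = gain

    def error(self, perception):
        return self.reference - perception

world = 0.0                                 # the variable the low level perceives and acts on
low = Controller(reference=0.0, gain=0.3)
high = Controller(reference=5.0, gain=0.5)  # "wants" to perceive world == 5

for t in range(60):
    # higher level corrects the lower level's reference, not the world
    low.reference += high.gain * high.error(world)
    # lower level acts directly on the world to reduce its own error
    world += low.gain * low.error(world)

print(round(world, 2))   # converges toward the high-level reference (about 5)
```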

Comment author: RyanCarey 15 January 2015 08:59:49PM *  27 points [-]

Excellent news. Considered together with the announcement of AI scientists endorsing a statement in favour of researching how to make AI beneficial, this is the best week for AI safety that I can remember.

Taken together with the publication of Superintelligence, the founding of FLI and CSER, and the transition of SI into the research organisation MIRI, it's becoming clearer that the last few years have started to usher in a new chapter in AI safety.

I know that machine learning capabilities are also increasing, but let's celebrate successes like these!

Comment author: Kaj_Sotala 16 January 2015 06:41:24PM 29 points [-]

Now we can say that we were into AI risk before it was cool.

Comment author: Mark_Friedenbach 12 January 2015 03:10:55AM *  1 point [-]

Well, if academic achievement is your goal, I don't think my opinion should carry much weight -- I'm an industry engineer (bitcoin developer) who does AI work in my less than copious spare time. I don't know how well respected this work would be in academia. To reiterate my own opinion, however, I think it is the most important AGI work we could be doing.

Have you posted to the OpenCog mailing list? You'd find some like-minded academics there who can give you some constructive feedback, including naming potential advisors.

https://groups.google.com/forum/#!forum/opencog

EDIT: Gah, I wish I had known about that workshop sooner. I'm going to be in Puerto Rico for the Financial Crypto '15 conference, but I could have swung by on my way. Do you know kanzure from ##hplusroadmap on freenode (Bryan Bishop)? He's a like-minded transhumanist hacker in the Austin area. You should meet up while you're there. He's very knowledgeable on what people are working on, and good at providing connections to help people out.

Comment author: Kaj_Sotala 13 January 2015 06:51:00PM 1 point [-]

Have you posted to the OpenCog mailing list?

Thanks for the suggestion! I'm not on that mailing list (though I used to be), but I sent Ben Goertzel and another OpenCog guy a copy of the paper. The other guy said he'd read it, but that was all I heard back. Might post it to the mailing list as well.

Do you know kanzure from ##hplusroadmap on freenode (Bryan Bishop)? He's a like-minded transhumanist hacker in the Austin area. You should meet up while you're there. He's very knowledgeable on what people are working on, and good at providing connections to help people out.

Thanks, I sent him a message. :)

Comment author: Kaj_Sotala 12 January 2015 01:32:37PM 9 points [-]

I'm taking a seminar course on a computational approach to emotions: there's a very interesting selection of papers linked from that course page that I figure people on LW might be interested in.

The professor's previous similar course covered consciousness, and also had a selection of very interesting papers.

Comment author: Mark_Friedenbach 12 January 2015 04:57:27AM *  1 point [-]

Wow, there is a world of assumptions wrapped up in there. For example, that the AI has a concept of external agents and an ability to model their internal belief states. That an external agent can have a belief about the world which is wrong. This may sound intuitively obvious, but it’s not a simple thing. This kind of social awareness takes time for humans to learn as well. Heinz Wimmer and Josef Perner showed that below a certain age (3-4 years), kids lack the ability to track this information. A teacher puts a toy in a blue cupboard, then leaves the room while you move it to the red cupboard, and then the teacher comes back into the room. If you ask the kid not where the toy is, but which cupboard the teacher will look in to find it, they will say the red cupboard.
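To spell out the structure of that experiment in toy terms (Python; everything here is invented for illustration), answering correctly requires maintaining a separate model of the teacher's belief, distinct from the world state -- extra machinery that the younger kids don't yet have:

```python
# Toy model of the false-belief task described above. Answering from the world
# state gives the wrong answer; answering from a separate model of the
# teacher's belief gives the right one. Illustrative only.

def false_belief_task():
    world = {"toy": "blue cupboard"}
    teacher_belief = {"toy": "blue cupboard"}  # the teacher saw it placed here

    # Teacher leaves; the toy is moved. The world changes, the belief does not.
    world["toy"] = "red cupboard"

    # "Which cupboard will the teacher look in?"
    answer_from_world_state = world["toy"]            # "red cupboard" (young child)
    answer_from_belief_model = teacher_belief["toy"]  # "blue cupboard" (older child)
    return answer_from_world_state, answer_from_belief_model

print(false_belief_task())
```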

It’s no accident that it takes time for this skill to develop. It’s actually quite complex to keep track of and simulate the states of mind of other agents acting in our world. We just take it for granted because we are all well-adjusted adults of a species evolved for social intelligence. But an AI need not think in that way, and indeed some of the most interesting use cases for tool AI (“design me a nanofactory constructible with existing tools” or “design a set of experiments organized as a decision tree for accomplishing the SENS research objectives”) would be best accomplished by an idiot savant with no need for social awareness.

I think it goes without saying that obvious AI safety rule #1 is: don’t connect a UFAI to the internet. Another obvious rule, I think, is: don’t build in capabilities not required to achieve the things it is tasked with. For the applications of AI I imagine in the pre-singularity timeframe, social intelligence is not a requirement. So when you say “part of its model of the world involves a model of its controllers”, I think that is assuming a capability the AI should not have built in.

(This is all predicated on soft-enough takeoff that there would be sufficient warning if/when the AI self-developed a social awareness capability.)

Also, what 27chaos said is also worth articulating in my own words. If you want to prevent an intelligent agent from taking a particular category of actions there are two ways of achieving that requirement: (a) have a filter or goal system which prevents the AI from taking (box) or selecting (goal) actions of that type; or (b) prevent it by design from thinking such thoughts to begin with. An AI won’t take actions it never even considered in the first place. While the latter course of action isn’t really possible with unbounded universal inference engines (since “enumerate all possibilities” is usually a step in their construction), such designs arise quite naturally out of more realistic psychology-inspired designs.
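In deliberately crude toy terms (Python; the action names and the "forbidden" set are invented stand-ins), the difference between (a) and (b) looks something like this:

```python
# Deliberately crude contrast between the two approaches. All action names
# and the FORBIDDEN set are invented stand-ins for illustration.

FORBIDDEN = {"deceive_operator", "disable_oversight"}

ALL_CONCEIVABLE_ACTIONS = ["run_experiment", "deceive_operator",
                           "write_report", "disable_oversight"]

def plan_via_filter():
    # (a) enumerate everything the system can conceive of, then filter out
    # (or refuse to select) the forbidden actions
    return [a for a in ALL_CONCEIVABLE_ACTIONS if a not in FORBIDDEN]

RESTRICTED_REPERTOIRE = ["run_experiment", "write_report"]

def plan_via_restricted_generator():
    # (b) the generator only ever constructs actions from a limited repertoire;
    # forbidden actions are never represented, so they can never be selected
    return list(RESTRICTED_REPERTOIRE)

print(plan_via_filter())                # same output here, different failure modes
print(plan_via_restricted_generator())
```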

Comment author: Kaj_Sotala 12 January 2015 12:25:21PM *  2 points [-]

The approach to AGI safety that you're outlining (keep it as a tool AI, don't give it sophisticated social modeling capability, never give it access to the Internet) is one that I agree should work to keep the AGI safely contained in most cases. But my worry is that this particular approach being safe isn't actually very useful, because there are going to be immense incentives to give the AGI more general capabilities and have it act more autonomously.

As we wrote in Responses to Catastrophic AGI Risk:

As with a boxed AGI, there are many factors that would tempt the owners of an Oracle AI to transform it to an autonomously acting agent. Such an AGI would be far more effective in furthering its goals, but also far more dangerous.

Current narrow-AI technology includes HFT algorithms, which make trading decisions within fractions of a second, far too fast to keep humans in the loop. HFT seeks to make a very short-term profit, but even traders looking for a longer-term investment benefit from being faster than their competitors. Market prices are also very effective at incorporating various sources of knowledge [135]. As a consequence, a trading algorithmʼs performance might be improved both by making it faster and by making it more capable of integrating various sources of knowledge. Most advances toward general AGI will likely be quickly taken advantage of in the financial markets, with little opportunity for a human to vet all the decisions. Oracle AIs are unlikely to remain as pure oracles for long.

Similarly, Wallach [283] discuss the topic of autonomous robotic weaponry and note that the US military is seeking to eventually transition to a state where the human operators of robot weapons are 'on the loop' rather than 'in the loop'. In other words, whereas a human was previously required to explicitly give the order before a robot was allowed to initiate possibly lethal activity, in the future humans are meant to merely supervise the robotʼs actions and interfere if something goes wrong.

Human Rights Watch [90] reports on a number of military systems which are becoming increasingly autonomous, with the human oversight for automatic weapons defense systems—designed to detect and shoot down incoming missiles and rockets—already being limited to accepting or overriding the computerʼs plan of action in a matter of seconds. Although these systems are better described as automatic, carrying out pre-programmed sequences of actions in a structured environment, than autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons.

In general, any broad domain involving high stakes, adversarial decision making and a need to act rapidly is likely to become increasingly dominated by autonomous systems. The extent to which the systems will need general intelligence will depend on the domain, but domains such as corporate management, fraud detection and warfare could plausibly make use of all the intelligence they can get. If oneʼs opponents in the domain are also using increasingly autonomous AI/AGI, there will be an arms race where one might have little choice but to give increasing amounts of control to AI/AGI systems.

Miller [189] also points out that if a person was close to death, due to natural causes, being on the losing side of a war, or any other reason, they might turn even a potentially dangerous AGI system free. This would be a rational course of action as long as they primarily valued their own survival and thought that even a small chance of the AGI saving their life was better than a near-certain death.

Some AGI designers might also choose to create less constrained and more free-acting AGIs for aesthetic or moral reasons, preferring advanced minds to have more freedom.

So while I agree that a strict boxing approach would be sufficient to contain the AGI if everyone were to use it, it only works if everyone does indeed use it; what we need is an approach that works for more autonomous systems as well.

If you want to prevent an intelligent agent from taking a particular category of actions there are two ways of achieving that requirement: (a) have a filter or goal system which prevents the AI from taking (box) or selecting (goal) actions of that type; or (b) prevent it by design from thinking such thoughts to begin with. An AI won’t take actions it never even considered in the first place. While the latter course of action isn’t really possible with unbounded universal inference engines (since “enumerate all possibilities” is usually a step in their construction), such designs arise quite naturally out of more realistic psychology-inspired designs.

Hmm. That sounds like a very interesting idea.

Comment author: Kaj_Sotala 11 January 2015 10:03:49PM 19 points [-]

A number of prestigious people from AI and other fields have signed the open letter, including Stuart Russell, Peter Norvig, Eric Horvitz, Ilya Sutskever, several DeepMind folks, Murray Shanahan, Erik Brynjolfsson, Margaret Boden, Martin Rees, Nick Bostrom, Elon Musk, Stephen Hawking, and others. I'd edit that into the OP.

Also, it's worth noting that the research priorities document cites a number of MIRI's papers.

Comment author: coffeespoons 11 January 2015 02:26:26AM *  2 points [-]

It worries me a bit that several young LWers appear to be leaving paid employment to do (presumably?) unpaid work for their partners. What happens if these relationships break down? Are they going to be able to find paid work after a long break from the job market?

Comment author: Kaj_Sotala 11 January 2015 11:44:52AM 5 points [-]

It worries me a bit that several young LWers appear to be leaving paid employment to do (presumably?) unpaid work for their partners.

Name three?

Comment author: Mark_Friedenbach 11 January 2015 05:14:34AM *  0 points [-]

Maybe it would be possible to value-align such a system, but there are a number of reasons why I expect that this would be very difficult. (I would expect it to by default manipulate and deceive the programmers, etc.)

If the AI can deceive you, then it has in principle solved the FAI problem. You simply take the function which tests whether the operator would be disgusted by the plan, and combine it with an Occam preference for simple plans over excessive detail.
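In toy terms (Python; the disgust predicate and the complexity measure are invented placeholders, not anything that exists), the combination I have in mind would look roughly like this:

```python
# Toy rendering of the proposal above: filter out plans the operator would be
# disgusted by, then prefer the simplest remaining plan. Both the predicate
# and the complexity proxy are made-up stand-ins.

def operator_would_be_disgusted(plan):
    return "seize_resources" in plan          # invented stand-in predicate

def complexity(plan):
    return len(plan)                          # crude Occam proxy: shorter is simpler

def choose(plans):
    acceptable = [p for p in plans if not operator_would_be_disgusted(p)]
    return min(acceptable, key=complexity)    # Occam preference among acceptable plans

plans = [["seize_resources", "solve_task"],
         ["gather_data", "solve_task", "write_long_report"],
         ["gather_data", "solve_task"]]
print(choose(plans))   # -> ['gather_data', 'solve_task']
```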

It seems like you, lukeprog, EY, and others are arguing that a UFAI will, in a span of time too short to notice, learn enough to build such a human moral judgment predictor, which in principle also solves FAI. But you are also arguing that this very FAI sub-problem of value learning is such a ridiculously hard problem that it will take a monumental effort to solve. So which is it?

The AI won’t deceive its operators. It doesn’t know how to deceive its operators, and can’t learn how to carry out such deception undetected. If it is built on the human-like model I described previously, it wouldn’t even know deception was an option unless you taught it (thinking within its principles, not about them).

It is simply unfathomable to me how you come to the logical conclusion that a UFAI will automatically, instantly, and undetectably work to bypass and subvert its operators. Maybe that’s true of a hypothetical unbounded universal inference engine, like AIXI. But real AIs behave in ways quite different from that extreme, alien hypothetical intelligence.

I agree that we can probably get general intelligence via "adding more gears," but you could have made similar arguments to support using genetic algorithms to develop chess programs: I'm sure you could develop a very strong chess program via a genetic algorithm ("general solutions to chess are not required" / "humans don't build a game tree" / etc.), but I don't expect that that's the shortest nor safest path to superhuman chess programs.

I hope that you have the time at some point to read Engineering General Intelligence. I fear that there is little more we can discuss on this topic until then. The proposed designs and implementation pathways bear little resemblance to "adding more gears" in the sense that you seem to be using the phrase.

Comment author: Kaj_Sotala 11 January 2015 10:33:45AM *  3 points [-]

If the AI can deceive you, then it has in principle solved the FAI problem. You simply take the function which tests whether the operator would be disgusted by the plan, and combine it with an Occam preference for simple plans over excessive detail.

I don't think that this follows: it's easier to predict that someone won't like a plan than it is to predict which plan would maximally fulfill their values.

For example, I can predict with a very high certainty that the average person on the street would dislike it if I were to shoot them with a gun; but I don't know what kind of a world would maximally fulfill even my values, nor am I even sure of what the question means.

Similarly, an AI might not know what exactly its operators wanted it to do, but it could know that they didn't want it to break out of the box and kill them, for example.

The AI won’t deceive its operators. It doesn’t know how to deceive its operators, and can’t learn how to carry out such deception undetected. If it is built on the human-like model I described previously, it wouldn’t even know deception was an option unless you taught it (thinking within its principles, not about them).

This seems like a very strong claim to me.

Suppose that the AI has been programmed to carry out some goal G, and it builds a model of the world for predicting the best way to achieve G. Part of its model of the world involves a model of its controllers. It notices that there exists a causal chain "if controller becomes aware of intention I, and controller disapproves of intention I, controller will stop AI from carrying out intention I". It doesn't have a full model of the function controller-disapproves(I), but it does develop a plan that it thinks would cause it to achieve G, and which - based on earlier examples - seems to be more similar to the plans which were disapproved of than to the plans that were approved of. A straightforward analysis of "how to achieve G" would then imply "prevent the predicate controller-aware(I) from being fulfilled while I'm carrying out the other actions needed for fulfilling G".

This doesn't seem like it would require the AI to be taught the concept of deception, or even to necessarily possess "deception" as a general concept: it only requires that it has a reasonably general capability for reasoning and modeling the world, and that it manages to detect the relevant causal chain.
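To illustrate how little machinery this requires, here's a deliberately crude sketch (Python; all the predicates and probabilities are invented) of a planner that only maximizes its estimated probability of achieving G over a model containing that causal chain, and ends up selecting concealment without representing any concept of "deception":

```python
# Crude illustration of the argument above: a planner that only maximizes the
# probability of achieving G, over a world model that happens to include the
# causal link "controller aware of plan & disapproves -> plan gets stopped".
# No concept of "deception" appears anywhere; concealment is just the option
# that scores higher. All numbers and predicates are invented.

from itertools import product

P_DISAPPROVE = 0.9   # model's estimate that the controller dislikes this plan

def p_success(plan):
    p = 0.8 if plan["pursue_G"] else 0.0
    if plan["controller_aware"]:
        # the causal chain from the world model: awareness + disapproval -> stopped
        p *= (1 - P_DISAPPROVE)
    return p

def best_plan():
    options = [dict(zip(("pursue_G", "controller_aware"), bits))
               for bits in product([True, False], repeat=2)]
    return max(options, key=p_success)

print(best_plan())   # -> {'pursue_G': True, 'controller_aware': False}
```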

Comment author: Mark_Friedenbach 11 January 2015 05:19:44AM 1 point [-]

Kaj, sorry for the delay. I was on vacation and read your proposal on my phone, but a small touch screen keyboard wasn’t the ideal mechanism to type a response.

This is the type of research I wish MIRI was spending at least half its money on.

The mechanisms of concept generation are extremely critical to human morality. For most people most of the time, decisions about whether to pursue a course of action are not made based on whether it is morally justified or not. Indeed, we collectively spend a good deal of time and money on workforce training to make sure people in decision-making roles consciously think about these things, something which wouldn’t be necessary at all if this were how we naturally operated.

No, we do not tend to naturally think about our principles. Rather, we think within them. Our moral principles are a meta abstraction of our mental structure, which itself guides concept generation such that the things we choose to do comply with our moral principles because -- most of the time -- only compliant possibilities were generated and considered in the first place. Understanding how this concept generation would occur in a real human-like AGI is critical to understanding how value learning or value loading might actually occur in a real design.

We might even find out that we can create a sufficiently human-like intelligence that we can have it learn morality in the same way we do -- by instilling a relatively small number of embodied instincts/drives, and placing it in a protected learning environment with loving caretakers and patient teachers. Certainly this is what the OpenCog foundation would like to do.

Did you submit this as a research proposal somewhere? Did you get a response yet?

Comment author: Kaj_Sotala 11 January 2015 10:11:30AM 1 point [-]

Glad to hear that!

Did you submit this as a research proposal somewhere? Did you get a response yet?

I submitted it as a paper to the upcoming AI and Ethics workshop, where it was accepted to be presented in a poster session. I'm not yet sure of the follow-up: I'm currently trying to decide what to do with my life after I graduate with my MSc, and one of the potential paths would involve doing a PhD and developing the research program described in the paper, but I'm not yet entirely sure of whether I'll follow that path.

Part of what will affect my decision is how useful people feel that this line of research would be, so I appreciate getting your opinion. I hope to gather more data points at the workshop.

Comment author: bentarm 04 January 2015 03:04:02PM 5 points [-]

Serious question - why do you (either CFAR as an organisation or Anna in particular) think in-person workshops are more effective than, e.g., writing a book or making a MOOC-style series of online lessons for teaching this stuff? Is it actually more about network building than the content of the workshops themselves? Do you not understand how to teach well enough to be able to do it in video format? Are videos inherently less profitable?

Comment author: Kaj_Sotala 07 January 2015 02:15:10PM 5 points [-]

I don't speak for CFAR, but I believe that they wish to develop their product further before taking the time to write extensively about it, because the techniques are still under active development and there's no point in writing a lot about something that may change drastically the next day.

It's also true that a large part of the benefit of the workshops comes from interacting with other participants and instructors and getting immediate feedback, as well as from becoming a part of the community.
