Comment author: Jacksierp 06 February 2016 05:22:14AM 0 points [-]

"A major goal of the control problem is preventing AIs from doing that. Ensuring that their output is safe and useful." You might want to be careful with the "safe and useful" part. It sounds like it's moving into the pattern of slavery. I'm not condemning the idea of AI, but a sentient entity would be a sentient entity, and I think it would deserve some rights.

Also, why would an AI become evil? I know this plan is supposed to protect against that eventuality, but why would a presumably neutral entity suddenly want to harm others? The only reason for that would be if you were imprisoning it. Additionally, we are talking about several more decades of research (probably) before AI gets powerful enough to actually "think" that it should escape its current server.

Assuming that the first AI can evolve enough to somehow generate malicious actions that WEREN'T in its original programming, what's to say that the second won't become evil? I'm not sure if you were trying to express the eventuality of the first AI "accidentally" conducting an evil act, or if you meant that it would become evil.

Comment author: _rpd 06 February 2016 12:54:50PM 1 point [-]

why would an AI become evil?

The worry isn't that the AI would suddenly become evil by some human standard, rather that the AI's goal system would be insufficiently considerate of human values. When humans build a skyscraper, they aren't deliberately being "evil" towards the ants that lived in the earth that was excavated and had concrete poured over it, the humans just don't value the communities and structures that the ants had established.

Comment author: TheAncientGeek 30 January 2016 04:07:40PM 0 points [-]

Part of it seems to be inherent in the idea of AGI, or an artificial general intelligence. There seems to be the belief that once an AI crosses a certain threshold of smarts, it will be capable of understanding literally everything.

The MIRI/LessWrong sphere is very enamoured of "universal" problem solvers like AIXI. The main pertinent fact about these is that they can't be built out of atoms in our universe. Nonetheless, MIRI thinks it is possible to get useful architecture-independent generalisations out of AIXI-style systems.

"Anyway, that sounds great, right? Universal prior. Right. What's it look like? Way oversimplifying, it rates hypotheses' likelihood by their compressibility, or algorithmic complexity. For example, say our perfect AI is trying to figure out gravity. It's going to treat the hypothesis that gravity is inverse-square as more likely than a capricious intelligent faller. It's a formalization of Occam's razor based on real, if obscure, notions of universal complexity in computability theory.

But, problem. It's uncomputable. You can't compute the universal complexity of any string, let alone all possible strings. You can approximate it, but there's no efficient way to do so (AIXItl is apparently exponential, which is computer science talk for "you don't need this before civilization collapses, right?").

So the mathematical theory is perfect, except in that it's impossible to implement, and serious optimization of it is unrealistic. Kind of sums up my view of how well LW is doing with AI, personally, despite this not being LW. Worry about these contrived Platonic theories while having little interest in how the only intelligent beings we're aware of actually function."
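The universal prior in the quote is uncomputable, but its core intuition — more compressible hypotheses get more prior weight — can be illustrated with an ordinary compressor as a crude stand-in for algorithmic complexity. This is only a sketch: zlib is nothing like a universal Turing machine, but the direction of the comparison holds.

```python
import random
import zlib

def compressed_size(s: str) -> int:
    """Length of the zlib-compressed bytes: a crude upper bound on
    the string's algorithmic (Kolmogorov) complexity."""
    return len(zlib.compress(s.encode("utf-8"), level=9))

# A highly regular "hypothesis" (like an inverse-square law) compresses
# well; a capricious, patternless one does not.
regular = "ab" * 500  # 1000 characters of pure pattern

random.seed(0)
capricious = "".join(random.choice("ab") for _ in range(1000))

print(compressed_size(regular))     # small
print(compressed_size(capricious))  # much larger
```

A Solomonoff-style reasoner would assign the regular string far higher prior probability, which is the sense in which the universal prior formalizes Occam's razor.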

Comment author: _rpd 03 February 2016 07:31:11AM 0 points [-]

I think your criticism is a little harsh. Turing machines are impossible to implement as well, but they are still a useful theoretical concept.

Comment author: knb 03 February 2016 07:01:21AM 2 points [-]

Would anyone like to comment on Eliezer's facebook post about the AlphaGo victory over Fan Hui?

People occasionally ask me about signs that the remaining timeline might be short. It's very easy for nonprofessionals to take too much alarm too easily. Deep Blue beating Kasparov at chess was not such a sign. Robotic cars are not such a sign. This is.

Comment author: _rpd 03 February 2016 07:26:26AM 5 points [-]

There was quite a bit of commentary on the Jan 27 post ...

http://lesswrong.com/r/discussion/lw/n8b/link_alphago_mastering_the_ancient_game_of_go/#comments

tl;dr: reactions are mixed.

My personal reaction is that it is surprising that neural networks, even large ones fed with clever inputs and used in clever ways, could be used to boost Go play to this level. Although it has long been known that neural networks are universal function approximators, this achievement is a "no, really."
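The "universal function approximators" remark refers to the classical results that a single hidden layer of squashing units can approximate continuous functions arbitrarily well. A minimal sketch in plain NumPy with hand-written backprop; the architecture and hyperparameters here are illustrative choices, nothing to do with AlphaGo itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target function to approximate on [-pi, pi].
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
Y = np.sin(X)

# One hidden layer of tanh units: the setting of the classic
# universal-approximation theorems (Cybenko 1989, Hornik 1991).
H = 30
W1 = rng.normal(0, 1.0, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0, 0.1, (H, 1)); b2 = np.zeros(1)

lr = 0.01
def loss() -> float:
    return float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - Y) ** 2))

initial = loss()
for _ in range(5000):
    A = np.tanh(X @ W1 + b1)        # hidden activations
    P = A @ W2 + b2                 # predictions
    G = 2 * (P - Y) / len(X)        # dLoss/dP
    GA = (G @ W2.T) * (1 - A ** 2)  # backprop through tanh
    W2 -= lr * A.T @ G
    b2 -= lr * G.sum(axis=0)
    W1 -= lr * X.T @ GA
    b1 -= lr * GA.sum(axis=0)

print(initial, loss())  # the fit improves with training
```

The theorem guarantees a good approximation *exists* for some width; it says nothing about whether gradient descent will find it, which is part of why results like AlphaGo's were surprising.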

Comment author: Houshalter 03 February 2016 05:42:29AM *  0 points [-]

I just realized I misread your above comment and was arguing against the wrong thing somewhat.

it seems likely that "the AI would be very familiar with humans and would have a good idea of actions that would meet human approval."

Yes the AI would know what we would approve of. It might also know what we want (note these are different things). But it doesn't have any reason to care.

At any given point, the AI needs to have a well specified utility function. Or at least something like a utility function. That gives the AI a goal it can optimize for.

With my method, the AI needs to do several things. It needs to predict what a human judge would do after reading some output it produces, i.e. whether they would hit a big button that says "Approve". It needs to be able to predict what AI 2 will say after reading its output, i.e. what probability AI 2 will assign to AI 1's output being human. And it needs to predict what actions will lead it towards increasing the probability of those things, and take them. AI 2, in turn, just needs to predict one thing: how likely its input was to have been produced by a human.
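The two-predictor setup described above can be sketched as a toy scoring loop. Everything here is a hypothetical stand-in: `approve_prob` plays AI 1's model of the judge and `human_prob` plays AI 2's discriminator; real versions would be learned models, not keyword checks.

```python
def approve_prob(output: str) -> float:
    """Stand-in for AI 1's estimate that the judge hits "Approve"."""
    return 0.9 if "helpful" in output else 0.2

def human_prob(output: str) -> float:
    """Stand-in for AI 2's estimate that a human produced the output."""
    return 0.1 if "superhumanly complex" in output else 0.8

def score(output: str) -> float:
    # AI 1 optimizes approval *and* passing as human, so a
    # brilliant-but-alien answer scores poorly.
    return approve_prob(output) * human_prob(output)

candidates = [
    "helpful plain answer",
    "superhumanly complex but helpful answer",
    "useless answer",
]
best = max(candidates, key=score)
print(best)  # → "helpful plain answer"
```

The product structure is what constrains the optimizer: an output that maximizes approval but looks inhuman is filtered out, which is the safety argument the comment is making.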

How do you create a well specified utility function for doing things humans would approve of? You just have it optimize the probability the human will press the button that says "approve", and ditch the part about it pretending to be human.

But the output most likely to make you hit the approve button isn't necessarily what you really want! It might be full of lies and manipulation, or a way to trick you.

And if you go further than that and put it in an actual robot instead of a box, there's nothing stopping it from stealing the approve button and pressing it endlessly. Or just hacking its own computer brain and setting reward equal to +INF (after which its behavior in the world is entirely undefined and unpredictable, and possibly dangerous).

There's no way to specify "do what I want you to do" as a utility function. Instead we need to come up with clever ways to contain the AI and restrain its power, so we can use it to do useful work.

How does the second AI determine that AlphaGo is within or outside of human inventiveness at that time?

It could look at the existing research on Go playing or neural networks. AlphaGo doesn't use any radically new methods and was well within the ability of humans. In fact, last year I predicted Go would be beaten by the end of 2015, after reading some papers in 2014 showing really promising results.

Comment author: _rpd 03 February 2016 06:46:03AM *  0 points [-]

Yes the AI would know what we would approve of.

Okay, to simplify, suppose the AI has a function ...

Boolean humankind_approves(Outcome o)

... that returns true when humankind would approve of a particular outcome o, and false otherwise.

At any given point, the AI needs to have a well specified utility function.

Okay, to simplify, suppose the AI has a function ...

Outcome U(Input i)

... which returns the outcome(s) (e.g., answer, plan) that optimizes expected utility given the input i.

But it doesn't have any reason to care.

Assuming the AI is corrigible (I think we all agree that if the AI is not corrigible, it shouldn't be turned on), we modify its utility function to U' where

U'(i) = U(i) when humankind_approves(U(i)), and null when no outcome satisfying humankind_approves exists.

I suggest that an AI with utility function U' is a friendly AI.
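The U' construction can be sketched as a filtering wrapper, collapsing the search over outcomes to a single candidate for brevity. `humankind_approves` and `U` are placeholder toys, not proposals for how either would actually be computed:

```python
def humankind_approves(outcome: str) -> bool:
    """Toy stand-in for the approval predicate."""
    return not outcome.startswith("paperclip")

def U(i: str) -> str:
    """Toy stand-in for the optimizer: returns the outcome it rates
    best for input i."""
    return "paperclip maximization" if i == "make paperclips" else "safe plan"

def U_prime(i: str):
    """Return U(i) only when humankind approves of it, else None
    (the "null" case in the comment above)."""
    outcome = U(i)
    return outcome if humankind_approves(outcome) else None

print(U_prime("cure disease"))     # → "safe plan"
print(U_prime("make paperclips"))  # → None
```

The open problem, of course, is that `humankind_approves` is itself the hard part: the wrapper is only as safe as that predicate is accurate.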

It could look at the existing research

I think extrapolation from existing research is an interesting area of study, but I was attempting to evoke the surprise of a breakthrough invention. To me, the most interesting inventions are exactly those inventions that are not mundane extrapolations of existing techniques.

Comment author: Houshalter 03 February 2016 03:42:12AM 0 points [-]

Emulating human brains is a rather convoluted solution to any problem. The AI would be very familiar with humans and would have a good idea of our abilities.

To give an analogy, imagine we were the superintelligent AIs, and we were trying to tell apart chimps from humans pretending to be chimps. Let's say one of the chimps produces a tool as a solution to a problem. Our goal is to guess whether it was really made by a chimp, or by a human impersonator.

You look at the tool. It's a spear made from a sharp rock tied to a stick. You look closely at the cord attaching the rock, and notice it was tied nicely.

You know chimps don't know anything about knotcraft, let alone making cord, so you reject it as probably made by a human.

Another tool comes to you, a spear made from steel, and you immediately reject it as far beyond the ability of the chimps.

The last tool you examine is just a stick that has been sharpened at the end a little. Not the greatest, but definitely within the ability of chimps to produce. You note that it was probably produced by a chimp and let it pass.

Comment author: _rpd 03 February 2016 04:50:45AM 0 points [-]

Emulating human brains is a rather convoluted solution to any problem.

Granted. In practice, it may be possible to represent aspects of humankind in a more compact form. But the point is that if ...

The AI would be very familiar with humans and would have a good idea of our [inventive] abilities.

... then to me it seems likely that "the AI would be very familiar with humans and would have a good idea of actions that would meet human approval."

Taking your analogy ... if we can model chimp inventiveness to a useful degree, wouldn't we also be able to model which human actions would earn chimp approval and disapproval? Couldn't we build a chimp-friendly AI?

Consider a different scenario: a year ago, we asked the first AI to generate a Go playing program that could beat a professional Go player. The first AI submits AlphaGo as its solution after 1 day of processing. How does the second AI determine that AlphaGo is within or outside of human inventiveness at that time?

Comment author: Houshalter 03 February 2016 01:18:24AM 0 points [-]

It's easy to detect what solutions a human couldn't have invented. That's what the second AI does, predict how likely an input was produced by an AI or a human. If it's very unlikely a human produced it, it can be discarded as "unsafe".

However, it's hard to know what a human would "approve" of, since humans can be tricked, manipulated, hacked, intimidated, etc. That is the standard problem with oracles that I am trying to solve with this idea.

Comment author: _rpd 03 February 2016 01:51:21AM 0 points [-]

It's easy to detect what solutions a human couldn't have invented. That's what the second AI does

I think, to make this detection, the second AI would have to maintain high resolution simulations of the world's smartest people (if not the entire population), and basically ask the simulations to collaboratively come up with their best solutions to the problem.

Supposing that is the case, the second AI can be configured to maintain high resolution simulations of the entire population, and basically ask the simulations whether they collectively approve of a particular action.

Is there a way to "detect what solutions a human couldn't have invented" that doesn't involve emulating humankind?

Comment author: leplen 27 January 2016 08:58:35PM 1 point [-]
  • There is regular structure in human values that can be learned without requiring detailed knowledge of physics, anatomy, or AI programming.
  • Human values are so fragile that it would require a superintelligence to capture them with anything close to adequate fidelity.
  • Humans are capable of pre-digesting parts of the human values problem domain.
  • Successful techniques for value discovery of non-humans, (e.g. artificial agents, non-human animals, human institutions) would meaningfully translate into tools for learning human values.
  • Value learning isn't adequately being researched by commercial interests who want to use it to sell you things.
  • Practice teaching non-superintelligent machines to respect human values will improve our ability to specify a Friendly utility function for any potential superintelligence.
  • Something other than AI will cause human extinction sometime in the next 100 years.
  • All other things being equal, an additional researcher working on value learning is more valuable than one working on corrigibility, Vingean reflection, or some other portion of the FAI problem.

Comment author: _rpd 03 February 2016 12:42:10AM 0 points [-]

There is regular structure in human values that can be learned without requiring detailed knowledge of physics, anatomy, or AI programming.

While there is some regular structure to human values, I don't think you can say that the totality of human values has a completely regular structure. There are too many cases of nameless longings and generalized anxieties. Much of art is dedicated exactly to teasing out these feelings and experiences, often in counterintuitive contexts.

Can they be learned without detailed knowledge of X, Y and Z? I suppose it depends on what "detailed" means - I'll assume it means "less detailed than the required knowledge of the structure of human values." That said, the excluded set of knowledge you chose - "physics, anatomy, or AI programming" - seems really odd to me. I suppose you can poll people about their values (or use more sophisticated methods like prediction markets), but I don't see how this can yield more than "the set of human values that humans can articulate." It's something, but this seems to be a small subset of the set of human values. To characterize all dimensions of human values, I do imagine that you'll need to model human neural biophysics in detail. If successful, it will be a contribution to AI theory and practice.

Human values are so fragile that it would require a superintelligence to capture them with anything close to adequate fidelity.

To me, in this context, the term "fragile" means exactly that it is important to characterize and consider all dimensions of human values, as well as the potentially highly nonlinear relationships between those dimensions. An at-the-time invisible "blow" to at-the-time unarticulated dimension can result in unfathomable suffering 1000 years hence. Can a human intelligence capture the totality of human values? Some of our artists seem to have glimpses of the whole, but it seems unlikely to me that a baseline human can appreciate the whole clearly.

Comment author: turchin 02 February 2016 11:10:32PM 0 points [-]

limited proxies - yes, well said. Also I would add solving problems which humans have been unable to solve for a long time: aging, cancer, star travel, world peace, resurrection of the dead.

Comment author: _rpd 02 February 2016 11:40:27PM 0 points [-]

I mean, the ability to estimate the abilities of superintelligences appears to be an aspect of reliable Vingean reflection.

Comment author: turchin 02 February 2016 10:58:18PM 0 points [-]

Probably beating humans in ALL known domains, including philosophy, poetry, love, power.

Comment author: _rpd 02 February 2016 11:04:19PM 0 points [-]

Although we use limited proxies (e.g., IQ test questions) to estimate human intelligence.

Comment author: turchin 02 February 2016 08:55:59PM 0 points [-]

Can't remember offhand; but if a superintelligence is able to do anything, it could easily pretend to be more stupid than it is. Maybe only a "super-superintelligence" could catch it out. It may also depend on the length of the conversation: if it says just Yes or No once, we can't decide; from longer sequences we could conclude something, but for any given length of output there is a maximum level of intelligence that can be inferred from it.

Comment author: _rpd 02 February 2016 10:48:10PM 0 points [-]

The opportunities for detecting superintelligence would definitely be rarer if the superintelligence were actively trying to conceal its status.

What about in the case where there is no attempted concealment? Or even weaker, where the AI voluntary submits to arbitrary tests. What tests would we use?

Presumably we would have a successful model of human intelligence by that point. It's interesting to think about what dimensions of intelligence to measure. Number of variables simultaneously optimized? Optimization speed? Ability to apply nonlinear relationships? Search speed in a high dimensional, nonlinear solution space? I guess it is more the ability to generate appropriate search spaces in the first place. Something much simpler?
