Less Wrong is a community blog devoted to refining the art of human rationality. Please visit our About page for more information.

Comment author: Wei_Dai 11 August 2017 03:40:18AM 2 points

The Chinese characters 异化 literally mean "different/abnormal" and "transform". The combination is typically used in China to refer to Marx's theory of alienation which is probably why it was translated that way. I'm not sure what the writer intended by putting those characters next to characters for "robot". Googling for the combination of characters together doesn't give many results besides this AI development plan. If I had to guess, I think maybe they mean robots developing alien/unaligned values.

Comment author: capybaralet 11 August 2017 04:40:23AM 0 points

a FB friend of mine speculated that this was referring to alienation resulting from ppl losing their jobs to robots... shrug

Comment author: Wei_Dai 10 August 2017 09:20:34AM *  17 points

The article links to a translation of the full Chinese document, but I've noticed some errors and awkwardness in the translation, so I decided to do my own, just for the part that deals with AI safety. (The main error is that in Chinese "safety" and "security" are the same word, and for this section of the document, they always translate it to "security" instead of trying to figure out what's appropriate based on context.)

Establish an AI safety supervision and evaluation system

Strengthen research and evaluation of the impact of AI on national security and secrecy protection; improve the safety and security protection system, with mutually supporting human, technological, material, and organizational elements; and construct an AI safety monitoring and early warning mechanism. Strengthen the prediction, assessment, and follow-up research of AI technology; maintain a problem-focused orientation [standard political slogan]; accurately apprehend technological and industrial trends. Enhance risk awareness, emphasize risk assessment and management, and strengthen prevention, guidance, and regulation, with a short-term focus on impacts on employment and a long-term focus on impacts on social ethics, to ensure that AI development remains safe and controllable. Establish a robust, open, and transparent AI regulatory system, with a two-tiered structure of design accountability and application monitoring, thus realizing oversight of the entire process of algorithm design, product development, and deployment. Promote industry and enterprise self-regulation; effectively strengthen control; increase disciplinary efforts aimed at the abuse of data, violations of personal privacy, and actions contrary to morality and ethics. Strengthen research and development of AI cybersecurity technologies, and improve the cybersecurity protection of AI products and systems. Establish dynamic AI research and development evaluation mechanisms; develop systematic testing methods and benchmark systems for complexity, risk, uncertainty, interpretability, potential economic impact, and other AI issues; construct a cross-domain AI test platform to promote AI safety certification and assessment of capability/performance of AI products and systems.

ETA: There's another part of the document that's also relevant for AI safety. I'll just copy from the linked full translation for this part:

Develop laws, regulations, and ethical norms that promote the development of AI

Strengthen research on legal, ethical, and social issues related to AI, and establish laws, regulations and ethical frameworks to ensure the healthy development of AI. Conduct research on legal issues such as civil and criminal responsibility confirmation, protection of privacy and property, and information security utilization related to AI applications. Establish a traceability and accountability system, and clarify the main body of AI and related rights, obligations, and responsibilities. Focus on autonomous driving, service robots, and other application subsectors with a comparatively good usage foundation, and speed up the study and development of relevant safety management laws and regulations, to lay a legal foundation for the rapid application of new technology. Launch research on AI behavior science and ethics and other issues, establish an ethical and moral multi-level judgment structure and human-computer collaboration ethical framework. Develop an ethical code of conduct and R&D design for AI products, strengthen the assessment of the potential hazards and benefits of AI, and build solutions for emergencies in complex AI scenarios. China will actively participate in global governance of AI, strengthen the study of major international common problems such as robot alienation and safety supervision, deepen international cooperation on AI laws and regulations, international rules and so on, and jointly cope with global challenges.

Comment author: capybaralet 10 August 2017 10:26:05PM 0 points

What is "robot alienation"?

Comment author: Jonii 23 July 2009 08:12:00AM 6 points

there can just as easily be a superintelligence that rewards people predicted to act one way as one that rewards people predicted to act the other.

Yeah, now. But after Omega really, really appears in front of you, the chance of Omega existing is about 1. The chance of No-mega is still almost nonexistent. In this problem, the existence of Omega is given. It's not something you are expecting to encounter now, just as we're not expecting to encounter eccentric Kavkan billionaires who will give you money for poisoning yourself. Kavka's toxin puzzle and the counterfactual mugging each present a scenario that is given, and ask how you would act then.

Comment author: capybaralet 30 January 2017 06:14:17PM 1 point

But you aren't supposed to be updating... the essence of UDT, I believe, is that your policy should be set NOW, and NEVER UPDATED.

So... either: 1. You consider the choice of policy based on the prior where you DIDN'T KNOW whether you'd face Nomega or Omega, and NEVER UPDATE IT (this seems obviously wrong to me: why are you using your old prior instead of your current posterior?). or 2. You consider the choice of policy based on the prior where you KNOW that you are facing Omega AND that the coin is tails, in which case paying Omega only loses you money.

Comment author: Caspian 05 April 2009 05:18:44AM 26 points

The counterfactual anti-mugging: One day No-mega appears. No-mega is completely trustworthy etc. No-mega describes the counterfactual mugging to you, and predicts what you would have done in that situation not having met No-mega, if Omega had asked you for $100.

If you would have given Omega the $100, No-mega gives you nothing. If you would not have given Omega $100, No-mega gives you $10000. No-mega doesn't ask you any questions or offer you any choices. Do you get the money? Would an ideal rationalist get the money?

Okay, next scenario: you have a magic box with a number p inscribed on it. When you open it, either No-mega comes out (probability p) and performs a counterfactual anti-mugging, or Omega comes out (probability 1-p), flips a fair coin and proceeds to either ask for $100, give you $10000, or give you nothing, as in the counterfactual mugging.

Before you open the box, you have a chance to precommit. What do you do?

Comment author: capybaralet 30 January 2017 06:08:34PM *  0 points

Thanks for pointing that out. The answer is, as expected, a function of p. So I now find explanations of why UDT gets mugged incomplete and misleading.

Here's my analysis:

The action set is {give, don't give}, which I'll identify with {1, 0}. Now, the possible deterministic policies are simply every mapping from {N,O} --> {1,0}, of which there are 4.

We can disregard the policies for which pi(N) = 1, since giving money to Nomega serves no purpose. So we're left with

pi_give: N --> 0, O --> 1

pi_don't: N --> 0, O --> 0

which give/don't, respectively, to Omega.

Now, we can easily compute expected value, as follows:

r (pi_give(N)) = 0

r (pi_give(O, heads)) = 10

r (pi_give(O, tails)) = -1

r (pi_don't(N)) = 10

r (pi_don't(O)) = 0

So now:

Eg := E_give(r) = 0 * p + .5 * (10-1) * (1-p)

Ed := E_don't(r) = 10 * p + 0 * (1-p)

Eg > Ed whenever 4.5 * (1-p) > 10 * p,

i.e. whenever 4.5 > 14.5 p

i.e. whenever 9/29 > p

So, whether you should precommit to being mugged depends on how likely you are to encounter N vs. O, which is intuitively obvious.
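
The threshold above can be checked with a short script. This is only a sketch of the comparison in this comment: the function and parameter names (`expected_values`, `threshold`, `reward`, `cost`, `anti_reward`) are my own labels, and the defaults are the simplified payoffs used above (10 and -1). Plugging in the literal dollar amounts from Caspian's scenario ($10000 and $100) moves the threshold slightly, to 99/299.

```python
from fractions import Fraction

def expected_values(p, reward=10, cost=1, anti_reward=10):
    """Return (E_give, E_dont) when the box releases No-mega with probability p.

    reward / cost are Omega's heads payout and tails charge for the "give"
    policy; anti_reward is what No-mega pays the "don't give" policy.
    All names here are illustrative, not from the original comment.
    """
    p = Fraction(p)
    # Give: nothing from No-mega; with Omega, a fair coin over +reward / -cost.
    e_give = p * 0 + (1 - p) * Fraction(1, 2) * (reward - cost)
    # Don't give: No-mega pays out; Omega gives nothing either way.
    e_dont = p * anti_reward + (1 - p) * 0
    return e_give, e_dont

def threshold(reward=10, cost=1, anti_reward=10):
    """Value of p below which precommitting to give has the higher expectation."""
    half_net = Fraction(reward - cost, 2)
    return half_net / (half_net + anti_reward)

print(threshold())                    # payoffs as simplified in the comment
print(threshold(10000, 100, 10000))   # literal dollar amounts from the scenario
```

Exact fractions are used so the result can be compared directly against the 9/29 derived by hand.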

Comment author: capybaralet 05 January 2017 04:56:37AM 3 points

Looking at what they've produced to date, I don't really expect MIRI and CHCAI to produce very similar work. I expect Russell's group to be more focused on value learning and corrigibility, vs. reliable agent designs (MIRI).

Comment author: capybaralet 26 September 2016 10:48:41PM *  1 point

Does anyone have any insight into how VoI (value of information) plays with Bayesian reasoning?

At a glance, it looks like the VoI is usually not considered from a Bayesian viewpoint, as it is here. For instance, wikipedia says:

""" A special case is when the decision-maker is risk neutral where VoC can be simply computed as; VoC = "value of decision situation with perfect information" - "value of current decision situation" """

From the perspective of avoiding wireheading, an agent should be incentivized to gain information even when this information decreases its (subjective) "value of decision situation". For example, consider a Bernoulli 2-armed bandit:

Suppose the agent's prior over each arm's reward probability is uniform over [0,1], so its current value is .5 (playing arm1). If, after many observations, it learns (with high confidence) that arm1 has reward of .1 and arm2 has reward of .2, it should be glad to know this (so it can change to the optimal policy of playing arm2), BUT the subjective value of this decision situation is less than when it was ignorant, because .2 < .5.
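
Here is a small numerical sketch of the bandit example. It assumes the standard Beta-Bernoulli update (a uniform prior is Beta(1, 1)), and the observation counts are hypothetical numbers I picked so the estimates land near .1 and .2. The helper names are mine.

```python
def posterior_mean(successes, pulls):
    """Posterior mean reward of one arm: uniform prior updated on Bernoulli data."""
    return (1 + successes) / (2 + pulls)

def subjective_value(arms):
    """Expected per-step reward of playing the arm with the highest posterior mean."""
    return max(posterior_mean(s, n) for s, n in arms)

# Before any observations: both arms look like 0.5.
before = subjective_value([(0, 0), (0, 0)])

# Hypothetical data: arm1 paid 100 times in 1000 pulls, arm2 paid 200 in 1000.
observations = [(100, 1000), (200, 1000)]
after = subjective_value(observations)

# What the old "just play arm1" policy is now believed to earn.
arm1_now = posterior_mean(*observations[0])

print(before, after, arm1_now)
```

The subjective value falls from `before` to `after`, yet under the updated beliefs switching to arm2 still beats staying on arm1, which is the sense in which the information was worth having.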

Problems with learning values from observation

0 capybaralet 21 September 2016 12:40AM

I dunno if this has been discussed elsewhere (pointers welcome).

Observational data doesn't allow one to distinguish correlation and causation.
This is a problem for an agent attempting to learn values without being allowed to make interventions.

For example, suppose that happiness is just a linear function of how much Utopamine is in a person's brain.
If a person smiles only when their Utopamine concentration is above 3 ppm, then a value-learner which observes both someone's Utopamine levels and facial expression, and tries to predict their reported happiness on the basis of these features, will notice that smiling is correlated with higher levels of reported happiness and thus erroneously believe that smiling is partially responsible for the happiness.
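
A toy simulation of the Utopamine example, as a sketch only. It is simpler than the learner described above: it just compares smilers with non-smilers in observational data, then compares that with an intervention that forces a smile. The slope of 2.0, the 0 to 6 ppm range, and all function names are assumptions of mine.

```python
import random

def happiness(utopamine):
    # Assumed linear relationship; the slope 2.0 is arbitrary.
    return 2.0 * utopamine

def smiles(utopamine):
    # Smile only above 3 ppm, as in the example.
    return utopamine > 3.0

def happiness_given_smile(utopamine, forced_smile):
    # Intervening on the face does not touch Utopamine, so happiness is unchanged.
    return happiness(utopamine)

def mean(values):
    return sum(values) / len(values)

rng = random.Random(0)
levels = [rng.uniform(0.0, 6.0) for _ in range(10_000)]

# Observational data: smilers are happier on average.
smiling = [happiness(u) for u in levels if smiles(u)]
not_smiling = [happiness(u) for u in levels if not smiles(u)]
observational_gap = mean(smiling) - mean(not_smiling)

# Interventional data: forcing a smile on everyone changes nothing.
forced_on = [happiness_given_smile(u, True) for u in levels]
forced_off = [happiness_given_smile(u, False) for u in levels]
interventional_gap = mean(forced_on) - mean(forced_off)

print(observational_gap, interventional_gap)
```

The observational gap is large while the interventional gap is zero, so only the intervention reveals that smiling has no effect on happiness in this model.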

I have a picture of value learning where the AI learns via observation (since we don't want to give an unaligned AI access to actuators!).
But this makes it seem important to consider how to make an unaligned AI safe enough to perform value-learning-relevant interventions.

Comment author: WhySpace 28 August 2016 03:44:40AM 1 point

I actually brought up a similar question in the open thread, but it didn't really go very far. May or may not be worth reading, but it's still not clear to me whether such a thing is even practical. It's likely that all substantially easier AIs are too far from FAI to still be a net good.

I've come a little closer to answering my questions by stumbling on this Future of Humanity Institute video on "Reduced Impact AI". Apparently that's the technical term for it. I haven't had a chance to look for papers on the subject, but perhaps some exist. No hits on Google Scholar, but a quick search shows a couple of mentions on LW and MIRI's website.

Comment author: capybaralet 30 August 2016 12:22:45AM 0 points

It seems like most people think that reduced impact is as hard as value learning.

I think that's not quite true; it depends on details of the AI's design.

I don't agree that "It's likely that all substantially easier AIs are too far from FAI to still be a net good.", but I suspect the disagreement comes from different notions of "AI" (as many disagreements do, I suspect).

Taking a broad definition of AI, I think there are many techniques (like supervised learning) that are probably pretty safe and can do a lot of narrow AI tasks (and can maybe even be composed into systems capable of general intelligence). For instance, I think the kind of systems that are being built today are a net good (but might not be if given more data and compute, especially those based on Reinforcement Learning).

Comment author: moridinamael 29 August 2016 01:17:46PM *  1 point

Is it even possible to have a perfectly aligned AI?

If you teach an AI to model the function f(x) = sin(x), it will only be "aligned" with your goal of computing sin(x) to the point of computational accuracy. You either accept some arithmetic cutoff or the AI turns the universe to computronium in order to better approximate Pi.

If you try to teach an AI something like handwritten digit classification, it'll come across examples that even a human wouldn't be able to identify accurately. There is no "truth" to whether a given image is a 6 or a very badly drawn 5, other than the intent of the person who wrote it. The AI's map can't really be absolutely correct because the notion of correctness is not unambiguously defined in the territory. Is it a 5 because the person who wrote it intended it to be a 5? What if 75% of humans say it's a 6?

Since there will always be both computational imprecision and epistemological uncertainty from the territory, the best you can ever do is probably an approximate solution that captures what is important to the degree of confidence we ultimately decide is sufficient.

Comment author: capybaralet 30 August 2016 12:16:03AM 0 points

I edited to clarify what I mean by "approximate value learning".

Risks from Approximate Value Learning

3 capybaralet 27 August 2016 07:34PM

Solving the value learning problem is (IMO) the key technical challenge for AI safety.
How good or bad is an approximate solution?

EDIT for clarity:
By "approximate value learning" I mean something which does a good (but suboptimal from the perspective of safety) job of learning values.  So it may do a good enough job of learning values to behave well most of the time, and be useful for solving tasks, but it still has a non-trivial chance of developing dangerous instrumental goals, and is hence an Xrisk.


1. How would developing good approximate value learning algorithms affect AI research/deployment?
It would enable more AI applications.  For instance, many robotics tasks such as "smooth grasping motion" are difficult to manually specify a utility function for.  This could have positive or negative effects:

* It could encourage more mainstream AI researchers to work on value-learning.

* It could encourage more mainstream AI developers to use reinforcement learning to solve tasks for which "good-enough" utility functions can be learned.
Consider a value-learning algorithm which is "good-enough" to learn how to perform complicated, ill-specified tasks (e.g. folding a towel).  But it's still not quite perfect, and so every second, there is a 1/100,000,000 chance that it decides to take over the world. A robot using this algorithm would likely pass a year-long series of safety tests and seem like a viable product, but would be expected to decide to take over the world in ~3 years.
Without good-enough value learning, these tasks might just not be solved, or might be solved with safer approaches involving more engineering and less performance, e.g. using a collection of supervised learning modules and hand-crafted interfaces/heuristics.
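
The numbers in the robot example can be checked directly. This is a sketch that assumes each second is an independent trial with the same 1/100,000,000 failure chance and a 365-day year; the variable names are mine.

```python
import math

p_per_second = 1 / 100_000_000
seconds_per_year = 365 * 24 * 60 * 60

# Mean of a geometric distribution: 1/p trials until the first failure.
expected_years_to_failure = (1 / p_per_second) / seconds_per_year

# Probability of getting through a full year of testing with no failure.
# log1p keeps precision for such a small per-second probability.
p_pass_one_year = math.exp(seconds_per_year * math.log1p(-p_per_second))

print(expected_years_to_failure, p_pass_one_year)
```

Under these assumptions the expected time to failure is a bit over 3 years, and the robot gets through a year-long test roughly 73% of the time, which matches "likely pass" and "~3 years" above.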

2. What would a partially aligned AI do?
An AI programmed with an approximately correct value function might fail:
* dramatically (see, e.g. Eliezer, on AIs "tiling the solar system with tiny smiley faces.")
* relatively benignly (see, e.g. my example of an AI that doesn't understand gustatory pleasure)

Perhaps a more significant example of benign partial-alignment would be an AI that has not learned all human values, but is corrigible and handles its uncertainty about its utility in a desirable way.
