Problems with learning values from observation

0 capybaralet 21 September 2016 12:40AM

I dunno if this has been discussed elsewhere (pointers welcome).

Observational data doesn't allow one to distinguish correlation and causation.
This is a problem for an agent attempting to learn values without being allowed to make interventions.

For example, suppose that happiness is just a linear function of how much Utopamine is in a person's brain.
If a person smiles only when their Utopamine concentration is above 3 ppm, then a value-learner which observes both someone's Utopamine level and facial expression, and tries to predict their reported happiness from these features, will notice that smiling is correlated with higher reported happiness and thus erroneously conclude that smiling is partially responsible for the happiness.
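A minimal simulation of this confusion (the names and numbers — Utopamine, 3 ppm — are the post's hypotheticals, and the generative model is my own illustrative assumption): smiling is strongly correlated with happiness in observational data, but forcing a smile (an intervention) changes nothing, because happiness is caused only by Utopamine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generative model: happiness is caused ONLY by Utopamine;
# smiling is a downstream symptom (smile iff concentration > 3 ppm).
n = 100_000
utopamine = rng.uniform(0, 6, n)                 # concentration in ppm
happiness = 2.0 * utopamine + rng.normal(0, 0.5, n)
smile = (utopamine > 3.0).astype(float)

# Observational estimate: difference in mean happiness, smilers vs non-smilers.
observational_effect = happiness[smile == 1].mean() - happiness[smile == 0].mean()

# Interventional estimate: do(smile = 1) for everyone. Happiness is
# untouched, because smiling is not a cause of it.
happiness_after_do = happiness.copy()
interventional_effect = happiness_after_do.mean() - happiness.mean()

print(f"observational effect of smiling:  {observational_effect:.2f}")   # ~6.0
print(f"interventional effect of smiling: {interventional_effect:.2f}")  # 0.00
```

A purely observational value-learner sees only the first number, and so has no way to tell that making people smile won't make them happier.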

------------------
an IMPLICATION:
I have a picture of value learning where the AI learns via observation (since we don't want to give an unaligned AI access to actuators!).
But this makes it seem important to consider how to make an unaligned AI safe enough to perform value-learning-relevant interventions.

Risks from Approximate Value Learning

1 capybaralet 27 August 2016 07:34PM

Solving the value learning problem is (IMO) the key technical challenge for AI safety.
How good or bad is an approximate solution?

EDIT for clarity:
By "approximate value learning" I mean something which does a good (but suboptimal from the perspective of safety) job of learning values.  So it may do a good enough job of learning values to behave well most of the time, and be useful for solving tasks, but it still has a non-trivial chance of developing dangerous instrumental goals, and is hence an Xrisk.

Considerations:

1. How would developing good approximate value learning algorithms affect AI research/deployment?
It would enable more AI applications.  For instance, many robotics tasks, such as "smooth grasping motion", are difficult to manually specify a utility function for.  This could have positive or negative effects:

Positive:
* It could encourage more mainstream AI researchers to work on value-learning.

Negative:
* It could encourage more mainstream AI developers to use reinforcement learning to solve tasks for which "good-enough" utility functions can be learned.
Consider a value-learning algorithm which is "good-enough" to learn how to perform complicated, ill-specified tasks (e.g. folding a towel).  But it's still not quite perfect, and so every second, there is a 1/100,000,000 chance that it decides to take over the world. A robot using this algorithm would likely pass a year-long series of safety tests and seem like a viable product, but would be expected to decide to take over the world in ~3 years.
Without good-enough value learning, these tasks might just not be solved, or might be solved with safer approaches involving more engineering and less performance, e.g. using a collection of supervised learning modules and hand-crafted interfaces/heuristics.
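The arithmetic behind the towel-folding example above can be checked directly (assuming, as the example implies, an independent 1/100,000,000 chance of failure each second):

```python
# Per-second chance the robot decides to take over the world (from the post).
p_fail_per_second = 1e-8
seconds_per_year = 365.25 * 24 * 3600   # ~3.16e7 seconds

# Chance the robot passes a year-long safety test without incident.
p_pass_one_year = (1 - p_fail_per_second) ** seconds_per_year

# Expected time until failure, in years (geometric distribution: mean 1/p).
expected_years = (1 / p_fail_per_second) / seconds_per_year

print(f"P(passes a 1-year safety test): {p_pass_one_year:.2f}")   # ~0.73
print(f"expected time to failure:       {expected_years:.1f} years")  # ~3.2
```

So the robot passes a year of testing roughly 73% of the time, yet is expected to defect after about 3.2 years, matching the ~3-year figure above.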

2. What would a partially aligned AI do? 
An AI programmed with an approximately correct value function might fail 
* dramatically (see, e.g. Eliezer, on AIs "tiling the solar system with tiny smiley faces.")
or
* relatively benignly (see, e.g. my example of an AI that doesn't understand gustatory pleasure)

Perhaps a more significant example of benign partial-alignment would be an AI that has not learned all human values, but is corrigible and handles its uncertainty about its utility in a desirable way.

Inefficient Games

14 capybaralet 23 August 2016 05:47PM

There are several well-known games in which the Pareto optima and Nash equilibria are disjoint sets.
The most famous is probably the prisoner's dilemma.  Races to the bottom or tragedies of the commons typically have this feature as well.

I proposed calling these inefficient games.  More generally, games where the sets of Pareto optima and Nash equilibria are distinct (but not disjoint), such as a stag hunt, could be called potentially inefficient games.
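For concreteness, here is a small sketch (my own helper functions, not from any particular library) that computes pure-strategy Nash equilibria and Pareto optima for two-player matrix games, confirming that the two sets are disjoint in the prisoner's dilemma and distinct-but-overlapping in a stag hunt:

```python
def pure_nash(payoffs):
    """Pure-strategy Nash equilibria of a 2-player game.
    payoffs[(i, j)] = (row player's payoff, column player's payoff)."""
    rows = {i for i, _ in payoffs}
    cols = {j for _, j in payoffs}
    eq = set()
    for i, j in payoffs:
        row_best = all(payoffs[(i, j)][0] >= payoffs[(k, j)][0] for k in rows)
        col_best = all(payoffs[(i, j)][1] >= payoffs[(i, k)][1] for k in cols)
        if row_best and col_best:
            eq.add((i, j))
    return eq

def pareto_optima(payoffs):
    """Outcomes not Pareto-dominated by any other outcome."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return {o for o in payoffs
            if not any(dominates(payoffs[p], payoffs[o]) for p in payoffs)}

# Prisoner's dilemma: Nash = {(D,D)}, Pareto = {(C,C),(C,D),(D,C)} -- disjoint.
pd = {('C', 'C'): (2, 2), ('C', 'D'): (0, 3),
      ('D', 'C'): (3, 0), ('D', 'D'): (1, 1)}
print(pure_nash(pd), pareto_optima(pd))

# Stag hunt: Nash = {(S,S),(H,H)}, Pareto = {(S,S)} -- distinct but overlapping.
sh = {('S', 'S'): (3, 3), ('S', 'H'): (0, 2),
      ('H', 'S'): (2, 0), ('H', 'H'): (1, 1)}
print(pure_nash(sh), pareto_optima(sh))
```

The prisoner's dilemma is "inefficient" in the proposed sense (no Nash equilibrium is Pareto-optimal); the stag hunt is only "potentially inefficient" (one of its two equilibria is Pareto-optimal).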

It seems worthwhile to study (potentially) inefficient games as a class and see what can be discovered about them, but I don't know of any such work (pointers welcome!).

Should we enable public binding precommitments?

0 capybaralet 31 July 2016 07:47PM

The ability to make arbitrary public binding precommitments seems like a powerful tool for solving coordination problems.

We'd like to be able to commit to cooperating with anyone who will cooperate with us, as in the open-source prisoner's dilemma (although this simple case is still an open problem, AFAIK).  But we should be able to do this piecemeal.

It seems like we are moving in this direction, with things like Ethereum that enable smart contracts.  Technology should enable us to enforce more real-world precommitments, since we'll be able to more easily monitor and make public our private data.

Optimistically, I think this could allow us to solve coordination issues robustly enough to have a very low probability of any individual actor making an unsafe AI.  This would require a lot of people to make the right kind of precommitments.

I'm guessing there are a lot of potential downsides and ways it could go wrong, which y'all might want to point out.

A Basic Problem of Ethics: Panpsychism?

-4 capybaralet 27 January 2015 06:27AM

Panpsychism seems like a plausible theory of consciousness.  It raises extreme challenges for establishing reasonable ethical criteria.

It seems to suggest that our ethics is very subjective: the "expanding circle" of Peter Singer would eventually (ideally) stretch to encompass all matter.  But how are we to communicate with, e.g. rocks?  Our ability to communicate with one another and our presumed ability to detect falsehood and empathize in a meaningful way allow us to ignore this challenge wrt other people.

One way to argue that this is not such a problem is to suggest that humans are simply very limited in our capacity as ethical beings, and that we are fundamentally limited in our perceptions of ethical truth to only be able to draw conclusions with any meaningful degree of certainty about other humans or animals (or maybe even life-forms, if you are optimistic).  

But this is not very satisfying if we consider transhumanism.  Are we to rely on AI to extrapolate our intuitions to the rest of matter?  How do we know that our intuitions are correct (or do we even care?  I do, personally...)?  How can we tell if an AI is correctly extrapolating?




A Somewhat Vague Proposal for Grounding Ethics in Physics

-3 capybaralet 27 January 2015 05:45AM

As Tegmark argues, the idea of "final goal" for AI is likely incoherent, at least if (as he states), "Quantum effects aside, a truly well-defined goal would specify how all particles in our Universe should be arranged at the end of time."  

But "life is a journey, not a destination".  So really, what we should be specifying is the entire evolution of the universe through its lifespan.  The question then becomes: how can the universe "enjoy itself" as much as possible before the big crunch (or before and during the heat death)?*

I hypothesize that experience is related to, if not a product of, change.  I further propose (counter-intuitively, and with an eye towards "refinement" (to put it mildly))** that we treat experience as inherently positive and not try to distinguish between positive and negative experiences.

Then it seems to me the (still rather intractable) question is: how does the rate of entropy's increase relate to the quantity of experience produced?  Is it simply linear (in which case, it doesn't matter, ethically)?  My intuition is that it is more like the fuel efficiency of a car, non-linear and with a sweet spot somewhere between a lengthy boredom and a flash of intensity.



*I'm not super up on cosmology; are there other theories I ought to be considering?

**One idea for refinement: successful "prediction" (undefined here) creates positive experiences; frustrated expectations negative ones.

