Comment author: jacobt 26 May 2012 11:43:03PM *  6 points [-]

For the second question:

Imagine there are many planets with a civilization on each planet. On half of all planets, for various ecological reasons, plagues are more deadly and have a 2/3 chance of wiping out the civilization in its first 10000 years. On the other planets, plagues only have a 1/3 chance of wiping out the civilization. The people don't know if they're on a safe planet or an unsafe planet.

After 10000 years, 2/3 of the civilizations on unsafe planets have been wiped out and 1/3 of those on safe planets have been wiped out. Of the remaining civilizations, 2/3 are on safe planets, so the fact that your civilization survived for 10000 years is evidence that your planet is safe from plagues. You can just apply Bayes' rule:

P(safe planet | survive) = P(safe planet) P(survive | safe planet) / P(survive) = 0.5 * 2/3 / 0.5 = 2/3
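As a quick check of the arithmetic above, here is a minimal sketch that applies Bayes' rule with exact fractions. The function name `posterior_safe` and its parameter names are just illustrative; the numbers are the ones from the thought experiment.

```python
from fractions import Fraction


def posterior_safe(prior_safe, p_survive_safe, p_survive_unsafe):
    """Return P(safe planet | survive) via Bayes' rule."""
    # Total probability of surviving, summed over both planet types.
    p_survive = prior_safe * p_survive_safe + (1 - prior_safe) * p_survive_unsafe
    return prior_safe * p_survive_safe / p_survive


# Half the planets are safe; safe planets survive with probability 2/3,
# unsafe planets survive with probability 1/3.
print(posterior_safe(Fraction(1, 2), Fraction(2, 3), Fraction(1, 3)))  # 2/3
```

Here P(survive) = 1/2 * 2/3 + 1/2 * 1/3 = 1/2, which is where the 0.5 in the denominator comes from.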

EDIT: on the other hand, if logical uncertainty is involved, it's a lot less clear. Suppose either all planets are safe or none of them are safe, based on the truth-value of a logical proposition (say, the trillionth digit of pi being odd) that is estimated to be 50% likely a priori. Should the fact that your civilization survived be used as evidence about the logical coin flip? SSA suggests no; SIA suggests yes, because more civilizations survive when the coin flip makes all planets safe. On the other hand, if we changed the thought experiment so that no civilization survives if the logical proposition is false, then the fact that we survived is proof that the logical proposition is true.

Comment author: jacobt 25 May 2012 07:09:49PM *  2 points [-]

I think this paper will be of interest. It's a formal definition of universal intelligence/optimization power. Essentially you ask how well the agent does on average in an environment specified by a random program, where all rewards are specified by the environment program and observed by the agent. Unfortunately it's uncomputable and requires a prior over environments.

Comment author: jacobt 18 May 2012 07:28:54AM 3 points [-]

The human problem: This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring that we maximize human values - or any existing value set - from now on, will stop this process in its tracks, and prevent anything better from ever evolving. This is the most-important objection of all.

If you can convince people that something is better than present human values, then CEV will implement these new values. I mean, if you just took CEV(PhilGoetz), and you have the desire to see the universe adopt "evolved" values, then CEV will extrapolate this desire. The only issue is that other people might not share this desire, even when extrapolated. In that case insisting that values "evolve" is imposing minority desires on everyone, mostly people who could never be convinced that these values are good. Which might be a good thing, but it can be handled in CEV by taking CEV(some "progressive" subset of humans).

Comment author: jacobt 11 May 2012 08:14:21AM 2 points [-]

I made a similar point here. My conclusion: in theory, you can have a recursively self-improving tool without "agency", and this is possibly even easier to do than "agency". My design is definitely flawed but it's a sketch for what a recursively self-improving tool would look like.

Comment author: RobertLumley 04 May 2012 01:04:56AM 0 points [-]

Isn't that easily circumvented by changing the wording of Pascal's mugging? I think the typical formulation (or at least Eliezer's) was "create and kill 3^^^^3 people", and this formulation was "minus 3^^^^3 utilons".

Comment author: jacobt 04 May 2012 01:54:47AM 1 point [-]

"Minus 3^^^^3 utilons", by definition, is so bad that you'd be indifferent between -1 utilon and a 1/3^^^^3 chance of losing 3^^^^3 utilons, so in that case you should accept Pascal's Mugging. But I don't see why you would even define the utility function such that anything is that bad. My comment applies to utilitarian-ish utility functions (such as hedonism) that scale with the number of people, since it's hard to see why 2 people being tortured isn't twice as bad as one person being tortured. Other utility functions should really not be that extreme, and if they are then accepting Pascal's Mugging is the right thing to do.

Comment author: jacobt 04 May 2012 12:21:17AM *  1 point [-]

I think there's a framework in which it makes sense to reject Pascal's Mugging. According to SSA (self-sampling assumption), the probability that the universe contains 3^^^^3 people and you happen to be at a privileged position relative to them is extremely low, and as the number gets bigger the probability gets lower (the probability is proportional to 1/n if there are n people). SSA has its own problems, but a refinement I came up with (scale the probability of a universe by its efficiency at converting computation time to observer time) seems to be more intuitive. See the discussion here. The question you ask is not "how many people do my actions affect?" but instead "what percentage of simulated observer-time, assuming all universes are being simulated in parallel and given computation time proportional to the probabilities of their laws of physics, do my actions affect?". So I don't think you need to use ad-hoc heuristics to prevent Pascal's Mugging.
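To make the 1/n point concrete, here is a rough sketch under simplifying assumptions: the chance of being the one privileged observer among n people is taken to be exactly 1/n, and the stakes are taken to be exactly n people. The function name and the prior value are hypothetical, and small stand-in numbers are used because 3^^^^3 can't be represented.

```python
from fractions import Fraction


def expected_people_affected(n, prior_claim_true):
    """Expected number of people affected under a 1/n SSA-style discount.

    Assumption (simplified from the comment above): the probability of being
    the one privileged observer among n people is 1/n.
    """
    p_privileged = Fraction(1, n)
    # The stakes grow like n, but the discount shrinks like 1/n.
    return prior_claim_true * p_privileged * n


prior = Fraction(1, 1000)  # hypothetical prior that the mugger is honest
for n in (10, 10**6, 10**100):
    # The result equals the prior for every n.
    print(expected_people_affected(n, prior))
```

The mugger gains nothing by naming a bigger number, because the expected stakes stay bounded by the prior. This only illustrates the plain 1/n version, not the observer-time refinement.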

Comment author: Manfred 14 April 2012 10:20:40PM 1 point [-]

This seems non-impossible. On the other hand, humans have categories not just because of simplicity, but also because of usefulness. In order to have the best shot at regenerating human categorizations, our world-modeler would have to have human resources and use concepts for human activities.

And of course, even if you manage to make a bunch of categories, many of which correspond to human categories, you still have to pick out specific categories in order to communicate or set up a goal system. And you have to pick it out without knowing what the categories are beforehand, or else you'd just write that.

Comment author: jacobt 14 April 2012 10:33:53PM 0 points [-]

This seems non-impossible. On the other hand, humans have categories not just because of simplicity, but also because of usefulness.

Good point, but it seems like some categories (like person) are useful even for paperclip maximizers. I really don't see how you could completely understand media and documents from human society yet be confused by a categorization between people and non-people.

And of course, even if you manage to make a bunch of categories, many of which correspond to human categories, you still have to pick out specific categories in order to communicate or set up a goal system.

Right, you can "index" a category by providing some positive and negative examples. If I gave you some pictures of oranges and some pictures of non-oranges, you could figure out the true categorization because you consider the categorization of oranges/non-oranges to be simple. There's probably a more robust way of doing this.

Are Magical Categories Relatively Simple?

3 jacobt 14 April 2012 08:59PM

In Magical Categories, Eliezer criticizes using machine learning to learn the concept of "smile" from examples. "Smile" sounds simple to humans but is actually a very complex concept. It only seems simple to us because we find it useful.

If we saw pictures of smiling people on the left and other things on the right, we would realize that smiling people go to the left and categorize new things accordingly. A supervised machine learning algorithm, on the other hand, will likely learn something other than what we think of as "smile" (such as "containing things that pass the smiley face recognizer") and categorize molecular smiley faces as smiles.

This is because simplicity is subjective: a human will consider "happy" and "person" to be basic concepts, so the intended definition of smile as "expression of a happy person" is simple. A computational Occam's Razor will consider this correct definition to be a more complex concept than "containing things that pass the smiley face recognizer". I'll use the phrase "magical category" to refer to concepts that have a high Kolmogorov complexity but that people find simple.

I hope that it's possible to create conditions under which the computer will have an inductive bias towards magical categories, as humans do. I think that people find these concepts simple because they're useful to explain things that humans want to explain (such as interactions with people or media depicting people). For example: "The video has pixels arranged in this pattern because it depicts a person who is happy because he is eating chocolate."

So, maybe it's possible to learn these magical categories from a lot of data, by compressing the categorizer along with the data. Here's a sketch of a procedure for doing this:

  1. Amass a large collection of data from various societies, containing photographs, text, historical records, etc.

  2. Come up with many categories (say, one for each noun in a long list). For each category, decide which pieces of data fit the category.

  3. Find categorizer_1, categorizer_2, ..., categorizer_n to minimize K(dataset + categorizer_1 + categorizer_2 + ... + categorizer_n)

What these mean:

  • K(x) is the Kolmogorov complexity of x; that is, the length of the shortest (program,input) pair that, when run, produces x. This is uncomputable so it has to be approximated (such as through resource-bounded data compression).
  • + denotes string concatenation. There should be some separator so the boundaries between strings are clear.
  • dataset is the collection of data
  • categorizer_k is a program that returns "true" or "false" depending on whether the input fits category #k

  • When learning a new category, find new_categorizer to minimize K(dataset + categorizer_1 + categorizer_2 + ... + categorizer_n + new_categorizer) while still matching the given examples.

Note that while in this example we learn categorizers, in general it should be possible to learn arbitrary functions including probabilistic functions.
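The selection step in the procedure above can be sketched roughly as follows. This is only a toy under loud assumptions: zlib's compressed length stands in for the uncomputable K, categorizers are represented as hypothetical (source bytes, predicate) pairs, and the search is over a small hand-supplied candidate list rather than over all programs. The names `approx_k` and `learn_categorizer` are made up for illustration.

```python
import zlib

SEP = b"\x00"  # separator so the boundaries between strings are clear


def approx_k(*parts):
    """Crude proxy for K: length of the zlib-compressed concatenation."""
    return len(zlib.compress(SEP.join(parts), 9))


def learn_categorizer(dataset, known_sources, candidates, examples):
    """Pick the candidate that matches the examples and compresses best.

    dataset:       bytes, the collection of data
    known_sources: list of bytes, source of already-learned categorizers
    candidates:    list of (source_bytes, predicate) pairs (hypothetical form)
    examples:      list of (item, label) pairs the categorizer must match
    """
    consistent = [
        (src, pred)
        for src, pred in candidates
        if all(pred(item) == label for item, label in examples)
    ]
    if not consistent:
        return None
    # Minimize approx_k(dataset + known categorizers + new categorizer).
    return min(consistent, key=lambda c: approx_k(dataset, *known_sources, c[0]))


dataset = b"orange juice is made from oranges; apples make cider"
candidates = [
    (b"lambda s: 'orange' in s", lambda s: "orange" in s),
    (b"lambda s: len(s) > 3", lambda s: len(s) > 3),
]
examples = [("orange juice", True), ("apple", False)]
chosen = learn_categorizer(dataset, [], candidates, examples)
print(chosen[0])
```

A general-purpose compressor like zlib only captures repeated byte strings, so it won't discover anything like a "person" concept. The sketch only shows the shape of the objective.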

The fact that the categorizers are compressed along with the dataset will create a bias towards categorizers that use concepts useful in compressing the dataset and categorizing other things. From looking at enough data, the concept of "person" naturally arises (in the form of a recognizer/generative model/etc), and it will be used both to compress the dataset and to recognize the "person" category. In effect, because the "person" concept is useful for compressing the dataset, it will be cheap/simple to use in categorizers (such as to recognize real smiling faces).

A useful concept here is "relative complexity" (I don't know the standard name for this), defined as K(x|y) = K(x + y) - K(y). Intuitively this is how complex x is if you already understand y. The categorizer should be trusted in inverse proportion to its relative complexity K(categorizer | dataset and other categorizers); more complex (relative to the data) categorizers are more arbitrary, even given concepts useful for understanding the dataset, and so they're more likely to be wrong on new data.
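Here is a minimal sketch of that definition with a real compressor standing in for K. Using zlib is an assumption of the example; it gives only a loose proxy. Because zlib works sequentially, y is placed before x in the concatenation so that x can be encoded in terms of y. For true Kolmogorov complexity the order would matter much less.

```python
import zlib


def c(data):
    """Compressed length in bytes, a crude stand-in for K(data)."""
    return len(zlib.compress(data, 9))


def relative_complexity(x, y):
    """Proxy for K(x|y) = K(x + y) - K(y), with y placed first."""
    return c(y + x) - c(y)


x = bytes(range(256))      # 256 distinct bytes: hard to compress alone
related = bytes(range(256))  # already "explains" x completely
unrelated = b"a" * 256       # shares no structure with x

print(relative_complexity(x, related))    # small: x is cheap given related
print(relative_complexity(x, unrelated))  # large: unrelated doesn't help
```

The same x is cheap relative to data that already contains its structure and expensive relative to data that doesn't. That is the sense in which a categorizer can be simple relative to the dataset even if it is complex in isolation.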

If we can use this setup to learn "magical" categories, then Friendly AI becomes much easier. CEV requires the magical concepts "person" and "volition" to be plugged in. So do all seriously proposed complete moral systems. I see no way of doing Friendly AI without having some representation of these magical categories, either provided by humans or learned from data. It should be possible to learn deontological concepts such as "obligation" or "right", and also consequentialist concepts such as "volition" or "value". Some of these are 2-place predicates so they're categories over pairs. Then we can ask new questions such as "Do I have a right to do x in y situation?" All of this depends on whether the relevant concepts have low complexity relative to the dataset and other categorizers.

Using this framework for Friendly AI has many problems. I'm hand-waving the part about how to actually compress the data (approximating Kolmogorov complexity). This is a difficult problem but luckily it's not specific to Friendly AI. Another problem is that it's hard to go from categorizing data to actually making decisions. This requires connecting the categorizer to some kind of ontology. The categorization question that we can actually give examples for would be something like "given this description of the situation, is this action good?". Somehow we have to provide examples of (description,action) pairs that are good or not good, and the AI has to come up with a description of the situation before deciding whether the action is good or not. I don't think that using exactly this framework to make Friendly AI is a good idea; my goal here is to argue that sufficiently advanced machine learning can learn magical categories.

If it is in fact possible to learn magical categories, this suggests that machine learning research (especially related to approximations of Solomonoff induction/Kolmogorov complexity) is even more necessary for Friendly AI than it is for unFriendly AI. I think that the main difficulty of Friendly AI as compared with unFriendly AI is the requirement of understanding magical concepts/categories. Other problems (induction, optimization, self-modification, ontology, etc.) are also difficult but luckily they're almost as difficult for paperclip maximizers as they are for Friendly AI.

This has a relationship to the orthogonality thesis. Almost everyone here would agree with a weak form of the orthogonality thesis: that there exist general optimizer AI programs into which you can plug any goal (such as paperclip maximization). A stronger form of the orthogonality thesis asserts that all ways of making an AI can be easily reduced to specifying its goals and optimization separately; that is, K(AI) ~= K(arbitrary optimizer) + K(goals). My thesis here (that magical categories are simpler relative to data) suggests that the strong form is false. Concepts such as "person" and "value" have important epistemic/instrumental value and can also be used to create goals, so K(Friendly AI) < K(arbitrary optimizer) + K(Friendliness goal). There's really no problem with human values being inherently complex if they're not complex relative to data we can provide to the AI or information it will create on its own for instrumental purposes. Perhaps P(Friendly AI | AGI, passes some Friendliness tests) isn't actually so low even if the program is randomly generated (though I don't actually suggest taking this approach!).

I'm personally working on a programming language for writing and verifying generative models (proving lower bounds on P(data|model)). Perhaps something like this could be used to compress data and categories in order to learn magical categories. If we can robustly learn some magical categories even with current levels of hardware/software, that would be strong evidence for the possibility of creating Friendly AI using this approach, and evidence against the molecular smiley face scenario.

Comment author: jacobt 29 March 2012 02:46:17AM 4 points [-]

I think CM with a logical coin is not well-defined. Say Omega determines whether or not the millionth digit of pi is even. If it's even, you verify this and then Omega asks you to pay $1000; if it's odd, Omega gives you $1000000 iff you would have paid Omega had the millionth digit of pi been even. But the counterfactual "would you have paid Omega had the millionth digit of pi been even and you verified this" is undefined if the digit is in fact odd, since you would have realized that it is odd during verification. If you don't actually verify it, then the problem is well-defined because Omega can just lie to you. I guess you could ask the counterfactual "what if your digit verification procedure malfunctioned and said the digit was even", but now we're getting into doubting your own mental faculties.

Comment author: Dmytry 28 February 2012 02:55:02PM *  2 points [-]

With regard to AI not caring about the real world: H. sapiens, for example, cares about the 'outside' world and wants to maximize the number of paperclips, err, souls in heaven, without ever having been given any cue that an outside even exists. It seems we assume that AI is some science-fiction robot dude that acts all logical, doesn't act creatively, and is utterly sane. Sanity is NOT what you tend to get from hill climbing. You get 'whatever works'.

Comment author: jacobt 01 March 2012 12:05:29AM 0 points [-]

That's a good point. There might be some kind of "goal drift": programs that have goals other than optimization that nevertheless lead to good optimization. I don't know how likely this is, especially given that the goal "just solve the damn problems" is simple and leads to good optimization ability.
