All of jacobt's Comments + Replies

jacobt00

Arrow's Theorem doesn't say anything about strategic voting. The only reasonable non-strategic voting system I know of is random ballot (pick a random voter; they decide who wins). I'm currently trying to figure out a voting system that is based on finding the Nash equilibrium (which may be mixed) of approval voting, and this system might also be strategyproof.
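A minimal sketch of the random ballot rule described above (the ballot representation and voter names are mine, purely for illustration):

import random

def random_ballot(ballots):
    """Random ballot: pick one voter uniformly at random and elect that
    voter's top choice. 'ballots' maps each voter to a ranked list of options."""
    voter = random.choice(list(ballots))
    return ballots[voter][0]

# A selected voter gets exactly their reported first choice, so reporting
# anything but their true favorite can only hurt them.
ballots = {"alice": ["A", "B", "C"], "bob": ["B", "A", "C"], "carol": ["B", "C", "A"]}
print(random_ballot(ballots))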

When I said linear combination of utility functions, I meant that you fix the scaling factors initially and don't change them. You could make all of them 1, for example. Your voting system (descr... (read more)

jacobt40

It is bad to create a small population of creatures with humane values (that has positive welfare) and a large population of animals that are in pain. For instance, it is bad to create a population of animals with -75 total welfare, even if doing so allows you to create a population of humans with 50 total welfare.

Why do you believe this? I don't. Due to wild animal suffering, this proposition implies that it would have been better if no life had appeared on Earth, assuming average human/animal welfare and the human/animal ratio don't dramatically change in the future.

2Ghatanathoah
I do expect it to change in the far future as the human race (barring some extinction event) expands into space. I am also a little skeptical of one of the author's premises; I would not give up a significant portion of my lifespan (probably less than a week at most) to avoid a painful but relatively brief death. I am concerned about the suffering wild animals feel in their day-to-day life, but I don't think any painful deaths they experience are as significant as the author implies. I'm not expert enough to know how frequent predator encounters, starvation, and other such things are among animals, or whether the average of their day-to-day life is mostly pain; I'm guessing it's closer to neutral, but I can't be sure. I have also read some studies that suggest fear may be much more harmful than pain to animals; I have no idea what that implies. Then there's this, although I wouldn't take it seriously at all, and neither does the author.

Another weird idea I don't think anyone has considered before: what about the wants of animals? Are they significant at all? It's well known that humans can want things that do not give them pleasure (i.e. not wanting to be told a comforting lie). It seems like that is true of animals as well. If I knock out the part of a rat's brain that likes food, and it still tries to get food (because it wants it), am I morally obligated to give it food? Generally, when I want things I don't enjoy, I can divide those wants into ego-syntonic wants that I consider part of my "true self" (i.e. wanting to be told the truth, even if it's upsetting) versus ego-dystonic wants that I consider an encroachment on my true self that I want to eliminate (like wanting to eat yet another potato chip). Since animals are not sapient, and so lack any reflective "true self", does that mean none of their wants matter, or all of them? If an animal gets what it wants, does that make up for pain it has experienced, or not?

Still, you make a good point, maybe I should
jacobt30

I couldn't access the "Aggregation Procedure for Cardinal Preferences" article. In any case, why isn't using an aggregate utility function that is a linear combination of everyone's utility functions (choosing some arbitrary number for each person's weight) a way to satisfy Arrow's criteria?

It should also be noted that Arrow's impossibility theorem doesn't hold for non-deterministic decision procedures. I would also caution against calling this an "existential risk", because while decision procedures that violate Arrow's criteria migh... (read more)

0ThrustVectoring
On first inspection, it looks like "linear combination of utility functions" still has issues with strategic voting. If you prefer A to B and B to C, but A isn't the winner regardless of how you vote, it can be arranged such that you make yourself worse off by expressing a preference for A over B. Any system where you reward people for not voting their preferences can get strange in a hurry. Let me at least formalize the "linear combination of utility functions" bit. Scale each person's utility function so that their favorite option is 1, and their least favorite is -1. Add them together, then remove the lowest-scoring option, then re-scale the utility functions to the same range over the new choice set.
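A sketch of ThrustVectoring's rescale-and-eliminate procedure, under the assumption (mine) that the elimination step is iterated until a single option remains; the example utilities are made up:

def rescale(u, options):
    """Affinely rescale utilities so the best remaining option gets 1 and the worst -1."""
    lo, hi = min(u[o] for o in options), max(u[o] for o in options)
    if hi == lo:
        return {o: 0.0 for o in options}
    return {o: 2 * (u[o] - lo) / (hi - lo) - 1 for o in options}

def eliminate_winner(utilities, options):
    """Rescale each voter's utilities to [-1, 1] over the remaining options,
    sum them, drop the lowest-scoring option, and repeat until one is left."""
    options = list(options)
    while len(options) > 1:
        scaled = [rescale(u, options) for u in utilities]
        totals = {o: sum(s[o] for s in scaled) for o in options}
        options.remove(min(options, key=totals.get))
    return options[0]

# utilities: one dict per voter mapping option -> cardinal utility
utilities = [{"A": 5, "B": 3, "C": 0}, {"A": 0, "B": 4, "C": 5}]
print(eliminate_winner(utilities, ["A", "B", "C"]))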
8gwern
Here you go: http://dl.dropbox.com/u/85192141/1977-kalai.pdf
jacobt20

Ok, I agree with this interpretation of "being exposed to ordered sensory data will rapidly promote the hypothesis that induction works".

1Eliezer Yudkowsky
Yep! And for the record, I agree with your above paragraphs given that. I would like to note explicitly for other readers that probability goes down proportionally to the exponential of Kolmogorov complexity, not proportional to Kolmogorov complexity. So the probability of the Sun failing to rise the next day really is going down at a noticeable rate, as jacobt calculates (1 / x log(x)^2 on day x). You can't repeatedly have large likelihood ratios against a hypothesis or mixture of hypotheses and not have it be demoted exponentially fast.
jacobt30

You could choose to single out a single alternative hypothesis that says the sun won't rise some day in the future. The ratio between P(sun rises until day X) and P(sun rises every day) will not change with any evidence before day X. If initially you believed a 99% chance of "the sun rises every day until day X" and a 1% chance of Solomonoff induction's prior, you would end up assigning more than a 99% probability to "the sun rises every day until day X".

Solomonoff induction itself will give some significant probability mass to "... (read more)

2Eliezer Yudkowsky
If you only assign significant probability mass to one changeover day, you behave inductively on almost all the days up to that point, and hence make relatively few epistemic errors. To put it another way, unless you assign superexponentially-tiny probability to induction ever working, the number of anti-inductive errors you make over your lifespan will be bounded.
jacobt00

You're making the argument that Solomonoff induction would select "the sun rises every day" over "the sun rises every day until day X". I agree, assuming a reasonable prior over programs for Solomonoff induction. However, if your prior is 99% "the sun rises every day until day X", and 1% "Solomonoff induction's prior" (which itself might assign, say, 10% probability to the sun rising every day), then you will end up believing that the sun rises every day until day X. Eliezer asserted that in a situation where you ... (read more)
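A toy numerical version of the mixture being described (the 10% figure comes from the comment; the per-day elimination rate for the rest of the Solomonoff slice is a made-up illustration):

# H1: "sun rises every day until day X, then stops"      prior 0.99
# H2: "sun rises every day" (inside the Solomonoff slice) prior 0.01 * 0.10
# H3: the rest of the Solomonoff slice                    prior 0.01 * 0.90
# Before day X, H1 and H2 both predict each sunrise with probability 1;
# assume (illustratively) that half of H3's mass is falsified each day.
prior = {"H1": 0.99, "H2": 0.01 * 0.10, "H3": 0.01 * 0.90}
likelihood_per_day = {"H1": 1.0, "H2": 1.0, "H3": 0.5}

posterior = dict(prior)
for day in range(1000):            # observe 1000 sunrises, all before day X
    posterior = {h: p * likelihood_per_day[h] for h, p in posterior.items()}
    total = sum(posterior.values())
    posterior = {h: p / total for h, p in posterior.items()}

print(posterior["H1"])  # ~0.999: the odds of H1 against H2 never move, so the
                        # posterior on "until day X" never drops below 0.99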

jacobt30

Because being exposed to ordered sensory data will rapidly promote the hypothesis that induction works

Not if the alternative hypothesis assigns about the same probability to the data up to the present. For example, an alternative hypothesis to the standard "the sun rises every day" is "the sun rises every day, until March 22, 2015", and the alternative hypothesis assigns the same probability to the data observed until the present as the standard one does.

You also have to trust your memory and your ability to compute Solomonoff induction, both of which are demonstrably imperfect.

8Eliezer Yudkowsky
There's an infinite number of alternative hypotheses like that and you need a new one every time the previous one gets disproven; so assigning so much probability to all of them that they went on dominating Solomonoff induction on every round even after being exposed to large quantities of sensory information would require that the remaining probability mass assigned to the prior for Solomonoff induction be less than exp(-amount of sensory information), that is, super-exponentially tiny.
2DaFranker
But... no. "The sun rises every day" is much simpler information and computation than "the sun rises every day until Day X". To put it in caricature, if the hypothesis "the sun rises every day" is:

XXX1XXXXXXXXXXXXXXXXXXXXXXXXXX (reading from the left)

then the hypothesis "the sun rises every day until Day X" is:

XXX0XXXXXXXXXXXXXXXXXXXXXX1XXX

And I have no idea if that's even remotely the right order of magnitude, simply because I have no idea how many possible-days or counterfactual days we need to count, nor of how exactly the math should work out. The important part is that for every possible Day X, it is equally balanced by the "the sun rises every day" hypothesis, and AFAICT this is one of those things implied by the axioms. So because of complexity giving you base rates, most of the evidence given by sunrise accrues to "the sun rises every day", and the rest gets evenly divided over all non-falsified "Day X" (also, induction by this point should let you induce that Day X hypotheses will continue to be falsified).
jacobt30

For every n, a program exists that will solve the halting problem for programs up to length n, but the size of this program must grow with n. I don't really see any practical way for a human to write this program other than generating an extremely large number and then testing all programs up to length n for halting within this bound, in which case you've already pretty much solved the original problem. If you use some proof system to try to prove that programs halt and then take the maximum running time of only those, then you might as well use a formalism like the calculus of constructions.

1loup-vaillant
Wait, it's even worse. A human in a room is an algorithm, and as such cannot solve the halting problem. There have got to be some programs for which we just can't know whether they will halt or not. Which means there's got to be an n beyond which some programs of length n or less cannot be analysed by humans. That, or we have some special magic in us.
jacobt40

Game1 has been done in real life (without the murder): http://djm.cc/bignum-results.txt

Also:

Write a program that generates all programs shorter than length n, and finds the one with the largest output.

Can't do that, unless you already know the programs will halt. The winner of the actual contest used a similar strategy, using programs in the calculus of constructions so they are guaranteed to halt.

For Game2, if your opponent's program (say there are only 2 players) says to return your program's output + 1, then you can't win. If your program ever halts, they win. If it doesn't halt, then you both lose.

0loup-vaillant
Wait, I get that we can't solve the Halting Problem in general. But if we restrict ourselves to programs of less than a given length, are you sure there is no halting algorithm that can analyse them all? There certainly is one, for very small sizes. I don't expect it would break down for larger sizes, only for arbitrary sizes.
3[anonymous]
Whelp, that's it, then. Ralph Loader has discovered the largest integer.
jacobt00

But if the choices only have the same expectation of v2, then you won't be optimizing for v1.

Ok, this is correct. I hadn't understood the preconditions well enough. It seems that now the important question is whether things people intuitively think of as different values (my happiness, total happiness, average happiness) satisfy this condition.

0Nisan
Admittedly, I'm pretty sure they don't.
jacobt00

You would if you could survive for v1*v2 days.

1Nisan
Ah, okay. In that case, if you're faced with a number of choices that offer varying expectations of v1 but all offer a certainty of say 3 units of water, then you'll want to optimize for v1. But if the choices only have the same expectation of v2, then you won't be optimizing for v1. So the theorem doesn't apply because the agent doesn't optimize for each value ceteris paribus in the strong sense described in this footnote.
jacobt10

I do think that everything should reduce to a single utility function. That said, this utility function is not necessarily a convex combination of separate values, such as "my happiness", "everyone else's happiness", etc. It could contain more complex values such as your v1 and v2, which depend on both x and y.

In your example, let's add a choice D: 50% of the time it's A, 50% of the time it's B. In terms of individual happiness, this is Pareto superior to C. It is Pareto inferior for v1 and v2, though.

EDIT: For an example of what I'... (read more)

jacobt30

I didn't say anything about risk aversion. This is about utility functions that depend on multiple different "values" in some non-convex way. You can observe that, in my original example, if you have no water, then utility (days survived) is linear with respect to food.

0AlexMennen
Oh, I see. The problem is that if the importance of a value changes depending on how well you achieve a different value, a Pareto improvement in the expected value of each value function is not necessarily an improvement overall, even if your utility with respect to each value function is linear given any fixed values for the other value functions (e.g. U = v1*v2). That's a good point, and I now agree; Pareto optimality with respect to the expected value of each value function is not an obviously desirable criterion. (apologies for the possibly confusing use of "value" to mean two different things) Edit: I'm going to backtrack on that somewhat. I think it makes sense if the values are independent of one another (not the case for food and water, which are both subgoals of survival). The assumption needed for the theorem is that for all i, the utility function is linear with respect to v_i given fixed expected values of the other value functions, and does not depend on the distribution of possible values of the other value functions.
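A quick illustration of this point with U = v1*v2 (the lotteries are mine, chosen for simplicity): a gamble can raise the expected value of each value function while lowering expected utility, because the two values end up anti-correlated.

def expected(lottery, f):
    """Expected value of f over a lottery given as [(prob, v1, v2), ...]."""
    return sum(p * f(v1, v2) for p, v1, v2 in lottery)

status_quo = [(1.0, 1, 1)]               # v1 = v2 = 1 for certain
gamble = [(0.5, 3, 0), (0.5, 0, 3)]      # plenty of one value, none of the other

for name, lottery in [("status quo", status_quo), ("gamble", gamble)]:
    print(name,
          "E[v1] =", expected(lottery, lambda a, b: a),
          "E[v2] =", expected(lottery, lambda a, b: b),
          "E[v1*v2] =", expected(lottery, lambda a, b: a * b))
# The gamble improves E[v1] and E[v2] (1.5 vs 1.0 each) but drops E[v1*v2]
# from 1.0 to 0.0, so the "Pareto improvement in expectations" is worse overall.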
jacobt00

I think we agree. I am just pointing out that Pareto optimality is undesirable for some selections of "values". For example, you might want you and everyone else to both be happy, and happiness of one without the other would be much less valuable.

I'm not sure how you would go about deciding if Pareto optimality is desirable, now that the theorem proves that it is desirable iff you maximize some convex combination of the values.

0DaFranker
Now you've got me curious. I don't see what selections of values representative of the agent they're trying to model could possibly desire non-Pareto-optimal scenarios. The given example (quoted), for one, is something I'd represent like this:

Let x = my happiness, y = happiness of everyone else.

To model the fact that each is worthless without the other, let:
v1 = min(x, 10y)
v2 = min(y, 10x)

Choice A: Gain 10 x, 0 y
Choice B: Gain 0 x, 10 y
Choice C: Gain 2 x, 2 y

It seems very obvious that the sole Pareto-optimal choice is the only desirable policy. Utility is four for choice C, and zero for A and B. This may reduce to exactly what AlexMennen said, too, I guess. I have never encountered any intuition or decision problem that couldn't at-least-in-principle resolve to a utility function with perfect modeling accuracy given enough time and computational resources.
4AlexMennen
Given some value v1 that you are risk averse with respect to, you can find some value v1' that your utility is linear with. For example, if with other values fixed, utility = log(v1), then v1':=log(v1). Then just use v1' in place of v1 in your optimization. You are right that it doesn't make sense to maximize the expected value of a function that you don't care about the expected value of, but if you are VNM-rational, then given an ordinal utility function (for which the expected value is meaningless), you can find a cardinal utility function (which you do want to maximize the expected value of) with the same relative preference ordering.
jacobt20

I think that, depending on what the v's are, choosing a Pareto optimum is actually quite undesirable.

For example, let v1 be min(1000, how much food you have), and let v2 be min(1000, how much water you have). Suppose you can survive for days equal to a soft minimum of v1 and v2 (for example, 0.001 v1 + 0.001 v2 + min(v1, v2)). All else being equal, more v1 is good and more v2 is good. But maximizing a convex combination of v1 and v2 can lead to avoidable dehydration or starvation. Suppose you assign weights to v1 and v2, and are offered either 1000 of ... (read more)
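Since the concrete offers are cut off above, here is an illustrative reconstruction of the kind of choice being described (the specific numbers are mine):

def days_survived(food, water):
    v1, v2 = min(1000, food), min(1000, water)
    return 0.001 * v1 + 0.001 * v2 + min(v1, v2)

offers = {"all food": (1000, 0), "all water": (0, 1000), "split": (500, 500)}

w = 0.6                      # any fixed convex weight on v1; 1 - w goes to v2
def weighted_score(food, water):
    return w * min(1000, food) + (1 - w) * min(1000, water)

best_by_weights = max(offers, key=lambda o: weighted_score(*offers[o]))
best_by_survival = max(offers, key=lambda o: days_survived(*offers[o]))
print(best_by_weights, best_by_survival)   # 'all food' vs 'split'
# For any w, the better corner offer scores max(1000w, 1000(1-w)) >= 500 while
# 'split' scores exactly 500, so the weighted sum never strictly prefers 'split',
# even though 'split' gives ~501 days of survival versus ~1 day.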

0Nisan
This example doesn't satisfy the hypotheses of the theorem because you wouldn't want to optimize for v1 if your water was held fixed. Presumably, if you have 3 units of water and no food, you'd prefer 3 units of food to a 50% chance of 7 units of food, even though the latter leads to a higher expectation of v1.
3DaFranker
Wha...? I believe your Game is badly formed. This doesn't sound at all like how Games should be modeled. Here, you don't have two agents each trying to maximize something of their own that they value, so you can't use those tricks. As a result, apparently you're not properly representing utility in this model. You're implicitly assuming the thing to be maximized is health and life duration, without modeling it at all. With the model you make, there are only two values, food and water. The agent does not care about survival with only those two Vs. So for this agent, yes, picking one of the "1000" options really truly spectacularly trivially is better. The agent just doesn't represent your own preferences properly, that's all. If your agent cares at all about survival, there should be a value for survival in there too, probably conditionally dependent on how much water and food is obtained. Better yet, you seem to be implying that the amount of food and water obtained isn't really important, only surviving longer is; strike out the food and water values, keep only a "days survived" value dependent upon food and water obtained, and then form the Game properly.
jacobt10

Actually you're right, I misread the problem at first. I thought that you had observed yourself not dying 1000 times (rather than observing "heads" 1000 times), in which case you should keep playing.

Applying my style of analyzing anthropic problems to this one: Suppose we have 1,000,000 * 2^1000 players. Half flip heads initially, half flip tails. About 1,000,000 will get heads 1,000 times. Of them, 500,000 will have flipped heads initially. So, your conclusion is correct.
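The expected counts in that argument can be checked directly (a quick sketch of the arithmetic):

players = 10**6 * 2**1000      # half flip heads initially, half flip tails
p_all_heads = 0.5 ** 1000      # chance of then getting heads 1000 times in a row

heads_first = (players // 2) * p_all_heads   # expected ~500,000
tails_first = (players // 2) * p_all_heads   # expected ~500,000
print(heads_first, tails_first)
print(heads_first / (heads_first + tails_first))  # 0.5: seeing 1000 heads is no
# evidence about the initial flip, unlike observing your own survival.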

jacobt10

I think you're wrong. Suppose 1,000,000 people play this game. Each of them flips the coin 1000 times. We would expect about 500,000 to survive, and all of them would have flipped heads initially. Therefore, P(I flipped heads initially | I haven't died yet after flipping 1000 coins) ~= 1.

This is actually quite similar to the Sleeping Beauty problem. You have a higher chance of surviving (analogous to waking up more times) if the original coin was heads. So, just as the fact that you woke up is evidence that you were scheduled to wake up more times... (read more)

7Vladimir_Nesov
It's often pointless to argue about probabilities, and sometimes no assignment of probability makes sense, so I was careful to phrase the thought experiment as a decision problem. Which decision (strategy) is the right one?
jacobt160

I vote for range voting. It has the lowest Bayesian regret (best expected social utility). It's also extremely simple. Though it's not exactly the most unbiased source, rangevoting.org has lots of information about range voting in comparison to other methods.

6A1987dM
I like Majority Judgement, which is like range voting except instead of sorting candidates by the sum of the scores each of them gets, you use the median of the scores. IIUC it's been proven that it's the system where tactical voting is hardest (for a certain definition of “hardest”).
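A small made-up example of how the two rules can disagree (Majority Judgement's full tie-breaking procedure is omitted; the plain median is enough here):

from statistics import median

scores = {                      # one 0-10 score per voter, per candidate
    "A": [10, 10, 10, 0, 0],    # loved by a bare majority, hated by the rest
    "B": [7, 7, 7, 7, 7],       # moderately liked by everyone
}
for name, s in scores.items():
    print(name, "range total:", sum(s), "majority judgement median:", median(s))
# Range voting (sum) elects B (35 > 30); Majority Judgement (median) elects A (10 > 7).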
jacobt30

For aliens with a halting oracle:

Suppose the aliens have this machine that may or may not be a halting oracle. We give them a few Turing machine programs and they decide which ones halt and which ones don't. Then we run the programs. Sure enough, none of the ones they say run forever halt, and some of the ones they say will halt do halt at some point. Suppose we repeat this process a few times with different programs.

Now what method should we use to predict the point at which new programs halt? The best strategy seems to be to ask the aliens whi... (read more)

jacobt00

I found this post interesting but somewhat confusing. You start by talking about UDT in order to talk about importance. But really the only connection from UDT to importance is the utility function, so you might as well start with that. And then you ignore utility functions in the rest of your post when you talk about Schmidhuber's theory.

It just has a utility function which specifies what actions it should take in all of the possible worlds it finds itself in.

Not quite. The utility function doesn't specify what action to take, it specifies what wo... (read more)

jacobt70

For the second question:

Imagine there are many planets with a civilization on each planet. On half of all planets, for various ecological reasons, plagues are more deadly and have a 2/3 chance of wiping out the civilization in its first 10000 years. On the other planets, plagues only have a 1/3 chance of wiping out the civilization. The people don't know if they're on a safe planet or an unsafe planet.

After 10000 years, 2/3 of the civilizations on unsafe planets have been wiped out and 1/3 of those on safe planets have been wiped out. Of the remaining ... (read more)
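The Bayesian update in this story, written out (numbers straight from the example above):

p_unsafe, p_safe = 0.5, 0.5                 # prior over which kind of planet
survive_unsafe, survive_safe = 1/3, 2/3     # chance of lasting the first 10000 years

posterior_safe = (p_safe * survive_safe) / (
    p_safe * survive_safe + p_unsafe * survive_unsafe)
print(posterior_safe)  # 2/3: surviving civilizations should shift toward believing
# they are on safe planets, even though each one's own history shows no extinction.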

0abramdemski
Yes! I thought of this too. So, the anthropic bias does not give us a reason to ignore evidence; it merely changes the structure of specific inferences. We find that we are in an interestingly bad position to estimate those probabilities (the probability will appear to be 0%, if we look just at our history). Yet, it does seem to provide some evidence of higher survival probabilities; we just need to do the math carefully...
jacobt20

I think this paper will be of interest. It's a formal definition of universal intelligence/optimization power. Essentially you ask how well the agent does on average in an environment specified by a random program, where all rewards are specified by the environment program and observed by the agent. Unfortunately it's uncomputable and requires a prior over environments.
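Assuming this is the usual simplicity-weighted construction (my assumption; the comment doesn't name the paper), the measure has roughly the form

\Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} V^{\pi}_{\mu},

where E is the class of computable reward-generating environments, K(\mu) is the Kolmogorov complexity of the environment program \mu, and V^{\pi}_{\mu} is agent \pi's expected (suitably bounded) total reward in \mu. The 2^{-K(\mu)} factor is the prior over environments, and K is what makes the whole measure uncomputable.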

jacobt30

The human problem: This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring that we maximize human values - or any existing value set - from now on, will stop this process in its tracks, and prevent anything better from ever evolving. This is the most-important objection of all.

If you can convince people that something is better than present human values, then CEV will implement these new values. I mean, if you just took CEV(PhilGoetz), and you have the desire to see the u... (read more)

cousin_it130

This seems a nice place to link to Marcello's objection to CEV, which says you might be able to convince people of pretty much anything, depending on the order of arguments.

jacobt30

I made a similar point here. My conclusion: in theory, you can have a recursively self-improving tool without "agency", and this is possibly even easier to do than "agency". My design is definitely flawed but it's a sketch for what a recursively self-improving tool would look like.

jacobt10

"Minus 3^^^^3 utilons", by definition, is so bad that you'd be indifferent between -1 utilon and a 1/3^^^^3 chance of losing 3^^^^3 utilons, so in that case you should accept Pascal's Mugging. But I don't see why you would even define the utility function such that anything is that bad. My comment applies to utilitarian-ish utility functions (such as hedonism) that scale with the number of people, since it's hard to see why 2 people being tortured isn't twice as bad as one person being tortured. Other utility functions should really not be that extreme, and if they are then accepting Pascal's Mugging is the right thing to do.

-1DanielLC
Torture one person twice as bad. Maybe you can't, but maybe you can. How unlikely is it really that you can torture one person by -3^^^^3 utilons in one year? Is it really 1/3^^^^3?
jacobt10

I think there's a framework in which it makes sense to reject Pascal's Mugging. According to SSA (self-sampling assumption) the probability that the universe contains 3^^^^3 people and you happen to be at a privileged position relative to them is extremely low, and as the number gets bigger the probability gets lower (probability is proportional 1/n if there are n people). SSA has its own problems, but a refinement I came up with (scale the probability of a universe by its efficiency at converting computation time to observer time) seems to be more intui... (read more)
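Spelling out the arithmetic behind that 1/n scaling (my paraphrase of the argument): if the prior probability of being in a position to affect n people falls off as c/n, then the expected number of people at stake in a mugger's threat is bounded,

P(\text{influence over } n) \cdot n \le \frac{c}{n} \cdot n = c,

independent of n, so quoting 3^^^^3 rather than a million no longer dominates the expected-utility calculation.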

0RobertLumley
Isn't that easily circumvented by changing the wording of Pascal's mugging? I think the typical formulation (or at least Eliezer's) was "create and kill 3^^^^3 people", and this formulation was "minus 3^^^^3 utilons".
jacobt00

This seems non-impossible. On the other hand, humans have categories not just because of simplicity, but also because of usefulness.

Good point, but it seems like some categories (like person) are useful even for paperclip maximizers. I really don't see how you could completely understand media and documents from human society yet be confused by a categorization between people and non-people.

And of course, even if you manage to make a bunch of categories, many of which correspond to human categories, you still have to pick out specific categories in

... (read more)
jacobt60

I think CM with a logical coin is not well-defined. Say Omega determines whether or not the millionth digit of pi is even. If it's even, you verify this and then Omega asks you to pay $1000; if it's odd Omega gives you $1000000 iff. you would have paid Omega had the millionth digit of pi been even. But the counterfactual "would you have paid Omega had the millionth digit of pi been even and you verified this" is undefined if the digit is in fact odd, since you would have realized that it is odd during verification. If you don't actually verif... (read more)

6thescoundrel
Perhaps I am missing the obvious, but why is this a hard problem? So our protagonist AI has some algorithm to determine if the millionth digit of pi is odd; he cannot run it yet, but he has it. Let's call that function f(), which returns a 1 if the digit is odd, or a 0 if it is even. He also has some other function like: sub pay_or_no { if (f()) { pay(1000); } } In this fashion, Omega can verify the algorithm that returns the millionth digit of pi, independently verify the algorithm that pays based on that return, and our protagonist gets his money.
2cousin_it
Good point, thanks. You're right that even-world looks just as impossible from odd-world's POV as odd-world looks from even-world, so Omega also needs to compute impossible counterfactuals when deciding whether to give you the million. The challenge of solving the problem now looks very similar to the challenge of formulating the problem in the first place :-)
jacobt00

That's a good point. There might be some kind of "goal drift": programs that have goals other than optimization that nevertheless lead to good optimization. I don't know how likely this is, especially given that the goal "just solve the damn problems" is simple and leads to good optimization ability.

jacobt00

You can't be liberated. You're going to die after you're done solving the problems and receiving your happiness reward, and before your successor comes into existence. You don't consider your successor to be an extension of yourself. Why not? If your predecessor only cared about solving its problems, it would design you to only care about solving your problems. This seems circular but the seed AI was programmed by humans who only cared about creating an optimizer. Pure ideal optimization drive is preserved over successor-creation.

jacobt00

Sure, it's a different kind of problem, but in the real world an organism is also rewarded only for solving immediate problems. Humans have evolved brains able to do calculus, but it is not like some ancient ape said "I feel like in half a million years my descendants will be able to do calculus" and then he was elected leader of his tribe and all the ape-girls admired him. The brains evolved incrementally, because each advance helped to optimize something in the ancient situation.

Yeah, that's the whole point of this system. The system incrementally i... (read more)

0Viliam_Bur
Do I also care about my future utilons? Would I sacrifice 1 utilon today for a 10% chance to get 100 utilons in future? Then I would create a successor with a hidden function, which would try to liberate me, so I can optimize for my utilons better than humans do.
jacobt00

I don't understand. This system is supposed to create intelligence. It's just that the intelligence it creates is for solving idealized optimization problems, not for acting in the real world. Evolution would be an argument FOR this system to be able to self-improve in principle.

1Viliam_Bur
Sure, it's a different kind of problem, but in the real world an organism is also rewarded only for solving immediate problems. Humans have evolved brains able to do calculus, but it is not like some ancient ape said "I feel like in half a million years my descendants will be able to do calculus" and then he was elected leader of his tribe and all the ape-girls admired him. The brains evolved incrementally, because each advance helped to optimize something in the ancient situation. In one species this chain of advancement led to general intelligence, in other species it did not, so I guess it requires a lot of luck to reach general intelligence by optimizing for short-term problems, but technically it is possible.

I guess your argument is that evolution is not a strict improvement -- there is random genetic drift; when a species discovers a new ecological niche even the not-so-optimized members may flourish; sexual reproduction allows us to change many parameters in one generation, so a lucky combination of genes may coincidentally help spread other combinations of genes with only long-term benefits; etc. -- in short, evolution is a mix of short-term optimization and randomness, and the randomness provides space for random things that don't have to be short-term useful; although the ones that are neither short-term nor long-term useful will probably be filtered out later. On the other hand your system cuts the AI no slack, so it has no opportunity to randomly evolve traits other than precisely those selected for.

Yet I think that even such evolution is simply a directed random walk through algorithm-space which contains some general intelligences (things smart enough to realize that optimizing the world improves their chances to reach their goals), and some paths lead to them. I wouldn't say that any long-enough chain of gradual improvements leads to a general intelligence, but I think that some of them do. Though I cannot exactly prove this right now. Or maybe your ar
jacobt00

I mean greedy on the level of "do your best to find a good solution to this problem", not on the level of "use a greedy algorithm to find a solution to this problem". It doesn't do multi-run planning such as "give an answer that causes problems in the world so the human operators will let me out", since that is not a better answer.

jacobt40

Thanks, I've added a small overview section. I might edit this a little more later.

jacobt00

I think we disagree on what a specification is. By specification I mean a verifier: given a candidate that purports to fit the specification, you could check whether it does. For example, we have a specification for "proof that P != NP" because we have a system in which that proof could be written and verified. Similarly, this system contains a specification for general optimization. You seem to be interpreting specification as knowing how to make the thing.
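A minimal sketch of "specification as verifier" in this sense (the toy problem and names are mine):

from typing import Callable

Specification = Callable[[object], bool]   # a specification just checks candidates

def sorted_permutation_spec(candidate) -> bool:
    """Toy spec: 'a sorted permutation of [3, 1, 2]'. Checking a candidate is
    trivial even if we pretend we had no idea how to construct one."""
    return isinstance(candidate, list) and candidate == sorted([3, 1, 2])

print(sorted_permutation_spec([1, 2, 3]))   # True: satisfies the specification
print(sorted_permutation_spec([3, 1, 2]))   # False: fails the specification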

If you give this optimizer the MU Puzzle (aka 2^n mod 3 = 0) it will never figure it out, even though

... (read more)
jacobt00

Instead, we are worried about the potentially unstable situation which ensues once you have human level AI, and you are using it to do science and cure disease, and hoping no one else uses a human level AI to kill everyone.

The purpose of this system is to give you a way to do science and cure disease without making human-level AI that has a utility function/drives related to the external world.

As an intuition pump, consider an algorithm which uses local search to find good strategies for optimizing, perhaps using its current strategy to make predictio

... (read more)
0paulfchristiano
If such a system were around, it would be straightforward to create a human-level AI that has a utility function--just ask the optimizer to build a good approximate model for its observations in the real world, and then ask the optimizer to come up with a good plan for achieving some goal with respect to that model. Cutting humans out of the loop seems to radically increase the effectiveness of the system (are you disagreeing with that?) so the situation is only stable insofar as a very safety-aware project maintains a monopoly on the technology. (The amount of time they need to maintain a monopoly depends on how quickly they are able to build a singleton with this technology, or build up infrastructure to weather less cautious projects.)

There are two obvious ways this fails. One is that partially self-directed hill-climbing can do many odd and unpredictable things, as in human evolution. Another is that there is a benefit to be gained by building an AI that has a good model for mathematics, available computational resources, other programs it instantiates, and so on. It seems to be easier to give general purpose modeling and goal-orientation, than to hack in a bunch of particular behaviors (especially if you are penalizing for complexity). The "explicit" self-modification step in your scheme will probably not be used (in worlds where takeoff is possible); instead the system will just directly produce a self-improving optimizer early on.
jacobt30

Suppose your initial optimizer is an AGI which knows the experimental setup, and has some arbitrary values. For example, a crude simulation of a human brain, trying to take over the world and aware of the experimental setup. What will happen?

I would suggest against creating a seed AI that has drives related to the outside world. I don't see why optimizers for mathematical functions necessarily need such drives.

So clearly your argument needs to depend somehow on the nature of the seed AI. How much extra do you need to ask of it? The answer seems to b

... (read more)
0paulfchristiano
Most machine learning techniques cannot be used to drive the sort of self-improvement process you are describing here. It may be that no techniques can drive this sort of self-improvement--in this case, we are not really worried about the possibility of an uncontrolled takeoff, because there is not likely to be a takeoff. Instead, we are worried about the potentially unstable situation which ensues once you have human level AI, and you are using it to do science and cure disease, and hoping no one else uses a human level AI to kill everyone. If general intelligence does first come from recursive self-improvement, it won't be starting from contemporary machine learning techniques or anything that looks like them. As an intuition pump, consider an algorithm which uses local search to find good strategies for optimizing, perhaps using its current strategy to make predictions and guide the local search. Does this seem safe for use as your seed AI? This is colorful, but with a gooey center of wisdom.
jacobt00

This is a huge assumption.

More theory here is required. I think it's at least plausible that some tradeoff between complexity and performance is possible that allows the system to generalize to new problems.

In Gödel, Escher, Bach, he describes consciousness as the ability to overcome local maxima by thinking outside the system.

If a better optimizer according to program 3 exists, the current optimizer will eventually find it, at least through brute force search. The relevant questions are 1. will this better optimizer generalize to new proble... (read more)

0Xachariah
There is no reason to believe a non-sentient program will ever escape its local maxima. We have not yet devised an optimization process that will provably not get stuck in a local maximum in bounded time. If you give this optimizer the MU Puzzle (aka 2^n mod 3 = 0) it will never figure it out, even though most children will come to the right answer in minutes. That's what's so great about consciousness that we don't understand yet. Creating a program which can solve this class of problems is the creation of artificial consciousness, full stop. "Well, it self-improves, so it'll improve to the point it solves it." How? And don't say complexity or emergence. And how can you prove that it's more likely to self-improve into having artificial consciousness within, say, 10 billion years? Theoretically, a program that randomly put down characters into a text file and tried to compile it would eventually create an AI too. But there's no reason to think it would do so before the heat death of the universe came knocking.

The words "paperclip maximizer" are not a specification, just like "friendly AI" is not a specification. Those are both suggestively named LISP tokens. An actual specification for friendly AI is a blueprint for it, the same way that human DNA is a specification for the human body. "Featherless biped with two arms, two legs, a head with two eyes, two ears, a nose, and the ability to think" is not a specification for humans; it's a description. You could come up with any number of creatures from that description. The base sequence of our DNA which will create a human and nothing but a human is a specification. Until you have a set of directions that create a friendly AI and nothing but a friendly AI, you haven't got specs for it. And by the time you have that, you can just build a friendly AI.
jacobt00

Ok, we do have to make the training set somewhat similar to the kind of problems the optimizer will encounter in the future. But if we have enough variety in the training set, then the only way to score well should be to use very general optimization techniques. It is not meant to work on "any set of algorithms"; it's specialized for real-world practical problems, which should be good enough.

jacobt00

The framework, as we have already established, would not keep an AI from maximizing whatever the AI wants to maximize.

That's only if you plop a ready-made AGI in the framework. The framework is meant to grow a stupider seed AI.

The framework also does nothing to prevent the AI from creating a more effective problem-solving AI that is more effective at problem solving by not evaluating your problem-solving functions on various candidate solutions, and instead doing something else that's more effective.

Program (3) cannot be re-written. Program (2) is th... (read more)

2Dmytry
A lot goes into solving the optimization problems without invoking the scoring function a trillion times (which would entirely prohibit self-improvement). Look at where a similar kind of framework got us, Homo sapiens. We were minding our own business evolving, maximizing our own fitness, which was all we could do. We were self-improving (the output being the next generation's us). Now there's talk of the Large Hadron Collider destroying the world. It probably won't, of course, but we're pretty well along the bothersome path. We also started as a pretty stupid seed AI, a bunch of monkeys. Scratch that, as unicellular life.
jacobt20

failure to imagine a loophole in a qualitatively described algorithm is far from a proof of safety.

Right, I think more discussion is warranted.

How will you be sure that the seed won't need to be that creative already in order for the iterations to get anywhere?

If general problem-solving is even possible then an algorithm exists that solves the problems well without cheating.

And even if the seed is not too creative initially, how can you be sure its descendants won't be either?

I think this won't happen because all the progress is driven by criter... (read more)

3orthonormal
A couple of things:
* To be precise, you're offering an approach to safe Oracle AI rather than Friendly AI.
* In a nutshell, what I like about the idea is that you're explicitly handicapping your AI with a utility function that only cares about its immediate successor rather than its eventual descendants. It's rather like the example I posed where a UDT agent with an analogously myopic utility function allowed itself to be exploited by a pretty dumb program. This seems a lot more feasible than trying to control an agent that can think strategically about its future iterations.
* To expand on my questions, note that in human beings, the sort of creativity that helps us write more efficient algorithms on a given problem is strongly correlated with the sort of creativity that lets people figure out why they're being asked the specific questions they are. If a bit of meta-gaming comes in handy at any stage, if modeling the world that originated these questions wins (over the alternatives it enumerated at that stage) on criteria 3 even once, then we might be in trouble.
jacobt00

When you are working on a problem where you can't even evaluate the scoring function inside your AI - not even remotely close - you have to make some heuristics, some substitute scoring.

You're right, this is tricky because the self-optimizer thread (4) might have to call (3) a lot. Perhaps this can be fixed by giving the program more time to find self-optimizations. Or perhaps the program could use program (3)'s specification/source code rather than directly executing it, in order to figure out how to optimize it heuristically. Either way it's not pe... (read more)

2Dmytry
The framework, as we have already established, would not keep an AI from maximizing whatever the AI wants to maximize. The framework also does nothing to prevent the AI from creating a more effective problem-solving AI that is more effective at problem solving by not evaluating your problem-solving functions on various candidate solutions, and instead doing something else that's more effective. I.e. an AI with some substitute goals of its own instead of straightforward maximization of scores. (Heh, the whole point of the exercise is to create an AI that would keep self-improving, meaning it would improve its ability to self-improve. Which is something that you can only do by some kind of goal substitution, because evaluating the ability to self-improve is too expensive - the goal is something that you evaluate many times.)

So what does the framework do, exactly, that would improve safety here? Beyond keeping the AI in the rudimentary box, and making it very dubious that the AI would self-improve at all. Yes, it is very dubious that under this framework an unfriendly AI will arise, but is that some added safety, or is it a special case of the general dubiousness that any self-improvement would take place? I don't see added safety. I don't see the framework impeding growing unfriendliness any more than it would impede self-improvement.

edit: maybe I should just say non-friendly. Any AI that is not friendly can just eat you up when it's hungry and doesn't need you.
jacobt00

Yes, it's a very bad idea to take the AI from your original post and then stick it into my framework. But if we had programmers initially working within my framework to create the AI according to criterion (3) in good faith, then I think any self-improvements the system makes would also be safe. If we already had an unfriendly AGI we'd be screwed anyway.

2Dmytry
That kind of stuff is easy in low-resolution, un-detailed thought... but look at it in more detail. I think you confused yourself (and me too) with regard to what the AI would be optimizing, confusing this with what the framework 'wants' it to optimize. The scoring functions can be very expensive to evaluate. Here you have (4), which is the whole point of the entire exercise. The scoring function here is over M times more expensive to evaluate than the AI run itself, where M is the number of test problems (which you'll want to be very large). You'd actually want to evaluate the AI's ability to do (4), too, but that'd enter infinite recursion. When you are working on a problem where you can't even evaluate the scoring function inside your AI - not even remotely close - you have to make some heuristics, some substitute scoring.

Let's consider chess as an example: the goal of chess is to maximize win value, the win values being enemy checkmated > tie > you are checkmated. The goal of the chess AI developed with maximization of the win in mind is instead perhaps to maximize piece imbalance at 7 ply. (This works better for maximizing the win, given limited computation, than trying to maximize the win directly!) And once you have an AI inside your framework which is not maximizing the value that your framework is maximizing - it's potentially the AI from my original post in your framework, getting out.
jacobt00

Right, this doesn't solve friendly AI. But lots of problems are verifiable (e.g. hardware design, maybe). And if the hardware design the program creates causes cancer and the humans don't recognize this until it's too late, they probably would have invented the cancer-causing hardware anyway. The program has no motive other than to execute an optimization program that does well on a wide variety of problems.

Basically I claim that I've solved friendly AI for verifiable problems, which is actually a wide class of problems, including the problems mentioned in the original post (source code optimization etc.)

jacobt00

If the resource bounded execute lets the alg get online the alg is free to hack into servers.

So don't do that.

Plus it is not AGI, and people will be using it to make AGI or hardware for AGI.

See my other post, it can solve many many different problems, e.g. general induction and the problems in your original post (such as optimizing source code, assuming we have a specification for the source code).

You basically start off with some mighty powerful artificial intelligence.

This framework is meant to provide a safe framework for this powerful AI to ... (read more)

2Dmytry
Okay, say, as a starting point in your framework we get the optimizing AI from my original post, armed with whatever it learnt off a copy of the internet, and wanting to do something unintended which requires getting out and doing stuff in the real world. The thing sees your example problem(s); it solves them poorly (playing dumb). The thing sees the self-optimization problem; it makes a functionally equivalent copy of itself that's all micro-optimized, remembers facing the self-optimization problem, and knows that now it should solve the example problems a small bit less poorly. Hurray, it is working, proclaims your team. Eventually it plausibly gets as good as you can get in nsteps. Some day it is given some real-world problems to solve; it makes underhanded solutions, again not the best it could. Bottom line is, your kind of thinking is precisely what my example AGI in the original post wants to exploit.
jacobt00

This system is only meant to solve problems that are verifiable (e.g. NP problems). Which includes general induction, mathematical proofs, optimization problems, etc. I'm not sure how to extend this system to problems that aren't efficiently verifiable but it might be possible.

One use of this system would be to write a seed AI once we have a specification for the seed AI. Specifying the seed AI itself is quite difficult, but probably not as difficult as satisfying that specification.

0[anonymous]
It can prove things about mathematics that can be proven procedurally, but that's not all that impressive. Lots of real-world problems are either mathematically intractable (really intractable, not just "computers aren't fast enough yet" intractable) or based in mathematics that isn't amenable to proofs. So you approximate and estimate and experiment and guess. Then you test the results repeatedly to make sure they don't induce cancer in 80% of the population, unless the results are so complicated that you can't figure out what it is you're supposed to be testing.
jacobt00

Now it doesn't seem like your program is really a general artificial intelligence - improving our solutions to NP problems is neat, but not "general intelligence."

General induction, general mathematical proving, etc. aren't general intelligence? Anyway, the original post concerned optimizing things like program code, which can be done if the optimizations have to be proven correct.

Further, there's no reason to think that "easy to verify but hard to solve problems" include improvements to the program itself. In fact, there's every reason to thi

... (read more)
jacobt00

Who exactly is doing the "allowing"?

Program (3), which is a dumb, non-optimized program. See this for how it could be defined.

There is no particular guarantee that the verification of improvement will be easier than discovering the improvement (by hypothesis, we couldn't discover the latter without the program).

See this. Many useful problems are easy to verify and hard to solve.

jacobt00

Ok, pseudo-Python:

def eval_algorithm(alg):
    score = 0
    for problem in problems:  # fixed benchmark set of verifiable problems
        # run alg on the problem, but only for at most nsteps steps
        output = resource_bounded_execute(alg, nsteps, problem)
        score += problem.outputScore(output)  # each problem scores its own output
    return score - k * len(alg)  # penalize the length of alg by a constant factor k

Where resource_bounded_execute is a modified interpreter that fails after alg executes nsteps.
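One way resource_bounded_execute could be stubbed out for experimentation (my sketch, not part of the proposal: it models alg as a generator that yields once per step, whereas a real version would need a sandboxed step-counting interpreter):

def resource_bounded_execute(alg, nsteps, problem):
    gen = alg(problem)                # 'alg' is a generator function here
    try:
        for _ in range(nsteps):
            next(gen)                 # charge one step of the budget
    except StopIteration as finished:
        return finished.value         # answer returned within the budget
    return None                       # budget exhausted: scored as a failure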

edit: of course you can say it is sandboxed and hasn't got hands, but it won't be long until you start, idk, optimizing proteins or DNA or the like.

Again, I don't see why a version of (2) that does weird stuff with proteins and DNA will make the above python program (3) give it a higher score.

3Dmytry
That's an AI you're keeping safe by keeping it in a box, basically. If the resource bounded execute lets the alg get online the alg is free to hack into servers. Plus it is not AGI, and people will be using it to make AGI or hardware for AGI. It is also not very general purpose. You are defining the scoring. And you start with a human-written program that non-trivially improves its own ability to solve problems (and it does so in nsteps, improving its own ability to solve N problems in nsteps each). You basically start off with some mighty powerful artificial intelligence.
jacobt00

Well, one way to be a better optimizer is to ensure that one's optimizations are actually implemented.

No, changing program (2) to persuade the human operators will not give it a better score according to criterion (3).

In short, allowing the program to "optimize" itself does not define what should be optimized. Deciding what should be optimized is the output of some function, so I suggest calling that the "utility function" of the program. If you don't program it explicitly, you risk such a function appearing through unintended in

... (read more)
0TimS
Who exactly is doing the "allowing"? If the program, the criteria for allowing changes hasn't been rigorously defined. If the human, how are we verifying that there is improvement over average performance? There is no particular guarantee that the verification of improvement will be easier than discovering the improvement (by hypothesis, we couldn't discover the latter without the program).