It is bad to create a small population of creatures with humane values (that has positive welfare) and a large population of animals that are in pain. For instance, it is bad to create a population of animals with -75 total welfare, even if doing so allows you to create a population of humans with 50 total welfare.
Why do you believe this? I don't. Due to wild animal suffering, this proposition implies that it would have been better if no life had appeared on Earth, assuming average human/animal welfare and the human/animal ratio don't dramatically change in the future.
I couldn't access the "Aggregation Procedure for Cardinal Preferences" article. In any case, why isn't using an aggregate utility function that is a linear combination of everyone's utility functions (choosing some arbitrary number for each person's weight) a way to satisfy Arrow's criteria?
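To be explicit about the construction I mean, here is a toy sketch (the people, weights, and utility numbers are made up, and I'm not claiming this settles whether it meets Arrow's conditions — that's exactly my question). You fix arbitrary positive weights once and for all, and rank outcomes by the weighted sum of individual utilities:

# Fixed, arbitrary weights chosen up front (made-up data).
weights = {"alice": 1.0, "bob": 2.5, "carol": 0.7}

# Each person's cardinal utility for each outcome.
utilities = {
    "alice": {"A": 3.0, "B": 1.0},
    "bob":   {"A": 0.5, "B": 2.0},
    "carol": {"A": 1.0, "B": 1.0},
}

def social_utility(outcome):
    return sum(weights[p] * utilities[p][outcome] for p in weights)

# The social ordering is just "sort outcomes by aggregate utility".
ranking = sorted(["A", "B"], key=social_utility, reverse=True)
print(ranking)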
It should also be noted that Arrow's impossibility theorem doesn't hold for non-deterministic decision procedures. I would also caution against calling this an "existential risk", because while decision procedures that violate Arrow's criteria migh...
Ok, I agree with this interpretation of "being exposed to ordered sensory data will rapidly promote the hypothesis that induction works".
You could single out one alternative hypothesis that says the sun won't rise on some day in the future. The ratio between P(sun rises until day X) and P(sun rises every day) will not change with any evidence gathered before day X. If you initially assigned a 99% probability to "the sun rises every day until day X" and 1% to Solomonoff induction's prior, you would end up assigning more than 99% probability to "the sun rises every day until day X".
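To make the ratio point concrete, here is a minimal sketch (the numbers are made up, and "H2" just stands in for whatever probability your prior routes through Solomonoff induction): since both hypotheses assign the same likelihood to every sunrise before day X, Bayes' rule leaves the posterior ratio exactly where the prior put it.

# H1: "the sun rises every day until day X"   (prior 0.99)
# H2: "the sun rises every day"               (prior 0.01, standing in for
#      the probability your prior gives to Solomonoff induction)
posterior = {"H1": 0.99, "H2": 0.01}

for day in range(1000):                       # 1000 sunrises, all before day X
    likelihood = {"H1": 1.0, "H2": 1.0}       # both hypotheses predict each sunrise
    unnorm = {h: posterior[h] * likelihood[h] for h in posterior}
    total = sum(unnorm.values())
    posterior = {h: p / total for h, p in unnorm.items()}

print(posterior)   # still ~0.99 for H1 and ~0.01 for H2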
Solomonoff induction itself will give some significant probability mass to "...
You're making the argument that Solomonoff induction would select "the sun rises every day" over "the sun rises every day until day X". I agree, assuming a reasonable prior over programs for Solomonoff induction. However, if your prior is 99% "the sun rises every day until day X", and 1% "Solomonoff induction's prior" (which itself might assign, say, 10% probability to the sun rising every day), then you will end up believing that the sun rises every day until day X. Eliezer asserted that in a situation where you ...
Because being exposed to ordered sensory data will rapidly promote the hypothesis that induction works
Not if the alternative hypothesis assigns about the same probability to the data up to the present. For example, an alternative hypothesis to the standard "the sun rises every day" is "the sun rises every day, until March 22, 2015", and the alternative hypothesis assigns the same probability to the data observed until the present as the standard one does.
You also have to trust your memory and your ability to compute Solomonoff induction, both of which are demonstrably imperfect.
For every n, a program exists that will solve the halting problem for programs up to length n, but the size of this program must grow with n. I don't really see any practical way for a human to write this program other than generating an extremely large number and then testing all programs up to length n for halting within this bound, in which case you've already pretty much solved the original problem. If you use some proof system to try to prove that programs halt and then take the maximum running time of only those, then you might as well use a formalism like the calculus of constructions.
Game1 has been done in real life (without the murder): http://djm.cc/bignum-results.txt
Also:
Write a program that generates all programs shorter than length n, and finds the one with the largest output.
Can't do that, unless you already know the programs will halt. The winner of the actual contest used a similar strategy, using programs in the calculus of constructions so they are guaranteed to halt.
For Game2, if your opponent's program (say there are only 2 players) says to return your program's output + 1, then you can't win. If your program ever halts, they win. If it doesn't halt, then you both lose.
But if the choices only have the same expectation of v2, then you won't be optimizing for v1.
Ok, this is correct. I hadn't understood the preconditions well enough. It seems that now the important question is whether things people intuitively think of as different values (my happiness, total happiness, average happiness) satisfy this condition.
You would if you could survive for v1*v2 days.
I do think that everything should reduce to a single utility function. That said, this utility function is not necessarily a convex combination of separate values, such as "my happiness", "everyone else's happiness", etc. It could contain more complex values such as your v1 and v2, which depend on both x and y.
In your example, let's add a choice D: 50% of the time it's A, 50% of the time it's B. In terms of individual happiness, this is Pareto superior to C. It is Pareto inferior for v1 and v2, though.
EDIT: For an example of what I'...
I didn't say anything about risk aversion. This is about utility functions that depend on multiple different "values" in some non-convex way. You can observe that, in my original example, if you have no water, then utility (days survived) is linear with respect to food.
I think we agree. I am just pointing out that Pareto optimality is undesirable for some selections of "values". For example, you might want you and everyone else to both be happy, and happiness of one without the other would be much less valuable.
I'm not sure how you would go about deciding if Pareto optimality is desirable, now that the theorem proves that it is desirable iff you maximize some convex combination of the values.
I think that, depending on what the v's are, choosing a Pareto optimum is actually quite undesirable.
For example, let v1 be min(1000, how much food you have), and let v2 be min(1000, how much water you have). Suppose you can survive for days equal to a soft minimum of v1 and v2 (for example, 0.001 v1 + 0.001 v2 + min(v1, v2)). All else being equal, more v1 is good and more v2 is good. But maximizing a convex combination of v1 and v2 can lead to avoidable dehydration or starvation. Suppose you assign weights to v1 and v2, and are offered either 1000 of ...
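A minimal sketch of the kind of choice I have in mind (the specific option I use below — 1000 food and no water versus a 500/500 split — is my own filler, since the comment is cut off): with weights favoring food, the convex combination picks the all-food option and you die of thirst; weights favoring water fail symmetrically on the mirror-image choice.

def days_survived(food, water):
    # Soft minimum of capped food and water, as described above.
    v1 = min(1000, food)
    v2 = min(1000, water)
    return 0.001 * v1 + 0.001 * v2 + min(v1, v2)

def weighted_score(food, water, w):
    # A fixed convex combination w*v1 + (1-w)*v2.
    v1 = min(1000, food)
    v2 = min(1000, water)
    return w * v1 + (1 - w) * v2

options = {"all food": (1000, 0), "even split": (500, 500)}
w = 0.6   # a fixed weight that happens to favor food

print(max(options, key=lambda o: weighted_score(*options[o], w)))   # "all food"
print(max(options, key=lambda o: days_survived(*options[o])))       # "even split"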
Actually you're right, I misread the problem at first. I thought that you had observed yourself not dying 1000 times (rather than observing "heads" 1000 times), in which case you should keep playing.
Applying my style of analyzing anthropic problems to this one: Suppose we have 1,000,000 * 2^1000 players. Half flip heads initially, half flip tails. About 1,000,000 will get heads 1,000 times. Of them, 500,000 will have flipped heads initially. So, your conclusion is correct.
I think you're wrong. Suppose 1,000,000 people play this game. Each of them flips the coin 1000 times. We would expect about 500,000 to survive, and all of them would have flipped heads initially. Therefore, P(I flipped heads initially | I haven't died yet after flipping 1000 coins) ~= 1.
This is actually quite similar to the Sleeping Beauty problem. You have a higher chance of surviving (analogous to waking up more times) if the original coin was heads. So, just as the fact that you woke up is evidence that you were scheduled to wake up more times...
I vote for range voting. It has the lowest Bayesian regret (best expected social utility). It's also extremely simple. Though it's not exactly the most unbiased source, rangevoting.org has lots of information about range voting in comparison to other methods.
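For anyone unfamiliar with it, range voting really is this simple. A toy sketch with made-up ballots (this is the plain sum-the-scores version; some variants average over non-blank scores instead):

# Each ballot scores every candidate on a 0-9 scale (made-up data).
ballots = [
    {"X": 9, "Y": 4, "Z": 0},
    {"X": 2, "Y": 9, "Z": 5},
    {"X": 6, "Y": 6, "Z": 3},
]

totals = {}
for ballot in ballots:
    for candidate, score in ballot.items():
        totals[candidate] = totals.get(candidate, 0) + score

winner = max(totals, key=totals.get)
print(totals, winner)   # {'X': 17, 'Y': 19, 'Z': 8} -> 'Y'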
For aliens with a halting oracle:
Suppose the aliens have this machine that may or may not be a halting oracle. We give them a few Turing machine programs and they decide which ones halt and which ones don't. Then we run the programs. Sure enough, none of the ones they say run forever have halted, and some of the ones they say do halt have indeed halted at some point. Suppose we repeat this process a few times with different programs.
Now what method should we use to predict the point at which new programs halt? The best strategy seems to be to ask the aliens whi...
I found this post interesting but somewhat confusing. You start by talking about UDT in order to talk about importance. But really the only connection from UDT to importance is the utility function, so you might as well start with that. And then you ignore utility functions in the rest of your post when you talk about Schmidhuber's theory.
It just has a utility function which specifies what actions it should take in all of the possible worlds it finds itself in.
Not quite. The utility function doesn't specify what action to take; it specifies what wo...
For the second question:
Imagine there are many planets with a civilization on each planet. On half of all planets, for various ecological reasons, plagues are more deadly and have a 2/3 chance of wiping out the civilization in its first 10000 years. On the other planets, plagues only have a 1/3 chance of wiping out the civilization. The people don't know if they're on a safe planet or an unsafe planet.
After 10000 years, 2/3 of the civilizations on unsafe planets have been wiped out and 1/3 of those on safe planets have been wiped out. Of the remaining ...
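Working out the arithmetic from that setup (a quick sketch; the 1000-planets-per-kind figure is just to have concrete numbers, and the conclusion drawn is my own continuation since the comment is cut off):

# Toy census: 1000 planets of each kind (arbitrary concrete numbers).
safe_planets, unsafe_planets = 1000, 1000
safe_survivors = safe_planets * (1 - 1/3)      # 1/3 wiped out on safe planets
unsafe_survivors = unsafe_planets * (1 - 2/3)  # 2/3 wiped out on unsafe planets

# Among surviving civilizations, what fraction are on safe planets?
p_safe_given_survival = safe_survivors / (safe_survivors + unsafe_survivors)
print(p_safe_given_survival)   # ~0.667: surviving is evidence you're on a safe planet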
I think this paper will be of interest. It's a formal definition of universal intelligence/optimization power. Essentially you ask how well the agent does on average in an environment specified by a random program, where all rewards are specified by the environment program and observed by the agent. Unfortunately it's uncomputable and requires a prior over environments.
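If the paper in question is Legg and Hutter's universal intelligence measure (I'm inferring the reference from the description, so take this as a presumed reconstruction), the definition is roughly

    \Upsilon(\pi) = \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}

i.e. the agent pi's expected total reward V in each computable environment mu, weighted by 2^(-K(mu)), where K(mu) is the length of the shortest program computing mu. The 2^(-K(mu)) weights are the prior over environments, and the Kolmogorov complexity term is what makes the measure uncomputable.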
The human problem: This argues that the qualia and values we have now are only the beginning of those that could evolve in the universe, and that ensuring that we maximize human values - or any existing value set - from now on, will stop this process in its tracks, and prevent anything better from ever evolving. This is the most-important objection of all.
If you can convince people that something is better than present human values, then CEV will implement these new values. I mean, if you just took CEV(PhilGoetz), and you have the desire to see the u...
This seems a nice place to link to Marcello's objection to CEV, which says you might be able to convince people of pretty much anything, depending on the order of arguments.
I made a similar point here. My conclusion: in theory, you can have a recursively self-improving tool without "agency", and this is possibly even easier to do than "agency". My design is definitely flawed but it's a sketch for what a recursively self-improving tool would look like.
"Minus 3^^^^3 utilons", by definition, is so bad that you'd be indifferent between -1 utilon and a 1/3^^^^3 chance of losing 3^^^^3 utilons, so in that case you should accept Pascal's Mugging. But I don't see why you would even define the utility function such that anything is that bad. My comment applies to utilitarian-ish utility functions (such as hedonism) that scale with the number of people, since it's hard to see why 2 people being tortured isn't twice as bad as one person being tortured. Other utility functions should really not be that extreme, and if they are then accepting Pascal's Mugging is the right thing to do.
I think there's a framework in which it makes sense to reject Pascal's Mugging. According to SSA (the self-sampling assumption), the probability that the universe contains 3^^^^3 people and you happen to be at a privileged position relative to them is extremely low, and as the number gets bigger the probability gets lower (the probability is proportional to 1/n if there are n people). SSA has its own problems, but a refinement I came up with (scale the probability of a universe by its efficiency at converting computation time to observer time) seems to be more intui...
This seems non-impossible. On the other hand, humans have categories not just because of simplicity, but also because of usefulness.
Good point, but it seems like some categories (like person) are useful even for paperclip maximizers. I really don't see how you could completely understand media and documents from human society yet be confused by a categorization between people and non-people.
...And of course, even if you manage to make a bunch of categories, many of which correspond to human categories, you still have to pick out specific categories in
I think CM with a logical coin is not well-defined. Say Omega determines whether or not the millionth digit of pi is even. If it's even, you verify this and then Omega asks you to pay $1000; if it's odd, Omega gives you $1000000 iff you would have paid Omega had the millionth digit of pi been even. But the counterfactual "would you have paid Omega had the millionth digit of pi been even and you verified this" is undefined if the digit is in fact odd, since you would have realized that it is odd during verification. If you don't actually verif...
That's a good point. There might be some kind of "goal drift": programs that have goals other than optimization that nevertheless lead to good optimization. I don't know how likely this is, especially given that the goal "just solve the damn problems" is simple and leads to good optimization ability.
You can't be liberated. You're going to die after you're done solving the problems and receiving your happiness reward, and before your successor comes into existence. You don't consider your successor to be an extension of yourself. Why not? If your predecessor only cared about solving its problems, it would design you to only care about solving your problems. This seems circular but the seed AI was programmed by humans who only cared about creating an optimizer. Pure ideal optimization drive is preserved over successor-creation.
Sure, they're different kinds of problems, but in the real world an organism is also rewarded only for solving immediate problems. Humans have evolved brains able to do calculus, but it's not as if some ancient ape said "I feel like in half a million years my descendants will be able to do calculus" and was then elected leader of his tribe while all the ape-girls admired him. Brains evolved incrementally, because each advance helped to optimize something in the ancestral situation.
Yeah, that's the whole point of this system. The system incrementally i...
I don't understand. This system is supposed to create intelligence. It's just that the intelligence it creates is for solving idealized optimization problems, not for acting in the real world. Evolution would be an argument FOR this system to be able to self-improve in principle.
I mean greedy on the level of "do your best to find a good solution to this problem", not on the level of "use a greedy algorithm to find a solution to this problem". It doesn't do multi-run planning such as "give an answer that causes problems in the world so the human operators will let me out", since that is not a better answer.
Thanks, I've added a small overview section. I might edit this a little more later.
I think we disagree on what a specification is. By specification I mean a verifier: given a candidate, you can tell whether it fits the specification. For example, we have a specification for "proof that P != NP" because we have a formal system in which such a proof could be written and verified. Similarly, this system contains a specification for general optimization. You seem to be interpreting "specification" as knowing how to make the thing.
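To make "specification = verifier" concrete, here is a minimal sketch (subset-sum is my own example of something easy to verify but hard to solve; the names are made up):

# A specification, in this sense, is just a checker: given a candidate
# solution, it says whether the candidate satisfies the spec.
def subset_sum_spec(numbers, target, candidate_subset):
    # Verify a claimed solution to a subset-sum instance.
    return (set(candidate_subset) <= set(numbers)
            and sum(candidate_subset) == target)

# Having this verifier doesn't tell us how to find a solution,
# but it pins down exactly what counts as one.
print(subset_sum_spec([3, 7, 12, 5], 15, [3, 12]))   # True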
...If you give this optimizer the MU Puzzle (aka 2^n mod 3 = 0) it will never figure it out, even though
Instead, we are worried about the potentially unstable situation which ensues once you have human level AI, and you are using it to do science and cure disease, and hoping no one else uses a human level AI to kill everyone.
The purpose of this system is to give you a way to do science and cure disease without making human-level AI that has a utility function/drives related to the external world.
...As an intuition pump, consider an algorithm which uses local search to find good strategies for optimizing, perhaps using its current strategy to make predictio
Suppose your initial optimizer is an AGI which knows the experimental setup, and has some arbitrary values. For example, a crude simulation of a human brain, trying to take over the world and aware of the experimental setup. What will happen?
I would suggest against creating a seed AI that has drives related to the outside world. I don't see why optimizers for mathematical functions necessarily need such drives.
...So clearly your argument needs to depend somehow on the nature of the seed AI. How much extra do you need to ask of it? The answer seems to b
This is a huge assumption.
More theory here is required. I think it's at least plausible that some tradeoff between complexity and performance is possible that allows the system to generalize to new problems.
In Gödel, Escher, Bach, Hofstadter describes consciousness as the ability to overcome local maxima by thinking outside the system.
If a better optimizer according to program 3 exists, the current optimizer will eventually find it, at least through brute force search. The relevant questions are 1. will this better optimizer generalize to new proble...
Ok, we do have to make the training set somewhat similar to the kind of problems the optimizer will encounter in the future. But if we have enough variety in the training set, then the only way to score well should be to use very general optimization techniques. It is not meant to work on "any set of algorithms"; it's specialized for real-world practical problems, which should be good enough.
The framework, as we already have established, would not keep an AI from maximizing what ever the AI wants to maximize.
That's only if you plop a ready-made AGI in the framework. The framework is meant to grow a stupider seed AI.
The framework also does nothing to prevent AI from creating a more effective problem solving AI that is more effective at problem solving by not evaluating your problem solving functions on various candidate solutions, and instead doing something else that's more effective.
Program (3) cannot be re-written. Program (2) is th...
failure to imagine a loophole in a qualitatively described algorithm is far from a proof of safety.
Right, I think more discussion is warranted.
How will you be sure that the seed won't need to be that creative already in order for the iterations to get anywhere?
If general problem-solving is even possible then an algorithm exists that solves the problems well without cheating.
And even if the seed is not too creative initially, how can you be sure its descendants won't be either?
I think this won't happen because all the progress is driven by criter...
When you are working on a problem where you can't even evaluate the scoring function inside your AI - not even remotely close - you have to make some heuristics, some substitute scoring.
You're right, this is tricky because the self-optimizer thread (4) might have to call (3) a lot. Perhaps this can be fixed by giving the program more time to find self-optimizations. Or perhaps the program could use program (3)'s specification/source code rather than directly executing it, in order to figure out how to optimize it heuristically. Either way it's not pe...
Yes, it's a very bad idea to take the AI from your original post and then stick it into my framework. But if we had programmers initially working within my framework to create the AI according to criterion (3) in good faith, then I think any self-improvements the system makes would also be safe. If we already had an unfriendly AGI we'd be screwed anyway.
Right, this doesn't solve friendly AI. But lots of problems are verifiable (e.g. hardware design, maybe). And if the hardware design the program creates causes cancer and the humans don't recognize this until it's too late, they probably would have invented the cancer-causing hardware anyway. The program has no motive other than to execute an optimization program that does well on a wide variety of problems.
Basically I claim that I've solved friendly AI for verifiable problems, which is actually a wide class of problems, including the problems mentioned in the original post (source code optimization etc.)
If resource_bounded_execute lets the algorithm get online, the algorithm is free to hack into servers.
So don't do that.
Plus it is not AGI, and people will be using it to make AGI or hardware for AGI.
See my other post; it can solve many different problems, e.g. general induction and the problems in your original post (such as optimizing source code, assuming we have a specification for the source code).
You basically start off with some mighty powerful artificial intelligence.
This framework is meant to provide a safe framework for this powerful AI to ...
This system is only meant to solve problems that are verifiable (e.g. NP problems). Which includes general induction, mathematical proofs, optimization problems, etc. I'm not sure how to extend this system to problems that aren't efficiently verifiable but it might be possible.
One use of this system would be to write a seed AI once we have a specification for the seed AI. Specifying the seed AI itself is quite difficult, but probably not as difficult as satisfying that specification.
Now it doesn't seem like your program is really a general artificial intelligence - improving our solutions to NP problems is neat, but not "general intelligence."
General induction, general mathematical proving, etc. aren't general intelligence? Anyway, the original post concerned optimizing things like program code, which can be done if the optimizations have to be proven.
...Further, there's no reason to think that "easy to verify but hard to solve problems" include improvements to the program itself. In fact, there's every reason to thi
Who exactly is doing the "allowing"?
Program (3), which is a dumb, non-optimized program. See this for how it could be defined.
There is no particular guarantee that the verification of improvement will be easier than discovering the improvement (by hypothesis, we couldn't discover the latter without the program).
See this. Many useful problems are easy to verify and hard to solve.
Ok, pseudo-Python:
def eval_algorithm(alg):
    # Score a candidate optimizer (given as source code) on the fixed training problems.
    score = 0
    for problem in problems:
        # Run the candidate with a hard step limit so it can't loop forever.
        output = resource_bounded_execute(alg, nsteps, problem)
        score += problem.outputScore(output)
    # Penalize long programs to favor simple, general optimizers.
    return score - k * len(alg)
Where resource_bounded_execute is a modified interpreter that fails after alg executes nsteps.
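For concreteness, here is one way resource_bounded_execute might be sketched. This is an assumption-laden toy, not a real sandbox: it treats alg as Python source defining solve(problem), counts traced line events as "steps" (a crude proxy), and does nothing about memory or work done inside C-level calls.

import sys

class _StepBudgetExceeded(Exception):
    pass

def resource_bounded_execute(alg, nsteps, problem):
    # Run alg's solve(problem), aborting after roughly nsteps line events.
    steps = [0]

    def tracer(frame, event, arg):
        if event == "line":
            steps[0] += 1
            if steps[0] > nsteps:
                raise _StepBudgetExceeded()
        return tracer

    namespace = {}
    sys.settrace(tracer)
    try:
        exec(alg, namespace)              # define solve()
        return namespace["solve"](problem)
    except _StepBudgetExceeded:
        return None                       # ran out of budget: no output
    except Exception:
        return None                       # crashed: treat as no output
    finally:
        sys.settrace(None)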
edit: of course you can say it is sandboxed and hasn't got hands, but it won't be long until you start, I don't know, optimizing proteins or DNA or the like.
Again, I don't see why a version of (2) that does weird stuff with proteins and DNA will make the above python program (3) give it a higher score.
Well, one way to be a better optimizer is to ensure that one's optimizations are actually implemented.
No, changing program (2) to persuade the human operators will not give it a better score according to criterion (3).
...In short, allowing the program to "optimize" itself does not define what should be optimized. Deciding what should be optimized is the output of some function, so I suggest calling that the "utility function" of the program. If you don't program it explicitly, you risk such a function appearing through unintended in
Arrow's Theorem doesn't say anything about strategic voting. The only reasonable non-strategic voting system I know of is random ballot (pick a random voter; they decide who wins). I'm currently trying to figure out a voting system that is based on finding the Nash equilibrium (which may be mixed) of approval voting, and this system might also be strategy-free.
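Random ballot, for reference, is about as simple as a voting rule can get. A toy sketch with made-up ballots:

import random

# Each ballot just names the voter's favorite candidate (made-up data).
ballots = ["X", "Y", "Y", "Z", "X", "Y"]

# Random ballot: pick one ballot uniformly at random; its choice wins.
winner = random.choice(ballots)
print(winner)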
When I said linear combination of utility functions, I meant that you fix the scaling factors initially and don't change them. You could make all of them 1, for example. Your voting system (descr...