All of Andrew Jacob Sauer's Comments + Replies

That's beside the point. In the first case you'd take 1A in the first game and 2A in the second game (a 34% chance of living is better than 33%). In the second case, if you bothered to play at all, you'd probably take 1B/2B. What doesn't make sense is taking 1A and 2B. That policy is inconsistent no matter how you value different amounts of money (unless you don't care about money at all, in which case do whatever; the paradox is better illustrated with something you do care about), so things like risk, capital cost, and diminishing returns are beside the point.

In this case the only reason the money pumping doesn't work is that Omega is unable to choose its policy based on its prediction of your second decision: if it could, you would want to switch back to b, because if you chose a, Omega would know that and you'd get a payoff of 0. This makes the situation after the coinflip different from the original problem, where Omega is able to see your decision and make its decision based on that.

In the Allais problem as stated, there's no particular reason why the situation where you get to choose between $24,000 for sure and a 33/34 chance of $27,000 differs depending on whether someone simply offered you that choice, or offered it to you only after you rolled 34 or less on a d100.
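For concreteness, here is a quick sketch (assuming the standard Allais payoffs, not all of which are spelled out in this excerpt: 1A = $24,000 for sure, 1B = 33/34 chance of $27,000, 2A = 34% chance of $24,000, 2B = 33% chance of $27,000) checking that the second pair of gambles is just the first pair played only when the d100 comes up 34 or less:

    # Each gamble is a list of (probability, payoff) pairs.
    def expected_value(outcomes):
        return sum(p * x for p, x in outcomes)

    gamble_1A = [(1.0, 24_000)]
    gamble_1B = [(33 / 34, 27_000), (1 / 34, 0)]
    gamble_2A = [(0.34, 24_000), (0.66, 0)]
    gamble_2B = [(0.33, 27_000), (0.67, 0)]

    # The second pair is the first pair run with probability 0.34 (else $0):
    assert abs(expected_value(gamble_2A) - 0.34 * expected_value(gamble_1A)) < 1e-6
    assert abs(expected_value(gamble_2B) - 0.34 * expected_value(gamble_1B)) < 1e-6

With a general utility function u over the payoffs the same comparison goes through: EU(2A) = 0.34 * EU(1A) + 0.66 * u($0) and EU(2B) = 0.34 * EU(1B) + 0.66 * u($0), so preferring 1A over 1B while also preferring 2B over 2A cannot be explained by how you value the money.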

1ViktoriaMalyasova
Well, Omega doesn't know which way the coin landed, but it does know that my policy is to choose a if the coin landed heads and b if the coin landed tails. I agree that the situation is different, because Omega's state of knowledge is different, and that stops money pumping.  It's just interesting that breaking the independence axiom does not lead to money pumping in this case. What if it doesn't lead to money pumping in other cases too?

My worry with automation isn't that it will destroy the intrinsic value of human endeavors, but rather that it will destroy the economic value of the average person's endeavors. I agree that human art is still valuable even if AI can make better art. My concern is that under the current system of production, where people must contribute to society in a competitive way in order to secure an income and a living for themselves, full automation will be materially harmful to everyone who doesn't own the automated systems.

3Shmi
The (progressive) hope is that we will end up in a post-scarcity situation, where "securing a living" is not a thing anymore and "owning automated systems" is not necessary for full access to them. Of course you are right that humans are excellent at creating inequality for themselves, and the outdated "current system of production" will get preserved rather than replaced.
1Shiroe
Exactly. I wish the economic alignment issue was brought up more often.

Is everybody's code going to be in Python?

3lsusr
The rules are that, from the game engine's perspective, everyone's code is going to be written in Python 3 or Hy. It is theoretically possible that someone's code might, say, include some code in a different language that is then executed from the Python runtime.

What are the rules about program runtime?

0lsusr
Vague. The only limitations are the obvious ones: don't hack the game client to get around information restrictions, don't use so much compute that you slow the game itself to a crawl, don't install anything that takes me significant effort to set up, and don't do anything I consider a security risk. You may PM me for clarification on your specific situation. I can easily perform speed tests, as I have already written the game client. If you are a programmer, you may request reasonable features such as reading your opponent's source code. Edit #1: In order to maximize resource use, you may provide an adjustable constant such as TREE_DEPTH. Edit #2: Any code that can always complete 10,000 move calls within 5 seconds is guaranteed to be "fast enough".
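As a rough self-check against Edit #2, you could time 10,000 calls to your own move function; the function below is a hypothetical stand-in, not the actual game API:

    import time

    # Hypothetical stand-in for your bot's move function; the real game
    # client's interface may differ. This only checks the
    # 10,000-calls-in-5-seconds guideline from Edit #2.
    def my_move(observation):
        return "some_move"

    start = time.perf_counter()
    for _ in range(10_000):
        my_move(observation=None)
    elapsed = time.perf_counter() - start

    print(f"10,000 move calls took {elapsed:.2f}s "
          f"({'within' if elapsed <= 5 else 'over'} the 5-second guideline)")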

A common concern around here seems to be that, without massive and delicate breakthroughs in our understanding of human values, any superintelligence will destroy all value by becoming some sort of paperclip optimizer. This is what Eliezer claims in Value is Fragile. Therefore, any vision of the future that manages to do better than this without requiring huge philosophical breakthroughs (in particular, a future that doesn’t know how to implement CEV before the Singularity happens) is encouraging to me as a proof of concept for how the future might be more... (read more)

Thanks for the link, I will check it out!

As for cannibalism, it seems to me that its role in Eliezer's story is to trigger a purely illogical revulsion in the humans who anthropomorphise the aliens.

I dunno about you, but my problem with the aliens isn't the cannibalism; it's that the vast majority of them die slow and horribly painful deaths.

No cannibalism takes place, but the same amount of death and suffering is present as in Eliezer's scenario. Should we be less or more revolted at this?

The same.

Which scenario has the greater moral weight?

Neither. They are both horr... (read more)

Sorry to necro this here, but I find this topic extremely interesting and I keep coming back to this page to stare at it and tie my brain in knots. Thanks for your notes on how it works in the logically uncertain case. I found a different objection based on the assumption of logical omniscience:

Regarding this you say:

Perhaps you think that the problem with the above version is that I assumed logical omniscience. It is unrealistic to suppose that agents have beliefs which perfectly respect logic. (Un)Fortunately, the argument doesn't really depend
... (read more)

Sorry to necro this here, but I find this topic extremely interesting and I keep coming back to this page to stare at it and tie my brain in knots.

(As for this, I think a major goal of LessWrong -- and the alignment forum -- is to facilitate sustained intellectual progress; a subgoal of that is that discussions can be sustained over long periods of time, rather than flitting about as would be the case if we only had attention for discussing the most recent posts!!)

3abramdemski
Right, this is what you have to do. Hmm. So, a bounded theorem prover using PA can still prove Löb about itself. I think everything is more complicated and you need to make some assumptions (because there's no guarantee a bounded proof search will find the right Löbian proof to apply to itself, in general), but you can make it go through. I believe the technical details you're looking for will be in Critch's paper on bounded Löb.

That's what I was thinking. Garbage in, garbage out.

This seems equivalent to the Tegmark Level IV Multiverse to me. Very simple, and our universe is probably somewhere in there, but it doesn't have enough explanatory power to be considered a Theory of Everything in the physical sense.

From an omniscient point of view, yes. From my point of view, probably not, but there are still problems related to this that can cause logic-based agents to get very confused.

Let A be an agent, considering options X and not-X. Suppose A |- (Action=not-X -> Utility=0). The naive approach to this would be to say: if A |- (Action=X -> Utility<0), A will do not-X, and if A |- (Action=X -> Utility>0), A will do X. Suppose further that A knows its source code, so it knows this is the case.
Consider the statement G=(A |- G) -> (Action=X -... (read more)
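For concreteness, here is a minimal toy sketch of the naive proof-based rule described above; the "prover" is just a hand-supplied set of statements standing in for A searching for proofs in its theory:

    # Toy model of the naive rule: act on whichever implication about X
    # the theory proves. `proven` stands in for the set of statements
    # A's theory can prove.
    def act(proven):
        def proves(statement):
            return statement in proven

        if proves("Action=X -> Utility>0"):
            return "X"
        if proves("Action=X -> Utility<0"):
            return "not-X"
        return "not-X"  # fall back to the option with known utility 0

    # If the theory proves "Action=X -> Utility<0" (even via a spurious,
    # self-referential argument), the agent takes not-X, and the proved
    # implication then holds vacuously, since Action=X never happens.
    print(act({"Action=X -> Utility<0"}))  # prints: not-X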

2TAG
I am not aware of a good reason to believe that a perfect decision theory is even possible, or that counterfactuals of any sort are the main obstacle.
Suppose you learn about physics and find that you are a robot. You learn that your source code is "A". You also believe that you have free will; in particular, you may decide to take either action X or action Y.

My motivation for talking about logical counterfactuals has little to do with free will, even if the philosophical analysis of logical counterfactuals does.

The reason I want to talk about logical counterfactuals is as follows: suppose as above that I learn that I am a robot, and that my source code is "A" (which is presumed to... (read more)

2jessicata
I'm not using "free will" to mean something distinct from "the ability of an agent, from its perspective, to choose one of multiple possible actions". Maybe this usage is nonstandard but find/replace yields the right meaning.
1TAG
From an omniscient point of view, or from your point of view? The typical agent has imperfect knowledge of both the inputs to their decision procedure and the procedure itself. So long as an agent treats what it thinks is happening as only one possibility, there is no contradiction, because possibly-X is always compatible with possibly-not-X.

It's hard to tell, since while common sense is sometimes wrong, it's right more often than not. An idea being common sense shouldn't count against it, even though, as the article said, it's not conclusive.

Seems to me that before a philosophical problem is solved, it becomes a problem in some other field of study. Atomism used to be a philosophical theory. Now that we know how to objectively confirm it, it (or rather, something similar but more accurate) is a scientific theory.

It seems that philosophy (at least, the parts of philosophy that are actively trying to progress) is about trying to take concepts that we have intuitive notions of, and figure out what if anything those concepts actually refer to, until we succeed at this well enough that to study the... (read more)

When "pure thought" tells you that 1 + 1 = 2, "independently of any experience or observation", you are, in effect, observing your own brain as evidence.

I mean, yeah? You can still do that in your armchair, without looking at anything outside of yourself. Mathematical facts are indeed "discoverable by the mere operation of thought, without dependence on what is anywhere existent in the universe," if you modify the statement a little to say "anywhere else existent" in order to acknowledge that the operation of tho... (read more)

Perhaps in many cases, if "X wants Y", that means X will do or bring about Y unless it is prevented by something external. In some cases X is an unconscious optimization procedure, which therefore "wants" the thing it is optimizing; in other cases X is the output of some optimization procedure, as with a program that "wants" to complete its task or a microorganism that "wants" to reproduce; but optimization is not always involved, as illustrated by "high-pressure gas wants to expand".

1jmh
I get what you are saying, and as such it may well be harmless. However, it's a bit odd to say that the light wants to stay on, so I have to toggle the switch to prevent it from staying on. Yes, that is true, but in reality it is the electrons that "want" to flow to ground, and they will do so through the light bulb, producing the illumination as they flow. So if we don't know much about electricity, saying "the light wants" may lead to a lot of troubleshooting of the bulb when the circuit breaker has been thrown. And that is part of my musing here: how often might we simplify, abstract, or rely on metaphor when we lack more specific knowledge?

I think an important consideration is the degree of catastrophe. Even the asteroid strike, which is catastrophic to many agents on many metrics, is not catastrophic on every metric, not even every metric humans actually care about. An easy example of this is prevention of torture, which the asteroid impact accomplishes quite smoothly, along with almost every other negative goal. The asteroid strike is still very bad for most agents affected, but it could be much, much worse, as with the "evil" utility function you alluded to, which is very bad f... (read more)

3TurnTrout
Sure, but just like it makes sense to be able to say that a class of outcomes is "good" without every single such outcome being maximally good, it makes sense to have a concept for catastrophes, even if they're not literally the worst things possible. Building a powerful agent that helps you get what you want doesn't destroy your ability to get what you want. By my definition, that's not a catastrophe. Correct. Again, I don't mean to say that any catastrophe is literally the worst outcome possible.

But, over the lifetime of civilization, our accumulated experience led us to update this prior, and single out the complexity measure suggested by math.

I may be picking nits here, but what exactly does it mean to "update a prior"?

And as a mathematical consideration, is it in general possible to switch your probabilities from one (limit computable) universal prior to another with a finite amount of evidence?

1Gurkenglas
Two priors could indeed start out diverging such that you cannot reach one from the other with finite evidence. Strange loops help here: one of the hypotheses the brain's prior admits is that the universe runs on math. This hypothesis predicts what you'd get by having used a mathematical prior from day one. Natural philosophy (and, by today, peer pressure) will get most of us enough evidence to favor it, and then physicists' experiments single out description length as the correct prior. But the ways in which the brain's prior diverges are still there, just suppressed by updating; given evidence of magic, we could update away again if math is bad enough at explaining it.

No way I'd take that bet on even odds. Though I do think it's better than even odds. It's kind of hard to figure out how I feel about this.

Uh, if you're worried about UFAI, I'd be more concerned about your digital footprint. The concern with UFAI is that it might decide to torture a clone of you (who isn't the same as you unless the UFAI has a ton of other information about you, which is a separate thing) instead of somebody else. It doesn't seem that much worse from either a selfless or a selfish point of view.

1Mati_Roy
I personally would rather have an FAI be able to bring me back than prevent a UFAI from doing so.

Funny you mention AlphaGo, since the first time AlphaGo (or indeed any computer) beat a professional go player (Fan Hui), it was distributed across multiple computers. Only later did it become strong enough to beat top players while running on a single computer.

This is one of those things that seems obvious, but it did cause some things to click for me that I hadn't thought of before. Previously, my idea of AGI becoming uncontrollable was basically that somebody would make a superintelligent AGI in a box, that we would be able to unplug it anytime we wanted, and that the real danger would be the AGI tricking us into not unplugging it and letting it out of the box instead. What changed this view was this line: "Try to unplug Bitcoin." Once you think of it that way, it does seem pretty obvious that the most p... (read more)

4Donald Hobson
Algorithms don't have a single "power" setting. It is easier to program a single computer than to make a distributed fault-tolerant system. Algorithms like AlphaGo are run on a particular computer with an off switch, not spread around. Of course, a smart AI might soon load its code all over the internet, if it has access. But it would start in a box.

I think that fully specifying human values may not be the best approach to an AI utopia. Rather, I think it would be easier and safer to tell the AI to upload humans and run an Archipelago-esque simulated society in which humans are free to construct and search for the society they want, free from many practical problems in the world today such as resource scarcity.

We're talking about the impact of an event though. The very question is only asking about worlds where the event actually happens.

If I don't know whether an event is going to happen and I want to know the impact it will have on me, I compare futures where the event happens to my current idea of the future, based on observation (which also includes some probability mass for the event in question, but not certainty).

In summary, I'm not updating to "X happened with certainty"; rather, I am estimating the utility in that counterfactual case.

Rot13:

Gur vzcnpg bs na rirag ba lbh vf gur qvssrerapr orgjrra gur rkcrpgrq inyhr bs lbhe hgvyvgl shapgvba tvira pregnvagl gung gur rirag jvyy unccra, naq gur pheerag rkcrpgrq inyhr bs lbhe hgvyvgl shapgvba.

Zber sbeznyyl, jr fnl gung gur rkcrpgrq inyhr bs lbhe hgvyvgl shapgvba vf gur fhz, bire nyy cbffvoyr jbeyqfgngrf K, bs C(K)*H(K), juvyr gur rkcrpgrq inyhr bs lbhe hgvyvgl shapgvba tvira pregnvagl gung n fgngrzrag R nobhg gur jbeyq vf gehr vf gur fhz bire nyy cbffvoyr jbeyqfgngrf K bs C(K|R)*H(K). Gur vzcnpg bs R orvat gehr, gura, vf gur nofbyhgr inyhr bs gur qvssrerapr bs gubfr gjb dhnagvgvrf.

4TurnTrout
Translation to normal spoiler text:

Because assuming Provable(C)->C as a hypothesis doesn't allow you to prove C. Rather, the fact that a proof exists of Provable(C)->C allows you to construct a proof of C.
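For reference, that is exactly the content of Löb's theorem, stated in the notation used above:

    If PA |- (Provable(C) -> C), then PA |- C.

It is the existence of a proof of Provable(C) -> C, not the mere assumption of it inside a derivation, that does the work.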

The proof doesn't work on a logically uncertain agent. The logic fails here:

Examining the source code of the agent, because we're assuming the agent crosses, either PA proved that crossing implies U=+10, or it proved that crossing implies U=0.

A logically uncertain agent does not need a proof of either of those things in order to cross; it simply needs a positive expectation of utility, for example a heuristic which says that there's a 99% chance that crossing implies U=+10.
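As a quick sanity check (assuming the usual payoffs for this problem, where getting blown up is worth -10 and not crossing is worth 0, which is an assumption about the setup rather than something stated in this excerpt): such a heuristic gives crossing an expected utility of about 0.99 * 10 + 0.01 * (-10) = 9.8 > 0, so the agent crosses without needing a proof either way.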

Though you did say there's a version which still works for logical ... (read more)

I've now edited the post to give the version which I claim works in the empirically uncertain case, and give more hints for how it still goes through in the fully logically uncertain case.

The Riemann argument seems to differ from the Great Filter argument in this way: the Riemann argument depends only on the sheer number of observers, i.e. the only thing you're taking into account is the fact that you exist, whereas in the Great Filter argument you're updating based on what kind of observer you are, i.e. you're intelligent but not a space-travelling, uploaded posthuman.


The first kind of argument doesn't work because somebody exists either way: if the RH or whatever is false then you are one of a small number, if it'... (read more)

That's the funniest thing I've seen all day.

Seems to me that if an agent with a reasonable heuristic for logical uncertainty came upon this problem and was confident but not certain of its own consistency, it would simply cross, because the expected utility would be above zero, which is a reason that doesn't betray an inconsistency. (Besides, if it survived, it would have good third-party validation of its own consistency, which would probably be pretty useful.)

I agree that "it seems that it should". I'll try and eventually edit the post to show why this is (at least) more difficult to achieve than it appears. The short version is that a proof is still a proof for a logically uncertain agent; so, if the Löbian proof did still work, then the agent would update to 100% believing it, eliminating its uncertainty; therefore, the proof still works (via its Löbian nature).

Regarding your comments on SPECKS being preferable to TORTURE, I think that misses the argument they made. The reason you have to prefer 10N at X to N at X' at some point is that a speck counts as a level of torture. That's exactly what the OP was arguing against.

1Dacyn
The OP didn't give any argument for SPECKS>TORTURE, they said it was "not the point of the post". I agree my argument is phrased loosely, and that it's reasonable to say that a speck isn't a form of torture. So replace "torture" with "pain or annoyance of some kind". It's not the case that people will prefer arbitrary non-torture pain (e.g. getting in a car crash every day for 50 years) to a small amount of torture (e.g. 10 seconds), so the argument still holds.

Non-Archimedean utility functions seem kind of useless to me. Since no action is going to avoid moving the probability of any outcome by more than 1/3^^^3, absolutely any action is important only insomuch as it impacts the highest lexical level of utility. So you might as well just call that your utility function.
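To make the point concrete: lexicographic (non-Archimedean) preferences over two tiers behave like tuple comparison, so any nonzero difference in expected top-tier utility decides the choice no matter what happens at the lower tier. A small sketch, with 1/3^3 standing in for an unimaginably smaller number like 1/3^^^3:

    # Two-tier lexicographic utility: outcomes are (top_utility, low_utility).
    # Lotteries are lists of (probability, (top_utility, low_utility)).
    def expected_tiers(lottery):
        top = sum(p * u[0] for p, u in lottery)
        low = sum(p * u[1] for p, u in lottery)
        return (top, low)

    action_a = [(0.5 + 1 / 3**3, (1, 0)), (0.5 - 1 / 3**3, (0, 0))]  # tiny top-tier edge
    action_b = [(0.5, (1, 0)), (0.5, (0, 10**9))]                    # huge lower-tier payoff

    # Python tuple comparison is lexicographic: the tiny top-tier edge wins,
    # and the lower tier is never consulted.
    print(expected_tiers(action_a) > expected_tiers(action_b))  # prints: True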