Comment author: benkuhn 10 December 2013 12:40:18AM *  1 point

I'm confused by a couple minor points here, also:

  1. The paper asks for a "probability distribution over models of L". In fact, for many languages L, models of L form a proper class. Does this cause measure-theoretic difficulties? It seems like this might force mu to be zero on all sufficiently large models (otherwise you can do some sort of transfinite induction to get sets of unbounded measure) but I'm not very good at crazy set theory stuff.

  2. At one point the authors state "We would like P(forall phi in L' <blah>)". I thought we were in a first-order language and therefore couldn't quantify over propositions?

  3. It's not immediately clear to me that this actually constructs a measure on the set of theories: that is, if S is the set of all complete consistent theories, it's not clear to me that for the mu we construct by martingale, mu(S) = 1 (or even that mu(S) != 0). Mightn't additivity break when we take the limit and get a whole theory rather than just an incomplete bag of axioms?

Comment author: pengvado 11 December 2013 04:09:45AM 1 point
  1. Can we instead do "probability distribution over equivalence classes of models of L", where equivalence is determined by agreement on the truth-values of all first order sentences? There's only 2^ℵ₀ of those, and the paper never depends on any distinction within such an equivalence class.
Comment author: jsteinhardt 30 November 2013 12:52:18AM *  0 points

Thanks for the feedback. I agree that reductio ad absurdum is the weakest of the examples I gave, but let me try to justify it anyways: if X is a fully general counterargument, then we can use it to argue against true statements as well as false ones. So applying X without any additional justification would lead to patently false conclusions, and therefore (by reductio ad absurdum) X is not a valid form of reasoning. Perhaps this is not the best word for it, but it is similar to a very pervasive idea in mathematics, where when formulating possible approaches to prove a theorem, a key criterion is whether those approaches can distinguish between the theorem and similar statements that are known or suspected to be false.

ETA: And yes, I agree that specific examples are good!

Comment author: pengvado 30 November 2013 06:18:09PM 3 points

Yes, that's the usual application, but it's the wrong level of generality to make them synonyms. "Fully general counterargument" is one particular absurdity that you can reduce things to. Even after you've specified that you're performing a reductio ad absurdum against the proposition "argument X is sound", you still need to say what the absurd conclusion is, so you still need a term for "fully general counterargument".

Comment author: Irgy 18 November 2013 05:12:27AM 6 points

It seems like something has gone terribly wrong when our ethical decisions depend on our interpretation of quantum mechanics.

My understanding was that many-worlds is indistinguishable by observation from the Copenhagen interpretation. Has this changed? If not, it frightens me that people would choose a higher chance of the world ending to rescue hypothetical people in unobservable universes.

If anything this seems like a (weak) argument in favour of total utilitarianism, in that it doesn't suffer from giving different answers according to one's choice among indistinguishable theories.

Comment author: pengvado 18 November 2013 08:50:13AM 11 points

Why should you not have preferences about something just because you can't observe it? Do you also not care whether an intergalactic colony-ship survives its journey, if the colony will be beyond the cosmological horizon?

Comment author: passive_fist 20 October 2013 04:04:40AM 0 points

Retention time can be increased to days or perhaps weeks by cooling to cryogenic temperatures before power down

Actually no, modern DRAM loses information in milliseconds, even assuming you could cool it down to liquid helium temperatures. After a few seconds the data is almost entirely random.

Comment author: pengvado 20 October 2013 05:20:42AM 3 points

Here's a citation for the claim of DRAM persisting with >99% accuracy for seconds at operating temperature or hours at LN2. (The latest hardware tested there is from 2007. Did something drastically change in the last 6 years?)

Comment author: [deleted] 01 October 2013 05:03:19PM 0 points

I wrote up about a page-long reply, then realized it probably deserves its own posting. I'll see if I can get to that in the next day or so. There's a wide spectrum of possible solutions to the personal identity problem, from physical continuity (falsified) to pattern continuity and causal continuity (described by Eliezer in the OP), to computational continuity (my own view, I think). It's not a minor point though, whichever view turns out to be correct has immense ramifications for morality and timeless decision theory, among other things...

In response to comment by [deleted] on Timeless Identity
Comment author: pengvado 01 October 2013 06:09:33PM 2 points

What relevance does personal identity have to TDT? TDT doesn't depend on whether the other instances of TDT are in copies of you, or in other people who merely use the same decision theory as you.

Comment author: Nornagest 21 September 2013 06:52:07AM *  1 point

It looks to me like you want a cryptographically secure pseudo-random number generator restricted to the output space {0, 1} and with a known seed. That's unbiased and intractable pretty much by definition, indexable up to some usually very large periodicity, and typically verifiable and simple to refer to because that's standard practice in the security world.

There's plenty of PRNGs out there, and you can simply truncate or mod their outputs to give you the binary output you want; Fortuna looks like a strong candidate to me.

(I was going to suggest the Mersenne twister, which I've actually implemented before, but on further examination it doesn't look cryptographically strong.)

Comment author: pengvado 21 September 2013 11:13:14AM 3 points

That works with caveats: You can't just publish the seed in advance, because that would allow the player to generate the coin in advance. You can't just publish the seed in retrospect, because the seed is an ordinary random number, and if it's unknown then you're just dealing with an ordinary coin, not a logical one. So publish in advance the first k bits of the pseudorandom stream, where k > seed length, thus making it information-theoretically possible but computationally intractable to derive the seed; use the k+1st bit as the coin; and then publish the seed itself in retrospect to allow verification.
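A minimal sketch of that commit-then-reveal scheme, using iterated SHA-256 as a stand-in for a real cryptographic pseudorandom stream (the function and constant names here are mine, purely illustrative, not from any standard):

```python
import hashlib
import secrets

def bit_stream(seed: bytes, n_bits: int) -> str:
    """Expand a seed into a deterministic pseudorandom bit string
    by hashing seed || counter with SHA-256, block by block."""
    bits = ""
    counter = 0
    while len(bits) < n_bits:
        digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        bits += "".join(f"{byte:08b}" for byte in digest)
        counter += 1
    return bits[:n_bits]

# Commit phase: publish k bits of the stream, with k > seed length in bits,
# so the prefix information-theoretically pins down the seed while leaving
# it computationally intractable to recover.
SEED_BITS = 128
K = 256
seed = secrets.token_bytes(SEED_BITS // 8)
prefix = bit_stream(seed, K)        # published in advance
coin = bit_stream(seed, K + 1)[K]   # the (k+1)st bit is the logical coin

# Reveal phase: publish the seed; anyone can now re-derive and check both.
assert bit_stream(seed, K) == prefix
assert bit_stream(seed, K + 1)[K] == coin
```

Because each block of the stream depends only on the seed and a counter, the published prefix and the coin bit are reproducible by any verifier once the seed is revealed.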

Possible desiderata that are still missing: If you take multiple coins from the same pseudorandom stream, then you can't allow verification until the end of the whole experiment. You could allow intermediate verification by committing to N different seeds and taking one coin from each, but that fails wedrifid's desideratum of a single indexable problem (which I assume is there to prevent Omega from biasing the result via nonrandom choice of seed?).

I can get both of those desiderata at once using a different protocol: Pick a public key cryptosystem, a key, and a hash function with a 1-bit output. You need a cryptosystem where there's only one possible signature of any given input+key, i.e. one whose signing is deterministic rather than randomized. To generate the Nth coin: sign N, publish the signature, then hash the signature.

Comment author: homunq 07 September 2013 01:09:25PM *  2 points

There are a number of possibilities still missing from the discussion in the post. For example:

  • There might not be any such thing as a friendly AI. Yes, we have every reason to believe that the space of possible minds is huge, and it's also very clear that some possibilities are less unfriendly than others. I'm also not making an argument that fun is a limited resource. I'm just saying that there may be no possible AI that takes over the world without eventually running off the rails of fun. In fact, the question itself seems superficially similar to the halting problem, where "running off the rails" is the analogue for "halting"; suggesting that even if friendliness existed, it might not be rigorously provable. (note: this analogy doesn't say what I think it says; see response below. But I still mean to say what I thought; a friendly world may be fundamentally less stable than a simple infinite loop, perhaps to the point of being unprovable.)

  • Alternatively, building a "Friendly-enough" AI may be easier than you think. Consider the game of go. Human grandmasters (professional 9-dan players) have speculated that "God" (that is, perfect play) would rate about 13 dan professionally; that is, that they could beat such a player more than half the time given a 3 or 4 stone handicap. Replace "go" with "taking over the world", "professional 9-dan player" with "all of humanity put together", and "3 or 4 stone handicap" with "relatively simple-to-implement Asimov-type safeguards", and it is possible that this describes the world. And it is also possible that a planetary computer would still "only be 12-dan"; that is, that additional computing power shows sharply diminishing intelligence returns at some point "short of perfection", to the point where a mega-computer would still be noticeably imperfect.

There may be good reasons not to spend much time thinking about the possibilities that FAI is impossible or "easy". I know that people around here have plenty of plausible arguments for why these possibilities are small; and even if they are appreciable, the contrary possibility (that FAI is possible but hard) is probably where the biggest payoffs lie, and so merits our focus. And the OP discussion does seem valid for that possible-hard case. But I still think it would be improved by stating these assumptions up-front, rather than hiding or forgetting about them.

Comment author: pengvado 07 September 2013 07:22:29PM *  9 points

In fact, the question itself seems superficially similar to the halting problem, where "running off the rails" is the analogue for "halting"

If you want to draw an analogy to halting, then what that analogy actually says is: There are lots of programs that provably halt, and lots that provably don't halt, and lots that aren't provable either way. The impossibility of the halting problem is irrelevant, because we don't need a fully general classifier that works for every possible program. We only need to find a single program that provably has behavior X (for some well-chosen value of X).

If you're postulating that there are some possible friendly behaviors, and some possible programs with those behaviors, but that they're all in the unprovable category, then you're postulating that friendliness is dissimilar to the halting problem in that respect.

Comment author: Gurkenglas 03 September 2013 08:08:48PM *  0 points

Similarly: What are a qubit's specs? I would like to be able to think about what class of problem would be trivial with a quantum computer.

Comment author: pengvado 04 September 2013 01:29:52AM *  2 points

Then what you should be asking is "which problems are in BQP?" (if you just want a summary of the high level capabilities that have been proved so far), or "how do quantum circuits work?" (if you want to know what role individual qubits play). I don't think there's any meaningful answer to "a qubit's specs" short of a tutorial in the aforementioned topics. Here is one such tutorial I recommend.

Comment author: Lumifer 30 August 2013 04:47:10PM *  1 point

I don't understand the meaning of the words "want", "innately sticky", and "honestly have a goal" as applied to an AI (and not to a human).

Constraints have to influence behavior by enumerating EVERYTHING you don't want to happen

Not at all. Constraints block off sections of solution space which can be as large as you wish. Consider a trivial set of constraints along the lines of "do not affect anything outside of this volume of space", "do not spend more than X energy", or "do not affect more than Y atoms".

Comment author: pengvado 31 August 2013 02:18:47AM 3 points

"do not affect anything outside of this volume of space"

Suppose you, standing outside the specified volume, observe the end result of the AI's work: Oops, that's an example of the AI affecting you. Therefore, the AI isn't allowed to do anything at all. Suppose the AI does nothing: Oops, you can see that too, so that's also forbidden. More generally, the AI is made of matter, which will have gravitational effects on everything in its future lightcone.

Comment author: asr 21 August 2013 05:08:50AM *  3 points

What happens if my valuation is noncircular, but is incomplete? What if I only have a partial order over states of the world? Suppose I say "I prefer state X to Z, and don't express a preference between X and Y, or between Y and Z." I am not saying that X and Y are equivalent; I am merely refusing to judge.

My impression is that real human preference routinely looks like this; there are lots of cases people refuse to evaluate or don't evaluate consistently.

It seems like even with partial preferences, one can be consequentialist -- if you don't have clear preferences between two outcomes, then the choice between them just isn't morally relevant. Or is there a self-contradiction lurking?

Comment author: pengvado 21 August 2013 05:37:45PM *  1 point

Suppose I say "I prefer state X to Z, and don't express a preference between X and Y, or between Y and Z." I am not saying that X and Y are equivalent; I am merely refusing to judge.

If the result of that partial preference is that you start with Z and then decline the sequence of trades Z->Y->X, then you got Dutch-booked.

Otoh, maybe you want to accept the sequence Z->Y->X if you expect both trades to be offered, but decline each in isolation? But then your decision procedure is dynamically inconsistent: Standing at Z and expecting both trade offers, you have to precommit to using a different algorithm to evaluate the Y->X trade than you will want to use once you have Y.
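A toy rendering of the first failure mode, evaluating each trade in isolation (the encoding of the partial order is mine, for illustration only):

```python
# The partial preference from the quote: X strictly preferred to Z;
# X vs Y and Y vs Z are incomparable ("refuse to judge").
better = {("X", "Z")}

def accepts(current: str, offered: str) -> bool:
    """Agent takes a trade only if the offered state is strictly preferred."""
    return (offered, current) in better

# The sequence of trades Z -> Y -> X; each is only available
# from the matching starting state.
trades = [("Z", "Y"), ("Y", "X")]
state = "Z"
for have, get in trades:
    if state == have and accepts(state, get):
        state = get

assert state == "Z"          # both trades declined in isolation...
assert ("X", "Z") in better  # ...leaving the agent at Z, strictly worse than X
```

Since Y is incomparable to both neighbors, neither step looks like an improvement on its own, so the agent never reaches X even though it strictly prefers X to where it ends up.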
