All of JGWeissman's Comments + Replies

"Rationality" seems to give different answer to the same problem posed with different affine transformations of the players' utility functions.

0Tyrrell_McAllister
[Still arguing with tongue in cheek...] That's where the measures p and q come in.

Error: Adding values from different utility functions.

See this comment.

1Tyrrell_McAllister
[Resuming my tongue-in-cheek argument...] It is true that adding different utility functions is in general an error. However, for agents bound to follow Rationality (and Rationality alone), the different utility functions are best thought of as the same utility function conditioned on different hypotheses, where the different hypotheses look like "The utility to P2 turns out to be what really matters". After all, if the agents are making their decisions on the basis of Rationality alone, then Rationality alone must have a utility function. Since Rationality is universal, the utility function must be universal. What alternative does Rationality have, given the constraints of the problem, other than a weighted sum of the utility functions of the different individuals who might turn out to matter?

Eliezer's "arbitrary" strategy has the nice property that it gives both players more expected utility than the Nash equilibrium. Of course there are other strategies with this property, and indeed multiple strategies that are not themselves dominated in this way. It isn't clear how ideally rational players would select one of these strategies or which one they would choose, but they should choose one of them.

Why not "P1: C, P2: Y", which maximizes the sum of the two utilities, and is the optimal precommitment under the Rawlian veil-of-ignorance prior?

If we multiply player 2's utility function by 100, that shouldn't change anything, because it is an affine transformation of a utility function. But then "P1: B, P2: Y" would maximize the sum. Adding values from different utility functions is a meaningless operation.
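To make that concrete, here is a minimal sketch. The payoff pairs (player 1, player 2) are my reconstruction from numbers quoted elsewhere in this thread, so treat them as an assumption rather than the official problem statement; the point about rescaling holds regardless of the exact values.

```python
# Outcome payoffs (player 1, player 2); reconstructed from figures quoted in
# this thread, so treat them as an assumption, not the official statement.
OUTCOMES = {
    "P1: A":        (3, 0),
    "P1: B, P2: Y": (2, 2),
    "P1: C, P2: Y": (6, 0),
}

def sum_maximizer(scale_u2=1.0):
    """Outcome maximizing u1 + scale_u2 * u2; scaling u2 is an affine transformation."""
    return max(OUTCOMES, key=lambda o: OUTCOMES[o][0] + scale_u2 * OUTCOMES[o][1])

print(sum_maximizer())                # 'P1: C, P2: Y' maximizes the plain sum
print(sum_maximizer(scale_u2=100.0))  # after rescaling, 'P1: B, P2: Y' maximizes it
```

Player 2's preferences are untouched by the rescaling, yet the "maximize the sum" criterion flips its recommendation, which is the sense in which the sum is meaningless.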

1itaibn0
You're right. I'm not actually advocating this option. Rather, I was comparing EY's seemingly arbitrary strategy with other seemingly arbitrary strategies. The only one I actually endorse is "P1: A". It's true that this specific criterion is not invariant under affine transformations of utility functions, but how do I know EY's proposed strategy wouldn't change if we multiply player 2's utility function by 100 as you propose? (Along a similar vein, I don't see how I can justify my proposal of "P1: 3/10 C 7/10 B". Where did the 10 come from? "P1: 2/7 C 5/7 B" works equally well. I only chose it because it is convenient to write down in decimal.)

The reason player 1 would choose B is not because it directly has a higher payout but because including B in a mixed strategy gives player 2 an incentive to include Y in its own mixed strategy, increasing the expected payoff of C for player 1. The fact that A dominates B is irrelevant. The fact that A has better expected utility than the subgame with B and C indicates that player 1 not choosing A is somehow irrational, but that doesn't give a useful way for player 2 to exploit this irrationality. (And in order for this to make sense for player 1, player 1 would need a way to counter exploit player 2's exploit, and for player 2 to try its exploit despite this possibility.)

0James_Miller
"The reason player 1 would choose B is not because it directly has a higher payout but because including B in a mixed strategy gives player 2 an incentive to include Y in its own mixed strategy, " No since Player 2 only observes Player 1's choice not what probabilities Player 1 used.

The definition you linked to doesn't say anything about entering subgame not giving the players information, so no, I would not agree with that.

I would agree that if it gave player 2 useful information, that should influence the analysis of the subgame.

(I also don't care very much whether we call this object within the game of how the strategies play out given that player 1 doesn't choose A a "subgame". I did not intend that technical definition when I used the term, but it did seem to match when I checked carefully when you objected, thinking t... (read more)

0James_Miller
"I also disagree that player 1 not picking A provides useful information to player 2." Player 1 gets 3 if he picks A and 2 if he picks B, so doesn't knowing that Player 1 did not pick A provide useful information as to whether he picked B?

I'm sorry but "subgame" has a very specific definition in game theory which you are not being consistent with.

I just explained in detail how the subgame I described meets the definition you linked to. If you are going to disagree, you should be pointing to some aspect of the definition I am not meeting.

Also, intuitively when you are in a subgame you can ignore everything outside of the subgame, playing as if it didn't exist. But when Player 2 moves he can't ignore A because the fact that Player 1 could have picked A but did not provides insi

... (read more)
4James_Miller
Let's try to find the source of our disagreement. Would you agree with the following: "You can only have a subgame that excludes A if the fact that Player 1 has not picked A provides no useful information to Player 2 if Player 2 gets to move."

To see that it is indeed a subgame:

Represent the whole game with a tree whose root node represents player 1 choosing whether to play A (leading to a leaf node), or to enter the subgame at node S. Node S is the root of the subgame, representing player 1's choices to play B or C, leading to nodes representing player 2's choice to play X or Y in those respective cases, each leading to leaf nodes.

Node S is the only node in its information set. The subgame contains all the descendants of S. The subgame contains all nodes in the same information set as any node in the ... (read more)
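To make the check concrete, here is a minimal sketch of the tree just described (my own toy encoding, structure only, since no payoffs are needed for the definition): it verifies that the candidate root is alone in its information set and that the subgame is closed under descendants and information sets.

```python
# Toy encoding of the tree described above (structure only, no payoffs).
# Player 2 moves without knowing whether B or C was played, so player 2's
# two decision nodes share an information set.
children = {
    "root": ["A_leaf", "S"],            # player 1: play A, or enter the subgame at S
    "S": ["after_B", "after_C"],        # player 1: B or C
    "after_B": ["BX_leaf", "BY_leaf"],  # player 2: X or Y
    "after_C": ["CX_leaf", "CY_leaf"],
}
information_sets = [{"root"}, {"S"}, {"after_B", "after_C"}]

def descendants(node):
    out = {node}
    for child in children.get(node, []):
        out |= descendants(child)
    return out

def is_subgame(root):
    nodes = descendants(root)
    # The initial node must be the only member of its information set.
    root_set = next(s for s in information_sets if root in s)
    if root_set != {root}:
        return False
    # The subgame must contain every node sharing an information set with any
    # of its nodes (it contains all descendants by construction).
    return all(s <= nodes for s in information_sets if s & nodes)

print(is_subgame("S"))  # True: the B/C-versus-X/Y part meets this definition
```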

-1James_Miller
I'm sorry but "subgame" has a very specific definition in game theory which you are not being consistent with. Also, intuitively when you are in a subgame you can ignore everything outside of the subgame, playing as if it didn't exist. But when Player 2 moves he can't ignore A because the fact that Player 1 could have picked A but did not provides insight into whether Player 1 picked B or C. I am a game theorist.

Classical game theory says that player 1 should choose A for expected utility 3, as this is better than the subgame of choosing between B and C, where the best player 1 can do against a classically rational player 2 is to play B with probability 1/3 and C with probability 2/3 (and player 2 plays X with probability 2/3 and Y with probability 1/3), for an expected value of 2.
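To check those numbers, here is a minimal sketch. The payoff matrix is my reconstruction from figures quoted in this thread, so treat it as an assumption rather than the official problem statement.

```python
# (player 1, player 2) payoffs for the B/C-versus-X/Y subgame; reconstructed
# from figures quoted in this thread, so treat them as an assumption.
PAYOFFS = {
    ("B", "X"): (2, 0), ("B", "Y"): (2, 2),
    ("C", "X"): (0, 1), ("C", "Y"): (6, 0),
}

def expected_payoffs(p_B, q_X):
    """Expected payoffs when player 1 plays B with probability p_B (else C)
    and player 2 plays X with probability q_X (else Y)."""
    ev1 = ev2 = 0.0
    for (m1, m2), (u1, u2) in PAYOFFS.items():
        prob = (p_B if m1 == "B" else 1 - p_B) * (q_X if m2 == "X" else 1 - q_X)
        ev1 += prob * u1
        ev2 += prob * u2
    return ev1, ev2

# At the claimed mixed equilibrium, player 1's expected value is 2, worse than
# the 3 from simply playing A:
print(expected_payoffs(1/3, 2/3))  # (2.0, 0.666...)
# Player 1 is indifferent between B and C against player 2's mix, so there is
# no profitable deviation within the subgame:
print(expected_payoffs(1.0, 2/3))  # pure B: still 2.0 for player 1
print(expected_payoffs(0.0, 2/3))  # pure C: still 2.0 for player 1
```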

But, there are pareto improvements available. Player 1's classically optimal strategy gives player 1 expected utility 3 and player 2 expected utility 0. But suppose instead Playe... (read more)

1Joshua_Blaine
Two TDT players have 3 plausible outcomes to me, it seems. This comes from my admittedly inexperienced intuitions, and not much rigorous math.

The first two plausible points that occurred to me are 1) both players choose C,Y with certainty, or 2) they sit at exactly the equilibrium for p1, giving him an expected payout of 3, and p2 an expected payout of .5. Both of these improve on the global utility payout of 3 that's gotten if p1 just chooses A (giving 6 and 3.5, respectively), which is a positive thing, right?

The argument that supports these possibilities isn't unfamiliar to TDT. p2 does not expect to be given a choice, except in the cases where p1 is using TDT, therefore she has the choice of Y, with a payout of 0, or not having been given a chance to choose at all. Both of these possibilities have no payout, so p2 is neutral about what choice to make, therefore choosing Y makes some sense. Alternatively, p1 has to choose between A for 3 or C for p(.5)*(6), which have the same payout. C, however, gives p2 .5 more utility than she'd otherwise get, so it makes some sense for p1 to pick C.

Alternatively, and what occurred to me last, both these agents have some way to equally share their "profit" over Classical Decision Theory. For however much more utility than 3 p1 gets, p2 gets the same amount. This payoff point (p1-3=p2) does exist, but I'm not sure where it is without doing more math. Is this a well formulated game theoretic concept? I don't know, but it makes some sense to my idea of "fairness", and the kind of point two well-formulated agents should converge on.
0James_Miller
"Classical game theory says that player 1 should chose A for expected utility 3, as this is better than than the sub game of choosing between B and C " No since this is not a subgame because of the uncertainty. From Wikipedia " In game theory, a subgame is any part (a subset) of a game that meets the following criteria...It has a single initial node that is the only member of that node's information set... " I'm uncertain about what TDT/UDT would say.

If you have trouble confronting people, you make a poor admin.

Can we please act like we actually know stuff about practical instrumental rationality given how human brains work, and not punish people for openly noticing their weaknesses.

You could have more constructively said something like "Thank you for taking on these responsibilities even though it sometimes makes you uncomfortable. I wonder if anyone else who is more comfortable with that would be willing to help out."

Shmi130

not punish people for openly noticing their weaknesses.

Thanks! Yes, that's a good point. On the other hand, willingness to confront problem users is one of the absolute minimum requirements for a forum moderator. I suppose Kaj was not expected to do the moderator's job, probably just behind-the-scene maintenance, and I assumed too much. Sorry, Kaj!

That said, a competent active forum moderator is required to deal with this particular issue, and I am yet to see one here.

I use whole life insurance. If you use term insurance, you should have a solid plan for an alternate funding source to replace your insurance at the end of the term.

I believe the Efficient Market Hypothesis is correct enough that reliably getting good results from buying term insurance and investing the premium difference would be a lot of work if possible at all.

Mere survival doesn't sound all that great. Surviving in a way that is comforting is a very small target in the general space of survival.

0Shmi
Beats dying if you believe that some day you will be saved BY THE POWER OF SCIENCE!

By saying "clubs", I communicate the message that my friend would be better off betting $1 on a random club than $2 on the seven of diamonds, (or betting $1 on a random heart or spade), which is true, so I don't really consider that lying.

If, less conveniently, my friend takes what I say to literally mean the suit of the top card, but I still can get them to not bet $2 on the wrong card, then I bite the bullet and lie.

0Scott Garrabrant
I expect most people here would bite that bullet, but I am not sure if everyone here will. "Never Lie" seems like a rather convenient Schelling Fence.

And there's a very real danger of this being a fully general counterargument against any sufficiently simple moral theory.

Establishing a lower bound on the complexity of a moral theory that has all the features we want seems like a reasonable thing to do. I don't think the connotations of "fully general counterargument" are appropriate here. "Fully general" means you can apply it against a theory without really looking at the details of the theory. If you have to establish that the theory is sufficiently simple before applying the co... (read more)

0blacktrance
"This theory is too simple" is something that can be argued against almost any theory you disagree with. That's why it's fully general.

and the number of possible models for T rounds is exponential in T

??? Here n is the number of other people betting. It's a constant.

Within a single application of online learning, n is a constant, but that doesn't mean we can't look at the consequences of it having particular values, even values that vary with other parameters. But, you seem to be agreeing with the main points that if you use all possible models (or "super-people") the regret bound is meaningless, and that in order to reduce the number of models so it is not meaningless, wh... (read more)

Everyone with half a brain could game them either to shorten their stay or to get picked as a leader candidate.

Maybe that's the test.

4hyporational
I talked to other people who gamed the test and they usually got what they wanted, which happened to be a shortened stay for all of them. I myself didn't pick just the wrong answers, but tried to randomly pick some right answers as well. Considering that none of the low-mid rank officers seemed to be the sharpest tools in the box, maybe you're right :) Perhaps the psychologist who designed the test was a pacifist.

Regarding myth 5 and the online learning, I don't think the average regret bound is as awesome as you claim. The bound is square root( (log n) / T). But if there are really no structural assumptions, then you should be considering all possible models, and the number of possible models for T rounds is exponential in T, so the bound ends up being 1, which is the worst possible average regret using any strategy. With no assumptions of structure, there is no meaningful guarantee on the real accuracy of the method.
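As a quick numeric check of that claim (a minimal sketch; taking the logarithm as natural is my assumption about the form of the bound):

```python
import math

def avg_regret_bound(n, T):
    """The sqrt((log n) / T) average regret bound quoted above (natural log assumed)."""
    return math.sqrt(math.log(n) / T)

print(avg_regret_bound(n=100, T=10_000))       # ~0.02: meaningful for a fixed pool of models
print(avg_regret_bound(n=math.e ** 20, T=20))  # n exponential in T: the bound is ~1.0
```

With a fixed pool of n models the bound shrinks like 1/sqrt(T); once the number of models grows exponentially with T, it tells you nothing.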

The thing that is awesome about the bounds guar... (read more)

0jsteinhardt
??? Here n is the number of other people betting. It's a constant. If you wanted to, you could create "super-people" that mix and match the bets of other people depending on the round. Then the number of super-people grows exponentially in T, and without further assumptions you can't hope to be competitive with such "super-people". If that's what you're saying, then I agree with that. And I agree with the broader point that in general you need to make structural assumptions to make progress. The thing that's awesome about the regret bound is that it does well even in the presence of correlated, non-i.i.d., maybe even adversarial data, and even if the "true hypothesis" isn't in the family of models we consider.

The obvious steelman of dialogue participant A would keep the coin hidden but ready to inspect, so that A can offer bets having credible ignorance of the outcomes and B isn't justified in updating on A offering the bet.

Yvain says that people claim to be using one simple deontological rule "Don't violate consent" when in fact they are using a complicated collection of rules of the form "Don't violate consent in this specific domain" while not following other rules of that form.

And yet, you accuse him of strawmanning their argument to be simple.

0MugaSofer
Sort of, yes. I definitely need to write a full post on why I believe his criticism is subtly unfair in various ways - likely because this is an emotional subject for him, so he is somewhat less inclined to pull his punches and steelman opposing views; and he is both a brilliant writer and a brilliant thinker. Actually, he accuses them of claiming to, and advocating following those rules only in those situations where doing so agrees with their agenda - which he characterizes, not unreasonably. A charge of hypocrisy, rather than inconsistency. I do? I accuse him of strawmanning their arguments to be cartoonishly poor arguments, but simple...? Ah! Are you perhaps referring to my characterization of simple deontological rules ("thou shalt not kill" etc.)? Yes, I would generally reject those as overly simple - there are many situation where one might be called upon to kill for the greater good, for example. (There are vast differences between deontological ethics, rule utilitarianism, and the optimal laws for legal systems both real and hypothetical.)

Arguing that the consequentialist approach is better than the deontological approach is different than skipping that step and going straight to refuting your own consequentialist argument for the position others were arguing on deontological grounds. Saying they should do some expected utility calculations is different than saying the expected utility calculations they haven't done are wrong.

-2MugaSofer
Except he isn't doing that. He's misrepresenting people's arguments (due to misunderstanding?), tearing his strawman apart, and then "explaining" the poor quality of this argument by declaring that his opponents are lying about their beliefs, and their actual beliefs consist of simple deontological rules. ... and obviously, an arbitrary set of deontological rules is not an argument, so he no longer has to actually disprove it. I'm starting to think I need to write a larger deconstruction of his post, actually, but I hope you see what I mean. (Thank Azathoth that Yvain is such a clear writer and thinker so I can show this so simply with quotes like this. Although I suppose he wouldn't have as many of us caring what he writes if it wasn't worth reading.)

I thought he was ramming home his point that his opponents are secretly deontologists there

I think the point was that his opponents are openly deontologists, making openly deontological arguments for their openly deontological position, and therefore they are rightly confused and not moved by Yvain's refutation of a shoehorning of their position into a consequentialist argument when they never made any such argument, which Yvain now understands and therefore he doesn't do that anymore.

0MugaSofer
Well, this is what comes immediately after the quoted paragraph, for context: So my interpretation doesn't seem entirely unreasonable. I haven't finished rereading the whole post yet, though.

This seems to be overloading the term "side effects". The functional programming concept of side effects (which it says its functions shouldn't have) is changing the global state of the program that invokes them other than by returning the value. It makes no claims about these other concepts of a program being affected by analyzing the source code of the function independent of invoking it, or of the function running on morally relevant causal structure.
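For reference, a minimal sketch of the functional programming notion (the function names are made up for illustration):

```python
# Made-up illustration of the functional-programming notion of "side effect".
call_log = []

def pure_square(x):
    # No side effects: the only way this affects the caller is its return value.
    return x * x

def impure_square(x):
    # Side effect: it also mutates global state that the rest of the program can observe.
    call_log.append(x)
    return x * x
```

Nothing in that definition speaks to what happens when something else analyzes the function's source code, or to what the function's computation runs on, which is the overloading being pointed out above.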

1Scott Garrabrant
Yes, perhaps I should not have called it that, but the two concepts seem very similar to me. While the things I talk about do not fit in the definition of side effect from functional languages, I think that it is similar enough that the analogy should be made. Perhaps I should have made the analogy, but used a different term.

It seems to me that in the quote Yvain is admitting an error, not celebrating victory. Try taking his use of the word "reasonably" at face value.

0MugaSofer
Really? When I read that article, I thought he was ramming home his point that his opponents are secretly deontologists there - hence the title of the post in question. Perhaps I too have failed to apply the principle of charity. (Insert metahumourous joke about not bothering because of the OP's topic here.)

CFAR can achieve its goal of creating effective, rational do-gooders by taking existing do-gooders and making them more effective and rational. This is why they offer scholarships to existing do-gooders. Their goal is not to create effective, rational do-gooders out of blank slates but to make valuable marginal increases in this combination of traits, often by making people who already rank highly in these areas even better.

They also use the same workshops to make people in general more effective and rational, which they can charge money for to fund the work... (read more)

-2brazil84
I wasn't aware that this was the strategy; perhaps I read the original post too quickly. Well are they attempting to turn non-do-gooders into do-gooders? Perhaps, but that strikes me as a dangerous first step towards a kind of mission creep. Towards a scenario (3) or (4). Same problem.

CFAR does offer to refund the workshop fee if after the fact participants evaluate that it wasn't worth it. They also solicit donations from alumni. So they are kind of telling participants to evaluate the value provided by CFAR and pay what they think is appropriate, while providing an anchor point and default which covers the cost of providing the workshop. That anchor point and default are especially important for the many workshop participants who are not selected for altruism, who probably will learn a lot of competence and epistemic rationality but not much altruism, and whose workshop fees subsidize CFAR's other activities.

-7brazil84

my feeling about CFAR is that they are providing a service to individuals for money and it's probably not a terrible idea to let the market determine if their services are worth the amount they charge.

I think that CFAR's workshops are self funding and contribute to paying for organizational overhead. Donated funds allow them to offer scholarships to their workshops to budding Effective Altruists (like college students) and run the SPARC program (targeting mathematically gifted children who may be future AI researchers). So, while CFAR does provide a ser... (read more)

Yes; this is correct. The workshops pay for themselves, and partially subsidize the rest of our activities; but SPARC, scholarships for EA folk, and running CFAR training at the effective altruism summit don't; nor does much desirable research. We should have a longer explanation of this up later today.

Edited to add: Posted.

I'm convinced AGI is much more likely to be built by a government or major corporation, which makes me more inclined to think movement-building activities are likely to be valuable, to increase the odds of the people at that government or corporation being conscious of AI safety issues, which MIRI isn't doing.

MIRI's AI workshops get outside mathematicians and AI researchers involved in FAI research, which is good for movement building within the population of people likely to be involved in creating an AGI.

3[anonymous]
MIRI hasn't been very successful in bringing in outside AI researchers. What basis is there for thinking that mathematicians will have significant impact on the creation of AGI?

Qiaochu's answer seems off. The argument that the parent AI can already prove what it wants the successor AI to prove and therefore isn't building a more powerful successor isn't very compelling, because being able to prove things is a different problem than searching for useful things to prove. It also doesn't encompass what I understand to be the Lobian obstacle: that being able to prove that if your own mathematical system proves something then that thing is true implies that your system is inconsistent.

Is there more context on this?
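For reference, the statement of Lob's theorem I have in mind, writing Box P for "the system T proves P" (a standard formulation, not anything specific to this thread):

```latex
% Lob's theorem, with \Box P standing for "the system T proves P":
\text{If } T \vdash (\Box P \rightarrow P), \text{ then } T \vdash P.
```

So a system that proved the reflection schema Box P -> P for arbitrary P would prove arbitrary P, i.e. it would be inconsistent; Godel's second incompleteness theorem falls out as the special case where P is a contradiction.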

1Qiaochu_Yuan
It's entirely possible that my understanding is incomplete, but that was my interpretation of an explanation Eliezer gave me once. Two comments: first, this toy model is ignoring the question of how to go about searching for useful things to prove; you can think of the AI and its descendants as trying to determine whether or not any action leads to goal G. Second, it's true that the AI can't reflectively trust itself and that this is a problem, but the AI's action criterion doesn't require that it reflectively trust itself to perform actions. However, it does require that it trust its descendants to construct its descendants.

Then, back in Azkaban, facing that auror, when Quirrel used Avadra Kevadra in an attempt to force the auror to dodge

What do you think you know about which spell Quirrell used, and how do you think you know it?

0CCC
I think Quirrel used a powerful killing curse. I think that this is the case mainly because I don't think a stunner would have had as much effect on a patronus as the spell which Quirrel used; also because, even when facing a significant decline in his reputation in Harry's eyes (in hostile territory) Quirrel did not think to try to claim that it was a green stunner, leading me to the tentative conclusion that Quirrel had not thought of a green stunner at that time.

It sounds to me like somebody is purchasing utilons, using themselves as an example to get other people to also purchase utilons, and incidentally deriving a small amount of well deserved status from the process.

-7V_V

Can you give an example of a property a star might have because having that property made its ancestor stars better at producing descendant stars with that property?

0Shmi
Sorry, I'm not an expert in stellar physics. Possibly metallicity, or maybe something else relevant. My original point was to agree that there is no good definition of "life" which does not include some phenomena we normally don't think of as living.

Is Jaynes' PT:LOS able to be read by moi, given that I know basic set theory?

A good way to find out would be to try reading it.

With math, it's useful to be able to distinguish books you can't understand because you're missing prerequisite knowledge from books you can't understand because you just aren't reading them carefully enough. The prevailing wisdom seems to be that you can't really expect to be able to follow Jaynes through if you pick it up as your first serious textbook on probability.

However, it's actually determined what your decision is - any Laplacian demon could deduce it from looking at your brain. It's all pretty clear, and quantum events are not enough to derail it (barring very very low measure stochastic events). So from the universe's perspective, you're not choosing anything, not shifting measure from anything to anything.

The logical structure of my decision still controls what world gets the measure. From Timeless Control:

Surely, if you can determine the Future just by looking at the Past, there's no need to look at th

... (read more)
0Stuart_Armstrong
The demon is not adding quantum measure, or selecting anything. Every Everett branch is getting its measure multiplied - nobody's choice determines where the measure goes. At least, from the outside perspective, for someone who knows what everyone else's choices are/will be (and whose own choices are not relevant), nobody's choice is determining where the measure goes. From the insider perspective, for someone who doesn't know their own decision - well, that depends on their decision theory, and how they treat measure. Do you also disagree with http://lesswrong.com/lw/g9n/false_vacuum_the_universe_playing_quantum_suicide/, btw? Because that's simply the same problem in reverse.

I don't think that would prevent you from falling asleep, but you would get more benefit from the nap if it is earlier, so your waking periods are more even. You should also be able to sleep less at night, typical biphasic schedules have a 6.5 hour core.

Are you claiming that a) this model is incoherent, or b) that this model does not entail what I'm claiming (that you should save for the future)?

The basic model you described, even as alternative physics, is underspecified, and depending on how I try to steelman it so it is coherent, it doesn't entail what you claim, and if I try to steelman it so it entails what you say, it isn't coherent.

The big question is what worlds get to accumulate measure and why those particular worlds. If the answer is that all worlds accumulate measure, then the accumulation ... (read more)

1Stuart_Armstrong
Consider this setup: you decide whether to buy ice cream now or chocolate later (chocolate ice cream unfortunately not being an option). Your mind will go through various considerations and analyses, and will arrive at a definite conclusion. However, it's actually determined what your decision is - any Laplacian demon could deduce it from looking at your brain. It's all pretty clear, and quantum events are not enough to derail it (barring very very low measure stochastic events). So from the universe's perspective, you're not choosing anything, not shifting measure from anything to anything. But you can't know your own decision before making it. So you have the impression of free will, and are using an appropriate decision theory. Most of these work "as if" your own decision determines which (logical) world will exist, and hence which world will get the increased measure. Or, if you prefer, you know that the world you decide on will get increased measure in the future, you are simply in ignorance of which one it will be. So you have to balance "ice cream before the increased measure" with "chocolate after the increased measure", even though you know one of these is impossible.

My model was of gradual proportional increase in utility

Yes, my example shows a proportional increase in measure between two times, and is indifferent to the gradual increase between these times. If you think the gradual increase is important, please provide an example that illustrates this.

not absolute addition to every branch.

I have already explained why adding the measure to a single branch is incoherent in both the cases where the decision causes or does not cause selection of the branch that receives the measure.

1Stuart_Armstrong
I don't quite understand the point. I'm claiming that, for instance, if a branch has measure M at time 0, it will have measure 2M at time 1, i.e. its measure at time 1 is twice that at time 0. If measure splits into N+N'=M, then the branch with N will go to 2N and that with N' will go to 2N'. Are you claiming that a) this model is incoherent, or b) that this model does not entail what I'm claiming (that you should save for the future)?

Yes, that is how your decision gives your measure M to world WA or to world WB, but that shouldn't affect accumulation of measure into later states of these worlds by quantum fluctuation, so both worlds still get measure 10M from that.

Unless you mean that quantum fluctuations into later states of the world are directed by the normal evolution of the earlier states, including your decision, in which case, this process would be adding measure (perhaps not quantum measure, but counting as decision theoretic measure in the same way) to the initial state of the... (read more)

0Stuart_Armstrong
My model was of gradual proportional increase in utility, not absolute addition to every branch.

I am not sure what you mean by "logically incompatible worlds", but if worlds WA and WB are the results of different available decisions of an agent embedded in a common precursor world, then they both follow the same laws of physics and just have their particles or whatever in different places, and in a quantum universe they just have different quantum states.

1Stuart_Armstrong
I may decide to go left or right at a crossroad. If I decide to go left (for good reasons, after thinking about it), then almost all of my measure will go left, apart from a tiny bit of measure that tunnels right for various reasons. So if I decide on A, WB will exist, but only with the tiniest of measures.

I was envisaging utilons being "consumed" at the time they were added (say people eating chocolate bars).

My example is entirely compatible with this.

So choosing A would add 4M utilons, and choosing B would add 33M utilons.

So the problem here is that you are not accounting for the fact that choosing A in the measure M world does not prevent the accumulation of measure 10M to world WB from quantum fluctuation. You get those 30M utilons whether you choose A or B, choosing A gets you an immediate 4M additional utilons, while choosing B gets you a deferred 3M utilons.
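Spelling out the arithmetic (a minimal sketch using only the numbers from this example, and assuming, as in the comparison above, that the fluctuation measure arriving in WA at time T is too late to collect the immediate effect):

```python
# Numbers from the example: starting measure M; choice A pays 4 utilons per
# measure immediately in world WA; choice B pays 3 utilons per measure at time
# T in world WB; by time T quantum fluctuations have added measure 10M to each
# world regardless of the choice; utilons are consumed when the effect occurs.
M = 1.0
FLUCTUATION = 10 * M

def total_utilons(choice):
    # WB's later effect collects whatever measure WB has at time T.
    wb_measure_at_T = FLUCTUATION + (M if choice == "B" else 0)
    from_wb = 3 * wb_measure_at_T
    # WA's effect is immediate, so only the measure the decision sends there
    # (before the fluctuations arrive) collects the 4 utilons per measure.
    from_wa = 4 * (M if choice == "A" else 0)
    return from_wa + from_wb

print(total_utilons("A"))  # 34.0 = the 30M you get either way + an immediate 4M
print(total_utilons("B"))  # 33.0 = the 30M you get either way + a deferred 3M
```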

1Stuart_Armstrong
A and B could be logically incompatible worlds, not simply different branches of the multiverse.

Suppose you are in a world with measure M and are choosing between A and B, where A results in world WA which includes an immediate effect worth 4 utilons per measure, and B results in world WB which includes a later effect at time T worth 3 utilons per measure. Suppose further that under your not-serious theory, at time T, random quantum fluctuations have added measure 10M to the worlds WA and WB. So your choice between A and B is a choice to either add measure M to world WA or world WB, so that choice A results in WA immediately having measure M worth ... (read more)

0Stuart_Armstrong
I was envisaging utilons being "consumed" at the time they were added (say people eating chocolate bars). So choosing A would add 4M utilons, and choosing B would add 33M utilons.

My point was that under your assumptions, the amount you affect does not increase in time at all, only the amount you do not affect increases.

0Stuart_Armstrong
? Er no, you can still make choices that increase or decrease utility. It's simply that the measure of the consequences of these choices keeps on increasing.

For any population of people of happiness h, you can add more people of happiness less than h, and still improve things.

I think that this property, at least the way you are interpreting it, does not fully represent the intuition that leads to the repugnant conclusion. A stronger version would be: For any population of people, you can add more people with positive happiness (keeping the happiness of the already existing people constant), and still improve things.

I don't think your unintuitive aggregation formula would be compatible with that.

0Stuart_Armstrong
I agree. That's why I didn't present my aggregation formula as a counterexample to the mere addition paradox, but merely being connected to it.

My critique of the physics was more of an aside. The main point was the critique of the decision theory, that under the assumptions of this non-serious theory of physics, most of the measure of the various outcomes is independent of your decisions, and you should only base your decisions on the small amount of measure you actually affect.

0Stuart_Armstrong
But whether that small amount is increasing in time or not is very relevant to your decision (depending on how your theory treats measure in the first place).

Occasionally, one of them will partially tunnel, by chance, into the same state our universe is in - and then will evolve forwards in time exactly as our universe is.

So, pretending that this sort of thing has any significance, you would also expect some worlds to tunnel, by chance, into neighboring states, as might result from making different decisions. So, the argument for always sacrificing in favor of future gains falls down: most of the measure for the world in which you get the future benefits of the sacrifice comes from quantum fluctuations, not the... (read more)

0Stuart_Armstrong
Er... this isn't a serious theory of physics I've put forwards!

There are sequences of .Net instructions that result in the runtime throwing type exceptions, because it tries to read a value of a certain type off the stack, and it gets an incompatible value. This is the situation that my verifier guards against.

The standard .Net runtime also includes a verifier that checks the same thing, and it will not run code that fails this validation unless it is explicitly trusted. So a verifiable .Net assembly will not throw type exceptions without an explicit cast or throw, but an arbitrary assembly may do so. The compilers for... (read more)
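To illustrate the kind of check involved, here is a toy sketch over a made-up instruction set (not CIL, and not the actual verifier, which also has to handle branches, locals, and so on): it simulates the types on the evaluation stack and flags any instruction that would read an incompatible value.

```python
# Toy sketch of stack-type verification over a made-up instruction set (not
# CIL): each instruction declares what types it pops and pushes, and we
# simulate the type stack through straight-line code.
INSTRUCTIONS = {
    "push_int":     ([], ["int"]),
    "push_string":  ([], ["string"]),
    "add_int":      (["int", "int"], ["int"]),
    "print_string": (["string"], []),
}

def verify(program):
    """Return True iff simulating types on the stack never hits a mismatch."""
    stack = []
    for op in program:
        pops, pushes = INSTRUCTIONS[op]
        for expected in pops:
            if not stack or stack.pop() != expected:
                return False  # would read a value of an incompatible type off the stack
        stack.extend(pushes)
    return True

print(verify(["push_int", "push_int", "add_int"]))     # True
print(verify(["push_string", "push_int", "add_int"]))  # False: type mismatch on the stack
```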

0V_V
Thanks for the clarification. IIUC, unverifiable code does not, or at least is not guaranteed to, politely throw an exception should a type error occur. It may crash the runtime or fail silently leaving the application in an incorrect state. Ok. I thought that you were considering assemblies that passed the standard .NET verification and you were trying to check for some stronger property (such as absence of runtime exceptions caused by downcasts). That would have been equivalent to arbitrary first-order logic inference. Since you are instead checking for decidable properties, your system is indeed not equivalent to arbitrary first-order logic inference. But as jsteinhardt says, it is actually possible to write verifiers that attempt to check for undecidable properties, provided that they have the option to give up.

I am not trying to write a classifier that tells you whether or not an arbitrary program throws a type exception. I wrote a verifier that tells you whether or not an arbitrary program can be proven not to throw type exceptions (except possibly at an explicit cast statement, or a throw exception statement) with a particular proof strategy that covers a huge space of useful, nicely structured programs.

See also jsteinhardt's comment I was responding to, which discussed getting around the halting problem by allowing the checker to say "I don't know".

0V_V
I'm not an expert on .NET, but is there anything that can throw a type exception other than an explicit cast or an explicit throw (or the standard library, I suppose)?

Ah, so by "Godel's theorem presents a strictly stronger obstacle than Lob's theorem" you mean if you overcome Godelian obstacles you also overcome Lobian obstacles? I think I agree, but I am not sure that it is relevant, because the program analyzer examples don't overcome Godelian obstacles, they just cope with the Godelian obstacles, which does not similarly imply coping with or overcoming Lobian obstacles.

Godel's theorem presents a strictly stronger obstacle than Lob's theorem.

Why do you say that? My understanding is that Godel's theorem says that a (sufficiently powerful) logical system has true statements it can't prove, but these statements are excessively complicated and probably not important. Is there some way you envision an AGI being limited in its capacity to achieve its goals by Godel's theorem, as we envision Lob's theorem blocking an AGI from trusting its future self to make effective decisions? (Besides where the goals are tailored to be blo... (read more)

0jsteinhardt
Do you agree or disagree that complete implies reflectively consistent? If you agree, then do you agree or disagree that this means avoidance of Godelian obstacles implies avoidance of Lobian obstacles? If you agree with both of those statements, I'm confused as to why "Godel's theorem presents a strictly stronger obstacle than Lob's theorem" is a controversial statement.

This is a good approach for dealing with the halting problem, but I think that Lob's theorem is not so closely related that getting around the halting problem means you get around Lob's theorem.

The theoretical AI that would run into Lob's theorem would need more general proof producing capability than these relatively simple program analyzers.

It seems like these program analyzers are built around the specification S they check for, with the human programmer doing the work of constructing a structure of a proof which can be filled in to a complete proof by... (read more)

0V_V
I'm not sure I've understood what you have in mind here, but in the general case complete type checking in .NET (that is, proving that an assembly not only is syntactically well-formed but also never throws type-related exceptions at runtime) is undecidable because of Rice's theorem. In the general case, complete type checking is as difficult as proving arbitrary claims in first-order logic.
0jsteinhardt
My mathematical logic is a bit rusty, but my impression is that the following are true:

1. Godel's theorem presents a strictly stronger obstacle than Lob's theorem. A reflectively consistent theory may still be incomplete, but any complete theory is necessarily reflectively consistent.

2. The undecidability of the halting problem is basically Godel's theorem stated in computational terms. If we could identify a subset L of Turing machines for whom the halting problem can be decided, as long as it was closed under operations such as inserting a (non-self-referential) sub-routine, then we would be able to verify any (non-self-referential) property of the program that was also expressible in L. This is a sketch of a claim rather than an actual claim that I've proved, though.

Finally, I think it's worth pointing out an actual example of a program analysis tool, since I think they are more powerful than you have in mind. The following slides are a good example of such a tool. At a high level, it gets around the problems you are worried about by constructing an over-approximation of the halting problem that is expressible in propositional logic (and thus decidable, in fact it is even in NP). More generally, we can construct a sequence of approximations, each expressible in propositional logic, whose conjunction is no longer an approximation but in fact exactly the original statement.