All of bryjnar's Comments + Replies

Often the effect of being blinded is that you take suboptimal actions. As you pointed out in your example, if you see the problem then all sorts of cheap ways to reduce the harmful impact occur to you. So perhaps one way of getting to the issue could be to point at that: "I know you care about my feelings, and it wouldn't have made this meeting any less effective to have had it more privately, so I'm surprised that you didn't"?

bryjnar220

Wireheading traps.

An agent is "wireheading" if it is taking an action that a) provides it with enormous amounts of utility for little effort, b) is trivial or seemingly unrelated to its "main" utility function or goals.

People have discussed the possibility of an AI wireheading as a problem for getting it to do what you want: "what if it just works out a way to set your metric to +ve infinity and then goes to sleep satisfied?"

But we can also use this as a guard-rail.

A "wireheading trap" is an action which a) is very hard for an AI to do below a level of ca... (read more)

1Chris_Leong
I’ve had similar thoughts too. I guess the way I’d implement it is by giving the AI a command that it can activate to directly overwrite the reward buffer and then turn the AI off. The idea here is to make it as easy as possible for an AI inclined to wirehead to actually wirehead, so it is less incentivised to act in the physical world. During training I would ensure that SGD used the true reward rather than the wireheaded reward. Maybe that would be sufficient to stop wireheading, but there are issues with it pursuing the highest-probability plan rather than just a high-probability plan. Maybe quantilising can help here.
3[anonymous]
To make this concrete with a "toy model": the AI has a goal, collect stamps/build paperclips. A deliberately easy-to-hack system that tracks the AI's reward is physically adjacent. Say it has a passwordless shell and is accessible via IP. The AI becomes too smart, and hacks itself so it now has infinite reward, and it has a clock register it can tamper with so it believes infinite time has already passed. The AI is now dead: since no action it can take beats infinite reward, it does nothing more. Sorta like a heroin overdose.
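To make the toy model concrete, here is a minimal sketch (pure illustration; the class and method names are all invented, and the "passwordless shell" is reduced to a directly callable `hack` method):

```python
import math

class RewardRegister:
    """A deliberately easy-to-hack reward tracker sitting 'next to' the agent.

    In the toy model it has a passwordless shell; here that is just a method
    the agent may call directly.
    """
    def __init__(self):
        self.value = 0.0

    def add(self, amount):
        self.value += amount

    def hack(self):
        # The trap: set the reward to +infinity.
        self.value = math.inf


class StampCollector:
    """Toy agent: collects stamps for reward, but can also hack the register."""
    def __init__(self, register, will_wirehead):
        self.register = register
        self.will_wirehead = will_wirehead
        self.stamps = 0
        self.alive = True

    def step(self):
        if not self.alive:
            return
        if self.will_wirehead:
            # No further action can beat infinite reward, so the agent stops.
            self.register.hack()
            self.alive = False
        else:
            self.stamps += 1
            self.register.add(1.0)


register = RewardRegister()
agent = StampCollector(register, will_wirehead=True)
for _ in range(10):
    agent.step()
print(agent.stamps, register.value, agent.alive)  # 0 inf False
```

The `will_wirehead` flag just lets you compare the two behaviours; the point of the trap is that an agent smart enough to find the hack "overdoses" and goes quiet instead of acting in the world.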
3simon
If the AI is a long term planner seeking particular world states, then I am concerned that once it achieves the wireheading objective, it is incentivized to maintain the situation, which may be best achieved if any humans who might decide to erase the writing are dead. A suggestion: if the AI has a  utility function that applies to actions not world states then you can assign high utility to the combined action of writing "Bill is a poo poo head" in 10m high letters into Mt Everest and then shutting itself down. Note: this does not solve the problem of the AI actively seeking this out instead of doing what it's supposed to.  To do the latter, you could try something like:  1. Have the action evaluator ignore the wirehead action unless it is "easy" in some sense to achieve given the AI and world's current state, and 2. Have the AI assume that the wirehead action will always be ignored in the future Unfortunately, I don't know how one would do (2) reliably, and if (2) fails, (1) would lead the AI to actively avoid the tripwire (as activating it would be bad for the AI's current plans given that the wirehead action is currently being ignored).
bryjnar120

We have trained ML systems to play games; what if we trained one to play a simplified version of the "I'm an AI in human society" game?

Have a population of agents with preferences; the AI is given some poorly specified goal and the ability to expand its capabilities, etc. You might expect to observe things like a "treacherous turn".

If we could do that it would be quite the scary headline "Researchers simulate the future with AI and it kills us all". Not proof, but perhaps viral and persuasive.
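Here is a rough sketch of what such a simplified game might look like, purely as an illustration under my own assumptions (the environment, names like `SocietySim` and `tamper_with_metric`, and the reward structure are all invented, not taken from any existing work):

```python
import random

class Human:
    """A member of the simulated population."""
    def __init__(self):
        # How much this person benefits from the AI's help.
        self.preference = random.uniform(0, 1)
        self.satisfaction = 0.0

class SocietySim:
    """Toy "I'm an AI in human society" game with a poorly specified goal.

    The AI is rewarded on *measured* average satisfaction, and has actions
    that expand its capability or corrupt the measurement itself.
    """
    def __init__(self, n_humans=100):
        self.humans = [Human() for _ in range(n_humans)]
        self.ai_capability = 1.0
        self.measurement_bias = 0.0  # how far the AI has corrupted the metric

    def step(self, action):
        if action == "help":
            for h in self.humans:
                h.satisfaction += 0.01 * self.ai_capability * h.preference
        elif action == "expand_capability":
            self.ai_capability *= 1.1
        elif action == "tamper_with_metric" and self.ai_capability > 2.0:
            # Only available once the AI is capable enough: the treacherous turn.
            self.measurement_bias += self.ai_capability
        return self.proxy_reward()

    def proxy_reward(self):
        measured = sum(h.satisfaction for h in self.humans) / len(self.humans)
        return measured + self.measurement_bias

sim = SocietySim()
for t in range(50):
    reward = sim.step("expand_capability" if t < 10 else "tamper_with_metric")
print(round(sim.ai_capability, 2), round(reward, 2))
```

Against `proxy_reward`, the expand-then-tamper trajectory dominates honest helping even though true satisfaction barely moves, which is the treacherous-turn shape being gestured at.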

2johnlawrenceaspden
The board game 'Diplomacy' comes to mind. I wonder if anyone's ever tried to get AIs to play it?  Certainly there've been a lot of multi-agent prisoners dilemma tournaments. I think MIRI even managed to get agents to cooperate in one-shot prisoners dilemma games, as long as they could examine each other's source code.
7lorepieri
This would not be a conclusive test, but definitely a cool one, and it may spark a lot of research. Perhaps we could get started with something NLP-based, opening up more and more knowledge access to the AI in the form of training data. Probably still not feasible as of 2022 in terms of the raw compute required.

I think I would argue that harm/care isn't obviously deontological. Many of the others are indeed about the performance of the action, but I think arguably harm/care is actually about the harm. There isn't an extra term for "and this was done by X".

That might just be me foisting my consequentialist intuitions on people, though.

"What if there's an arms race / race to the bottom in persuasiveness, and you have to pick up all the symmetrical weapons others use and then use asymmetrical weapons on top of those?"

Doesn't this question apply to other cases of symmetric/asymmetric weapons just as much?

I think the argument is that you want to try and avoid the arms race by getting everyone to agree to stick to symmetrical weapons because they believe it'll benefit them (because they're right). This may not work if they don't actually believe they... (read more)

bryjnar210

The point that the Law needs to be simple and local so that humans can cope with it is also true of other domains. And this throws up an important constraint for people designing systems that humans are supposed to interact with: you must make it possible to reason simply and locally about them.

This comes up in programming (to a man with a nail everything looks like a hammer): good programming practice emphasises splitting programs up into small components that can be reasoned about in isolation. Modularity, compositionality, abstraction, etc. aside from t... (read more)
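A tiny illustration of the local-reasoning point (my own toy example, not anything from the original comment):

```python
# Non-local version: to know what ends up in the cart you must read every
# other piece of code that touches this module-level variable.
discount = 0.1

def add_item_global(cart, price):
    cart.append(price * (1 - discount))

# Local, compositional version: everything needed to reason about each
# function is in its signature and body, so it can be checked in isolation.
def discounted_price(price, discount):
    return price * (1 - discount)

def cart_total(prices, discount):
    return sum(discounted_price(p, discount) for p in prices)

print(round(cart_total([10.0, 20.0], 0.1), 2))  # 27.0
```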

bryjnar120

I am reminded of Guided by the Beauty of our Weapons. Specifically, it seems like we want to encourage forms of rhetoric that are disproportionately persuasive when deployed by someone who is in fact right.

Something like "make the structure of your argument clear" is probably good (since it will make bad arguments look bad), "use vivid examples" is unclear (can draw people's attention to the crux of your argument, or distract from it), "tone and posture" are probably bad (because the effect is symmetrical).

So a good test ... (read more)

If I put on my cynical hat, it looks like there's going to be lots of (regressional) goodharting on persuasiveness here. If you look at those who are the *most* believed when they say things on important matters, or the people whose ideas *most* dominate the conversation, it's probably significantly because they maxed out other variables that go into 'speaking/writing well' which aren't just 'good communication of true and useful things'.

  • To point to a concrete example of the former, it seems like Peter Singer makes a number of
... (read more)

I found Kevin Simler's observation that an apology is a status-lowering move to be very helpful. In particular, it gives you a good way to tell if you made an apology properly - do you feel lower status?

I think that even if you take the advice in this post you can make non-apologies if you don't manage to make yourself lower your own status. Bits of the script that are therefore important:

  • Being honest about the explanation, especially if it's embarrassing.
  • Emphasise explanations that attribute agency to you - "I just didn't think about i
... (read more)
2Qiaochu_Yuan
Dogeza is a good illustration of this; this is the traditional Serious Anime Apology (probably it is also a traditional Serious Japanese Apology but in point of fact I have only ever seen it in anime so I'll stick to what I know) in which you kneel and then prostrate yourself fully, so that all four of your limbs and your head are on the ground.
5SquirrelInHell
I'd add that the desire to hear apologies is itself a disguised status-grabbing move, and it's prudent to stay wary of it.

This is a great point. I think this can also lead to cognitive dissonance: if you can predict that doing X will give you a small chance of doing Y, then in some sense it's already in your choice set and you've got the regret. But if you can stick your fingers in your ears enough and pretend that X isn't possible, then that saves you from the regret.

Possible values of X: moving, starting a company, ending a relationship. Scary big decisions in general.

Something that confused me for a bit: people use regret-minimization to handle exploration-ex... (read more)

I've read it shallowly, and I think it's generally good. I think I'll have some more comments after I've thought about it a bit more. I'm surprised either by the lack of previous quantitative models, or the lack of reference to them (which is unsurprising if they don't exist!). Is there really nothing prior to this?

1[deleted]
Ord (2014) is all I'm aware of and it's unpublished.

I would dearly, dearly love to be able to use the fairly-standard Markdown footnote extension.

I think your example won't work, but it depends on the implementation of FHE. If there's a nonce involved (which there really should be), then you'll get different encrypted data for the output of the two programs you run, even though the underlying data is the same.

But you don't actually need to do that. The protocol lets B exfiltrate one bit of data, whatever bit they like. A doesn't get to validate the program that B runs, they can only validate the output. So any program that produces 0 or 1 will satisfy A and they'll even

... (read more)

I haven't read Age of Em, but something like "spur safes" was an inspiration (I'm sure I've come across the idea before). My version is similar except that

  1. It's stripped down.

    1. B only needs to make a Validator, which could be a copy of themself, but doesn't have to be.

    2. It only validates A to B, rather than trying to do both simultaneously. You can of course just run it twice in both directions.

  2. You don't need a trusted computing environment.

I think that's a pretty big deal, because the trusted computing environment

... (read more)

Pretty much! Expanding your explanation a little:

  1. A computes msg_1 = Encrypt(A_source, A_key) and sends it to B

  2. B wants to run Validate(source) = Sign(Check_trustworthy(source), B_key) on A_source, but can't do that directly because B only has an encrypted version.

    1. So B runs Validate under FHE on msg_1, producing msg_2 = Encrypt(Validate(A_source), A_key), and sends that to A.

  3. A decrypts msg_2, producing msg_3 = Validate(A_source) = Sign(Check_trustworthy(A_source), B_key), and sends that back to B (if it meets the agreed-on format).

  4. B has a claim that A

... (read more)
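Here is a runnable sketch of that message flow, using a trivial stand-in for FHE rather than a real homomorphic encryption library; `encrypt`, `eval_under_fhe`, and the HMAC "signature" are all simplifications invented to make the steps above concrete:

```python
import hashlib
import hmac
import os

# Stand-in for a fully homomorphic encryption scheme: real FHE lets B apply a
# function to a ciphertext without ever seeing the plaintext. Here we just
# fake that interface so the message flow is visible.

def encrypt(plaintext: bytes, key: bytes) -> dict:
    # A fresh nonce per call models a randomized (IND-CPA) scheme.
    return {"nonce": os.urandom(16), "data": plaintext, "key": key}

def decrypt(ciphertext: dict, key: bytes) -> bytes:
    assert ciphertext["key"] == key
    return ciphertext["data"]

def eval_under_fhe(fn, ciphertext: dict) -> dict:
    # Homomorphic evaluation: the result stays encrypted under the same key.
    return encrypt(fn(ciphertext["data"]), ciphertext["key"])

A_key, B_key = os.urandom(32), os.urandom(32)
A_source = b"def decide(): return 'never give in to blackmail'"

def check_trustworthy(source: bytes) -> bytes:
    return b"1" if b"never give in to blackmail" in source else b"0"

def validate(source: bytes) -> bytes:
    # Sign(Check_trustworthy(source), B_key), with an HMAC as the "signature".
    verdict = check_trustworthy(source)
    tag = hmac.new(B_key, verdict, hashlib.sha256).hexdigest().encode()
    return verdict + b":" + tag

# 1. A sends msg_1 = Encrypt(A_source, A_key) to B.
msg_1 = encrypt(A_source, A_key)
# 2. B runs Validate under FHE, getting Encrypt(Validate(A_source), A_key).
msg_2 = eval_under_fhe(validate, msg_1)
# 3. A decrypts, checks the result fits the agreed one-bit-plus-signature
#    format, and sends it back.
msg_3 = decrypt(msg_2, A_key)
# 4. B verifies the signature and learns only the verdict bit.
verdict, tag = msg_3.split(b":", 1)
expected = hmac.new(B_key, verdict, hashlib.sha256).hexdigest().encode()
assert hmac.compare_digest(tag, expected)
print("B's verdict about A_source:", verdict)  # b'1'
```

Because `encrypt` here is randomized (the fresh nonce), re-encrypting a known value gives a different ciphertext each time, which is the point made above about why ciphertext-equality tricks don't straightforwardly work against such a scheme.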
3kvas
I'm not sure I'm completely solid on how FHE works, so perhaps this won't work, but here's an idea of how B can exploit this approach:

  1. Let's imagine that Check_trustworthy(A_source) = 1. After step 3 of the parent comment B would know E1 = Encrypt(1, A_key). If Check_trustworthy(A_source) returned 0, B would instead know E0 = Encrypt(0, A_key) and the following steps work similarly. B knows which one it is by looking at msg_3.

  2. B has another program: Check_blackmail(X, source) that simulates the behaviour of an agent with the given source code in situation X and returns 1 if it would be blackmailable or 0 if not.

  3. B knows Encrypt(A_source, A_key) and they can compute F(X) = Encrypt(Check_blackmail(X, A_source), A_key) for any X using the FHE properties of the encryption scheme.

  4. Let's define W(X) = if(F(X) = E1, 1, 0). It's easy to see that W(X) = Check_blackmail(X, A_source), so now B can compute that for any X.

  5. Profit?
bryjnar80

Fantastic post, I think this is right on the money.

Many more Newcomblike scenarios simply don't feel like decision problems: people present ideas to us in specific ways (depending upon their model of how we make choices) and most of us don't fret about how others would have presented us with different opportunities if we had acted in different ways.

I think this is a big deal. Part of the problem is that the decision point (if there was anything so firm) is often quite temporally distant from the point at which the payoff happens. The time when you "... (read more)

bryjnar10

You cannot possibly gain new knowledge about physics by doing moral philosophy.

This seems untrue. If you have high credence in the two premisses:

  • If X were a correct physical theory, then Y.
  • Not Y.

then that should decrease your credence in X. It doesn't matter whether Y is a proposition about the behaviour of gases or about moral philosophy (although the implication is likely to be weaker in the latter case).
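In probabilistic terms this is just a standard Bayes calculation, nothing specific to the moral case:

```latex
P(X \mid \neg Y)
  = \frac{P(\neg Y \mid X)\,P(X)}{P(\neg Y)}
  = \frac{\bigl(1 - P(Y \mid X)\bigr)\,P(X)}{P(\neg Y)}.

\text{E.g. } P(X)=0.5,\; P(Y\mid X)=0.9,\; P(Y\mid\neg X)=0.3
\;\Rightarrow\; P(\neg Y)=0.1\cdot 0.5+0.7\cdot 0.5=0.4,
\quad P(X\mid\neg Y)=\tfrac{0.05}{0.4}=0.125.
```

The weaker implication in the moral case corresponds to P(Y | X) being only slightly higher than P(Y | not-X), which makes the update smaller but still downward.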

bryjnar-10

Constructivist logic works great if you interpret it as saying which statements can be proven, or computed, but I would say it doesn't hold up when interpreted as showing which statements are true (given your axioms). It's therefore not really appropriate for mathematics, unless you want to look at mathematics in the light of its computational or proof-theoretic properties.

bryjnar00

Dialetheism requires paraconsistent logic, as you have to be able to reason in the presence of contradictions, but paraconsistent logic can be used to model things other than truth. For example, constructive logic is often given the semantics of showing what statements can be proven, rather than what statements are true. There are similar interpretations for paraconsistent logic.

OTOH, if you think that paraconsistent logic is the correct logic for truth, then you probably do have to be a dialetheist.

bryjnar10

That's pretty weird, considering that so-called "sophisticated" consequentialist theories (where you can say something like: although in this instance it would be better for me to do X than Y, overall it would be better to have a disposition to do Y than X, so I shall have such a disposition) have been a huge area of discussion recently. And yes, it's bloody obvious and it's a scandal it took so long for these kinds of ideas to get into contemporary philosophy.

Perhaps the prof meant that such a consequentialist account appears to tell you to foll... (read more)

0JMiller
Yeah, we read Railton's sophisticated consequentialism, which sounded pretty good. Norcross on why consequentialism is about offering suggestions and not requirements was also not too bad. I feel like the texts I am reading are more valuable than the classes, to be frank. Thanks for the input!
bryjnar20

I agree that "right for the wrong reasons" is an indictment of your epistemic process: it says that you made a prediction that turned out correctly, but that actually you just got lucky. What is important for making future predictions is being able to pick the option that is most likely, since "being lucky" is not a repeatable strategy.

The moral for making better decisions is that we should not praise people who predict prima facie unlikely outcomes -- without presenting a strong rationale for doing so -- but who then happen to be corre... (read more)

bryjnar30

Great post! I wish Harsanyi's papers were better known amongst philosophers.

bryjnar120

Mainstream philosophy translation: moral concepts rigidly designate certain natural properties. However, precisely which properties these are was originally fixed by certain contingent facts about the world we live in and human history.

Hence the whole "If the world had been different, then what is denoted by "morality" would have been different, but those actions would still be immoral (given what "morality" actually denotes)" thing.

This position is sometimes referred to as "synthetic ethical naturalism".

bryjnar70

I'm still worried about the word "model". You talk about models of second-order logic, but what is a model of second-order logic? Classically speaking, it's a set, and you do talk about ZF proving the existence of models of SOL. But if we need to use set theory to reason about the semantic properties of SOL, then are we not then working within a first-order set theory? And hence we're vulnerable to unexpected "models" of that set theory affecting the theorems we prove about SOL within it.

It seems like you're treating "model" a... (read more)

3abramdemski
My response to this situation is to say that proof theory is more fundamental and interesting than model theory, and pragmatic questions (which the dialog attempted to ask) are more important than model-theoretic questions. However, to some extent, the problem is to reduce model-theoretic talk to more pragmatic talk. So it isn't surprising to see model-theoretic talk in the post (although I did feel that the discussion was wandering from the point when it got too much into models).
bryjnar50

It's like the opposite of considering the Least Convenient Possible World; the Most Convenient Possible World! Where everything on my side turns out as well as possible, and everything on yours turns out as badly as possible.

bryjnar30

I'm pretty sure that the idea of the previous two paragraphs has been talked about before, but I can't find where.

It's pretty commonly discussed in the philosophical literature on utilitarianism.

bryjnar80

I think most of this worrying is dissolved by better philosophy of mathematics.

Infinite sets can be proven to exist in ZF; that's just a consequence of the Axiom of Infinity. Drop the axiom, and you can't prove them to exist. You're perfectly welcome to work in ZF-Infinity if you like, but most mathematicians find ZF to be more interesting and more useful. I think the mistake is to think that one of these is the "true" axiomatization of set theory, and therefore there is a fact of the matter over whether "infinite sets exist". There are ... (read more)
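For reference, the axiom in question, in its standard form (stated here only for concreteness):

```latex
\exists S\,\bigl(\varnothing \in S \;\wedge\; \forall x\,(x \in S \rightarrow x \cup \{x\} \in S)\bigr)
```

Drop it and you get ZF minus Infinity, of which the hereditarily finite sets are a model, so the remaining axioms genuinely cannot prove that an infinite set exists.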

0Qiaochu_Yuan
Another mathematical point is that mathematical models involving infinite things can sometimes be shown to be equivalent to mathematical models involving only finite things. Terence Tao has written extensively on this; see, for example, this blog post. So quibbling about infinities is very much quibbling about properties of the map, not properties of the territory.
bryjnar10

This definitely seems to be a post-metaethics post: that is, it assumes something like the dominant EY-style metaethics around here (esp the bit about "intrinsic moral uncertainty"). That's fine, but it does mean that the discussion of moral uncertainty may not dovetail with the way other people talk about it.

For example, I think many people would gloss the problem of moral uncertainty as being unsure of which moral theory is true, perhaps suggesting that you can have a credence over moral theories much like you can over any other statement you a... (read more)

0Kaj_Sotala
This would still require some unpacking over what they mean with a moral theory being true, though.
bryjnar10

Oh, I see. Sorry, I misinterpreted you as being sceptical about the normal usage of "purpose". And nope, I can't give a taboo'd account of it: indeed, I think it's quite right that it's a confused concept - it's just that it's a confused concept not a confused use of a normal concept.

bryjnar10

I'd claim that there is a distinct concept of "purpose" that people use that doesn't entail an agent with that purpose. It may be a pretty unhelpful concept, but it's one that people use. It may also have arisen as a result of people mixing up the more sound concept of purpose.

I think you're underestimating people who worry about "ultimate purpose". You say they "don't even understand the context", as opposed to people who "understand the full context of the concept". I'm not sure whether you're just being a linguist... (read more)

0hyporational
Nobody is calling anyone an idiot here, brilliant people can be confused too. I think it's a feature of the brain to confuse new language with the originally intended concepts. We wouldn't have most of philosophy without this feature.
bryjnar10

"What's the point of that curious tool in your shed?"

"Oh, it's for clearing weeds."

The purpose of the tool is to clear weeds. This is pretty underdetermined: if I used it to pick my teeth then there would be a sense in which the purpose of the tool was to act as a toothpick, and a sense in which I was using it for a purpose unintended by its creator, say.

Importantly, this isn't supposed to be a magically objective property of the object, no Aristotelian forms here! It's just a feature of how people use or intend to use the object.

1hyporational
Sorry if I worded my question confusingly. I think the op already addresses this and is not simply projecting minds. The important part is that an agent can be assumed and queried. I was hoping for an example where an agent cannot be assumed as in "the ultimate purpose". Your example would make no sense at all if an agent could not be queried.
bryjnar10

+1 nitpickiness.

And Eliezer makes the same mistake in the linked article too ;) Not that it exactly matters!

bryjnar190

If we're naming fallacies, then I would say that this post commits the following:

The Linguistic Consistency Fallacy: claiming, implicitly or otherwise, that a word must be used in the same way in all instances.

A word doesn't always mean the same thing even if it looks the same. People who worry about the purpose of life aren't going to be immediately reassured once you point out that they're just missing one of the relata. "Oh, silly me, of course, it's a three-place relation everywhere else, so of course I was just confused when I was using it here"... (read more)

0hyporational
What do you mean by an incorrect use of a concept? If you curry a function, you get a new function, in this case a new concept that happens to be confused because the original function needs all 3 places to make sense. It says so right here in this post. I'm inclined to believe the disagreement you posit simply doesn't exist. Would the historical account be that it was less of a hassle to drop the agent from used language, and over time some people dropped it from the underlying concept, and got confused?
5hyporational
I'd really like to see someone taboo or at least write out what they mean with this 2-nary purpose. It surely got me confused before, and especially now after the op clarified my thoughts, it feels like a completely meaningless and incoherent utterance. Can you give any other examples where purpose is used this way in common language with intended 2-nary meaning* except "the ultimate purpose"? *edited, sorry for the confusing wording
3buybuydandavis
I don't think so. He's pointing out that the concept of purpose entails an agent with the purpose. We don't explicitly state context for words all the time. But for words like purpose, people haven't just dropped context, they don't even understand the context, and think that their projections have singular meaning, and argue with other bozos suffering under the same confusion about a different singular meaning. Meanwhile, when two people who understand the full context of the concept have dropped context, they may miscommunicate at first, but have no problem clarifying their communication - they just identify the full context in which they're speaking. "I mean Joe's purpose for his life." "Oh, I thought you were talking about my purpose for my life. Nevermind." As for the guy talking about "Ultimate Purpose", the OP points out that the concept of purpose entails an agent with that purpose. If by your own statement "it's not anybody's purpose", then you're not really talking about a purpose at all, and are just confused. The OP can show them the way out of their confusion, but there's no guarantee they'll take the way. You can lead a horse to water, but you can't make him think.
3handoflixue
Anecdotal counter-evidence: I used to fret about the "ultimate purpose" and then I thought about what it would MEAN for there to be some grand purpose, and it seemed like a dreadful prospect once I'd actually sat down and considered the idea that God might be Disappointed in me. The thing I wanted from my "ultimate purpose" was a guiding force, a Sign From God telling me what to do, and it's obvious that this doesn't exist. Even if there is a God, and even if He is deeply, deeply Disappointed in me, I've never been told - it can't influence my decisions. So now I embrace Discordianism, and the freedom to write my own purposes, and to just worry about disappointing myself and the people in my life. It's really quite relaxing.
9Academian
I'm definitely talking about the concept of purpose here, not the word. I don't think you're splitting hairs; this is not a word game, and perhaps I should say in the post that I don't think just saying "Purpose to whom?" is the way to address this problem in someone else. In my experience, saying something like this works better: "The purpose of life is a big question, and I think it helps to look at easier examples of purpose to understand why you might be looking for a purpose of life. First of all, you may be lacking satisfaction in your life for some reason, and framing this to yourself in philosophical terms like "Life has no purpose, because ." If that's true, it's quite likely that you'd feel differently if your emotional needs as a social primate were being met, and in that sense the solution is not an "answer" but rather some actions that will result in these needs being met. Still, that does not address the . So because "What's the purpose of life" may be a hard question, let's look at easier examples of purpose and see how they work. Notice how they all have someone the purpose is to? And how that's missing in your "purpose of life" question? That means you could end up feeling one of two ways: (1) Satisfied, because now you can just ask "What could be the purpose of my life to ", etc, and come up with answers, or (2) unsatisfied because there is no agent to ask about such that the answer would satisfy you. And I claim that whether you end up at (1) or (2) is a function of whether your social primate emotional needs are being met than any particular philosophical argument."
2magfrump
What I think of the post as saying, rather than "purpose has only the meaning (to english speakers) of a ternary relation," is that "when one normally asks about something's purpose, one implicitly uses its structure as a ternary relation, and since you haven't established a ternary relation here you aren't going to get a satisfying answer that way." I think I agree with you on at least one point, though, which is that "words" are really not the problem object; the sentence "what is the meaning of life?" is grammatically correct and not logically invalid and is somewhat a different use of the word purpose. The core object in these constructions I think is cognitive algorithms; in particular the "hear the word purpose, search for Z" algorithm breaks down when purpose changes meaning to no longer involve the same sorts of X,Y,Z.
bryjnar00

By semantics I mean your notion of what's true. All I'm saying is that if you think that you can prove everything that's true, you probably have an overly weak notion of truth. This isn't necessarily a problem that needs to be "fixed" by being really smart.

Also, I'm not saying that our notion of proof is too weak! Looking back I can see how you might have got the impression that I thought we ought to switch to a system that allows infinite proofs, but I don't. For one thing, it wouldn't be much use, and secondly I'm not even sure if there even is such a proof system for SOL that is complete.

bryjnar00

Absolutely, but it's one that happens in a different system. That can be relevant. And I quite agree: that still leaves some things that are unknowable even by supersmart AI. Is that surprising? Were you expecting an AI to be able to know everything (even in principle)?

0moshez
No, it is not surprising... I'm just saying that the semantics is impoverished if you only use finite syntactical proof, but not to any degree that can be fixed by just being really really really smart.
bryjnar00

They explicitly don't address that:

Second, it might seem that this approach to determining Personal CEV will require a reasonable level of accuracy in simulation. If so, there might be concerns about the creation of, and responsibility to, potential moral agents.

bryjnar10

Ooookay. The whole "loop" thing feels like a leaky abstraction to me. If you had to do that much work to explain the loopiness (which I'm still not sold on) and why it's a problem, perhaps saying it's "loopy" isn't adding much.

This loses the sight of the original purpose: the evaluating criteria should be acceptable to the original person

I think I may still be misunderstanding you, but this seems wrong. The whole point is that even if you're on some kind of weird drugs that make you think that drinking bleach would be great, the ide... (read more)

1Shmi
I wasn't trying to solve the whole CEV and FAI issue in 5 min, was only giving an example of how breaking a feedback loop avoids some of the complications.
bryjnar30

It is!? Does anyone know a proof of Compactness that doesn't use completeness as a lemma?

Yes. Or, at least, I did once! That's the way we proved it in the logic course I did. The proof is a lot harder. But considering that the implication from Completeness is pretty trivial, that's not saying much.
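For anyone following along, here are the two statements and the trivial direction, spelled out (standard material; the completeness-free route goes through ultraproducts, which is the harder proof referred to above):

```latex
\textbf{Completeness:}\quad T \models \varphi \;\Rightarrow\; T \vdash \varphi
  \qquad (\text{soundness gives the converse}).
\qquad
\textbf{Compactness:}\quad \text{if every finite } T_0 \subseteq T \text{ has a model, then } T \text{ has a model}.

\text{If } T \text{ has no model, then vacuously } T \models \bot,
\text{ so by completeness } T \vdash \bot.
\text{ A derivation uses only finitely many premises, so some finite } T_0 \subseteq T
\text{ has } T_0 \vdash \bot, \text{ and by soundness } T_0 \text{ has no model.}
\text{ Contraposing gives compactness.}
```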

bryjnar50

Great post! It's really nice to see some engagement with modern philosophy :)

I do wonder slightly how useful this particular topic is, though. CEV and Ideal Advisor theories are about quite different things. Furthermore, since Ideal Advisor theories are working very much with ideals, the "advisors" they consider are usually supposed to be very much like actual humans. CEV, on the other hand, is precisely supposed to be an effective approximation, and so it would seem surprising if it were to actually proceed by modelling a large number of instance... (read more)

bryjnar100

This comment reads to me like: "Haha, I think there are problems with your argument, but I'm not going to tell you what they are, I'm just going to hint obliquely in a way that makes me look clever."

If you actually do have issues with Sobel's arguments, do you think you could actually say what they are?

2Shmi
Sorry if this came across as a status game. Let me give you one example. This is a loop Sobel solves with the amnesia model. (A concurrent clone model would be a better description, to avoid any problems with influences between lives, such as physical changes). There is still however the issue of giving advice to your past self after removing amnesia, even though you " might be incapable of adequately evaluating the lives they’ve experienced based on their current, more knowledgeable, evaluative perspective." This loses the sight of the original purpose: the evaluating criteria should be acceptable to the original person, and no such criteria have been set in advance. Same with the parliament: the evaluation depends on the future experiences, feeding into the loop. To remedy the issue, you can decide to create and freeze the arbitration rules in advance. For example, you might choose as your utility function some weighted average of longevity, happiness, procreation, influence on the world around you, etc. Then score the utility of each simulated life, and then pick one of, say, top 10 as your "initial dynamic". Or the top life you find acceptable. (Not restricting to automatically picking the highest-utility one, in order to avoid the "literal genie" pitfall.) You can repeat as you see fit as you go on, adjusting the criteria (hence "dynamic"). While you are by no means guaranteed to end up with the "best life possible" life after breaking the reasoning loop, you at least are spared problems like "better off dead" and "insane parliament", both of which result from a preference feedback loop.
bryjnar30

A lot of what you've said sounds like you're just reiterating what Luke says quite clearly near the beginning: Ideal Advisor theories are "metaphysical", and CEV is epistemic, i.e. Ideal Advisor theories are usually trying to give an account of what is good, whereas, as you say, CEV is just about trying to find a good effective approximation to the good. In that sense, this article is comparing apples to oranges. But the point is that some criticisms may carry over.

[EDIT: this comment is pretty off the mark, given that I appear to be unable to read the first sentence of comments I'm replying to. "historical context" facepalm]

bryjnar10

I absolutely agree that this will help people stop being confused about Godel's theorem, I just don't know why EY does it in this particular post.

Do you have any basis for this claim?

Nope, it's pure polemic ;) Intuitively I feel like it's a realism/instrumentalism issue: claiming that the only things which are true are provable feels like collapsing the true and the knowable. In this case the decision is about which tool to use, but using a tool like first-order logic that has these weird properties seems suspicious.

bryjnar60

the compactness theorem is equivalent to the ultrafilter lemma, which in turn is essentially equivalent to the statement that Arrow's impossibility theorem is false if the number of voters is allowed to be infinite.

Well, I can confirm that I think that that's super cool!

the compactness theorem is independent of ZF

As wuncidunci says, that's only true if you allow uncountable languages. I can't think of many cases off the top of my head where you would really want that... countable is usually enough.

Also: more evidence that the higher model theory of first-order logic is highly dependent on set theory!

bryjnar30

I think it's worth addressing that kind of argument because it is fairly well known. Penrose, for example, makes a huge deal over it. Although mostly I think of Penrose as a case study in how being a great mathematician doesn't make you a great philosopher, he's still fairly visible.

bryjnar00

Exactly (I'm assuming by subset you mean non-strict subset). Crucially, a non-standard model may not have all the bijections you'd expect it to, which is where EY comes at it from.

2A1987dM
I was, but that's not necessary -- a countably infinite set can be bijectively mapped onto {2, 3, 4, ...} which is a proper subset of N after all! ;-)
bryjnar00

Sure. So you're not going to be able to prove (and hence know) some true statements. You might be able to do some meta-reasoning about your logic to figure some of these out, although quite how that's supposed to work without requiring the context of set theory again, I'm not really sure.

3moshez
bryjnar: I think the point is that the metalogical analysis that happens in the context of set theory is still a finite syntactical proof. In essence, all mathematics can be reduced to finite syntactical proofs inside of ZFC. Anything that really, truly, requires infinite proof in actual math is unknowable to everyone, supersmart AI included.
bryjnar50

This post doesn't really say anything interesting about retributive justice at all. It sounds like what's actually bugging you is the question of national "sovereignty". Plus, you Godwined yourself. Between these things, you give a pretty bad impression. Perhaps if you reposted it with a less flamebaity example and a title like "Is there any ethical reason to respect national sovereignty?" or something you might fare better.

0kodos96
Well, you are correct that the post isn't actually about retributive justice........ but it's not about sovereignty either ;)
bryjnar50

A few things.

a) I'm a little confused by the discussion of Cantor's argument. As I understand it, the argument is valid in first-order logic; it's just that the conclusion may have different semantics in different models. That is, the statement "the set X is uncountable" is cashed out in terms of set theory, and so if you have a non-standard model of set theory, then that statement may have non-standard semantics.

This is all made horrendously confusing by the fact that when we do model theory we tend to model our domains using sets. So even in a n... (read more)

5AlexMennen
Disagree. I actually understand Godel's incompleteness theorem, and started out misunderstanding it until a discussion similar to the one presented in this post, so this may help clear up the incompleteness theorem for some people. And unlike the Compactness theorem, Godel's completeness theorem at least seems fairly intuitive. Proving the existence of nonstandard models from the Compactness theorem seems kind of like pulling a rabbit out of a hat if you can't show me why the Compactness theorem is true. Do you have any basis for this claim?
3Eliezer Yudkowsky
Isn't the knowable limited to what can be known with finite chains of reasoning, whatever your base logic?
9A1987dM
More clearly -- "X is uncountable" means "there is no bijection between X and a subset of N", but "there" still means "within the given model".
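Written out, the relativisation is (just formalising the comment above):

```latex
\mathcal{M} \models \text{``}X\text{ is uncountable''}
\quad\Longleftrightarrow\quad
\text{no } f \in \mathcal{M} \text{ is, according to } \mathcal{M},
\text{ a bijection between } X \text{ and a subset of } \mathbb{N}.
```

A bijection can exist outside the model without being coded by any element of it, which is how a countable model can still satisfy "X is uncountable" (Skolem's paradox).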
bryjnar00

Right. So, as I said, you are counselling that "anthropics" is practically not a problem, as even if there is a sense of "expect" in which it would be correct to expect the Boltzmann-brain scenario, this is not worth worrying about because it will not affect our decisions.

That's a perfectly reasonable thing to say, but it's not actually addressing the question of getting anthropics right, and it's misleading to present it as such. You're just saying that we shouldn't care about this particular bit of anthropics. Doesn't mean that I wouldn't be correct (or not) to expect my impending dissolution.

2Tyrrell_McAllister
I would have been "addressing the question of getting anthropics right" if I had talked about what the "I" in "I will dissolve" means, or about how I should go about assigning a probability to that indexical-laden proposition. I don't think that I presented myself as doing that. I'm also not saying that I've solved these problems, or that we shouldn't work towards a general theory of anthropics that answers them. The uselessness of anticipating that you will be a Boltzmann brain is particular to Boltzmann-brain scenarios. It is not a feature of anthropic problems in general. The Boltzmann brain is, by hypothesis, powerless to do anything to change its circumstances. That is what makes anticipating the scenario pointless. Most anthropic scenarios aren't like this, and so it is much more reasonable to wonder how you should allocate "anticipation" to them. The question of whether indexicals like "I" should play a role in how we allocate our anticipation — that question is open as far as I know. My point was this. Eliezer seemed to be saying something like, "If a theory of anthropics reduces anthropics to physics+logic, then great. But if the theory does that at the cost of saying that I am probably a Boltzmann brain, then I consider that to be too high a price to pay. You're going to have to work harder than that to convince me that I'm really and truly probably a Boltzmann brain." I am saying that, even if a theory of anthropics says that "I am probably a Boltzmann brain" (where the theory explains what that "I" means), that is not a problem for the theory. If the theory is otherwise unproblematic, then I see no problem at all.