Can someone explain to me the significance of problems like Sleeping Beauty? I see a lot of digital ink being spilled over them, and I can kind of see how they call into question what we mean by "probability" and "expected utility", but I can't quite pin down the thread that connects all of them. Someone will pose a solution to a paradox X, and then another will reply with a modified version X' that the previous solution fails on, and I tend to have trouble seeing what exactly it is people are trying to solve.
If you want to build an AI that maximizes utility, and that AI can create copies of itself, and each copy's existence and state of knowledge can also depend on events happening in the world, then you need a general theory of how to make decisions in such situations. In the limiting case when there's no copying at all, the solution is standard Bayesian rationality and expected utility maximization, but that falls apart when you introduce copying. Basically we need a theory that looks as nice as Bayesian rationality, is reflectively consistent (i.e. the AI won't immediately self-modify away from it), and leads to reasonable decisions in the presence of copying. Coming up with such a theory turns out to be surprisingly hard. Many of us feel that UDT is the right approach, but many gaps still have to be filled in.
Note that many problems that involve copying can be converted to problems that create identical mind states by erasing memories. My favorite motivating example is the Absent-Minded Driver problem. The Sleeping Beauty problem is similar to that, but formulated in terms of probabilities instead of decisions, so people get confused.
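For concreteness, here's a minimal sketch of the planning-stage calculation for the Absent-Minded Driver, assuming the standard Piccione-Rubinstein payoffs (0 for exiting at the first intersection, 4 for exiting at the second, 1 for driving past both); the particular numbers aren't essential, just the shape of the problem:

```python
# Absent-Minded Driver, planning-stage view (standard Piccione-Rubinstein payoffs).
# The driver can't tell the two intersections apart, so he must commit to the
# same probability p of continuing at each one.

def expected_payoff(p):
    exit_first = (1 - p) * 0       # exit at the first intersection: payoff 0
    exit_second = p * (1 - p) * 4  # continue once, then exit: payoff 4
    never_exit = p * p * 1         # continue at both: payoff 1
    return exit_first + exit_second + never_exit

# Scan for the best p; the optimum is p = 2/3, with expected payoff 4/3.
best_p = max((i / 1000 for i in range(1001)), key=expected_payoff)
print(best_p, expected_payoff(best_p))  # ~0.667, ~1.333
```

The puzzle starts once the driver is actually sitting at an intersection: what probability should he assign to it being the first one, and does updating on that change the answer? That's where it starts to look like Sleeping Beauty.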
An even simpler way to emulate copying is by pu...
When a problem involves a predictor that's predicting your actions, it can often be transformed into another problem that has an indistinguishable copy of you inside the predictor. In some cases, like Counterfactual Mugging, the copy and the original can even receive different evidence, though they are still unable to tell which is which.
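For anyone who hasn't seen Counterfactual Mugging: here's a minimal sketch of its payoff structure, using the usual numbers from the original formulation (pay $100 on tails; receive $10,000 on heads if the predictor expects you to pay). Comparing whole policies, rather than updating on the coin, is the UDT-flavored way to look at it:

```python
# Counterfactual Mugging with the usual payoffs. A fair coin is flipped.
# Tails: Omega asks you for $100. Heads: Omega pays you $10,000, but only
# if it predicts that you would have paid had the coin come up tails.

def expected_value(policy_pays):
    heads = 0.5 * (10_000 if policy_pays else 0)
    tails = 0.5 * (-100 if policy_pays else 0)
    return heads + tails

print(expected_value(True))   # 4950.0 -- the policy "pay when asked"
print(expected_value(False))  # 0.0    -- the policy "refuse"
```

Evaluated before the coin flip, the paying policy clearly wins; evaluated after you've already seen tails, paying looks like a pure $100 loss. That tension is exactly the kind of thing a theory of decisions-with-copies has to resolve.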
There are more complicated scenarios, where the predictor is doing high-level logical reasoning about you instead of running a simulation of you. In simple cases like Newcomb's Problem, that distinction doesn't matter, but there is an important family of problems where it matters. The earliest known example is Gary Drescher's Agent Simulates Predictor. Other examples are Wei Dai's problem about bargaining and logical uncertainty and my own problem about logical priors. Right now this is the branch of decision theory that interests me most.
I do not understand the point of the essay http://yudkowsky.net/rational/the-simple-truth/ . The preface says that it "is meant to restore a naive view of truth", but all I see is strawmanning everything Eliezer dislikes. What is that "naive view of truth"?
The naive view of truth:
Some things are true and some things are false. For example:
"My name is 'Ben'." - True
"My name is 'Alfred'." - False
When it comes to factual questions, you should believe a claim more strongly the more evidence you have for it. If well-researched statistics indicate that one country has a higher homicide rate than another, then you should believe it (unless you have other, really good evidence to the contrary). If well-designed studies find that a certain brand of alternative medicine is ineffective, then you should believe that too (unless, again, you have really good evidence to the contrary). One should not start arguing about "well, what is truth really?" or "how can we ever know anything really?". I think it was Feynman who noted that anyone who actually thought like this would soon die of starvation, because they'd never really know whether that yellow thing was a banana they could eat. These arguments are simply ways of dismissing really good evidence, and you should not use them.
The purpose of the essay is so that when you're in an argument and you provide evidence, and the other person goes "but all truth is relative" or "nothing is true, it's all just oppression of the many by the powerful", you can send it to them and say "stop evading the actual evidence!"
I was reading Eliezer's cartoon proof of Löb's theorem the other day and I didn't get it. My assumption was that in order to understand it, I would need a decent background in mathematical logic, e.g. actually knowing what Peano Arithmetic is, as opposed to abstracting it away as a talking head that tells us things. (I know vector calculus, linear algebra, programming, and basic logic, but that's about as far as I go.) If Löb's theorem is something I should be able to understand the proof of given that background, I'd be interested to know that.
I was reading Eliezer's cartoon proof of Löb's theorem the other day and I didn't get it.
In order to understand it, I am currently reading Forever Undecided: A Puzzle Guide to Gödel. The book features a whole chapter about Löb's theorem.
The book does not have any prerequisites. It starts out with plain English logic puzzles that you need to solve (detailed solutions at the end of each chapter), and later introduces you to formal logic by translating those puzzles into propositional logic.
I have not yet finished the book, so I can't tell if it fits the purpose of understanding Löb's theorem. But what I can already tell is that it is really engaging and fascinating. Highly recommended!
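In the meantime, for reference, here's the standard skeleton of the proof written out in modal notation. I'm copying the usual presentation rather than claiming I already grok every step, but as far as I can tell it's the same argument the cartoon walks through, just compressed:

```latex
\documentclass{article}
\usepackage{amssymb} % for \Box
\begin{document}
% Derivability conditions used below:
% (D1) if PA proves A, then PA proves Box A
% (D2) Box(A -> B) -> (Box A -> Box B)
% (D3) Box A -> Box Box A
\begin{enumerate}
  \item Assume $\vdash \Box P \to P$. (Goal: $\vdash P$.)
  \item Diagonal lemma: there is a sentence $\psi$ with
        $\vdash \psi \leftrightarrow (\Box\psi \to P)$.
  \item From 2: $\vdash \psi \to (\Box\psi \to P)$.
  \item By D1: $\vdash \Box(\psi \to (\Box\psi \to P))$.
  \item By D2: $\vdash \Box\psi \to \Box(\Box\psi \to P)$.
  \item By D2 again: $\vdash \Box\psi \to (\Box\Box\psi \to \Box P)$.
  \item By D3, $\vdash \Box\psi \to \Box\Box\psi$, so with 6: $\vdash \Box\psi \to \Box P$.
  \item With the assumption in 1: $\vdash \Box\psi \to P$.
  \item With 2: $\vdash \psi$; by D1: $\vdash \Box\psi$; by 8: $\vdash P$.
\end{enumerate}
\end{document}
```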
ETA: To give a taste of Raymond M. Smullyan's style, check out his 'World's shortest explanation of Gödel's theorem':
We have some sort of machine that prints out statements in some sort of language. It needn't be a statement-printing machine exactly; it could be some sort of technique for taking statements and deciding if they are true. But let's think of it as a machine that prints out statements.
In particular, some of the statements that the machine might (or might not) print look like these:
P*x (w...
There are three ways to answer the free will/determinism question: I) yes, they're incompatible, but we have free will, II) yes, they're incompatible, and we don't have free will, III) they're not incompatible.
I've often heard EY's free will solution referred to as a form of (III), compatibilism. If this is the case, then I don't think I understand his argument. So far as I can tell, EY's solution is this:
1) free will is incompatible with determinism; the natural world is relevantly deterministic; we therefore do not have free will.
2) here is an error ...
One thing I read Eliezer as saying, in Dissolving the Question, is that the phenomenology of free will is more interesting than the metaphysics:
Your homework assignment is to write a stack trace of the internal algorithms of the human mind as they produce the intuitions that power the whole damn philosophical argument.
(This comment is not a full answer to that "homework assignment.")
In other words, it is a fact that humans do reasonably reliably possess the intuition, "I have free will." We do have that intuition; our having it is something to be explained. And it is a fact that when we examine the processes that we are made of — physics — we do not (contra Penrose) see anywhere for free will to sneak in. Brains use the same atoms that billiard balls and computers do.
(I don't know if you are a coder. A "stack trace" is a snapshot of what is going on, at a particular moment, at every level of abstraction in a computer program. Stack traces are often seen when a program crashes, to let the programmer follow the trail of the code bug or bad data that led to the crash. We might obtain a stack trace of consciousness through introspective techniques such...
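If a concrete picture helps, here is a toy Python illustration with made-up function names; the only point is that each stack frame records which function called which:

```python
import traceback

# A toy chain of calls, named only for illustration: perceiving leads to
# deciding, which leads to acting.
def perceive():
    decide()

def decide():
    act()

def act():
    # Print the current call stack: you'll see perceive -> decide -> act,
    # along with the file and line where each call happened.
    traceback.print_stack()

perceive()
```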
I don't think that's EY's solution - I don't think his discussion of Free Will has anything to do with moral responsibility being a matter of convention.
From what I recall, the argument is something more like this: When people talk of "Free will", it's not clear what exactly they are referring to. If you try to pin down a more precise meaning that matches people's intuitions, you get something like "the subjective sensation of evaluating different available courses of action one might take" - and that is compatible with determinism (you can run decision algorithms in a perfectly deterministic binary world e.g. a simulated tile-based game world).
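Here's a minimal sketch of what I mean by a decision algorithm running in a perfectly deterministic world; the world model and payoffs are made up purely for illustration:

```python
# A deterministic toy world: every action leads to exactly one outcome,
# and the agent "deliberates" by evaluating each available option in turn.
world = {
    "go_left":  {"outcome": "cliff",   "utility": -10},
    "go_right": {"outcome": "village", "utility": 5},
    "stay_put": {"outcome": "nothing", "utility": 0},
}

def choose(options):
    # The whole computation is deterministic, yet from the inside it is
    # still a process of weighing alternative courses of action.
    return max(options, key=lambda action: options[action]["utility"])

print(choose(world))  # "go_right"
```

Nothing non-deterministic happens anywhere in that program, but the agent still goes through the motions of considering several options and picking one. On this account, that process is what the intuition of "free will" is tracking.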
Does that make sense?
As far as I can reconstruct EDT's algorithm, it goes something like this:
1) I know that smoking is correlated with lung cancer.
2) I've read in a medical journal that smoking and lung cancer have a common cause, some kind of genetic lesion. I don't know if I have that lesion.
3) I'd like to smoke now, but I'm not sure if that's the best decision.
4) My friend, a causal decision theorist, told me that smoking or not smoking cannot affect whether I already have the lesion. But I don't completely buy that reasoning. I prefer to use something else, which I will call "evidential decision theory".
5) To figure out the best action to take, first I will counterfactually imagine myself as an automaton whose actions are chosen randomly, taking into account whether it has the lesion, using the frequencies observed in the world. So an automaton with the lesion will have a higher probability of smoking and a higher probability of cancer. (A numerical sketch of this follows the list.)
6) Next, I will figure out what the automaton's actions say about its utility, using ordinary conditional probabilities and expected values. It looks like the utility of automatons that smoke is lower than the utility of those that don't, because the former ones are more likely to get cancer.
7) Now I will remember that I'm not an automaton, and choose to avoid smoking based on the above reasoning!
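For concreteness, here's a numerical sketch of steps 5 and 6. The frequencies for the lesion, smoking, and cancer are made up purely for illustration; only the direction of the result matters:

```python
# Made-up numbers: the lesion raises both the chance of smoking and the
# chance of cancer; smoking itself does nothing causally.
p_lesion = 0.3
p_smoke_given = {"lesion": 0.8, "no_lesion": 0.2}
p_cancer_given = {"lesion": 0.6, "no_lesion": 0.05}

u_smoke, u_cancer = 10, -100  # enjoy smoking a little, hate cancer a lot

def p(has_lesion):
    return p_lesion if has_lesion else 1 - p_lesion

def edt_utility(smokes):
    # Step 5: condition on the action as if it were evidence. Among the
    # "automatons" that take this action, how many have the lesion?
    key = lambda has_lesion: "lesion" if has_lesion else "no_lesion"
    p_action = sum(
        p(l) * (p_smoke_given[key(l)] if smokes else 1 - p_smoke_given[key(l)])
        for l in (True, False)
    )
    p_lesion_given_action = (
        p(True) * (p_smoke_given["lesion"] if smokes else 1 - p_smoke_given["lesion"])
    ) / p_action
    # Step 6: ordinary conditional expectation of utility.
    p_cancer = (
        p_lesion_given_action * p_cancer_given["lesion"]
        + (1 - p_lesion_given_action) * p_cancer_given["no_lesion"]
    )
    return (u_smoke if smokes else 0) + p_cancer * u_cancer

print(edt_utility(True), edt_utility(False))  # smoking comes out worse, so EDT abstains
```

A causal decision theorist would instead hold the probability of the lesion fixed at its prior when evaluating each action, in which case smoking comes out ahead by exactly the enjoyment term.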
Does that make sense?
What are the best arguments for/against some of MIRI's core positions?
-Tool AI and oracle AI are different. Oracles are agents in a box. Tools are not agents, so they can't take actions in the world or optimize an unfriendly utility function, any more than Google Maps optimizes a utility function. Why not just tell the AI to figure out physics/math/CS?
-emotions (like happiness/sadness) are vague concepts in the same way that objects are fuzzy concepts (think of
ErinFlight said:
Thinking about it, I realized that this might be a common concern. There are probably plenty of people who've looked at various more-or-less technical or jargony Less Wrong posts, tried understanding them, and then given up (without posting a comment explaining their confusion).
So I figured that it might be good to have a thread where you can ask for explanations for any Less Wrong post that you didn't understand and would like to, but don't want to directly comment on for any reason (e.g. because you're feeling embarrassed, because the post is too old to attract much traffic, etc.). In the spirit of the various Stupid Questions threads, you're explicitly encouraged to ask even for the kinds of explanations that you feel you "should" be able to get yourself, or where you feel like you could get it if you just put in the effort (but then never did).
You can ask to have some specific confusing term or analogy explained, or to get the main content of a post briefly summarized in plain English and without jargon, or anything else. (Of course, there are some posts that simply cannot be explained in non-technical terms, such as the ones in the Quantum Mechanics sequence.) And of course, you're encouraged to provide explanations to others!