
One-Magisterium Bayes

Post author: tristanm 29 June 2017 11:02PM 9 points

[Epistemic Status: Very partisan / opinionated. Kinda long, kinda rambling.]

In conversations with members of the rationalist community, and in articles and blog posts written both inside and outside it, I’ve noticed a recent trend toward skepticism of Bayesian principles and philosophy (see Nostalgebraist’s recent post for an example). I regard this with surprise and a little dismay, because I think progress within a community tends to show up as movement toward new subjects and problems, not as a return to old ones that have already been extensively argued and discussed. So the intent of this post is to summarize a few of the claims I’ve seen put forward and to point out where I believe they have gone wrong.

It’s also a somewhat odd direction for the discussion to take, because the academic statistics community has largely moved on from the Bayesian versus Frequentist debate and has come to accept both the Bayesian and the Frequentist / Fisherian viewpoints as valid. When E.T. Jaynes wrote his famous book, the debate was still raging, and many questions had yet to be answered. In the 21st century, statisticians have mostly come to accept a world in which both approaches exist and have their merits.

Because I will be defending the Bayesian side here, there is a risk that this post will come off as dogmatic. We are a community devoted to free thought, after all, and any argument for a form of orthodoxy might be perceived as an attempt to stifle dissenting viewpoints. That is not my intent, and in fact I plan to argue against Bayesian dogmatism as well. My goal is to argue that having a base framework in which we can feel relatively high confidence is useful to the goals of the community, and that if our confidence in it is high enough, then spending extra effort trying to prove it false may waste brainpower that could be used on more interesting or useful tasks. We could always reach a point where most of us strongly feel that unless we abandon Bayesianism, we can’t make any further progress. I highly doubt that we have reached such a point or that we ever will.

This is also a personal exercise to test my understanding of Bayesian theory and my ability to communicate it. My hope is that if my ideas are presented well here, it will be much easier for me and for others to find flaws in them, and for me to update accordingly.

I will start with an outline of philosophical Bayesianism, also called “Strong Bayesianism,” or, as I prefer to call it, “One Magisterium Bayes.” The reason for calling it a single magisterium will hopefully become clear. The Sequences did argue for this point of view; however, I think their strength lay more in explaining why you should update your beliefs in the face of new evidence than in explaining why Bayes’ theorem is the correct way to do so. The argument for Bayesian principles as the correct set of reasoning principles was, I think, made more strongly by E.T. Jaynes. Unfortunately, his exposition of the subject tends to get ignored relative to the material presented in the Sequences. This is not to say the Sequences aren’t highly relevant and important, only that Jaynes’ arguments are much more technical, and their strength can be overlooked for that reason.

The way to start an exposition on one-magisterium rationality is by contrast to multi-magisteria modes of thought. I would go as far as to argue that the multi-magisterium view, or what I sometimes prefer to call tool-boxism, is by far the most dominant way of thinking today. Tool-boxism can be summarized by “There is no one correct way to arrive at the truth. Every model we have today about how to arrive at the correct answer is just that – a model. And there are many, many models. The only way to get better at finding the correct answer is through experience and wisdom, with a lot of insight and luck, just as one would master a trade such as woodworking. There’s nothing that can replace or supersede the magic of human creativity. [Sometimes it will be added:] Also, don’t forget that the models you have about the world are heavily, if not completely, determined by your culture and upbringing, and there’s no reason to favor your culture over anyone else’s.”

As I hope to argue in this post, tool-boxism has many downsides that should push us toward accepting the one-magisterium view. The two views also differ dramatically in how they suggest we should approach the problem of intelligence and cognition, with many corollaries in both rationalism and artificial intelligence. Some of these corollaries are the following:

  • If there is no unified theory of intelligence, we are led towards the view that recursive self-improvement is not possible, since an increase in one type of intelligence does not necessarily lead to an improvement in a different type of intelligence.
  • With different notions of correct reasoning for different domains, there is a hard limit on how much agreement can be reached on any given topic. In the end we are often forced to agree to disagree, which preserves social cohesion in some contexts but can be quite unsatisfying from a philosophical standpoint.
  • Related to the previous corollary, it may lead to beliefs that are sacred, untouchable, or based on intuition, feeling, or difficult-to-articulate concepts. This produces a complex web of topics that must be avoided or treaded around carefully, or a heavy emphasis on hard-to-articulate reasons for preferring one view over another.
  • Developing AI around a tool-box / multi-magisteria approach, where systems are made up of a wide array of various components, limits generalizability and leads to brittleness. 

One very specific trend I’ve noticed lately in articles that aim to discredit the AGI intelligence explosion hypothesis is that they tend to take the tool-box approach to intelligence and use it to argue that recursive self-improvement is likely impossible. So rationalists should be highly interested in this kind of reasoning. One of Eliezer’s primary motivations for writing the Sequences was to make the case for a unified approach to reasoning, because such an approach lends credence to a view in which intelligence can be replicated by machines and is potentially unbounded. The subject was also subtle and difficult enough that arguing for it took hundreds of blog posts. Given how subtle the arguments are, I’m not particularly surprised by this drift, but I am concerned about it. I would prefer that we didn’t drift.

I’m trying not to sound No-True-Scotsman-y here, but I wonder what could make someone a rationalist if they take the tool-box perspective. After all, even with a multi-magisterium world-view, there is always an underlying guiding principle directing the use of the proper tools. Often this guiding principle is based on intuition, which is a remarkably hard thing to pin down and describe well. I personally interpret the word ‘rationalism’, in its weakest and most general sense, as meaning that there is an explanation for everything, so that intelligence isn’t irreducibly based on hand-wavy concepts such as ingenuity and creativity. Rationalists believe that those things have explanations, and once we have those explanations, there is no further use for tool-boxism.

I’ll repeat the distinction between tool-boxism and one-magisterium Bayes, because I believe it’s that important: Tool-boxism implies that there is no underlying theory that describes the mechanisms of intelligence. And this assumption basically implies that intelligence is either composed of irreducible components (where one component does not necessarily help you understand a different component) or some kind of essential property that cannot be replicated by algorithms or computation.

Why is tool-boxism the dominant paradigm then? Probably because it is the most pragmatically useful position to take in most circumstances when we don’t actually possess an underlying theory. But the fact that we sometimes don’t have an underlying theory or that the theory we do have isn’t developed to the point where it is empirically beating the tool box approach is sometimes taken as evidence that there isn't a unifying theory. This is, in my opinion, the incorrect conclusion to draw from these observations.

Nevertheless, it seems to be a startlingly common conclusion. I think the great mystery is why. I don’t have very convincing answers to that question, but I suspect it has something to do with how heavily our priors are biased against a unified theory of reasoning. It may also be due to the subtlety and complexity of the arguments for a unified theory. For that reason, I highly recommend reviewing those arguments (few people other than E.T. Jaynes and Yudkowsky have made them). With that said, let’s review a few of them, starting with a myth about Bayes’ theorem that I’d like to debunk:

Bayes’ Theorem is a trivial consequence of the Kolmogorov Axioms, and is therefore not powerful.

This claim is usually presented as part of a broader claim that “Bayesian” probability is just a small part of regular probability theory, and therefore gives us no more useful information than we’d get from just studying probability theory. As a consequence, if you insist that you’re a “Strong” Bayesian, you are supposedly insisting on using only that small subset of probability theory and the associated tools we call Bayesian.

The part of the statement that says the theorem is a trivial consequence of the Kolmogorov axioms is technically true. It’s the implication typically drawn from it that is false. The reason has to do with Bayes’ theorem being a non-trivial consequence of a simpler set of axioms / desiderata. This result is formalized by Cox’s theorem, which is usually glossed over, or not appreciated for how far-reaching it actually is.

In Jaynes’ formulation, the qualitative desiderata for a set of reasoning rules are:

  1. Degrees of plausibility are represented by real numbers.
  2. Qualitative correspondence with common sense.
  3. Consistency. 

You can read the first two chapters of Jaynes’ book, Probability Theory: The Logic of Science, if you want more detail on what those desiderata mean. The important thing to note is that they are merely desiderata, not axioms. We are not assuming those things are already true; we just want to devise a system that satisfies them. The beauty of Cox’s theorem is that it shows there is essentially only one set of rules (up to a monotonic change of scale) that satisfies these properties: the product and sum rules of probability theory, from which both Bayes’ theorem and the Kolmogorov axioms follow.
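To make the step from Cox’s rules to Bayes’ theorem concrete, here is the standard derivation in Jaynes’ notation, where C stands for the background information. The product rule can be expanded in either order; equating the two expansions gives Bayes’ theorem, and the sum rule then supplies the normalizing denominator.

```latex
\begin{aligned}
p(AB \mid C) &= p(A \mid BC)\,p(B \mid C) = p(B \mid AC)\,p(A \mid C)
  && \text{(product rule)} \\
p(A \mid C) + p(\bar{A} \mid C) &= 1
  && \text{(sum rule)} \\
p(A \mid BC) &= \frac{p(B \mid AC)\,p(A \mid C)}{p(B \mid C)}, \qquad p(B \mid C) > 0
  && \text{(Bayes' theorem)} \\
p(B \mid C) &= p(B \mid AC)\,p(A \mid C) + p(B \mid \bar{A}C)\,p(\bar{A} \mid C)
  && \text{(denominator, via both rules)}
\end{aligned}
```

Nothing in this derivation refers to a sample space or to repeated trials; A and B can be any propositions.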

Another nice feature is that degrees of plausibility can be assigned to any proposition, that is, any statement to which you could assign a truth value. Plausibility is not limited to “events” in some space of possible outcomes, like whether a coin flip comes up heads or tails. What’s typically considered the alternative to Bayesian reasoning is Classical probability, sometimes called Frequentist probability, which deals only with events drawn from a sample space and does not assign probabilities to hypotheses themselves, so it cannot provide methods for probabilistic inference over a set of hypotheses.

For axioms, Cox’s theorem merely requires you to accept Boolean algebra and calculus, and from those you can derive probability theory as extended logic. So this should be mind-blowing, right? One Magisterium Bayes? QED? Apparently this set of arguments is not convincing to everyone, and it’s not because people find Boolean logic and calculus hard to accept.

Rather, there are two major and several somewhat minor difficulties encountered within the Bayesian paradigm. The two major ones are as follows:

  • The problem of hypothesis generation.
  • The problem of assigning priors. 

The minor problems are as follows, although, like any list of minor issues, this one is definitely not exhaustive:

  • Should you treat “evidence” for a hypothesis, or “data”, as having probability 1?
  • Bayesian methods are often computationally intractable.
  • How to update when you discover a “new” hypothesis.
  • Divergence in posterior beliefs for different individuals upon the acquisition of new data.

Most Bayesians do not deny the existence of the first two problems. What some anti-Bayesians conclude from them, though, is that Bayesianism must be fatally flawed, and that there is some other way of reasoning that would avoid or solve those problems. I’m skeptical about this, because if you really had a method for, say, hypothesis generation, this would imply logical omniscience and would basically allow us to create full AGI, RIGHT NOW. If you could produce a finite list containing the correct hypothesis for any problem, the existence of the other hypotheses in the list would be practically a moot point: you would have a finite, computable algorithm for generating the CORRECT hypothesis. And that would make you a God.

As far as I know, being able to do this would imply that P = NP, and most computer scientists do not think that is likely to be true (and even if it were true, we might not get a constructive proof of it). But I would ask: is this really a strike against Bayesianism? Is the inability of Bayesian theory to provide a method for finding the correct hypothesis evidence that we can’t use it to analyze and update our own beliefs?

I would add that there are plenty of other ways to generate hypotheses. For example, you can make the hypothesis space gargantuan, encode different hypotheses in a vector of parameters, and then use optimization or search procedures such as evolutionary algorithms or gradient descent to find the most likely set of parameters. Not all of these methods are considered “Bayesian” in the sense of summarizing a posterior distribution over the parameters (although stochastic gradient descent might be). It seems like a full theory of intelligence might include methods for generating possible hypotheses. I think this is probably true, but I don’t know of any argument that this would contradict Bayesian theory.
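A toy version of “encode hypotheses as parameters and search with gradient descent” may help. This is a minimal sketch assuming a one-parameter coin model and invented data (7 successes in 10 trials); real models would have a large parameter vector. It returns a single best point (the maximum-likelihood estimate), not a posterior distribution, which is the sense in which such search methods are not “Bayesian.”

```python
import math


def sigmoid(w):
    return 1.0 / (1.0 + math.exp(-w))


def fit_bias_by_gradient_ascent(successes, trials, lr=0.5, steps=1000):
    """Search a continuous hypothesis space (every coin bias theta in (0, 1)),
    encoded by one unconstrained real parameter w with theta = sigmoid(w)."""
    w = 0.0  # start at theta = 0.5
    for _ in range(steps):
        theta = sigmoid(w)
        # Gradient of the log-likelihood k*log(theta) + (n - k)*log(1 - theta)
        # with respect to w simplifies to k - n*theta.
        grad = successes - trials * theta
        w += lr * grad / trials  # ascend; dividing by n keeps the step size stable
    return sigmoid(w)


# Hypothetical data: 7 successes in 10 trials.
print(round(fit_bias_by_gradient_ascent(7, 10), 6))  # 0.7, the MLE k/n
```

The search finds the most likely hypothesis but says nothing about how much more likely it is than its neighbours; a Bayesian treatment would keep the whole distribution over theta.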

The reason assigning prior probabilities is such a huge concern is that it forces Bayesians to hold “subjective” probabilities, where in most cases, if you’re not an expert in the domain of interest, you don’t really have a good argument for why you should hold one prior over another. Frequentists often contrast this with their methods which do not require priors, and thus hold some measure of objectivity.

E.T. Jaynes never considered this to be a flaw in Bayesian probability, per se. Rather, he considered hypothesis generation, as well as the assignment of priors, to be outside the scope of “plausible inference,” which is what he considered to be the domain of Bayesian probability. He himself argued for using the principle of maximum entropy to construct prior distributions, and there are also more modern techniques such as Empirical Bayes.
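As an illustration of the maximum entropy principle, here is a sketch in the spirit of Jaynes’ well-known dice example: all we know about a die is that its long-run average face is 4.5, and we want the least committal distribution consistent with that. For a mean constraint the solution has the exponential form p_i ∝ exp(λ·i); the function name `maxent_die` and the bisection search for λ are choices made for this sketch, not a procedure taken from the post.

```python
import math


def maxent_die(target_mean, faces=(1, 2, 3, 4, 5, 6), tol=1e-12):
    """Maximum-entropy distribution over die faces subject to a fixed mean.
    The solution has the form p_i proportional to exp(lam * i); lam is found by
    bisection, using the fact that the mean is increasing in lam."""
    def dist(lam):
        weights = [math.exp(lam * f) for f in faces]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(f * p for f, p in zip(faces, dist(lam)))

    # Wide bracket; target_mean must lie strictly between the smallest and largest face.
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)


probs = maxent_die(4.5)
print([round(p, 4) for p in probs])  # probabilities increase with face value
```

With a target mean of 3.5 the same function returns the uniform 1/6 prior, which is the sense in which maximum entropy generalizes the usual “all sides equally likely” assumption.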

In general, Frequentist methods have the advantage of often being simpler and easier to compute, while also coming with strong guarantees about the results as long as certain conditions are satisfied. Bayesian methods have the advantage of being “ideal” in the sense that you’ll get the same answer each time you analyze the same data. This is the most common form of example Bayesians use when they profess the superiority of their approach: they show how Frequentist methods can label the same result “significant” or “non-significant” depending on how you perform the analysis, whereas the Bayesian approach just gives you the probability of the hypothesis, plain and simple.
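A standard textbook illustration of this point uses a single data set, 9 heads and 3 tails, tested against a fair coin with a one-sided alternative favouring heads. If the 12 flips were fixed in advance (binomial design), the p-value is about 0.073; if the experimenter flipped until the third tail (negative binomial design), it is about 0.033. Both designs have the same likelihood, proportional to θ^9 (1 − θ)^3, so a Bayesian with a uniform Beta(1, 1) prior (an assumption of this sketch) gets the same posterior either way.

```python
from math import comb


def binomial_p_value(heads, tails):
    """Design 1: n = heads + tails flips were fixed in advance.
    One-sided p-value P(at least `heads` heads | fair coin)."""
    n = heads + tails
    return sum(comb(n, k) for k in range(heads, n + 1)) / 2 ** n


def neg_binomial_p_value(heads, tails):
    """Design 2: flip until `tails` tails have appeared.
    One-sided p-value P(at least `heads` heads before the final tail | fair coin)."""
    # P(H = h) = C(h + tails - 1, tails - 1) / 2 ** (h + tails)
    below = sum(comb(h + tails - 1, tails - 1) / 2 ** (h + tails) for h in range(heads))
    return 1 - below


def beta_posterior(heads, tails, a=1, b=1):
    """Both designs have likelihood proportional to theta**heads * (1 - theta)**tails,
    so a Beta(a, b) prior on the heads probability yields the same Beta posterior."""
    return a + heads, b + tails


print(round(binomial_p_value(9, 3), 4))      # 0.073  -> "not significant" at 0.05
print(round(neg_binomial_p_value(9, 3), 4))  # 0.0327 -> "significant" at 0.05
print(beta_posterior(9, 3))                  # (10, 4) under either design
```

The only thing that differs between the two p-values is the experimenter’s stopping intention, which never enters the Bayesian calculation.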

I think that, in general, one could say that Frequentist methods are a lot more “tool-boxy” and Bayesian methods are more “generally applicable” (if computational tractability weren’t an issue). That brings me to the second myth I’d like to debunk:

Being a “Strong Bayesian” means avoiding all techniques not labeled with the stamp of approval from the Bayes Council.

Does this mean that Frequentist methods, because they are tool-box approaches, are wrong or somehow bad to use, as some claim Strong Bayesians believe? Not at all. There’s no reason not to use a specific tool if it seems like the best way to get what you want, as long as you understand exactly what the results you’re getting mean. Sometimes I just want a prediction, and I don’t care how I get it; a specific algorithm being labeled “Bayesian” doesn’t confer any magical properties on it. Any Bayesian may want to know the frequentist properties of their model. It’s easy to forget that many methods were developed by communities of researchers who then labeled them with the flag of their tribe. That’s ok. The point is, if you really want to hold a Strong Bayesian view, then you also have to assign probabilities to various properties of each tool in the toolbox.

Chances are, if you’re a statistics / data science practitioner with a few years of experience applying different techniques to different problems and data sets, and you have some general intuitions about which techniques work better in which domains, you’re probably already doing this in a Bayesian way. That is, you hold some prior beliefs about whether Bayesian Logistic Regression or Random Forests is more likely to get you what you want on a particular problem, you try one, and you update your beliefs once you get a result, according to what your models predicted.

Being a Bayesian often requires you to work with “black boxes”: tools that you know give you a specific result, even though you don’t have a full explanation of how they arrive at it or how they fit into the grand scheme of things. A Bayesian fundamentalist may refuse to work with any statistical tool like that, not realizing that in everyday life they constantly use tools, objects, and devices that aren’t fully transparent to them. But you can, and in fact do, have models of how those tools can be used and what results you’d get from them. The way you handle these models, even if they are held only in intuition, probably looks pretty Bayesian upon deeper inspection.
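One way to make that informal process concrete, under loudly hypothetical assumptions: treat “Random Forests beat Logistic Regression on problems like this one” as a yes/no outcome with an unknown rate, put a Beta prior on that rate, and update it after each project. The Beta(2, 2) prior and the win/loss counts below are invented for illustration and do not come from any real benchmark.

```python
from fractions import Fraction


def update_beta(a, b, wins, losses):
    """Conjugate update: a Beta(a, b) prior plus yes/no outcomes
    gives a Beta(a + wins, b + losses) posterior."""
    return a + wins, b + losses


def beta_mean(a, b):
    """Posterior mean of the win rate, kept as an exact fraction."""
    return Fraction(a, a + b)


# Hypothetical prior: "about even odds", held with little confidence.
a, b = 2, 2
# Hypothetical record on recent projects: Random Forests did better 3 times, LR once.
a, b = update_beta(a, b, wins=3, losses=1)
print(a, b, beta_mean(a, b))  # 5 3 5/8
```

The practitioner’s intuition is rarely this explicit, but the shape is the same: a prior over how well a tool performs, a trial, and a revised belief.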

I would suggest that instead of using the term “Fully Bayesian” we use the phrase “Infinitely Bayesian” to refer to using a Bayesian method for literally everything, because it more accurately shows that it would be impossible to actually model every single atom of knowledge probabilistically. It also makes it easier to see that even the Strongest Bayesian you know probably isn’t advocating this.

Let me return to the “minor problems” I mentioned earlier, because they are pretty interesting.  Some epistemologists have a problem with Bayesian updating because it requires you to assume that the “evidence” you receive at any given point is completely true with probability 1. I don’t really understand why it requires this. I’m easily able to handle the case where I’m uncertain about my data. Take the situation where my friend is rolling a six-sided die, and I want to know the probability of it coming up 6. I assume all sides are equally likely, so my prior probability for 6 is 1/6. Let’s say that he rolls it where I can’t see it, and then tells me the die came up even. What do I update p(6) to?

Let’s say that I take my data as saying “the die came up even.” Then p(6 | even) = p(even | 6) * p(6) / p(even) = 1 * (1/6) / (1 / 2) = 1/3. Ok, so I should update p(6) to 1/3 now right? Well, that’s only if I take the evidence of “the die came up even” as being completely true with probability one. But what actually happened is that my friend TOLD ME the die came up even. He could have been lying, maybe he forgot what “even” meant, maybe his glasses were really smudged, or maybe aliens took over his brain at that exact moment and made him say that. So let’s say I give a 90% chance to him telling the truth, or equivalently, a 90% chance that my data is true. What do I update p(6) to now?

It’s pretty simple. I just expand p(6) over “even” as p(6) = p(6 | even) p(even) + p(6 | odd) p(odd). Before he said anything, p(even) = p(odd) = 1/2, and this formula evaluated to (1/3)(1/2) + (0)(1/2) = 1/6, my prior. After he told me the die came up even, I update p(even) to 0.9, and the formula becomes (1/3)(9/10) + (0)(1/10) = 9/30 = 3/10. A little less than 1/3. Makes sense.

In general, I can model anything probabilistically in the Bayesian framework, including my data. So I’m not sure where the objection comes from. It’s true that, from both a modeling and a computational perspective, I have to stop somewhere and, for the sake of pragmatism, treat probabilities very close to 1 as if they were 1 rather than modeling them. Not doing that, and just going on forever, would mean being Infinitely Bayesian. But I don’t see why this counts as a problem for Bayesianism. Again, I’m not trying to be omniscient. I just want a framework for working with any part of reality, not all of reality at once. The former is what I consider “One Magisterium” to mean, not the latter.

The rest of the minor issues are also related to limitations that any finite intelligence is going to have no matter what. They should all, though, get easier as access to data increases, models get better, and computational ability gets better.

Finally, I’d like to return to the issue I think is most relevant to the ideas discussed here. In discussions of AI risk, it is commonly argued that a sufficiently intelligent agent will be able to modify itself to become more intelligent. This premise assumes that the agent will have some theory of intelligence that lets it understand which modifications to itself are likely to be improvements. Because of that, many who argue against “AI Alarmism” attack the premise that there is a unified theory of intelligence. In Maciej Cegłowski’s talk “Superintelligence: The Idea that Eats Smart People”, I think most of the arguments basically reduce to that claim.

From what I can tell, most arguments against AI risk in general take the form of anecdotes about how really, really smart people like Albert Einstein were very bad at certain other tasks, offered as proof that there is no theory of intelligence that could be used to create a self-improving AI. More accurately, these arguments are worded as “There is no single axis on which to measure intelligence,” but what they mean is the former, since even multiple axes of intelligence (such as measures of success on different tasks) would not imply that there isn’t one theory of reasoning. What multiple axes of measuring intelligence do imply is that a given brain may have devoted more space to modeling certain tasks than others, and that the brain may not be very elastic and may have a hard time picking up new tasks.

The other direction in which to argue against AI risk is to argue against the proposed theories of reasoning themselves, like Bayesianism. The alternative, it seems, is tool-boxism. I really want to avoid tool-boxism because it makes it difficult to be a rationalist. Even if Bayesianism turns out to be wrong, does this exclude other, possibly undiscovered theories of reasoning? I’ve never seen that touched upon by any of the AI risk deniers. As long as there is a theory of reasoning, then presumably a machine intelligence could come to understand that theory and all of its consequences, and use that to update itself.

I think the simplest summary of my post is this: A Bayesian need not be Bayesian in all things, for reasons of practicality. But a Bayesian can be Bayesian in any given thing, and this is what is meant by “One Magisterium”.

I didn’t get to cover every corollary of tool-boxism or every issue with Bayesian statistics, but this post is already really long, so I will end it here. Perhaps I can cover those issues more thoroughly in a future post.

Comments (104)

Comment author: nostalgebraist 30 June 2017 09:39:27PM 6 points

As the author of the post you linked in the first paragraph, I may be able to provide some useful context, at least for that particular post.

Arguments for and against Strong Bayesianism have been a pet obsession of mine for a long time, and I've written a whole bunch about them over the years. (Not because I thought it was especially important to do so, just because I found it fun.) The result is that there are a bunch of (mostly) anti-Bayes arguments scattered throughout several years of posts on my tumblr. For quite a while, I'd had "put a bunch of that stuff in a single place" on my to-do list, and I wrote that post just to check that off my to-do list. Almost none of the material in there is new, and nothing in there would surprise anyone who had been keeping up with the Bayes-related posts on my tumblr. Writing the post was housekeeping, not nailing 95 theses on a church door.

As you might expect, I disagree with a number of the more specific/technical claims you've made in this post, but I am with you in feeling like these arguments are retreading old ground, and I'm at the point where writing more words on the internet about Bayes has mostly stopped being fun.

It's also worth noting that my relation to the rationalist community is not very goal-directed. I like talking to rationalists, I do it all the time on tumblr and discord and sometimes in meatspace, and I find all the big topics (including AGI stuff) fun to talk about. I am not interested in pushing the rationalist community in one direction or another; if I argue about Bayes or AGI, it's in order to have fun and/or because I value knowledge and insight (etc.) in general, not because I am worried that rationalists are "wasting time" on those things when they could be doing some other great thing I want them to do. Stuff like "what does it even mean to be a non-Bayesian rationalist?" is mostly orthogonal to my interests, since to me "rationalists" just means "a certain group of people whose members I often enjoy talking to."

Comment author: tristanm 30 June 2017 10:11:48PM 2 points

Thanks for your response. I did find your post very interesting and enjoyable to read.

Incidentally, my worry is mostly that retreading old ground might be less valuable to the community, and that it might be useful to accept a common framework; it's not that anyone was arguing the rationalist community as a whole should move in a certain direction (reversing momentum from wherever it was headed before) or accept a different framework. I'm probably more goal-directed than most of the rationalist community, but that could be due to an idealism that hasn't yet had time to be tempered.

Comment author: TheAncientGeek 01 July 2017 11:52:30AM 2 points

I keep being surprised at how little rationalists care about what's true. If you got something right the first time around, there is no need to revisit it; if you didn't, there is. There is no general rule against revisiting.

Comment author: tristanm 02 July 2017 03:31:40AM 0 points

On the contrary, I think rationalists are often overly hesitant to act at all, or pursue much of any concrete goals, until they have reached a quite high threshold of certainty about whether or not they are correct about those goals first. If rationalists really didn't care about what's true, you'd probably see a lot more aggressive risk taking by them. But our problem seems to be risk aversion, not recklessness.

Comment author: TheAncientGeek 02 July 2017 08:08:08AM 0 points

Let me rephrase it: I don't see why you care so little about what is true. You are arguing for strong Bayesianism on the grounds that it would be nice if it worked, not on the grounds that it works.

Comment author: tristanm 02 July 2017 02:15:47PM 0 points

I am arguing against tool-boxism, on the grounds that if it were accepted as true (I don't think it can actually be true in a meaningful sense) you basically give up on the ability to converge on truth in an objective sense. Any kind of objective principles would not be tool-boxism.

It seems that those who feel that tool-boxism is false converge on Bayesianism as a set of principles: not as the full story, or as having no other consequences or ways to extend them, but as a set of principles with no domain in which they can both be meaningfully applied and give the wrong answer.

Comment author: TheAncientGeek 03 July 2017 08:23:46AM 1 point

I am arguing against tool-boxism, on the grounds that if it were accepted as true (I don't think it can actually be true in a meaningful sense) you basically give up on the ability to converge on truth in an objective sense.

You need to distinguish between truth and usefulness. If the justification of using different tools is purely on the basis of efficiency (in the limit, being able to solve a problem at all), then nothing is implied about the ability to converge on truth. Toolbox-ism does not necessarily imply pluralism in the resulting maps. There is also a thing where people advocate the use of multiple theories with different content, leading to an overall pluralism/relativism, but in view of the usefulness/truth distinction that is a different thing.

It seems that those who feel that tool-boxism is false, seem to converge on Bayesianism as a set of principles, not that they are the full story,

If they are not the full story, then you need other tools. You are saying contradictory things. Sometimes you say Bayes is the only tool you need, sometimes you say it can only do one thing.

but as a set of principles with no domain in which they can both be meaningfully applied and where they give the wrong answer.

Not giving the wrong answer is not a sufficient criterion for giving the right answer. To get the right answer, you need to get the hypothesis that corresponds to reality, somehow, and you need to confirm it. Recall that Bayes does not give you any method for generating hypotheses, let alone one guaranteed to generate the one true one in an acceptable period of time. So Bayes does not guarantee truth -- truth as correspondence, that is.

Comment author: ChristianKl 02 July 2017 06:09:38PM 0 points

I am arguing against tool-boxism, on the grounds that if it were accepted as true (I don't think it can actually be true in a meaningful sense) you basically give up on the ability to converge on truth in an objective sense. Any kind of objective principles would not be tool-boxism.

This sounds like you argue against it on the grounds that you don't like a state of affairs where tool-boxism is true, so you assume it isn't. This seems to me like motivated reasoning.

It's structurally similar to the person who says they are believing in God because if God doesn't exist that would mean that life is meaningless.

Comment author: tristanm 02 July 2017 07:49:41PM 0 points

I don't think it's possible to have unmotivated reasoning. Nearly all reasoning begins by assuming a set of propositions, such as axioms, to be true, before following all the implications. If I believe objectivity is true, then I want to know what follows from it. Note that Cox's theorem proceeds similarly, by forming a set of desiderata first, and then finding a set of rules that satisfies them. Do you not consider this chain of reasoning to be valid?

(If I strongly believed "life is meaningless" to be false, and I believed that "God does not exist implies life is meaningless" then concluding from those that God exists is logically valid. Whether or not the two first propositions are themselves valid is another question)

Comment author: TheAncientGeek 03 July 2017 08:07:42AM 0 points

There's motivation and there's motivation. Bad motivation is when an object-level proposition is taken as the necessary output of an epistemological process, and the epistemology is chosen to beg the question. Good motivation is avoiding question-begging in your epistemology.

Comment author: ChristianKl 02 July 2017 08:30:59PM 0 points

One thing about that chain of reasoning is that it's very un-Bayesian. We have catch-phrases like "0 and 1 aren't probabilities". Even if they are, how do you get a probability of 1 for the thesis of objectivity being true?
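A minimal sketch of the usual log-odds argument behind the "0 and 1 aren't probabilities" slogan (the function names are illustrative, not from any library): in log-odds form each piece of evidence adds a finite log likelihood ratio, so finitely many updates from a non-extreme prior never reach the infinite log-odds that a probability of 1 would require.

```python
import math

def log_odds(p):
    """Log-odds of p. At p = 1 the division by zero raises ZeroDivisionError,
    reflecting that certainty corresponds to infinite log-odds."""
    return math.log(p / (1.0 - p))

def posterior_log_odds(prior, likelihood_ratios):
    """Bayes' rule in log-odds form: each observation adds log(likelihood ratio)."""
    return log_odds(prior) + sum(math.log(lr) for lr in likelihood_ratios)

# One hundred observations, each ten times likelier if the thesis is true.
evidence = [10.0] * 100
result = posterior_log_odds(0.5, evidence)
```

However strong the evidence, `result` stays finite (about 230 nats here); only infinite evidence, or a prior already at 1, gets you to certainty.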

Comment author: entirelyuseless 01 July 2017 02:39:05PM 0 points

I keep being surprised at how little rationalists care about what's true.

I keep being surprised when I see anyone at all act a little bit like they care about what's true, including me.

Comment author: cousin_it 30 June 2017 10:16:15AM 4 points

Keeping with my tradition of telling people to be less confident...

I strongly agree that the world is built on logic that can be understood by the individual human mind. And I think it's likely that there are simple principles for correct reasoning, which might lead to intelligence explosion. Yay to you for resisting backwards drift on that!

But maybe let's not tie that to the idea that all correct reasoning must approximate Bayes. Ironically, LW is the best source for arguments that Bayesian probability is itself an approximation to some more precise theory of uncertainty (UDT, Absent-Minded Driver, Psy-Kosh's problem, Counterfactual Mugging, etc.) and for the many problems that remain even then (nature of observation, nature of priors, logical uncertainty, etc.). In the end, a theory of uncertainty doesn't just have to be correct in itself, it must also accurately model uncertainty, so it's tied up with what it means to be an agent. We haven't even scratched the surface of that.

Comment author: tristanm 30 June 2017 01:20:39PM 0 points

During a physics lecture on quantum mechanics I was in once, the professor stated that theories like quantum field theory, string theory, and other types of quantum gravity were contained within plain quantum mechanics, because all of them had to work within the quantum framework (in the sense that they were quantum mechanics with more assumptions added).

I wonder if something similar is true for Bayesian probability, and the theories like UDT, Logical induction and things like that. Do any of these extensions violate Bayesian principles, making them overlap with them rather than contain them?

Comment author: cousin_it 30 June 2017 01:37:35PM 2 points

I think they violate. The Absent-Minded Driver problem is the simplest example, constructed to violate the independence axiom of vNM. Logical induction also, because the only position fully compatible with Bayes is logical omniscience, and we want to model logical non-omniscience (not knowing all true theorems). To tell an agent what to do in a situation, we need a model of uncertainty for the agent in the situation, which can be as complex as the agent and the situation. Bayesian probability is more of a tractable limit case, like Newtonian mechanics or Nash equilibrium.
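For readers who haven't seen the Absent-Minded Driver, here is a sketch of its planning-stage calculation, assuming the payoffs usually attributed to Piccione and Rubinstein (exiting at the first intersection pays 0, at the second pays 4, driving past both pays 1). Because the driver cannot tell the intersections apart, he must use one continuation probability p at both, and the expected payoff works out to 4p - 3p^2, maximized at p = 2/3. The disputed part, how he should reason once he finds himself at some intersection, is not modelled here.

```python
from fractions import Fraction

# Payoffs assumed from the standard Piccione-Rubinstein formulation.
EXIT_FIRST = 0      # exit at the first intersection
EXIT_SECOND = 4     # exit at the second intersection
CONTINUE_BOTH = 1   # drive past both intersections

def planning_payoff(p):
    """Expected payoff when the driver continues with probability p at every
    intersection (he cannot tell them apart, so the same p is used twice)."""
    return (1 - p) * EXIT_FIRST + p * (1 - p) * EXIT_SECOND + p * p * CONTINUE_BOTH

# Grid search over p in steps of 1/300, using exact rational arithmetic.
grid = [Fraction(k, 300) for k in range(301)]
best_p = max(grid, key=planning_payoff)
best_value = planning_payoff(best_p)
```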

Comment author: MrMind 03 July 2017 01:56:39PM 0 points

These are not violations of Bayesian probability. VNM rationality exists independently of Bayes, logical induction might be a coherent extension of Bayesian probability where classical logic (which is the one presupposing omniscience) is not applicable, UDT similarly presupposes logical omniscience, counterfactual mugging is a problem of decision theory, not probability, etc.
Let's keep Bayesian probability, decision theory, VNM rationality, classical logic, etc. all well separated.

Comment author: cousin_it 03 July 2017 02:29:20PM 0 points

If you separate Bayesian probability from decision theory, then it has no justification except self-consistency, and you can no longer say that all correct reasoning must approximate Bayes (which is the claim under discussion).

Comment author: ksvanhorn 05 July 2017 05:27:11AM 0 points

Sure it does. Haven't you heard of Cox's Theorem? It singles out (Bayesian) probability theory as the uniquely determined extension of propositional logic to handle degrees of certainty. There's also my recent paper, "From Propositional Logic to Plausible Reasoning: A Uniqueness Theorem"

https://authors.elsevier.com/a/1VIqc,KD6ZCKMf

or

https://arxiv.org/abs/1706.05261

Comment author: MrMind 03 July 2017 02:44:02PM 0 points

Correct inference must approximate Bayes. Correct reasoning is inference + hypothesis generation / update + what counts as evidence?
Decision theories are concerned with the last piece of the puzzle.
If I'm wrong, please show me a not obviously wrong theory that violates Bayes' theorem...

Comment author: whpearson 30 June 2017 07:29:49AM 3 points

I think that Bayes with resource constraints looks a lot like toolbox-ism.

Take for example the problem of reversing MD5 hashes. You could update your probabilities of which original string goes to which hash in a Bayesian way, but computationally there is no uncertainty, so there is no point storing probabilities and you strip those out. Or you could just download a rainbow table and use that, and not have to compute the hashes yourself at all.
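A minimal sketch of the MD5 point, assuming a small hypothetical candidate set: once the digests are precomputed, inverting MD5 on that set is a deterministic dictionary lookup with no probabilities stored anywhere. This is a plain lookup table; a real rainbow table saves storage by using chains of hash and reduction functions, which this sketch omits.

```python
import hashlib

def md5_hex(s):
    """Hex MD5 digest of a UTF-8 string."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def build_lookup_table(candidates):
    """Precompute digest -> preimage for a fixed (hypothetical) candidate set."""
    return {md5_hex(c): c for c in candidates}

def reverse(digest, table):
    """Deterministic answer, or None if the preimage isn't in the table."""
    return table.get(digest)

table = build_lookup_table(["abc", "password", "hunter2"])
found = reverse(md5_hex("abc"), table)
```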

Or take having to compute the joint probability of two sets of data that come in at different times. Say you always have the event A at time T, but what happened to B at T can come in immediately or days to weeks later. You might not be able to store all the information about A if it is happening quickly (e.g. network interface activity). So you have to choose which data to drop. Perhaps you age it out after a certain time or drop things that are usual, but either way that stops you from being able to update properly in a Bayesian way when that B comes in. Bayes doesn't say what you should drop. You should drop the unimportant stuff, but figuring out what is unimportant a priori seems implausible.
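A hypothetical sketch of the storage problem, assuming an oldest-first eviction policy (one of the options mentioned above): A-events are kept in a bounded buffer, and a late B report whose matching A was already evicted simply cannot contribute to the joint counts. The class and method names are illustrative only.

```python
from collections import OrderedDict

class BoundedJoin:
    """Keep at most `capacity` A-events keyed by timestamp (oldest evicted first)
    and count (A, B) co-occurrences when B reports arrive later."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.a_events = OrderedDict()  # timestamp -> observed value of A
        self.joint_counts = {}         # (a_value, b_value) -> count
        self.unmatched_b = 0           # B reports whose A was already dropped

    def observe_a(self, t, a_value):
        self.a_events[t] = a_value
        if len(self.a_events) > self.capacity:
            self.a_events.popitem(last=False)  # drop the oldest A-event

    def observe_b(self, t, b_value):
        if t in self.a_events:
            key = (self.a_events[t], b_value)
            self.joint_counts[key] = self.joint_counts.get(key, 0) + 1
        else:
            # The matching A is gone, so this evidence can't update the joint estimate.
            self.unmatched_b += 1

j = BoundedJoin(capacity=2)
j.observe_a(1, "up")
j.observe_a(2, "down")
j.observe_a(3, "up")       # evicts the A-event at t=1
j.observe_b(3, "alert")    # matched: counted
j.observe_b(1, "alert")    # arrived too late: lost
```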

So I still use Bayes on occasion when it makes sense and still think AI could be highly risky. But I think I'm a toolbox-ist.

Comment author: Oscar_Cunningham 30 June 2017 10:45:56AM 2 points

I agree. In particular I've noticed that a lot of frequentist methods can be described in terms of the Bayesian method by replacing a step where you take the expected value of some quantity by instead taking the worst case of this quantity. This improves the computational efficiency by avoiding a difficult integral. Of course we choose the worst case rather than the best case because humans are risk averse. This shows an interesting fact: Bayesian methods are the same for every agent, but when resource constraints force you away from Bayesian methods your inferences can end up depending on your utility function. (And because humans are risk averse the frequentist method often looks like it is being more responsible and conservative than the Bayesian method even though the Bayesian method will in fact always produce predictions with an optimal amount of risk-averseness if you use it with an accurate utility function.)
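A toy illustration of the substitution described above, with a made-up loss table and prior: the Bayes action minimizes prior-weighted expected loss, while the minimax action (in the style of Wald's decision theory) replaces that expectation with the worst case over states. With only two states the "integral" is a two-term sum, so the computational saving doesn't show up here; the sketch only shows that the two rules can pick different actions.

```python
# Hypothetical loss table LOSS[action][state]; the numbers are illustrative only.
LOSS = {
    "A": {"s1": 0.0, "s2": 10.0},
    "B": {"s1": 4.0, "s2": 5.0},
}
PRIOR = {"s1": 0.8, "s2": 0.2}  # assumed prior over states

def bayes_action(loss, prior):
    """Minimize expected loss: average over states, weighted by the prior."""
    return min(loss, key=lambda a: sum(prior[s] * loss[a][s] for s in prior))

def minimax_action(loss):
    """Minimize worst-case loss: replace the expectation with a max over states."""
    return min(loss, key=lambda a: max(loss[a].values()))

bayes_choice = bayes_action(LOSS, PRIOR)   # expected losses: A = 2.0, B = 4.2
minimax_choice = minimax_action(LOSS)      # worst cases:     A = 10,  B = 5
```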

Comment author: IlyaShpitser 30 June 2017 12:31:34AM 3 points

Are you a statistician?

If yes: what's your favorite paper you wrote on Bayes?

If not: why are you telling experts what to do?

Comment author: gilch 30 June 2017 03:46:51AM 2 points

-1

A request for the "argument from authority" fallacy. Freethinkers discuss ideas directly on their merits, not on their author's job description. A rationalist doesn't ignore any evidence, of course (even authors' job descriptions), but try to weight them accurately, okay?

Comment author: IlyaShpitser 30 June 2017 05:35:48AM 4 points

Will do, thank you.

edit:

"Chances are, if you’re a statistics/data science practitioner with a few years of experience applying different techniques to different problems and different data sets, and you have some general intuitions about which techniques apply better to which domains, you’re probably doing this in a Bayesian way."

Freethinkers who might know a little bit about statistics/data science would, presumably by your lights, stop reading here.


I have seen a lot of this type of overconfident stuff out of LW over the years. That is, in Bayesian terms, I already had a prior, and it already updated in a "read a textbook and stop blogging" direction.

Comment author: tristanm 30 June 2017 04:40:25PM 1 point

In general, there is no good way for me to know what your prior is on the subject I'm going to write about, unless I already knew you really well. But it's unreasonable to expect me to know what all of my audience's priors are and try to write something that agrees with them. I don't think it's possible. I'm not optimizing for the argmax of the Less Wrong commenters anyway. I want to write something that's probably wrong, so I can update from it if I need to. Responses like yours, saying "read a textbook and stop blogging", seem to go against the spirit of free thought and debate anyway. Personally, I think the tendency of people to respond in that way is probably why communities like these dissipate after a while. I wrote a post about this subject as well.

Comment author: IlyaShpitser 30 June 2017 04:46:41PM 2 points

Why write about technical stuff you don't really know very well? How did you expect that to go?

I find this attitude of "let's blog about stuff that's probably wrong with the aim of updating later when people correct you" kind of a weird way to go. Why not just go and read directly about what you want to learn about?

Your way helps you, of course, but generates negative externalities for everybody else (who have to either spend time correcting you if they know better, or get misled by you if they don't).


Wasn't scholarship (e.g. reading stuff) one of the virtues?

Comment author: TheAncientGeek 01 July 2017 11:57:35AM 1 point

Scholarship was mentioned as one of the virtues, but EY didn't put it into practice much. People tend to learn by imitation.

Comment author: tristanm 30 June 2017 04:57:02PM 1 point

At what point do you consider yourself to have read enough? At what point do you decide that you've read enough textbooks and now it's ok to blog?

Also, I don't see how this creates negative externalities on people. That kind of assumes a bizarre situation where people are either forced to read, or forced to respond to things. Apply that reasoning to basically the entire internet, social media, everyday discussions with people, and you basically have to quarantine yourself from most of the world to avoid that risk. Or you conclude that all speech has to be meticulously curated so that there is very low risk of misleading, offending, upsetting, or otherwise wasting someone's time.

Comment author: IlyaShpitser 30 June 2017 06:44:40PM 3 points

"At what point do you consider yourself to have read enough?"

How about a single statistics class at a university. At that point one might appreciate the set of things one might not know about yet. In reality, though, I feel that if you want to blog about technical topics, you should be an expert on said technical topics. If you are not an expert, it seems you should listen, not talk.

"Also, I don't see how this creates negative externalities on people."

Conditional on you being wrong, you should expect no negative externalities only if you expect the activity of blogging to be akin to pissing into the wind -- you don't expect folks to take you seriously anyways. If you are not an expert yet, you should not be very confident about avoiding being wrong on technical topics.

If folks do take you seriously, they either get the wrong idea, or have to spend energy correcting you, or leave it alone, and let you mislead others.

Comment author: ChristianKl 01 July 2017 12:12:57PM 2 points

I don't think that having a conversation with someone who's wrong is necessarily bad for myself. Arguing against someone who's wrong can help me to clarify my own thoughts on a topic.

CFAR supports the notion that one of the best ways to learn is to teach. Mixing reading textbooks passively with active argument is good for learning a subject well.

Comment author: IlyaShpitser 02 July 2017 01:45:47PM 1 point

That's fine, but can OP at least preface with [Epistemic status: may not know what I am talking about]?

Comment author: ChristianKl 02 July 2017 02:03:00PM 1 point

What did you expect with "Very partisan / opinionated"? I don't think that's how the average academic expert would preface his professional position if academics were in the habit of stating the epistemic status.

Comment author: tristanm 30 June 2017 08:21:59PM 2 points

How about a single statistics class at a university.

Not sure why you would say this (assuming I haven't even done that) and then immediately admit that you expect something much higher. What that level of expertise is I'm not sure, but probably having a Ph.D. in statistics?

I have an undergraduate degree in math / physics, and I've been working at a data science job for 3 years, while spending most of my free time studying these subjects. I wouldn't call myself an expert, but at least personally, I think I've reached a point where I can feasibly have discussions with people about statistics / ML, and not say things that are totally far off from where at least a certain mode of experts are on the subject.

Of course, the topic I was discussing is actually somewhere on the border of statistics, mathematics, and philosophy, and my guess is there are few academic programs that focus specifically on that overlapping region. That makes it very unlikely for anyone on this site to be at the level of expertise you demand. And if the subject is really that esoteric, it also makes it more unlikely that someone would somehow damagingly misuse what they read here. There are no infohazards (as far as I know) in my post, and there really aren't any concrete suggestions for actions to take, either.

Comment author: IlyaShpitser 01 July 2017 07:42:13AM 1 point

Maybe there is a cultural/generational difference here.


I have seen very little on Bayes out of LW over the years I agree with -- take it as a datapoint if you wish. Most of it is somewhere between at least somewhat wrong and not even wrong.


Hanson had a post somewhere on how folks should practice holding strong opinions and arguing for them, but not taking the whole thing very seriously. Maybe that's what you are doing.

Comment author: tristanm 03 July 2017 07:48:03PM 0 points

There may indeed be a cultural difference here.

LessWrong has tended towards skepticism (though not outright rejection) of academic credentials (consider Eliezer's "argument trumps authority" discussions in the Sequences). However, this site is more or less a place for somewhat informal intellectual discussion. It is not an authoritative information repository, and as far as I can tell, does not claim to be. Anyone who participates in discussions here is probably well aware of this fact, and is fully expected to be able to consider the arguments here, not take them at face value.

If you disagree with some of the core ideas around this community (like Bayesian epistemology), as well as what you perceive to be the "negative externalities" of the tendency towards informal / non-expert discussion, then to me it seems likely that you disagree with certain aspects of the culture here. But you seem to have chosen to oppose those aspects, rather than simply choosing not to participate.

Comment author: TheAncientGeek 01 July 2017 12:05:09PM 0 points

The problem of how much knowledge is enough has an age old solution: academic credentials.

Comment author: entirelyuseless 01 July 2017 02:34:57PM 0 points

I think the real answer is about people's motives.

Reading stuff without talking about it isn't going to impress anyone, since they won't even know.

Comment author: MrMind 03 July 2017 01:50:45PM 1 point

If not: why are you telling experts what to do?

Because "experts" are fucking it up left and right.

Comment author: cousin_it 03 July 2017 02:40:58PM 2 points

Ilya is a student and coauthor of Judea Pearl, whose work on causality and Bayes nets was cited by Eliezer many times. He's an expert at the stuff that LW is amateuring in.

Comment author: MrMind 03 July 2017 02:54:06PM 0 points

A: Ilya is a statistician.

B: Ilya is an expert in Bayes probability, and is never wrong.

So:

C: Every statistician is an expert in Bayes probability, and they are never wrong.

Corollary: the replication crisis is a conspiracy of the Bayes Shadow Government.

Comment author: IlyaShpitser 03 July 2017 03:07:58PM 2 points

Psychologists are not statisticians, though. Generally they are relatively naive users of stats methods (as are a lot of other applied folks, e.g. doctors that publish, cognitive scientists, social scientists, epidemiologists, etc.) Ideally, methods folks and applied folks collaborate, but this does not always happen.


You can fish for positive findings with B methods just fine, the issue isn't F vs B, the issue is bad publication incentives.


There is also a little bit of "there is a huge replication crisis on, long story short, we should read this random dude's blog (with apologies to the OP)."


Pearl is, apparently, only half Bayesian.


I am wrong a lot -- I can point you to some errors in my papers if you want.

Comment author: MrMind 03 July 2017 03:55:11PM 0 points

The replication crisis is decomposable into many pieces, two of which are surely bad incentives and relative inexperience of the "applied folks". Another though is, that's the main point, that frequentist methods are a set of ad-hoc, poorly explained, poorly understood heuristics. No wonder that they are used improperly.
On the other hand, I've seen the crisis explained mostly by Bayesian statisticians, so I'm possibly in a bubble. If you can point me to a frequentist explanation I would be glad to pop it.

I am wrong a lot -- I can point you to some errors in my papers if you want.

Apparently though, cousin_it thinks you cannot be criticized or argued against...

Comment author: IlyaShpitser 03 July 2017 04:00:23PM 2 points

"Another though[t] is, that's the main point, that frequentist methods are a set of ad-hoc, poorly explained, poorly understood heuristics."

I don't think so. This is what LW repeatedly gets wrong, and I am kind of tired of talking about it. How are you so confident re: what frequentist methods really are about, if you aren't a statistician? This is incredibly bizarre to me.

Rather than argue about it constantly, which I am very very tired of doing (see above "negative externalities"), I can point you to Larry Wasserman's book "All of Statistics." It's a nice frequentist book. Start there, perhaps. Larry is very smart, one of the smartest statisticians alive, I think.


Apparently though, cousin_it thinks you cannot be criticized or argued against...

My culture thrives on peer review, as much as we grumble about it. Emphasis on "peer," of course.

You should probably be a bit more charitable to cousin_it, he's very smart too.

Comment author: MrMind 03 July 2017 04:30:14PM 1 point

what frequentist methods really are about, if you aren't a statistician?

I was under the impression that it was sufficient to read statistics books. Apparently though, you need also to be anointed by another statistician to even talk about the subject.

My culture thrives on peer review, as much as we grumble about it. Emphasis on "peer," of course.

You seem to imply that no statistician has ever criticized frequentist methods. LW is just parroting what others, more expert men already said.

You should probably be a bit more charitable to cousin_it, he's very smart too.

Isn't it, as long as you're making an incorrect statement, irrelevant how intelligent you are? Jaynes was wrong about quantum mechanics. Einstein was wrong about the unified field.
Everybody can be wrong, no matter how respected or intelligent they are.

Comment author: IlyaShpitser 03 July 2017 04:34:22PM 3 points

"I was under the impression that it was sufficient to read statistics books."

Ok, what have you read?

I am not the "blogging police," I am just saying, based on past experience, that when people who aren't statisticians talk about these issues, the result is very low quality. So low that it would have been better to stay silent. Statistics is a very mathematical field. These types of arguments are akin to "should we think about mathematics topologically or algebraically?"


"You seem to imply that no statistician has ever criticized frequentist methods."

See "Tom Knight and the LISP machine":

http://catb.org/jargon/html/koans.html

One of these koans is pretty Bayesian, actually, the one about tic-tac-toe.


"Isn't it, as long as you're making an incorrect statement, irrelevant how intelligent you are?"

Sure is, but how certain are you it's incorrect? If uncertain, intelligence is useful information you should Bayes Theorem in.

And anyways, charity is about interpreting reasonably what people say.

Comment author: MrMind 04 July 2017 01:12:18PM 0 points

Ok, what have you read?

The pretty standard Bayesian curriculum: De Finetti, Jaynes-Bretthorst, Sivia.

See "Tom Knight and the LISP machine":

I love Lisp koans much more than I love Lisp... Anyway, it's still a question of knowing a subject, not being part of a cabal.

Sure is, but how certain are you it's incorrect? If uncertain, intelligence is useful information you should Bayes Theorem in.

Well, I prefer evidence to signalling: if the problem is only my tediousness, refusing to accept a settled argument, someone can simply point me to a paper, a blog post or a book saying "here, this shows clearly that the replication crisis happened for this reason, not because of the opaqueness of frequentist methods". I am willing to update. I have done it in the past many times, and I'm confident I can do it this time too.

Here, all this "He is very intelligent! No, you are very intelligent!" is... sad.

Comment author: TheAncientGeek 03 July 2017 05:23:25PM 0 points

LW is just parroting what others, more expert men already said

Who else has said that science could and should be wholesale replaced by Bayes?

Comment author: MrMind 04 July 2017 12:35:48PM 0 points

No one?

Comment author: tristanm 30 June 2017 03:36:16PM 0 points

If I wanted to tell people what I thought they ought to do, I'd have written about decision theory instead. Depending on your decision theory, it might tell you to do something non-Bayesian, because you might not have a Bayesian technique right in front of you, but maybe you have a good heuristic that you know from experience works well. All I'm saying is that, probably, your reasoning approximates Bayesian reasoning, even when the "methods" you are using don't look Bayesian. The way you model those methods as a whole probably does, though.

Even if I were writing about decision theory, I don't really see why making an argument for a particular way of thinking is equivalent to "telling people what to do", though. Everything that gets written on Less Wrong is either an argument or a proposal, never a command. Eliezer isn't a statistician either, and yet here we are on his site dedicated to trying to figure out the right way to think. Besides that, I'm pretty sure there is plenty of low-hanging fruit in my essay that you could easily argue against, without going directly to argument from authority.

Comment author: IlyaShpitser 30 June 2017 04:14:42PM 3 points

I certainly agree with you that Eliezer isn't a statistician. I may disagree with you on the implications of this.


"All I'm saying is that, probably, your reasoning approximates Bayesian reasoning, even when the "methods" you are using don't look Bayesian."

If by "my reasoning" you mean me as a human using my brain, I don't really see in what sense this is true. I do lots of things with my brain that aren't Bayesian. If by "my reasoning" you mean stuff I do with data as a statistician, that's simply false. For example, stuff I do with influence functions has no Bayesian analogue at all.

edit: there is probably some way I could set up some semi-parametric influence function stuff in a Bayesian way -- I am not sure.
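For readers unfamiliar with influence functions, here is the simplest textbook case, not a reconstruction of Ilya's semiparametric work: the influence function of the mean is IF(x) = x - mu, and averaging its square over the sample, then dividing by n, gives a plug-in variance estimate for the sample mean. The function name is illustrative.

```python
import math

def mean_influence_se(xs):
    """Standard error of the sample mean via its influence function.
    For the mean, IF(x) = x - mu; Var(estimator) is approximately E[IF(X)^2] / n."""
    n = len(xs)
    mu = sum(xs) / n
    influence = [x - mu for x in xs]          # empirical influence values
    return math.sqrt(sum(v * v for v in influence) / n / n)

se = mean_influence_se([1.0, 2.0, 3.0, 4.0, 5.0])
```

Nothing here uses a prior or a posterior; the estimate comes from how much each observation perturbs the estimator.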

Comment author: Kaj_Sotala 04 July 2017 05:19:52PM 1 point

I’ve noticed a recent trend towards skepticism of Bayesian principles and philosophy (see Nostalgebraist’s recent post for an example), which I have regarded with both surprise and a little bit of dismay, because I think progress within a community tends to be indicated by moving forward to new subjects and problems rather than a return to old ones that have already been extensively argued for and discussed.

Not sure if this is the best characterization: much of LW's stance towards Bayesianism always came from the Word of Eliezer, rather than through any thorough discussion and debate. I'd say that skepticism of Bayesianism within our community isn't really "returning to subjects that have already been extensively discussed", but rather "subjecting foundational premises to the kind of criticism they need to undergo before people can trust them to be true, and before people really understand their extent and limitations".

Comment author: pcm 03 July 2017 10:46:04PM 1 point

Tool-boxism implies that there is no underlying theory that describes the mechanisms of intelligence.

If I try to apply this to protein folding instead of intelligence, it sounds really strange.

Most people who make useful progress at protein folding appear to use a relatively tool-boxy approach. And they all appear to believe that quantum mechanics provides a very good theory of protein folding. Or at least it would be, given unbounded computing power.

Why is something similar not true for intelligence?

Comment author: ChristianKl 01 July 2017 03:54:47PM 1 point

On LW we frequently invent new vocabulary in a way that's confusing for outsiders. It seems to me like "One Magisterium Bayesianism" is a new term that's not taken from anywhere and is likely relatively opaque.

Maybe it would make more sense to speak of Bayesian Monism?

Comment author: gjm 03 July 2017 12:54:15AM 1 point

Doesn't "monism" pretty much mean belief in only one kind of thing rather than employing only one procedure for finding truth? I think calling the position described here "Bayesian Monism" would be actively misleading.

Comment author: ChristianKl 03 July 2017 01:42:47AM 0 points

I think that tristanm doesn't just advocate Bayesianism as a method but advocates that reality is shaped in a way that its basic nature is represented by probability.

Comment author: MrMind 03 July 2017 01:37:18PM 0 points

that its basic nature is represented by probability.

This would be the exact opposite of what Bayesianism says (that is, probability is an optimal epistemic construction).

Comment author: gjm 03 July 2017 10:57:27AM 0 points

No doubt, but that isn't the same as monism. You could have a world made of many kinds of stuff in which Bayesian inference is optimal, or a world made of one kind of stuff in which Bayesian inference produces terrible results.

Comment author: username2 30 June 2017 12:45:06AM 1 point

One-Magisterium Bayes: A Defense of Tribalism

Comment author: tristanm 30 June 2017 12:57:35AM 4 points

Did you read the post? It should be pretty clear that I'm not advocating Bayesian fundamentalism (and I describe what I believe that means, and why it doesn't really square with actually being Bayesian).

Comment author: MrMind 03 July 2017 01:33:39PM 0 points

I side with you on this issue. It irks me all the time when the Bayesian foundations are vaguely criticized with an air of superiority, as if dismissing them is a sign of having transcended to some higher level of existence (neorationalists, I'm looking at you).
On the other hand, I could accept tool-boxing, in accordance with the principle of "one truth, many methods to find it" if and only if:

  • it effectively showed better results than the Bayesian methods
  • it wouldn't suddenly forget the decades-old findings on the fallibility of human intuitions.

On the other hand:

Should you treat “evidence” for a hypothesis, or “data”, as having probability 1?

This is provably true: P(X|X) = 1.

P(X) = P(X ∧ X) = P(X|X)P(X), so P(X|X) = 1 whenever P(X) > 0.

Comment author: tristanm 03 July 2017 06:06:36PM 0 points

That point was mostly referring to when you perform the "Bayesian update": the rule you use can be either strict conditionalization (P(H) = P(H|E)), which assumes P(E) = 1, or Jeffrey conditionalization (P(H) = P(H|E)P(E) + P(H|~E)P(~E)). The latter seems to be the most intuitively correct rule, but I guess there are some subtle issues with using that rule that I need to dive deeper into to really understand.

Comment author: MrMind 04 July 2017 12:40:53PM 0 points

The latter seems to be the most intuitively correct rule

So if I extract a red ball from an urn, should I condition the probability of finding a black ball in the next turn on not having extracted a red ball?

Besides, P(H) is most definitely not equal to P(H|E). P(H) is, on the other hand, demonstrably equal to P(H|E)P(E)+P(H|-E)P(-E), the usual law of total probability. I think we are talking about two completely different things here.

Comment author: tristanm 04 July 2017 01:52:22PM 0 points

I'm talking about the following issue, found at this link:

A. The problem of uncertain evidence. The Simple Principle of Conditionalization requires that the acquisition of evidence be representable as changing one's degree of belief in a statement E to one — that is, to certainty. But many philosophers would object to assigning probability of one to any contingent statement, even an evidential statement, because, for example, it is well-known that scientists sometimes give up previously accepted evidence. Jeffrey has proposed a generalization of the Principle of Conditionalization that yields that principle as a special case. Jeffrey's idea is that what is crucial about observation is not that it yields certainty, but that it generates a non-inferential change in the probability of an evidential statement E and its negation ~E (assumed to be the locus of all the non-inferential changes in probability) from initial probabilities between zero and one to Pf(E) and Pf(~E) = [1 − Pf(E)]. Then on Jeffrey's account, after the observation, the rational degree of belief to place in an hypothesis H would be given by the following principle:

Principle of Jeffrey Conditionalization: Pf(H) = Pi(H/E) × Pf(E) + Pi(H/~E) × Pf(~E) [where E and H are both assumed to have prior probabilities between zero and one]

Counting in favor of Jeffrey's Principle is its theoretical elegance. Counting against it is the practical problem that it requires that one be able to completely specify the direct non-inferential effects of an observation, something it is doubtful that anyone has ever done. Skyrms has given it a Dutch Book defense.
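A minimal sketch of Jeffrey conditionalization as stated in the quoted principle, using a made-up prior joint distribution over H and E. It shows the relation at issue in this exchange: setting Pf(E) = 1 recovers strict conditionalization, and setting Pf(E) equal to the prior Pi(E) leaves the probability of H unchanged, which is just the law of total probability.

```python
def jeffrey_update(pi_h_given_e, pi_h_given_not_e, pf_e):
    """Pf(H) = Pi(H|E) * Pf(E) + Pi(H|~E) * Pf(~E), with Pf(~E) = 1 - Pf(E).
    The conditional probabilities Pi(H|E) and Pi(H|~E) are held fixed."""
    return pi_h_given_e * pf_e + pi_h_given_not_e * (1.0 - pf_e)

# Hypothetical prior joint distribution over H and E (illustrative numbers).
p_h_and_e, p_h_and_not_e = 0.3, 0.1
p_not_h_and_e, p_not_h_and_not_e = 0.2, 0.4

pi_e = p_h_and_e + p_not_h_and_e                 # Pi(E), 0.5 here
pi_h_given_e = p_h_and_e / pi_e                   # Pi(H|E), 0.6 here
pi_h_given_not_e = p_h_and_not_e / (1.0 - pi_e)   # Pi(H|~E), 0.2 here

uncertain = jeffrey_update(pi_h_given_e, pi_h_given_not_e, 0.9)   # E becomes likely
strict = jeffrey_update(pi_h_given_e, pi_h_given_not_e, 1.0)      # E becomes certain
unchanged = jeffrey_update(pi_h_given_e, pi_h_given_not_e, pi_e)  # no shift in E
```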

Comment author: ChristianKl 01 July 2017 11:13:40AM 0 points

It feels to me like you argue from time to time against strawmen:

For axioms, Cox’s theorem merely requires you to accept Boolean algebra and Calculus to be true, and then you can derive probability theory as extended logic from that.

While probability extends basic logic, it doesn't extend advanced logic (predicate calculus), as David Chapman argues in Probability theory does not extend logic.

I’m skeptical about this, and the reason I’m skeptical is because if you really had a method for say, hypothesis generation, this would actually imply logical omniscience, and would basically allow us to create full AGI, RIGHT NOW.

This seems to confuse the idea of having a useful method for hypothesis generation with having a perfect method for hypothesis generation.

As far as I know, being able to do this would imply that P = NP is true, and as far as I know, most computer scientists do not think it’s likely to be true

Saying that you have one unified theory that can give you the correct hypothesis in every case without looking at all alternatives might be incompatible with P ≠ NP. On the other hand, P ≠ NP doesn't mean that there aren't subproblems for which there's an algorithm that finds a perfect, or at least a good, hypothesis.

If P ≠ NP, that supports the toolbox paradigm. Different tools will perform well for generating hypotheses in different domains, and there's no perfect unified theory.

Is the inability of Bayesian theory to provide a method for providing the correct hypothesis evidence that we can’t use it to analyze and update our own beliefs?

To argue that toolbox thinking is better, one doesn't need to argue that it's impossible to analyse and update beliefs with Bayesian thinking.

Comment author: MrMind 03 July 2017 01:09:57PM 0 points

While probability extends basic logic, it doesn't extend advanced logic (predicate calculus), as David Chapman argues in Probability theory does not extend logic.

I'm not convinced that probability cannot be made to extend to predicate calculus. You need to interpret "for every" and "exists" as transfinite "and" and "or", but these are not abstruse ingredients that are impossible to fit.
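Reading "for every" as a big "and" and "exists" as a big "or" is unproblematic on a finite domain, which this sketch illustrates. It assumes a finite set of objects and, purely for simplicity, that the atomic facts phi(a_i) are independent with given probabilities; the function name and inputs are hypothetical. It computes both quantified statements by summing over all possible worlds, and does not address the infinite (transfinite) case that the comment refers to.

```python
from itertools import product

def quantifier_probs(p_atoms):
    """Given independent probabilities that phi(a_i) holds for each object a_i in a
    finite domain, return (P(forall x. phi(x)), P(exists x. phi(x))) by summing the
    probability of every possible world (truth assignment to the atoms)."""
    p_forall = 0.0
    p_exists = 0.0
    for world in product((True, False), repeat=len(p_atoms)):
        weight = 1.0
        for holds, p in zip(world, p_atoms):
            weight *= p if holds else (1.0 - p)
        if all(world):   # 'for every' read as a conjunction over the domain
            p_forall += weight
        if any(world):   # 'exists' read as a disjunction over the domain
            p_exists += weight
    return p_forall, p_exists

print(quantifier_probs([0.9, 0.8, 0.7]))
```

As expected from logic, the universal statement is never more probable than any single instance, and the existential one is never less probable.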

Comment author: ChristianKl 03 July 2017 04:05:38PM 0 points

As Chapman describes the situation, various mathematicians have put a lot of effort into trying to make a system that extends probability to predicate calculus, but no one has succeeded in creating a coherent system.

There are two ways to disagree with that: 1) Point to a mathematician who actually successfully modeled the extension. 2) Say that no mathematician really tried to do that.

Comment author: MrMind 03 July 2017 04:33:46PM 0 points

Say that no mathematician really tried to do that.

I tend to lean towards this. There has been work to fix and strengthen Cox's theorem, as well as to extend probability to arbitrary preorders or other categories. I've yet to see someone try to extend probability to, say, intuitionistic or modal logic.

Comment author: tristanm 02 July 2017 05:37:34AM 0 points

There are two common types of strawmen arguments that I've encountered within this debate.

One is the strawman argument that Bayesians typically give against frequentists, where they show how a particular frequentist test gives the wrong answer on a particular problem, but a straightforward application of Bayes' theorem gives the right answer. Frequentists easily counter that a wiser frequentist would have used a different test for this problem that gives the right answer.

The other strawman argument is the one anti-Bayesians make, where they chastise Bayesians for claiming they have the complete theory of rationality / epistemology and no more work needs to be done. This is obviously false, since no Bayesian has ever claimed this, not even Jaynes. A complete theory would need ways to represent hypotheses, and ways to generate them, and the axioms of probability do not make any additional assumptions about what a hypothesis is.

I'm still looking for a well posed inference problem, where a straightforward application of Bayesian principles gives the wrong answer, but a straightforward application of a different set of principles gets the right answer.

Comment author: ChristianKl 02 July 2017 06:10:19PM 0 points

This seems a bit motte-and-bailey. In your post, you argue for Bayesianism as a theory of reasoning. Of course you can say that problems that you can't solve well with Bayesianism aren't well posed inference problems. Unfortunately, nature doesn't care about posing well posed inference problems.

Even if Bayesianism is better for a small subset of reasoning problems, that doesn't imply that it's good to reject tool-boxism.

Comment author: TheAncientGeek 03 July 2017 08:04:17AM 0 points

Yep. If Bayes only does one thing, you need other tools to do the other jobs. Which, by the way, implies nothing about converging, or not, on truth.

Comment author: TheAncientGeek 02 July 2017 08:30:46AM 0 points

"Bayesian" has more than one meaning.

What you have there is a defence of the Jaynesian variety, but Yudkowsky is making much stronger claims. For instance he thinks Bayes can replace science, but you can't replace science with inference alone.

Also, if Bayes is inference alone, it can't be the sole basis of intelligence.

Comment author: tadasdatys 01 July 2017 08:03:29AM 0 points

if you really had a method for say, hypothesis generation, this would actually imply logical omniscience, and would basically allow us to create full AGI, RIGHT NOW.

This is correct. Arguments against Bayesianism ultimately boil down to "it's not enough for AGI". And they are stupid, because nobody has ever said that it was. But then arguments in favor of Bayesianism boil down to "it's True". And they are stupid, because "True" is not quite the same as "useful". I think this whole debate is pointless, as there is very little the two sides disagree about, besides some wording.

Having said that, I think the question "how to reason well" should be seen as equivalent to "how to build an AGI", which probably places me on the anti-Bayesian side.

Comment author: MrMind 03 July 2017 01:45:25PM 0 points

Having said that, I think the question "how to reason well" should be seen as equivalent to "how to build an AGI", which probably places me on the anti-Bayesian side.

There are some equivalences, but there are also some differences. We already know some uncomputable methods that are optimal among all computable inference methods (Solomonoff induction, for instance), and they are perfectly Bayesian.
But when it comes to computable optimality, we are in deep water.
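The idea of an optimal predictor that is "perfectly Bayesian" can be made concrete with a computable toy: a Bayesian mixture over a small, finite hypothesis class, weighted by a simplicity prior. This is only an illustration under stated assumptions. The hypothesis class (repeating binary patterns up to length 4), the 2^-length prior, and the function names are all choices made for this sketch; it is not Solomonoff induction, which mixes over all programs and is uncomputable.

```python
from fractions import Fraction

def patterns(max_len):
    """Hypothesis class (an assumption for this sketch): every binary pattern of
    length 1..max_len, read as 'this pattern repeats forever'."""
    for n in range(1, max_len + 1):
        for i in range(2 ** n):
            yield format(i, f"0{n}b")

def predict_next_is_one(observed, max_len=4):
    """Posterior-predictive P(next bit = 1) under a Bayesian mixture with a
    simplicity prior of 2^-len(pattern) on each hypothesis."""
    total = Fraction(0)
    mass_on_one = Fraction(0)
    for pat in patterns(max_len):
        # Deterministic hypotheses: likelihood is 1 if consistent with the data, else 0.
        consistent = all(bit == pat[i % len(pat)] for i, bit in enumerate(observed))
        if not consistent:
            continue
        prior = Fraction(1, 2 ** len(pat))
        total += prior
        if pat[len(observed) % len(pat)] == "1":
            mass_on_one += prior
    if total == 0:
        raise ValueError("no hypothesis in the class is consistent with the data")
    return mass_on_one / total

print(predict_next_is_one(""))        # → 1/2 (no data, symmetric prior)
print(predict_next_is_one("1"))       # → 5/8
print(predict_next_is_one("010101"))  # → 0
```

Even this tiny class requires enumerating 2 + 4 + 8 + 16 hypotheses; the count grows exponentially with pattern length, which is one way to see why computable versions are the hard part.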

Comment author: TheAncientGeek 30 June 2017 02:28:14PM 0 points

return to old ones that have already been extensively argued for and discussed.

It has been extensively discussed, but a lot of people still think "Bayes is the One Epistemology to Rule them All" is the correct conclusion.

If there is no unified theory of intelligence, we are led towards the view that recursive self-improvement is not possible, since an increase in one type of intelligence does not necessarily lead to an improvement in a different type of intelligence.

Shouldn't you want to believe what is true, not what leads to some arbitrary end-point?

With a diversification in different notions of correct reasoning within different domains, it heavily limits what can be done to reach agreement on different topics. In the end we are often forced to agree to disagree, which while preserving social cohesion in different contexts, can be quite unsatisfying from a philosophical standpoint.

Same problem. If it is actually the case that there is more than one method of reasoning, why pretend otherwise?

Related to the previous corollary, it may lead to beliefs that are sacred, untouchable, or based on intuition, feeling, or difficult to articulate concepts.

So can Bayes. Just set your priors so high that you will never accumulate significant contradictory evidence during your lifetime.
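How far an extreme prior can entrench a belief is easy to quantify. This sketch assumes each contrary observation multiplies the odds on H by a fixed likelihood ratio (1/2 here, a hypothetical value) and counts how many such observations are needed before P(H) falls below one half. The function name and the `limit` cutoff are illustrative choices. A prior of exactly 1 can never be moved by Bayes' rule.

```python
from fractions import Fraction

def observations_to_doubt(prior_prob, likelihood_ratio, limit=10_000):
    """Number of independent contrary observations, each multiplying the odds on H
    by likelihood_ratio (< 1), before P(H | data) drops below 1/2.
    Returns None if that never happens within `limit` observations."""
    prior_prob = Fraction(prior_prob)
    likelihood_ratio = Fraction(likelihood_ratio)
    if prior_prob == 1:
        return None  # P(H) = 1: no finite evidence can lower it
    odds = prior_prob / (1 - prior_prob)
    for n in range(limit + 1):
        if odds < 1:  # odds below 1 means P(H) below 1/2
            return n
        odds *= likelihood_ratio
    return None

print(observations_to_doubt(1 - Fraction(1, 10**12), Fraction(1, 2)))  # → 40
print(observations_to_doubt(Fraction(1), Fraction(1, 2)))              # → None
```

Each additional factor of 10 in the prior odds costs only about 3.3 more halvings here, so "never accumulating significant contradictory evidence" in practice requires either enormous prior odds or weak evidence per observation.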

Often times, this guiding principle is based on intuition, which is a remarkably hard thing to pin down and describe well

Everything is based on intuition, ultimately. (eg your "Qualitative correspondence with common sense.").

Bayesian methods are often computationally intractable.

That's minor????

I would add that there are plenty of ways to generate hypotheses by other methods

There is a very simple argument: if you need to supplement Bayes with a method of hypothesis generation, then you are no longer using Bayes alone, and you are therefore not even using Strong Bayes (NB: strong) yourself.

E.T. Jaynes never considered this to be a flaw in Bayesian probability, per se. Rather, he considered hypothesis generation, as well as assigning priors, to be outside the scope of “plausible inference” which is what he considered to be the domain of Bayesian probability. He himself argued for using the principle of maximum entropy for creating a prior distribution, and there are also more modern techniques such as Empirical Bayes.

That just means Jaynes did not believe that Bayes is the One Epistemology to Rule Them All. So why do you?
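The maximum-entropy principle for building a prior, mentioned in the quoted passage, can be sketched for a standard toy case: a six-sided die about which only the long-run mean is known (4.5 here, an assumed constraint). Under a single mean constraint the maximum-entropy distribution has the exponential form p_i ∝ exp(λ·i), so the sketch only has to find λ, here by bisection. The function name, search bracket, and tolerance are choices for this illustration, not anything Jaynes prescribed.

```python
import math

def maxent_die(mean, faces=6, tol=1e-12):
    """Maximum-entropy distribution on faces 1..faces with the given mean.
    The solution has the form p_i proportional to exp(lam * i); lam is found by bisection."""
    if not 1 < mean < faces:
        raise ValueError("mean must lie strictly between 1 and faces")
    xs = range(1, faces + 1)

    def dist(lam):
        w = [math.exp(lam * x) for x in xs]
        z = sum(w)
        return [wi / z for wi in w]

    def mean_of(lam):
        return sum(x * p for x, p in zip(xs, dist(lam)))

    lo, hi = -50.0, 50.0          # mean_of is increasing in lam, so bisect on it
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_of(mid) < mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

print([round(p, 4) for p in maxent_die(4.5)])
```

With mean 3.5 the result is the uniform distribution (λ = 0); any other mean tilts the probabilities exponentially towards the high or low faces, while committing to nothing beyond the stated constraint.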

Comment author: MrMind 03 July 2017 02:03:37PM 0 points

Everything is based on intuition, ultimately. (eg your "Qualitative correspondence with common sense.").

That means, in the context of Cox's theorem, a very specific set of primitive intuitions about plausibilities, not that everything is reduced to how one feels about something.

Comment author: TheAncientGeek 03 July 2017 05:11:02PM 0 points

Yes and yes and no. Yes, because everything, not just Cox's theorem, is based on some foundational assumption, however much you try to eliminate unjustified propositions from your epistemology. Yes, because having an epistemology with a few bedrock assumptions is not the same as deciding every damn thing with gut feelings. No, because the plausibility, to you, of a plausible assumption that you cannot otherwise justify is not that different to a feeling.

Comment author: MrMind 04 July 2017 12:46:37PM 0 points

No, because the plausibility, to you, of a plausible assumption that you cannot otherwise justify is not that different to a feeling.

Well, that is true for any kind of axiom. In any case, it's a finite set of simple intuitions that define the content of "common sense" for Cox's theorem, so that if two people disagree, they can point exactly to the formulas they disagree on.

Comment author: TheAncientGeek 04 July 2017 03:00:54PM 0 points

Well, that is true for any kind of axiom.

That's rather the point. It saves time to assume something like that from the outset.

they can point exactly to the formulas they disagree on.

Which may lead to agreeing to disagree rather than convergence.

Comment author: MrMind 05 July 2017 07:36:18AM 0 points

I've rarely seen disagreement on basic axioms. p -> p seems to be rather uncontroversial, although it's based on 'intuition'.
On the other hand, that's the purpose of deduction: to reduce the need for intuition to only the smallest and least controversial set of assertions. This does not imply, as your original formulation seems to, that intuition can then be used for everything.

Comment author: TheAncientGeek 05 July 2017 08:24:17AM 0 points

This does not imply, as your original formulation seems to, that then intuition can be used for everything.

I never said anything of the kind. My point was that intuition is unavoidably involved in everything.

I've rarely seen disagreement on basic axioms.

Then check out the controversy over Euclid's fifth postulate, mathematical intuitionism, the Axiom of Choice, whether existence is a predicate, etc, etc.

p -> p seems to be rather uncontroversial, although it's based on 'intuition'.

Some would say that it's based on truth tables, and defies intuition!

See the logical versus material implication controversy:

http://www.askphilosophers.org/question/4103
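The truth-table reading can be enumerated directly. This sketch defines material implication as "not p or q" and checks that p -> p is true on every row, while p -> q also comes out true whenever p is false, which is the usual complaint that material implication departs from everyday "if... then". The function name `implies` is just a label chosen for the sketch.

```python
from itertools import product

def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q

# p -> p holds on every row of its truth table, i.e. it is a tautology.
p_implies_p = [implies(p, p) for p in (True, False)]

# Full truth table for p -> q; every row with a false antecedent comes out true.
table = [(p, q, implies(p, q)) for p, q in product((True, False), repeat=2)]

for p, q, r in table:
    print(f"{p!s:5} -> {q!s:5} : {r}")
```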

Comment author: MrMind 06 July 2017 09:46:35AM 0 points

I never said anything of the kind. My point was that intuition is unavoidably involved in everything.

On this I think we agree. I'll just add that sometimes "intuition" points to "a short mental calculation" and other times to "a biased heuristic". The fact that we can't tell which is which is the danger of accepting sentences like "intuition is the basis of everything".
I would prefer two different words for the two different kinds of intuition, but there aren't any.

Then check out the controversy over Euclid's fifth postulate, mathematical intuitionism [...]

Yes, but there has never been controversy over the first postulate... Some axioms are more basic than others. And indeed, challenged axioms produce strong revolutions.

Some would say that it's based on truth tables, and defies intuition!

This I don't know how to interpret. Truth tables are useful as long as they agree with the axioms. Or one could say that truth tables are based on intuition...

Comment author: TheAncientGeek 06 July 2017 10:49:03AM 0 points

On this I think we agree. I'll just add that sometimes "intuition" points to "a short mental calculation" and other times to "a biased heuristic".

It can mean either of those, but it can also mean an assumption you can neither prove nor do without.

Some axioms are more basic than others.

If what you want is convergence on objective truth, it is the existence of axioms that people don't agree on that is the problem.

And indeed challenged axioms produce strong revolutions.

And pluralism: intuitionistic and classical maths co-existing, Euclidean and non-Euclidean geometry co-existing.

Truth tables are useful as long as they agree with the axioms. Or one could say that truth tables are based on intuition...

Truth tables give you a set of logical functions, some of which resemble traditional logical connectives, such as "and" and "implies", to some extent. But only to some extent. The worry is that they don't capture all the features of ordinary language usage.

Comment author: turchin 30 June 2017 02:07:53PM 0 points

OP said: "If there is no unified theory of intelligence, we are led towards the view that recursive self-improvement is not possible, since an increase in one type of intelligence does not necessarily lead to an improvement in a different type of intelligence."

I think that some forms of self-improvement (SI) could be done without recursivity. I created a list of around 30 types of SI, starting from accelerating hardware and going up to creating better plans. Most of them are not naturally recursive.

If SI produces a limited 2x improvement at each level and does not use the recursivity option, that is still enough for a 2^30 improvement of the system, or roughly a billion-fold improvement.

(Below is a back-of-the-envelope, Fermi-style estimate; the numbers are rough and are given only to illustrate the idea.)

This means that a near-human-level AI could reach the power of around 1 billion humans without using the recursivity option. The power of 1 billion humans is probably more than the total power of all human science, in which around 50 million researchers work.

Such an AI would outperform human science 20 times over and could count as a superintelligence. Surely, such power is more than enough to kill everybody, or to solve most of humanity's important problems.
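The arithmetic in this estimate can be checked directly. The sketch uses the comment's own illustrative figures (a 2x gain from each of about 30 types of self-improvement, and about 50 million researchers) and assumes, as the comment does, that the gains simply multiply without interacting or hitting diminishing returns.

```python
# Illustrative figures from the comment; the multiplicative model is an assumption.
improvement_per_level = 2     # assumed 2x gain from each type of self-improvement
num_levels = 30               # "around 30 types of SI"
researchers = 50_000_000      # rough number of researchers in human science

total_gain = improvement_per_level ** num_levels   # independent gains multiply: 2^30
ratio_to_science = total_gain / researchers        # in units of "one human"

print(f"total gain: {total_gain:,}")                       # → total gain: 1,073,741,824
print(f"ratio to human science: {ratio_to_science:.1f}")   # → ratio to human science: 21.5
```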

Such self-improvement is achievable without understanding the nature of intelligence, and doesn't depend on the assumption that such understanding is needed for SI. So we can't use the argument about the messiness of intelligence as an argument for AI safety.

Comment author: MrMind 03 July 2017 02:00:41PM 0 points

I think that "recursive self-improvement" means that one improvement makes the AGI better at improving itself, not that it must use the same trick every time.
If accelerating hardware allows for better improvements along other dimensions, then better hardware is still part of recursive improvement.

Comment author: turchin 04 July 2017 10:27:44AM 1 point

Sure, but it is not easy to prove in each case. For example, if an AI doubles its hardware speed and also buys twice as much hardware, its total productivity would grow fourfold. But we can't say that the first improvement happened because of the second.

However, if it gets the idea that improving hardware is useful, that is a recursive act, since this idea helps further improvements. Moreover, it opens up a field of other ideas, like the improvement of improvement. That is why I say that true recursivity happens at the level of ideas, not at the hardware level.

Comment author: whpearson 04 July 2017 04:39:53PM 0 points

As the resident lesswrong "Won't someone think of the hardware" person this comment rubs me up the wrong way a fair bit.

First, there is no single well-defined thing called hardware speed. Hardware speed might refer to various things: clock speed, operations per second, memory bandwidth, memory response times. Depending on what your task is, your productivity might be bottlenecked by one of these and not the others. Some things, like memory response times, are due to the speed of signals traversing the motherboard and are hard to improve while we still have the separation of memory and processing power.

Getting twice the hardware might yield less than twice the improvement. If there is some serial process, then Amdahl's law comes into effect. If the different nodes need to make sure they have a consistent view of something, you need to add latency so that a sufficient number of them can agree on the state of the data via a consensus algorithm.
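Amdahl's law makes the "less than twice the improvement" point quantitative: if only a fraction of the work can run in parallel, doubling the hardware speeds up only that fraction. The parallel fractions below are hypothetical values chosen for illustration; the sketch ignores the consensus and communication overheads mentioned above, which would lower the speedup further.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Amdahl's law: overall speedup when only `parallel_fraction` of the work
    can be spread across `n_workers`; the serial remainder is unaffected."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_workers)

for p in (1.0, 0.9, 0.5):
    print(f"parallel fraction {p:.1f}: 2x hardware -> {amdahl_speedup(p, 2):.2f}x speedup")
```

Only the fully parallel case (the "embarrassingly parallel" situation, such as computing hashes) reaches a clean 2x; with 10% serial work, doubling the hardware gives about 1.82x.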

Your productivity might be bottlenecked by external factors rather than processing power at all (for example, not getting data fast enough). This is my main beef with the sped-up people thought experiment: the world is moving glacially for them and data is coming in at a trickle.

If you are searching a space and you add more compute, you might end up searching less promising areas with the new compute, so you might not get twice the productivity.

I really would not expect twice the compute to lead to twice the productivity except in the most embarrassingly parallel situation like computing hashes.

I think your greater point is weakened, but not by much. We have lots of problems trying to distribute and work on problems together, so human intelligence is not purely additive either.

Comment author: turchin 04 July 2017 07:45:50PM 0 points

Thanks for elaborating. I agree that doubling hardware speed will not actually produce twice the intelligence. I used this oversimplified example of hardware acceleration as an example of non-recursive self-improvement, and diminishing returns only underline its non-recursive nature.