Help with Bayesian priors

WikiLogicOrg

14th Aug 2016

I posted before about an open source decision-making website I am working on called WikiLogic. The site has a 2 minute explanatory animation if you are interested. I won't repeat myself, but the tl;dr is that it will follow the Wikipedia model of allowing everyone to collaborate on a giant connected database of arguments, where previously established claims can be used as supporting evidence for new claims.

The raw deduction element of it works fine and would be great in a perfect world where such a thing as absolute truths existed. In reality, however, we normally have to deal with claims that are just the most probable. My program allows opposing claims to be connected and then evidence to be gathered for each. The evidence will create a probability of each claim being correct, and whichever is highest gets marked as the best answer. Principles such as Occam's Razor are applied automatically: a long list of claims used as evidence will be less likely, as each claim will have its own likelihood, which will dilute the strength of the whole.
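The dilution effect can be sketched in a few lines. This is only an illustration under a strong assumption that the post does not state: that the supporting claims are independent, so their probabilities simply multiply. The function name and the numbers are hypothetical and are not taken from WikiLogic.

```python
from math import prod

def joint_probability(claim_probabilities):
    """Probability that every supporting claim holds, ASSUMING independence."""
    return prod(claim_probabilities)

# Hypothetical numbers: a short argument resting on two fairly likely claims...
short_argument = [0.9, 0.9]
# ...versus a long argument resting on six claims of the same strength.
long_argument = [0.9] * 6

print(joint_probability(short_argument))
print(joint_probability(long_argument))
```

Under that assumption, every extra claim multiplies the total by a number below 1, so the longer chain can only come out weaker. If the claims are correlated, the product is no longer the right combination rule.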

However, my only qualification in this area is my passion and I am hitting a wall with some basic questions. I am not sure if this is the correct place to get help with these. If not, please direct me somewhere else and I will remove the post.

The arbitrarily chosen example claim I am working with is whether “Alexander the Great existed”. This has the useful properties of 1: an expected outcome (that he existed - although, perhaps my problem is that this is not the case!) and 2: it relies heavily on probability as there is little solid evidence.

One popular claim is that coins were minted with his face on them. I want to use Bayes to find how likely a face appearing on a coin is for someone who existed. As I understand it, there should be 4 combinations:

1. Existed; had a coin minted
2. Existed; did not have a coin minted
3. Did not exist; had a coin minted
4. Did not exist; did not have a coin minted

The first issue is that there are infinitely many people who never existed and did not have a coin made. If I narrow it to historic figures who turned out not to exist and did not have a coin made, it becomes possible, but it also becomes subjective as to whether someone actually thought they existed. For example, did people believe the Minotaur existed?

Perhaps I should choose another filter instead of historic figure, like humans that existed. But picking and choosing the category is again so subjective. Someone may also argue that inequality for women back then was so great that the data should only look at men, as a woman’s chance of being portrayed on a coin was skewed in a way that isn’t applicable to men.
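To make the worry concrete, here is a minimal sketch of how a frequency estimate of "had a coin minted, given did not exist" moves as the filter widens. The counts are hypothetical placeholders, not historical data, and the function name is made up for illustration.

```python
def p_coin_given_not_existed(coined_nonexistent, uncoined_nonexistent):
    """Frequency estimate of P(coin | did not exist) from two cell counts."""
    return coined_nonexistent / (coined_nonexistent + uncoined_nonexistent)

# HYPOTHETICAL placeholder counts. The number of non-existent figures WITH a
# coin is held fixed; only the size of the "no coin" cell changes with the filter.
coined_nonexistent = 5
for uncoined_nonexistent in (5, 50, 5000):
    estimate = p_coin_given_not_existed(coined_nonexistent, uncoined_nonexistent)
    print(uncoined_nonexistent, round(estimate, 4))
```

The wider the category of non-existent figures, the larger the "no coin" cell and the smaller the estimate, which is why the choice of filter matters so much here.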

I hope I have successfully communicated the problem I am grappling with and what I want to use it for. If not, please ask for clarifications. A friend in academia suggested that this touches on a problem with Bayes priors that has not been settled. If that is the case, are there any suggested resources for a novice with limited free time to start to explore the issue? References to books or other online resources or even somewhere else I should be posting this kind of question would all be gratefully received. Not to mention a direct answer in the comments!

[-]MrMind9y20

Let:
E = Alexander the Great actually existed
C = A coin is minted with the face of Alexander the Great
~X = not X, for any statement X

You want to know P(E|C), which by Bayes is

P(C|E)P(E)/P(C)

which by partition of unity is

P(C|E)P(E) / (P(C|E)P(E) + P(C|~E)P(~E))

So you need only to assess 1 (existed, had a coin) and 3 (not existed, had a coin), where P(E) is your prior probability that Alexander existed. There's no need to do complicated categorical reasoning.

I think that P(C|E) is pretty close to 1, I would suspect that any great emperor worth his salt would have minted a coin with his head on it.
So the crucial point becomes P(C|~E), as it should be: what is the probability that a coin would be minted with a mythological figurehead? That is what is to be analyzed and that is what will drive the probability of his existence.
Let's say that we know that it's definitely possible that Alexander the Great existed, and it's also definitely possible that he did not exist, but we know nothing else (that is, coins are the first evidence we want to introduce). This would put P(E) = P(~E) = 1/2. Taking P(C|E) = 1 and leaving P(C|~E) = x unknown, we get:

P(E|C) = P(C|E)P(E) / (P(C|E)P(E) + P(C|~E)P(~E)) =>
P(E|C) = .5 / (.5 + .5x) =>
P(E|C) = 1 / (1 + x)

All this is coherent with common sense: if the ancients were happily minting mythic figures all the time (x close to 1), this means that P(E|C) = .5 = P(E), and coins cannot be used as an argument for Alexander's existence, while on the other hand if it almost never happened (x close to 0), then a coin becomes a sure sign of his historicity, with intermediate values (say x = .3) increasing the odds by a small or a great percentage.
Notice one thing about my model: I've assumed that P(C|E) is close to 1, and this means that a coin can only increase the probability of Alexander existing, however slightly, but never decrease it. This wouldn't have been true if there was a possibility that a great emperor did not have a coin minted with his face, which would be your case no. 2.
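The calculation is easy to check numerically. The following is a minimal sketch, with a made-up function name, that evaluates the expanded Bayes formula for a prior of 1/2, P(C|E) = 1, and a few values of x = P(C|~E):

```python
def posterior_existence(prior, p_coin_if_existed, p_coin_if_not_existed):
    """P(E|C) via Bayes' rule with the denominator expanded over E and ~E."""
    numerator = p_coin_if_existed * prior
    evidence = numerator + p_coin_if_not_existed * (1.0 - prior)
    return numerator / evidence

# x stands for P(C|~E); the prior of 1/2 and P(C|E) = 1 follow the comment.
for x in (0.0, 0.3, 1.0):
    print(f"x = {x}: P(E|C) = {posterior_existence(0.5, 1.0, x):.3f}")
```

With P(C|E) = 1 the result can never drop below the prior. Passing a P(C|E) that is smaller than P(C|~E) gives a posterior below the prior, in which case the coin would count against existence.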

[-]Gram_Stone9y20

I'm pretty sure that you're actually asking some deep questions right now.

I'm not all that well-versed in epistemology or probability theory, but when you write:

A friend in academia suggested that this touches on a problem with Bayes priors that has not been settled.

I think this is a reference to the problem of priors.

I think 'a problem with Bayes priors that has not been settled' kind of understates the significance.

And:

The first issue is that there are infinitely many people who never existed and did not have a coin made. If I narrow it to historic figures who turned out not to exist and did not have a coin made, it becomes possible, but it also becomes subjective as to whether someone actually thought they existed. For example, did people believe the Minotaur existed? Perhaps I should choose another filter instead of historic figure, like humans that existed. But picking and choosing the category is again so subjective. Someone may also argue that inequality for women back then was so great that the data should only look at men, as a woman’s chance of being portrayed on a coin was skewed in a way that isn’t applicable to men.

I believe this is referred to as the reference class problem. It seems that in a Bayesian framework, the reference class problem is something of a subproblem of the problem of priors. It seems that you're only trying to define a suitable reference class in the first place because you're trying to produce a reasonable prior.

It's my understanding that one approach to the problem of priors has been to come up with a 'universal' prior, a prior which is reasonable to adopt before making any observations of the world. One example is Solomonoff's algorithmic probability. It seems, however, that even this may not be a satisfactory solution, because this prior defies our intuition in some ways. For example, humans might find it intuitive to assign nonzero prior probability to uncomputable hypotheses (e.g. our physics involves hypercomputation), but algorithmic probability only assigns nonzero probability to computable hypotheses, so an agent with this prior will never be able to have credence in uncomputable hypotheses. Another problem is that, with this prior, hypotheses are penalized for their complexity, but utility can grow far more quickly than complexity. Increasing the number of happy people in a program from 1000 people to 1,000,000 people seems to increase its utility a lot without increasing its complexity by much. Taking this up to larger and larger numbers that become difficult to intuitively comprehend, it may be that such a prior would result in agents whose decision-making is dominated by very improbable outcomes with very high utilities. We can also ask if this apparently absurd result is a point against Solomonoff induction or if it's a point against how humans think, but if we humans are thinking the right way, we still don't know what it is that's going right inside of our heads and how it compares to Solomonoff induction.
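The point about utility outgrowing complexity can be illustrated with a toy calculation. This is not Solomonoff induction itself: it assumes a made-up family of hypotheses where one costing k extra bits gets prior weight 2^-k and promises a utility of 2^(2^k). The sketch works in log base 2 to avoid enormous numbers.

```python
def log2_expected_contribution(extra_bits):
    """log2 of (prior weight * utility) for a TOY family of hypotheses.

    Assumptions (illustrative only): a hypothesis costing `extra_bits` bits
    gets prior weight 2**-extra_bits and promises utility 2**(2**extra_bits).
    """
    log2_prior = -extra_bits          # complexity penalty, linear in bits
    log2_utility = 2 ** extra_bits    # promised utility, exponential in bits
    return log2_prior + log2_utility

for bits in (1, 2, 4, 8):
    print(bits, log2_expected_contribution(bits))
```

In this toy family the weighted utility keeps growing as the hypotheses get more complex and less probable, which is the shape of the worry about decisions being dominated by very improbable, very high-utility outcomes.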

For any other readers, sorry if I'm mistaken on any of this, it is quite technical and I haven't really studied it. Do correct me if I've made a mistake.

Back to my point, I think that you accidentally hit upon a problem that doesn't seem to take too many prerequisites to initially run into, and that, after a bit of squinting, turns out to be way harder than it seems it should be at first glance, given the small number of prerequisites necessary to realize that the problem exists. Personally, I would consider the task of providing a complete answer to your questions an open research problem.

Stanford Encyclopedia of Philosophy's entry on Bayesian epistemology might be some good reading for this.

[-]WikiLogicOrg9y00

Thanks for taking the time to write all that for me. This is exactly the nudge in the right direction I was looking for. I will need at least the next few months to cover all this and all the further Google searches it sends me down. Perfect, thanks again!

[-]ChristianKl9y20

I think reading Tetlock's book Superforecasting could help you, because it talks about how his superforecasters made their decisions and outperformed the CIA analysts.

References to books or other online resources or even somewhere else I should be posting this kind of question would all be gratefully received.

How about http://stats.stackexchange.com/?

[-]WikiLogicOrg9y00

Thanks for the suggestion. Added to reading list and commented on the stats site.

[-]entirelyuseless9y20

With the Alexander the Great issue, you not only have the question of the probability that someone would have a coin minted and so on, but also the probability that he actually did have a coin minted with his face on it, given that someone says he did. And you cannot assign that to 100% even if you have seen the coin yourself, because it may not have been intended to be the face of Alexander the Great, but someone else.
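One simple way to fold that extra uncertainty into the estimate is sketched below. It rests on an assumption that is not in the comment: a misattributed coin tells us nothing about Alexander, so in that case the belief stays at the prior. The function name and the numbers are hypothetical.

```python
def posterior_with_uncertain_attribution(prior, posterior_if_genuine, p_genuine):
    """Mix the updated and un-updated beliefs by trust in the attribution.

    ASSUMPTION: if the coin does not really depict Alexander, it carries no
    information about his existence, so the belief stays at `prior`.
    """
    return p_genuine * posterior_if_genuine + (1.0 - p_genuine) * prior

# Hypothetical inputs: prior 0.5, posterior 0.77 if the coin is genuine,
# and an 80% chance that the attribution is correct.
print(posterior_with_uncertain_attribution(0.5, 0.77, 0.8))
```

The result always lies between the prior and the posterior for a genuine coin, so doubt about the attribution shrinks the update without reversing it.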

And basically all of those probabilities are subjective, and cannot be made objective. So your project might work for an individual to map out their own beliefs, allowing them to assign subjective probabilities to things they consider priors. But it cannot work for a community, because people will disagree with the priors you assign, and so there will be no reason for them to agree with the probabilities your program comes up with.

[-]WikiLogicOrg9y00

Sure, but why will they disagree? If I say there is a 60% chance of x and you say no, it is more like 70%, then I can ask you why you think it's 10% more likely. I know many will say "it's just a feeling", but what gives that feeling? If you ask enough questions, I am confident one can drill down to the reasoning behind the feeling of discomfort at a given estimate. Another benefit of WL is that it should help people get better at recognizing and understanding their subconscious feelings so they can be properly evaluated and corrected. If you do not agree, it would be really interesting to hear your thoughts on this. Thanks

[-]TheAncientGeek9y30

Sure, but why will they disagree?

From the correct perspective, it is more extraordinary that anyone agrees.

If I say there is a 60% chance of x and you say no, it is more like 70%, then I can ask you why you think it's 10% more likely. I know many will say "it's just a feeling", but what gives that feeling? If you ask enough questions, I am confident one can drill down to the reasoning behind the feeling of discomfort at a given estimate.

Yes, but that is not where the problems stop, it is where they get really bad. Object level disagreements can maybe be solved by people who agree on an epistemology. But people aren't in complete agreement about epistemology. And there is no agreed meta epistemology to solve epistemological disputes; that's done with the same epistemology as before. And that circularity means we should expect people to inhabit isolated, self-sufficient philosophical systems.

benefit of WL is it should help people get better at recognizing and understanding their subconscious feelings so they can be properly evaluated and corrected.

Corrected by whose definition of correct? Do you not see that you are assuming you will suddenly be able to solve the foundational problems that philosophers have been wrestling with for millennia?

If you do not agree, it would be really interesting to hear your thoughts on this. Thanks

[-]WikiLogicOrg9y00

From the correct perspective, it is more extraordinary that anyone agrees.

Correct by whose definition? In a consistent reality that is possible to make sense of, one would expect evolved beings to start coming to the same conclusions.

Corrected by whose definition of correct?

From this question I assume you are getting at our inability to know things and the idea that what is correct for one may not be for another. That is a big discussion, but let me say that I premise this on the idea that a true skeptic realizes we cannot know anything for sure, and that is a great base to start building our knowledge of the world from. That vastly simplifies the world and allows us to build it up again from some very basic axioms. If it is the case that your reality is fundamentally different from mine, we should learn this as we go. Remember that there is actually only one reality - that of the viewers.

Do you not see that you are assuming you will suddenly be able to solve the foundational problems that philosophers have been wrestling with for millennia?

There were many issues wrestled with for millennia that were suddenly solved. Why should this be any different? You could ask me the opposite question of course, but that attitude is not the one taken by any human who ever discovered something worthwhile. Our chances of success may be tiny but they are better than zero, which is what they would be if no one tries. Ugh... I feel like I am writing inspirational greeting card quotes but the point still stands!

Object level disagreements can maybe be solved by people who agree on an epistemology. But people aren't in complete agreement about epistemology. And there is no agreed meta epistemology to solve epistemological disputes; that's done with the same epistemology as before.

Are there any resources you would recommend for me as a beginner to learn about the different views or, better yet, a comparison of all of them?

[-]TheAncientGeek9y00

Correct by whose definition? In a consistent reality that is possible to make sense of, one would expect evolved beings to start coming to the same conclusions.

I wouldn't necessarily expect that, for the reasons given. You have given a contrary opinion, not a counter argument.

From this question I assume you are getting at our inability to know things and the idea that what is correct for one may not be for another. That is a big discussion, but let me say that I premise this on the idea that a true skeptic realizes we cannot know anything for sure, and that is a great base to start building our knowledge of the world from.

I don't see how it addresses the circularity problem.

That vastly simplifies the world and allows us to build it up again from some very basic axioms.

Or that. Is everyone going to be on the same axioms?

If it is the case that your reality is fundamentally different from mine, we should learn this as we go. Remember that there is actually only one reality - that of the viewers.

The existence of a single reality isn't enough to guarantee convergence of beliefs for the reasons given.

Do you not see that you are assuming you will suddenly be able to solve the foundational problems that philosophers have been wrestling with for millennia?

There were many issues wrestled with for millennia that were suddenly solved. Why should this be any different?

That doesn't make sense. The fact that something was settled eventually doesn't mean that your problems are going to be settled at a time convenient for you.

You could ask me the opposite question of course, but that attitude is not the one taken by any human who ever discovered something worthwhile. Our chances of success may be tiny but they are better than zero, which is what they would be if no one tries. Ugh... I feel like I am writing inspirational greeting card quotes but the point still stands!

Yes I feel that you are talking in vague but positive generalities.

[-]WikiLogicOrg9y00

Yes I feel that you are talking in vague but positive generalities.

First, on a side note, what do you mean by "but positive"? As in idealistic? Excuse my vagueness. I think it comes from trying to cover too much at once. I am going to pick on a fundamental idea I have and see your response, because if you update my opinion on this, it will cover many of the other issues you raised.

I wrote a small post (www.wikilogicfoundation.org/351-2/) on what I view as the starting point for building knowledge. In summary it says our only knowledge is that of our thoughts and the inputs that influence them. It is in a similar vein to "I think therefore I am" (although maybe it should be "thoughts, therefore thoughts are" to keep the pedants happy). I did not mention it in the article, but if we try and break it down like this, we can see that our only purpose is to satisfy our urges. For example, if we experience a God telling us we should worship them and be 'good' to be rewarded, we have no reason to do this unless we want to satisfy our urge to be rewarded. So no matter our beliefs, we all have the same core drive - to satisfy our internal demands. The next question is whether these are best satisfied cooperatively or competitively. However, I imagine you have a lot of objections thus far, so I will stop to see what you have to say about that. Feel free to link me to anything relevant explaining alternate points of view if you think a post will take too long.

[-]TheAncientGeek9y00

What I mean by "vague but positive" is that you keep saying there is no problem, but not saying why.

I wrote a small post (www.wikilogicfoundation.org/351-2/) on what I view as the starting point for building knowledge. In summary it says our only knowledge is that of our thoughts and the inputs that influence them.

That's a standard starting point. I am not seeing anything that dissolves the standard problems.

So no matter our beliefs, we all have the same core drive - to satisfy our internal demands.

We all have the same meta-desire, whilst having completely different object level desires. How is that helping?

[-]entirelyuseless9y00

I don't agree. If you're right, we can do it right here and now, since we do not agree, which means that we are giving different probabilities of your project working -- in particular, I say the probability of your project being successful is very close to zero. You presumably think it has some reasonable probability.

I think the probability is close to zero because trying to "drill down" to force agreement between people results in fights, not in agreement. But to address your argument directly, each person will end up saying "it is just a feeling" or some equivalent, in other words they will each support their own position by reasons which are effective for them but not for the other person. You could argue that in this case they should each adopt a mean value for the probability, or something like that, but neither will do so.

And since I have given my answer, why do you think there is a reasonable probability that your project will succeed?

[-]WikiLogicOrg9y00

I think the probability is close to zero because trying to "drill down" to force agreement between people results in fights, not in agreement.

We are not in agreement here! Do you think it's possible to discuss this and have one or both of us change our initial stance, or will that attempt merely result in a fight? Note, I am sure it is possible for it to result in a fight, but I do not think it's a foregone conclusion. On the contrary, I think most worthwhile points of view were formed by hearing one or more opposing views on the topic.

they will each support their own position by reasons which are effective for them but not for the other person

Why must that be the case? On a shallow level it may seem so, but I think if you delve deep, you can find a best case solution. Can you give an example where two people must fundamentally disagree? I suspect any example you come up with will have a "lower level" solution where they will find it is not in their best interest. I recognize that the hidden premise in my thinking, that agreement is always possible, stems from the idea that we are all trying to reach a certain goal, that a true(er) map of reality helps us get there, and that cooperation is the best long term strategy.

[-]entirelyuseless9y00

I agree that we are not in agreement. And I do think that if we continue to respond to each other indefinitely, or until we agree, it will probably result in a fight. I admit that is not guaranteed, and there have been times when people that I disagree with changed their minds, and times when I did, and times when both of us did. But those cases have been in the minority.

"We are all trying to reach a certain goal and a truer map of reality helps us get there..." The problem is that people are interested in different goals and a truer map of reality is not always helpful, depending on the goal. For example, most of the people I know in real life accept false religious doctrines. One of their main goals is fitting in with the other people who accept those doctrines. Accepting a truer map of reality would not contribute to that goal, but would hinder it. I want the truth for its own sake, so I do not accept those doctrines. But they cannot agree with me, because they are interested in a different goal, and their goal would not be helped by the truth, but hindered.

[-]WikiLogicOrg9y00

I find it more likely that the times it degenerates into a fight are due to a lack of ability in one of the debaters. The alternative is to believe that people like ourselves are somehow special. It is anecdotal, but I used to be incredibly stubborn until I met some good teachers and mentors. Now I think the burden of proof lies on the claim that, despite our apparent similarities, a large portion of humans are incapable of being reasoned with no matter how good the teacher or delivery. Of course I expect some people physically cannot reason due to brain damage or whatever. But these are a far smaller group than what I imagine you are suggesting.

I would claim their main goal is not fitting in but achieving happiness, which they do by fitting in (albeit this may not be the optimal path). And I claim this is your goal as well. If you can accept that premise, we again have to ask if you are special in some way for valuing the truth so highly. Do you not aim to be happy? I think you and I also have the same core goal; we just realize that it's easier to navigate to happiness with a map that closely matches reality. Everybody benefits from a good map. That is why a good teacher can convert bull-headed people like I used to be, by starting with providing tools for mapping reality, such as education in fallacies and biases. When packaged in an easy to digest manner, tools that help improve reality maps are so useful that very few will reject them, just like very few people reject how to add and subtract.

[-]ChristianKl9y10

It is anecdotal, but I used to be incredibly stubborn until I met some good teachers and mentors.

I guess when you say stubborn you mean that you tried to be independent and didn't listen to other people. That's not the issue with the person who's religious because most of his friends are religious.

Now I think the burden of proof lies on the claim that, despite our apparent similarities, a large portion of humans are incapable of being reasoned with no matter how good the teacher or delivery.

A good teacher who teaches well can get a lot of people to adopt a specific belief, but that doesn't necessarily mean that the students get the belief through "reasoning". If the teacher taught a different belief about the concept, he would get that across too.

Now I think the burden of proof lies on the claim that, despite our apparent similarities, a large portion of humans are incapable of being reasoned with no matter how good the teacher or delivery.

What evidence do you have that education in fallacies or biases helps people think better? There seem to be many people who want to believe that's true, but as far as I know the decision science literature doesn't consider that belief to be true.

[-]entirelyuseless9y00

You seem to be proposing a simplistic theory of goals, much like the simplistic theory of goals that leads Eliezer to the mistaken conclusion that AI will want to take over the world.

In particular, happiness is not one unified thing that everyone is aiming at, that is the same for them and me. If I admit that I do what I do in order to be happy, then a big part of that happiness would be "knowing the truth," while for them, that would be only a small part, or no part at all (although perhaps "claiming to possess the truth" would be a part of it for them -- but it is really not the same to value claiming to possess the truth, and to value the truth.)

Additionally, using "happiness" as it is typically used, I am in fact less happy on account of valuing the truth more, and there is no guarantee that this will ever be otherwise.

[-]DanArmak9y10

I'm late to this debate; it's been well analyzed on the meta level but I want to add something on the object level. The question of 'were coins of Alexander minted?', which you want as your input data, sounds like a very simple one, but it may not be. The analysis of ancient coins in large part assumes a known history and dating.

The coins called 'Alexander coins' today didn't carry the image of Alexander, but of Herakles. They had Alexander's name on them, and sometimes the title 'King', but that's just two words; not strong evidence of who exactly the coin-minters thought Alexander was and what he was supposed to have accomplished. So you have to separate 'some Greek King named Alexander existed' at one extreme, from 'he did what's commonly ascribed to him and is the person we think of as Alexander the Great' at the other.
These coins were minted in many countries for hundreds of years, becoming something of a common coinage. People probably kept minting them because they were a standard coin, not because they really wanted to honor Alexander 300 years after his death. So the existence of most coins isn't independent evidence.
When comparing coins minted of other people, you need to factor in the different amounts of coins minted due to different technology and economy, the differential survival of the coins over time, and the probability of us finding them.
How do we identify a coin as being Alexandrian? Usually by matching it to other coins we've already identified. How do we identify the first Alexandrian coin we've ever seen? By noticing the name Alexander on it. What's the probability coins were minted of Alexander, perhaps bearing his likeness (and not that of Herakles), but without his name? Putting the name of a king on a coin (whether or not he was pictured) became common (but not universal) somewhat after Alexander.
There were five kings of ancient Macedonia named Alexander, the third of them being the Great. Not to mention some non-Macedonian Alexanders. Coins with the name Alexander don't specify which one is meant, or even 'of Macedonia'. (A minority was actually minted in Macedonia, but if lots of other countries were minting coins of a foreign monarch, who's to say Macedonia didn't as well?) So the dating of the coins has to be precise enough (to within 50 years or so), which can be really hard (by the time you've found a coin who knows where it's been?)

Clarification: I don't seriously doubt the historicity of Alexander, but I'm also not versed in the subject, so I'm just going by expert opinion. My point is more that coins are really complex and/or weak as a source of evidence, and handling P(Alexander existed|Coins of Alexander) is really hard.

[-]WikiLogicOrg9y00

The issues you raised are interesting but actually make this a pretty good example of my problem - how do you account for weak evidence and assign it a proper likelihood? One way I am testing this is by taking an example which I think is agreed to be 'most likely' (that he existed as opposed to not existing). Then I want to work backwards and see if there is a method for assessing probability that seems to work well on small scale questions, like the probabilities of minted coins, and gives me the expected answer when I add it all together.

At this point I am still trying to work out the objective priors issue. The method either needs to be immediately agreeable to all potential critics or have an open and fair way of arguing over how to formulate the answer. When I work that out I will move to the next stages, although there is no guarantee I will keep using the Alexander example.

[-]DanArmak9y00

My point was that 'probability of minted coins' isn't a much "smaller-scale" question than 'probability of Alexander', that is, it isn't much simpler and easier to decide.

In our model of the world, P(coins) doesn't serve as a simple 'input' to P(Alexander). Rather, we use P(Alexander) to judge the meaning of the coins we find. This is true not only on the Bayesian level, where all links are bidirectional, but in our high-level conscious model of the world, where we can't assign meaning to a coin with the single word Alexander on it without already believing that Alexander did all the things we think he did.

There's very little you can say about these coins if you don't already believe in Alexander.

[-]buybuydandavis9y00

But picking and choosing the category is again so subjective.

No. Use all information available. What problem are you actually looking to analyze? What information do you have?

Someone may also argue that inequality for women back then was so great that the data should only look at men, as a woman’s chance of being portrayed on a coin was skewed in a way that isn’t applicable to men.

That may be some useful information to include. Willfully ignoring relevant information, or not seeing how to use some information that seems like it may be relevant does not mean that the problem is "subjective", it means that we are often lazy and confused. And that's fine.

Include what you can transform into meaningful probabilities.

That thinking is hard is not a problem unique to Bayesian methods.

[-]WikiLogicOrg9y00

Who decides on what information is relevant? If I said I want to use men without beards and Alexander never had one, that would be wrong (at least my intuition tells me it would be), as I am needlessly disregarding information in a way that skews the results. You say use all the info, but what about collecting info on items such as a sword or a crown? I feel that is not relevant and I think most would agree. But where to draw the line? Gram_Stone pointed me to the reference class problem, which is exactly the issue I face.

[-]buybuydandavis9y00

I don't recall Jaynes discussing it much. Anyone?

For him, I think the reference class is always the context of your problem. Use all information available.

A brief google for "jaynes reference class" turned up his paper on The Well Posed Problem.

http://bayes.wustl.edu/etj/articles/well.pdf

He analyzes the Bertrand Paradox, and finds that in the real world, the mathematical "paradox" is resolved by identifying the transformation group (and thereby prior) that in reality is applicable.

My take on this is that "non-informative priors" and "principle of indifference" are huge misnomers. Priors are assertions of information, of transformation groups or equivalence classes believed appropriate to the problem. If your prior is "gee I don't know and don't care", then you're just making shit up.

[-]WikiLogicOrg9y00

Thanks for the links and info. I actually missed this last time around, so I cannot comment much more until I get a chance to research Jaynes and read that link.

[-]ZeitPolizei9y00

Yeah, the estimates will always be subjective to an extent, but whether you choose historic figure, or all humans and fictional characters that ever existed or whatever, shouldn't make huge differences to your results, because, in Bayes' formula, the ratio P(C|E)/P(C) ¹ should always be roughly the same, regardless of filter.

¹ C: coin exists
E: person existed
