I see how this will work for a continuous distribution like the beta distribution. Visually, the effect of a large number of samples is that the curve becomes more sharply peaked around its most probable region: outlying values become improbable more quickly as we move outwards.
But then this must mean that the discrete, "perfect", "infinite-sample" likelihood distribution used in the Wikipedia example must have a very strong influence on the posterior, almost drowning out the effect of the prior. Do I reason correctly here?
And does this "infinite-sample" likelihood distribution really have such a strong effect in the Wikipedia example? (I don't know how to judge this)
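To make the sharpening effect concrete, here is a minimal sketch of how a beta distribution over a machine's defect rate narrows as the sample grows. The uniform Beta(1, 1) prior, the 5% observed defect fraction, the sample sizes and the function names are all assumptions chosen for illustration, not anything taken from the Wikipedia example.

```python
from math import sqrt

def beta_mean_sd(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, sqrt(variance)

def posterior_after(n_samples, n_faulty, prior_alpha=1.0, prior_beta=1.0):
    """Beta posterior over the defect rate after observing n_faulty out of n_samples.

    Assumes a Beta(prior_alpha, prior_beta) prior; Beta(1, 1) is uniform.
    """
    return beta_mean_sd(prior_alpha + n_faulty,
                        prior_beta + n_samples - n_faulty)

# Same observed defect fraction (assumed 5%), increasingly large samples.
for n in (100, 10_000, 1_000_000):
    mean, sd = posterior_after(n, round(0.05 * n))
    print(f"n={n:>9}: mean={mean:.4f}, sd={sd:.6f}")
```

The mean stays close to the observed fraction while the standard deviation keeps shrinking, which is the "more sharply peaked" picture in numbers.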
I suspect we should make clear that there are two points under discussion: first, the rate of defective material that a machine spits out, and second, how much knowing that material is defective tells us about which machine processed it.
satt's comment handles the second point; when we are trying to estimate which machine produced a single defective product, the sample size of products is, by necessity, one. (Because we've implicitly assumed that the defectivity of products is independent, sampling more of them isn't really any more interesting tha...
In the introductory example in the Wikipedia article on Bayes' theorem, they start out with a prior distribution for P(machine_ID | faulty_product)* and then update this using a likelihood distribution P(faulty_product | machine_ID) to acquire a posterior distribution for P(machine_ID | faulty_product).
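For reference, the update itself can be sketched in a few lines. The numbers below are the ones I recall from the Wikipedia example (machines A, B and C making 20%, 30% and 50% of the output, with defect rates of 5%, 3% and 1%); treat them, and the function name `bayes_update`, as assumptions for illustration.

```python
from fractions import Fraction

# Assumed prior P(machine_ID): each machine's share of total production.
prior = {"A": Fraction(20, 100), "B": Fraction(30, 100), "C": Fraction(50, 100)}

# Assumed likelihood P(faulty_product | machine_ID): each machine's defect rate.
likelihood = {"A": Fraction(5, 100), "B": Fraction(3, 100), "C": Fraction(1, 100)}

def bayes_update(prior, likelihood):
    """Return the posterior P(machine_ID | faulty_product) via Bayes' theorem."""
    joint = {m: prior[m] * likelihood[m] for m in prior}
    evidence = sum(joint.values())  # P(faulty_product)
    return {m: joint[m] / evidence for m in joint}

posterior = bayes_update(prior, likelihood)
for machine, prob in posterior.items():
    print(f"{machine}: {prob}")  # A: 5/12, B: 3/8, C: 5/24
```

Note that nothing in this calculation records how the defect rates were measured; they enter as plain point values.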
How did they come up with the likelihood distribution? Maybe they sampled 100 products from each machine and for each sample counted the number of faulty products. Maybe they sampled 1,000,000 products from each machine...
We don't know which sample size was used: the likelihood distribution doesn't reveal this. Thus the sample size doesn't influence the weight of the Bayesian update. But shouldn't it? An uncertain likelihood distribution should have a small influence, and a well-established one a large influence. How do I make the Bayesian update reflect this?
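One standard way to carry the sample size into the update, sketched here under assumptions, is to replace each point-valued defect rate with a beta distribution and average over it. The machine shares, mean defect rates, the "strength" pseudo-counts of 100 and 1,000,000, and all function names are hypothetical choices for illustration. For a single faulty product the averaged likelihood is just the mean of the beta distribution, so the sample size drops out; it only starts to matter once several products from the same machine are observed.

```python
def beta_params(mean, strength):
    """Beta(alpha, beta) with the given mean; strength acts like a sample size."""
    return mean * strength, (1 - mean) * strength

def prob_all_faulty(alpha, beta, k):
    """P(k products from one machine are all faulty), averaging over the
    uncertain defect rate p ~ Beta(alpha, beta). Equals E[p**k]."""
    prob = 1.0
    for i in range(k):
        prob *= (alpha + i) / (alpha + beta + i)
    return prob

def posterior_over_machines(prior, rate_beliefs, k):
    """Posterior over machines after seeing k faulty products from one machine."""
    joint = {m: prior[m] * prob_all_faulty(*rate_beliefs[m], k) for m in prior}
    evidence = sum(joint.values())
    return {m: joint[m] / evidence for m in joint}

# Assumed production shares and mean defect rates (illustrative values).
prior = {"A": 0.2, "B": 0.3, "C": 0.5}
means = {"A": 0.05, "B": 0.03, "C": 0.01}

# Same means, but backed by small versus very large (hypothetical) samples.
vague = {m: beta_params(means[m], 100) for m in means}
sharp = {m: beta_params(means[m], 1_000_000) for m in means}

for k in (1, 2):
    print(f"k={k} vague:", posterior_over_machines(prior, vague, k))
    print(f"k={k} sharp:", posterior_over_machines(prior, sharp, k))
```

With k=1 the two posteriors agree; with k=2 they diverge, because the vaguer beliefs leave more room for a machine's true defect rate to be higher than its estimated mean.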
I read the links provided by somervta in the 'Error margins' discussion from yesterday, but I'm not skillful enough to adapt them to this example.
* Technically, they just make the prior distribution a copy of the distribution P(machine_ID), but I like to keep the identity across the Bayesian update, so I gave the prior and the posterior distribution the same form: P(machine_ID | faulty_product).