How did they come up with the likelihood distribution? Maybe they sampled 100 products from each machine and counted the number of faulty products in each sample. Maybe they sampled 1,000,000 products from each machine...

We don't know what sample size was used: the likelihood distribution doesn't reveal this.

Implicitly, they have an infinite sample size, because the distribution on P(B|A_1) is infinitely precise. Suppose we also wanted to learn P(B|A_1) from the history of the factory: then we might model the tool as having a fixed rate of defective outputs, and describe the probability we assign to particular defect rates with a beta distribution. We might start off with a Jeffreys prior, beta(0.5,0.5), and then update as we see the tool produce defective or normal products, eventually ending up with, say, a beta(5.5,95.5) for tool A_1 (which is what we would get after seeing 5 defective and 95 normal products).
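To make the updating step concrete, here is a minimal sketch of the conjugate beta update in Python. The counts (5 defective, 95 normal) are an assumption chosen to reproduce the beta(5.5,95.5) mentioned above, and the function names are made up for illustration.

```python
from fractions import Fraction


def update_beta(alpha, beta, defective, normal):
    """Conjugate update: a Beta(alpha, beta) prior plus observed counts."""
    return alpha + defective, beta + normal


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


# Jeffreys prior for a binomial rate is Beta(1/2, 1/2).
jeffreys = (Fraction(1, 2), Fraction(1, 2))

# Assumed history for tool A_1: 5 defective and 95 normal products.
posterior = update_beta(*jeffreys, defective=5, normal=95)

# Same observed defect rate, ten times the data (also an assumption).
bigger = update_beta(*jeffreys, defective=50, normal=950)

print("posterior parameters:", tuple(float(x) for x in posterior))
print("posterior mean:", float(beta_mean(*posterior)))
print("variance with 100 products:", float(beta_variance(*posterior)))
print("variance with 1000 products:", float(beta_variance(*bigger)))
```

The variance shrinks as the sample grows, which is the sense in which a single number for P(B|A_1) behaves like an infinite sample.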

Exercise for the reader: given that hyperparameter distribution for P(B|A_1) (and similar ones for A_2 and A_3), do we need the full hyperparameter distribution for all three tools to determine the probability that a known defective output came off of A_1, or can we get the same answer using only a handful of moments from each distribution?
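If you want to check your answer numerically, a forward simulation is one option. The sketch below is only illustrative: the prior probabilities for each tool and the beta parameters for A_2 and A_3 are invented (only the beta(5.5,95.5) for A_1 comes from the comment above), and the function name is hypothetical.

```python
import random

# Hypothetical setup: only A_1's beta parameters come from the discussion;
# the priors and the parameters for A_2 and A_3 are made up.
tools = {
    "A_1": {"prior": 0.2, "beta": (5.5, 95.5)},
    "A_2": {"prior": 0.3, "beta": (3.5, 97.5)},
    "A_3": {"prior": 0.5, "beta": (1.5, 99.5)},
}


def simulate_posterior(tools, n_products=200_000, seed=0):
    """Estimate P(tool | defective) by simulating products one at a time."""
    rng = random.Random(seed)
    names = list(tools)
    weights = [tools[name]["prior"] for name in names]
    defect_counts = {name: 0 for name in names}
    for _ in range(n_products):
        # Pick which tool made this product, according to the prior.
        name = rng.choices(names, weights=weights)[0]
        # Draw a defect rate from our uncertainty about that tool;
        # a fresh draw per product marginalizes over the unknown rate.
        rate = rng.betavariate(*tools[name]["beta"])
        # Decide whether this product is defective.
        if rng.random() < rate:
            defect_counts[name] += 1
    total_defects = sum(defect_counts.values())
    return {name: count / total_defects for name, count in defect_counts.items()}


result = simulate_posterior(tools)
for name, prob in result.items():
    print(f"P({name} | defective) is roughly {prob:.3f}")
```

Compare the printed estimates with what you get from the moments you think are sufficient; they should agree up to simulation noise.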

John Maynard Smith's Evolutionary Genetics is a classic textbook. The second edition has simulation/programming exercises after every chapter. Have fun :)