This is a problem that machine learning can tackle. Feel free to contact me by PM for technical help.
To make sure I understand your problem:
We have many copies of the Big Book. Each copy is a collection of many sheets. Each sheet was produced by a single tool, but each tool produces many sheets. Each shop contains many tools, but each tool is owned by only one shop.
Each sheet has information in the form of marks. Sheets created by the same tool at similar times have similar marks. It may be the case that the marks monotonically increase until the tool is repaired.
Right now, we have enough to take a database of marks on sheets and figure out how many tools we think there were, how likely it is each sheet came from each potential tool, and to cluster tools into likely shops. (Note that a 'tool' here is probably only one repair cycle of an actual tool, if they are able to repair it all the way to freshness.)
We can either do this unsupervised, and then compare to whatever other information we can find (if we have a subcollection of sheets with known origins, we can see how well the estimated probabilities did), or we can try to include that information for supervised learning.
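A minimal sketch of what that unsupervised pipeline might look like in Python, assuming each sheet has already been reduced to a numeric feature vector of its marks (the feature extraction is the hard part and isn't shown); the random data, the upper bound of 30 tools, and the guess of 5 shops are all placeholders, not claims about the real problem:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Hypothetical data: 500 sheets, each summarized by 10 mark features.
sheet_features = rng.normal(size=(500, 10))

# Step 1: infer 'tools' (really, tool repair cycles) as mixture components.
# A Dirichlet-process-style mixture lets the data decide how many of the
# allowed components are actually used, so we don't fix the tool count.
tool_model = BayesianGaussianMixture(
    n_components=30,  # upper bound on the number of tools
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="diag",
    random_state=0,
).fit(sheet_features)

tool_probs = tool_model.predict_proba(sheet_features)  # P(tool | sheet)
tool_labels = tool_probs.argmax(axis=1)                # hard assignment per sheet
used = np.unique(tool_labels)                          # components actually used

# Step 2: cluster the inferred tools into 'shops' by the similarity of
# their mean mark profiles.
n_shops = min(5, len(used))  # placeholder guess at the number of shops
shop_model = AgglomerativeClustering(n_clusters=n_shops).fit(tool_model.means_[used])
shop_of_tool = dict(zip(used, shop_model.labels_))
shop_labels = np.array([shop_of_tool[t] for t in tool_labels])

# Step 3 (optional): if some sheets have known origins, score the clustering
# against them, e.g. with the adjusted Rand index.
known_idx = np.arange(50)                  # pretend the first 50 sheets are known
known_shops = rng.integers(0, n_shops, 50) # placeholder ground truth
print(adjusted_rand_score(known_shops, shop_labels[known_idx]))
```

The Dirichlet-process prior is one way to avoid committing to a fixed number of tools up front; any partial labels can then be used either purely for scoring, as above, or folded in as constraints for a semi-supervised fit.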
You have to define your features somehow.
Really? I was under the opposite impression, that stylometry was, since the '60s or so with the Bayesian investigation of Mosteller & Wallace into the Federalist papers, one of the areas of triumph for Bayesianism.
No, not really. I think I would describe GIGO in this context as 'data which is equally consistent with all theories'.
I don't understand what this means. Can you say more?