PrawnOfFate comments on Welcome to Less Wrong! (5th thread, March 2013) - Less Wrong

27 Post author: orthonormal 01 April 2013 04:19PM

Comments (1750)

Comment author: HumanitiesResearcher 17 April 2013 01:14:57AM *  7 points [-]

Hi everyone,

I'm a humanities PhD who's been reading Eliezer for a few years, and who's been checking out LessWrong for a few months. I'm well-versed in the rhetorical dark arts, due to my current education, but I also have a BA in Economics (yet math is still my weakest suit). The point is, I like facts despite the deconstructivist tendency of humanities since the eighties. Now is a good time for hard-data approaches to the humanities. I want to join that party. My heart's desire is to workshop research methods with the LW community.

It may break protocol, but I'd like to offer a preview of my project in this introduction. I'm interested in associating the details of print production with an unnamed aesthetic object, which we'll presently call the Big Book, and which is the source of all of our evidence. The Big Book had multiple unknown sites of production, which we'll call Print Shop(s) [1-n]. I'm interested in pinning down which parts of the Big Book were made in which Print Shop. Print Shop 1 has Tools (1), and those Tools (1) leave unintended Marks in the Big Book. Likewise with Print Shop 2 and their Tools (2). Unfortunately, people in the present don't know which Print Shop had which Tools. Even worse, multiple sets of Tools can leave similar Marks.

The most obvious solution that I can see is

  • to catalog all Marks in the Big Book by sheet (a unit of print production, as opposed to the page), then
  • sort sheets by patterns of Marks, then
  • make some associations between the patterns of Marks and Print Shops, and then
  • propose Print Shops [x,y,z] to be the sites of production for the Big Book.

If nothing else, this method can define the n-number of Print Shops responsible for the Big Book.
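As a rough illustration of the cataloging and sorting steps above, here is a minimal Python sketch. The sheet IDs and Mark names are invented placeholders, not data from the Big Book, and grouping by exactly identical sets of Marks is a simplifying assumption; real Marks from different Tools can look alike, so the number of groups is only a first guess at n.

```python
from collections import defaultdict

# Hypothetical catalog: sheet ID -> set of Marks observed on that sheet.
# All names here are made up for illustration.
catalog = {
    "sheet_A": {"bent_rule", "nicked_e"},
    "sheet_B": {"bent_rule", "nicked_e"},
    "sheet_C": {"cracked_ornament"},
    "sheet_D": {"cracked_ornament"},
    "sheet_E": {"bent_rule", "nicked_e"},
}

def group_by_pattern(catalog):
    """Group sheets that share exactly the same set of Marks."""
    groups = defaultdict(list)
    for sheet, marks in catalog.items():
        # frozenset makes the pattern hashable, so it can be a dict key.
        groups[frozenset(marks)].append(sheet)
    return dict(groups)

groups = group_by_pattern(catalog)
for pattern, sheets in groups.items():
    print(sorted(pattern), "->", sorted(sheets))
```

Each distinct pattern is a candidate Print Shop. A fuzzier similarity measure would probably be needed once near-identical Marks enter the catalog.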

The Bayesian twist on the obvious solution is to add some testing onto the associations, above. Specifically,

  • find some books strongly associated with Print Shops [x,y,z], in order to

  • assign probability of patterns of Marks to each Print Shop, then

  • revise initial associations between Print Shops [x,y,z] and the Big Book proportionally.

I'm far from an expert in Bayesian methods, but it seems already that there's something missing here. Is there some stage where I should take a control sample? Also, how can I find a logical basis for the initial association step, when there are many potential Print Shops? Lastly, how can I account for the decay of Tools, thus increasing Marks, over time?
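The revision step described above can be sketched as a direct application of Bayes' rule. This is only a toy under stated assumptions: the shop names, Mark names, and probabilities are invented, the likelihoods stand in for frequencies one would estimate from the reference books, the uniform prior is just one possible starting point, and Marks are treated as independent given the Print Shop, which may not hold in practice.

```python
def posterior(prior, likelihood, observed_marks):
    """Return P(shop | marks) via Bayes' rule.

    Assumes the Marks are conditionally independent given the shop.
    """
    unnormalized = {}
    for shop, p in prior.items():
        for mark in observed_marks:
            p *= likelihood[shop][mark]  # P(mark | shop)
        unnormalized[shop] = p
    total = sum(unnormalized.values())
    return {shop: p / total for shop, p in unnormalized.items()}

# Hypothetical uniform prior over three candidate Print Shops.
prior = {"shop_x": 1 / 3, "shop_y": 1 / 3, "shop_z": 1 / 3}

# Hypothetical P(mark | shop), as might be estimated from reference books.
likelihood = {
    "shop_x": {"bent_rule": 0.8, "nicked_e": 0.6},
    "shop_y": {"bent_rule": 0.3, "nicked_e": 0.5},
    "shop_z": {"bent_rule": 0.1, "nicked_e": 0.1},
}

# Marks observed on one sheet of the Big Book.
post = posterior(prior, likelihood, ["bent_rule", "nicked_e"])
for shop, p in sorted(post.items()):
    print(f"{shop}: {p:.3f}")
```

With these made-up numbers the sheet is attributed mostly to shop_x. Tool decay could in principle be handled by letting the likelihoods depend on an estimated date, but that goes beyond this sketch.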

Comment author: PrawnOfFate 17 April 2013 02:14:27AM *  0 points [-]

How about talking clearly about whatever you are currently hinting at?

Comment author: Kindly 17 April 2013 04:07:59PM 4 points [-]

I dunno, I find the complexity-hiding capitalized-nouns thing strangely attractive. Maybe there should be more capitalized nouns. Why isn't Sheets capitalized?

This is probably coming back to my fascination with graph theory, which has similar but even more exotic terminology. "A spider is a subdivision of a star, which is a kind of tree made up only of leaves and a root; a star with three arcs is called a claw."

Comment author: HumanitiesResearcher 18 April 2013 05:17:47AM 1 point [-]

I was openly warned by a professor (who will likely be on the dissertation committee) not to talk about this project widely.

The capitalized nouns are there to highlight key terms. I believe the current description is specific enough to describe the situation accurately and without misleading people, but not so specific as to break my professor's (correct) advice.

Have I broken LW protocol? Obviously, I'm new here.

Comment author: beoShaffer 18 April 2013 05:20:18AM 0 points [-]

"I was openly warned by a professor (who will likely be on the dissertation committee) not to talk about this project widely."

Did they say why?

Comment author: HumanitiesResearcher 21 April 2013 04:23:22PM 3 points [-]

Yes. He said that I should be careful about sharing my project because, otherwise, I'll be reading about it in a journal in a few months. His warning may exaggerate the likelihood of a rival researcher and mis-value the expansion of knowledge, but I'm deferring to him as a concession of my ignorance, especially regarding rules of the academy.

Comment author: IlyaShpitser 22 April 2013 04:40:06PM 6 points [-]

"Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats."

Comment author: Vaniver 22 April 2013 07:47:47PM 6 points [-]

This is heavily context-dependent. Many fields are idea-rich and implementation-poor, in which case you do have to ram ideas down people's throats, because there's a glut of other ideas you have to compete against. But in fields that are implementation-rich and idea-poor, ideas should be guarded until you've implemented them. There are no doubt academic fields where the latter case applies.

Comment author: gwern 22 April 2013 08:01:50PM 1 point [-]

"But in fields that are implementation-rich and idea-poor, ideas should be guarded until you've implemented them. There are no doubt academic fields where the latter case applies."

Can you name any?

Comment author: shminux 22 April 2013 09:02:48PM *  5 points [-]

I've been privately told of several such cases in high-energy physics. Below is an excerpt from Politzer's Nobel lecture. He discovered asymptotic freedom (roughly, that quarks are connected by miniature rubber bands which have no tension when the quarks are close to each other).

I slowly and carefully completed a calculation of the Yang-Mills beta function. I happen to be ambidextrous and mildly dyslexic. So I have trouble with left/right, in/out, forward/backward, etc. Hence, I derived each partial result from scratch, paying special attention to signs and conventions. It did not take long to go from dismay over the final minus sign (it was indeed useless for studying low energy phenomena) to excitement over the possibilities. I phoned Sidney Coleman. He listened patiently and said it was interesting. But, according to Coleman, I had apparently made an error because David Gross and his student had completed the same calculation, and they found it was plus. Coleman seemed to have more faith in the reliability of a team of two, which included a seasoned theorist, than in a single, young student. I said I’d check it yet once more. I called again about a week later to say I could find nothing wrong with my first calculation. Coleman said yes, he knew because the Princeton team had found a mistake, corrected it, and already submitted a paper to Physical Review Letters.

He does not explicitly say that Gross was tipped off, but it's easy to read between the lines. The rest of his lecture, titled "The Dilemma of Attribution", is also worth reading.

Comment author: Vaniver 22 April 2013 10:29:42PM 1 point [-]

It may be more precise to say there are academic groups to which that description applies, and that discretion is worthwhile in their proximity. Examples of those still living will remain private for obvious reasons.

Comment author: MugaSofer 17 April 2013 01:49:26PM 0 points [-]

I think Gwern's right on this.

Comment author: PrawnOfFate 17 April 2013 02:16:59PM 2 points [-]

But Humanities has rejected that!

Comment author: HumanitiesResearcher 18 April 2013 05:22:35AM 0 points [-]

Yep. It's not the Bible. I suspect that there are already good stats compiled on the Q-source, etc.

In a way it's not only futile but limiting to play the guessing game. There are lots of possible applications of Bayesian methods to the humanities. Maybe this discussion will help more projects than my own.

Comment author: MugaSofer 19 April 2013 01:18:16PM -2 points [-]

Ah, OK. They hadn't when I wrote it.

Comment author: Nornagest 22 April 2013 05:39:03PM 1 point [-]

That was my first thought too; there's a huge textual analysis tradition relating to the Bible and what I know of it maps pretty closely to the summary, although it's also mature enough that there wouldn't be much reason to obfuscate it like this. But it's not implausible that it applies to some other body of literature. I understand there are some similar things going on in classics, for example.

The specifics shouldn't matter too much, though. Although some types of mark are going to be a lot more machine-distinguishable than others, and that's going to affect the kinds of analysis you can do -- differences in spelling and grammar, for example, are far machine-friendlier than differences in letterforms in a manuscript.

Comment author: HumanitiesResearcher 17 April 2013 01:42:44PM 0 points [-]

Thanks for the feedback. I actually cleared up the technical language considerably. I don't think there's any need to get lost in the weeds of the specifics while I'm still hammering out the method.