Dreaded_Anomaly comments on IBM's "Watson" program to compete against "Jeopardy" champions tonight - Less Wrong Discussion

10 Post author: NihilCredo 14 February 2011 03:28PM

Comment author: Dreaded_Anomaly 16 February 2011 02:52:45AM * 6 points

Followup: I was able to attend a panel discussion tonight with several members of the team working on Watson. (My university is hosting panels for all three nights, as many of the team members were once students here. See watson.rpi.edu for recordings of the panel discussions.)

I spoke with one person from IBM after the episode aired, and confirmed that Watson is programmed with statistics from every Jeopardy episode. That allows it to search for the Daily Double efficiently; afterward it specifically turns to the lowest-value clues in each category to learn which categories it might do best in. It also employs game theory to determine how much to bet on Daily Doubles and Final Jeopardy, and which clues to pick when it has control.

They explained to us why Watson missed Final Jeopardy in tonight's game. The category was "U.S. Cities" and the clue was "Its largest airport was named for a World War II hero; its second for a World War II battle." Watson had learned that category names don't always strictly imply the answer type, so it didn't treat the category as a strong indicator. It recognized that the clue was in two parts, but the second part was missing the noun and verb from the first, so Watson couldn't really get anything from it. Toronto's largest airport is named after a WWII vet, and there are cities named Toronto in the U.S. We were told that its confidence on Toronto was ~13%, and its second choice was Chicago (the correct answer) with a confidence of ~11%.

We were also told that its confidences are very well calibrated, so that, e.g., it will be right on average 9 out of 10 of the times that it displays 90% confidence.
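To make "well calibrated" concrete, here is a minimal sketch of how one could check it, assuming a log of (stated confidence, was the answer right) pairs. The function name `calibration_table` and the sample data are hypothetical and made up for illustration; this is not Watson's code or data.

```python
from collections import defaultdict

def calibration_table(predictions):
    """Group (confidence, was_correct) pairs by stated confidence and
    return {confidence: (count, observed_accuracy)}.

    Confidences are rounded to one decimal place so that, e.g., all
    answers given at about 90% land in the same group.
    """
    groups = defaultdict(list)
    for confidence, was_correct in predictions:
        groups[round(confidence, 1)].append(was_correct)
    return {
        level: (len(outcomes), sum(outcomes) / len(outcomes))
        for level, outcomes in sorted(groups.items())
    }

# Hypothetical log: ten answers given at 90% confidence (nine right),
# and four answers given at 50% confidence (two right).
sample_log = [(0.9, True)] * 9 + [(0.9, False)] + [(0.5, True)] * 2 + [(0.5, False)] * 2

for level, (count, accuracy) in calibration_table(sample_log).items():
    print(f"stated {level:.0%}: {count} answers, {accuracy:.0%} correct")
```

A system is well calibrated in this sense when the observed accuracy in each group is close to the stated confidence for that group.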

Comment author: Psy-Kosh 17 February 2011 02:59:54AM 2 points

The confidences are supposed to be probabilities? But they often summed to > 100%

Or is it "the procedure for generating the confidences is such that it'll be well calibrated for the highest ranking answer"?

Comment author: Dreaded_Anomaly 17 February 2011 03:47:05AM * 2 points

No, sorry, that should say confidences everywhere, not probabilities. I had written it out incorrectly and then edited it, but I missed that one. Fixed now.

Comment author: Psy-Kosh 17 February 2011 03:51:16AM 0 points

What I meant was "for the top three answers, the confidences would sometimes sum to > 100%, so how does that work?"

Is the procedure defined as well calibrated only for the top answer, or is there something I'm missing?

Comment author: Dreaded_Anomaly 17 February 2011 04:34:20AM 1 point

The confidence level compares the answer to other answers Watson's given in the past, based on how much the answer is supported by the evidence Watson has and uses. All the answers are generated and scored in parallel. It's not a comparison among the answers generated for a specific question, so it shouldn't necessarily add up to 100.

Quote from Chris Welty at last night's panel: "When [Watson] says 'this is my answer, 50% sure,' half the time he's right about that, and half the time he's wrong. When he says 80%, 20% of the time he's wrong."
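A toy illustration of why independently scored confidences need not sum to 100%: if each candidate's confidence depends only on its own evidence score, nothing forces the values to share a fixed total. The logistic mapping, the candidate names, and the scores below are all assumptions chosen for illustration; the panel did not describe Watson's actual scoring function.

```python
import math

def independent_confidence(evidence_score):
    """Map one candidate's evidence score to a confidence in (0, 1),
    without looking at any other candidate (logistic function, an
    assumption for this sketch)."""
    return 1 / (1 + math.exp(-evidence_score))

def normalized_confidences(evidence_scores):
    """For contrast: a softmax over the candidates for one question,
    which always sums to 1 by construction."""
    exps = [math.exp(score) for score in evidence_scores]
    total = sum(exps)
    return [value / total for value in exps]

# Hypothetical evidence scores for three candidate answers to one clue.
scores = {"candidate A": 2.0, "candidate B": 1.0, "candidate C": 0.5}

independent = {name: independent_confidence(s) for name, s in scores.items()}
normalized = normalized_confidences(list(scores.values()))

print("independent total:", sum(independent.values()))  # can exceed 1
print("normalized total:", sum(normalized))             # 1, up to rounding
```

Under the independent scheme the total can be above or below 100%, which is consistent with both the > 100% displays and the ~13% and ~11% values reported for Toronto and Chicago.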

Comment author: Psy-Kosh 17 February 2011 05:27:31AM 0 points

Ah, thanks.