gallabytes comments on Search Engines and Oracles - Less Wrong Discussion
Comments (7)
I'm not convinced this is as easy as you seem to think it is. One of the fundamental problems with all attempts at natural-language programming and/or querying is that natural languages have nondeterministic parsing. There are lots of ambiguities floating about in there, and a lot of social modeling is necessary to correctly parse most sentences.
To take your "first ruler of Russia" example, to infer the correct query, you'd need to know:
I did some experimentation on how Wolfram Alpha handles ambiguity in "who is the ruler of $place?" using places with varying degrees of difficulty.
Monarchies
Britain
Scotland
Canada
Ruler: Elizabeth II
Spain:
Jordan:
Non-monarchies
America
Ruler: Barack Obama
Germany
Ireland
France
China
Republic of China:
Korea:
South Korea:
North Korea:
Ruler: Kim Jong Il (not Kim Jong Un), along with some rather odd chronology (apparently the position was vacant from 1994 to 1997); his end date is listed as Dec. 17, 2011; no mention of Kim Jong Un
Georgia
Ruler: (none in office); the current Governor, Nathan Deal, is listed as a past governor with an end date of today, as are Senators Johnny Isakson and Saxby Chambliss, neither of whom has ever been governor; former governor Roy Barnes is listed with the correct dates, but his successor Sonny Perdue is not listed; the page also states that it is assuming "Georgia" refers to the US state and gives a link to look up the country instead
Republic of Georgia
New Mexico
Mexico
"Who is the prime minister?": exceeded max computation time
General Observations
The "ruler" of a place is determined based solely on the title
Close matches don't count (prime minister of Germany, president of Spain)
It tries to answer every question but says when it doesn't understand something (president of britain, ruler of korea)
Sometimes it gets the facts blatantly wrong, but almost right (Kim Jong Il still rules North Korea; Georgia has no governor)
It handles both very slightly ambiguous (China vs Republic of China; Mexico vs New Mexico) and moderately ambiguous ("Georgia" as State of Georgia vs Republic of Georgia) reasonably
I tried "first ruler of Russia" and got a list of the rulers of post-USSR Russia. Then I tried "first king of Russia" and it told me that the total area of Russia is 2.779E9 ancient kings.
AI has a long way to go.
Really, I'm proposing something that has to be much easier than the problem people seem to get fixated on, i.e. the general-purpose question-answering machine, which was a staple of science fiction decades ago (leading to the parody Q: "What is the meaning of life?" A: "42"). Besides which, the goal of one crisp answer seems aimed at a childish mentality -- except in the (rarer than we may think) cases when there really is one crisp answer.
After all, I wrote "What occurred to me though, is that computer science could do something quite useful intermediate between "general purpose question answerer" and the old database paradigm of terms ANDed or ORed together."
Between two people, either there would be some implicit understanding (like "We're talking about pre-USSR because that's the subject of the seminar we're in") or the question-ee might have to say "What are the parameters of the Russia you're talking about?"
Then again the semi-smart search engine I'd like to see could just decline to resolve ambiguities, and return all articles treating any reasonable interpretation of the phrase, and make it the user's job to add qualifying phrases as needed.
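To make the "decline to resolve ambiguities" idea concrete, here is a minimal Python sketch. Everything in it is a made-up illustration: the toy `ARTICLES` index, the tag names, the `INTERPRETATIONS` table, and the function `search_all_readings` are assumptions for the sake of the example, not a description of any real search engine.

```python
from typing import Dict, List, Set

# Hypothetical toy index: article title -> set of topic tags.
ARTICLES: Dict[str, Set[str]] = {
    "Rurik and the founding of Kievan Rus": {"russia:kievan-rus", "ruler"},
    "Ivan IV, first Tsar of Russia": {"russia:tsardom", "ruler"},
    "Boris Yeltsin's presidency": {"russia:federation", "ruler"},
    "Geography of Russia": {"russia:federation", "geography"},
}

# Hypothetical table of "reasonable interpretations" for an ambiguous term.
INTERPRETATIONS: Dict[str, List[str]] = {
    "russia": ["russia:kievan-rus", "russia:tsardom", "russia:federation"],
}


def search_all_readings(term: str, required_tag: str) -> Dict[str, List[str]]:
    """Return matching articles grouped by interpretation, resolving nothing."""
    key = term.lower()
    results: Dict[str, List[str]] = {}
    # Unknown terms fall back to a single reading: the term itself.
    for reading in INTERPRETATIONS.get(key, [key]):
        hits = sorted(
            title
            for title, tags in ARTICLES.items()
            if reading in tags and required_tag in tags
        )
        if hits:  # only show readings that actually have articles
            results[reading] = hits
    return results


grouped = search_all_readings("Russia", "ruler")
for reading, titles in grouped.items():
    print(reading, "->", titles)
```

The engine never picks one reading here; it just groups the hits, and the user narrows things down by adding a qualifier.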
I am dreaming up a Simpsons episode in which a computer can convince a panel of experts that it is Bart Simpson, and the ensuing debate as to whether that was "passing the Turing Test".
Why? That seems really unhelpful. I'd much prefer the engine to answer like a human expert, who habitually starts with "That depends on what you mean by...". I imagine it could assess its confidence in each candidate interpretation, discard any with less than (say) 10% probability, and, if more than one remains, give them to me in an ordered list so I can click on the one I mean. Kind of like a Wikipedia disambiguation page. (If more than one term needs clarification, handle them all on the same page.) I'm confident this would solve the issue you describe, at least in cases where the confidence assessment isn't very wrong, and it sounds very valuable to me. Because when you say:
...you aren't describing children, you are describing most people.
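The disambiguation scheme described above (score each interpretation, drop anything under roughly 10%, and offer an ordered list if more than one survives) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `disambiguate`, the returned dictionary shape, and the example probabilities are all invented, and the "ask the user to rephrase" fallback when nothing survives is my own guess, since the comment doesn't cover that case.

```python
from typing import Dict, List, Tuple


def disambiguate(
    candidates: List[Tuple[str, float]], threshold: float = 0.10
) -> Dict[str, object]:
    """Filter interpretations by confidence and decide what to show the user."""
    # Keep interpretations at or above the threshold, most likely first.
    kept = sorted(
        (c for c in candidates if c[1] >= threshold),
        key=lambda c: c[1],
        reverse=True,
    )
    if not kept:
        # Assumption: with no plausible reading, ask the user to rephrase.
        return {"action": "ask", "options": []}
    if len(kept) == 1:
        # One plausible reading left: just answer under that reading.
        return {"action": "answer", "interpretation": kept[0][0]}
    # Several plausible readings: offer a disambiguation list.
    return {"action": "choose", "options": [name for name, _ in kept]}


# Made-up confidence values, for illustration only.
print(disambiguate([
    ("Georgia (US state)", 0.55),
    ("Georgia (country)", 0.40),
    ("Georgia (typeface)", 0.05),
]))
```

As noted above, this is only as good as the confidence estimates: a badly calibrated scorer would either hide the reading you meant or bury you in options.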
I wrote "the goal of one crisp answer seems aimed at a childish mentality -- except in the (rarer than we may think) cases when there really is one crisp answer."
which chaosmage unfortunately truncated.
Anyway, more often than not I, at any rate, want food for thought, not "the answer".
An earlier example from my initial article: <<repair of transmission of "1957 Ford Fairlane">>, to which I'd want to add articletype:diy. An expert might indeed provide one crisp answer, somewhat like the "feeling lucky" answer from Google, but I'd feel I was missing something. I'd like to see every relevant answer, with some attempt at ranking them. I might prefer a YouTube video, or I might prefer text, diagrams, and the occasional photo. I might watch a video and then print out the text version to take out to the garage. Some articles might reference tools that I've never heard of and doubt I could lay my hands on, while others provide ways to do it without those tools. I might watch the video with the special tool and the video without, and conclude that the latter takes more dexterity than I have and that I'd better find a way to borrow that tool.
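For what it's worth, a query like the Ford Fairlane one (free terms, a quoted phrase, and a `field:value` qualifier such as articletype:diy) is easy to pull apart mechanically. Below is a rough Python sketch; the function name `parse_query` and the output layout are my own assumptions, and the only syntax it handles is the one used in the example above.

```python
import re
from typing import Dict, List, Union

Parsed = Dict[str, Union[Dict[str, str], List[str]]]


def parse_query(query: str) -> Parsed:
    """Split a query into quoted phrases, field:value filters, and loose terms."""
    # 1. Pull out quoted phrases first so their contents are left untouched.
    phrases = re.findall(r'"([^"]+)"', query)
    rest = re.sub(r'"[^"]+"', " ", query)
    # 2. Pull out field:value qualifiers such as articletype:diy.
    filters = dict(re.findall(r"(\w+):(\w+)", rest))
    rest = re.sub(r"\w+:\w+", " ", rest)
    # 3. Whatever remains is treated as loose search terms.
    terms = rest.split()
    return {"phrases": phrases, "filters": filters, "terms": terms}


print(parse_query('repair of transmission of "1957 Ford Fairlane" articletype:diy'))
```

The structured form is what a ranking step would work from: the phrase must match exactly, the filter narrows the article type, and the loose terms are where ambiguity (and ranking) comes in.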
To the extent people want just an answer, or worse, "the answer" rather than "food for thought", it may be at least in part due to the cultural environment saying "We have the answer!", "Answers at 6:00", and so on. Then again, in some situations I'd just want to know the answer, or the most popular match.