Search Engines and Oracles

HalMorris

8th Jul 2014

Some time ago, I was following a conversation about Wolfram Alpha (http://www.wolframalpha.com/), an attempt to implement a sort of general-purpose question answerer, something people have dreamed of computers doing for decades. Despite the theoretical ability to find out virtually anything from the Internet, we seem pretty far from any plausible approximation of this dream (at least for general consumption). My first attempt was:

Q: "who was the first ruler of russia?"

A: Vladimir Putin

It's a problematic question, since it depends on further questions like "When did Russia become Russia?", "What do we count, historically, as Russia?", or even what one means by "ruler". A reasonably satisfactory answer would have had to be fairly complicated; either that, or the question would have to be reworded so precisely that one name could serve as the answer.

On another problematic question I thought it did rather well:

Q: what is the airspeed velocity of an unladen african swallow?

What occurred to me though, is that computer science could do something quite useful intermediate between "general purpose question answerer" and the old database paradigm of terms ANDed or ORed together. (Note that what Google does is neither of these, nor should it be placed on a straight line between the two -- but discussion of Google would take me far off topic).

A simple example of what I'd really like is a search engine that matches *concepts*. Does anyone know of such a thing? If it exists, I should possibly read about it and shut up, but let me at least try to be sure I'm making the idea clear:

E.g., I'd like to enter <<rulers of russia>>, and get a list of highly relevant articles.

Or, I'd like to enter <<repair of transmission of "1957 Ford Fairlane">> and get few if any useless advertisements, and something much better than all articles containing the words "repair" "transmission" and "1957 Ford Fairlane" -- e.g., *not* an article on roof repair that happened to mention that "My manual transmission Toyota truck rear-ended a 1957 Ford Fairlane".
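To make the complaint concrete, here is a minimal Python sketch of old-style boolean AND matching, run on two made-up documents: the roof-repair article paraphrased above and a hypothetical transmission-repair page. The function name `and_match` and both document texts are illustrative assumptions, not any real search engine's code.

```python
# Old-style boolean retrieval: a document "matches" if every term or quoted
# phrase appears somewhere in it, regardless of how the terms relate.
def and_match(doc, terms):
    text = doc.lower()
    return all(term.lower() in text for term in terms)

# Hypothetical example documents.
ROOF = "Roof repair tips. My manual transmission Toyota truck rear-ended a 1957 Ford Fairlane."
GEARBOX = "How to repair the transmission of a 1957 Ford Fairlane."
TERMS = ["repair", "transmission", "1957 Ford Fairlane"]

print(and_match(ROOF, TERMS), and_match(GEARBOX, TERMS))  # prints "True True"
```

Both documents satisfy the AND query equally well, which is exactly the failure described: the roof article contains every word without being about transmission repair at all.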

It seems to me that merely implementing a few useful connectives like "of", perhaps recognizing adjective-noun phrases, and adding some heuristics such as expanding words to *OR*ed lists of synonyms (ruler ==> (president OR king OR dictator ...)) would yield quite an improvement over the search engines I'm familiar with.
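The heuristics just listed can be sketched roughly in Python. This is a minimal sketch under loudly stated assumptions: the synonym table is a tiny hand-made stand-in, plural stripping is crude, and "A of B" is approximated by requiring A and B to occur within a few words of each other rather than by actually parsing the text. All names (`SYNONYMS`, `parse_of`, `expand`, `to_boolean`, `of_match`, the `window` parameter) are hypothetical.

```python
import re

# Hypothetical, hand-made synonym table; a real system would need a large lexicon.
SYNONYMS = {
    "ruler": ["ruler", "president", "king", "tsar", "dictator"],
    "repair": ["repair", "rebuild", "fix"],
}

def tokens(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def parse_of(query):
    """Split 'A of B of C' into ['A', 'B', 'C'], dropping quote marks."""
    return [p.strip().strip('"') for p in re.split(r"\s+of\s+", query.strip())]

def expand(part):
    """Alternatives for one query part, each a tuple of tokens.

    A single word becomes an OR-group of synonyms (with crude plural stripping);
    a multi-word part is kept as one exact phrase."""
    words = tokens(part)
    if len(words) != 1:
        return [tuple(words)]
    w = words[0]
    if w not in SYNONYMS and w.endswith("s") and w[:-1] in SYNONYMS:
        w = w[:-1]
    return [(s,) for s in SYNONYMS.get(w, [w])]

def to_boolean(query):
    """Render the expanded query as a conventional AND/OR string."""
    groups = []
    for part in parse_of(query):
        alts = [" ".join(a) for a in expand(part)]
        alts = ['"%s"' % a if " " in a else a for a in alts]
        groups.append(alts[0] if len(alts) == 1 else "(" + " OR ".join(alts) + ")")
    return " AND ".join(groups)

def positions(doc_tokens, alternatives):
    """Start indices where any alternative (a token tuple) occurs in the document."""
    hits = []
    for alt in alternatives:
        n = len(alt)
        for i in range(len(doc_tokens) - n + 1):
            if tuple(doc_tokens[i:i + n]) == alt:
                hits.append(i)
    return hits

def of_match(query, doc, window=3):
    """Crude stand-in for the 'of' connective: each adjacent pair of query parts
    must occur within `window` tokens of each other, not merely anywhere."""
    toks = tokens(doc)
    occ = [positions(toks, expand(p)) for p in parse_of(query)]
    if not all(occ):  # some part is missing entirely
        return False
    return all(
        any(0 < abs(j - i) <= window for i in a for j in b)
        for a, b in zip(occ, occ[1:])
    )

# Hypothetical example documents.
ROOF = "Roof repair tips. My manual transmission Toyota truck rear-ended a 1957 Ford Fairlane."
GEARBOX = "How to repair the transmission of a 1957 Ford Fairlane."
Q = 'repair of transmission of "1957 Ford Fairlane"'

print(to_boolean("rulers of russia"))
print(of_match(Q, ROOF), of_match(Q, GEARBOX))  # prints "False True"
```

The proximity window is only a cheap proxy for grammatical structure, but even this much rejects the roof article (there, "repair" and "transmission" are four words apart) while keeping the on-topic page.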

This level of simple grammatical understanding is orders of magnitude simpler than the global analysis of, and knowledge drawn from, the unlimited information sources that a general-purpose question answerer would require.

I'd like to know if anyone else finds this interesting, or knows of any leads for exploring anything related to these possibilities.

By the way, when I entered "rulers of russia" into Wolfram Alpha, the answer was still Putin, with brief mention of others going back to 1993. So "Russia" seems to be implicitly defined as the entity that has existed since 1993, and there is an attempt at making the result an *answer to the (assumed) question* rather than a good list of articles that could shed light on the various reasonable interpretations of the phrase.

Comments

gallabytes

"Despite the theoretical availability to find out virtually anything from the Internet, we seem pretty far from any plausible approximation of this dream"

I'm not as convinced this is as easy as you seem to think it is. One of the fundamental problems of all attempts to do natural language programming and/or queries is that natural languages have nondeterministic parsing. There are lots of ambiguities floating about in there, and lots of social modeling is necessary to correctly parse most sentences.

To take your "first ruler of Russia" example, to infer the correct query, you'd need to know:

That they mean Russia the landmass, not Russia the nation-state
What they mean by "ruler of Russia" (for example, does Kievan Rus count as Russia?)

NoSignalNoNoise

I did some experimentation on how Wolfram Alpha handles ambiguity in "who is the ruler of $place?" using places with varying degrees of difficulty.

Monarchies

Britain

Ruler: Elizabeth II (the queen)
Prime Minister: David Cameron
President: "Wolfram|Alpha didn't understand your query. Showing instead results for query: president"
Chancellor: "Using closest Wolfram|Alpha interpretation: chancellor"; listed Werner Faymann as Chancellor of Austria and Angela Merkel as Chancellor of Germany; "who is chancellor of the exchequer"

...

HalMorris

Really, I'm proposing something that has to be much easier than the problem people seem to get fixated on, i.e. the general-purpose question-answering machine, which was a staple of science fiction decades ago (leading to the parody Q: "What is the meaning of life?" A: "42"). Besides, the goal of one crisp answer seems aimed at a childish mentality, except in the (rarer than we may think) cases when there really is one crisp answer. After all, I wrote "What occurred to me though, is that computer science could do something quite useful intermediate between 'general purpose question answerer' and the old database paradigm of terms ANDed or ORed together."

Between two people, either there would be some implicit understanding (like "We're talking about pre-USSR because that's the subject of the seminar we're in"), or the question-ee might have to ask "What are the parameters of the Russia you're talking about?" Then again, the semi-smart search engine I'd like to see could just decline to resolve ambiguities, return all articles treating any reasonable interpretation of the phrase, and make it the user's job to add qualifying phrases as needed.

I am dreaming up a Simpsons episode in which a computer convinces a panel of experts that it is Bart Simpson, and the ensuing debate as to whether that counts as "passing the Turing Test".
