Synonyms! Fuzzy! Thesauri! Oh my!
Tim Bray’s sixth installment on search:
“There are other ways than thesauri to improve the recall of search systems. Perhaps the best known is “Latent Semantic Indexing.””
The Search for Intelligent Search
The latest in Tim Bray’s series on search
I think this one is the best yet. Here’s a quote:
“Consider what a really intelligent search engine would have to do. It would have to read an arbitrary selection of documents in an arbitrary selection of dialects and styles, and ascertain what they are about. Then, it would have to look at an arbitrary query, once again in an arbitrary dialect and style, and ascertain what it is about. Then it would have to match the about-nesses of the query against that of the documents and return the right documents.”
Precision and Recall
Tim Bray posts his third in the series on Search. This one is on Precision and Recall. Here are a few good quotes:
“While precision and recall are very helpful in talking about how good search systems are, they are nightmarishly difficult to actually use, quantitatively. First of all, the notion of “relevance” is definitely in the eye of the beholder, and not, in the real world, a mechanical yes/no decision. Secondly, any information base big enough to make search engines interesting is going to be too big to actually compute recall ….”
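To make the two measures concrete, here is a minimal Python sketch. It assumes relevance is a simple yes/no judgment, which (as the quote points out) it rarely is in the real world, and that we know the full set of relevant documents, which we usually can't. The function name and the document IDs are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved vs. relevant document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that are both returned and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents came back, three were actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.50 recall=0.67"
```

Half of what came back was relevant (precision), and two of the three relevant documents were found (recall). The hard part in practice is the `relevant` list, not the arithmetic.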
Nobody Uses Advanced Search
Tim Bray’s second installment on search…
“Every search engine has an “advanced search” screen, and nobody (quantitatively, less than 0.5% of users) ever goes there. This drove us nuts back at Open Text, because our engine was very structurally savvy and could do compound/boolean queries that look like what today we’d call XPath. But nobody used it.”
I used it quite a bit. In fact, the advanced search page was what I would bookmark on most search engines. Still, I was certainly in the minority.
Search is Commoditized
A quote from Tim Bray’s first in a series on search technology:
“All search engines work more or less the same, and offer more or less the same APIs, and provide more or less the same quality of result.”
Interesting. I can’t wait till the next installment.
Personal and Enterprise Search
For some reason I missed this post from last week by Jon Udell about Indexing and searching Outlook email, but I thought his concluding paragraphs had much broader implications for Enterprise Search in general.
… The Web has trained us, rightly, to expect that we just type in a word or two and get the “right” answer. I don’t know what the stats are on use of Google’s advanced search, or any advanced search, but my gut tells me such features are rarely used.
Topic-Sensitive page rankings feasible
A new paper by a Stanford group claims a substantial speed-up in calculating PageRank (the page ranking used by Google), which could make room for personalized or topic-sensitive searches.
“Computer science researchers at Stanford University have developed several new techniques that together may make it possible to calculate Web page rankings as used in the Google search engine up to five times faster. The speed-ups to Google’s method may make it realistic to calculate page rankings personalized for an individual’s interests or customized to a particular topic.” (via BoingBoing)
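For a feel of what "personalized" page rankings could mean, here is a rough Python sketch of the textbook power-iteration form of PageRank with a teleport (personalization) vector. This is not the Stanford group's speed-up technique, and it is not Google's production code; the tiny link graph, the function name, and the parameter names are all assumptions for illustration.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=50):
    """Plain power-iteration PageRank.

    links:    dict mapping a page to the list of pages it links to.
    teleport: dict mapping a page to the probability of jumping to it
              (should sum to 1); uniform over all pages if None.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    if teleport is None:
        teleport = {p: 1.0 / n for p in pages}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with its share of the random jumps.
        new = {p: (1.0 - damping) * teleport.get(p, 0.0) for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: hand its rank out according to the teleport vector.
                for q in pages:
                    new[q] += damping * rank[p] * teleport.get(q, 0.0)
        rank = new
    return rank


# Hypothetical four-page web.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
generic = pagerank(links)
personal = pagerank(links, teleport={"d": 1.0})  # a reader who always starts at "d"
print(generic)
print(personal)
```

Swapping the uniform teleport vector for one concentrated on a reader's favorite pages (or on pages about one topic) is what makes the ranking personal. Each such vector needs its own full computation, which is why a faster calculation matters.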
Corporations seek better search results
“In the field of customer intelligence, search analytics is poised to become a star. However, some say it remains somewhat limited, much like enterprise search itself” (via CNet)
Determine the last result in Google
Ross Rader calls it “Googlediving,” and he provides a simple HOWTO. (link via Doc)
However, it could probably be automated by paging backward through a search result-set via the Google API.
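Here is roughly what I have in mind, as a Python sketch. It does not call any real Google API: `fetch_page` is a hypothetical placeholder for whatever call returns one page of results at a given offset, and `estimated_total` stands in for the (often inflated) result count a search engine reports.

```python
def find_last_result(fetch_page, estimated_total, page_size=10):
    """Step backward from the estimated end until a page returns results.

    fetch_page(start) is a hypothetical callable that returns the list of
    results beginning at zero-based offset `start` (empty past the end).
    Returns (index, result) for the last result, or None if there are none.
    """
    # Start at the first offset of the page that would hold the last estimated hit.
    start = max(0, (estimated_total - 1) // page_size * page_size)
    while start >= 0:
        results = fetch_page(start)
        if results:
            return start + len(results) - 1, results[-1]
        start -= page_size
    return None


# Stand-in for a search service: 37 real results, though it "estimates" 120.
corpus = [f"result-{i}" for i in range(37)]

def fake_fetch(start):
    return corpus[start:start + 10]

print(find_last_result(fake_fetch, estimated_total=120))
```

A real service would likely cap how deep you can page and how many queries you can make, so a binary search over offsets might be kinder than stepping back one page at a time.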
Ask Google?
AvaQuest has a neat Google hack called GooglePeople that demonstrates “it is possible to scour the Web for answers to questions using the vast data repository provided by Google.”
“GooglePeople uses your question to do a Google search. It then extracts the people names found on the top 10 result pages and chooses the likely answer to your question based on its scoring algorithm.”
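AvaQuest doesn't spell out its scoring algorithm, so the following Python sketch is only a naive stand-in for the general idea: pull things that look like people names out of each result page and let the pages vote. The regular expression (two capitalized words in a row), the function name, and the sample page text are all my own assumptions.

```python
import re
from collections import Counter

# Crude guess at a "person name": two capitalized words in a row.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def likely_answer(pages):
    """Return the candidate name that appears on the most pages, or None."""
    votes = Counter()
    for text in pages:
        # Count each name once per page so one chatty page can't dominate.
        votes.update(set(NAME_PATTERN.findall(text)))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical text from the top result pages for "Who wrote Hamlet?"
pages = [
    "Most scholars agree William Shakespeare wrote the play.",
    "A few argue for Francis Bacon, but William Shakespeare remains the consensus.",
    "The play by William Shakespeare was first staged around 1600.",
]
print(likely_answer(pages))  # prints "William Shakespeare"
```

A real system would need proper name recognition (the pattern above happily matches "New York") and would probably weigh how close a name sits to the question's keywords.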