Open Source Search Results Clustering Framework

Search May 12th, 2004

My smart search buddies over at BA-Insight (who need to get their blog online!) pointed me to Carrot2, which is described as “a system for clustering textual data”. (The site is a bit slow.)

Generally speaking, Carrot2 is an Open Source alternative to Vivisimo. (Nice!)

Carrot2 has some other interesting features too. For example, it can be used as a meta-search component. It can also be integrated with full-featured text search engines such as the Open Source Egothor and some other lesser-known engines.

Overall I’m impressed with the various clustering algorithms you can select to display your results. Performance, however, seems to be lagging, but I’m sure that can be worked out.
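
To give a rough feel for what "clustering search results" means, here is a toy sketch in Python. It is not one of Carrot2's algorithms; it simply groups result snippets under the term they share most widely with other snippets. The function name, the stopword list and the sample snippets are all my own assumptions for illustration.

```python
from collections import defaultdict

# Tiny illustrative stopword list (an assumption, not taken from Carrot2).
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "to", "is", "on", "with"}

def cluster_by_label(snippets, query_terms=()):
    """Group snippets under the most widely shared non-stopword term."""
    # The query terms appear in every result, so they make useless labels.
    ignore = STOPWORDS | {t.lower() for t in query_terms}

    # Document frequency: how many snippets contain each term.
    doc_freq = defaultdict(int)
    tokenized = []
    for text in snippets:
        terms = {w.strip(".,!?").lower() for w in text.split()} - ignore
        terms.discard("")
        tokenized.append(terms)
        for term in terms:
            doc_freq[term] += 1

    clusters = defaultdict(list)
    for text, terms in zip(snippets, tokenized):
        # Only a term shared with at least one other snippet can label a cluster.
        shared = [t for t in terms if doc_freq[t] > 1]
        label = max(shared, key=lambda t: (doc_freq[t], t)) if shared else "other"
        clusters[label].append(text)
    return dict(clusters)

results = cluster_by_label(
    ["Jaguar car dealers", "Jaguar car reviews",
     "Jaguar cat habitat", "Jaguar cat diet", "Jaguar Mac OS"],
    query_terms=["jaguar"],
)
for label, members in results.items():
    print(label, members)
```

Real clustering engines do far more (phrase discovery, overlapping clusters, ranking of labels), but the basic idea of labeling groups of results by what they have in common is the same.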

Google and Flash Index Friendly

Search April 28th, 2004

As some of my friends can attest, one of my long-standing gripes regarding the usage of Flash has been its inability to be indexed by search engines.

I suppose that argument is now moot since I just read that Google is now indexing Flash files (via Outer-Court and The Unofficial Google Weblog).

I still have problems with the usage of Flash at times. These days it’s more the misuse of Flash, when it adds no value over what a simple text description or graphic would accomplish.

However, at least I can now “google” for my favorite flash

Big Blue Tiki Masala

Search April 26th, 2004

This doesn’t seem like a spicy chicken dish to me…

“IBM is set to unveil an upgraded version of its enterprise-level search technology. Code-named “Masala,” the new software is an improvement on Big Blue’s DB2 Information Integrator released last year. It is expected to enable simultaneous search of the Web, internal applications and corporate databases … and will be released in beta in early May. The full release is slated for the third or fourth quarter.”

“By allowing corporate personnel to search a number of different content sources simultaneously, Masala could be effective in many different scenarios. Sales representatives, for example, could use it to learn about prospective clients by searching internal enterprise resource planning (ERP) systems, as well as information available on the Net.” (Via NewsFactor)

I wonder if Masala is related to WebFountain? Ahh! So it is…

“How is Masala search related to WebFountain?
Masala and WebFountain share technologies but serve different needs. WebFountain is a hosted solution focused on advanced analytics for the internet, while Masala provides search and analytics capabilities for enterprise content.” [more here]

Visualizing Google News

Search March 31st, 2004

Marcos Weskamp announced on his blog yesterday a new application called newsmap, which displays the constantly changing panorama of Google’s News Aggregator (across countries too). [IMHO: This is probably one of the most useful applications in Flash I have seen so far.]

“Newsmap is an application that visually reflects the constantly changing landscape of the Google News news aggregator. A treemap visualization algorithm helps display the enormous amount of information gathered by the aggregator. Treemaps are traditionally space-constrained visualizations of information. Newsmap’s objective takes that goal a step further and provides a tool to divide information into quickly recognizable bands which, when presented together, reveal underlying patterns in news reporting across cultures and within news segments in constant change around the globe.”
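
For the curious, the simplest treemap layout (usually called "slice-and-dice") is easy to sketch in Python. I don't know which treemap variant newsmap actually uses, so treat this as an illustration of the general idea only: each item gets a rectangle whose area is proportional to its weight, and the split direction alternates at each level of the hierarchy. The section names, headlines and story counts below are made up.

```python
def slice_and_dice(items, x, y, width, height, vertical=True):
    """Lay out (label, weight) pairs as rectangles filling the given area.

    Returns a list of (label, x, y, width, height) tuples.
    """
    total = sum(weight for _, weight in items)
    rects = []
    offset = 0.0
    for label, weight in items:
        share = weight / total
        if vertical:
            # Cut along the x axis: each item becomes a column.
            rects.append((label, x + offset, y, width * share, height))
            offset += width * share
        else:
            # Cut along the y axis: each item becomes a row.
            rects.append((label, x, y + offset, width, height * share))
            offset += height * share
    return rects

def treemap(tree, x, y, width, height, vertical=True):
    """Two-level layout: tree maps a section name to {headline: story_count}."""
    sections = [(name, sum(stories.values())) for name, stories in tree.items()]
    layout = []
    for name, sx, sy, sw, sh in slice_and_dice(sections, x, y, width, height, vertical):
        # Alternate the split direction one level down.
        layout.extend(
            slice_and_dice(list(tree[name].items()), sx, sy, sw, sh, not vertical)
        )
    return layout

# Hypothetical data: number of related stories per headline, grouped by section.
news = {"World": {"Headline A": 30, "Headline B": 10}, "Tech": {"Headline C": 60}}
for rect in treemap(news, 0, 0, 100, 50):
    print(rect)
```

The bigger the story count, the bigger the rectangle, which is what makes the heavily covered stories jump out at a glance.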

Gunning for Google Below the Radar

Search March 11th, 2004

Stefanie Olsen of CNET News.com pulls together a good overview of the start-ups targeting Google’s dominance.

Some quotes from the article:

“…Google also faces Lilliputian threats from a fast-growing group of start-ups that hope to replicate its own meteoric rise from unknown upstart to Internet powerbroker….

At the top of the list are companies like Quigo and Industry Brains that aim to improve on search engine advertising techniques. A second group, including Mooter, Eurekster and Dipsie, are advancing ways for people to get personalized query results, something that both Google and Yahoo also are hoping to perfect. Others are developing search tools tailored to specific localities as well as visualization features to assist in better targeting search results around specific topics.”

“…some analysts now predict it’s just a matter of time before Google loses its dominance to rivals in at least some areas of the search market.”

Personal Search Synergies

Search March 9th, 2004

After the excitement of the last week, I’m finally catching up on work and subsequently blogging.

In particular, last night I had a few minutes to check out one of the latest Desktop/Personal Search applications: Lookout Soft’s email search add-on for Outlook, which seems like a great tool.

In limited tests I found Lookout’s search accurate and fast (once the initial indexing was completed). In general I think Lookout and similar products such as X1 are immensely useful.

However, in my experience the indexing functions of these personal search tools always seem to annoy me with larger corpora, such as my fat 800 MB PST file, even on a relatively fast system.

So I end up uninstalling these tools because they simply get in the way more than they facilitate.

It seems I’m not alone in that opinion, as John Battelle mentioned last week:

“Desktop search (ie searching your own hard drive) is one of those things that seems to have gotten worse in the past ten years (why Yahoo, MSFT or Google don’t do it is a mystery, imagine the goodwill…).”

And here is what Philipp Lenssen said in reference to John’s post:

“We wonder why Google takes below a single second to find something in billions of pages (and do some clever ranking at the same time), whereas Windows takes anything from minutes to hours – for a small fraction of documents.”
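
Much of that speed gap comes down to indexing: a search engine answers queries from a prebuilt index instead of scanning every document at query time. Here is a minimal Python sketch of an inverted index to show the idea. The file names and contents are invented, and real engines (and tools like Lookout) add ranking, stemming, incremental updates and much more.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query):
    """Return the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the first word's documents and intersect with the rest.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical mailbox contents.
docs = {
    "mail1.txt": "quarterly sales report attached",
    "mail2.txt": "lunch on friday",
    "mail3.txt": "sales meeting moved to friday",
}
index = build_index(docs)          # slow part, done once up front
print(search(index, "sales friday"))  # fast part, a few set lookups
```

The catch, as I found with my PST file, is that building and maintaining the index is the expensive part.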

I’d like to see more Personal Search applications borrow from architectures such as P2P, Social Networking and Grid computing.

For example, I imagine a cross between Groove, Eurekster and United Devices.

Specifically, I would take the secure networking and synchronous file updates from Groove and toss out its thick client.

Then I would push indexing, data mining and analytics to a social or peer network via the United Devices toolkit, and wrap the entire package in a web service-based API that can be easily embedded into productivity applications such as those found in MS Office, OpenOffice and even Mozilla.

Indeed, I’m glossing over the details, but I’m sure something like this is either already out there or about to be released. Perhaps it’s where MSFT is going with Longhorn. I’m not entirely sure, but I am sure that there are obvious synergies with these technologies that I’ve yet to see completely tapped.

Thumbnails and Archives

Search February 26th, 2004

Another new search engine player is ZapMeta, which has page thumbnails as well as direct links to older versions of a page via the Internet Archive’s Wayback Machine.

What Exit: Google’s Location Search

Search February 19th, 2004

I’m starting to find the beta of Google’s Location Search to be very handy and much faster than using my old stand-by.

Doc has even noticed that you can find local hot-spots in your area via the tool.

In addition, I found a neat little trick to add a localized link to your favorites (or a “bookmarklet”) for quick access.

Go to Google’s Location Search page, enter your address into the address field with nothing in the search terms area, then click search.

The resulting page will be a customized search page for your area. Simply add this link to your favorites or drag it to your links bar.

Now if Google’s Location Search could only find my keys, I’d be a happy camper.

On second thought, finding lost objects isn’t so far-fetched with the proliferation of RFID tags and localized search… hmm …

The New Yahoo in Town

Search February 18th, 2004

So far I’ve found Yahoo’s new search to be comparable with Google in most respects. Even the interface is minimalist. Well, that is, in relation to other more gregarious Yahoo interfaces.

I even like the XML/RSS restrictive search features, although it would be cooler to be able to get the search results as an RSS feed.

However, I found that Yahoo’s image search is suspiciously similar to Google’s.
For example, compare these image search results on Google and Yahoo.

Although it wasn’t apparent from the site, perhaps they’re still using Google in some areas.

Site Search Still Sucks

Search February 2nd, 2004

Jim Rapoza over at eWeek laments the sorry state of customer-facing corporate search. Here are some good quotes from Jim’s article:

“…there is one thing about the Web that remains poor: site search capabilities.”

“As we said in the 1997 article, if visitors or customers can’t find what they want on your site, they will often simply leave.”

“The search capabilities on most company and content-oriented Web sites are as bad now as they were several years ago. In fact, eWEEK Labs was dismayed to find that we could have easily rerun an article we wrote back in June 1997 on how to improve site searches…”

“… However, RSS could also be used on sites to create channels for commonly searched categories of content. Users could then subscribe to or occasionally open these channels to get updates of information changes on a site.”
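
That last idea is simple enough to sketch. Assuming a site already has some way to run a search for a commonly requested category, turning the results into an RSS 2.0 channel takes only a few lines of Python with the standard library. The function name, the example.com URLs and the sample results below are hypothetical.

```python
import xml.etree.ElementTree as ET

def search_results_to_rss(title, link, results):
    """Build an RSS 2.0 document from a list of {'title', 'link'} dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Latest results for {title}"
    for result in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = result["title"]
        ET.SubElement(item, "link").text = result["link"]
    return ET.tostring(rss, encoding="unicode")

# Hypothetical results for a commonly searched category.
feed = search_results_to_rss(
    "Printer drivers",
    "http://www.example.com/search?q=printer+drivers",
    [
        {"title": "Driver update 2.1", "link": "http://www.example.com/drivers/2.1"},
        {"title": "Driver FAQ", "link": "http://www.example.com/drivers/faq"},
    ],
)
print(feed)
```

Regenerate the feed whenever the site content changes and subscribers see new matches without ever touching the search box.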