Below you will find pages that utilize the taxonomy term “Search”
Blinkx Video Search
Not surprisingly, we’re going to see much more in the way of video or “multimedia” search from the likes of Yahoo, Google and of course Microsoft, but it looks like Blinkx Video Search is the first one out of the starting blocks (too bad they don’t offer the results in RSS with enclosures).
“The beta offering, dubbed Blinkx TV, captures and indexes video and audio streams directly from television and radio broadcasters to make available news, sports and entertainment clips, the company said. The engine lets people group specific searches using “smart folders” that continuously collect multimedia content from sources including Fox News, HBO, ESPN, National Public Radio and the BBC World Service.” (via CNET)
IBM the Google for businesses?
There are few specific details in this article on CNET about IBM’s push into the Enterprise Search market, but it does hint at IBM’s commitment to “higher-margin software and services” during its transition from the PC Hardware space.
“IBM is building software it hopes will make it the Google of corporate-search technology.”
“IBM is constructing a content management and search product line through acquisitions and by sifting through the results of its research and development labs.” (e.g. WebFountain)
Google Auto-Complete
I know, I know… first post in a long time. Trust me; I haven’t abandoned the blog — more on that another time.
Anyway, I just tested Google Suggest, the new Auto-Complete feature that’s currently in beta at Google. My first impression is that this is a wicked fast service! I hope they make it a default feature, but I imagine they’d still have to work out the scaling issues.
Beyond that, I’d like to see the Auto-Complete feature applied to all the other Google properties like GMail, Froogle, Google News and even the Google Tool Bar. I’m sure it’s in the works.
Clusty Clustering Curmudgeon
I’m not sure about the name or whether it will be a Google killjoy, but yesterday Vivísimo opened to the public their consumer search service called Clusty, which utilizes results from Yahoo’s Overture engine.
At the forefront of Clusty is Vivísimo’s topic clustering of search results (hence the name). Searches can be performed across web, news, images, shopping, encyclopedic and something called ‘gossip’.
Hidden behind the ‘Customize’ tab in Clusty are options to span your searches across eBay, Slashdot and Blogs! (although I’m not sure why they separate Slashdot and Blogs)
Google Down, IPO Pricing UP?
With Google recently releasing their IPO pricing estimates, ranging from $108 to $135 per share with a market capitalization between $29 billion and $36 billion, I suspect this doesn’t bode well:
Update: Apparently Google’s problems are due to the latest MyDoom worm variant (via /.)
One-up-man-ship: Google, Yahoo and of course Microsoft
After a week on the beaches of the Outer Banks in North Carolina with family, I feel refreshed and recharged, though perhaps not recharged enough to ride up l’Alpe d’Huez, but I digress…
During my week away there were a few notable acquisitions made by Microsoft, Yahoo and Google.
First up is MSFT’s acquisition of my current favorite personal search tool, Lookout, which integrates well with Outlook and complements my archive of NewsGator subscriptions nicely.
Blinkx Contextual Search
Om Malik praises the new contextual desktop search tool called Blinkx, which is currently available as a downloadable beta client as well as a web-only interface.
Om goes on to cite some attractive examples such as…
“BlinkX is all about contextual search…Say you are reading through a big Microsoft Word document… the BlinkX bar at the top of the page, will retrieve relevant news item links with brief summaries… The software basically reads the entire document and builds a contextual link database on the fly.”
Pop Goes the GMail
At this point, PGtGM looks to be only a proof-of-concept, but essentially it is destined to be a POP3 proxy for Google’s GMail.
This is/will be a cool hack, but it will probably be a moot point once some of the rumored features of GMail start to roll out (namely POP3 and RSS/Atom).
Gmail in the Enterprise
I’ve been disconnected all week in PeopleSoft training, but during lunch today I caught Steve Gillmor’s eWeek column from last week about Gmail where he cites a potential example of the “Google Platform” in the enterprise.
“By the time the Gmail beta period ends in three to six months, Brin and his team have promised to enable forwarding and POP3 access. However, more is required of a corporate mail service. Those capabilities must be extended to allow Gmail to provide disconnected operation and IDE for packaged applications.”
Google Groups Beta 2
Google opened beta 2 of Google Groups. They’ve integrated it with Gmail, which gives you the ability to post to Usenet newsgroups as well as create new groups.
All these “new” features are strangely reminiscent of what My Deja News offered back in 1998…
Hey wait a second! Google purchased Deja back in 2001. What took them so long to integrate services?
I’m not sure, but there is progress…
For example, you can now get an Atom Feed of your favorite newsgroups.
Open Source Search Results Clustering Framework
My smart search buddies over at BA-Insight (who need to get their blog online!) pointed me to Carrot2, which is described as “a system for clustering textual data”. (The site is a bit slow.)
Generally speaking, Carrot2 is an Open Source alternative to Vivisimo. (Nice!)
Carrot2 has some other interesting features too. For example, it can be used as a meta-search component. In addition, it can be integrated with full-featured text search engines such as the Open Source Egothor and some other lesser-known engines.
Google and Flash Index Friendly
As some of my friends can attest, one of my long-standing gripes regarding the usage of Flash has been that its content cannot be indexed by search engines.
I suppose that argument is now moot since I just read that Google is now indexing Flash files (via Outer-Court and The Unofficial Google Weblog).
I still at times have problems with the usage of Flash. I suppose it’s now more about the misuse of Flash when it does not add any extra value over what a simple text description or graphic would accomplish.
Big Blue Tiki Masala
This doesn’t seem like a spicy chicken dish to me…
“IBM is set to unveil an upgraded version of its enterprise-level search technology. Code-named “Masala,” the new software is an improvement on Big Blue’s DB2 Information Integrator released last year. It is expected to enable simultaneous search of the Web, internal applications and corporate databases … and will be released in beta in early May. The full release is slated for the third or fourth quarter. ”
…
“By allowing corporate personnel to search a number of different content sources simultaneously, Masala could be effective in many different scenarios. Sales representatives, for example, could use it to learn about prospective clients by searching internal enterprise resource planning (ERP) systems, as well as information available on the Net.” (Via NewsFactor)
Visualizing Google News
Marcos Weskamp announced on his blog yesterday a new application called newsmap, which displays the constantly changing panorama of Google’s News Aggregator (across countries too). [IMHO: This is probably one of the most useful applications in Flash I have seen yet.]
“Newsmap is an application that visually reflects the constantly changing landscape of the Google News news aggregator. A treemap visualization algorithm helps display the enormous amount of information gathered by the aggregator. Treemaps are traditionally space-constrained visualizations of information. Newsmap’s objective takes that goal a step further and provides a tool to divide information into quickly recognizable bands which, when presented together, reveal underlying patterns in news reporting across cultures and within news segments in constant change around the globe.”
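For the curious, the basic treemap idea can be sketched in a few lines of Python. This is only the simple “slice” layout (one rectangle cut into strips whose areas are proportional to a set of weights), not necessarily the algorithm Newsmap itself uses. The function name `slice_layout` and the story counts are made up for illustration.

```python
def slice_layout(weights, x, y, width, height):
    """Split one rectangle into side-by-side strips, one per weight.

    Returns a list of (x, y, width, height) tuples whose areas are
    proportional to the weights. This is the simple "slice" layout,
    an assumption for illustration rather than Newsmap's own method.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    rects = []
    offset = x
    for w in weights:
        strip = width * w / total  # strip width proportional to weight
        rects.append((offset, y, strip, height))
        offset += strip
    return rects


# Hypothetical story counts for three news sections.
story_counts = {"World": 50, "Business": 30, "Sports": 20}
layout = slice_layout(list(story_counts.values()), 0, 0, 100, 40)
for name, rect in zip(story_counts, layout):
    print(name, rect)
```

A fuller treemap would apply the same split recursively inside each strip, alternating between horizontal and vertical cuts.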
Gunning for Google Below the Radar
Stefanie Olsen of CNET News.com pulls together a good overview of the start-ups targeting Google’s dominance.
Some quotes from the article:
“…Google also faces Lilliputian threats from a fast-growing group of start-ups that hope to replicate its own meteoric rise from unknown upstart to Internet powerbroker….
At the top of the list are companies like Quigo and Industry Brains that aim to improve on search engine advertising techniques. A second group, including Mooter, Eurekster and Dipsie, are advancing ways for people to get personalized query results, something that both Google and Yahoo also are hoping to perfect. Others are developing search tools tailored to specific localities as well as visualization features to assist in better targeting search results around specific topics.”
Personal Search Synergies
After the excitement of the last week, I’m finally catching up on work and subsequently blogging.
In particular last night I had a few minutes to check out one of the latest Desktop/Personal Search applications.
Specifically, Lookout Soft’s email search add-on for Outlook, which seems like a great tool.
In limited tests I found Lookout’s search accurate and fast (once the initial indexing was completed). In general I think Lookout and similar products such as X1 are immensely useful.
Thumbnails and Archives
Another new search engine player is ZapMeta, which has page thumbnails as well as direct links to older versions of a page via the Internet Archive’s Wayback Machine.
What Exit: Google’s Location Search
I’m starting to find the beta of Google’s Location Search to be very handy and much faster than using my old stand-by.
Doc has even noticed that you can find local hot-spots in your area via the tool.
In addition, I found a neat little trick to add a localized link to your favorites or “bookmarklet” for quick access.
Go to Google’s Location Search page, enter your address into the address field with nothing in the search terms area, then click search.
The New Yahoo in Town
So far I’ve found Yahoo’s new search to be comparable with Google in most respects. Even the interface is minimalist. Well, that is in relation to other more gregarious Yahoo interfaces.
I even like the XML/RSS restrictive search features, although it would be cooler to be able to get the search results as an RSS feed.
However, I found that Yahoo’s image search is suspiciously similar to Google’s.
For example, compare these image search results on Google and Yahoo.
Site Search Still Sucks
Jim Rapoza over at eWeek laments the sorry state of customer-facing corporate search. Here are some good quotes from Jim’s article:
“…there is one thing about the Web that remains poor: site search capabilities.”
“As we said in the 1997 article, if visitors or customers can’t find what they want on your site, they will often simply leave.”
“The search capabilities on most company and content-oriented Web sites are as bad now as they were several years ago. In fact, eWEEK Labs was dismayed to find that we could have easily rerun an article we wrote back in June 1997 on how to improve site searches…”
Mapping Google News
This is nifty:
Google News Map: “Why not parse Google News, find the first name match and draw a map with the latest headlines on the coordinates of the countries.”
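The “first name match” idea in that quote is easy to sketch. The Python below is only my guess at the approach, not the Google News Map author’s code: the lookup table is a tiny made-up sample with rough coordinates, and `locate_headline` is a hypothetical helper name.

```python
# Tiny illustrative lookup table with rough (latitude, longitude) values;
# a real map would need a full list of countries.
COUNTRY_COORDS = {
    "France": (46.0, 2.0),
    "Japan": (36.0, 138.0),
    "Brazil": (-10.0, -55.0),
}


def locate_headline(headline, coords=COUNTRY_COORDS):
    """Return (country, (lat, lon)) for the country named earliest in the
    headline, or None when no known country name appears."""
    best = None
    for country, latlon in coords.items():
        pos = headline.find(country)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, country, latlon)
    return None if best is None else (best[1], best[2])


print(locate_headline("Japan and France sign trade deal"))
print(locate_headline("Local council approves new budget"))
```

Plain substring matching like this will miss adjectives (“French”) and city names, so treat it as a starting point only.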
Corporate Search in 2004?
Call it a prediction or stating the obvious, but I believe in the coming year corporate search solutions will be generating a steady buzz, driven primarily by innovative products that focus on unlocking the terabytes of knowledge squirreled away in the recesses of corporate network data stores. It’s a position John Battelle seems to agree with in a recent post on the topic:
“…the overwhelming presumption of webwide search on your desktop is certainly rewiring how corporations think about their more private databanks. A robust market has grown up around “enterprise search,” (some companies, such as FAST, were spun off from consumer search companies, and Google maintains a unit focused on the market). There’s a crop of interesting startups to boot, including Tim Bray’s company, Antarctica. It’s entirely possible some of the next big ideas in search may well be developed in this more focused, less public field.”
Social Network Search
You can create a social network search interface by using Micah Alpern’s ‘Blogs I Read’ Google Hack and/or Feedster. However it appears that Eurekster has taken it one step further:
“Eurekster uses the six-degrees of separation concept to learn from your extended network of contacts and deliver you prioritized results based on the success and proximity of the searches they have done.”
The public beta currently available seems to require a bit more effort than the average consumer “searcher” would be willing to afford. However, I would like to see how well this works within the corporate enterprise — assuming of course you can integrate much of the social network mechanics with existing enterprise directory services such as Active Directory or other LDAP-compliant systems.
Presentation Layer of Search
Raul Valdes-Perez, president of Vivisimo, is quoted in an article at New Scientist about improvements to the user experience of search interfaces. In particular, he is talking about Google News, MSN’s newly announced Newsbot and Vivisimo’s as-yet-unreleased spontaneous clustering approach.
“[Raul Valdes-Perez] says that the engineering of search and rank algorithms “has gone about as far as it can go”. Now the way to improve the user experience is to work on the next layer of algorithms that determine the presentation of the “search and rank” results.”
Popularity Metrics
In Tim Bray’s latest essay on search he points out what I feel is an often overlooked aspect of Google’s PageRank when it is applied to enterprise search:
“[PageRank] Won’t Work for You If you’re writing or deploying a search engine for your Intranet or product catalogue or portal, Google’s PageRank trick probably won’t work, because most Intranet and catalogue and portal pages don’t point at each other. The Web is unique in that it has millions of authors independently making decisions about what’s important; aggregating those decisions is what makes PageRank so powerful.”
Google Deskbar
I usually don’t like to add too much extra paraphernalia to the default Windows desktop. However, after about an hour of sporadic usage, I think the Google Deskbar is on its way to becoming required gear for me.
“Google Deskbar enables you to search with Google from any application without lifting your fingers from the keyboard. Installs easily in your Windows taskbar.”
Ask Microsoft?
I just read this via /. about Microsoft’s interest in other search engines:
“Microsoft had also been linked with buys of any of the remaining search players, including Ask Jeeves and Looksmart, though the company has recently dumped Looksmart after deciding its results did not match up to those of its other search partners, notably Overture.”
Google is no sure thing
Quotes from an article in The Economist about Google’s ability to sustain its market-share.
“For Google to stay permanently ahead of other search-engine technologies is almost impossible, since it takes so little—only a bright idea by another set of geeks—to lose the lead.”
“Yahoo!, in fact, will probably be the first to attack. It now … has under its own roof all the elements of the business model that made Google such a success.”
Enterprises Want Search
Cathleen Moore’s article in InfoWorld this week regarding the necessity of an Enterprise Search strategy should be considered “stating the obvious”.
Some excellent quotes from the article:
“The explosion of corporate content “both in the physical form of documents, records, and data, and in the human form of personal knowledge” has pressed companies into a crisis: Find a way to tap into and effectively leverage that knowledge, or watch your company’s most vital assets wither on the vine.”
Google eyes book search
In light of Amazon’s recent book search service, this report on CNet about Google in talks with publishers to provide a similar service seems a little strange.
“Google is in talks with several publishers to build a service that would allow Web surfers to search the full text of books online” (via CNet)
IMHO, the service does make sense for Amazon as a way to drive more consumers to book purchases, and I suppose it could also turn into another revenue source for Google, certainly as a research tool for business and academia, but is the market big enough to support the effort?
Amazon’s Full Text Search
Amazon has now extended its search to include the full text of over 120,000 books. It will even do hit-highlighting on the actual pages that match. Slick!
Search Engine for Research Documents
Penn State University has released a new search engine called SMEALSearch, which is focused on indexing academic and business white papers, articles and reports.
“SMEALSearch is a niche search engine that searches the web and catalogs academic articles as well as commercially produced articles and reports that address any branch of Business. The search engine crawls websites of universities, commercial organizations, research institutes and government departments to retrieve academic articles, working papers, white papers, consulting reports, magazine articles, and published statistics and facts.”
Collaborative Search
Unfortunately, I haven’t been using any IM clients recently due to the fact that their usage is banned and blocked where I work (more on that sometime).
Anyway, it was intriguing to read via Jeremy Zawodny’s blog about Yahoo’s new IMVironment (i.e. plug-in) for its Messenger client that allows users to search collaboratively.
“Pull up a map or yellow pages listing together instantly. Working with a colleague in another city? Search together in real time for info, images, news…”
Google acquires Kaltix
Well, that was quick… Here’s a quote via the San Jose Biz Journal:
“Kaltix Corp., a search technology startup, has been purchased by Internet search engine company Google Inc., of Mountain View. Financial terms of the deal were not disclosed.”
(Thanks for the heads-up Anil.)
I posted something about this back in August, but based on their “published research that claims to offer a way to compute search results nearly 1,000 times faster than what’s possible using current methods”, it seems like a smart move for Google.
Search:NG
Fredrick Marckini talks to advertising pioneer Jack Trout about exactly how Microsoft can trump Google…
Jack Trout, “says Google is dangerously close to becoming the generic in the space. Should that happen, the company would be open to brand and product positioning attacks on multiple fronts.”
[Fredrick Marckini] “…asked Jack Trout about the possibility of Microsoft gaining search market share by adding a search interface to its new OS as many expect it will — effectively creating a structural barrier to all other search engines by saving steps, eliminating the need to even launch a browser: “You’ve just defined a ‘next generation’ idea,” he said. “This way you make search an operating system component. That’s tough to unseat.”
Personal Search Tool
I just tested X1 for a few minutes and I’m truly impressed with its ability to quickly index and search my local repository of files and email messages.
“X1 is free PC software that uses an advanced indexing process that lets you find any word in any email or file on your computer, in under a second.”
The Pro version of X1 is just under $50 (US) and adds the ability to search network shares and native file preview options as well.
Teoma eclipsed Google?
The Wall Street Journal covers Google’s closest competitors (except Microsoft [for now]):
“Some search industry gurus even preach heresy: that Google isn’t the field’s technology leader anymore.”
Teoma’s providing more value by providing refinements from the “community”
“Teoma’s software has, in effect, found the “community” associated with your search, and is listing what related topics that community is “discussing.” For “power blackout,” the refinements Friday included “electrical surge” and “cost of downtime.””
Personalizing PageRank
An article on CNet about a new search startup out of Stanford (link via Anil)
Kaltix: “A stealth start-up out of Stanford University is hoping to raise the heat on one of the toughest problems in Web search–and possibly out-Google Google in the process.”
“Without discussing Kaltix’s plans publicly, the company’s founders have published research that claims to offer a way to compute search results nearly 1,000 times faster than what’s possible using current methods.”
Visual Enterprise Search Tool
John sent me a link to KartOO Technology’s search engine and visualization UI that was “developed in Flash to present a friendly and clear visual interface while integrating the graphic charter of your company.”
You can see here what I think is a demo of the search and Flash-based UI that’s actually a meta-search engine of sorts that visually maps result-sets in the UI.
I wonder what their pricing model is for these tools.
PageRank within the Enterprise
Tim Bray’s latest On Search essay alludes to the effectiveness of Google’s PageRank within the Enterprise:
“…even if it turns out that popularity [PageRank] is the key thing for Internet search, the Internet is a very special place, and it’s quite unlikely that popularity is the killer metadatum for the whole universe of search applications.”
However, I suspect that part of what’s in the works with Google’s acquisition and integration of Blogger is to augment PageRank in the enterprise. Yet, utilizing blogs internally to enhance the PageRank-ing of documents and resources indexed with the Google Search Appliance will require some ramp-up time to become useful.
Synonyms! Fuzzy! Thesauri! Oh my!
Tim Bray’s sixth installment on search:
“There are other ways than thesauri to improve the recall of search systems. Perhaps the best known is “Latent Semantic Indexing.””
The Search for Intelligent Search
The latest in Tim Bray’s series on search.
I think this one is the best yet. Here’s a quote:
“Consider what a really intelligent search engine would have to do. It would have to read an arbitrary selection of documents in an arbitrary selection of dialects and styles, and ascertain what they are about. Then, it would have to look at an arbitrary query, once again in an arbitrary dialect and style, and ascertain what it is about. Then it would have to match the about-nesses of the query against that of the documents and return the right documents.”
Precision and Recall
Tim Bray posts his third in the series on Search. This one is on Precision and Recall. Here are a few good quotes:
“While precision and recall are very helpful in talking about how good search systems are, they are nightmarishly difficult to actually use, quantitatively. First of all, the notion of “relevance” is definitely in the eye of the beholder, and not, in the real world, a mechanical yes/no decision. Secondly, any information base big enough to make search engines interesting is going to be too big to actually compute recall ….”
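For anyone who wants the definitions spelled out, here is a minimal Python sketch using the standard textbook formulas: precision is the fraction of returned documents that are relevant, and recall is the fraction of relevant documents that were returned. The document IDs and the `precision_recall` name are made up, and the sketch assumes relevance really is a yes/no judgment, which is exactly the simplification Tim warns about.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: the document IDs the engine returned.
    relevant:  the document IDs a human judged relevant.
    Both are treated as sets, so relevance is a simple yes/no here.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents returned, 5 relevant ones exist.
returned = ["d1", "d2", "d3", "d4"]
judged_relevant = ["d1", "d3", "d7", "d8", "d9"]
print(precision_recall(returned, judged_relevant))
```

Note that recall needs the complete list of relevant documents, which is the part that becomes impractical on any large collection.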
Nobody Uses Advanced Search
Tim Bray’s second installment on search…
“Every search engine has an “advanced search” screen, and nobody (quantitatively, less than 0.5% of users) ever goes there. This drove us nuts back at Open Text, because our engine was very structurally savvy and could do compound/boolean queries that look like what today we’d call XPath. But nobody used it.”
I used it quite a bit. In fact, the advanced search page was what I would bookmark on most search engines. However, I was certainly in the minority.
Search is Commoditized
A quote from Tim Bray’s first in a series on search technology:
“All search engines work more or less the same, and offer more or less the same APIs, and provide more or less the same quality of result.”
Interesting. I can’t wait till the next installment.
Personal and Enterprise Search
For some reason I missed this post from last week by Jon Udell about Indexing and searching Outlook email, but I thought his concluding paragraphs had a much broader impact on Enterprise Search in general.
“… The Web has trained us, rightly, to expect that we just type in a word or two and get the “right” answer. I don’t know what the stats are on use of Google’s advanced search, or any advanced search, but my gut tells me such features are rarely used.”
Topic-Sensitive page rankings feasible
A new paper by a Stanford group claims a substantial speed-up in calculating PageRank (used by Google), which could make room for personalized topical searches.
“Computer science researchers at Stanford University have developed several new techniques that together may make it possible to calculate Web page rankings as used in the Google search engine up to five times faster. The speed-ups to Google’s method may make it realistic to calculate page rankings personalized for an individual’s interests or customized to a particular topic.” (via BoingBoing)
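To make “personalized” rankings a little more concrete, here is a rough Python sketch of the textbook power-iteration form of PageRank with a configurable teleport vector. It is not the Stanford group’s speed-up technique and not Google’s production code. The `pagerank` function, the `teleport` parameter name and the three-page link graph are assumptions for illustration.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank (a sketch, not Google's code).

    links:    dict mapping each page to the list of pages it links to;
              every link target must also appear as a key.
    teleport: dict of page -> probability (summing to 1) for the random
              jump. Uniform by default; skewing it toward a user's
              favorite pages gives a "personalized" ranking.
    """
    pages = list(links)
    if teleport is None:
        teleport = {p: 1.0 / len(pages) for p in pages}
    rank = {p: teleport.get(p, 0.0) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) * teleport.get(p, 0.0) for p in pages}
        for page, outs in links.items():
            if outs:
                share = damping * rank[page] / len(outs)
                for target in outs:
                    new[target] += share
            else:
                # Dangling page: hand its rank back via the teleport vector.
                for p in pages:
                    new[p] += damping * rank[page] * teleport.get(p, 0.0)
        rank = new
    return rank


# Hypothetical three-page web: a and b link to each other, c links to a.
web = {"a": ["b"], "b": ["a"], "c": ["a"]}
print(pagerank(web))                       # uniform random jump
print(pagerank(web, teleport={"c": 1.0}))  # a reader who always restarts at c
```

Each personalized ranking needs its own full run of the iteration, which is why a faster calculation matters for per-user or per-topic results.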
Corporations seek better search results
“In the field of customer intelligence, search analytics is poised to become a star. However, some say it remains somewhat limited, much like enterprise search itself” (via CNet)
Determine the last result in Google
Ross Rader calls it “Googlediving,” and he provides a simple HOWTO. (link via Doc)
However, it could probably be automated by paging backward through a search result-set via the Google API.
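Here is a rough Python sketch of how that automation might look. Note the swap: instead of stepping backward one page at a time, it binary-searches the page offsets, which needs far fewer requests. `fetch_page` is a hypothetical stand-in for whatever search API call you have, not a real Google API function, and the sketch assumes results are served in fixed-size pages with an empty page once you run past the end.

```python
def find_last_result(fetch_page, page_size=10, max_results=1000):
    """Return the final result in a result set, or None if it is empty.

    fetch_page(start) is a hypothetical callable standing in for a search
    API request. It must return the list of results beginning at offset
    `start` (at most page_size of them), or an empty list past the end.
    """
    low, high = 0, max_results // page_size  # page indexes to probe
    last_page = None
    while low <= high:
        mid = (low + high) // 2
        results = fetch_page(mid * page_size)
        if results:
            last_page = results  # deepest non-empty page seen so far
            low = mid + 1
        else:
            high = mid - 1
    return last_page[-1] if last_page else None


# Fake result set standing in for a live search engine.
corpus = [f"result-{i}" for i in range(237)]


def fake_fetch(start):
    return corpus[start:start + 10]


print(find_last_result(fake_fetch))
```

A live engine may cap how deep it will page or reshuffle results between requests, so the answer from a real service would only be approximate.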
Ask Google?
AvaQuest has a neat Google hack called GooglePeople that demonstrates “it is possible to scour the Web for answers to questions using the vast data repository provided by Google.”
“GooglePeople uses your question to do a Google search. It then extracts the people names found on the top 10 result pages and chooses the likely answer to your question based on its scoring algorithm.”
Study shows corporate structure found in email
Hewlett-Packard scientists found that mapping a company’s power and communication structure may be as simple as examining patterns of e-mail exchanges.
“Because [Email] can be captured and stored, many scientists are eyeing e-mail as a tool to quantify exchanges that in the past have taken place in hallways or meetings. The researchers in this study said e-mail flow could provide a window into the communications structure of an organization.”
Searching the BlogSphere
This is pretty darn cool. Micah Alpern has released a Google BlogSphere search tool (w/source in PHP) that will search across all the sites whose RSS feeds are found in your OPML list.
I think Micah describes it best:
“Until the semantic web arrives the best method we have to understand a users point of view is to examine the RSS feeds they subscribe to. I currently read RSS feeds from over 70 websites. This list of RSS feeds includes friends, publications, and domain expects; all people whose opinions I value. If Googling my weblog is like searching by backup brain, then searching all sites in my RSS news aggregator is like searching the brains of people I respect and find interesting.” (link via Sebastien Paquet)
Hacking via Google
An interesting story on Wired News about finding open web-based databases using Google (via Scripting).
Google buys Blogger
Dan Gillmor: “Google … has purchased Pyra Labs, a San Francisco company that created some of the earliest technology for writing weblogs, the increasingly popular personal and opinion journals.”
I’m sure this is going to be interesting.
Google Dance Tool
An interesting tool for Search Engine Optimization (SEO).
“During the update, which takes several days, the 3 Google servers display different results. Whilst the results vary from server to server, they are said to be “dancing”, hence the name Google Dance.”
Latent Semantic Indexing
Heh, I wish this was available when I was in school, “One other possible use of LSI is for an automated essay writer. Given a big enough body of knowledge about a certain topic, an LSI program could take your homework, mark it and even suggest areas you haven’t covered.” (via Ben Hammersley)
Google Intelligence Gathering .Net Application
“Juice is an alternate browser with built-in support for Google’s search API. With Juice, you don’t just search for information, you gather information.” (via ScriptingNews)
Looks interesting. Although, I have similar functionality right within this blog by integrating the Google API with post metadata.
Googlert
“Googlert is an experimental free service which keeps you updated on what the web is saying about you, your products or your interests. It does this by performing regular Google searches on your behalf and sending an email alert containing any new results that appear.”
Google introduces Froogle
“Froogle is a new service from Google that makes it easy to find information about products for sale online. By focusing entirely on product search, Froogle applies the power of Google’s search technology to a very specific task: locating stores that sell the item you want to find and pointing you directly to the place where you can make a purchase.”
I wonder if this is available via Google’s web services API … it would be interesting to expand on Calin Uioreanu’s Amazon PHP API powered Simplest-Shop idea with Froogle.
Google Voice Search Demo
Those wacky Googlers …
“To try out this demo, please follow these simple steps:
1. Pick up the phone and call the automated voice search system at (650) 318-0165.
2. After the prompt Say your Search Keywords, say your query to the system.
3. Click this link and a new window will open with your voice search results.
4. Say another query, and the new window with the search results will be updated with the new results.”
Google now indexing Office et al.
Pretty cool: Google “… now extracts and indexes the text from many file formats, including Word, PowerPoint, Excel, PostScript, RTF, Lotus files, WordStar 2000, RFT (an old IBM format), MacWrite, and on and on. The extracted information is also converted in most cases to HTML which you can view by clicking a link next to the result.”
Google gets a rival from Jersey
And it’s not a Soprano … New Jersey “technology company Hawk Holdings has launched an internet search engine called Teoma which is set to rival favourite Google.”
Better than Google?
“Vivisimo’s clustering technology is based on a specially developed algorithm to group or “cluster’’ textual documents … The output is a clear and hierarchical folder structure, allowing users to avoid “link overload’’ and to click only on the specific category of information that they need.”
Collaborative Portal
I’m thinking of experimenting with yourmom.com as a collaborative portal modeled after ODP … just a thought.
