Microsoft pursuit of Google revealed

I was in meetings all morning, so I missed this report earlier, but Dave just alerted me to the Microsoft and Google news.

“Microsoft approached Google, the internet search engine, two months ago to discuss a partnership or even a merger it emerged today.”

“Google showed little interest in overtures from the company that dominates the market for operating systems.”
(via The Guardian)

This is certainly interesting, yet given MSFT’s track record in this respect, the news is not surprising. My guess is that the initial rejection by Google spurred MSFT’s recent MSN Search push.

Doh! The cruller is no more!

Donuts?

“Walk into any local Dunkin’ Donuts and you can purchase a caramel-swirl latte or sourdough bagel, a pumpkin muffin or powdered Munchkin. You can get a jelly stick, chocolate stick, or chocolate-coconut stick, pastries that are shaped somewhat like conventional crullers and contain roughly the same lip-smacking number of empty calories.”

“But you cannot get a cruller anymore… ” (more here)

Your Mom is so versatile you can now have her in RSS

Yes indeed, it’s true: yourmom.com now has her very own set of RSS feeds!

I’m sure you’ll sleep better with this wonderful news. In fact, there’s more! Your mom even comes in Atom 0.1 format — thanks to FeedCreator class v1.3

Enjoy :-)

Posted in RSS

JavaScript: Search word hit-highlighting

I found searchhi, a slick JavaScript library by Stuart Langridge that highlights keywords in your documents when the referring link to your page comes from a search engine such as Google:

“searchhi JavaScript library is a way of automatically highlighting words on a page when that page was reached by a search engine. In essence, if you search, for example, Google for some words, and then follow a link from the search results to a searchhi enabled page, the words you searched for will be highlighted on that page.”

I was actually thinking of using PHP to do this, but Stuart’s JS code seems to be a better alternative given that the performance hit is on the client-side.
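For comparison, here is a rough idea of what the server-side PHP alternative might look like. This is only a minimal sketch and not how searchhi works: the function names are made up, it assumes the search engine passes the query in a "q" parameter (as Google does), the "searchword" class name is just a placeholder, and it only handles plain text rather than full HTML.

```php
<?php
// Hypothetical helper: pull the search terms out of a referrer URL.
// Assumes the query string carries the search in a "q" parameter.
function search_terms_from_referrer(string $referrer): array
{
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($query)) {
        return [];
    }
    parse_str($query, $params);
    if (empty($params['q']) || !is_string($params['q'])) {
        return [];
    }
    // Split on whitespace and drop empty pieces.
    return preg_split('/\s+/', trim($params['q']), -1, PREG_SPLIT_NO_EMPTY) ?: [];
}

// Hypothetical helper: wrap every occurrence of a term in a <span>.
// Works on plain, tag-free text; the text is HTML-escaped first.
function highlight_terms(string $text, array $terms): string
{
    $escaped = htmlspecialchars($text);
    $parts = array_map(function ($term) {
        return preg_quote(htmlspecialchars($term), '/');
    }, $terms);
    if (!$parts) {
        return $escaped;
    }
    // One pass with a single alternation, so inserted markup is never re-matched.
    $pattern = '/(' . implode('|', $parts) . ')/i';
    return preg_replace($pattern, '<span class="searchword">$1</span>', $escaped);
}
```

In a page this would be called with something like `$_SERVER['HTTP_REFERER'] ?? ''` as the referrer. Every request pays for the extra string processing on the server, which is exactly the cost the client-side script avoids.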

Google eyes book search

In light of Amazon’s recent book search service, this report on CNet about Google in talks with publishers to provide a similar service seems a little strange.

“Google is in talks with several publishers to build a service that would allow Web surfers to search the full text of books online” (via CNet)

IMHO, the service does make sense for Amazon as a way to drive more consumers to book purchases, and I suppose it could also turn into another revenue source for Google, certainly as a research tool for business and academia. But is the market big enough to support the effort?

Movable Type Blog Migration

Over the last week, usually in the mid-to-late evenings after Catherine falls asleep, I have been slowly migrating my B2-based blog to Movable Type.

I must say that for the most part the process has been fairly straightforward. The MT system installed smoothly, and customizing the core MT templates to fit my old B2 template, while time-consuming, was rather easy, and the templates proved extremely flexible.

However, during the migration process I had some interesting obstacles. In particular, I wanted to seamlessly maintain the entire URL-space of my old B2 blog with my new MT blog. My initial thinking was that with a little data-scrubbing and massaging I could export the MySQL table data from B2 and import the data into the MT table-space.

In addition, if I could retain the same entry/post IDs between the old and new system, I could easily redirect links via an Apache mod_rewrite regular expression mapping.

After some initial head-scratching, this idea turned out to be a bit more complex than I had thought, given that I wanted to include comments as well. Plus, I wasn’t sure if retaining the entry/post IDs would break MT.

I did some quick searches via Google and the great MT Support forums and found Bill Grady’s excellent B2 Export script for MT, which allowed me to dump all of my B2 posts and comments into MT’s import format. This format enabled me to easily import my old post data into MT.

The problem, however, was that (as far as I can tell) MT’s import format does not allow for the specification of entry/post IDs, which prevented me from using a simple Apache mod_rewrite regular expression to map the URL-space.

Oh well, back to the drawing-board…

After further research, I found several interesting solutions that utilize MT archive templates to create global redirects in Apache’s .htaccess or httpd.conf formats.

Unfortunately, these solutions used the entry_id as the key field in the mapping, which caused problems for me because my old B2 blog had post IDs that were inconsistent with MT. Plus, for some reason the post IDs in B2 were out of order.

I thought I could use the post date as my key field, but for some reason I found a number of inconsistencies between the post dates in the two data sets. Very odd.

Instead I used the entry title as my key field. This required me to ensure that the entry titles in the old and new data sets were precisely the same and did not contain any duplicates. This way I could use the entry titles to map old post IDs from B2 to the new URL space in Movable Type.
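A quick way to check the "no duplicates" requirement could look something like the following. This is just a sketch with a made-up function name, and it assumes the titles have already been pulled into a plain PHP array (one title per entry).

```php
<?php
// Hypothetical helper: return every title that occurs more than once.
// Titles are compared exactly, apart from leading/trailing whitespace.
function duplicate_titles(array $titles): array
{
    $counts = array_count_values(array_map('trim', $titles));
    $dupes = array_filter($counts, function ($n) {
        return $n > 1;
    });
    // PHP turns numeric-looking keys such as "404" into integers,
    // so cast the keys back to strings before returning them.
    return array_map('strval', array_keys($dupes));
}
```

An empty result means the titles are safe to use as a key field; anything else needs to be renamed in both data sets first.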

Once the titles were synchronized, I created an MT template to export my newly imported MT entries in a CSV format that I could manipulate in Excel. I used the following MT Archive template:


<$MTEntryID$>,<$MTEntryDate format="%m/%d/%Y %H:%M"$>,<$MTEntryTitle$>,<$MTEntryLink$>

I then exported my B2 post data into a CSV file and sorted it by title in Excel, then opened the newly exported MT data in Excel and sorted it by title as well. I now had two matching sets of data, each with unique entry/post IDs. The next step was to construct the redirect mapping between each old post ID and the entry’s new URL.
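The same title-based matching could also be scripted rather than done by hand in Excel. Here is a minimal sketch under a few assumptions: the function name is made up, the B2 rows are assumed to be [post ID, title], and the MT rows follow the column order of the archive template above (entry ID, date, title, URL). Note that the template does not quote its fields, so a title containing a comma would need extra care before the rows can be parsed reliably.

```php
<?php
// Hypothetical helper: build an "old B2 post ID => new MT URL" map
// by matching the two data sets on entry title.
// $b2Rows: each row is [post ID, title]             (assumed layout)
// $mtRows: each row is [entry ID, date, title, URL] (template column order)
function build_redirect_map(array $b2Rows, array $mtRows): array
{
    // Index the MT URLs by title for direct lookup.
    $urlByTitle = [];
    foreach ($mtRows as $row) {
        $urlByTitle[trim($row[2])] = $row[3];
    }

    // Walk the B2 rows and pick up the matching URL, if there is one.
    $map = [];
    foreach ($b2Rows as $row) {
        $title = trim($row[1]);
        if (isset($urlByTitle[$title])) {
            $map[$row[0]] = $urlByTitle[$title];
        }
    }
    return $map;
}
```

The rows themselves could be read with PHP's `fgetcsv`. Any B2 post whose title has no match in the MT data is simply left out of the map, so a count comparison afterwards shows whether the titles really were synchronized.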

Ultimately, I used a bit of PHP to do the redirecting. I did this by constructing an associative array using the post ID from B2 as the index, with the MT entry URL as the value. I also utilized the ‘array_key_exists’ PHP function to determine if the old post ID was found in the array.

Here’s a snippet of code:

// map of old B2 post IDs to new MT entry URLs
$entry_array = array(
    "613" => "http://www.hatch.org/blog/2002/06/17/404.php",
    "576" => "http://www.hatch.org/blog/2002/04/18/1000_ultrapersonal_computer.php"
);

// entry lookup $p = post_id
if ($p && array_key_exists($p, $entry_array)) {
    // redirect only when the old post ID is known, then stop the script
    header("Location: " . $entry_array[$p]);
    exit;
}

Worked like a charm!

I wish I could use this or a similar technique to redirect my old RSS feed to my feed’s new location, but that’s a topic for another day…

Is Comment Spam Cost Effective?

I’m getting my fair share of comment spam like many other bloggers, but I can’t imagine that the cost/time ratio is actually worth it.

I think Sam Ruby sums it up best:

“65 minutes to create. Carefully crafted to appear to be on topic. 10 seconds to wipe out.”

LOL! Dumb asses!

What is WinFS?

J. Wilcox over at Microsoft Monitor tries to determine if the new file system, dubbed WinFS, in Microsoft’s NG:OS Longhorn, will indeed be considered “new”:

“If Mr. LaMonica’s WinFS description is accurate, then WinFS really is Microsoft database technology running on top of the existing NTFS file system.”

Salesforce.com on Social Networking

Ross Mayfield posted an excerpt from an interview with the CSO of Salesforce.com in regard to Social Networking. The interview was conducted by IBDN, but I wasn’t able to find a direct link. However, here’s a quote from the M2M blog:

IBDN: We know that consumers will pay to find a date, but will they pay to find business contacts?

FULBRIGHT: Yes, one name for them is “leads,” and sales and marketing organizations pay thousands of dollars for leads today. Leads are the life-blood of every business. Another type of paid business contact is called “candidates,” and again companies have been paying recruiters or internal referrals thousands of dollars for great candidates for at least 50 years.

Especially now, with tight budgets, businesses must run more efficiently and want to find the right contacts to meet their needs, in as streamlined a manner as possible. To the extent that businesses can start with warm leads instead of cold leads, and an existing pool of candidates when they have an opening, they will save millions of dollars.

I’d like to add that companies such as MediaMap and Vocus specialize in facilitating the “lead” connections — especially in the Public Relations industry, which is an industry that feeds on “social connections”.

NYC Subway RSS Feeds

Heh, RobotPolishers has created RSS feeds of MTA’s service updates for the NYC Subway over at Disorient Express:

“So out of frustration, sheer geekiness and a desire to toy around with RSS, I decided to put together some feeds of the MTAs service updates, based on a program scanning their own weekly website updates.”

Posted in RSS

Jahia: Integrated Java CMS and Portal

I had a chance to take a look at the new 4.0 version of Jahia and I must say that I am very impressed. As a 100% Java solution, it’s a competitive alternative to SharePoint.

Jahia is not quite Open Source. You do get the source code, but the license model is the “Jahia Collaborative Source License (JCSL)”, which roughly means that you can either pay for the license in dollars or pay with code contributions to the project. Certainly an interesting model, and one that is similar to the Sun Community Source License (SCSL).

Another interesting facet of Jahia is the stack of Java Open Source projects that the default install includes, for example Tomcat, Slide WebDAV, the Lucene search engine, Struts, OpenLDAP, and the HSQL Database Engine.

As far as features go, it definitely crosses the line between CMS and Portal, by integrating rich CMS functionality with workflow and versioning as well as document management (check-in/check-out) with WebDAV access. This is all on top of a highly-configurable “portlet-based” interface framework.

Nice.

Search Engine for Research Documents

Penn State University has released a new search engine called SMEALSearch, which is focused on indexing academic and business white papers, articles and reports.

“SMEALSearch is a niche search engine that searches the web and catalogs academic articles as well as commercially produced articles and reports that address any branch of Business. The search engine crawls websites of universities, commercial organizations, research institutes and government departments to retrieve academic articles, working papers, white papers, consulting reports, magazine articles, and published statistics and facts.”

However, I’d love to see the code released as an Open Source Project. [hint-hint-nudge-nudge]