Posts

Showing posts from October, 2005

Third parties indexing Google Base

So, before hard evidence comes to light, let me speculate... I can submit structured data to Google Base for them to do with as they please. No doubt there will be a faceted browser for users to search through that data, and maybe even an API. But will third-party search engines be able to index any of it? Web sites can be crawled or scraped, and blog publishing provides a pinging mechanism. Will Google Base provide a mechanism that doesn't fall foul of Google's terms and conditions of use?

Splogs

Well, Google do seem to be trying to limit the problem of spam blogs hosted on blog*spot. They've introduced a 'type in the letters' test when posting to a blog that looks suspicious. Of course, if you're posting via the Blogger API then it's just rejected. More recently I note that the Atom feed for a blog*spot-hosted blog has a <summary> tag with all the markup removed. By contrast, a blog that is FTPed to an alternate host has a <content> tag with markup present. UPDATE: seems to be back to normal - I know it just looks like I might have flipped the 'summary' switch but...
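For anyone wanting to check which flavour of feed a blog is serving, here is a rough sketch in Python. It ignores the XML namespace so that it should cope with either the older or the newer Atom namespace, and simply reports whether each entry carries a <summary> or a <content> element. The function names and the sample feed are made up for illustration; this is not anything Blogger itself provides.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def entry_body_kinds(feed_xml):
    """For each <entry> in the feed, list which of summary/content it has."""
    root = ET.fromstring(feed_xml)
    kinds = []
    for element in root.iter():
        if local_name(element.tag) != "entry":
            continue
        # Collect the local names of the entry's direct children.
        present = {local_name(child.tag) for child in element}
        kinds.append(sorted(present & {"summary", "content"}))
    return kinds


# Hypothetical two-entry feed: one entry with stripped text in <summary>,
# one with escaped markup in <content>.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Hosted</title><summary>Plain text only</summary></entry>
  <entry><title>FTPed</title><content type="html">&lt;p&gt;Markup&lt;/p&gt;</content></entry>
</feed>"""

print(entry_body_kinds(SAMPLE))
```

Run against a saved copy of a real feed, the same function would show at a glance whether the 'summary' behaviour described above is in effect.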

Feeds of Feeds

There don't seem to be many rules or conventions governing what you get when you request a feed representing the ongoing results of a search. For example, at the bottom of a Google Blog Search page you can get an RSS or Atom feed with 10 or 100 items. With Technorati you can add a search to your Watchlists and then get an RSS feed. In Bloglines you can subscribe to a search but you never see the feed itself - or at least if you try to edit the subscription you don't. However, I am more interested in the feed itself. They all work fine if all you want to do is subscribe to them in a news reader, but what if you want to process the results further? Such feeds supply the necessary <link rel="alternate" type="text/html" ... but couldn't they also supply the associated third-party feed itself? Consider the simple example of looking for 'geotagged' blog entries: it's easy enough to get a set of references to blog entries which match t