Via the Google blog: Google is trying out, and releasing to the world under a Creative Commons license, a new XML format for site maps. The format is an XML representation of your web site that search spiders would read when they arrive at your site, much as they read the robots.txt file now.
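If you haven't seen the format, here is a rough sketch of generating a minimal sitemap with Python's standard library. It assumes the later sitemaps.org 0.9 namespace rather than the namespace used in Google's original beta. It uses only the `<loc>` and `<lastmod>` elements; the protocol also defines optional ones such as `<changefreq>` and `<priority>`. The `build_sitemap` helper and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace; Google's 2005 beta used a
# different, Google-hosted namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML text for a list of (url, lastmod) pairs.

    lastmod is a W3C date string such as "2005-06-03".
    (Hypothetical helper for illustration.)
    """
    # Serialize the sitemap namespace as the default xmlns, with no prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, lastmod in pages:
        # One <url> entry per page, holding its location and last-change date.
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("http://www.example.com/", "2005-06-03"),
        ("http://www.example.com/about.html", "2004-11-20"),
    ]))
```

Running it prints a single `<urlset>` element containing one `<url>` entry per page.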
Several things are very interesting about this:
- It is the #1 search engine company doing this.
- This is Web 2.0 in all its semantic markup glory.
- This could be the beginning of a web site discovery format with which we could build simple tools that search for relevant content (without having to go through a search engine!). Obviously, though, Google sees an opportunity to use this format to improve its own search.
I’ve discussed this idea with others in the past. In most of those conversations we focused on an “IA” (information architecture) XML format rather than a sitemap, but the basic idea is the same: exposing a site’s structure not through explicit links but through the designer’s intent, with all the pitfalls that entails.
Going further, this is almost like a feed for the permanent content on your site: one that grows over time and doesn’t lop off the oldest entries the way RSS does. Instead, it keeps the old entries and even provides useful metadata about them, such as when each was last changed. That interests me because we would then have a feed of the latest changes to your whole site, not just the recently published posts and articles (as with RSS). It would let us discover, say, when someone updates an old web page, a capability we don’t currently have.
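To make the “feed of changes” idea concrete, here is a minimal sketch of a client that reads a sitemap and lists the pages modified on or after a given date. It assumes the sitemaps.org 0.9 namespace and `<lastmod>` values in W3C date format. The `changed_since` function, the `SAMPLE` document, and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Assumption: the sitemaps.org 0.9 namespace, mapped to the "sm" prefix
# for use in ElementTree path expressions.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap for illustration.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc><lastmod>2005-06-03</lastmod></url>
  <url><loc>http://www.example.com/about.html</loc><lastmod>2004-11-20</lastmod></url>
  <url><loc>http://www.example.com/contact.html</loc></url>
</urlset>"""


def changed_since(sitemap_xml, since):
    """Return URLs whose <lastmod> date is on or after `since`.

    Entries without <lastmod> are skipped, because nothing can be said
    about when they changed.
    """
    root = ET.fromstring(sitemap_xml)
    changed = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        if loc is None or lastmod is None:
            continue
        # lastmod may carry a time part ("2005-06-03T10:00:00+00:00");
        # only the leading YYYY-MM-DD date is compared here.
        if date.fromisoformat(lastmod.strip()[:10]) >= since:
            changed.append(loc.strip())
    return changed


if __name__ == "__main__":
    print(changed_since(SAMPLE, date(2005, 1, 1)))  # ['http://www.example.com/']
```

Unlike an RSS reader, which only sees the newest items, this kind of client can notice an edit to a page that was first published years ago, as long as the site keeps `<lastmod>` accurate.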
These ponderings aside, Google freely admits that this may bomb or it may win the day. What do you think?
Update: Dave Winer has an interesting take on the role of RSS in this Gillmor Daily podcast. He says RSS is very good for news: information that changes quickly and that we want to keep up with. Most information, he argues, is not news and doesn’t change very much, so static information needs a format different from RSS, and Dave says that format is OPML, which he created. So the question becomes which of these formats (RSS, OPML, and Google’s SiteMap Protocol) does which job, and whether that work is necessary.