A Short Introduction to Microformats: a Stepping Stone on the Way to Semantic Markup, or a Distraction from It?

Up until recently I had been struggling to understand microformats, those mysterious formats built in XHTML that several folks have been talking about passionately, promising everything from better search engine visibility to better-structured code to the realization of truly semantic markup. The reason for my struggles was that there were few actual examples of their utility, hCard, hCalendar, and Bud Gibson’s xFolk being interesting exceptions. The recent launch of microformats.org drew a lot of interest, and since I’ve lately been taking a longer look at microformats, I decided to write a short introduction to the topic from my point of view as a developer. My conclusion: microformats don’t yet offer enough incentive to jump on the bandwagon.

To begin, what does this new site, a self-proclaimed “anything and everything” microformats resource, say about them? A little surprisingly, nothing all that new in terms of description (we’ve had this and this for a while now), but it is very nicely laid out and all in one place, which is useful for folks trying to learn about them. (I did find it interesting that the developers have moved the discussion away from the corporate Technorati site.) Anyway, here is the microformats “elevator pitch” from the home page:

Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards.

Formats for Content Markup

So, they’re formats for marking up content. Here’s an example of a calendar entry using the microformat hCalendar:

<span class="vevent">
 <a class="url" href="http://www.web2con.com/">
  <span class="summary">Web 2.0 Conference</span>: 
  <abbr class="dtstart" title="20051005">October 5</abbr>-
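  <!-- dtend is exclusive, as in iCalendar, so an event that runs through the 7th ends on the 8th -->
  <abbr class="dtend" title="20051008">7</abbr>,
  at the <span class="location">Argent Hotel, San Francisco, CA</span>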
 </a>
</span>

The idea is that another application can read this hCalendar snippet within your XHTML page and instantly recognize it as an event. So search engines (applications, don’t ya know) can now come to your site, find all the embedded hCalendar snippets, and know more about what your page is about. Your search rankings would presumably rise as a result.
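
To make “another application can read this snippet” a little more concrete, here’s a minimal sketch of such a reader in Python, using only the standard library’s html.parser. It is not a complete hCalendar parser: it assumes non-nested vevents, recognizes only the five properties used in the example above, and follows the abbr convention of taking the machine-readable value from the title attribute. The names HCalendarReader and read_hcalendar are made up for illustration.

```python
from html.parser import HTMLParser

# The subset of hCalendar properties this sketch understands (an assumption,
# chosen to match the example above; the real format defines many more).
PROPERTIES = {"summary", "dtstart", "dtend", "location", "url"}


class HCalendarReader(HTMLParser):
    """Hypothetical reader: collects properties from elements inside class="vevent"."""

    def __init__(self):
        super().__init__()
        self.events = []   # one dict per vevent found
        self.stack = []    # (tag, property being captured or None, opens a vevent?)
        self.depth = 0     # > 0 while inside a vevent

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        opens_event = "vevent" in classes
        if opens_event:
            self.events.append({})
            self.depth += 1
        prop = next((c for c in classes if c in PROPERTIES), None) if self.depth else None
        if prop == "url":
            # The url property's value is the link target, not the link text.
            self.events[-1]["url"] = attrs.get("href", "")
            prop = None
        elif prop and tag == "abbr" and "title" in attrs:
            # abbr convention: the machine-readable value lives in the title.
            self.events[-1][prop] = attrs["title"]
            prop = None
        elif prop:
            self.events[-1][prop] = ""   # text content is collected in handle_data
        self.stack.append((tag, prop, opens_event))

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed elements.
        while self.stack:
            open_tag, _, opened_event = self.stack.pop()
            if opened_event:
                self.depth -= 1
            if open_tag == tag:
                break

    def handle_data(self, data):
        for _, prop, _ in self.stack:
            if prop:
                self.events[-1][prop] += data


def read_hcalendar(html: str) -> list:
    reader = HCalendarReader()
    reader.feed(html)
    reader.close()
    # Collapse the whitespace that line breaks in the markup leave in text values.
    return [{k: " ".join(v.split()) for k, v in ev.items()} for ev in reader.events]


# dtend is exclusive (as in iCalendar), so an event through the 7th ends on the 8th.
SNIPPET = """
<span class="vevent">
 <a class="url" href="http://www.web2con.com/">
  <span class="summary">Web 2.0 Conference</span>:
  <abbr class="dtstart" title="20051005">October 5</abbr>-
  <abbr class="dtend" title="20051008">7</abbr>,
  at the <span class="location">Argent Hotel, San Francisco, CA</span>
 </a>
</span>
"""
print(read_hcalendar(SNIPPET))
```

Notice how much of the work is bookkeeping: the parser has to sift class attributes on generic span, a, and abbr elements to recover what each piece of text means.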

So this is cool, right? We’ve got a more semantic way to express our data, so that it means more not only to the people who actually read our content but also to the machines that suck our sites down as code. The only problem is that, for this to work, the machines that suck down this code must know what an hCalendar entry is, and then know what to do with one once they find it.
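
“Knowing what to do with it” might mean, for example, turning an extracted event into an iCalendar (RFC 2445, since revised as RFC 5545) file that a desktop calendar can import. Here’s a rough sketch under a few assumptions: the event arrives as a dict with the same keys as the snippet’s class names, dates are date-only values like 20051005, the UID, DTSTAMP, and PRODID values are placeholders, and folding of lines longer than 75 octets is left out.

```python
def escape_text(value: str) -> str:
    """Escape an iCalendar TEXT value: backslash, semicolon, comma and newline."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def to_icalendar(event: dict, uid: str, dtstamp: str) -> str:
    """Render one extracted event as a minimal VCALENDAR holding a single VEVENT.

    Assumes dtstart/dtend are date-only values such as "20051005";
    line folding for lines over 75 octets is not implemented.
    """
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example//hCalendar sketch//EN",  # placeholder product identifier
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{dtstamp}",
        f"DTSTART;VALUE=DATE:{event['dtstart']}",
    ]
    if "dtend" in event:
        # DTEND is non-inclusive, so an event through Oct 7 ends on 20051008.
        lines.append(f"DTEND;VALUE=DATE:{event['dtend']}")
    if "summary" in event:
        lines.append(f"SUMMARY:{escape_text(event['summary'])}")
    if "location" in event:
        lines.append(f"LOCATION:{escape_text(event['location'])}")
    if "url" in event:
        lines.append(f"URL:{event['url']}")
    lines += ["END:VEVENT", "END:VCALENDAR"]
    return "\r\n".join(lines) + "\r\n"   # iCalendar lines end in CRLF


event = {
    "url": "http://www.web2con.com/",
    "summary": "Web 2.0 Conference",
    "dtstart": "20051005",
    "dtend": "20051008",
    "location": "Argent Hotel, San Francisco, CA",
}
print(to_icalendar(event, uid="web2con-2005@example.org", dtstamp="20050624T000000Z"))
```

None of this happens unless someone writes it, which is exactly the support problem.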

Microformats Have to Be Supported to Work

Therein lies the rub with microformats. If the applications reading them don’t know about them, then they’re essentially a waste of our development time (to put it bluntly). In other words, only when application developers agree to “support” microformats will they actually deliver any of the promised benefits.

So, what about support? Are there widely supported microformats? I suppose that would depend on whom you asked, and whether they were part of the group promoting them. Technorati, for example, whose employees are behind much of the development of microformats, supports them rather well. If you use the rel-tag microformat, Technorati will recognize it when it is notified of your post and include your page on the corresponding tag page. Here’s an example of using the rel-tag microformat to create a “microformats” tag (in effect I’m tagging this page as having to do with microformats):

The code is this:

<a href="http://technorati.com/tag/microformats" rel="tag">microformats</a>
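
Recognizing rel-tag is simple enough to sketch. Under the rel-tag convention, the last segment of the link’s URL path is the tag name, so a crawler only has to find links whose rel attribute includes tag. This Python sketch uses only the standard library; RelTagFinder is a made-up name, and this is not Technorati’s actual code.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagFinder(HTMLParser):
    """Hypothetical helper: collects tag names from <a rel="tag"> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()  # rel may hold several values
        href = attrs.get("href")
        if "tag" in rels and href:
            # The tag is the last segment of the URL path, percent-decoded.
            last = urlparse(href).path.split("/")[-1]
            if last:
                self.tags.append(unquote(last))


finder = RelTagFinder()
finder.feed('<a href="http://technorati.com/tag/microformats" rel="tag">microformats</a>')
print(finder.tags)
```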

Besides Technorati, few services actually support microformats. Their benefits are mostly potential at the moment, but with this new site they just might gain momentum.

Trend Toward More Semantic Markup

Microformats get really interesting, however, when we take a wider look at them. Notice that in using microformats, developers are starting to write semantic markup that distinguishes content in a meaningful way, carrying much more meaning than the 100 or so XHTML tags can on their own. In the hCalendar example above, the <span> tag is used in conjunction with class attributes to provide semantics. Not pretty, but at least one rung higher on the semantic ladder than plain old XHTML.

The level of semantic integration is also interesting. Instead of going “all the way” and writing new XML formats for these types of data, the microformat folks want us to write extended XHTML formats. Still XML, but not a separate format free of the <span> tag affliction.

To this we must ask: WHY?

Why Microformats and not a New XML Format like RSS?

Could it be for developer adoption reasons? If we ask HTML coders to alter their code only slightly, adding some attributes, will they be more willing to change their markup than if we asked them to write in a completely new format? This makes some sense, assuming that folks are willing to change their coding habits at all. Unfortunately, cases where coding conventions are widely adopted are few and far between (we’re still vacillating over web standards in the first place). We’ve all got our own way of writing tags, and changing that is a big challenge.

Could it be for application adoption? If we create microformats that merely extend XHTML, will they offer greater utility because we’ve already got browsers that understand that markup? I’m skeptical, because right now browsers understand microformats only as XHTML, with no further meaning: they display them like any other markup and won’t do anything special with them until they are explicitly programmed to. The same is true of any new XML format, whether it extends XHTML or not.

We Need a Showcase Application of Microformats

My main concern is that microformats aren’t obviously useful yet, despite such excellent resources as Eric Meyer’s Potential of Microformats presentation. Specifically, nobody has come up with an application for them (that I’ve seen) that makes their benefits plain as day. What I’m looking for is an analogue to Housingmaps.com, which shows in an instant why anyone would want to hack Google Maps. There is no such example yet, nothing that makes me want to go back and rewrite part of my content and write new content that way in the future.

Interesting ideas do exist, however. Back in March, Adrian Holovaty suggested a rel-proof microformat, which would be applied to facts within news articles to point other applications to supporting facts. Interesting, until you think about how political parties might use it as they attempt to control the news. Alas, no implementation of that one has surfaced.

What I’m really excited about, though, are the newer XML formats that don’t extend XHTML but live on their own. Several examples come to mind: RSS, OPML, and Google Sitemaps. All three require their own separate XML file to work, yet all are receiving as much attention as microformats, if not more. RSS, in particular, has gone to the moon. So at first glance, being a separate format doesn’t seem to hurt adoption; my guess is that usefulness is the real barrier.

Measured against the stated goals of microformats, the three specialized formats I mentioned hold up nicely. They are a way of thinking about data. They’re easy for humans and machines to use. They solve a specific problem. They’re simple. They take advantage of what people are already doing (tagging). They reuse building blocks from widely adopted standards (XML). They enable and encourage decentralized development, content, and services.

Not on the Bandwagon Quite Yet

So, given that we have very successful XML formats like RSS, aren’t microformats a step backward? Even though they are a stepping stone on the way to semantic purity, microformats still don’t offer a clean break from the muddle of XHTML tags the way the other formats do. They’re playing nicely, which is, well, nice, but they’re still obscured by the semantic limitations of XHTML. That’s not to say that they won’t work; they just seem rather messy compared with their independent counterparts. Thinking about my content as part non-semantic XHTML and part somewhat-semantic microformats is confusing, and I think this is a challenge that will need to be faced if microformats are to be widely adopted. As a result, I’m still not on the microformats bandwagon. However, since something in the world of formats may well be different by tomorrow morning, I’m still open to discussion.

Update: David Weinberger interviews Tantek Çelik and Rohit Khare about microformats. They offer some good details about the issues, and address concerns that came up in the article and comments.

In particular, Tantek talks about keeping all this simple: microformats are written in XHTML because that’s what developers are used to. This is right, of course, but developers are now “used to” RSS in very nearly the same sense. RSS is easy because the tags don’t change, and there are no superfluous span or div tags cluttering it all up. That’s the beauty of XML, and I don’t know why we would want to move away from it. That said, I do believe that Tantek and the folks working on microformats are pushing this with enough zeal that they may be able to build up the momentum to make it work, but this developer would much rather write simple, semantic tags as I do in RSS than hack an already overextended XHTML.
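
For comparison, here’s roughly what reading an RSS 2.0 item looks like with Python’s standard xml.etree.ElementTree. Each field is addressed directly by its element name, with no class attributes to sift through. The feed is a made-up sample: the titles and example.com URLs are placeholders, not a real feed.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed for illustration; titles and URLs are placeholders.
FEED = """<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.com/</link>
    <description>Sample feed</description>
    <item>
      <title>A Short Introduction to Microformats</title>
      <link>http://example.com/microformats-intro</link>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml: str) -> list:
    """Return (title, link) pairs; each field is found by its element name."""
    root = ET.fromstring(feed_xml)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iter("item")]


print(read_items(FEED))
```

The element name is the meaning, which is the contrast with hCalendar, where the same information has to be recovered from class attributes on generic span and abbr elements.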

Update: Bud Gibson, in addition to adding comments below, has written a post further explaining why he thinks microformats are important. I’m very happy that he’s taking the time to explain all of this. He says that microformats fill dual roles, one as XHTML and one as XML. His clincher is this: “they can fulfill the data transmission role without requiring any sort of translation prior to presentation to the user”. This is absolutely correct, but I don’t see translation as a big problem: millions of blogs are transmitted in both XHTML and XML every day. In my view, that is one of the reasons we’re using XML in the first place: it makes translating between formats easy.

I think I now see another reason why I just don’t like microformats: I don’t buy them in the XML role Bud describes. The main tag used is <span>, stuck in an XHTML file with a whole bunch of non-semantic code. How does that fulfill the XML role the way RSS does? I love that with RSS we have a separate URL, and every tag is semantic, accurately describing the content it contains. With microformats we have a bunch of code in between whose only job is to make it XHTML. I want simple, efficient code used for a single purpose only. I can imagine a separate URL on bokardo, something like bokardo.com/calendar, that is simply my personal calendar published for all to see (not in the hCalendar microformat, but perhaps in an xcalendar format?). If my calendar is wrapped up in an XHTML file, how will people find it and parse it out? What if there is no calendar information, or my most recent post doesn’t contain any? How will we be able to generate metadata about microformats if we don’t have static feed URLs that we can rely on? I think I find separate URLs comforting…
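
Purely to illustrate that separate-URL idea, here’s a sketch of a standalone calendar document generated and read back with Python’s ElementTree. The element names (calendar, event, summary, start, location) are invented for this example; there is no xcalendar specification behind them. One thing it does show: a calendar with no events is still a valid, parseable document at the same URL, which speaks to the “what if there is no calendar information” worry.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names for an imagined "xcalendar" format.
FIELDS = ("summary", "start", "location")


def build_calendar(events: list) -> str:
    """Serialize event dicts into a standalone, hypothetical calendar document."""
    root = ET.Element("calendar")
    for ev in events:
        node = ET.SubElement(root, "event")
        for field in FIELDS:
            if field in ev:
                ET.SubElement(node, field).text = ev[field]
    return ET.tostring(root, encoding="unicode")


def read_calendar(doc: str) -> list:
    """Parse the document back; each element name is the field's meaning."""
    return [{child.tag: child.text for child in node}
            for node in ET.fromstring(doc).iter("event")]


doc = build_calendar([{"summary": "Web 2.0 Conference", "start": "2005-10-05",
                       "location": "San Francisco, CA"}])
print(doc)
print(read_calendar(doc))
print(read_calendar(build_calendar([])))  # an empty calendar still parses
```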

Published: June 24th, 2005