A Short Introduction to Microformats: a Stepping Stone on the Way to Semantic Markup, or a Distraction from It?

by Joshua Porter

Up until recently I had been struggling with understanding microformats, those mysterious formats built in XHTML that several folks have been talking about passionately: promising everything from better search engine visibility to better structured code to a realization of true semantic markup. The reason for my struggles was that there were few actual examples of their utility, hCard, hCalendar, and Bud Gibson’s xFolk being interesting exceptions. The recent release of microformats.org drew a lot of interest, and since lately I’ve been taking a longer look at them, I decided to write a short introduction to the topic from my developer point of view, coming to the conclusion that microformats don’t offer enough incentives to jump on the bandwagon quite yet.

To begin, what would this new site, a self-proclaimed “anything and everything” microformats resource, say about them? A little surprisingly, nothing all that new description-wise (we’ve had this and this for a while now), but it is very nicely laid out and all in one place, which is useful for folks trying to learn about them. (I did find it interesting that the developers have migrated the discussion away from the corporate Technorati site.) Anyway, here is the microformats “elevator pitch” from the home page:

Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards.

Formats for Content Markup

So, they’re formats for marking up content. Here’s an example of a calendar entry using the microformat hCalendar:

<span class="vevent">
 <a class="url" href="http://www.web2con.com/">
  <span class="summary">Web 2.0 Conference</span>:
  <abbr class="dtstart" title="20051005">October 5</abbr>-
  <abbr class="dtend" title="20051007">7</abbr>,
 at the <span class="location">Argent Hotel, San Francisco, CA</span>
 </a>
</span>

The idea is that another application can read this hCalendar snippet within your XHTML page and instantly recognize it as such. So, Search Engines (an application, don’t ya know) can now come to your site, find all the embedded hCalendar snippets, and then know more about what your page’s topic is. Your search rankings would presumably rise as a result.
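To make that concrete, here is a minimal sketch, in Python, of what such a reading application might do with the snippet above. It assumes reasonably well-formed markup and recognises only the five class names used in the example; the names `HCalendarParser` and `parse_hcalendar` are mine, for illustration, and are not part of any microformats library.

```python
from html.parser import HTMLParser

# Property class names taken from the hCalendar example above; a real
# consumer would follow the full hCalendar specification.
PROPERTIES = {"url", "summary", "dtstart", "dtend", "location"}
# Elements that never get a closing tag, so they are kept off the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HCalendarParser(HTMLParser):
    """Collects one dict per element whose class list contains "vevent"."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._event = None      # the event currently being read, if any
        self._event_depth = 0   # stack depth at which that event opened
        self._open = []         # per open element: properties fed by its text

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self._event is None and "vevent" in classes:
            self._event = {}
            self._event_depth = len(self._open)
        text_props = []
        if self._event is not None:
            for name in classes:
                if name not in PROPERTIES:
                    continue
                if name == "url" and attrs.get("href"):
                    self._event[name] = attrs["href"]    # link target
                elif tag == "abbr" and attrs.get("title"):
                    self._event[name] = attrs["title"]   # machine-readable value
                else:
                    self._event[name] = ""
                    text_props.append(name)              # value is the element text
        self._open.append(text_props)

    def handle_data(self, data):
        if self._event is None:
            return
        for text_props in self._open:
            for name in text_props:
                self._event[name] += data

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._open:
            return
        self._open.pop()
        if self._event is not None and len(self._open) == self._event_depth:
            # The vevent element just closed: normalise whitespace and store it.
            self.events.append(
                {name: " ".join(value.split()) for name, value in self._event.items()}
            )
            self._event = None


def parse_hcalendar(html):
    """Return a list of event dicts found in an HTML string."""
    parser = HCalendarParser()
    parser.feed(html)
    parser.close()
    return parser.events
```

Run on the markup above, `parse_hcalendar` should hand back a single dictionary carrying the URL, summary, start and end dates, and location. Note that everything the code knows about hCalendar is hard-wired into `PROPERTIES` and the `vevent` check: nothing in the XHTML itself tells a machine what those class names mean.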

So this is cool, right? We’ve got a more semantic way to express our data, so that it is more meaningful not only to the people who actually read our content but also to the machines that suck our sites down as code. The only problem is that, for this to succeed, the machines that suck down this code must know what an hCalendar entry is, and then must know what to do with one once they find it.

Microformats Have to Be Supported to Work

Therein lies the rub with microformats. If they aren’t known about by the applications that are reading them, then they’re essentially a waste of our development time (to put it bluntly). In other words, only when application developers agree to “support” the microformats will they actually lead to any of the benefits that have been promised.

So, what about support? Are there widely supported microformats? I suppose that would depend on who you asked, and if they were part of the group that was supporting them. Technorati, for example, whose employees are behind a lot of the development of microformats, supports them rather well. If you use the rel-tag microformat, for example, Technorati will recognize it when it is notified of your post, and include your page on their corresponding tag page. Here’s an example of using the rel-tag microformat to create a “microformat” tag (in effect I’m tagging this page as having to do with microformats):

The code is this:

<a href="http://technorati.com/tag/microformats" rel="tag">microformats</a>

Besides Technorati, there are few services that actually support microformats. They are all potential at the moment, but with this new site they just might gain momentum.
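For a sense of how little a supporting service has to do, here is a rough Python sketch of rel-tag extraction. It rests on my reading of the rel-tag draft, namely that the tag is the last path segment of the link’s href rather than the link text; `RelTagParser` and `extract_tags` are illustrative names of my own, and this is not a description of how Technorati actually does it.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagParser(HTMLParser):
    """Collects tag names from <a> elements whose rel attribute includes "tag"."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel is a space-separated list, so "tag" may sit beside other values.
        rel_values = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "tag" not in rel_values or not href:
            return
        # Assumption: the tag name is the last segment of the URL path.
        path = urlparse(href).path.rstrip("/")
        name = unquote(path.rsplit("/", 1)[-1])
        if name:
            self.tags.append(name)


def extract_tags(html):
    """Return the rel-tag names found in an HTML string, in document order."""
    parser = RelTagParser()
    parser.feed(html)
    parser.close()
    return parser.tags
```

Feeding it the link above should yield a single tag, `microformats`. The parsing is trivial; the hard part is getting services to agree to run something like it.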

Trend Toward Markup that is More-Semantic

Microformats get really interesting, however, when we take a wider look at them. For example, notice that in using microformats developers are starting to write semantic markup that actually distinguishes content in a meaningful way, providing much more meaning than the 100 or so XHTML tags can on their own. You’ll notice in the above hCalendar example that the <span> tag is used in conjunction with the class attributes to provide semantics. Not pretty, but at least one rung higher on the semantic ladder than plain old XHTML.

The level of semantic integration is also interesting. Instead of going “all the way” and writing new XML formats for these types of data, the microformat folks want us to write extended XHTML formats. Still XML, but not a separate format free of the <span> tag affliction.

To this we must ask: WHY?

Why Microformats and not a New XML Format like RSS?

Could it be for developer adoption reasons? If we ask HTML coders to alter their code only slightly, adding some attributes, would they be more willing to change their markup than if we asked them to write in a completely new format? This makes some sense, assuming that folks are willing to adapt their coding styles to anything other than what they’re currently doing. Unfortunately, the instances where coding conventions are widely adopted are few and far between. (We’re still vacillating on web standards in the first place.) We’ve all got our own way of writing tags, and changing that is a big challenge.

Could it be for application adoption? If we create microformats that merely extend XHTML, will that mean that they offer greater utility because we’ve already got browsers that understand that markup? I’m skeptical about that, because right now browsers can understand microformats only as XHTML, with no further meaning. In other words, as of right now browsers don’t do anything special with microformats other than to display them as XHTML. They won’t do anything special until they are programmed to explicitly. This is the same with any new XML format, whether they extend XHTML or not.

We Need a Showcase Application of Microformats

My main concern is that microformats aren’t obviously useful yet, despite such excellent resources as Eric Meyer’s Potential of Microformats presentation. Specifically, nobody has come up with an application for them (that I’ve seen) that makes their benefits plain as day. What I’m looking for is an analogue to Housingmaps.com, which shows in an instant why anyone would want to hack Google Maps. There is no such example yet that has me wanting to go back and rewrite part of my content, as well as write it that way in the future.

Interesting notions exist, however. Back in March Adrian Holovaty suggested a rel-proof microformat, which would be applied to facts within news articles to let other applications know about other supporting facts. Interesting, until you think about how political parties might use it as they attempt to control the news. Alas, no implementation of that one has surfaced.

What I’m really excited about, though, are the brand new XML formats that don’t extend XHTML, but live on their own. Several examples come to mind: RSS, OPML, and Google Sitemaps. All three of these formats require their own, separate XML file to work. But all are receiving as much and more attention than microformats. RSS, in particular, has gone to the moon. So at first glance it doesn’t seem that being a separate format is a detriment to adoption: my guess is that usefulness is the real barrier.

To re-address these formats in terms of the goals of microformats…it seems that the three specialized formats that I mentioned are fulfilling those goals nicely. They are a way of thinking about data. They’re easy to use by humans and machines. They solve a specific problem. They’re simple. They take advantage of what people are already doing (tagging). They reuse building blocks from widely-adopted standards (XML). They enable and encourage decentralized development, content, and services.
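As a point of comparison with the hCalendar markup earlier, here is a small Python sketch of a consumer for a standalone format, using the standard `channel`/`item`/`title`/`link`/`description` elements of RSS 2.0. The feed content and the `read_items` helper are made up for illustration; only the element names come from RSS itself.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal RSS 2.0 feed; the event details echo the earlier example.
FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>A hypothetical feed for illustration</description>
    <item>
      <title>Web 2.0 Conference</title>
      <link>http://www.web2con.com/</link>
      <description>October 5-7 at the Argent Hotel, San Francisco, CA</description>
    </item>
  </channel>
</rss>"""


def read_items(feed_xml):
    """Return one dict per <item>, keyed by child element name."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        # Every child tag is itself the label for the data it holds.
        items.append({child.tag: (child.text or "").strip() for child in item})
    return items
```

Here the tag names are the semantics, so the reader needs no list of special class names and no bookkeeping about which surrounding markup to ignore.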

Not on the Bandwagon Quite Yet

So, given that we have very successful XML formats like RSS, aren’t microformats a step backward? Even though they are a stepping stone on the way to semantic purity, microformats still don’t offer a good clean cut from the muddle of XHTML tags like the other formats do. They’re playing nicely, which is, well, nice, but they’re still obscured by the semantic limitations of XHTML. That’s not to say that they won’t work. I’m just saying that they seem rather messy compared with their independent counterparts: thinking about my content as part non-semantic XHTML and part somewhat-semantic microformats is confusing. I think this is a challenge that will need to be faced if microformats are to be widely adopted. As a result, I’m still not on the microformats bandwagon. However, given that tomorrow morning something in the world of formats will be different, I’m still open to discussion.

Update: David Weinberger interviews Tantek Çelik and Rohit Khare about microformats. They offer some good details about the issues, and address concerns that came up in the article and comments.

In particular, Tantek talks about keeping all this simple: that microformats are written in XHTML because that’s what developers are used to. This is right, of course, but we also see that developers are now “used to” RSS in very nearly the same sense. RSS is easy, because the tags don’t change…and there are no superfluous span or div tags cluttering it all up. That’s the beauty of XML…I don’t know why we would want to get away from that. That said, I do believe that Tantek and the folks working on microformats are pushing this with enough zeal that they may be able to build up momentum to have it work, but this developer would much rather write simple, semantic tags like I do in RSS than hack an already over-extended XHTML.

Update: Bud Gibson, in addition to adding comments below, has written a post highlighting further his vision of why microformats are important. I’m very happy that he’s taking the time to explain all of this. He says that microformats fill dual roles, one as XHTML and one as XML. His clincher is this: “they can fulfill the data transmission role without requiring any sort of translation prior to presentation to the user”. This is absolutely correct, but I don’t see this as a big problem: millions of blogs are being simultaneously transmitted in XHTML and XML daily. That is one of the reasons why we’re using XML, in my view, because it allows translations between formats with ease.

I think I see now another reason why I just don’t like microformats. I don’t buy them as playing the XML role, as in Bud’s view. The main tag used is <span>, and is stuck in an XHTML file with a whole bunch of non-semantic code. How does it fulfill the XML role, in the same ways that RSS does? I love that with RSS we have a separate URL and every tag is semantic, accurately describing the content it contains. With microformats we have a bunch of code in between used to make it XHTML. I want simple, efficient code used for a single purpose only. I can imagine a separate URL on bokardo, something like bokardo.com/calendar for example, that is simply my personal calendar published for all to see (not in the hcalendar microformat, but perhaps xcalendar?). If my calendar is wrapped up in an XHTML file, how will people find it and parse it out? What if there is no calendar information, or my most recent post doesn’t contain calendar information? How will we be able to generate metadata about microformats if we don’t have static feed URLs that we can rely on? I think I find separate URLs comforting…

Comments

1.  cori schlegel 10:40am, Fri 24th, 2005

As usual, Joshua, great thinking and writing.

Being a Firefox devotee, for me microformats would work very well if there was a Mozilla extension or Greasemonkey script that allowed me to configure how I wanted hCals and the like displayed, the advantage being that with the right extension I could tailor those displays to my own specifications instead of relying on what the browser vendors think would be a good way for me to see those microcontent offerings. That, of course, leaves the other 85% of browser users without a great solution (or really any at all).

The other (obvious?) advantage for designers is the ability to style that content in a consistent way no matter where the content comes from. That opens the door for some really great re-aggregation schemes, like a Technorati-like site for reviews or events. Perhaps these are already out there as we speak.

2.  Bud Gibson 11:13am, Fri 24th, 2005

Josh:

Thanks for the link. This is a good, thoughtful piece, and I will likely respond more fully in a post of my own. There’s a point that strikes me that I would, however, just like to dash off now. It regards what I think is a general misperception, even in the microformats community, about the role of microformats with respect to RSS.

RSS (and Atom, another XML syndication format) is really just a notification format with room for certain types of predetermined metadata like links, categories, authors, and, most germanely, the description. The description element contains “the message” and is generally left to be constructed as the author (or the author’s software) sees fit.

One might argue, however, that the relatively unstructured description or message element is the most important element. The reader has signed up for the notifications in order to receive the message, not just metadata about the message.

Recognizing this key point, many lately have come to structuring the description element in their syndication feeds with html markup because they want their message to visually convey meaning. Witness your very own RSS feed. An xhtml microformat just carries this further by adding semantics to the message element without requiring the invention of a whole new markup than what the author is already using. That way, machines will be able to process the message more effectively and produce aggregations like technorati or rubhub or EVDB.

In spite of the examples I just gave, you are right to point out that there need to be more services that take advantage of message semantics for microformats to become immediately compelling. Based on private discussions I have had, expect to see more of those.

To come back to where I started, it is incorrect, in my view, to really consider microformats as an alternative to RSS. Microformats are actually a complement that fulfills a very important role in making the messages conveyed more easily processed by machines. As a bridge between man and machine for the most important part of the data payload, microformats are Web 2.0.

3.  Andy Hume 9:37am, Mon 27th, 2005

Cori:

You don’t need a mozilla plug-in or Greasemonkey script to style hcards as you would like. Try a user style sheet.

The promise of microformats, as Eric Meyer points out, is a potential promise. “They’re not supported yet” is an interesting argument, and I see what you mean, but of course they are supported. They are simply XHTML with enhanced semantics. They are supported inherently within CSS, Javascript and any other technology that walks alongside XHTML.

You can go away and write a little Javascript app to utilise hcard today. Not many people are doing it yet, but that doesn’t mean they’re not supported.

4.  Josh 1:01pm, Tue 28th, 2005

Andy, I guess we have different notions about what “support” means. I’m talking about support in which someone gets added benefit from the developer using a specific microformat. As browsers exist right now, nobody is benefitting like this, except for a few cases (admittedly growing, but very slowly).

You seem to be saying that microformats are being supported because applications treat them as they do any other XHTML…if that is support then what is it called when applications do something useful with them? Fully-supported?

My view, upon re-reading my own article and others, is that microformats are an unnecessary intermediary step. We’ve got applications with full XML support, so why not just write a specialized format that doesn’t become part of XHTML? Just like RSS, OPML, and Google sitemaps? The whole power of XML is specialized uses (via specialized tags), so I don’t really buy the “solve a particular problem” reasoning that microformats push. That’s exactly what these other three are examples of, specialized formats for specialized problems.

5.  Marnen Laibow-Koser 8:35pm, Thu 19th, 2007

This is a thought-provoking article, and since I’m just getting into microformats, I’m glad I read it. Like you, I find the use of CSS classes to convey semantic information a little ugly — although we must remember that that’s one of the things CSS is good at.

However, I think you miss an important point when you dismiss microformats as unnecessary. They are usable in many contexts where generation of a separate format is difficult or impossible. For example, I could write something like:

<p class="vevent">Hi everyone. I'm having a <span class="summary">party</span> on <abbr class="dtstart" title="20070704">Independence Day</abbr>!</p>

…and it will at once be usable both as straight HTML and as structured hCalendar data. Because it’s straight HTML, I can put it anywhere — including in an RSS feed description, or in my LiveJournal — and it is immediately legible. With a format such as xCal or iCal, I would have to create a separate file in a completely different format, which I may not have the time, inclination, or ability to do. Besides, why generate the same data twice? (Yes, I know, a developer would translate into the required format on the fly, but microformats are largely aimed at non-developers, and the whole idea is to make them easy to use. Besides, even though I may know how to generate an iCal file, how do I make one automatically available, say, as part of my LiveJournal, where I don’t have access to install a server-side script? hCal obviates this problem.)

Sure, a programmatic XML solution will be more comprehensive and potentially less ugly. But it’s also out of the realm of more users. I like the microformat idea because it opens up the semantic possibilities of the Web to non-developers.
