Controlled Vocabularies and Folksonomies: Why Change is Good.

In my last two posts, I’ve been hinting at why I think folksonomies work: they harness user behavior rather than predicting or dictating it. The key is the folksonomy’s ability to change over time, an ability that controlled vocabularies often lack. In this post I’ll explain further what I mean by that.

Controlled Vocabularies

In the vast majority of web site designs, a primary task of the Information Architect is to create a controlled vocabulary, an “organized list of words and phrases … that are used to initially tag content”[1]. The aim of building a controlled vocabulary is to include all the words, synonyms, and related terms that users will need to find what they are looking for on the site.

Hierarchical Structures

On any site larger than a few dozen pages, controlled vocabularies can grow really big, really fast. So after (or while) the Information Architects create their controlled vocabulary, they organize the content on the site in some way so users aren’t overwhelmed by the sheer amount of it. Having it all in a flat space, with every item equal to every other and no grouping of any kind, would be very confusing to users who are trying to sort through it all to find something specific. Most of the time Information Architects combat this problem by organizing the content hierarchically. Global navigation, a mainstay on 99% of web sites, is almost always the top level of that hierarchy.

Both the controlled vocabulary and the hierarchical structure of the site are decided before launch. In other words, the designers and IAs are predicting what will work best for users.

The Problem of Single Placement

The problem with this setup is that items are usually placed within a single category in the hierarchy, making the name of that category crucial to a user’s success. If a user doesn’t identify that category as the place to find what they’re looking for, then they have a much higher risk of failing.

One way to combat this problem is to allow items to be placed in more than one category (a change in the rules of the taxonomy). This can be helpful to users who identify items in different ways. If they don’t recognize one category as housing the item they want, they may still be able to find it under another category. This is more effective than a taxonomy where items can be in only one category because it increases the probability that users will find what they’re looking for.
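To make the idea of multiple placement concrete, here is a minimal Python sketch. The item and category names are invented for illustration, and a real site would store this in its content management system rather than in a dictionary. The point is only that an item filed under two categories can be reached by users who guess either one.

```python
from collections import defaultdict

# Hypothetical catalog: each item lists every category it is filed under.
placements = {
    "usb-headphones": ["Audio", "Computer Accessories"],  # placed twice
    "laptop-sleeve": ["Computer Accessories"],             # placed once
}


def build_category_index(placements):
    """Invert item -> categories into category -> items for navigation."""
    index = defaultdict(list)
    for item, categories in placements.items():
        for category in categories:
            index[category].append(item)
    return dict(index)


def can_find(item, guessed_category, index):
    """Return True if a user browsing guessed_category would see the item."""
    return item in index.get(guessed_category, [])


index = build_category_index(placements)
```

A user who thinks of the headphones as "Audio" and one who thinks of them as "Computer Accessories" both succeed, while the singly placed laptop sleeve is only found by users who guess its one category.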

The Problem of Adding New Content

However, another problem crops up when new content is added to the site. The controlled vocabulary, milled with precision before the launch of the project, is now under stress because it may not have a category that the content fits in. If it doesn’t, then a new category must be added, or the content must be shoe-horned into an existing category. In practice, though, the costs of changing the site design are high (the design was constructed carefully around the original controlled vocabulary) and the content is often placed, sometimes uncomfortably, into an existing category.

Unanticipated Change

The biggest monkey wrench of all, though, is when the change is out of the hands of the Information Architects. Sometimes this change is subtle, like a user needing further explanation about content that already exists on the site. Other changes are more drastic, like a completely new set of users who are unlike the current ones, necessitating renovations to entire sections of the site. And the changes are sometimes language-based, like when mp3 players suddenly became “iPods” because of Apple’s paradigm-changing device.

In other words, the top-down method of creating information architecture doesn’t scale well. When new content is added, new users visit, old users change, or the industry changes, controlled vocabularies have a hard time keeping up. The reason why can be generalized to the following: human usage is always changing, and that change is rarely predictable within a vocabulary (and hierarchy) that is controlled.

This shouldn’t come as a surprise. After all, we’re always changing what we do, but the speed with which we’re changing is increasing. It is not so much that our base needs are any different than they used to be: we’re still focused on hearing the latest news, being entertained by movies and sports, buying stuff, providing for our family, and watching the weather. But the way we do these things is changing: our actual behavior and language are slightly different than they were even yesterday. We’re doing more with less and talking about it in different ways, and we can see the effects in how people use our web sites.

Controlled Vocabularies Resist Change

Web sites built from the top down, using a controlled vocabulary, tend to resist this kind of change. And the bigger the site is, the more it resists change because the costs to change it quickly soar. Notice how little some of the sites you visit actually change: if they didn’t build it right the first time (as if that’s even possible) then they’ve got a huge monkey on their back until they can find the time and resources to redesign.

Folksonomies Embrace Change

Which brings me back to folksonomies. Folksonomies never stop changing, and I think this is their biggest asset. When users add a link, their act of tagging it directly affects the content other users will be presented with. For example, when I add a post to my bookmark collection, it will appear on the front page for a short time, as well as on the page for each tag I used to describe the article. This effectively harnesses my behavior and builds navigation for other people, and it becomes very powerful when aggregated with the behaviors of other users. Over time, the popular bookmarks emerge, creating ever more relevant navigation for other users. The effect is that users can serendipitously discover new, relevant content they otherwise would have missed.
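The mechanism described above can be sketched in a few lines of Python. This is an illustration under my own assumptions, not how any particular bookmarking service is implemented: the `Folksonomy` class, its method names, and the example URLs are all hypothetical. Each bookmark goes onto a "recent" list (the front page) and onto the page for each of its tags, and popularity is simply the number of distinct users who tagged a URL.

```python
from collections import defaultdict


class Folksonomy:
    """Toy model: navigation is built entirely from users' tagging actions."""

    def __init__(self):
        self.recent = []  # newest first, like a front page
        # tag -> url -> set of users who applied that tag to that url
        self.by_tag = defaultdict(lambda: defaultdict(set))

    def add_bookmark(self, user, url, tags):
        """Record one user's bookmark and update every tag page it touches."""
        self.recent.insert(0, (user, url))
        for tag in set(tags):
            self.by_tag[tag][url].add(user)

    def popular(self, tag, n=3):
        """Return up to n urls for a tag, most-tagged first (ties by url)."""
        urls = self.by_tag.get(tag, {})
        ranked = sorted(urls, key=lambda u: (-len(urls[u]), u))
        return ranked[:n]
```

Nobody edits the navigation directly here. Calling `add_bookmark` is the only way anything changes, so the tag pages are a side effect of ordinary use.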

Users Should Originate Change

So the major difference here is where the change originates. In a controlled vocabulary it comes from those in control: the Information Architects. In a folksonomy, change comes from those in control, too: the users. Since the users are the target audience, it makes sense to leave them in control of their own information. Because the way a folksonomy changes is built in, and is dictated by its actual users, it becomes a mirror of what users are currently finding relevant. Whenever someone enters a bookmark there is a little change in the relevancy of the topic they tag it with, and nobody has to go out of their way to produce a change in the architecture. It just happens.
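One way to picture a folksonomy as a mirror of current interest is to count tags only within a recent time window. The Python sketch below is an assumption of mine about how such a view could be computed, not a description of any real service. The function name, the seven-day default window, and the event format are all hypothetical.

```python
from collections import Counter
from datetime import timedelta


def trending_tags(events, now, window_days=7, n=3):
    """Rank tags by how often they were used in the last window_days.

    events is an iterable of (timestamp, tag) pairs, one per tagging action.
    Returns up to n (tag, count) pairs, highest count first (ties by tag).
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter(tag for ts, tag in events if cutoff <= ts <= now)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

A tag that was heavily used months ago but rarely this week drops down the list without anyone deciding to demote it. Widening `window_days` brings the older activity back into view.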

So, I see a huge benefit to folksonomies because they change along with their users. If users are interested in a certain topic (like folksonomies), then that topic will bubble to the top for a while. When users aren’t talking about it as much, it will regress to the mean. Because tagging behavior is recorded over time, relevancy for each tag increases and the most relevant results can always be found if you look within a specific tag. So, like with Google, it becomes harder and harder for individuals to game the system.

is Unique

As optimistic as I am about them, I’m not trying to claim folksonomies are a panacea. I know that we only have a few examples of them right now, so it’s relatively difficult to draw useful generalizations. However, I believe this particular service is unique. It is fast outgrowing its utility as a simple bookmarking tool and has become a public record of attention to, and interest in, content on the Web. Even so, its usage is still limited to early adopters, and many folks feel that folksonomies themselves won’t scale. However, I’m confident they will, seeing how much larger they already are than almost all other sites on the Web. But, like most things, the real test will be when we start hearing about it from our parents.

[1] I took my definition of controlled vocabulary from “A Taxonomy Primer” by Amy Warner.

Published: January 28th, 2005