January 30th, 2007
Is there an Example of a Scalable Taxonomy?
Kevin Gamble (via Dave Weinberger):
“Is there any living, breathing example of a taxonomic approach working (scaling) to keep-up with the hyper-efficiency we see in peer-production systems? I’m being quite serious here. Can you point me to a working model?”
Why is this an important question?
This is an important question because of the widely-held assumption that taxonomies are the right answer for most of our information organization problems.
The thing is, I’m not happy with any taxonomy, really. I can’t think of a single one that works well for me, let alone works perfectly. Even a site with as simple a taxonomy as Apple.com confuses me, with some links on the second-level nav (like software and hardware) that are clearly wider in scope than those on the top level. I have to remember that this is the case when I want to find the software page…I have to remember the taxonomy, which to me is the mark of a poor one.
Even the taxonomies I build for myself don’t work all the time, though they work much better than those that others build that I have to use.
A reasonable response might be that taxonomies are the best tool we’ve got. Most of that argument rests on these facts:
- Taxonomies have been around for a long, long time and are the core of several disciplines including library science and are thus trusted by many practitioners as the Right Way to Do Things.
- Taxonomies are easily implemented without the input of users. This is a bad idea, of course, but that’s a big reason why there are so many of them.
- Folksonomies are new and therefore scary. Even the best example of them, Del.icio.us, has only been around for a couple years and only been working at a huge scale for about a year.
- Folksonomies suffer from the Cold Start Problem (CSP). You have to build up tagging datasets over time, so at the beginning there is really no navigation to build on top of them.
Now, I don’t think that it has to be either/or. We don’t have to build either a taxonomy or a folksonomy, necessarily. They might co-exist in some way, as Thomas Vander Wal has argued.
But the question still stands…are there any examples of knock-down, drag-out taxonomies that scale in today’s world and generally work well for those who use them?
UPDATE Donna Maurer at Digital Web has taken me to task for blurring the question, saying I’m asking for a scalable taxonomy while really wanting one that works. She’s absolutely right…I’m assuming that while it scales the taxonomy still has to be useful. Can’t we have both?
Comments
1. Elroy Jetson 9:36am, Tue 30th, 2007
I think the answer to your question is no. You cannot, by definition, have a taxonomy that scales well. By its very definition, a taxonomy is created by making a decision that limits information. In taxonomies you are forced to squeeze one item into one category. If it lives in more than one category then the taxonomy falls apart.
With a folksonomy, on the other hand, one item can be indexed into an unlimited number of categories. The individual is building the vocabulary of classification instead of learning a vocabulary of classification.
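To make that contrast concrete, here is a minimal Python sketch. It is not taken from any real system: the names `taxonomy`, `tag_index`, `file_under`, and `tag` are hypothetical, and it assumes the strict one-item-one-category reading of a taxonomy described above.

```python
from collections import defaultdict

# Strict taxonomy (assumption: one slot per item).
# Maps each item to a single category path in the tree.
taxonomy = {}

def file_under(item, path):
    """Place an item at one path; filing it again replaces the old placement."""
    taxonomy[item] = tuple(path)

# Folksonomy: maps each tag to the set of items carrying it,
# so an item can appear under any number of tags.
tag_index = defaultdict(set)

def tag(item, *tags):
    """Attach any number of tags to an item."""
    for t in tags:
        tag_index[t].add(item)

file_under("iPod", ["Hardware", "Music players"])
file_under("iPod", ["Music"])  # overwrites the first placement
tag("iPod", "hardware", "music", "apple")
```

After these calls the item has exactly one home in `taxonomy` (the last one assigned) but is reachable from three entries in `tag_index`, which is the structural difference the comment describes.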
The problem with folksonomies today is that everyone is developing their own rules of implementation. A good example is the difference between tagging in del.icio.us and Netscape. In del.icio.us a tag is only one word, and tags are divided by spaces. That leaves it up to the individual to decide how to write phrases. Some use an underscore or a dash in place of the space. Some people might just mash the words together. On Netscape, tags are separated by commas, which indicates implicitly where one tag ends and another begins.
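The two conventions can be sketched in a few lines of Python. This is only an illustration of the rules as described in this comment, not the actual parsing code of either site, and the `normalize` helper is a hypothetical addition showing one lossy way to make the underscore, dash, and mashed-together variants compare equal.

```python
def parse_space_tags(raw):
    """Space-delimited style: whitespace separates tags, so each tag is one word."""
    return raw.split()

def parse_comma_tags(raw):
    """Comma-delimited style: commas separate tags, so a tag may contain spaces."""
    return [t.strip() for t in raw.split(",") if t.strip()]

def normalize(tag):
    """Hypothetical normalization: lowercase, then drop spaces, underscores, and dashes."""
    return "".join(ch for ch in tag.lower() if ch not in " _-")

space_style = parse_space_tags("information architecture")
comma_style = parse_comma_tags("information architecture, css")
```

With the same input phrase, the space-delimited parser produces two separate tags while the comma-delimited parser keeps the phrase whole. The normalization is deliberately crude: it also merges tags that were meant to differ only by a dash, so it trades precision for consistency.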
Another thing that is missing is the inherent classification of information. All information on the internet could potentially be divided into some stock tags. Say informational, conversation, photos, and video. This would assist in resolving the Cold Start Problem.
The last issue with tags is that there should be an implied ranking: the first tag is the most important category, and so on down the list.
A lot of work needs to go into shaping the folksonomy before we reach the point of general acceptance that taxonomies have enjoyed.
2. Jamie Stephens 10:28am, Tue 30th, 2007
I think you pose a very good question. I have been working with some medical-based taxonomies, namely MeSH Subheadings, to classify medical questions from practicing physicians. This taxonomy is never going to be complete or perfect since it always has to allow for changes in the medical field, but I still think it is going to be better than a broad folksonomy model for our purposes – but not necessarily for the reasons you state above.
Taxonomies do carry a certain amount of authority, and in this field, that is not such a bad thing. We are working with terms that might not be familiar to our average user, and therefore we might not want users to classify the objects themselves. The “authoritative” taxonomy actually serves as a learning tool in this case. Also, we are tagging medical information that will be used by physicians at the point of care – although I would be interested to see how physicians might gang-tag this data, I wouldn’t want to rely on their knowledge to accurately tag a wide variety of specific subject matter (perhaps this fits into your “scary” category above).
We’re not really afraid of the newness of folksonomies or the Cold Start Problem as much as we are of the need to have a common vocabulary for terms that are implemented (as the above commenter mentions, everyone is developing their own rules of implementation). I think that if we didn’t have a taxonomy as our guide, then even a trained indexer would have difficulty being consistent in coming up with tags to use that could then help audiences retrieve information in a meaningful way.
That being said, we do not impose upon ourselves a single category for each question that we index. MeSH terms cover big picture items such as the type of inquiry (Therapy, Diagnosis, etc.), population (elderly, children, infant, male, female, pregnant female, etc.), and very specific terms dealing with diseases. We may index a question with a number of these terms, thus not relegating the object to a solitary slot in the tree.
Our biggest limitation, funny enough, is that the folks who like to index with MeSH terms are the medical librarians and the consumers of the content are physicians – who don’t necessarily like the MeSH terms as much as the librarians. We end up having to take this well-crafted taxonomy and tweak it to fit the needs of the people who are using it to retrieve information. Kinda funny really.
I think that the most succinct answer to your question is that there is no taxonomy that is going to scale the way a broad (gang-tagged) folksonomy will. However, in the case of items that are not being gang-tagged, but rather tagged by a few for the consumption of many, then there is good reason to use a taxonomy that is familiar and usable, despite its limited scalability. We have the benefit of being able to add to an existing taxonomy for our purposes – perhaps the best of both worlds.
3. Ryan Shaw 12:45pm, Tue 30th, 2007
It doesn’t make sense to talk about scale and “hyper-efficiency” without reference to the purpose of the information organization. Taxonomies (with all their problems) are appropriate in cases where you need to co-locate all the documents that match a particular subject. Folksonomies work well on sites like Flickr and del.icio.us because users of these sites don’t want all the photos of fireworks or every web site about CSS–they just want some good ones. A medical researcher, on the other hand, wants to be sure that she has found everything known about a particular disease. The imperfect artificial language of the taxonomy is a tool for ensuring that she can do this.
4. Bill H-D 5:16pm, Tue 30th, 2007
Kingdom, Phylum, Class, Order, Family, Genus, Species…
That one seems to have held up for a while.
5. Gene 6:26pm, Tue 30th, 2007
The answer to Kevin’s question is probably “no.” But I’d suggest it’s a qualified “no” because taxonomies aren’t by nature hyper-efficient, hyper-scalable structures.
While folksonomies can scale easily (i.e. handle more resources and tags), they don’t necessarily make it easier to find resources. And they’re unreliable–at best!–when it comes to collection management.
I actually think the question is a bit daft. Here’s an analogy: “Is there any living, breathing example of a car working (scaling) to keep-up with the hyper-efficiency we see in mass transit systems?”
If you understand cars and mass-transit systems, it’s not a question you’d bother to ask.
6. Josh 6:28pm, Tue 30th, 2007
Bill, here’s an interesting take on the animal kingdom taxonomy:
What do terms like phylum, order and family mean?
7. Josh 6:36pm, Tue 30th, 2007
Gene, I’m not sure about transit system analogy, but don’t you think this is an important question?
I mean, as designers we should be careful not to take practices for granted, assuming certain techniques work when we don’t have great examples to rest on. And you know as well as anybody the assumptions made about taxonomies…they are assumed as the way to work for many designers, no matter what size of system they’re working on.
But let’s not get caught up with just bigness. Is there good evidence that suggests that taxonomies work well in certain sized systems and not in others? Perhaps that’s the angle to take here…to make the question more palatable.
btw: in the aftermath of my Death of IA post a lot of questions like this have arisen. We take a lot of practices for granted…and given the state of systems these days (what works and what doesn’t) perhaps we should be rethinking some of them.
8. Alex Iskold 9:26pm, Tue 30th, 2007
a very large number actually. brains and languages for example.
alex
9. Gene 1:52am, Wed 31st, 2007
“I’m not sure about transit system analogy”
C’mon… that’s a great analogy.
The original question implies (I think) that you get most of the benefits of a taxonomy with a folksonomy, but with a folksonomy you also get this super-scalability. But that’s not true… there are trade-offs that are at least superficially similar to the ones in the cars/transit analogy. E.g. cars are high cost, high precision transportation; transit is low cost, low precision.
“as designers we should be careful not to take practices for granted…”
Absolutely. But that cuts both ways–we should also be careful not to force folksonomies onto problems that require another solution.
“Is there good evidence that suggests that taxonomies work well in certain sized systems and not in others?”
I would point to Digg, which has a basic subject-based classification system, and Amazon’s product taxonomy as two examples that work. There are lots of interesting examples where taxonomies are quite critical (the ICD comes to mind) but they’re not very webby.
But I think the more important point, from a web design POV, is to pick the classification systems that best fit the problem. Folksonomy + taxonomy might fit in some situations, faceted taxonomies in another, and pure folksonomies somewhere else.
In other words, it depends.
10. Michael Chui 4:31am, Wed 31st, 2007
I suspect that taxonomies work remarkably well in bureaucratic scenarios, like desk military or governmental or administrative environments, especially as a way of doing role definition. I think they work well as taxonomies; I am by no means saying that they make the functioning of the system better. But they are likely effective classifiers.
Disclaimer: no personal experience.
11. Josh 6:27am, Wed 31st, 2007
Gene, I think we’re on the same page here.
In hindsight I don’t think I should have included folksonomies being new in my list of reasons why taxonomies are so relied upon…I wasn’t trying to set up an either/or. Apparently I’m conditioned to do so, however, because that’s what I did.
While folksonomies are indeed interesting and scalable, the question about taxonomies still interests me, even by itself.
For example, let’s imagine that we’re building a product site like Amazon from scratch. How important is the taxonomy? I’ve argued in the past that the taxonomy on Amazon isn’t all that important, because most people simply search. This suggests to me that either the navigation system or the taxonomy underneath it isn’t doing as well as it could (or, more to the point, they’re doing just fine but taxonomies don’t scale well).
Digg (news) is an interesting example. What’s interesting about news is that it can benefit greatly from very general categories, like Digg and CNN. But once you get past that initial set, I wonder how valuable they are. I think the difference is that we don’t find things in news, we actually browse…so we’re not looking things up by category, we’re simply seeing what’s new there.
And your final question is where this is headed. Pick the best tool for the job. Wouldn’t you admit that most people pick taxonomies right now as a matter of course?
What I’m really interested in is empowering users to organize their own information. Creating refinding systems that allow people to organize content from multiple sites in beneficial ways. And taxonomies, from this point of view, aren’t that dependable.
12. noodlesandbeef 12:43pm, Wed 31st, 2007
The periodic table of the chemical elements scales quite well to new elements.
13. Gene 1:08pm, Wed 31st, 2007
“I’ve argued in the past that the taxonomy on Amazon isn’t all that important, because most people simply search.”
From what I can tell, Amazon’s search and its product taxonomy are pretty tightly integrated. Whenever you visit a category node, search is immediately limited to that category (e.g. office products.)
Even general search results offer taxonomy-based refinement (e.g. search for Canon).
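As a rough illustration of that kind of integration, here is a small Python sketch. It is not Amazon’s implementation: the `products` data is made up, and `search` and `refinements` are hypothetical helpers that only show how a category path can both limit a search and supply refinement options for its results.

```python
# Made-up catalog: each product sits at one path in a category tree.
products = [
    {"name": "Canon PowerShot camera", "path": ("Electronics", "Cameras")},
    {"name": "Canon inkjet printer", "path": ("Office Products", "Printers")},
    {"name": "Stapler", "path": ("Office Products", "Supplies")},
]

def search(query, within=()):
    """Return products whose name contains `query` and whose path starts with `within`."""
    q = query.lower()
    within = tuple(within)
    return [
        p for p in products
        if q in p["name"].lower() and p["path"][:len(within)] == within
    ]

def refinements(results):
    """Count results per top-level category, to offer as refinement links."""
    counts = {}
    for p in results:
        top = p["path"][0]
        counts[top] = counts.get(top, 0) + 1
    return counts

everywhere = search("canon")
scoped = search("canon", within=("Office Products",))
options = refinements(everywhere)
```

A general search returns both Canon products and `refinements` groups them by top-level category; passing `within` plays the role of visiting a category node first, so the same query returns only the printer.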
So it’s not a simple matter of one thing versus another–search vs. taxonomies vs. folksonomies vs. whatever. It’s about how they fit together with the content, user tasks, context of use, etc.
14. Matthew Hodgson 7:23pm, Wed 31st, 2007
I recently gave a presentation on this very issue. All classification schemes have the same problem – they don’t really match the way that people think about information or create it. It’s an artificial construct designed to help people find information that doesn’t take into consideration how in particular the creator of the information thinks about information.
In the end all you reinforce is that a taxonomy is never perfect – my view of the universe doesn’t match your view of the universe.
I don’t think about things in discrete little packets of information. Do you? Does anyone? If we did think in this way then users wouldn’t have so much angst regarding adding metadata to their stuff. If tagging was so good then people wouldn’t be arguing how the tags I use don’t mean the same as the tags you would use to describe something.
Topic maps, IMHO, are probably the only way to rectify the problem.
Topic maps allow people to describe their information in as many different ways as they like, using taxonomies, folk taxonomies or folksonomies, and make relationships between them in the same way our brains relate bits of information together.
Let the different facets provided by the topic maps give users the navigation they need to browse and discover information, and let the technology of topic maps give us the web that joins all of the terms together in their natural relationships.
If Charles Goldfarb called topic maps “the GPS of the information universe”, it’s about time we swapped our paper-based street directory-like taxonomies for something a little more people friendly.
15. William Sheridan 10:01am, Fri 10th, 2007
What is scalability anyhow? Scalability: the ease with which a system or component can be modified to fit the problem area. OK! Then most likely, ALL designs have scalability limits – that includes designs for information or knowledge systems.
Could a taxonomy be designed that would scale from the infinitely small to the infinitely large? Not according to the vast majority of thinkers or thinking. To my knowledge only one has been designed that would fit that bill – see at www3.sympatico.ca/cypher2/WebMindMapBook.pdf
This has been described as an index of the whole of human knowledge. Nothing else compares with its comprehensiveness, which is why all other taxonomies do NOT scale very much.