November 23rd, 2006
Why Scale Matters in Tagging Systems
Why and how scale in social tagging systems can leverage the Wisdom of Crowds (much like Google does with links) to make the incorrect tags less influential than certain Aristotelians would have us believe.
Ok, so I got into hot water for my Thoughts on the Impending Death of Information Architecture post…
But I’m completely fascinated by this subject. In that piece I referenced a work by Elaine Petersen entitled Beneath the Metadata: Some Philosophical Problems with Folksonomy. Elaine eloquently argues that since tagging systems can contain incorrect information (non-Aristotelian, she calls it brilliantly), they will eventually fail to serve our needs. She says:
“Although folksonomy advocates are beginning to correct some linguistic and cultural variations when applying tags, inconsistencies within the folksonomic classification scheme will always persist. There are no right or wrong classification terms in a folksonomic world, and the system can break down when applied to databases of journal articles or dissertations.”
This argument, as I’ve mentioned before, is one about relativism. Is it OK to have systems which contain misinformation, even if it happens to be the way someone thinks and tags?
Let me put it more bluntly: Do people have the right to think how they want?
If we re-ask the question in this way, the answer is clear. (And no, I don’t think it’s ridiculous to equate this argument with allowing people to think what they want. At some level it *is* about that, in a weird science-fiction way)
So, of course we have the right to think what we want, at least most people think so. (insert analogous religious argument here about actions and beliefs)
Anyway, if you’ve read Bokardo for any period of time (go here to win prizes) you know that I believe our systems should model our behaviors and thoughts, not the other way around. We shouldn’t have to map what’s in our head to some other idea set every time we use software if we don’t have to.
If I want to tag the New York Yankees as “the best team money can buy”, and someone else thinks that’s just plain wrong, then tough for them. That’s how I want to tag it, that’s how I want to re-find it, and that’s how I think about the Bronx Bombers (or was it the Yankees?). In folksonomies the view of the system is *my* view…warts and all.
Moreover, other folks in Red Sox Nation might tag it similarly, thus propagating the potential falsity in the system for Yankees fans to find (except, of course, the Yankees are the best team money can buy). Note, though, that their version of the system will have their version of tags for the Yankees…we still have a problem, according to Elaine…there is information in the system that doesn’t agree with other information in the system.
Geez…sometimes I don’t even agree with myself.
Scale is the Great Equalizer
But the thing is, and this is where Elaine underestimates folksonomies, scale matters. Even if a few people tag things incorrectly, most people won’t. This doesn’t have to do with the fact that most people are Good, it’s just that if we ask enough people the same question or have them observe the same phenomenon, where their experiences overlap will tend to be the reality of the situation.
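To make the overlap idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `aggregate_tags` function, the user names, and the 95-to-5 split are all made up, and no real tagging system is being described. It just counts how many distinct users applied each tag to one item.

```python
from collections import Counter

def aggregate_tags(taggings):
    """Return (tag, share_of_users) pairs for one item, most common first.

    `taggings` maps a user name to the set of tags that user applied.
    """
    counts = Counter()
    for tags in taggings.values():
        # Using a set means one user cannot inflate a tag by repeating it.
        counts.update(set(tags))
    total_users = len(taggings)
    return [(tag, n / total_users) for tag, n in counts.most_common()]

# Hypothetical data: 100 users tag the same page about the Yankees.
taggings = {f"user{i}": {"yankees", "baseball"} for i in range(95)}
taggings.update({f"troll{i}": {"yankees", "worst-team-ever"} for i in range(5)})

shares = dict(aggregate_tags(taggings))
for tag, share in shares.items():
    print(f"{tag}: {share:.0%} of users")
```

The odd tag is still in the system, and still useful to the five people who chose it, but in the aggregate view it carries very little weight next to the tags most people agree on.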
At this point, we could go many ways with this topic. One way would be to tie in James Surowiecki’s brilliant book The Wisdom of Crowds, which makes a book-length case for aggregating individual viewpoints. If, under certain conditions, we aggregate the individual decisions of many people, the result tends to be equal to or better than an expert’s view. Here’s the Wikipedia entry for the Wisdom of Crowds, which gives a quick but good overview, and is no doubt a great irony in and of itself…(the crowd writing about the Wisdom of…itself…in a relativistic system with no authoritative voice except the accumulated voice of all its members)
Another way we could go with this topic is where Dan Stewart went. Dan, commenting on Dave Weinberger’s lengthy reply to Elaine, points to another, relatively important document Bokardoans should all be familiar with by now (I’ve talked about it enough):
“Elaine makes the argument that if an item on the web is tagged with words that do not describe it, then the system breaks down. In The Anatomy of a Large-Scale Hypertextual Web Search Engine by Sergey Brin and Lawrence Page the authors state, “Also, it is interesting to note that metadata efforts have largely failed with web search engines, because any text on the page which is not directly represented to the user is abused to manipulate search engines. There are even numerous companies which specialize in manipulating search engines for profit.”
So Dan ties in the Google PageRank algorithm to the folksonomy argument. Cool! However, at this point you may be thinking that Dan is a proponent of tagging systems. Alas, no, he is not. He goes on to say:
“Metadata is data about data, and tagging a page on the internet is essentially adding metadata. For the same reason that search engines no longer rely on metadata, social bookmarking could be abused and eventually become worthless.”
I think Dan has this second bit all wrong because he fails to distinguish where the metadata comes from and who is using it. If it comes from the expert, it’s expert-supplied metadata. This is exactly the type of metadata that Brin and Page were talking about, and in particular the <meta> tags of HTML. Those are defined by the author of the page (the expert) in the head portion of the HTML document.
As the Brin/Page quote points out, meta tags weren’t shown to the user of the page. This meant that document authors weren’t writing them for their users and thus had little incentive to make them accurate. Instead, their primary use was to tell user agents (search engines) what the page is about.
Because there is no personal use, meta tags get abused. If it doesn’t make a difference to the author what the meta tags say, then they’ll manipulate them away from what best describes their page to what best gets search engines to return them high in the results. This is the inflection point: at this point they become, essentially, SPAM.
However, tags are not defined by authors. They’re supplied by users. They’re user-supplied metadata. As a result, they’re used by the very people who created them. And, it is in that person’s best interest to keep them useful. Even though they can be incorrect like SPAM, they are not like SPAM in that someone actually has incentive to keep them valuable for human use.
BTW: this all seems to follow The Del.icio.us Lesson.
Further, what is the best example of user-supplied metadata on the Web? Links, of course. Links are essentially references to other documents. Links are created by authors but differ from meta tags because people actually use the links, following them and learning from them. Whereas manipulated meta tags didn’t hurt the user experience, manipulated links seriously kill it. If you are putting up bad links on your pages, people respond negatively…and swiftly. They just won’t come back. It’s definitely in the author’s interest to keep links valuable to users.
…and what does Google use to model how we value content? Links!
And we know why we can aggregate links in this way…because we have a large enough set of them to weed out the inconsistencies even as they continue to exist. We’ve got scale, baby!
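Here is a rough sketch of that link aggregation in Python. It is a simplified, normalized variant of the PageRank idea from the Brin/Page paper, not Google's actual implementation. The `pagerank` function, the page names, and the tiny link graph are hypothetical; the 0.85 damping factor is the value the paper says is usually used.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical link graph: four honest blogs link to one page, one spammer to another.
links = {
    "blog_a": ["good_page"],
    "blog_b": ["good_page"],
    "blog_c": ["good_page"],
    "blog_d": ["good_page"],
    "spammer": ["junk_page"],
}
ranks = pagerank(links)
print(ranks["good_page"] > ranks["junk_page"])
```

The spammer's link still exists and still counts for something, but with enough honest links around it, it gets outvoted.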
This isn’t to say that SPAM isn’t a huge problem…it is. I certainly don’t envy the SPAM harvesters at Google. But if we look at all the people making links…the vast majority are creating valuable, non-spammy ones.
So where Dan sees a divergence and a route away from tagging, I see a convergence and a route toward tagging. Not only are tags user-supplied, personal-use metadata (and that will be their primary reason for being), but they also scale really well on a social level because they’re like links…if we have enough of them the incorrect ones (created by spammers and non-spammers alike) actually get lost in the Crowd…
And what does that leave?
Wisdom, I hope.
Comments
1. Gene 9:14am, Thu 23rd, 2006
That was a good recovery from your “Impending death” post.
I wanted to add a quote from Karl Fast that relates to your comments here about scale (and about IA a couple of days ago):
You can read the whole quote here. One of the benefits of tagging, I think, is that it can be applied to some of those middle problems (i.e. it doesn’t require Google-like scale to be effective).
2. referez.com 2:31pm, Thu 23rd, 2006
We invite you to beta test of Referez.com.
Referez.com is the new recommending system of blog posts by referring statistics.
Referez.com allows people to share the real interest over the Internet in real-time.
If you insert the Reffering-code in your website or blog,
Every time people visit your website,
We collect each URLs using a simple system we call the “Reffering-Code-System”.
It can make an automatic recommending system not digging and clicking but referring.
So, People find top visited articles in real-time.
We wish you to join our test.
Thanks!
http://www.referez.com/?local=us
3. Pramit Singh 12:40am, Fri 24th, 2006
I like the Referez idea.
4. Jodi 11:11pm, Fri 24th, 2006
I too do not think there’s an incorrect tag assignment. The information system as a whole cannot be considered correct or incorrect.
But for the social design to have value (at the macro level) then scale will certainly expose the ‘data’ more accurately – to permit clusters to form (Yankee fans, Red Sox fans, not Yankee fans, etc) – clusters that cannot be seen when the universe is too small.
Although on the whole I concur, the google example doesn’t hold water for me. The meta data and links are constantly assaulted (for gain), and the quality of the information is in jeopardy; PayPerPost and fake computer generated blogs are the latest (eg. views.azigodi.info).
But then again perhaps these poor info examples will cluster (as their quantities increase) and can be tagged as such.
there and back again.
thanx for the nice read.
Jodi
The nNovation Group
5. Deanna 9:36am, Sat 25th, 2006
Excellent, excellent post. I’m always so relieved when I read things that echo my own thoughts, which often remain at the incoherent, “Yeah, but… but… but…” stage.
6. Tom 5:52pm, Sat 25th, 2006
Regarding post style: I can’t read large blocks of italic text. How about adding a simple grey line on the left (and keep it indented) instead?
Also, with such long posts, subheadings are good: “This is where I talk about X”, “This is where Y comes up”, and finally “This is why Y is way better than X”. Just my 2 cents.
7. Mike Harper 5:13pm, Sun 26th, 2006
Great article that articulates what I felt was wrong with the Elaine Petersen article. Personally I think the Internet is big enough to accommodate truths, alternative truths, half-truths and lies at the content and metadata levels, and people will still find what they are looking for. As you say, scale makes all the difference by adding weight to those things that are most agreed upon.
8. Kurt 3:45am, Tue 12th, 2006
Thanks for all the great info. You seem to really know a lot about tagging. I’m looking forward to reading more.
9. Atlanta Real Estate 10:31pm, Wed 2nd, 2008
I also concur, the google example doesn’t hold water for me. The meta data and links are constantly assaulted (for gain), and the quality of the information is in jeopardy
10. Voyance 2:21am, Tue 30th, 2008
It’s very interesting, I agree
11. Atlanta Houses 5:38am, Sat 10th, 2009
Thanks a lot, the information you gave about tagging has really helped me.