Mining the Two Types of User-Supplied Content

by Joshua Porter  |   January 31st, 2006  |  shortlink: http://bokardo.com/p/327

Sitting in my chiropractor’s office the other day, I read a fascinating article in the offline version of BusinessWeek. Here’s the online version: Math Will Rock Your World.

In addition to finding out that using a laptop 12-14 hours a day can affect my spine, I also found out about the amazing rise of math in business, from analyzing clickstreams to tracking blog conversations. It seems Google and Yahoo already have next year’s math grads lined up for jobs. They simply cannot get enough brain power to do what they want to do.

What do they want to do? Mine data, of course. From the mountains of search queries in Google to the ever-increasing purchase histories at Amazon, we have more data than we know what to do with. Even at the relatively tiny UIE we have more than we can handle. I simply cannot fathom what millions of users could do to a database.

Here’s an interesting bit about Yahoo:

‘At the Sunnyvale (Calif.) campus of Yahoo, chief researcher Prabhakar Raghavan heads a team of 100 mathematicians and computer scientists. Scribbling on a white board covered with equations, Raghavan describes Yahoo’s immense pool of data, featuring the online activity of 200 million registered customers, as Yahoo’s most precious resource. There is a whole world of uninvented businesses, he believes. They’ll come into being as Yahoo discovers new ways to satisfy the urges, curiosities, and desires of this customer base. The hints of these future businesses float in the oceans of Yahoo’s data. Raghavan’s mandate is to sift through that data and form new connections among consumers, e-marketers, and advertisers. Better algorithms, he says, “are critical to survival.”’

In general, there are two kinds of user-supplied content which can be mined:

  1. User-added content:
    Intentional content: the content that users deliberately enter themselves. This includes blog posts, comments, reviews, ratings, links, RSS subscriptions, podcasts, and video.
  2. User-generated content:
    Unintentional content: the content that accrues as a byproduct of users’ actions. This includes clickstreams, purchase history, RSS read stats, search history, and other artifacts of behavior. User-generated content serves as evidence that a user passed that way, like footprints.
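
To make the distinction concrete, here is a minimal Python sketch that sorts a log of user events into the two buckets. The event names and the `classify` and `split_events` helpers are hypothetical, chosen to mirror the examples in the list above; they don't come from any real system.

```python
# Hypothetical event kinds, mirroring the examples in the list above.
USER_ADDED = {"blog_post", "comment", "review", "rating", "link",
              "rss_subscription", "podcast", "video"}
USER_GENERATED = {"click", "purchase", "rss_read", "search"}


def classify(event_kind):
    """Return which of the two content types an event kind belongs to."""
    if event_kind in USER_ADDED:
        return "user-added"        # intentional: the user entered it directly
    if event_kind in USER_GENERATED:
        return "user-generated"    # unintentional: a byproduct of behavior
    return "unknown"


def split_events(events):
    """Group (user, kind) pairs by content type."""
    buckets = {"user-added": [], "user-generated": [], "unknown": []}
    for user, kind in events:
        buckets[classify(kind)].append((user, kind))
    return buckets


if __name__ == "__main__":
    log = [("ann", "review"), ("ann", "click"),
           ("bob", "search"), ("bob", "rating")]
    for bucket, items in split_events(log).items():
        print(bucket, items)
```

Real systems would have many more event kinds, and some (a purchase, for instance) could arguably sit in either bucket.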

This distinction may or may not be important. I don’t know. But we are seeing a tremendous amount of work in the area of aggregating these types of content in an effort to build recommendation systems out of them.

In general, though, I think we’re learning some basic rules of thumb. Recommendation systems seem to work better when they are built from users’ direct preferences, like ratings or reviews. If you try to build them out of, say, clickstreams, you won’t get the intentional feedback that you need. For example, Amazon builds recommendations out of searches on its web site, even when the search was for something you only looked at as a gift for someone else. I recently did a search on knitting for my wife and now I’m stuck with knitting books for a while. The recommendations built on top of my wish list, however, are much more valuable to me, and I actually find them useful.
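
One way to act on that rule of thumb is to weight each signal by how intentional it is. The sketch below is only an illustration under assumed weights: the numbers, the signal names, and the `score_topics` function are all made up for this example, and it is not a description of how Amazon's recommendations actually work. The idea is simply that an explicit wish-list entry should count for far more than a one-off search.

```python
from collections import defaultdict

# Assumed weights: intentional signals count more than behavioral byproducts.
SIGNAL_WEIGHTS = {
    "wish_list": 5.0,
    "rating": 4.0,
    "review": 4.0,
    "purchase": 2.0,
    "search": 0.5,
    "click": 0.25,
}


def score_topics(signals):
    """Sum weighted signals per topic and return topics, highest score first."""
    scores = defaultdict(float)
    for topic, kind in signals:
        # Signal kinds without an assumed weight contribute nothing.
        scores[topic] += SIGNAL_WEIGHTS.get(kind, 0.0)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    signals = [
        ("knitting", "search"),
        ("knitting", "click"),
        ("web design", "wish_list"),
        ("web design", "rating"),
    ]
    print(score_topics(signals))
```

With these weights, a single gift search for knitting barely registers next to topics the user has rated or wish-listed, which is roughly the behavior I'd want.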

Going back to the article, I liked this quote:

“People are complicated,…If you have a system, they figure out how to game it. Machines never do.”

Comments

1.  Jonathan 10:44pm, Tue 31st, 2006

Your comment about being stuck with knitting books is an interesting addition to a post I read a year back called My TiVo thinks I’m gay. TiVo has a somewhat disquieting ability to discern your viewing preferences from the things you ask it to record. TiVo uses those perceived preferences to thoughtfully record other stuff it ‘thinks’ you might enjoy. The article talks about how one owner’s TiVo started recording shows that clearly indicated it thought that he was gay. To compensate, the owner started recording programs about war and other ‘manly’ subjects. His TiVo then began overcompensating, thinking his tastes were more in line with those of a WWII Nazi official. In the parlance of show biz: wackiness ensued!

2.  Josh 6:41am, Wed 1st, 2006

That’s a great pointer! And a great headline, too! Thanks, Jonathan. I love the guy’s reaction and strategy. Brilliant.

3.  Nir Ben-Dor 5:14pm, Wed 1st, 2006

Wow, this post really hit the spot for me. I just wrote an article in Linkadelic Magazine earlier today about the problems of the web. Here is a small part:

What does it mean for the future?

Wherever there is something wrong and an ongoing change process, good will eventually take place. I think that the web is still very much in its infancy, and that there is going to be a gradual change which will make the web a better place for its users. This may be likened to the collapse of a bad regime. Users will jump on new services which put the user in the center and empower the user by taking the preferences of the individual as the main consideration. Not the makers of the web sphere, not the “democratic” groups representing the users, but the user as an individual entity.

The rest is at There’s something very wrong with today’s internet

4.  Jared Spool 8:45am, Fri 3rd, 2006

Isn’t #2 (User-generated content) what Gillmor keeps insisting are “Gestures”?

5.  evano 4:52pm, Fri 3rd, 2006

I’m not sure about Gillmor’s Gestures, but #2 also reminds me of “Attention” or the artifacts of Attention. Or am I just missing the point? (BTW — your comments preview function is something I hope to see replicated everywhere there’s a comment box!)

6.  Josh 11:24pm, Fri 3rd, 2006

Jared, I’m not sure where or if Gillmor’s line of gestures would be drawn…interesting question.

7.  Dewayne Mikkelson 10:41am, Mon 6th, 2006

This is a great quote and it sounds like the service MINT has hit the same problem.
“People are complicated,…If you have a system, they figure out how to game it. Machines never do.”

8.  Pari Sportifs 9:28am, Fri 5th, 2007

That bit about TiVo was also on King of Queens, where Spence’s TiVo recorded figure skating and all kinds of musicals for him, very funny…

9.  Realtor 5:49pm, Fri 21st, 2007

Read, interesting!