Bottom-up / top-down tagging
For many people the web started with Yahoo! creating a directory of all things good. Directories, of course, provide a way of finding things that you do not know about. It is discovery versus search.
Historically, search proved to be more important, at least in the early stage of the web. Now, in these early days of Web 2.0, directories are coming back with a vengeance. Their new reincarnation is called tagging, or social bookmarking.
The idea is to have the end user tag the content of a page using his or her own keywords. The next step would be to merge tags from different users to create a new directory or hierarchy.
It seems to me that there are two things worth pointing out. First, such an approach could never converge to a stable taxonomy of things. Without a stable taxonomy we will still end up with a Tower of Babel of web content. Second, we need to step back and ask what our objective is in the first place.
The problem is that there are actually two objectives: organization of the web and personalization of the web. As far as organization goes, tagging is not that revolutionary. After all, we already have keywords on well-behaved sites and in RSS feeds. We also have directories - the evolved versions of Yahoo!
So the point is personalization, that is, personal exploration of the web by an individual with a particular context. Context is essentially semantics: it represents things as the user currently understands and relates to them. Different people certainly perceive things differently, in life and on the web. Tagging is therefore needed to improve the end-user experience: to organize the end user's browsing, not to restructure the web.
So how should this tagging be done? There is no need to re-invent the wheel, as we already have some examples of this sort of thing. Take human language. Of course there are standards, rules and grammar. Of course there is a dictionary. But the reality is that any language is a dynamic entity that evolves continuously in space and time.
The time aspect of language evolution is obvious: we feel that the language we spoke 10 years ago is different from the one we speak now. Language also varies in space, since there are so many dialects, even within the US. And of course every one of us understands and interprets language differently.
Now here is the critical point. We are taught language. We do not re-invent it; we learn it from other people and from books. It would be absurd if everyone had to invent their own language.
Take this point and map it onto tagging. The folksonomy need not arise from scratch; it needs to be built up and customized. Here is how tagging should work:
* When the user decides to tag a page, he or she should already see the tags that the content publisher has provided.
* The browser should perform statistical analysis and propose tags based on commonly occurring words or, if it has access to millions of documents, statistically improbable phrases.
* If the page is present in a major directory, its tags and place in the hierarchy should be shown as well.
* Finally, the 5 or 10 most popular tags that other people have applied to this page should be shown.
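To make the four sources above concrete, here is a minimal sketch of how a browser might merge them into one suggestion list. Everything in it is an assumption made for illustration: the function names `frequent_words` and `suggest_tags`, the tiny stop-word list, and the choice to rank publisher tags first. It only covers the "commonly occurring words" case, not statistically improbable phrases, which would need a large reference corpus.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "for", "on", "with", "that", "this"}


def frequent_words(text, limit=5):
    """Return up to `limit` of the most common non-stop words in the page text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [word for word, _ in counts.most_common(limit)]


def suggest_tags(page_text, publisher_tags, directory_tags,
                 other_users_tags, popular_limit=5):
    """Merge the four suggestion sources into one list without duplicates.

    Order of precedence (an assumption of this sketch): publisher tags,
    frequent words on the page, directory tags, then the most popular
    tags applied by other users.
    """
    popular = [tag for tag, _ in
               Counter(other_users_tags).most_common(popular_limit)]
    suggestions = []
    for source in (publisher_tags, frequent_words(page_text),
                   directory_tags, popular):
        for tag in source:
            tag = tag.lower()
            if tag not in suggestions:
                suggestions.append(tag)
    return suggestions


if __name__ == "__main__":
    print(suggest_tags(
        "Tagging and folksonomy. Tagging again.",
        publisher_tags=["Web2.0", "tagging"],
        directory_tags=["Computers/Internet"],
        other_users_tags=["bookmarks", "tagging", "bookmarks", "social"],
    ))
```

The list is only a starting point: the user would still pick from it, delete entries, or type new tags of his or her own.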
This modification will achieve two important goals.
First, the user is still free to add, remove and change tags, so personalization is not hurt. Second, and more importantly, it will lead to the emergence of a stable set of tags that will come close to language itself in universality. This will also lead to greater interoperability of web sites and web services.
- Alex



