Wednesday, February 06, 2008

Tagging leading to semantic web

I'm using gnizr at work for researching and collecting documents of interest to a client of ours. Gnizr provides features I find useful for organizing a pile of documents. Tagging documents with terms of interest is much more flexible than strictly partitioning documents into specific buckets, and the clustermap feature makes it possible to explore the soft-association of documents and terms of interest. I also find the geonames machine tag very convenient for recording when documents are associated with places.

What gnizr is not letting me do (yet) is record more explicit information: who authored a document, when it was published, what institutions a person or a document is affiliated with, and so on. I think this is, in large part, the promise of employing semantic tags. When we tag a bookmark (in my case, a document), we are asserting (at least to ourselves) that the tag word is related to the document. That's about it. When we tag a document with a geoname, we assert something a bit more specific: that the location is related to the document. With this information alone, we can create a semantic graph whose nodes are the tag words, documents and locations, and whose edges are the relations we implicitly create when tagging. We can take this one step further and treat two tags as related if they are both used to tag the same document. The tags are at least related by that common document. The aggregate graph of tag relations can be used as the basis of a tag recommendation system (one of our suggested projects I'm interested in).
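To make the co-occurrence idea concrete, here is a minimal Python sketch. It is not how gnizr works internally; the `bookmarks` data, the function names, and the scoring rule (sum of co-occurrence counts) are all assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: each document maps to the set of tags applied to it.
bookmarks = {
    "doc1": {"semantic-web", "tagging", "rdf"},
    "doc2": {"tagging", "bookmarks"},
    "doc3": {"semantic-web", "rdf", "foaf"},
}

def build_cooccurrence(bookmarks):
    """Build an undirected, weighted tag graph.

    Two tags share an edge if they appear on the same document; the edge
    weight is the number of documents they share.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tags in bookmarks.values():
        for a, b in combinations(sorted(tags), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph

def recommend(graph, current_tags, limit=3):
    """Suggest tags that co-occur most often with the tags already chosen."""
    scores = defaultdict(int)
    for tag in current_tags:
        for other, weight in graph.get(tag, {}).items():
            if other not in current_tags:
                scores[other] += weight
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:limit]]

graph = build_cooccurrence(bookmarks)
print(recommend(graph, {"rdf"}))
```

With the sample data above, a document already tagged "rdf" gets "semantic-web" suggested first, since those two tags share two documents. A real recommender would probably want to normalize for tag frequency so that very common tags don't dominate.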

I think that by using additional machine tags based on FOAF relations or other standard metadata (e.g. Dublin Core), users could encode more explicit knowledge (for example, "Bob is the author of document x") and thus build richer semantic graphs. I'm suggesting that a semantic social bookmarking / tagging system could provide an easy and effective user interface for generating semantic graphs. I think the idea requires further elaboration and refinement, but I'm optimistic that a tighter integration of gnizr and additional semantic graph interaction tools could provide a nice path toward a really useful semantic web platform.
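As a rough sketch of what that encoding could look like, the Python below turns tags into (subject, predicate, object) triples. It assumes a `namespace:predicate=value` machine tag syntax, and the `tagged` fallback predicate, the `tag_to_triple` helper, and the sample tags are all hypothetical; the prefixes are used loosely and are not checked against the actual Dublin Core or FOAF vocabularies.

```python
import re

# Assumed machine tag shape: namespace:predicate=value, e.g. "dc:creator=Bob".
MACHINE_TAG = re.compile(
    r"^(?P<ns>[A-Za-z]\w*):(?P<pred>[A-Za-z]\w*)=(?P<value>.+)$"
)

def tag_to_triple(document, tag):
    """Turn one tag on a document into a (subject, predicate, object) triple.

    Machine tags carry an explicit predicate; plain tags fall back to a
    generic, hypothetical "tagged" relation.
    """
    match = MACHINE_TAG.match(tag)
    if match:
        predicate = f"{match['ns']}:{match['pred']}"
        return (document, predicate, match["value"])
    return (document, "tagged", tag)

# Hypothetical tags a user might attach to one bookmarked document.
tags = ["semantic-web", "dc:creator=Bob", "dc:date=2008-02-06"]
for triple in (tag_to_triple("document-x", t) for t in tags):
    print(triple)
```

The plain tag only says "this word is related to the document", while the machine tags say how. A set of triples like this is already a small semantic graph, and it could be mapped onto RDF if the prefixes were bound to real vocabulary URIs.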

Typically, by the time I think of an idea it's become passé, and this case is probably no different. So I Googled for "social bookmarking semantic graph" and found a noteworthy blog post that explores extracting semantic relations from Del.icio.us tags using tag co-occurrence and frequency. There you go... at least one of the key ideas has already been floated! I'd better get coding!
