Have you ever encountered a situation where you wanted to use syndication technologies for something other than blogging and news?
I've been working on PolVox for a while, and I'm redoing a massive section of it. It originally grew out of TREC 2006, when our team competed in sentiment search over general Web documents. Looking back, I've noticed that I designed PolVox as a sentiment web crawler that happened to be crawling blog pages, rather than as a blog search engine.
The problem is that this system is huge. Debugging, even with unit tests, is a nightmare. Factor in that it actually runs in the background as multiple separate applications sharing resources through the file system, and things get even worse. What I really want is the ability to watch one item run through the entire system: from being discovered while monitoring feeds, to being put into the database for later processing, to being pulled out of the database and processed, to being stored in the Lucene index (and then used by all the helper applications).
Logging to a file, and trying to navigate that file using more and less, is a pain. Then it hit me: couldn't I create a feed for debugging and let Rome's syndication machinery do the work for me? I could create feeds for what's going into my index, what's coming out of it, query logs...
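To make the idea concrete, here's a minimal sketch of what one such debug feed could look like with Rome. The feed title, link, and entry contents are invented for illustration, and I'm using the package names from recent Rome releases (`com.rometools.rome.*`; older versions lived under `com.sun.syndication.*`):

```java
import com.rometools.rome.feed.synd.*;
import com.rometools.rome.io.SyndFeedOutput;

import java.util.Collections;
import java.util.Date;

public class DebugFeed {
    public static void main(String[] args) throws Exception {
        // One feed per debugging channel, e.g. "items entering the index".
        SyndFeed feed = new SyndFeedImpl();
        feed.setFeedType("atom_1.0");
        feed.setTitle("PolVox debug: index writes");          // hypothetical channel name
        feed.setLink("http://localhost/debug/index-writes");  // hypothetical URL
        feed.setDescription("Items as they are written to the Lucene index");

        // Each event in the pipeline becomes one feed entry.
        SyndEntry entry = new SyndEntryImpl();
        entry.setTitle("Indexed item: example-doc-id");       // invented example id
        entry.setPublishedDate(new Date());
        SyndContent body = new SyndContentImpl();
        body.setType("text/plain");
        body.setValue("fields written, analyzer used, timing, etc.");
        entry.setDescription(body);
        feed.setEntries(Collections.singletonList(entry));

        // Serialize to XML; a feed reader pointed at this output
        // becomes the debugging UI.
        String xml = new SyndFeedOutput().outputString(feed);
        System.out.println(xml);
    }
}
```

The nice part is that the "viewer" comes for free: any aggregator can subscribe to a channel, and following one item through the system is just reading the same item's entries across several feeds.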