Wednesday, March 25, 2009

Content-Based Filtering vs. Information Retrieval

From the outside, content-based filtering (CBF) and information retrieval (IR) may seem to do essentially the same thing: given a set of documents (a corpus or database) and a user's request (or profile), suggest (retrieve) other documents. However, there is one fundamental difference. In IR applications, the documents in the collection do not change, while the query, the user's request, is ad hoc and completely unpredictable. In CBF, the documents change all the time, but the profile, the kind of question asked of the database, stays more or less stable.
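To make the contrast concrete, here is a minimal Python sketch. It assumes plain bag-of-words cosine similarity as the matching function, and the class names `RetrievalIndex` and `ContentFilter`, the `threshold` value and the sample documents are all hypothetical, chosen only for illustration. The point is where the fixed part sits: the IR index holds the documents still and answers any query, while the filter holds the profile still and judges each new document as it arrives.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (a simplifying assumption; real systems
    would use TF-IDF weighting, stemming, stop-word removal, etc.)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class RetrievalIndex:
    """IR: the collection is indexed once and stays fixed; queries are ad hoc."""

    def __init__(self, documents):
        self.docs = {doc_id: vectorize(text) for doc_id, text in documents.items()}

    def search(self, query, k=3):
        q = vectorize(query)
        scored = [(cosine(q, vec), doc_id) for doc_id, vec in self.docs.items()]
        scored = [(s, d) for s, d in scored if s > 0]
        # Highest score first; ties broken by document id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in scored[:k]]


class ContentFilter:
    """CBF: the profile is stable; documents stream in one at a time."""

    def __init__(self, profile, threshold=0.3):
        self.profile = vectorize(profile)
        self.threshold = threshold  # hypothetical cut-off for "relevant"

    def accept(self, document):
        return cosine(self.profile, vectorize(document)) >= self.threshold


# IR: fixed documents, any query.
index = RetrievalIndex({
    "a": "parse json config file",
    "b": "open socket connection",
    "c": "read config from environment",
})
print(index.search("config file"))  # -> ['a', 'c']

# CBF: fixed profile, a stream of new documents.
f = ContentFilter("socket connection timeout")
stream = ["open socket connection", "parse json config file", "retry on socket timeout"]
accepted = [doc for doc in stream if f.accept(doc)]
print(accepted)  # -> ['open socket connection', 'retry on socket timeout']
```

Both sides use the same similarity function; what differs is which side is cached up front (the document vectors in IR, the profile vector in CBF) and which side arrives at run time.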

Given this, it is interesting to ask: which is the more relevant model for semantic search over software corpora, CBF or IR?

First, a software corpus is ever changing, with people adding new code and deleting or modifying old code. Second, a programmer's requests to a software corpus would inherently come from a limited pool of questions.

Need I say more?
