Saturday, February 27, 2010

Review: IR application into software libraries: Concept graphs

I bumped upon this paper recently Source Code Retrieval using Conceptual Similarity published in 2004, RAIO. The main idea of the paper is to introduce concept graphs to be able to represent software code at two levels of granularity, structured and unstructured. Concept graphs enforce relationships between the variable names (unstructured information) based on the constructs of the programming language. Once the concept graphs are created, graph matching algorithms are used to find similar code segments. The baseline used is the Okapi IRS on the unstructured information in the code. They also present a combined approach where the scores obtained from Okapi and the Concept graphs are used to rank documents and queries (both are software segments). A corpus of size 2392 was used with a subset of 25 documents separated out as queries. In order to prove that the structure of the code plays an important role in matching (of code segments) these 25 documents were modified by performing transformation on the variable names etc. The results were reported using two measures: MMR and P@n n being the order of 5 (for this corpus)

No comments:

Post a Comment