Wednesday, March 17, 2010

Vocabulary handling for software reuse

Vocabulary mismatch is even more of a pronounced problem when dealing with corpus made of source code. There are several tools used in the past and here are some of them that I will need to research and learn about

1) Soundex: Phonetic variations of the words are captured
2) Lexical affinity: A sliding window (grep-like) tool that calculates how close are two identifier names
3) Separate words using "_": counter_activity -> counter activity
4) Separate words using case sensitivity: LoadBuffer -> Load buffer

One could additionally use WORDNET and spelling correction or missing character handling to improve upon the existing techniques

No comments:

Post a Comment