My Research Diaries: Vocabulary handling for software reuse

Wednesday, March 17, 2010

Vocabulary handling for software reuse

Vocabulary mismatch is even more of a pronounced problem when dealing with corpus made of source code. There are several tools used in the past and here are some of them that I will need to research and learn about

1) Soundex: Phonetic variations of the words are captured
2) Lexical affinity: A sliding window (grep-like) tool that calculates how close are two identifier names
3) Separate words using "_": counter_activity -> counter activity
4) Separate words using case sensitivity: LoadBuffer -> Load buffer

One could additionally use WORDNET and spelling correction or missing character handling to improve upon the existing techniques

My Research Diaries

Wednesday, March 17, 2010

Vocabulary handling for software reuse

No comments:

Post a Comment