Thursday, March 18, 2010
Perl: Detecting two words concatenated into one, separated by an uppercase character
If you are given a word such as 'textArea' and want to split it into 'text Area', here is Perl code that does it:
use strict;
use warnings;

# FILE2 is opened elsewhere in the original script; duplicating STDOUT here
# (an assumption) lets the snippet run on its own.
open(FILE2, '>&', \*STDOUT) or die "Cannot dup STDOUT: $!";

my $item = 'textAreaBuffer';
# Only process identifiers with a lowercase letter directly followed by an uppercase one.
if ($item =~ m/[a-z][A-Z]/) {
    # Record the offset just past each lowercase-to-uppercase boundary.
    my @rem;
    while ($item =~ m/[a-z][A-Z]/g) {
        push @rem, pos($item);
    }
    # pos() - 1 is the index of the uppercase letter that starts the next word.
    my $start = 0;
    for my $end (@rem) {
        print FILE2 substr($item, $start, $end - 1 - $start), " ";
        $start = $end - 1;
    }
    # Print the final word.
    print FILE2 substr($item, $start), " ";
}
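The same split can also be sketched as a single substitution with zero-width lookaround assertions. This is only an alternative illustration, not the approach used in the post; the helper name `split_camel_case` is hypothetical, and unlike the loop-based version it returns a string with no trailing space.

```perl
use strict;
use warnings;

# Hypothetical helper: insert a space at every lowercase-to-uppercase boundary.
sub split_camel_case {
    my ($identifier) = @_;
    # (?<=[a-z]) and (?=[A-Z]) are zero-width, so no characters are consumed.
    (my $spaced = $identifier) =~ s/(?<=[a-z])(?=[A-Z])/ /g;
    return $spaced;
}

print split_camel_case('textAreaBuffer'), "\n";   # prints "text Area Buffer"
```

Because the pattern only looks for a lowercase letter followed by an uppercase one, a run of capitals such as 'XMLParser' is left unsplit.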
Wednesday, March 17, 2010
Vocabulary handling for software reuse
Vocabulary mismatch is an even more pronounced problem when dealing with a corpus made of source code. Several tools have been used in the past to address it; here are some that I need to research and learn about:
1) Soundex: captures phonetic variations of words
2) Lexical affinity: a sliding-window (grep-like) tool that calculates how close two identifier names are
3) Separate words using "_": counter_activity -> counter activity
4) Separate words using case sensitivity: LoadBuffer -> Load Buffer
One could additionally use WordNet, spelling correction, or missing-character handling to improve upon these techniques.
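Items 3 and 4 can be combined into one small tokenizer. The sketch below is a minimal illustration under stated assumptions: the helper name `identifier_words` is hypothetical, and lowercasing the tokens is an added normalization step (not part of the list above) so that differently styled identifiers map to the same vocabulary.

```perl
use strict;
use warnings;

# Hypothetical helper: break an identifier into lowercase word tokens.
sub identifier_words {
    my ($identifier) = @_;
    # Split on runs of underscores, or at a zero-width lowercase-to-uppercase boundary.
    my @parts = split /_+|(?<=[a-z])(?=[A-Z])/, $identifier;
    # Drop empty fields (e.g. from a leading underscore) and normalize case (an assumption).
    return map { lc $_ } grep { length $_ } @parts;
}

for my $name (qw(counter_activity counterActivity LoadBuffer)) {
    print "$name -> ", join(' ', identifier_words($name)), "\n";
}
```

With this normalization, counter_activity and counterActivity yield the same two tokens, which is the kind of match a vocabulary-aware search over source code needs.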