Monday, January 23, 2012

Quick Reference: Student's paired t-test for comparing retrieval algorithms

Given a set of queries Q of size n, two retrieval algorithms need to be compared.
Calculate AP (average precision) for each query using both algorithms; call the first set of scores AP1 and the second AP2.

To check whether the difference between the two algorithms is statistically significant, here are the steps:

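a) calculate D = AP1-AP2 for each query, giving n differences
b) t-score = mean(D)/(standard_deviation(D)/sqrt(n)), where standard_deviation is the sample standard deviation (divide by n-1, not n)
c) Feed the t-score into a p-value calculator with df (degrees of freedom) = n-1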
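
If you would rather script it than use an online calculator, here is a minimal Python sketch of steps a) to c), using only the standard library. The function names (paired_t_score, two_sided_p_value) and the AP values are made up for illustration, and the p-value is assumed to be two-sided; it is obtained by numerically integrating the t-distribution density with Simpson's rule, which is a rough stand-in for a proper statistics library.

```python
import math
import statistics


def paired_t_score(ap1, ap2):
    """Return (t-score, degrees of freedom) for per-query scores ap1 and ap2."""
    if len(ap1) != len(ap2):
        raise ValueError("both algorithms must be scored on the same queries")
    d = [a - b for a, b in zip(ap1, ap2)]  # step a) D = AP1 - AP2
    n = len(d)
    if n < 2:
        raise ValueError("need at least two queries")
    sd = statistics.stdev(d)  # sample standard deviation (divides by n-1)
    if sd == 0:
        raise ValueError("all differences are identical; t-score is undefined")
    t = statistics.mean(d) / (sd / math.sqrt(n))  # step b)
    return t, n - 1


def two_sided_p_value(t, df, steps=10000):
    """Step c) approximate two-sided p-value for a t-score with df degrees of freedom."""
    # Normalising constant of the Student t density, via lgamma to avoid overflow.
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    # Simpson's rule on [0, |t|]; steps must be even.
    upper = abs(t)
    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    # The density is symmetric, so the two tails together hold 1 - 2 * area.
    return max(0.0, 1 - 2 * area)


if __name__ == "__main__":
    # Hypothetical AP values for four queries, purely for illustration.
    ap1 = [0.5, 0.6, 0.7, 0.8]
    ap2 = [0.4, 0.4, 0.6, 0.5]
    t, df = paired_t_score(ap1, ap2)
    print("t =", t, "df =", df, "p =", two_sided_p_value(t, df))
```

A small p-value (0.05 is a common cutoff) suggests the difference in AP between the two algorithms is unlikely to be due to chance alone.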

Pretty neat and quick