Tuesday, July 3, 2012

PdfLatex won't render images in the PDF output

If you are writing a conference paper and using LaTeX to create it, this post might be useful to you.

Typically each document has a header of the following sort.
\documentclass[10pt,a4paper,draft]{article}
If you have chosen the "draft" option, as in "\documentclass[10pt,a4paper,draft]{article}", then a simple figure environment like the one below will not work.

\begin{figure}[!t]
\centering
\includegraphics[width=3in]{fetch}
\caption{Pipeline: Fetch Instruction}
\label{fig:fetch}
\end{figure}
The figure won't render in the output PDF if the "draft" option is selected in the \documentclass command.

Make sure you say it's final, or just leave that option out altogether, like this: \documentclass[10pt,a4paper]{article}.
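If you want to keep the class-level "draft" option for faster compilation, passing "final" to the graphicx package should override it for the images only (worth double-checking with your setup):

\documentclass[10pt,a4paper,draft]{article}
\usepackage[final]{graphicx} % "final" here overrides the class-level "draft" for images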

Sunday, June 24, 2012

Measuring Query Difficulty

The trail of this reading began with Sonia Haiduc's paper "On the Effect of the Query in IR-based Concept Location". A lot of the paper is a review of a recent research focus in IR: predicting query difficulty. I was surprised to see that this paper was accepted; there were no results, not even an explanation of how evaluating query difficulty is different for concept location compared to traditional IR. Nevertheless, there was a good amount of literature review that guided me to several other interesting papers. There are "predictors" as well as "estimators" of query quality. Predictors usually examine the content of the query alone and generate a score that represents the query difficulty, while estimators use the ranked list to generate a score. The evaluation technique for all the algorithms under the umbrella of query difficulty prediction goes as follows:
The predictor or the estimator gives out a score for each query that is expected to correlate with the Average Precision (AP) for that query. The query quality scores are correlated with the AP values over all the queries in the test collection, and to be doubly sure, a statistical significance test is run on the correlation.
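To make this concrete, here is a minimal sketch of the correlation step, with made-up per-query scores and AP values; scipy's pearsonr and spearmanr report both the correlation and its p-value:

# Hypothetical per-query difficulty scores and the corresponding AP values.
from scipy.stats import pearsonr, spearmanr

predicted = [0.42, 0.17, 0.65, 0.33, 0.58]   # predictor/estimator score per query
ap        = [0.51, 0.08, 0.70, 0.29, 0.66]   # average precision per query

r, p_linear = pearsonr(predicted, ap)    # linear correlation
rho, p_rank = spearmanr(predicted, ap)   # rank correlation
print(r, p_linear, rho, p_rank)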

In order to get a better idea of the specifics of a predictor or estimator algorithm, I referred to the highly cited paper "Predicting Query Performance" (SIGIR 2002). This paper comes out of Bruce Croft's group and lives up to its reputation. The paper discusses a clarity-based score, which measures how distinct the query is from the collection model. The divergence is measured as clarity = \sum_w p(w|q) log_2( p(w|q) / p(w|coll) ), where p(w|q) is the query model and p(w|coll) is the collection model; the p(w|q) term inside the log reflects the (negative) entropy of the query model, while the p(w|coll) term reflects its cross-entropy with the collection model. The experimentation is extensive, spanning multiple test collections and providing quantitative justification for all the thresholds used in the experiments.
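Here is a minimal sketch of the score itself, assuming the query model p(w|q) and the collection model p(w|coll) have already been estimated (estimating p(w|q) from the top retrieved documents is the interesting part of the paper and is not shown here):

import math

def clarity_score(query_model, collection_model):
    # query_model, collection_model: dicts mapping term -> probability.
    # Returns sum_w p(w|q) * log2( p(w|q) / p(w|coll) ), in bits.
    score = 0.0
    for w, p_wq in query_model.items():
        p_wc = collection_model.get(w)
        if p_wq > 0 and p_wc:
            score += p_wq * math.log2(p_wq / p_wc)
    return score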
  
"Query Difficulty, Robustness, and Selective Application of Query Expansion" published in Advances in Information Retrieval in 2004, was one of the first to come up with a "use" of measuring query difficulty. They employed a number of retrieval models and test collections for their study. Not only did they correlate query difficulty with the P@10, AP and so on, they used the "relevance judgements"  also to understand its correlation with query difficulty.

There has been some work in the area of "traceability link recovery" on dealing with difficult queries. The method employed is simple: for queries that show poor performance, a web-based search is used to mine documents that can then be used for query reformulation. I wonder if this falls under the umbrella of multi-modal retrieval.

I think it is worthwhile to spend some time using these metrics on my own datasets and isolating "difficult" queries ahead of time to aid analysis.

Wednesday, April 18, 2012

Handling duplicates in ranking documents for Retrieval Algorithms

I never paid attention to the ranking step of a retrieval algorithm. All I had to do was sort the scores, which are computed by comparing the query with all the documents in the collection using a similarity (or divergence) function. What I had overlooked was the possibility of scores being equal. How do you rank documents with equal scores? Does it have a big impact on retrieval performance? It turns out the answer to the second question is YES. To give a simple example, consider 6 documents with the following scores:

doc no   1     2     3     4     5     6
score    0.87  0.97  0.97  0.85  0.49  0.87

Simple ranking would give the following output

doc no   1     2     3     4     5     6
rank     3     1     2     5     6     4

If doc 3 is the relevant document, it is still considered to be at rank 2 as opposed to rank 1 (where it actually should be, since it scores as much as doc 2). The precision goes from 1.00 to 0.50 just because of this kind of ranking.

This Wikipedia article summarizes ways to handle this. It seems like a trivial concept, but it has tremendously improved bug localization results on my iBUGS dataset.

For 291 bugs, I computed the retrieval performance using "standard competition" ranking (1 2 2 4), and the retrieval performance shot up from 0.14 to 0.21 with a p-value of effectively 0. Thus the improvement is statistically significant.
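For reference, here is a minimal sketch of standard competition ranking (not the exact code I used): documents with equal scores share a rank, and the next distinct score is ranked by how many documents are ahead of it.

def competition_ranks(scores):
    # Sort document indices by score, highest first (stable, so ties keep order).
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, i in enumerate(order):
        if position > 0 and scores[i] == scores[order[position - 1]]:
            ranks[i] = ranks[order[position - 1]]  # tie: reuse the previous rank
        else:
            ranks[i] = position + 1                # 1-based competition rank
    return ranks

print(competition_ranks([0.87, 0.97, 0.97, 0.85, 0.49, 0.87]))  # [3, 1, 1, 5, 6, 3]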

Monday, February 27, 2012

Quick Reference: cvs2cl

CVS is a primitive and unfriendly tool for version control (at least in my experience).

Thankfully, there is a tool called cvs2cl that makes it easier to work with the log files.

Here are the key steps

1) Install cvs and cvs2cl (both available through sudo apt-get install commands)
2) Check out the version you want into a directory
3) cd into that directory
4) Execute the cvs2cl command in that directory

cvs2cl -T gives the changes in reverse chronological order, with the locations where tags first appeared. A word of caution: the tag location here means where a tag is first encountered as it appears in the log file.

I also tried cvs2cl -T --chrono, but it gives inaccurate or even incorrect information, since all it does is reverse the log file rather than actually walking the history in reverse order.

The --xml option produces the output in XML format.
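If you want to poke at the XML output without assuming a particular schema (element names can differ between cvs2cl versions), a quick Python sketch like this lists which elements appear; "changelog.xml" is just whatever file you redirected the output into:

import xml.etree.ElementTree as ET
from collections import Counter

tree = ET.parse("changelog.xml")
tag_counts = Counter(element.tag for element in tree.iter())
for tag, count in sorted(tag_counts.items()):
    print(tag, count)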

To track changes in the trunk only, with tag information about where each tag first occurred (in reverse chronological order), use this

cvs2cl --xml --no-wrap --summary -T -S -r --follow-only trunk

Each file in CVS is in a particular state, which is encoded in a tag in the XML output. Most of the time the state is "Exp", which stands for experimental. Since we are tracking the trunk, most of the files will be in the Exp state, but in a release branch you might find "stab" for a stable version of the file.

Hope this helps

Monday, January 23, 2012

Quick Reference: Student's paired t-test for comparing retrieval algorithms

Given a set of queries Q of size n, suppose two retrieval algorithms need to be compared.
Calculate the AP (Average Precision) for each query using both algorithms; call one set of values AP1 and the other AP2.

To check whether the difference is statistically significant, here are the steps

a) calculate D = AP1 - AP2 (per query)
b) t-score = mean(D) / (standard_deviation(D) / sqrt(n))
c) feed the t-score into a p-value calculator with df (degrees of freedom) = n-1

Pretty neat and quick
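Here is the same calculation sketched in Python with made-up AP values; scipy's ttest_rel collapses steps b) and c) into a single call:

import math
from scipy.stats import ttest_rel

ap1 = [0.50, 0.32, 0.71, 0.44, 0.10]   # hypothetical per-query AP, algorithm 1
ap2 = [0.45, 0.30, 0.60, 0.41, 0.12]   # hypothetical per-query AP, algorithm 2

# Steps a) and b) by hand:
d = [a - b for a, b in zip(ap1, ap2)]
n = len(d)
mean_d = sum(d) / n
sd_d = math.sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
print("t-score:", mean_d / (sd_d / math.sqrt(n)))

# Or let scipy handle steps b) and c) (df = n - 1):
t_stat, p_value = ttest_rel(ap1, ap2)
print(t_stat, p_value)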


Friday, October 21, 2011

Experience with Matlab's TMG

I have been playing with Matlab's TMG, and here are my two cents from my experience with the software.

Pros: Easy to install; easy to index a corpus and queries.
Cons: Proprietary code, so it is impossible to contribute to, modify, or fix bugs in it. It is not maintained well and has no discussion forum where users and developers can communicate. The last citation was from 2009.

I was unable to run even some basic retrieval tasks due to a bug in the code: norm2 does not work for sparse arrays and needs to be replaced with normest, but I was unable to make the change because the code is protected and proprietary.

Thursday, July 14, 2011

Counter-intuitive results

I have been researching how restricting the vocabulary of a dataset affects retrieval results, and I am seeing some really counter-intuitive results.

This is the format of the experiment:

I create a vocabulary using only a percentage of the documents, not all of them. The percentages I used were 10, 20, 30, 40, 50, 60, 70, 80, and 90; I then performed retrieval and calculated MAP.

There were two key observations that I found  interesting.

1) For software corpora, I found that even with 10% of the documents, 45% of the original vocabulary was already covered (see the sketch after this list for how such coverage can be measured). This suggests that software vocabulary tends to have a fairly uniform distribution of terms/identifiers/variable names across source files.
2) With the Vector Space Model with tf-idf weighting, I found that using 10-30% of the documents to create the vocabulary (or dictionary) gave better performance than the original vocabulary. The only explanation I have for this result is that some words are more important for retrieval than others. I should mention that the original vocabulary is itself a pruned version of the raw vocabulary, obtained by removing sparse terms from the document-term matrix.
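Here is a minimal sketch of how the coverage in observation 1 can be measured; "docs" is assumed to be a list of already-tokenized documents, and this is not the exact code from my experiments:

import random

def vocabulary(docs):
    return {term for doc in docs for term in doc}

def coverage(docs, fraction, seed=0):
    # Fraction of the full vocabulary covered by a random sample of documents.
    rng = random.Random(seed)
    sample = rng.sample(docs, max(1, int(fraction * len(docs))))
    return len(vocabulary(sample)) / len(vocabulary(docs))

# for f in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
#     print(f, coverage(docs, f))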

Shivani