Saturday, May 16, 2009

A sweet discovery, returning or passing pointers

In C/C++ programs it is common to return values by address:

double * return_doubleval()
{
    double *a = new double;  // the caller must delete this later
    *a = 1;
    return a;
}

as opposed to

void return_double(double *a)
{
    if (a==NULL)
        a = new double;  // only changes this function's local copy of the pointer
    *a = 1;
    return;
}

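because the second idea does not work when the caller passes in a NULL pointer. The pointer itself is passed by value, so the new address is stored only in the function's local copy, the caller's pointer stays NULL, and the allocated double leaks.
We need to send in double **a and do (*a) = new double if we want the pointer address also to be updated in the calling function. In C++ a reference to the pointer, double *&a, does the same job.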
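
Here is a minimal sketch of the working version, assuming we keep the name return_double from above. The names return_double_ref and demo_pointer_out_params are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Allocates *a if the caller passed in a NULL pointer, then stores 1 in it.
// Taking double** lets the function update the caller's own pointer.
void return_double(double **a)
{
    if (*a == NULL)
        *a = new double;
    **a = 1;
}

// C++-only alternative (hypothetical name): take the pointer by reference.
void return_double_ref(double *&a)
{
    if (a == NULL)
        a = new double;
    *a = 1;
}

// Hypothetical helper that exercises both versions and returns the sum
// of the two stored values.
double demo_pointer_out_params()
{
    double *p = NULL;
    return_double(&p);       // p now points at the newly allocated double
    double *q = NULL;
    return_double_ref(q);    // q is updated through the reference
    assert(p != NULL && q != NULL);
    const double sum = *p + *q;
    delete p;                // the caller still owns the memory
    delete q;
    return sum;
}
```

In both versions the caller still has to delete what the function allocated.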

BTW, allocating memory in a function and leaving it to the caller to delete it is a bad, bad idea, because it is easy to forget the delete and leak the memory.
Nevertheless, sometimes we don't know the length of the array we want to allocate and only the called function knows it, and then this becomes mandatory.
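
In C++ one way to sidestep the manual delete in this situation is to return a std::vector, which carries its own length and frees its memory when it goes out of scope. A small sketch, where squares_below is a made-up example of a function whose result length only the callee knows:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: returns 0*0, 1*1, 2*2, ... for every square that is
// strictly below limit. Only this function knows how many values that is.
std::vector<double> squares_below(double limit)
{
    std::vector<double> values;
    for (std::size_t i = 0; static_cast<double>(i * i) < limit; ++i)
        values.push_back(static_cast<double>(i * i));
    // Postcondition: every stored square is below the limit.
    assert(values.empty() || values.back() < limit);
    return values;  // the vector owns its memory; nothing for the caller to delete
}
```

The caller reads the length from size() and never calls delete. This does not apply to plain C, where the caller-frees convention is still needed.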

Wednesday, May 13, 2009

Model update based RF for text retrieval

Document reformulation has been gaining much attention among researchers, especially because query reformulation does not help the underlying model learn and retrain from the user feedback.

One interesting paper that has caught my attention is the one by Rajapakse and Denham (Text Retrieval with more realistic concept matching and reinforcement learning). I found the work very refreshing, with striking parallels to the mainstream topic modeling based information retrieval approaches, and I see the following key contributions made by the paper:

1) If a model has to learn from user feedback and return improved results, it has to be real time. A real time document model has to be distributed and not centralized. Thus feedback regarding a document vis-a-vis a query affects only that particular document's model, leaving the others unaffected. This is a major change from the standard topic modeling based approaches, where the topic distributions are shared across the entire corpus.
2) The model both rewards and penalizes, unlike most of the mainstream query reformulation techniques, which only reward. Hence the model exhibits both the ability to learn new things and the ability to forget.
3) The relevance feedback (RF) phase is also treated as training.
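
To make points 1) and 2) concrete, here is a toy sketch of a per-document update. This is not the update rule from the paper. DocumentModel, apply_feedback, the flat term weights, and the fixed additive step are all my own assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical per-document model: one weight per term, stored with the
// document itself rather than in a corpus-wide structure.
struct DocumentModel {
    std::map<std::string, double> weight;
};

// Rewards (relevant) or penalizes (not relevant) the weights of the query
// terms in this one document. No other document is touched.
void apply_feedback(DocumentModel &doc,
                    const std::vector<std::string> &query_terms,
                    bool relevant, double learning_rate)
{
    assert(learning_rate >= 0.0);
    const double step = relevant ? learning_rate : -learning_rate;
    for (std::size_t i = 0; i < query_terms.size(); ++i) {
        double &w = doc.weight[query_terms[i]];  // starts at 0 if unseen
        w += step;
        if (w < 0.0)
            w = 0.0;  // assumed floor: a fully "forgotten" term sits at 0
    }
}
```

Because each document carries its own weights, feedback on one document can be applied immediately without retraining anything shared.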

As with most research, several issues are left unexplained. They are listed as follows:
1) No explanation is given of how exactly the concepts are extracted for each document and how the concept lattice is built
2) There are several parameters to tune, viz. the weights on object-attribute pairs and the learning rate of the weight parameter
3) The testing strategy is an independent one

Testing strategy
1) Divide the query set into two sets, train and test
2) The training set is again divided into 4 subsets
3) Train the document models with 20 iterations of RF for each query in each subset.
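
The schedule above can be sketched roughly as follows. The post does not give the train/test ratio or how queries are assigned to subsets, so n_train and the round-robin assignment are assumptions, and QuerySplit, split_queries and count_feedback_rounds are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the split described in steps 1) and 2).
struct QuerySplit {
    std::vector<std::vector<std::string> > train_subsets;
    std::vector<std::string> test;
};

// Assumption: the first n_train queries are used for training and dealt
// round-robin into n_subsets subsets; the remaining queries are the test set.
QuerySplit split_queries(const std::vector<std::string> &queries,
                         std::size_t n_train, std::size_t n_subsets)
{
    assert(n_subsets > 0 && n_train <= queries.size());
    QuerySplit split;
    split.train_subsets.resize(n_subsets);
    for (std::size_t i = 0; i < queries.size(); ++i) {
        if (i < n_train)
            split.train_subsets[i % n_subsets].push_back(queries[i]);
        else
            split.test.push_back(queries[i]);
    }
    return split;
}

// Walks the training schedule of step 3) and counts the feedback rounds.
// A real system would run one RF iteration where the counter is bumped.
std::size_t count_feedback_rounds(const QuerySplit &split,
                                  std::size_t iterations_per_query)
{
    std::size_t rounds = 0;
    for (std::size_t s = 0; s < split.train_subsets.size(); ++s)
        for (std::size_t q = 0; q < split.train_subsets[s].size(); ++q)
            for (std::size_t it = 0; it < iterations_per_query; ++it)
                ++rounds;
    return rounds;
}
```

With 4 subsets and 20 iterations per query, the number of feedback rounds is simply 20 times the number of training queries.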

The one big open issue is how the initial concept lattices are built and how crucial their quality is to performance in the later stages of the retrieval process.