Document reformulation has been gaining attention among researchers, largely because query reformulation does not help the underlying model learn from user feedback and improve over time.
One interesting paper that caught my attention is the one by Rajapakse and Denham, "Text Retrieval with more realistic concept matching and reinforcement learning". Not only did I find the work refreshing, with interesting parallels to the standard mainstream topic-modeling-based information retrieval approaches, but I also noted the following key contributions made by the paper:
1) If a model has to learn from user feedback and return improved results, it has to operate in real time. A real-time document model has to be distributed rather than centralized: feedback regarding a document vis-a-vis a query affects only that particular document's model, leaving the others unaffected. This is a major departure from the standard topic-modeling-based approaches, where the topic distributions are central to the entire corpus.
2) The model both rewards and penalizes, unlike most standard mainstream query reformulation techniques, which only reward. Hence the model exhibits both the ability to learn new associations and the ability to forget old ones.
3) The relevance feedback (RF) phase is also treated as training.
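The per-document reward/penalty idea above can be sketched roughly as follows. The class name, the additive update rule, and the term-matching criterion are my own illustrative assumptions, not the paper's exact formulation:

```python
class DocumentModel:
    """Hypothetical per-document model: feedback on this document
    updates only this document's weights, nothing corpus-wide."""

    def __init__(self, doc_id, learning_rate=0.1):
        self.doc_id = doc_id
        self.lr = learning_rate
        self.weights = {}  # (object, attribute) pair -> weight

    def score(self, query_terms):
        # Score by summing weights of object-attribute pairs that
        # match any query term (a simplified matching criterion).
        return sum(w for (obj, attr), w in self.weights.items()
                   if obj in query_terms or attr in query_terms)

    def update_from_feedback(self, query_terms, relevant):
        # Reward matching pairs when the user marks the document
        # relevant, penalize them otherwise -- so the model can both
        # learn and forget. Only THIS document's weights change.
        delta = self.lr if relevant else -self.lr
        for (obj, attr) in self.weights:
            if obj in query_terms or attr in query_terms:
                self.weights[(obj, attr)] += delta
```

A positive judgement nudges the matching weights up, a negative one nudges them back down, which is the learn-and-forget behaviour described above.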
Like most research, several issues are left unexplained; they are listed as follows:
1) No explanation is given of how exactly the concepts are extracted for each document and how the concept lattice is built.
2) There are several parameters to tune, viz. the weights assigned to object-attribute pairs and the learning rate of the weight updates.
3) The testing strategy is an independent one:
   a) Divide the query set into train and test sets.
   b) Divide the training set into 4 subsets.
   c) Train the document models with 20 iterations of RF for each query in each subset.
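The testing protocol above can be sketched as follows; `run_rf_iteration` is a hypothetical hook standing in for one round of relevance feedback, and the 50/50 split fraction is my assumption (the exact split is not stated here):

```python
import random

def evaluate(queries, run_rf_iteration, train_fraction=0.5,
             n_subsets=4, rf_iterations=20, seed=0):
    """Sketch of the train/test protocol: split the query set,
    divide training queries into subsets, and run repeated RF
    iterations per training query."""
    rng = random.Random(seed)
    shuffled = queries[:]
    rng.shuffle(shuffled)

    # a) Divide the query set into train and test sets.
    split = int(len(shuffled) * train_fraction)
    train, test = shuffled[:split], shuffled[split:]

    # b) Divide the training set into n_subsets subsets.
    subsets = [train[i::n_subsets] for i in range(n_subsets)]

    # c) Train the document models with rf_iterations rounds of RF
    #    for each query in each subset.
    for subset in subsets:
        for query in subset:
            for _ in range(rf_iterations):
                run_rf_iteration(query)

    return train, test, subsets
```

The held-out test queries never drive RF updates, which is what makes the evaluation independent of training.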
The biggest open issue remains how the initial concept lattices are built, and how crucial their quality is to the performance of later stages of the retrieval process.
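Since the paper does not spell out the lattice construction, here is a minimal sketch of the standard formal concept analysis (FCA) construction one might apply to a document-to-attribute incidence. The brute-force enumeration and the toy context are purely illustrative; this is not claimed to be the authors' method:

```python
from itertools import combinations

def formal_concepts(context):
    """Enumerate all formal concepts of a binary context.

    context: dict mapping each object (e.g. a document) to its set
    of attributes (e.g. extracted concept terms).
    Returns a set of (extent, intent) pairs of frozensets.
    """
    objects = set(context)
    all_attrs = set().union(*context.values())

    def intent(objs):
        # Attributes shared by every object (all attrs if objs empty).
        if not objs:
            return frozenset(all_attrs)
        return frozenset(set.intersection(*(set(context[o]) for o in objs)))

    def extent(attrs):
        # Objects possessing every attribute in attrs.
        return frozenset(o for o in objects if attrs <= context[o])

    concepts = set()
    # Brute force over attribute subsets; fine only for tiny contexts.
    for r in range(len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), r):
            e = extent(set(combo))
            concepts.add((e, intent(e)))
    return concepts
```

Each (extent, intent) pair is a node of the lattice, ordered by extent inclusion; how the paper derives the attributes per document in the first place remains the open question.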