## Sunday, February 28, 2010

### Combining discriminative and generative training

This post is a continuation of the previous post, Targeting discrimination as opposed to similarity. In model learning, given labeled data there is always a choice to make: discriminative or generative. Each method has its pros and cons, and usually a two-stage approach is used to combine the strengths of the two. I found a recent paper, Multi-Conditional Learning, published at AAAI 2006, on combining the power of the two at training time. The likelihood function is composed of both the generative likelihood and the discriminative likelihood, which is possible only because we have labeled data. The discriminative-generative pair used for illustration is primarily CRF-MRF. The MRF is used to capture topic co-occurrences, while word co-occurrences are captured by the latent topic model. By adding an MCL regularizer (the discriminative component) to the learning algorithm, the model being learnt is actually a combination of MRF and CRF, thereby combining the power of the two. The authors report large performance gains in document retrieval on the 20 Newsgroups dataset.
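The multi-conditional idea can be sketched as a single objective that weights the generative and discriminative log-likelihoods of one shared parameter set. Here is a minimal toy sketch, assuming a multinomial Naive Bayes model rather than the paper's CRF-MRF pair; the parameter names, data, and the weights `alpha`/`beta` are all illustrative, not from the paper.

```python
import numpy as np

# Toy multi-conditional objective for a multinomial Naive Bayes model.
# pi[c]     : class priors, phi[c, w] : per-class word distributions.
# Generative term:      sum_i log p(x_i, y_i)
# Discriminative term:  sum_i log p(y_i | x_i)
# alpha=1, beta=0 is purely generative; alpha=0, beta=1 purely discriminative.
def mcl_objective(pi, phi, X, y, alpha=1.0, beta=1.0):
    # joint log-likelihood log p(x, c) for every document and every class
    joint = np.log(pi)[None, :] + X @ np.log(phi).T      # (n_docs, n_classes)
    rows = np.arange(len(y))
    gen = joint[rows, y].sum()                           # sum_i log p(x_i, y_i)
    # conditional term via Bayes rule: log p(y|x) = log p(x,y) - log sum_c p(x,c)
    log_evidence = np.logaddexp.reduce(joint, axis=1)
    disc = (joint[rows, y] - log_evidence).sum()
    return alpha * gen + beta * disc

# two classes, three words, two documents (word-count vectors)
pi = np.array([0.5, 0.5])
phi = np.array([[0.7, 0.2, 0.1],
                [0.1, 0.2, 0.7]])
X = np.array([[3, 1, 0],
              [0, 1, 3]])
y = np.array([0, 1])
print(mcl_objective(pi, phi, X, y))
```

Because both terms share one parameter set, maximizing the weighted sum trades off fitting the data's joint distribution against predicting the labels, which is the combination the paper exploits.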

## Saturday, February 27, 2010

### Review: IR application into software libraries: Concept graphs

I recently came across the paper Source Code Retrieval using Conceptual Similarity, published at RIAO 2004. The main idea of the paper is to introduce concept graphs that can represent software code at two levels of granularity, structured and unstructured. Concept graphs enforce relationships between variable names (the unstructured information) based on the constructs of the programming language. Once the concept graphs are created, graph-matching algorithms are used to find similar code segments. The baseline is the Okapi IR system run on the unstructured information in the code. The authors also present a combined approach in which the scores obtained from Okapi and from the concept graphs are used together to rank documents against queries (both are software segments). A corpus of 2,392 segments was used, with a subset of 25 documents separated out as queries. To show that the structure of the code plays an important role in matching code segments, these 25 documents were modified by transforming the variable names and similar surface features. Results were reported using two measures: MRR and P@n, with n on the order of 5 for this corpus.
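The combined ranking step can be illustrated with a simple linear interpolation of normalized score lists. This is only a sketch of the general technique, not the paper's exact combination scheme; the scores and the weight `lam` are made up for illustration.

```python
def minmax(scores):
    """Normalize a score list to [0, 1] so the two systems are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def combined_scores(okapi_scores, graph_scores, lam=0.5):
    """Linear interpolation of normalized Okapi and concept-graph scores."""
    o, g = minmax(okapi_scores), minmax(graph_scores)
    return [lam * a + (1 - lam) * b for a, b in zip(o, g)]

okapi = [12.3, 4.1, 9.8]   # hypothetical Okapi scores for three code segments
graph = [0.2, 0.9, 0.4]    # hypothetical concept-graph similarities
scores = combined_scores(okapi, graph, lam=0.7)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)
```

Normalizing before interpolating matters because Okapi scores and graph-similarity scores live on very different scales.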

## Thursday, February 18, 2010

## Tuesday, February 9, 2010

### Targeting discrimination as opposed to similarity

Clustering mechanisms for data that is expected to follow continuous distributions use an objective function to measure how good a clustering of the documents is. Such an objective function is usually a ratio of two quantities: inter-cluster distance to intra-cluster distance. This ratio measures the ability of the clustering algorithm to bring similar data samples together and to separate dissimilar ones. Fortunately, for Gaussian models this works out into a nice clean equation. No wonder people LOVE the Gaussian model.
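Such a ratio objective can be sketched concretely. The following is a minimal illustration, with made-up data and names, that scores a labeled clustering by the ratio of mean inter-centroid distance to mean intra-cluster distance; tighter, better-separated clusters score higher.

```python
import numpy as np

def separation_ratio(X, labels):
    """Ratio of mean inter-centroid distance to mean intra-cluster distance."""
    centroids = {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}
    # intra: average distance from each point to its own cluster centroid
    intra = np.mean([np.linalg.norm(x - centroids[c]) for x, c in zip(X, labels)])
    # inter: average pairwise distance between cluster centroids
    cs = list(centroids.values())
    inter = np.mean([np.linalg.norm(a - b)
                     for i, a in enumerate(cs) for b in cs[i + 1:]])
    return inter / intra

rng = np.random.default_rng(0)
labels = np.array([0] * 20 + [1] * 20)
# two clusters with the same centers, once tight and once spread out
tight = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
loose = np.vstack([rng.normal(0, 2.0, (20, 2)), rng.normal(5, 2.0, (20, 2))])
print(separation_ratio(tight, labels), separation_ratio(loose, labels))
```

The tight clustering yields a much larger ratio than the loose one, which is exactly the behavior the objective is meant to reward.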

When we are dealing with discrete data, the most likely distribution is the multinomial. To gain more control over the behavior of the model, one can use Dirichlet priors. Document clustering approaches in the discrete domain merely use EM or Gibbs sampling to fit the data optimally to the discrete model (especially a latent mixture model). There is no underlying objective function that says the discrete (latent) model should also optimally separate one topic from another. Thus one could end up with two topics generating the exact same set of words.
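One simple way to detect the failure mode above, where two learnt topics collapse onto the same words, is to compare their word distributions directly, for instance with the Jensen-Shannon divergence: duplicate topics give a divergence of zero. A minimal sketch (the topic vectors are made up for illustration):

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two word distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0            # 0 * log 0 is taken as 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

topic_a = [0.5, 0.3, 0.2, 0.0]
topic_b = [0.5, 0.3, 0.2, 0.0]   # exact duplicate of topic_a
topic_c = [0.0, 0.1, 0.2, 0.7]
print(js_divergence(topic_a, topic_b), js_divergence(topic_a, topic_c))
```

A divergence-based penalty like this is a plausible candidate for the missing separation term, though it is a diagnostic here, not part of the standard EM objective.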

This is definitely something that is lacking in the EM algorithms for fitting discrete data....

Maybe there has been work done on this already, but I am yet to discover it.
