Thursday, July 26, 2012

The hashing trick for a dynamically changing vocabulary

I recently learned about the "hashing" trick in Machine Learning. It is typically used to handle dynamically changing vocabulary in large-scale machine learning algorithms. With the hashing trick, we always have a "fixed" vocabulary. Only thing is what this vocabulary is, we dont know. The words are hashed into a M- length hash table, and no matter how many new words come in the hash contains all of these tables. Its amazing that Yahoo research is the company that came up with the idea and implemented a widely used open source software called the vowpal wabbit. Researchers who propose new online algorithms with a dynamically changing feature set implement their algorithm in vowpal Wabbit. I think there is also an effort to implement vowpal wabbit on top of hadoop.

It is indeed sad that a company like Yahoo which has been an innovator of so many cool ideas is under the weather. Hopefully Marissa Mayer will turn things around.

For more information on Vowpal Wabbit visit https://github.com/JohnLangford/vowpal_wabbit/wiki/Examples

No comments:

Post a Comment