Tuesday, June 28, 2011

Back to Square One

I get lost every time I come here. It's a maze and I cannot find a way out. Even when I do make it out, I am unclear as to what happened back there.

I am now making a step-by-step plan for tackling this gigantic problem I am facing.

What am I trying to do?
My main goal is to do distributed topic modeling using R on Amazon EMR.

The steps I need to take now to solve this problem:

1) Install Hadoop, run a single-node Hadoop cluster, and run basic mapper and reducer scripts on it
2) Run R on Hadoop using Hive and try to do the same via R
3) Run distributed tm on R
4) Run Mahout on single-node Hadoop
5) Using Hive, try to convert data types between R and Mahout
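
For step 1, the basic mapper and reducer scripts could look something like the word count sketch below. This is only a rough idea, written in Python rather than R, and it assumes the Hadoop Streaming style where a mapper and a reducer exchange tab-separated key/value lines. The names map_lines, reduce_lines and run_locally are made up for illustration, and run_locally only imitates the sort between the two phases so the logic can be checked without a cluster.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per word. Input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Imitate map -> sort -> reduce on one machine (hypothetical helper)."""
    return list(reduce_lines(sorted(map_lines(lines))))


for record in run_locally(["hadoop is a maze", "a maze is a maze"]):
    print(record)
```

On a real cluster each function would presumably live in its own script that reads sys.stdin and prints to standard output, with Hadoop doing the sorting in between.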

Then do all of the above on an Amazon EMR cluster using its Ruby client.

Loads of painful nights ahead, but hopefully rewarding ones too.