I get lost every time I come here. It's a maze and I cannot find a way out. Even when I do find a way out, I am unclear as to what happened back there.
I am now making a step-by-step plan for how I can tackle this gigantic problem I am facing.
What am I trying to do?
My main goal is to do distributed topic modeling using R on Amazon EMR.
The steps I need to take now to solve this problem:
1) Install Hadoop, run a single-node Hadoop cluster, and run basic mapper and reducer scripts on it
2) Run R on Hadoop using Hive and try to run the same jobs from R
3) Run distributed tm in R
4) Run Mahout on single-node Hadoop
5) Using Hive, try to convert data types between R and Mahout
Then do all of the above on the Amazon EMR cluster using its Ruby client.
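For step 1, here is a rough sketch of what a basic mapper and reducer could look like. It assumes the Hadoop Streaming style, where the mapper emits tab-separated key/value lines and the reducer receives them sorted by key. The plan above does not fix a language or a job, so Python, the word-count task, and the names `mapper`, `reducer`, and `run_local` are all my assumptions for illustration. The `run_local` helper only imitates the shuffle-and-sort phase with `sorted()` so the logic can be checked without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" record per word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts of consecutive records that share the same key.

    Assumes the records arrive sorted by key, as they would after the
    shuffle phase of a streaming job.
    """
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(int(count) for _, count in group)


def run_local(lines):
    """Hypothetical local stand-in for map -> sort -> reduce."""
    return dict(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    sample = ["topic models find topics", "topics are word distributions"]
    for word, count in sorted(run_local(sample).items()):
        print(f"{word}\t{count}")
```

On a real single-node cluster the two functions would live in separate scripts that read from standard input and write to standard output. Getting the logic right locally first should make the Hadoop part less of a maze.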
Loads of painful nights ahead, but hopefully rewarding ones too.