Ruby Gems with AWS Elastic MapReduce Streaming
by srikanth on 17/11/2011

I wanted to write a Hadoop streaming program in Ruby on Amazon’s EMR framework, and of course I needed some libraries. For some reason, the aws-sdk library isn’t installed by default on the mappers and reducers on EMR, so you’ll need a bootstrap script to get this to work:

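#!/bin/bash -x
# Refresh package lists, then install the compiler and headers needed to build native gem extensions
sudo apt-get update
sudo apt-get -y -V install gcc ruby-dev libxml2 libxml2-dev make libxslt1-dev ri
# Assumes the rubygems-1.8.11 source has already been downloaded and unpacked into the current directory
cd rubygems-1.8.11
sudo ruby setup.rb
# Install the gem without generating RDoc
sudo gem1.8 install aws-sdk -y --no-rdoc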

Yup, there’s no gcc by default on those mappers and reducers, so gem can’t compile anything while installing. Figuring that out from the errors gem threw was crazy. Hopefully this helps someone out there.
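
If you want to confirm that the bootstrap step actually worked before a job fails halfway through, a small check at the top of the mapper or reducer can help. This is only a rough sketch: library_available? is a made-up helper name, and it assumes the gem is loaded with require 'aws-sdk'. The status goes to stderr so it should not get mixed into the records the streaming job writes to stdout.

```ruby
# On Ruby 1.8 (the version used in the post) gems are only visible after
# loading rubygems; on newer Rubies this line is a harmless no-op.
require 'rubygems'

# Hypothetical helper, not from the original post: returns true if the
# named library can be required, false if Ruby raises LoadError.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

# Assumption: 'aws-sdk' is the require name of the gem installed by the
# bootstrap script. The result is reported on stderr, not stdout.
status = library_available?('aws-sdk') ? 'loaded OK' : 'NOT available'
warn "aws-sdk: #{status}"
```

If the message says the library is not available, the output of the bootstrap script is the first place to look, since the -x flag on the shebang line makes bash print each command as it runs.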

  • sheki

    EMR = Elastic Map Reduce.
    You want to write Map Reduce code in ruby. 
    Looks like neat stuff is happening.

  • Abhishek Kona

    Also, modern Linux distros do not come with a C compiler. Minimalism is the thing.

  • satelin2002

    That helped me :)