Getting the Most Out of Your Hardware for your MapReduce Job
When trying to squeeze out the most of your hardware you will want to either hit one of three bottlenecks: CPU, disk, or memory. I’m currently configuring a cluster with Fusion-IO drives that can give up to 800MB/s for reading and writing. I want to run my jobs and either hit 200k reads and writes per second, or max out my CPUs. I get this information from running "iostat -x 5", which gives statistics every five seconds.
I've been under-utilizing the Fusion drives, getting around 30k writes per second. Yet my CPU utilization is less than 50% (each node has 16 cores). I went ahead and increased the number of mappers per machine incrementally up to 24 mappers. The problem I see is a slew of errors and warnings coming out of Hadoop.
“Unable to create a new native thread”
“Could not obtain block”
“Task process exit with non-zero status”
This is because I've run out of memory (each machine has 24 gigs of ram) and my system does not have swap. TaskTrackers slowly start getting black listed and the job completely fails. What sucks is that the best configuration is very job dependent. Some jobs are CPU bound while others are IO bound. In my case all jobs are memory bound.
With 20 or so mappers per node iostat now shows that I'm doing over 60k read and write requests on the drives and a CPU utilization of 75 percent.
Child JVM Heap Memory
I was also running into some issues with child JVM's not having enough heap space when running the wordcount benchmark. This was solved by setting
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx512M</value>
</property>
in the mapred-site.xml configuration file. 256MB for the sort example looks to be enough though.