There are a number of configuration variables for tuning the performance of your MapReduce jobs. This section describes some of the important task-related settings.
TOPICS
Task JVM Memory Settings (AMI 3.0.0)
Avoiding Cluster Slowdowns (AMI 3.0.0)

Task JVM Memory Settings (AMI 3.0.0)
Hadoop 2.2.0 uses two parameters to configure memory for map and reduce tasks: mapreduce.map.java.opts and mapreduce.reduce.java.opts, respectively. These replace the single configuration option from previous Hadoop versions: mapred.child.java.opts.
The defaults for these settings per instance type are shown in the following tables; a sketch of how to override them follows the tables.
m1.medium
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx768m |
mapreduce.reduce.java.opts | -Xmx768m |
mapreduce.map.memory.mb | 1024 |
mapreduce.reduce.memory.mb | 1024 |
yarn.scheduler.minimum-allocation-mb | 512 |
yarn.scheduler.maximum-allocation-mb | 2048 |
yarn.nodemanager.resource.memory-mb | 2048 |
m1.large
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx864m |
mapreduce.reduce.java.opts | -Xmx1536m |
mapreduce.map.memory.mb | 1024 |
mapreduce.reduce.memory.mb | 2048 |
yarn.scheduler.minimum-allocation-mb | 512 |
yarn.scheduler.maximum-allocation-mb | 3072 |
yarn.nodemanager.resource.memory-mb | 5120 |
m1.xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx512m |
mapreduce.reduce.java.opts | -Xmx1536m |
mapreduce.map.memory.mb | 768 |
mapreduce.reduce.memory.mb | 2048 |
yarn.scheduler.minimum-allocation-mb | 256 |
yarn.scheduler.maximum-allocation-mb | 8192 |
yarn.nodemanager.resource.memory-mb | 12288 |
m2.xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx1280m |
mapreduce.reduce.java.opts | -Xmx2304m |
mapreduce.map.memory.mb | 1536 |
mapreduce.reduce.memory.mb | 2560 |
yarn.scheduler.minimum-allocation-mb | 512 |
yarn.scheduler.maximum-allocation-mb | 7168 |
yarn.nodemanager.resource.memory-mb | 14336 |
m2.2xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx1280m |
mapreduce.reduce.java.opts | -Xmx2304m |
mapreduce.map.memory.mb | 1536 |
mapreduce.reduce.memory.mb | 2560 |
yarn.scheduler.minimum-allocation-mb | 512 |
yarn.scheduler.maximum-allocation-mb | 8192 |
yarn.nodemanager.resource.memory-mb | 30720 |
m3.xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx1515m |
mapreduce.reduce.java.opts | -Xmx1792m |
mapreduce.map.memory.mb | 1904 |
mapreduce.reduce.memory.mb | 2150 |
yarn.scheduler.minimum-allocation-mb | 532 |
yarn.scheduler.maximum-allocation-mb | 3788 |
yarn.nodemanager.resource.memory-mb | 5273 |
m3.2xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx2129m |
mapreduce.reduce.java.opts | -Xmx2560m |
mapreduce.map.memory.mb | 2826 |
mapreduce.reduce.memory.mb | 3072 |
yarn.scheduler.minimum-allocation-mb | 532 |
yarn.scheduler.maximum-allocation-mb | 5324 |
yarn.nodemanager.resource.memory-mb | 9113 |
m2.4xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx1280m |
mapreduce.reduce.java.opts | -Xmx2304m |
mapreduce.map.memory.mb | 1536 |
mapreduce.reduce.memory.mb | 2560 |
yarn.scheduler.minimum-allocation-mb | 512 |
yarn.scheduler.maximum-allocation-mb | 8192 |
yarn.nodemanager.resource.memory-mb | 61440 |
c1.medium
Configuration Option | Default Value |
---|---|
io.sort.mb | 100 |
mapreduce.map.java.opts | -Xmx288m |
mapreduce.reduce.java.opts | -Xmx288m |
mapreduce.map.memory.mb | 512 |
mapreduce.reduce.memory.mb | 512 |
yarn.scheduler.minimum-allocation-mb | 256 |
yarn.scheduler.maximum-allocation-mb | 512 |
yarn.nodemanager.resource.memory-mb | 1024 |
c1.xlarge
Configuration Option | Default Value |
---|---|
io.sort.mb | 150 |
mapreduce.map.java.opts | -Xmx864m |
mapreduce.reduce.java.opts | -Xmx1536m |
mapreduce.map.memory.mb | 1024 |
mapreduce.reduce.memory.mb | 2048 |
yarn.scheduler.minimum-allocation-mb | 512 |
yarn.scheduler.maximum-allocation-mb | 2048 |
yarn.nodemanager.resource.memory-mb | 5120 |
c3.large
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx768m |
mapreduce.reduce.java.opts | -Xmx768m |
mapreduce.map.memory.mb | 921 |
mapreduce.reduce.memory.mb | 921 |
yarn.scheduler.minimum-allocation-mb | 499 |
yarn.scheduler.maximum-allocation-mb | 1920 |
yarn.nodemanager.resource.memory-mb | 1920 |
c3.xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx1177m |
mapreduce.reduce.java.opts | -Xmx1356m |
mapreduce.map.memory.mb | 1413 |
mapreduce.reduce.memory.mb | 1628 |
yarn.scheduler.minimum-allocation-mb | 532 |
yarn.scheduler.maximum-allocation-mb | 2944 |
yarn.nodemanager.resource.memory-mb | 3302 |
c3.2xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx1515m |
mapreduce.reduce.java.opts | -Xmx1792m |
mapreduce.map.memory.mb | 1904 |
mapreduce.reduce.memory.mb | 2150 |
yarn.scheduler.minimum-allocation-mb | 532 |
yarn.scheduler.maximum-allocation-mb | 3788 |
yarn.nodemanager.resource.memory-mb | 5273 |
c3.4xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx2129m |
mapreduce.reduce.java.opts | -Xmx2560m |
mapreduce.map.memory.mb | 2826 |
mapreduce.reduce.memory.mb | 3072 |
yarn.scheduler.minimum-allocation-mb | 532 |
yarn.scheduler.maximum-allocation-mb | 5324 |
yarn.nodemanager.resource.memory-mb | 9113 |
c3.8xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx4669m |
mapreduce.reduce.java.opts | -Xmx4915m |
mapreduce.map.memory.mb | 4669 |
mapreduce.reduce.memory.mb | 4915 |
yarn.scheduler.minimum-allocation-mb | 532 |
yarn.scheduler.maximum-allocation-mb | 8396 |
yarn.nodemanager.resource.memory-mb | 16793 |
cc1.4xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx1280m |
mapreduce.reduce.java.opts | -Xmx2304m |
mapreduce.map.memory.mb | 1536 |
mapreduce.reduce.memory.mb | 2560 |
yarn.scheduler.minimum-allocation-mb | 512 |
yarn.scheduler.maximum-allocation-mb | 5120 |
yarn.nodemanager.resource.memory-mb | 20480 |
cg1.4xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx1280m |
mapreduce.reduce.java.opts | -Xmx2304m |
mapreduce.map.memory.mb | 1536 |
mapreduce.reduce.memory.mb | 2560 |
yarn.scheduler.minimum-allocation-mb | 512 |
yarn.scheduler.maximum-allocation-mb | 5120 |
yarn.nodemanager.resource.memory-mb | 20480 |
cc2.8xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx1280m |
mapreduce.reduce.java.opts | -Xmx2304m |
mapreduce.map.memory.mb | 1536 |
mapreduce.reduce.memory.mb | 2560 |
yarn.scheduler.minimum-allocation-mb | 512 |
yarn.scheduler.maximum-allocation-mb | 8192 |
yarn.nodemanager.resource.memory-mb | 56320 |
cr1.8xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx10895m |
mapreduce.reduce.java.opts | -Xmx13516m |
mapreduce.map.memory.mb | 15974 |
mapreduce.reduce.memory.mb | 16220 |
yarn.scheduler.minimum-allocation-mb | 532 |
yarn.scheduler.maximum-allocation-mb | 27238 |
yarn.nodemanager.resource.memory-mb | 63897 |
g2.2xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx512m |
mapreduce.reduce.java.opts | -Xmx1536m |
mapreduce.map.memory.mb | 768 |
mapreduce.reduce.memory.mb | 2048 |
yarn.scheduler.minimum-allocation-mb | 256 |
yarn.scheduler.maximum-allocation-mb | 8192 |
yarn.nodemanager.resource.memory-mb | 12288 |
hi1.4xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx3379m |
mapreduce.reduce.java.opts | -Xmx4121m |
mapreduce.map.memory.mb | 4700 |
mapreduce.reduce.memory.mb | 4945 |
yarn.scheduler.minimum-allocation-mb | 532 |
yarn.scheduler.maximum-allocation-mb | 8448 |
yarn.nodemanager.resource.memory-mb | 16921 |
hs1.8xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx1280m |
mapreduce.reduce.java.opts | -Xmx2304m |
mapreduce.map.memory.mb | 1536 |
mapreduce.reduce.memory.mb | 2560 |
yarn.scheduler.minimum-allocation-mb | 512 |
yarn.scheduler.maximum-allocation-mb | 8192 |
yarn.nodemanager.resource.memory-mb | 56320 |
i2.xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx2150m |
mapreduce.reduce.java.opts | -Xmx2585m |
mapreduce.map.memory.mb | 2856 |
mapreduce.reduce.memory.mb | 3102 |
yarn.scheduler.minimum-allocation-mb | 532 |
yarn.scheduler.maximum-allocation-mb | 5376 |
yarn.nodemanager.resource.memory-mb | 9241 |
i2.2xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx3399m |
mapreduce.reduce.java.opts | -Xmx4147m |
mapreduce.map.memory.mb | 4730 |
mapreduce.reduce.memory.mb | 4976 |
yarn.scheduler.minimum-allocation-mb | 532 |
yarn.scheduler.maximum-allocation-mb | 8499 |
yarn.nodemanager.resource.memory-mb | 17049 |
i2.4xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx5898m |
mapreduce.reduce.java.opts | -Xmx7270m |
mapreduce.map.memory.mb | 8478 |
mapreduce.reduce.memory.mb | 8724 |
yarn.scheduler.minimum-allocation-mb | 532 |
yarn.scheduler.maximum-allocation-mb | 14745 |
yarn.nodemanager.resource.memory-mb | 32665 |
i2.8xlarge
Configuration Option | Default Value |
---|---|
mapreduce.map.java.opts | -Xmx10895m |
mapreduce.reduce.java.opts | -Xmx13516m |
mapreduce.map.memory.mb | 15974 |
mapreduce.reduce.memory.mb | 16220 |
yarn.scheduler.minimum-allocation-mb | 532 |
yarn.scheduler.maximum-allocation-mb | 27238 |
yarn.nodemanager.resource.memory-mb | 63897 |
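If the defaults do not suit your application, you can override them when you launch the cluster. The following is a minimal sketch using the configure-hadoop bootstrap action shown later in this section; the cluster name, bootstrap name, and heap sizes are illustrative, not recommendations.

    # Sketch: override the default map and reduce task JVM heap sizes
    # (values here are illustrative). Each -m entry passes a key=value
    # pair into Hadoop's mapred-site configuration.
    ./elastic-mapreduce --create --alive --name "Custom task memory" \
    --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
    --bootstrap-name "Configure task JVM memory" \
    --args "-m,mapreduce.map.java.opts=-Xmx1024m,-m,mapreduce.reduce.java.opts=-Xmx2048m"

If you increase a heap size, also consider increasing the corresponding mapreduce.map.memory.mb or mapreduce.reduce.memory.mb value so that the heap still fits inside the YARN container that runs the task.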
You can start a new JVM for every task, which provides better task isolation, or you can share JVMs between tasks, providing lower framework overhead. If you are processing many small files, it makes sense to reuse the JVM many times to amortize the cost of start-up. However, if each task takes a long time or processes a large amount of data, then you might choose not to reuse the JVM to ensure that all memory is freed for subsequent tasks. Use the mapred.job.reuse.jvm.num.tasks option to configure JVM reuse.
To modify JVM reuse settings using a bootstrap action
- In the directory where you installed the Amazon EMR CLI, run the following from the command line.

  Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "JVM infinite reuse" \
      --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
      --bootstrap-name "Configuring infinite JVM reuse" \
      --args "-m,mapred.job.reuse.jvm.num.tasks=-1"

  Windows users:

      ruby elastic-mapreduce --create --alive --name "JVM infinite reuse" --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --bootstrap-name "Configuring infinite JVM reuse" --args "-m,mapred.job.reuse.jvm.num.tasks=-1"
Note
Amazon EMR sets the value of mapred.job.reuse.jvm.num.tasks to 20, but you can override it with a bootstrap action. A value of -1 means infinite reuse within a single job, and 1 means the JVM is not reused between tasks.
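If you prefer to change reuse for a single job rather than for the whole cluster, the following is a hedged sketch, assuming your job's main class uses ToolRunner (so that generic -D options are parsed) and that this MRv1-era setting is honored by your Hadoop version; the JAR and class names are hypothetical.

    # Hypothetical JAR and class names; reuse each JVM for up to 10 tasks
    # in this job only, leaving the cluster-wide default unchanged.
    hadoop jar my-job.jar com.example.MyJob \
      -D mapred.job.reuse.jvm.num.tasks=10 \
      input/ output/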
Avoiding Cluster Slowdowns (AMI 3.0.0)
In a distributed environment, you will experience random delays, slow hardware, failing hardware, and other problems that collectively slow down your cluster. This is known as the stragglers problem. Hadoop has a feature called speculative execution that can help mitigate this issue. As a job nears completion, some machines finish their tasks while others are still running. Hadoop schedules duplicate copies of the remaining tasks on idle nodes; whichever copy of a task finishes first is the successful one, and the other copies are killed. This feature can substantially cut down on the run time of jobs. The general design of a MapReduce algorithm is such that the processing of map tasks is meant to be idempotent. However, if you are running a job where the task execution has side effects (for example, a zero-reducer job that calls an external resource), it is important to disable speculative execution.
You can enable speculative execution for mappers and reducers independently. By default, Amazon EMR enables it for both mappers and reducers. You can override these settings with a bootstrap action, as shown in the procedure that follows the table.
Speculative Execution Parameters
Parameter | Default Setting |
---|---|
mapred.map.tasks.speculative.execution | true |
mapred.reduce.tasks.speculative.execution | true |
To disable reducer speculative execution using a bootstrap action
- In the directory where you installed the Amazon EMR CLI, run the following from the command line.

  Linux, UNIX, and Mac OS X users:

      ./elastic-mapreduce --create --alive --name "Reducer speculative execution" \
      --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
      --bootstrap-name "Disable reducer speculative execution" \
      --args "-m,mapred.reduce.tasks.speculative.execution=false"

  Windows users:

      ruby elastic-mapreduce --create --alive --name "Reducer speculative execution" --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop --bootstrap-name "Disable reducer speculative execution" --args "-m,mapred.reduce.tasks.speculative.execution=false"
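If your job has side effects in both the map and reduce phases, you may want to disable speculative execution for mappers and reducers together. The following is a sketch that passes both settings to the same configure-hadoop bootstrap action; the cluster name and bootstrap name are illustrative.

    # Sketch: disable speculative execution for both mappers and reducers
    # by passing two -m key=value pairs to configure-hadoop.
    ./elastic-mapreduce --create --alive --name "No speculative execution" \
    --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
    --bootstrap-name "Disable speculative execution" \
    --args "-m,mapred.map.tasks.speculative.execution=false,-m,mapred.reduce.tasks.speculative.execution=false"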