- We can adjust Hadoop settings by supplying a JSON configuration file (e.g. config.json), for example to change the HDFS block size:
[
  {
    "Classification": "hdfs-site",
    "Properties": {
      "dfs.blocksize": "67108864"
    }
  }
]
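dfs.blocksize is specified in bytes, so the value above is 64 MiB (half of the common 128 MiB default):

```shell
# 64 MiB expressed in bytes, as required by dfs.blocksize.
echo $((64 * 1024 * 1024))   # prints 67108864
```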
- Choose the applications to install (HBase, Spark, Hadoop, etc.) and wait until the cluster state changes to Running.
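The same setup can be scripted instead of clicking through the console; `--applications` and `--configurations` are real AWS CLI flags, while the cluster name, release label, and instance type/count below are placeholder assumptions. The sketch writes the command to a file and only syntax-checks it, since actually running it requires AWS credentials:

```shell
# Sketch of creating the cluster from the CLI (placeholders, not a
# definitive invocation). config.json is the file from the step above.
cat > /tmp/create_cluster.sh <<'EOF'
#!/bin/bash
aws emr create-cluster \
  --name "example-cluster" \
  --release-label emr-5.0.0 \
  --applications Name=Hadoop Name=HBase Name=Spark \
  --configurations file://config.json \
  --instance-type m4.large \
  --instance-count 3 \
  --use-default-roles
EOF
bash -n /tmp/create_cluster.sh && echo "syntax OK"
```

The CLI also provides `aws emr wait cluster-running --cluster-id <id>`, which blocks until the cluster reaches the running state instead of polling the console.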
- If we need to install many dependencies, we have to enlarge the root volume, because the default root partition is only 10GB. After modifying the volume, we should run
sudo resize2fs /dev/xvda1
to resize the root partition.
Alternatively, we can copy only the required shared libraries to reduce the installation size. The MapR distribution has a setting to resize the EBS volume, while the Amazon AMI does not.
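Before installing anything heavy, a quick check of free space on the root partition shows whether the volume needs resizing at all (the 5GB threshold below is an arbitrary example, not from the original notes):

```shell
# Print available space on / and warn when it drops below an
# example threshold of 5 GB.
avail_kb=$(df -k --output=avail / | tail -n 1)
threshold_kb=$((5 * 1024 * 1024))
if [ "$avail_kb" -lt "$threshold_kb" ]; then
  echo "root partition low on space: ${avail_kb} KB available"
else
  echo "root partition OK: ${avail_kb} KB available"
fi
```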
- Each node has the AWS CLI built in, so we can use EC2Box to run the install scripts.
This approach does not work here, since EC2Box only runs the steps on the master node. However, we can use a bootstrap action to run a script on all nodes via the script-runner JAR:
s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
with argument s3://parallelvid/install.sh
Note that this may corrupt the subsequent cluster installation, since bootstrap actions run before cluster setup.
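A minimal sketch of such an install.sh for the bootstrap action; the package name is hypothetical, and the isMaster check reads EMR's per-node metadata file /mnt/var/lib/info/instance.json (which does not exist outside EMR, hence the 2>/dev/null). The sketch writes the script and syntax-checks it:

```shell
# Write a sketch of install.sh and verify it parses.
cat > /tmp/install.sh <<'EOF'
#!/bin/bash
set -euo pipefail
# EMR writes node metadata here; inspect the isMaster flag to vary
# behavior between the master and core nodes.
if grep -q '"isMaster": true' /mnt/var/lib/info/instance.json 2>/dev/null; then
  echo "running on the master node"
fi
sudo yum install -y ffmpeg   # hypothetical dependency
EOF
bash -n /tmp/install.sh && echo "syntax OK"
```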
- Then, a setup script can run on the master node to set up HDFS, HBase, etc.
For example, to start the HBase Thrift server:
sudo -E /usr/lib/hbase/bin/hbase-daemon.sh start thrift -p 9097 --infoport 9098
The ports may conflict with other software; check the logs before proceeding.
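One way to spot a conflict up front is to probe the ports before starting the daemon; this sketch uses bash's /dev/tcp feature so it needs no extra tools (the port numbers come from the command above):

```shell
# Try to open a TCP connection to each port; success means something
# is already listening there.
for port in 9097 9098; do
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "port $port is already in use"
  else
    echo "port $port is free"
  fi
done
```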
- The EMR distribution is based on Apache Bigtop. The related installed libraries can be found under
/usr/lib/