- Generic Resource Manager
- http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#17
- ResourceManager/NodeManager
- ApplicationMaster (one master per job)
- http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#21
- ApplicationMaster Failure
- initAndStartAppMaster() -> serviceStart() -> processRecovery()
- JobHistoryServer writes completed events into FileSystem (e.g. HDFS, S3)
- Generic JobHistoryServer
- https://issues.apache.org/jira/browse/YARN-321
- MapTask/ReduceTask failures
- MapReduce-style fault recovery
- MRAppMaster handles all Task faults and recovers from them
- http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#25
- MRAppMaster.java
- YARN-related failures
- NodeManager Failure
- NodeStatusUpdater reports NodeManager health via heartbeat
- Faults are detected by ResourceManager (missed heartbeats)
- ResourceTrackerService#nodeHeartbeat
- ResourceManager Failure
- What happens?
- http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#25
- Overview
- http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#26
- Configuration
- https://gist.github.com/oza/7055279
- Operation
- yarn rmadmin -transitionToActive/
- ZKFC
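
The heartbeat-based detection noted under "NodeManager Failure" can be illustrated with a toy model. This is a minimal sketch, not YARN's actual implementation: `LivenessMonitor`, `FakeClock`, the node IDs, and the 10-second expiry window are all hypothetical names and values chosen for illustration. The idea is only that each heartbeat refreshes a timestamp, and a node whose timestamp is older than the expiry window is treated as lost.

```python
import time
from typing import Callable, Dict, List


class LivenessMonitor:
    """Toy heartbeat tracker (hypothetical; not a YARN class)."""

    def __init__(self, expiry_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.expiry_seconds = expiry_seconds
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def node_heartbeat(self, node_id: str) -> None:
        # Record the time of the most recent heartbeat from this node.
        self.last_seen[node_id] = self.clock()

    def expired_nodes(self) -> List[str]:
        # A node counts as lost when no heartbeat arrived within the window.
        now = self.clock()
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.expiry_seconds)


class FakeClock:
    """Manually advanced clock so the example runs without sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()
    monitor = LivenessMonitor(expiry_seconds=10, clock=clock)
    monitor.node_heartbeat("nm-1")
    monitor.node_heartbeat("nm-2")
    clock.advance(6)
    monitor.node_heartbeat("nm-1")   # nm-1 keeps reporting, nm-2 goes silent
    clock.advance(6)
    print(monitor.expired_nodes())   # ['nm-2']
```

Injecting the clock keeps the expiry logic testable without real waiting; a real monitor would also need to decide what to do with containers that were running on the lost node, which this sketch leaves out.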