oza/FT.md

Created December 20, 2013 10:12


Fault Tolerance in MRv2 over Hadoop/YARN

What is YARN? Why YARN?

Generic Resource Manager
http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#17

Architecture Overview

ResourceManager/NodeManager
ApplicationMaster (one master per job)
http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#21

Application-related failures in MapReduce

ApplicationMaster Failure
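Restart path: MRAppMaster.initAndStartAppMaster() -> serviceStart() -> processRecovery()
Job history events (e.g. completed tasks) are written to a FileSystem (e.g. HDFS, S3) and served by the JobHistoryServer; a restarted AM reads the previous attempt's history to recover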
Generic JobHistoryServer
https://issues.apache.org/jira/browse/YARN-321
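The restart path above can be illustrated with a minimal sketch, assuming the new AM attempt already has the IDs of the tasks the previous attempt finished (in Hadoop these come from the previous attempt's job history). `RecoverySketch` and `tasksToSchedule` are hypothetical names, not Hadoop APIs: tasks recorded as finished are skipped, and everything else is scheduled again.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of AM-restart recovery; not Hadoop's actual API. */
class RecoverySketch {
    /** Returns the tasks a new AM attempt still has to run. */
    static List<String> tasksToSchedule(List<String> allTasks,
                                        List<String> finishedInPreviousAttempt) {
        // Task IDs the previous attempt recorded as finished in its history.
        Set<String> finished = new HashSet<>(finishedInPreviousAttempt);
        List<String> pending = new ArrayList<>();
        for (String task : allTasks) {
            if (!finished.contains(task)) {
                pending.add(task); // no completion recorded: run it again
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        List<String> all = List.of("m_0", "m_1", "m_2", "r_0");
        List<String> finished = List.of("m_0", "m_2");
        System.out.println(tasksToSchedule(all, finished)); // [m_1, r_0]
    }
}
```

Without such history, a restarted AM would have to rerun every task of the job from scratch.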
MapTask/ReduceTask failures
MapReduce-style fault recovery
MRAppMaster handles all task failures and recovers from them by re-running failed task attempts
http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#25
MRAppMaster.java
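Task-level recovery in the MRAppMaster comes down to retrying failed attempts up to a limit. The sketch below is a simplified, hypothetical model, not the MRAppMaster's actual state machine; `maxAttempts` stands in for `mapreduce.map.maxattempts` / `mapreduce.reduce.maxattempts`, which default to 4 in Hadoop 2.

```java
import java.util.function.IntPredicate;

/** Hypothetical model of MapReduce-style task retry. */
class TaskRetrySketch {
    enum TaskState { SUCCEEDED, FAILED }

    /** Runs attempts 0..maxAttempts-1 until one of them succeeds. */
    static TaskState runTask(IntPredicate attemptSucceeds, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // A real AM would request a new container for each attempt;
            // here the predicate simply decides whether the attempt succeeds.
            if (attemptSucceeds.test(attempt)) {
                return TaskState.SUCCEEDED;
            }
        }
        return TaskState.FAILED; // retries exhausted: the task itself fails
    }

    public static void main(String[] args) {
        System.out.println(runTask(a -> a >= 2, 4)); // SUCCEEDED (3rd attempt)
        System.out.println(runTask(a -> false, 4));  // FAILED
    }
}
```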

Fault Tolerance in YARN

YARN-related failures
NodeManager Failure
NodeStatusUpdater reports NodeManager health via heartbeat
Faults are detected by the ResourceManager via missed heartbeats
ResourceTrackerService#nodeHeartbeat
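Heartbeat-based detection is timeout-driven: the ResourceManager remembers when each node last reported and declares it lost once that was too long ago. A minimal sketch, assuming a single-threaded caller; `NodeLivenessSketch` is hypothetical, and `expiryMs` plays the role of `yarn.nm.liveness-monitor.expiry-interval-ms` (600000 ms, i.e. 10 minutes, by default).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical RM-side liveness tracker; not YARN's actual classes. */
class NodeLivenessSketch {
    private final long expiryMs;
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    NodeLivenessSketch(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    /** Called whenever a NodeManager heartbeat arrives. */
    void onHeartbeat(String nodeId, long nowMs) {
        lastHeartbeat.put(nodeId, nowMs);
    }

    /** Returns nodes whose last heartbeat is older than the expiry, and stops tracking them. */
    List<String> expireLostNodes(long nowMs) {
        List<String> lost = new ArrayList<>();
        lastHeartbeat.entrySet().removeIf(e -> {
            boolean expired = nowMs - e.getValue() > expiryMs;
            if (expired) {
                lost.add(e.getKey());
            }
            return expired;
        });
        return lost;
    }

    public static void main(String[] args) {
        NodeLivenessSketch rm = new NodeLivenessSketch(600_000L);
        rm.onHeartbeat("nm1", 0L);
        rm.onHeartbeat("nm2", 500_000L); // nm2 is still heartbeating
        System.out.println(rm.expireLostNodes(700_000L)); // [nm1]
    }
}
```

Once a node is declared lost, the ApplicationMasters whose containers ran there can reschedule that work through the task-level recovery described above.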
ResourceManager Failure
What happens?
http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#25
Overview
http://www.slideshare.net/ozax86/ntt-meets-hadoop-and-cloudera-world-tokyo#26
Configuration
https://gist.github.com/oza/7055279
Operation
Manual failover: yarn rmadmin -transitionToActive
Automatic failover: ZKFC
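Automatic failover via ZKFC relies on leader election over ZooKeeper: the ResourceManager holding an ephemeral lock is active, and when its ZooKeeper session expires the lock disappears and a standby can take over. The in-memory sketch below models only that election logic; `ElectionLock` is a hypothetical stand-in for the ephemeral znode, not the real ZKFC or `ActiveStandbyElector` API, and it omits fencing.

```java
/** Hypothetical model of active/standby election; not the ZKFC API. */
class FailoverSketch {
    /** Stand-in for an ephemeral znode: at most one owner at a time. */
    static final class ElectionLock {
        private String owner;

        /** Returns true if rmId holds the lock (i.e. is or becomes active). */
        synchronized boolean tryAcquire(String rmId) {
            if (owner == null) {
                owner = rmId;
                return true;
            }
            return owner.equals(rmId);
        }

        /** Models expiry of the owner's session, which deletes the ephemeral node. */
        synchronized void sessionExpired(String rmId) {
            if (rmId.equals(owner)) {
                owner = null;
            }
        }

        synchronized String owner() {
            return owner;
        }
    }

    public static void main(String[] args) {
        ElectionLock lock = new ElectionLock();
        System.out.println(lock.tryAcquire("rm1")); // true: rm1 becomes active
        System.out.println(lock.tryAcquire("rm2")); // false: rm2 stays standby
        lock.sessionExpired("rm1");                  // rm1 crashes
        System.out.println(lock.tryAcquire("rm2")); // true: rm2 takes over
        System.out.println(lock.owner());           // rm2
    }
}
```

A production setup also needs fencing so that a stale active cannot keep acting after losing the lock; the sketch leaves that out.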
