Note: In this document (and also in the Mesos code base, at least as of Feb 14, 2012), we refer to Mesos Applications as "Frameworks".
See one of the example framework schedulers in MESOS_HOME/src/examples/
to get an idea of what a Mesos framework scheduler and executor in the language of your choice looks like.
You can write a framework scheduler in C, C++, Java/Scala, or Python. Your framework scheduler should inherit from the Scheduler
class (see API below). Your scheduler should create a SchedulerDriver
(which will mediate communication between your scheduler and the Mesos master) and then call SchedulerDriver.run()
/**
* Empty virtual destructor (necessary to instantiate subclasses).
*/
virtual ~Scheduler() {}
/**
* Invoked when the scheduler successfully registers with a Mesos
* master. A unique ID (generated by the master) used for
* distinguishing this framework from others and MasterInfo
* with the ip and port of the current master are provided as arguments.
*/
virtual void registered(SchedulerDriver* driver,
const FrameworkID& frameworkId,
const MasterInfo& masterInfo) = 0;
/**
* Invoked when the scheduler re-registers with a newly elected Mesos master.
* This is only called when the scheduler has previously been registered.
* MasterInfo containing the updated information about the elected master
* is provided as an argument.
*/
virtual void reregistered(SchedulerDriver* driver,
const MasterInfo& masterInfo) = 0;
/**
* Invoked when the scheduler becomes "disconnected" from the master
* (e.g., the master fails and another is taking over).
*/
virtual void disconnected(SchedulerDriver* driver) = 0;
/**
* Invoked when resources have been offered to this framework. A
* single offer will only contain resources from a single slave.
* Resources associated with an offer will not be re-offered to
* _this_ framework until either (a) this framework has rejected
* those resources (see SchedulerDriver::launchTasks) or (b) those
* resources have been rescinded (see Scheduler::offerRescinded).
* Note that resources may be concurrently offered to more than one
* framework at a time (depending on the allocator being used). In
* that case, the first framework to launch tasks using those
* resources will be able to use them while the other frameworks
* will have those resources rescinded (or if a framework has
* already launched tasks with those resources then those tasks will
* fail with a TASK_LOST status and a message saying as much).
*/
virtual void resourceOffers(SchedulerDriver* driver,
const std::vector<Offer>& offers) = 0;
/**
* Invoked when an offer is no longer valid (e.g., the slave was
* lost or another framework used resources in the offer). If for
* whatever reason an offer is never rescinded (e.g., dropped
* message, failing over framework, etc.), a framwork that attempts
* to launch tasks using an invalid offer will receive TASK_LOST
* status updats for those tasks (see Scheduler::resourceOffers).
*/
virtual void offerRescinded(SchedulerDriver* driver,
const OfferID& offerId) = 0;
/**
* Invoked when the status of a task has changed (e.g., a slave is
* lost and so the task is lost, a task finishes and an executor
* sends a status update saying so, etc). Note that returning from
* this callback _acknowledges_ receipt of this status update! If
* for whatever reason the scheduler aborts during this callback (or
* the process exits) another status update will be delivered (note,
* however, that this is currently not true if the slave sending the
* status update is lost/fails during that time).
*/
virtual void statusUpdate(SchedulerDriver* driver,
const TaskStatus& status) = 0;
/**
* Invoked when an executor sends a message. These messages are best
* effort; do not expect a framework message to be retransmitted in
* any reliable fashion.
*/
virtual void frameworkMessage(SchedulerDriver* driver,
const ExecutorID& executorId,
const SlaveID& slaveId,
const std::string& data) = 0;
/**
* Invoked when a slave has been determined unreachable (e.g.,
* machine failure, network partition). Most frameworks will need to
* reschedule any tasks launched on this slave on a new slave.
*/
virtual void slaveLost(SchedulerDriver* driver,
const SlaveID& slaveId) = 0;
/**
* Invoked when an executor has exited/terminated. Note that any
* tasks running will have TASK_LOST status updates automagically
* generated.
*/
virtual void executorLost(SchedulerDriver* driver,
const ExecutorID& executorId,
const SlaveID& slaveId,
int status) = 0;
/**
* Invoked when there is an unrecoverable error in the scheduler or
* scheduler driver. The driver will be aborted BEFORE invoking this
* callback.
*/
virtual void error(SchedulerDriver* driver, const std::string& message) = 0;
Your framework executor must inherit from the Executor class. It must override the launchTask()
method.
You can use the $MESOS_HOME
environment variable inside of your executor to determine where mesos is running from.
/**
* Invoked once the executor driver has been able to successfully
* connect with Mesos. In particular, a scheduler can pass some
* data to it's executors through the FrameworkInfo.ExecutorInfo's
* data field.
*/
virtual void registered(ExecutorDriver* driver,
const ExecutorInfo& executorInfo,
const FrameworkInfo& frameworkInfo,
const SlaveInfo& slaveInfo) = 0;
/**
* Invoked when the executor re-registers with a restarted slave.
*/
virtual void reregistered(ExecutorDriver* driver,
const SlaveInfo& slaveInfo) = 0;
/**
* Invoked when the executor becomes "disconnected" from the slave
* (e.g., the slave is being restarted due to an upgrade).
*/
virtual void disconnected(ExecutorDriver* driver) = 0;
/**
* Invoked when a task has been launched on this executor (initiated
* via Scheduler::launchTasks). Note that this task can be realized
* with a thread, a process, or some simple computation, however, no
* other callbacks will be invoked on this executor until this
* callback has returned.
*/
virtual void launchTask(ExecutorDriver* driver,
const TaskInfo& task) = 0;
/**
* Invoked when a task running within this executor has been killed
* (via SchedulerDriver::killTask). Note that no status update will
* be sent on behalf of the executor, the executor is responsible
* for creating a new TaskStatus (i.e., with TASK_KILLED) and
* invoking ExecutorDriver::sendStatusUpdate.
*/
virtual void killTask(ExecutorDriver* driver, const TaskID& taskId) = 0;
/**
* Invoked when a framework message has arrived for this
* executor. These messages are best effort; do not expect a
* framework message to be retransmitted in any reliable fashion.
*/
virtual void frameworkMessage(ExecutorDriver* driver,
const std::string& data) = 0;
/**
* Invoked when the executor should terminate all of it's currently
* running tasks. Note that after a Mesos has determined that an
* executor has terminated any tasks that the executor did not send
* terminal status updates for (e.g., TASK_KILLED, TASK_FINISHED,
* TASK_FAILED, etc) a TASK_LOST status update will be created.
*/
virtual void shutdown(ExecutorDriver* driver) = 0;
/**
* Invoked when a fatal error has occured with the executor and/or
* executor driver. The driver will be aborted BEFORE invoking this
* callback.
*/
virtual void error(ExecutorDriver* driver, const std::string& message) = 0;
You need to put your framework somewhere that all slaves on the cluster can get it from. If you are running HDFS, you can put your executor into HDFS. Then, you tell Mesos where it is via the @ExecutorInfo@ parameter of MesosSchedulerDriver
constructor (e.g. see src/examples/java/TestFramework.java
for an example of this). ExecutorInfo is a a Protocol Buffer Message class (defined in @include/mesos/mesos.proto@), and you set its uri field to something like HDFS://path/to/executor/
. Also, you can pass the frameworks_home
configuration option (defaults to: MESOS_HOME/frameworks
) to your mesos-slave
daemons when you launch them to specify where all of your framework executors are stored (e.g. on an NFS mount that is available to all slaves), then set ExecutorInfo
to be a relative path, and the slave will prepend the value of frameworks_home to the relative path provided.
Once you are sure that your executors are available to the mesos-slaves, you should be able to run your scheduler, which will register with the Mesos master, and start receiving resource offers!