mesos-framework

Package version Package downloads Package license Build Status

This project provides a high-level wrapper around the Mesos HTTP APIs for schedulers and executors. It can be used to write Mesos frameworks in pure JavaScript. The currently supported Mesos version is 1.5.0.

Installation

You can use mesos-framework in your own projects by running

npm i mesos-framework --save

Local test environment

docker-compose can be used for setting up a local test environment. Just run

$ docker-compose up -d

in the base directory of this project.

Documentation

The mesos-framework project is not a Mesos framework itself, but can be imagined as a "framework-framework" (or meta framework), meaning that it provides a certain abstraction around the HTTP APIs for schedulers and executors, together with some convenience methods.

It implements all existing Calls as methods for both the Scheduler and Executor classes, meaning that they can be used without having to write the HTTP communication yourself. Additionally, it exposes all Events for both classes, as definied in the Mesos docs. It also adds some custom events for the Scheduler class for better task handling.

There are some basic event handler methods provided for the Scheduler class, which for example take care of the checking and accepting the offers received from the Mesos master, as well as keeping track of tasks. Please have a look at the class documentation in the docs folder of this project.

For both the Scheduler and Executor classes, the belonging event handler methods can be overwritten with custom logic. To do that, you can supply a options.handlers property map object (where the property name is the uppercase Event name) when instantiating a class:

var scheduler = new Scheduler({
    ...
    "handlers": {
        "HEARTBEAT": function (timestamp) {
            console.log("CUSTOM HEARTBEAT!");
            this.lastHeartbeat = timestamp;
        }
    }
});

Basically this is the mechanism to create custom framework logic. Please have a look at the examples folder to see examples for command-based and Docker-based schedulers.

API docs

The API docs can be accessed via API docs hosted on GitHub pages.

Coverage reports

The Coverage Reports are hosted on GitHub pages as well.

Scheduler

The Scheduler is the "heart" of a Mesos framework. It is very well possible to create a Mesos framework only by implementing the Scheduler with the standard CommandInfo and ContainerInfo objects.

The option properties you can specify to create a Scheduler are the following:

  • masterUrl: The URL of the leading Mesos master (mandatory).
  • port: The port of the leading Mesos master (mandatory).
  • useZk: Should be set to true if you want to use ZooKeeper to persist the task information. Default is false.
  • zkUrl: The ZooKeeper connection url. Default is master.mesos:2181.
  • zkPrefix: The prefix of the ZooKeeper node where the data for this framework shall be stored. Default is /dcos-service-.
  • user: The system user name to use (must exist on the agents!). Default is root.
  • role: The Mesos role to use when subscribing to the master. Default is *.
  • frameworkName: The desired framework name (will choose a standard name if not specified). Default is mesos-framework. concatented with a unique UUID.
  • restartStates: An array of TaskState objects which should trigger a restart of a task. For example, regularly finished tasks (in state TASK_FINISHED) are not restarted by default.
  • masterConnectionTimeout: The number of seconds to wait before a connection to the leading Mesos master is considered as timed out (default: 10).
  • frameworkFailoverTimeout: The number of seconds to wait before a framework is considered as failed by the leading Mesos master, which will then terminate the existing tasks/executors (default: 604800).
  • exponentialBackoffFactor: The factor in which to increase (multiply) the backoff timer upon re-subscription (default 1.5).
  • exponentialBackoffMinimum: The minimal backoff time for re-subscription in seconds (default: 1).
  • exponentialBackoffMaximum: The maximum backoff time for re-subscription in seconds (default: 15).
  • tasks: An object (map) with the task info (see below). It's possible to create different (prioritized) tasks, e.g. launching different containers with different instance counts. See the Docker Scheduler example.
  • handlers: An object containing custom handler functions for the Scheduler events, where the property name is the uppercase Event name.
  • staticPorts: A boolean to indicate whether to use fixed ports in the framework or not. Default is false.
  • serialNumberedTasks: A boolean to indicate whether to add a task serial number to the task name launched in mesos (from the framework's prespective there are always serial numbers in use, they are also part of the task IDs), disabling this is useful for service access with a single mesos DNS name. Defaults to true.

A tasks sub-object can contain objects with task information:

  • instances: The number of instances (tasks) you want to launch (will be 1 if you don't specify this property).
  • priority: The priority of which the different tasks shall be launched (lower is better). If none is specified, tasks will be launched based on the task naming.
  • allowScaling: A boolean value which indicates whether this task permits scaling operations (default: false).
  • commandInfo: A Mesos.CommandInfo definition (mandatory).
  • containerInfo: A Mesos.ContainerInfo definition.
  • executorInfo: A Mesos.ExecutorInfo definition.
  • resources: The object of Mesos.Resource types, such as cpu, mem, ports and disk (mandatory) with an optional fixedPorts number array (included in the general port count).
  • portMappings: The array of portMapping objects, each containing a numeric port value (for container ports), and a protocol string (either tcp or udp).
  • healthChecks: A Mesos.HealthCheck definition.
  • labels: A Mesos.Labels definition.

High availability

Currently, mesos-framework doesn't support HA setups for the scheduler instances, meaning that you can only run one instance at once. If you run the scheduler application via Marathon, you should be able to make use of the health checks to let them restart the scheduler application once it fails.

You can use ZooKeeper though to be able to recover from scheduler restarts. This is done by setting useZk to true and specifying a zkUrl connection string (optionally, if you don't want to use the default master.mesos:2181, which should work inside clusters using Mesos DNS).

mesos-framework has support for the detection of the leading master via Mesos DNS. You can use leader.mesos as the masterUrl, enabling that upon re-registration of the scheduler, the correct master address will be used (Mesos DNS lookup). If you just provided an IP address or a hostname, mesos-framework will try to establish a new connection to the given address and look for redirection information (the "leader change" case). If this request times out, there is no way to automatically determine the current leader, so the scheduler stops itsef (the "failed Master" case).

Sample framework implementation

See also the example implementation of a framework at mesos-framework-boilerplate.

Events

Events from Master
The following events from the leading Mesos master are exposed:

  • subscribed: The first event sent by the master when the scheduler sends a SUBSCRIBE request on the persistent connection (i.e. the framework was started). Emits an object containing the frameworkId and the mesosStreamId, this may be emitted multiple times in the lifetime of a framework, as reconnections emit it as well.
  • offers: Sent by the master whenever there are new resources that can be offered to the framework. Emits the base object from the Master for this event.
  • inverse_offers: Sent by the master whenever there are resources requested back from the scheduler. Emits the base object from the Master for this event.
  • rescind: Sent by the master when a particular offer is no longer valid. Emits the base object from the Master for this event.
  • rescind_inverse_offer: Sent by the master when a particular inverse offer is no longer valid (e.g., the agent corresponding to the offer has been removed) and hence needs to be rescinded. Emits the base object from the Master for this event.
  • update: Sent by the master whenever there is a status update that is generated by the executor, agent or master. Emits the base object from the Master for this event.
  • message: A custom message generated by the executor that is forwarded to the scheduler by the master. Emits an object containing the agentId, the executorId and the ASCII-encoded data.
  • heartbeat: This event is periodically sent by the master to inform the scheduler that a connection is alive. Emits the timestamp of the last heartbeat event.
  • failure: Sent by the master when an agent is removed from the cluster (e.g., failed health checks) or when an executor is terminated. Emits the base object from the Master for this event.
  • error: Sent by the master when an asynchronous error event is generated (e.g., a framework is not authorized to subscribe with the given role), or if another error occurred.

Events from Scheduler
The following events from the Scheduler calls are exposed:

  • ready: Is emitted when the scheduler is instantiated and ready to subscribe to the Mesos master. Every scheduler.subscribe() call should be wrapped by an event handle reacting to this event. See the examples.
  • sent_subscribe: Is emitted when the scheduler has sent the SUBSCRIBE call.
  • sent_accept: Is emitted when the scheduler has sent an ACCEPT call to accept an offer from the Master.
  • sent_decline: Is emitted when the scheduler has sent a DECLINE call to decline an offer from the Master.
  • sent_teardown: Is emitted when the scheduler has sent the TEARDOWN call to stop the framework to the Master.
  • sent_revive: Is emitted when the scheduler has sent a REVIVE call to the Master to remove any/all filters that it has previously set via ACCEPT or DECLINE calls.
  • sent_kill: Is emitted when the scheduler has sent a KILL call to the Master to kill a specific task.
  • sent_acknowledge: Is emitted when the scheduler has sent an ACKNOWLEDGE call to the Master to acknowledge a status update.
  • sent_shutdown: Is emitted when the scheduler has sent a SHUTDOWN call to the Master to shutdown a specific custom executor.
  • sent_reconcile: Is emitted when the scheduler has sent a RECONCILE call to the Master to query the status of non-terminal tasks.
  • sent_message: Is emitted when the scheduler has sent a MESSAGE call to the Master to send arbitrary binary data to the executor.
  • sent_request: Is emitted when the scheduler has sent a REQUEST call to the Master to call new resources.
  • sent_supress: Is emitted when the scheduler has sent a SUPPRESS call to the Master to suppress new resource offers.
  • sent_accept_inverse_offers: Is emitted when the scheduler has sent a ACCEPT_INVERSE_OFFERS call to the Master to accept inverse offers.
  • sent_decline_inverse_offers: Is emitted when the scheduler has sent a DECLINE_INVERSE_OFFERS call to the Master to decline inverse offers.
  • sent_acknowledge_operation_status: Is emitted when the scheduler has sent the ACKNOWLEDGE_OPERATION_STATUS call.
  • sent_reconcile_operations: Is emitted when the scheduler has sent the RECONCILE_OPERATIONS call.
  • updated_task: Is emitted when a task was updated. Contains an object with taskId, executorId and state.
  • removed_task: Is emitted when a task was removed. Contains the taskId.
  • task_launched: Is emitted when a task moves to the running state, to handle initialization. Contains the task structure.

Example

Also, you can have a look at the examples folder to see examples for command-based and Docker-based schedulers.

"use strict";

var Scheduler = require("mesos-framework").Scheduler;
var Mesos = require("mesos-framework").Mesos.getMesos();

var scheduler = new Scheduler({
    "masterUrl": "172.17.11.102", // If Mesos DNS is used this would be "leader.mesos", otherwise use the actual IP address of the leading master
    "port": 5050,
    "frameworkName": "My first Command framework",
    "logging": {
        "level": "debug"
    },
    "restartStates": ["TASK_FAILED", "TASK_KILLED", "TASK_LOST", "TASK_ERROR", "TASK_FINISHED"],
    "tasks": {
        "sleepProcesses": {
            "priority": 1,
            "instances": 3,
            "commandInfo": new Builder("mesos.CommandInfo").setValue("env && sleep 100").setShell(true),
            "resources": {
                "cpus": 0.2,
                "mem": 128,
                "ports": 1,
                "disk": 0
            }
        }
    },
    "handlers": {
        "HEARTBEAT": function (timestamp) {
            this.logger.info("CUSTOM HEARTBEAT!");
            this.lastHeartbeat = timestamp;
        }
    }
});

// Start the main logic once the framework scheduler has received the "SUBSCRIBED" event from the leading Mesos master
scheduler.on("subscribed", function (obj) {

    // Display the Mesos-Stream-Id
    scheduler.logger.info("Mesos Stream Id is " + obj.mesosStreamId);

    // Display the framework id
    scheduler.logger.info("Framework Id is " + obj.frameworkId);

    // Trigger shutdown after one minute
    setTimeout(function() {
        // Send "TEARDOWN" request
        scheduler.teardown();
        // Shutdown process
        process.exit(0);
    }, 60000);

});

// Capture "offers" events
scheduler.on("offers", function (offers) {
    scheduler.logger.info("Got offers: " + JSON.stringify(offers));
});

// Capture "heartbeat" events
scheduler.on("heartbeat", function (heartbeatTimestamp) {
    scheduler.logger.info("Heartbeat on " + heartbeatTimestamp);
});

// Capture "error" events
scheduler.on("error", function (error) {
    scheduler.logger.info("ERROR: " + JSON.stringify(error));
    scheduler.logger.info(error.stack);
});

scheduler.on("ready", function () {
    // Start framework scheduler
    scheduler.subscribe();
});

Executor

You should consider writing your own executors if your framework has special requirements. For example, you may not want a 1:1 relationship between tasks and processes.

How can the custom executors be used? Taken from the Mesos framework development guide:

One way to distribute your framework executor is to let the Mesos fetcher download it on-demand when your scheduler launches tasks on that slave. ExecutorInfo is a Protocol Buffer Message class, and it contains a field of type CommandInfo. CommandInfo allows schedulers to specify, among other things, a number of resources as URIs. These resources are fetched to a sandbox directory on the slave before attempting to execute the ExecutorInfo command. Several URI schemes are supported, including HTTP, FTP, HDFS, and S3.

Alternatively, you can pass the frameworks_home configuration option (defaults to: MESOS_HOME/frameworks) to your mesos-slave daemons when you launch them to specify where your framework executors are stored (e.g. on an NFS mount that is available to all slaves), then use a relative path in CommandInfo.uris, and the slave will prepend the value of frameworks_home to the relative path provided.

Events

Events from Scheduler
The following events from the Scheduler are exposed:

  • subscribed: The first event sent by the agent when the executor sends a SUBSCRIBE request on the persistent connection.
  • launch: Sent by the agent whenever it needs to assign a new task to the executor. The executor is required to send an UPDATE message back to the agent indicating the success or failure of the task initialization.
  • kill: Is sent whenever the scheduler needs to stop execution of a specific task. The executor is required to send a terminal update (e.g., TASK_FINISHED, TASK_KILLED or TASK_FAILED) back to the agent once it has stopped/killed the task. Mesos will mark the task resources as freed once the terminal update is received.
  • acknowledged: Sent to signal the executor that a status update was received as part of the reliable message passing mechanism. Acknowledged updates must not be retried.
  • message: Sent a custom message generated by the scheduler and forwarded all the way to the executor. These messages are delivered “as-is” by Mesos and have no delivery guarantees. It is up to the scheduler to retry if a message is dropped for any reason.
  • shutdown: Sent by the agent in order to shutdown the executor. Once an executor gets a SHUTDOWN event it is required to kill all its tasks, send TASK_KILLED updates and gracefully exit.
  • error: Sent by the agent when an asynchronous error event is generated. It is recommended that the executor abort when it receives an error event and retry subscription.

Events from Executor
The following events from the Executor calls are exposed:

  • sent_subscribe: Is emitted when the executor has sent the SUBSCRIBE request.
  • sent_update: Is emitted when the scheduler has sent the UPDATE request to the agent.
  • sent_message: Is emitted when the scheduler has sent the MESSAGE request to send arbitrary binary data to the agent.

Mesos

Creating objects ("natively" via protobufjs)

The module also exposes the Mesos protocol buffer object, which is loaded via protobuf.js. It can be used to create the objects which can be then passed to the scheduler/executor methods.

Example:

var Mesos = require("mesos-framework").Mesos.getMesos();

var TaskID = new Mesos.TaskID("my-task-id");

You can also instantiate Mesos protocol buffer objects from plain JSON. Be sure to follow the structure defined in the mesos.proto protobuf though, otherwise this will raise an error...

Example:

var Builder = require("mesos-framework").Mesos.getBuilder();

var taskId = {
    "value": "my-task-id"
};

var TaskID = new (Builder.build("mesos.TaskID"))(taskId);

Creating objects (via builder pattern)

You can also create Mesos objects via the builder pattern like this:

Example:

var Builder = require("mesos-framework").Builder;

var commandInfo = new Builder("mesos.CommandInfo")
                    .setValue("env && sleep 100")
                    .setShell(true);

taskHealthHelper

This module allows for testing of task health (or any metric available via HTTP, for example cluster state, leader, etc...) and emit a scheduler event so the issue will be handled in code.

The option properties you can specify to create a taskHealthHelper are the following:

  • interval: The time interval between checks, in seconds, can be partial seconds (but not recommended), defaults to 30 seconds.
  • graceCount: The amount of failed checks until a task is marked as unhealthy, defaults to 4.
  • portIndex: The port index to check, defaults to 0.
  • propertyPrefix: A optional prefix for the health property, mainly to be used when having multiple health checks per framework (normal health and leader status, for instance), defaults to an empty string.
  • errorEvent: The name of the event to emit when a task fails the health check (after the grace count), defaults to propertyPrefix + "task_unhealthy".
  • additionalProperties: An array of additional properties to be set, these properties do not have the prefix added to them, information below.
  • taskNameFilter: An optional regular expression to filter tasks to be checked by the health check.
  • statusCodes: An array of acceptable HTTP status codes, defaults to [200].
  • checkBodyFunction: An optional function to check the body of the HTTP response (only after it passed the status check), parameters are the task object and the response body, needs to retrun a boolean.

The additional properties array is an array of objects with the following members:

  • name: The name of the property to set, for example: "leader" (mandatory).
  • setUnhealthy: Should the property be set when the check fails (optional, when unset it only sets the property when healthy).
  • inverse: Whether the health status and the property are inversed or not (optional).