@simon-mo
Last active September 18, 2020 17:33

Serve migration guide

Applies to all Serve users. Refactor your Serve API calls as follows.

From

from ray import serve

class MyBackend:
	def __call__(self, flask_request):
		...

serve.init()
serve.create_backend("backend:v1", MyBackend)
serve.create_endpoint("endpoint", backend="backend:v1")
serve.set_traffic(...)

Change:

- serve.init()
+ client = serve.start()
- serve.create_backend("backend:v1", MyBackend)
+ client.create_backend("backend:v1", MyBackend)
- serve.create_endpoint("endpoint", backend="backend:v1")
+ client.create_endpoint("endpoint", backend="backend:v1")
- serve.set_traffic(...)
+ client.set_traffic(...)
  • Case 1: If you are launching an ephemeral development Serve instance, just use serve.start().
  • Case 2: If you are launching a long-running detached Serve instance, run serve.start(detached=True). The instance will not be destroyed after the script exits.
  • Case 3: If you are only connecting to a long-running detached Serve instance, run serve.connect(). This assumes there is already a long-running instance started via serve.start(detached=True).
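
The three cases above can be folded into one small helper. This is a minimal sketch, not part of Serve: get_client and its mode argument are hypothetical names, and the serve module is passed in explicitly so the helper can be exercised without a running cluster. Only serve.start(), serve.start(detached=True), and serve.connect() come from the guide.

```python
def get_client(serve, mode="ephemeral"):
    """Return a Serve client for one of the three cases above.

    `serve` is expected to be the module from `from ray import serve`.
    `get_client` and `mode` are hypothetical names used for illustration.
    """
    if mode == "ephemeral":
        # Case 1: development instance, torn down when the script exits.
        return serve.start()
    if mode == "detached":
        # Case 2: long-running instance that outlives the script.
        return serve.start(detached=True)
    if mode == "connect":
        # Case 3: attach to an instance started earlier with detached=True.
        return serve.connect()
    raise ValueError(f"unknown mode: {mode!r}")
```

In a real script this would be called as client = get_client(serve, "detached") after from ray import serve, followed by client.create_backend(...) and client.create_endpoint(...) as in the diff above.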

Note: serve.start/connect are orthogonal to the underlying Ray cluster. The rule is simple:

  • If there isn't a Ray cluster connected, then Serve will start one via ray.init. The Ray cluster will be torn down on script exit.
  • If there is already a Ray cluster connected, then Serve will use the one connected.

Applies to batching and ServeHandle users.

The Servable API (protocol for your backend implementation) was simplified. Instead of:

class MyHandler:
	def __call__(self, flask_request, *, arg_1=None, arg_2=None):
		...

You can simplify it to just:

class MyHandler:
	def __call__(self, request):
		...

The Python arguments and data are accessible via the request object, which is a ServeRequest with an API similar to Flask.Request. Learn more: https://docs.ray.io/en/master/serve/advanced.html#how-do-servehandle-and-serverequest-work
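
As a rough illustration of the single-argument protocol, the handler below reads everything it needs from the request object. The request here is a hypothetical stand-in built with SimpleNamespace, not a real ServeRequest; it only assumes the args and data attributes mentioned in this guide, with args behaving like a dict.

```python
from types import SimpleNamespace


class MyHandler:
    def __call__(self, request):
        # Values formerly passed as keyword arguments are read from request.args.
        name = request.args.get("name", "world")
        # The payload is read from request.data.
        return f"hello {name}: {request.data}"


# Hypothetical stand-in for a ServeRequest (assumption: dict-like .args, plain .data).
fake_request = SimpleNamespace(args={"name": "serve"}, data="payload")
result = MyHandler()(fake_request)
```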

For batching, the input will now be a list of requests. If you are sending requests from both the web and Python, the list can contain a mix of Flask.Request and ServeRequest objects:

@serve.accept_batch
def batch_func(requests):
	assert isinstance(requests, list)
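
A batch function written against the shared request interface does not need to care which type each element is. The sketch below uses hypothetical SimpleNamespace stand-ins for both request types and omits the @serve.accept_batch decorator so it runs without Ray. It assumes the function should return one result per request, in input order.

```python
from types import SimpleNamespace


def batch_func(requests):
    # In Serve this function would carry the @serve.accept_batch decorator.
    assert isinstance(requests, list)
    # Web query values arrive as strings, Python values keep their type,
    # so normalize with int() before computing. One result per request.
    return [int(r.args.get("x", 0)) * 2 for r in requests]


# Hypothetical stand-ins: one web-style request, one Python-style, one empty.
batch = [
    SimpleNamespace(args={"x": "1"}),
    SimpleNamespace(args={"x": 2}),
    SimpleNamespace(args={}),
]
results = batch_func(batch)
```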

Serve

New APIs

  • The serve.client API makes it easy to appropriately manage the lifetime of multiple Serve clusters. (#10409)
    • This is a breaking change. See our migration guide for steps to update your existing applications.
    • You should replace serve.init with serve.start/connect and call API methods on the client objects returned by serve.start/connect.
    • This lets you start a cluster-wide Serve instance via serve.start(detached=True) and later connect to it via serve.connect(). By default, serve.start() starts an ephemeral instance that is torn down when the Python script exits.
  • ServeHandle API was revamped. (#10527, #10483)
    • Your callable only needs to accept a single argument request instead of multiple keyword arguments.
    • The request will be a Flask.Request if it comes from the web and a ServeRequest if it comes from a Python ServeHandle. ServeRequest has an API similar to the Flask request (e.g. request.args, request.data, request.json).
    • When you pass arguments to handle.remote, the keyword arguments get injected into request.args and the first positional argument gets injected into request.data or request.json.
  • ASGI middleware support: you can enable CORS and any Starlette middlewares by adding them to serve.start(http_middleware=[...]). (#10529, #9940)
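
The handle.remote argument mapping described in the ServeHandle item above can be pictured with a toy model. This is not Serve's implementation: model_remote_call is a hypothetical function, the request is a SimpleNamespace stand-in, and only the request.data side of the "request.data or request.json" rule is modeled.

```python
from types import SimpleNamespace


def model_remote_call(*args, **kwargs):
    """Toy model of the described mapping; not Serve's actual code."""
    # The first positional argument becomes the request payload.
    data = args[0] if args else None
    # Keyword arguments become the request's args mapping.
    return SimpleNamespace(args=dict(kwargs), data=data)


# Roughly what a backend would see for handle.remote(payload, model="v1", top_k=5).
request = model_remote_call({"pixels": [1, 2, 3]}, model="v1", top_k=5)
```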

API removal

  • The serve.metric module is removed, along with the serve.stat and serve.init(metric_exporter=...) APIs. Serve now exports metrics in Prometheus format through Ray's built-in metrics exporter. Backend latency histograms and router queue sizes are added. (#10185, #10535)
  • SLO ordering code path is removed. relative_slo_ms and absolute_slo_ms arguments for HTTP and ServeHandle are removed. (#10075)

Improvements

  • Serve APIs are fully typed. (#10205, #10288)
  • Backend configs are now typed and validated via Pydantic. (#10559, #10389)
  • Progress towards an application-level backend autoscaler. (#9955, #9845, #9828)
  • New architecture page in documentation. (#10204)