Simon Mo simon-mo

RayServe - Scalable Model Serving

Ray Serve Overview & Concepts

Context: Challenges in Serving

There are generally two ways of serving machine learning applications at scale. The first is to wrap your application in a traditional web server. This approach is easy to set up, but it makes it hard to scale each component independently and can easily lead to high memory usage and concurrency issues. The other approach is to use a cloud-hosted solution
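As a rough illustration of the first approach, here is a minimal sketch that wraps a model in a plain web server using only the Python standard library. The `predict` function, the `PredictHandler` class, the `query` helper, and the JSON request shape are all hypothetical stand-ins and are not part of Ray Serve.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


def predict(features):
    """Hypothetical stand-in for a real model: returns the sum of the inputs."""
    return sum(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Parse the JSON request body, e.g. {"features": [1, 2, 3]}.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Silence per-request logging for this demo.


def query(port, features):
    """Send one prediction request to the local server and decode the reply."""
    request = Request(
        f"http://127.0.0.1:{port}/",
        data=json.dumps({"features": features}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(query(server.server_address[1], [1, 2, 3]))
    server.shutdown()
    server.server_close()
```

In this sketch the model and the HTTP handling live in the same process, and `HTTPServer` handles one request at a time, so the two cannot be scaled independently. That is the kind of limitation the paragraph above describes.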

	WARNING: autodoc: failed to import function 'rllib.utils.annotations.PublicAPI' from module 'ray'; the following exception was raised:
	Traceback (most recent call last):
	  File "/Users/simonmo/miniconda3/lib/python3.6/site-packages/sphinx/ext/autodoc/importer.py", line 32, in import_module
	    return importlib.import_module(modname)
	  File "/Users/simonmo/miniconda3/lib/python3.6/importlib/__init__.py", line 126, in import_module
	    return _bootstrap._gcd_import(name[level:], package, level)
	  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
	  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
	  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
	  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked

	import numpy as np

	np.random.seed(42)


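	def gamma(mean, cv, size):
	    # Draw `size` samples with the given mean; cv == 0 means no variability.
	    if cv == 0.0:
	        return np.ones(size) * mean
	    else:
	        # Gamma(shape=1/cv, scale=cv*mean) has mean `mean` and variance cv * mean**2.
	        return np.random.gamma(1.0/cv, cv*mean, size=size)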
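A short usage sketch for the `gamma` helper, to show what its parameters control. A gamma distribution with shape k and scale θ has mean kθ and variance kθ², so with shape `1/cv` and scale `cv*mean` the samples have mean `mean` and a squared coefficient of variation (variance divided by mean squared) equal to `cv`. The `summarize` helper and the parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def gamma(mean, cv, size):
    # Same helper as above, repeated so this sketch runs on its own.
    if cv == 0.0:
        return np.ones(size) * mean
    return np.random.gamma(1.0 / cv, cv * mean, size=size)


def summarize(samples):
    """Return the sample mean and the squared coefficient of variation."""
    sample_mean = samples.mean()
    return sample_mean, samples.var() / sample_mean**2


np.random.seed(42)
constant = gamma(mean=2.0, cv=0.0, size=5)      # no variability: every value equals the mean
bursty = gamma(mean=2.0, cv=4.0, size=200_000)  # same mean, much higher variability
print(constant)
print(summarize(bursty))
```

With a large sample, the two numbers returned by `summarize(bursty)` should land close to the requested `mean` and `cv`.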



	Checking ===== BUILD.bazel
	Found command {'rule': 'core_worker-jni-darwin-compat', 'cmd': 'cp $< $@'}

	Found conditional {'rule': 'redis', 'condition': '@bazel_tools//src/conditions:windows', 'cmd': '\n unzip -q -o -- $(location @com_github_tporadowski_redis_bin//file) redis-server.exe redis-cli.exe &&\n mv -f -- redis-server.exe $(location redis-server) &&\n mv -f -- redis-cli.exe $(location redis-cli)\n '}

	Found conditional {'rule': 'redis', 'condition': '//conditions:default', 'cmd': '\n tmpdir="redis.tmp" &&\n path=$(location @com_github_antirez_redis//:file) &&\n cp -p -L -R -- "$${path%/*}" "$${tmpdir}" &&\n chmod +x "$${tmpdir}"/deps/jemalloc/configure &&\n parallel="$$(getconf _NPROCESSORS_ONLN || echo 1)"\n make -s -C "$${tmpdir}" -j"$${parallel}" V=0 CFLAGS="$${CFLAGS-} -DLUA_USE_MKSTEMP -Wno-pragmas -Wno-empty-body" &&\n mv "$${tmpdir}"/src/redis-server $(location redis-server)

	from ray import serve
	import requests
	import time
	import ray

	class MultiMethodDemo:
	    def method_a(self, flask_requests, *, keyword=[]):
	        if serve.context.web:
	            print("Method A: Got flask requests batch size", serve.context.batch_size)
	        else:

	import torch
	import torch.optim as optim
	import torchvision
	import torchvision.transforms as transforms
	import torch.nn as nn

	use_cuda = torch.cuda.is_available()