Interactive Serverless Compute.md

Overview

  • Serverless compute is emerging as an attractive cloud computing model that lets developers focus only on the core applications, building them as small, fine-grained workloads without having to worry about building and/or managing the infrastructure they run on.
  • Cloud providers dynamically provision, deploy, patch, and monitor the infrastructure and its resources (e.g., compute, storage, memory, and network) for these workloads, and tenants pay only for the resources they consume, in millisecond increments
  • Providers generally put strict limits on the compute time and resources that a single workload can consume, so that they can easily deploy and scale each workload without impacting the availability of other workloads.

Preliminary Results

  • To compare the performance of λ−NIC against existing serverless compute frameworks, we select OpenFaaS as the baseline framework because it is the most favorited open-source serverless framework and closely resembles existing serverless infrastructure
  • We evaluate three types of workloads:
  • Short workloads with no dependency, such as a simple server that responds with static content
  • Short workloads with external dependencies, such as memcached clients that make SET and GET requests
  • Longer workloads that process larger pieces of data (e.g., image or stream processing)
  • The data required for the longer workloads is generally larger than a single packet and is stored in DRAM

  • Reported measurements: latency (ms) for each workload (Table 1); work completion time (s) for 10,000 requests sent across 56 threads while context switching (Table 2); and resource usage when running the image transformer (Table 3)

Acknowledgments

  • This research was supported by The Stanford Platform Lab.
  • The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFRL and DARPA or the U.S. Government.


λ−NIC: Interactive Serverless Compute on SmartNICs

Sean Choi

Stanford University

Muhammad Shahbaz

Stanford University

Balaji Prabhakar

Stanford University

Mendel Rosenblum

Stanford University

CCS CONCEPTS

  • Networks → Cloud computing; Programmable networks; In-network processing

KEYWORDS

Serverless compute; SmartNIC; P4; and NPU

ACM Reference Format:

Sean Choi, Muhammad Shahbaz, Balaji Prabhakar, and Mendel Rosenblum. 2019. λ−NIC: Interactive Serverless Compute on SmartNICs. In SIGCOMM ’19: ACM SIGCOMM 2019 Conference (SIGCOMM Posters and Demos ’19), August 19–23, 2019, Beijing, China. ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3342280.3342341

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. SIGCOMM Posters and Demos ’19, August 19–23, 2019, Beijing, China. © 2019 Association for Computing Machinery. ACM ISBN 978-1-4503-6886-5/19/08…$15. https://doi.org/10.1145/3342280.3342341

1 OVERVIEW

Serverless compute is emerging as an attractive cloud computing model that lets developers focus only on the core applications, building them as small, fine-grained workloads (i.e., lambdas), without having to worry about building and/or managing the infrastructure they run on. Cloud providers dynamically provision, deploy, patch, and monitor the infrastructure and its resources (e.g., compute, storage, memory, and network) for these workloads, with tenants paying only for the resources they consume, in millisecond increments. The cloud providers generally put strict limits on the compute time and resources that a single workload can consume, so that they can easily deploy and scale each workload without impacting the availability of other workloads. Thus, the workloads are short-lived, with strict compute time and memory limits (up to 15 minutes and 3 GB, respectively, for Amazon Lambda [6]), and are often latency sensitive. Some examples of these workloads include real-time stream processing and generic API endpoints.

Today, all major cloud vendors offer some form of serverless framework (Figure 1), such as Amazon Lambda [3], Google Cloud Functions [9], and Microsoft Azure Functions [7], along with open-source developments like OpenFaaS [13] and OpenWhisk [2]. These frameworks rely on virtualization and containers [10] to execute and scale tenants’ lambdas. These technologies were designed to maximize utilization of the providers’ physical infrastructure, while presenting each tenant with its own view of a completely isolated machine. With serverless computing, where server management is hidden from tenants, these virtualization technologies become redundant, unnecessarily bloating the code size of serverless workloads and causing processing delays (of hundreds of milliseconds) and memory overheads (of tens of megabytes) [18]. The increased overheads also limit the concurrent execution of these workloads on a single server (to fewer than a hundred or so), raising the overall cost of running such workloads in a data center.

[Figure 1 diagram labels: User Request (R1, R2, R3), Gateway (Proxy), Workload Manager, Data Store (Raw Workload, Compiled Workloads), Worker Pool (SmartNICs running W1, W2, W3), External Service.]

Figure 1: Overview of a general serverless compute framework. Ri is the request for workload i (Wi).

At the same time, high latencies and limited concurrency in modern serverless compute frameworks prevent many interactive workloads (e.g., web servers and database clients) from taking advantage of serverless compute. The industry is starting to recognize these issues, and some providers, such as Google and CloudFlare, have recently started developing alternative frameworks (like Isolate [1]) that remove these technology layers (e.g., containers) and run serverless workloads directly on the bare-metal server [12]. However, CPU-based alternatives are inherently limited by their architecture, which is not designed to run thousands of small functions in parallel due to the high cost of context switching [15].

Recently, public cloud providers have begun deploying SmartNICs in an attempt to reduce load on host CPUs [14]. So far, these attempts have been limited to offloading ad-hoc tasks (like TCP offload, VXLAN tunneling, and overlay networking) to accelerate network processing of the hosts. However, modern SmartNICs, more specifically ASIC-based NICs, consist of hundreds of RISC processors (i.e., NPUs) [4], each with its own instruction store and local memory. These SmartNICs can run many discrete programs in parallel at high speeds and low latencies, unlike GPUs and FPGAs, which are optimized to accelerate a specific workload [11, 14, 16].

Thus, we present λ−NIC, an open-source framework for running interactive serverless workloads on ASIC-based SmartNICs. λ−NIC leverages the SmartNIC’s close proximity to the network and its large number of NPU cores to simultaneously run thousands of serverless workloads on a single NIC with predictable latency.
To ease development and deployment of serverless compute workloads, λ−NIC exposes a new event-based programming abstraction, match+lambda, that allows developers to easily compose and execute serverless compute workloads on SmartNICs.
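
The abstract does not show the match+lambda interface itself. As a rough illustration only, an event-based abstraction of this kind can be pictured as a table that matches a field of an incoming request and dispatches to the lambda registered for it. Everything in the sketch below (the `MatchLambdaTable` class, the `workload_id` and `payload` fields, and both handlers) is hypothetical and is not λ−NIC's actual API.

```python
from typing import Any, Callable, Dict

# Hypothetical types for illustration only; not part of λ-NIC.
Request = Dict[str, Any]
Lambda = Callable[[Request], bytes]


class MatchLambdaTable:
    """Maps a match key (here, a workload ID) to the lambda that handles it."""

    def __init__(self) -> None:
        self._table: Dict[int, Lambda] = {}

    def register(self, workload_id: int) -> Callable[[Lambda], Lambda]:
        """Decorator that installs a lambda under the given match key."""
        def install(fn: Lambda) -> Lambda:
            self._table[workload_id] = fn
            return fn
        return install

    def dispatch(self, request: Request) -> bytes:
        """Match on the request's workload ID and run the matching lambda."""
        fn = self._table.get(request["workload_id"])
        if fn is None:
            return b"no matching workload"
        return fn(request)


table = MatchLambdaTable()


@table.register(workload_id=1)
def static_page(request: Request) -> bytes:
    # Short workload with no dependency: reply with self-contained content.
    return b"<html>hello</html>"


@table.register(workload_id=2)
def echo(request: Request) -> bytes:
    # Replies with the request payload unchanged.
    return bytes(request["payload"])
```

Calling `table.dispatch({"workload_id": 1})` runs the static-page lambda, while an unregistered ID falls through to the default reply. Keeping the match step separate from the handler body is the design point being illustrated; how λ−NIC realizes it on NPU cores is not described here.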


2 PRELIMINARY RESULTS

To compare the performance of λ−NIC against existing serverless compute frameworks, we select OpenFaaS [13] as our baseline framework, as it is the most favorited open-source serverless framework and closely resembles existing serverless infrastructure. By default, OpenFaaS runs user workloads within Docker [10] containers via Kubernetes [17]. λ−NIC is built as an extension to OpenFaaS, running users’ custom serverless workloads on SmartNICs [4], thereby naturally inheriting all of OpenFaaS’s features with the added SmartNIC support. In addition, to evaluate emerging frameworks like Isolate, we add support for running the workloads on a bare-metal backend, which runs the workloads as Linux processes directly on the host OS. The overall architecture is shown in Figure 1.

Workloads. We evaluate λ−NIC on three types of workloads, reflecting popular serverless compute usage patterns [5, 8].

a. Short workloads with no dependency. These workloads involve replying with self-contained content, such as a static web page. We evaluate a simple server that responds with various static content.

b. Short workloads with external dependencies. These workloads request data from external data sources (e.g., database clients). They generate extensive intra-data center requests and typically have strict tail-latency requirements. We evaluate memcached client workloads that each make SET and GET requests.

c. Longer workloads. These workloads involve processing larger pieces of data (e.g., image or stream processing). The data required for such workloads is generally larger than a single packet and is stored in DRAM. While such workloads often do not have low latency requirements, they require higher throughput, which can benefit from the larger number of cores available on SmartNICs. We evaluate an image grayscaler to emulate a compute-intensive workload.

Evaluation Results. λ−NIC is more efficient than the baseline system in several respects. The λ−NIC workload optimizer efficiently coalesces workloads to reduce the executable binary’s code size so that it fits in a single NIC core. After optimizations, λ−NIC achieves up to 880x improvement in workload response latency (Table 1) and 736x improvement in work completion time for comparable interactive workloads running within a container on OpenFaaS (using 56 NPU cores vs. 28 CPU cores with 56 threads). In addition, λ−NIC is not affected by context switching, allowing it to process multiple workloads more efficiently than CPU-based systems (Table 2). Finally, λ−NIC reduces host CPU and memory usage (Table 3).
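
The grayscaler's code is not listed in the paper. A minimal sketch of such a per-pixel transform follows, assuming 8-bit RGB pixels and the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). The weights, the `to_gray` and `grayscale` names, and the nested-list image layout are assumptions for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each in 0..255


def to_gray(pixel: Pixel) -> int:
    """Weighted sum of the channels using BT.601 luma weights (assumed)."""
    r, g, b = pixel
    # Weights are scaled by 1000 so the computation stays in integers.
    return (299 * r + 587 * g + 114 * b) // 1000


def grayscale(image: List[List[Pixel]]) -> List[List[int]]:
    """Converts a row-major RGB image into a row-major grayscale image."""
    return [[to_gray(p) for p in row] for row in image]
```

Each output pixel depends only on its own input pixel, so the work splits cleanly across many cores, which is the property the longer-workload category is meant to exercise.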

ACKNOWLEDGMENTS

We thank the members of The Stanford Platform Lab, Neeraja Yadwadkar, and the anonymous SIGCOMM reviewers for their valuable feedback that helped improve the quality of this paper. This research was supported by The Stanford Platform Lab. Muhammad Shahbaz was also supported by Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under agreement number FA8650-18-2-7865. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFRL and DARPA or the U.S. Government.

| Workload | Metric | λ−NIC | Bare-Metal | Container |
| --- | --- | --- | --- | --- |
| Simple Server | AVG | 0.05 | 1.64 (+32x) | 45.56 (+880x) |
| | STD | 0.01 | 0.11 | 0. |
| | 99th | 0.08 | 1.88 | 47. |
| Memcached GET | AVG | 0.11 | 1.69 | 49. |
| | STD | 0.01 | 0.08 | 0. |
| | 99th | 0.16 | 1.88 | 51. |
| Memcached SET | AVG | 0.11 | 1.68 | 49. |
| | STD | 0.01 | 0.08 | 0. |
| | 99th | 0.15 | 1.88 | 51. |
| Image Transform | AVG | 199.80 | 656.20 (+3x) | 943.00 (+5x) |
| | STD | 3.74 | 20.00 | 24. |
| | 99th | 202.50 | 756.50 | 1,033. |

Table 1: Latency (ms) for each workload.

| | λ−NIC | Single Core | Multi Core |
| --- | --- | --- | --- |
| Total Time (s) | 0.17 | 18.92 | 10. |

Table 2: Work completion time (s) for 10,000 requests sent across 56 threads while context switching.

| | λ−NIC | Bare-Metal | Container |
| --- | --- | --- | --- |
| Host RAM (MB) | 0 | 62.55 | 219. |
| NIC RAM (MB) | 63.25 | 0 | 0 |
| Host CPU (Avg. %) | 0.09 | 9.22 | 13. |

Table 3: Resource usage when running image transformer.
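
The multipliers printed next to the averages in Table 1 are ratios of each backend's average latency to λ−NIC's. The sketch below recomputes them from the rounded averages shown in the table. The dictionary layout and the `slowdown` helper are illustrative names, and the paper's own multipliers were presumably computed from unrounded measurements, so small differences are expected.

```python
# Rounded average latencies (ms) copied from Table 1.
AVG_LATENCY_MS = {
    "Simple Server": {"lambda_nic": 0.05, "bare_metal": 1.64, "container": 45.56},
    "Image Transform": {"lambda_nic": 199.80, "bare_metal": 656.20, "container": 943.00},
}


def slowdown(workload: str, backend: str) -> float:
    """Ratio of a backend's average latency to λ-NIC's for one workload."""
    row = AVG_LATENCY_MS[workload]
    return row[backend] / row["lambda_nic"]


for workload in AVG_LATENCY_MS:
    for backend in ("bare_metal", "container"):
        print(f"{workload:16s} {backend:10s} {slowdown(workload, backend):7.1f}x")
```

With these rounded inputs, the Simple Server container ratio comes out near 911 rather than the reported 880. That gap is consistent with the λ−NIC average having been rounded down to 0.05 ms, since 45.56 / 880 is roughly 0.052.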

REFERENCES

[1] Isolate Class Reference. https://v8docs.nodesource.com/node-0.8/.

[2] Apache OpenWhisk. https://openwhisk.apache.org/documentation.html, 2016.

[3] Serverless Architectures with AWS Lambda. Tech. rep., AWS, November 2017.

[4] Agilio CX SmartNICs. https://www.netronome.com/products/agilio-cx/, 2018.

[5] AWS Lambda Developer Guide. Tech. rep., Dec. 2018.

[6] AWS Lambda Limits. https://docs.aws.amazon.com/lambda/, 2018.

[7] Azure Functions. https://azure.microsoft.com/en-us/services/functions/, 2018.

[8] Cloud function use cases. https://cloud.google.com/functions/use-cases/, 2018.

[9] Google Cloud Functions. https://cloud.google.com/functions/, 2018.

[10] What is a Container. https://www.docker.com/resources/what-container, 2018.

[11] Bahrampour, S., Ramakrishnan, N., Schott, L., and Shah, M. Comparative Study of Caffe, Neon, Theano, and Torch for Deep Learning. CoRR (2015).

[12] Bloom, Z. Serverless without Containers, Nov. 2018. https://blog.cloudflare.com/cloud-computing-without-containers/.

[13] Ellis, A. OpenFaaS. https://www.openfaas.com/, Dec. 2016.

[14] Firestone, D., Putnam, A., Mundkur, S., Chiou, D., Dabagh, A., Andrewartha, M., Angepat, H., Bhanu, V., Caulfield, A., Chung, E., Chandrappa, H. K., Chaturmohta, S., Humphrey, M., Lavier, J., Lam, N., Liu, F., Ovtcharov, K., Padhye, J., Popuri, G., Raindel, S., Sapre, T., Shaw, M., Silva, G., Sivakumar, M., Srivastava, N., Verma, A., Zuhair, Q., Bansal, D., Burger, D., Vaid, K., Maltz, D. A., and Greenberg, A. Azure Accelerated Networking: SmartNICs in the Public Cloud. In USENIX NSDI (2018).

[15] Hruby, T., Crivat, T., Bos, H., and Tanenbaum, A. S. On Sockets and System Calls: Minimizing Context Switches for the Socket API. In USENIX TRIOS (2014).

[16] Putnam, A., Caulfield, A. M., Chung, E. S., Chiou, D., Constantinides, K., Demme, J., Esmaeilzadeh, H., Fowers, J., Gopal, G. P., Gray, J., et al. A Reconfigurable Fabric for Accelerating Large-scale Datacenter Services. IEEE Micro 35, 3 (2015), 10–22.

[17] Rensin, D. K. Kubernetes - Scheduling the Future at Cloud Scale. 1005 Gravenstein Highway North Sebastopol, CA 95472, 2015.

[18] Salah, T., Zemerly, M. J., Yeun, C. Y., Al-Qutayri, M., and Al-Hammadi, Y. Performance comparison between container-based and vm-based services. In DNAC ICIN (2017).
