- Name: Rudraksh Karpe
- Project Repository: Analytics Edge Ecosystem GenAI Workloads on Rancher using openSUSE Leap
- Organization: openSUSE Project
- Mentors: Bryan Gartner, Navin Chandra, Terry Smith, Ann Davis
This report summarizes my work as a contributor in the Google Summer of Code 2024 program on the project Analytics Edge Ecosystem Workloads using Rancher at the openSUSE Project.
The openSUSE Project is a worldwide effort that promotes the use of Linux everywhere and creates one of the world's best Linux distributions. Analytics Edge Ecosystem Workloads (AEEW) is one of its open source projects, providing an open source deployment of well-trained, tested, and functional AI/ML or Generative AI workloads at the Edge for business verticals such as Retail, Healthcare, and Finance, while leveraging Kubernetes and containerization technologies for efficient deployment and management.
The rapid advancement of AI, ML, and LLMs brings concerns about privacy, as there is a risk of data exposure to online LLM service providers. Setting up LLMs in-house requires high computational cost, which is a major obstacle for businesses across sectors such as Retail, Healthcare, and Finance. These industries seek to leverage the power of LLMs to drive profitability across their business while maintaining control over their data.
The edge ecosystem has revolutionized analytics by analyzing data locally instead of sending it to a central server, enabling faster decision-making and real-time insights. This approach empowers businesses to adopt distributed compute infrastructure and leverage computational resources near the data source. It also makes it possible to run ML models and workloads, including LLMs, directly against this data, performing training, testing, validation, and retrieval-augmented generation (RAG) for analytics at the Edge.
This solution was designed to be independent of cloud services and deployable to edge devices with either CPU only or a combination of CPU and GPU, ensuring adaptability to diverse edge computing scenarios. The low-latency inferencing provided by this system proved highly beneficial for various business verticals and their consumers, providers, and business partners, contributing to overall revenue growth while offering an enhanced experience to stakeholders.
Edge devices were connected through a central orchestration system, namely Kubernetes (K8s) coupled with High-Performance Computing (HPC). Rancher by SUSE was used to manage the nodes. On the edge devices themselves, K3s, the lightweight Kubernetes distribution, was employed for its small binary size; RKE and RKE2 were also explored as alternative Kubernetes distributions.
Data & RAG Pipelines Implementation:
- Initially, data was ingested from raw file formats (.pdf, .csv, .md, .pptx, etc.); these files were then passed through parsers such as LangChain/LlamaIndex data loaders.
- After loading, the data was passed through ETL pipelines to transform it into suitable formats, clean and normalize it, and load the transformed data into a database or datastore.
- Further down the pipeline, text splitting broke the text into smaller segments, called chunks, sized to fit the embedding models.
- Embeddings were then generated from these chunks: vector representations of the data that capture its semantic meaning.
- Eventually, the embeddings were organized into semantic indexes and stored in the knowledge base as a vector database; retrieved results are ranked and fed into the local LLM.
- User questions are answered by performing semantic search over these embeddings (a minimal sketch of the full pipeline follows this list).
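The snippet below is a minimal sketch of such a pipeline, not the project's exact implementation. It assumes LangChain with the langchain-community package, a PDF source, a sentence-transformers embedding model, Chroma as the vector database, and a locally served Ollama model; the file path, model names, and chunk sizes are illustrative assumptions.

```python
# Hedged sketch of the load -> split -> embed -> store -> query flow.
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA

# 1. Load a raw document (hypothetical file path).
docs = PyPDFLoader("data/retail_report.pdf").load()

# 2. Split text into chunks sized for the embedding model.
splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)
chunks = splitter.split_documents(docs)

# 3. Generate embeddings and persist them as the vector-database knowledge base.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectordb = Chroma.from_documents(chunks, embeddings, persist_directory="kb/")

# 4. Wire the retriever to a local LLM and answer a user question via semantic search.
llm = Ollama(model="llama3")  # any locally hosted model; name is an assumption
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectordb.as_retriever(search_kwargs={"k": 4}),
)
print(qa.invoke({"query": "Summarize last quarter's sales trends."}))
```

The same flow applies to the other raw formats (.csv, .md, .pptx) by swapping in the corresponding document loader class.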
Operations and Orchestration Implementation:
- Implemented a microservices-based architecture supporting both Docker and Podman, splitting the workload into two distinct containers: one for the API backend handling the RAG pipelines and LLM-generated responses, and another for the user interface applications built with Flask/Streamlit (see the backend sketch after this list).
- Deployed these containers as separate services on K3s and RKE/RKE2 clusters, exposed on different network ports, with Rancher serving as the central Kubernetes management platform for streamlined operations and monitoring.
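To make the two-container split concrete, here is a minimal, hedged sketch of the API backend container. It assumes Flask and a hypothetical rag_pipeline module exposing the RetrievalQA chain from the earlier sketch; the route name and port are illustrative, not the project's actual interface.

```python
# Hedged sketch of the API backend service for the RAG/LLM container.
from flask import Flask, jsonify, request

from rag_pipeline import qa  # hypothetical module wrapping the RAG chain sketched above

app = Flask(__name__)

@app.route("/query", methods=["POST"])
def query():
    # Accept a user question from the UI container and run it through the RAG chain.
    question = request.get_json(force=True).get("question", "")
    answer = qa.invoke({"query": question})
    return jsonify({"answer": answer["result"]})

if __name__ == "__main__":
    # Bind to all interfaces so the container port can be exposed by a K3s Service.
    app.run(host="0.0.0.0", port=8000)
```

The UI container (Flask/Streamlit) can then POST user questions to this endpoint over the cluster network, while a Kubernetes Service exposes the port for external access, matching the port-per-service layout described above.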