
Google Summer of Code '24 Final Report

This report summarizes my work in the Google Summer of Code 2024 program as a contributor to the project Analytics Edge Ecosystem Workloads using Rancher at the openSUSE Project.

Background 📚

The openSUSE project is a worldwide effort that promotes the use of Linux everywhere and creates one of the world's best Linux distributions. Analytics Edge Ecosystem Workloads (AEEW) is one of its open source projects. It provides an open source deployment of well-trained, tested, and functional AI/ML and Generative AI workloads at the edge for business verticals such as Retail, Healthcare, and Finance, leveraging Kubernetes and containerization technologies for efficient deployment and management.

Motivation ✨

Rapid advances in AI, ML, and LLMs raise privacy concerns, since relying on online LLM service providers risks exposing data. Setting up LLMs in-house carries a high computational cost, a major obstacle for businesses across sectors such as Retail, Healthcare, and Finance. These industries want to leverage the power of LLMs to drive profitability while maintaining control over their data.

Proposed Solution

The edge ecosystem has revolutionized analytics by processing data locally instead of sending it to a central server, enabling faster decision-making and real-time insights. This approach lets businesses adopt a distributed compute infrastructure that leverages computational resources near the data generation source. It also makes it possible to run ML models and workloads, including LLMs, directly against this data, performing training, testing, validation, and RAG for analytics at the edge.

This solution was designed to be independent of cloud services and deployable to edge devices with either CPU only or a combination of CPU and GPU, ensuring adaptability to diverse edge computing scenarios. The low-latency inference it provides proved highly beneficial for various business verticals and their consumers, providers, and business partners, contributing to revenue growth while offering an enhanced experience to stakeholders.

In the process, edge devices were connected via a central orchestration system: Kubernetes (K8s) coupled with High-Performance Computing (HPC). Rancher by SUSE was used to manage the nodes. On the edge devices themselves, K3s, a lightweight Kubernetes distribution, was employed because of its small binary size; RKE and RKE2 were also explored as alternative Kubernetes distributions.

Project Implementation 🛠️

Data & RAG Pipelines Implementation:

  • Initially, data was ingested from raw file formats (.pdf, .csv, .md, .pptx, etc.); these files were then passed through parsers, i.e., LangChain/LlamaIndex data loaders (a minimal sketch follows the figure below).

(Figure: parsing raw files with LangChain/LlamaIndex data loaders)
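
A minimal sketch of this loading step, assuming the langchain-community document loaders; the loader classes and file paths here are illustrative, and the project may have used different ones:

```python
# Minimal loading sketch. PyPDFLoader/CSVLoader and the file paths are
# assumptions for illustration, not necessarily the project's choices.
from langchain_community.document_loaders import CSVLoader, PyPDFLoader

# Each loader parses a raw file into a list of Document objects
# (page_content plus metadata such as source file and page number).
pdf_docs = PyPDFLoader("data/report.pdf").load()
csv_docs = CSVLoader("data/sales.csv").load()

documents = pdf_docs + csv_docs
print(f"Loaded {len(documents)} documents")
```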

  • After loading, the data was passed through ETL pipelines to transform it into suitable formats, perform cleaning and normalization, and load the transformed data into a database or datastore (see the sketch after the figure below).

(Figure: ETL pipeline)
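
A hedged sketch of such an ETL step, using SQLite purely as a stand-in datastore; the schema and the normalization rule are illustrative:

```python
# ETL sketch: transform (clean/normalize) the loaded documents, then
# load them into a datastore. SQLite and the schema are stand-ins.
import re
import sqlite3

def normalize(text: str) -> str:
    """Collapse runs of whitespace and strip leading/trailing blanks."""
    return re.sub(r"\s+", " ", text).strip()

conn = sqlite3.connect("datastore.db")
conn.execute("CREATE TABLE IF NOT EXISTS docs (source TEXT, content TEXT)")
for doc in documents:  # `documents` comes from the loading sketch above
    conn.execute(
        "INSERT INTO docs VALUES (?, ?)",
        (doc.metadata.get("source", "unknown"), normalize(doc.page_content)),
    )
conn.commit()
```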

  • Further down the pipeline, we performed text splitting to break the text into smaller segments, called chunks, that fit within the embedding models' input limits.
  • Embeddings were then generated from these chunks; an embedding is a vector representation of the data that captures its semantic meaning.
  • Eventually, the embeddings were organized into semantic indexes, which were stored in the knowledge base as a vector database. Results retrieved from it are ranked and fed into the local LLM.
  • Queries are answered by running semantic search over the embeddings, giving responses to users' questions. A sketch of these steps follows the figure below.

(Figure: chunking, embedding, and knowledge-base indexing)
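
The list above maps onto a short LangChain sketch; the splitter settings, the MiniLM embedding model, and the Chroma vector store are assumptions rather than the project's exact choices:

```python
# Chunking, embedding, indexing, and semantic search. Chunk sizes,
# the embedding model, and Chroma are illustrative choices.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

# 1. Text splitting: break documents into chunks sized for the
#    embedding model's input limits.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# 2. Embeddings: vector representations capturing semantic meaning.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# 3. Knowledge base: persist the semantic index in a vector database.
vectordb = Chroma.from_documents(chunks, embeddings, persist_directory="kb")

# 4. Semantic search: rank the most relevant chunks for a user query;
#    these are then fed to the local LLM as context.
for doc in vectordb.similarity_search("How did retail sales trend?", k=4):
    print(doc.metadata.get("source"), doc.page_content[:80])
```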

Operations & Orchestration Implementation:

  • Implemented a microservices-based architecture supporting both Docker and Podman, splitting the workload into two distinct containers: one for the API backend handling RAG pipelines and LLM-generated responses, and another for the user interface built with Flask/Streamlit (a minimal backend sketch follows this list).

  • Deployed these containers as separate services on K3s and RKE/RKE2 clusters, exposing different network ports, with Rancher serving as the central Kubernetes management platform for streamlined operations and monitoring (see the deployment sketch below).
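
To illustrate the API backend container, here is a minimal Flask sketch; the /query route, port 8000, and the local-LLM stub are assumptions, not the project's actual interface:

```python
# Minimal sketch of the API backend container. The /query route, port,
# and local_llm stub are illustrative; `vectordb` is the vector store
# from the RAG pipeline sketch above.
from flask import Flask, jsonify, request

app = Flask(__name__)

def local_llm(prompt: str) -> str:
    # Stand-in for the locally hosted LLM; replace with a real client.
    return f"(local LLM answer for: {prompt[:60]}...)"

@app.route("/query", methods=["POST"])
def query():
    question = request.get_json()["question"]
    docs = vectordb.similarity_search(question, k=4)  # retrieve context
    context = "\n".join(d.page_content for d in docs)
    answer = local_llm(f"Context:\n{context}\n\nQuestion: {question}")
    return jsonify({"answer": answer})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)  # port exposed by the container
```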

(Figure: containerized services deployed on K3s/RKE2 via Rancher)
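
And a hedged sketch of the deployment step, using the official kubernetes Python client against a kubeconfig exported from Rancher; the image name, namespace, and NodePort are assumptions:

```python
# Deployment sketch using the official `kubernetes` Python client.
# Image name, namespace "default", and NodePort 30080 are illustrative.
from kubernetes import client, config

config.load_kube_config()  # kubeconfig downloaded from Rancher

labels = {"app": "rag-api"}
deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="rag-api"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(containers=[client.V1Container(
                name="rag-api",
                image="registry.example.com/rag-api:latest",  # hypothetical
                ports=[client.V1ContainerPort(container_port=8000)],
            )]),
        ),
    ),
)
service = client.V1Service(
    api_version="v1",
    kind="Service",
    metadata=client.V1ObjectMeta(name="rag-api"),
    spec=client.V1ServiceSpec(
        type="NodePort",
        selector=labels,
        # Expose the API on its own node port, per service.
        ports=[client.V1ServicePort(port=8000, node_port=30080)],
    ),
)

client.AppsV1Api().create_namespaced_deployment("default", deployment)
client.CoreV1Api().create_namespaced_service("default", service)
```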

Complete Project Workflow 🔍

(Figure: complete project workflow)
