OMI All Hands – DWG Update – 03/27/2025

OMI Data Pipeline

  • Discussion around central repo UX with Kent. Cheezy has begun a refactor of the front page.
  • Jimmy returns! He was without internet for a few weeks.
  • Several PRs are out there that need review.
  • Could still use more help on the central repo.
  • Dr. Head is providing some cool resizing utils for the pipelines; we're currently working on integrating them.

Merged

  • #184: Auth page and header design update
  • #178: Update JSONL test image urls
  • #177: Jsonl upload prototype

Peer Review

  • #186: Keep additional metadata information
  • #185: Fixes to aws infrastructure

Graphcap

  • We are in alpha testing! Currently working to onboard folks. Sign up via the [Alpha Test Form](https://docs.google.com/forms/d/e/1FAIpQLSezl0Z2hPnW8zeNeCdzINiQWhwhR52yEIoCpwUUz2J7GlAiBw/viewform?usp=sharing).
  • The current primary focus is setting up the system to handle distributed compute. We are currently looking at an event-based system with Kafka.
  • Reworking the inference server into an inference bridge that is completely devoid of state/configuration knowledge, so it works well with distributed loads and setups.
  • Currently using browser storage for captions while we sort out the distributed storage flow.
  • Ollama support is now working. Their function calling and structured output are pretty rough compared to others, but it works. I get failures in places where I don't with vLLM, so more retry logic is needed (a minimal retry sketch follows this list).
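As a hedge against those failures, a minimal retry sketch could look like the following. This is a hypothetical helper, not actual Graphcap code, and it assumes the provider call returns structured output as a JSON string:

```python
# Hypothetical retry helper, not actual Graphcap code: wraps any provider call
# (Ollama, vLLM, ...) that should return structured output as a JSON string,
# and retries with a simple linear backoff when the output is malformed.
import json
import time


def caption_with_retries(call_model, max_attempts=3, delay_s=1.0):
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            raw = call_model()
            return json.loads(raw)  # reject malformed structured output
        except (json.JSONDecodeError, RuntimeError) as err:
            last_error = err
            time.sleep(delay_s * attempt)  # back off a little more each attempt
    raise RuntimeError(f"caption failed after {max_attempts} attempts: {last_error}")
```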

I'll include some more info on the current architecture at the end of this update.

New Stuff

Initial UI Merged to main

Contains basic perspective captioning


Perspective Management


Repository & Project Status

Merged

  • #27 : Graphcap Alpha Client
  • #28 : Perspective Library Wizard

Active

  • #32 : Provider Config & Inference Bridge

On Deck

  • Fixes for dataset upload
  • Content index pipeline & save to Postgres
  • Save annotations in Postgres
  • UX/Onboarding tweaks
  • Batch captioning / Synthesizer Workflow

Architecture & Technical Direction

Current Architecture

Graphcap follows a modular, service-based architecture designed for small to medium deployments with a local-first approach:

  • React Client: Serves as both the UI and system orchestrator
  • Data Service: Manages database operations via PostgreSQL
  • Inference Bridge: Stateless service that performs AI captioning (a minimal sketch follows the diagram below)
  • Media Server: Handles image storage and processing
┌─────────────────────────────────────┐
│           React Client              │
│           (Orchestrator)            │
└───────┬─────────┬─────────┬─────────┘
        │         │         │
        ▼         ▼         ▼
┌───────────┐ ┌───────────┐ ┌─────────────┐
│  Data     │ │ Inference │ │ Media Server│
│  Service  │ │ Bridge    │ │             │
└─────┬─────┘ └─────┬─────┘ └──────┬──────┘
      │             │              │
      ▼             │              │
┌──────────┐        │              │
│PostgreSQL│        │              │
└──────────┘        │              │
                    │              │
                    ▼              ▼
             ┌─────────────────────────┐
             │  Workspace Volume       │
             │  (Shared Storage)       │
             └─────────────────────────┘
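To make the "stateless" design concrete, here is a minimal sketch of what an inference-bridge endpoint could look like. The `/caption` path, field names, and `run_inference` stub are illustrative assumptions, not the actual Graphcap API:

```python
# Minimal sketch of a stateless inference-bridge endpoint (FastAPI).
# The /caption path, field names, and run_inference stub are assumptions,
# not the real Graphcap API.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class CaptionRequest(BaseModel):
    image_path: str   # path inside the shared workspace volume
    perspective: str  # perspective schema to apply
    provider: dict    # full provider config travels with the request,
                      # so the bridge holds no configuration state of its own


class CaptionResult(BaseModel):
    image_path: str
    perspective: str
    caption: dict     # structured output matching the perspective schema


def run_inference(image_path: str, perspective: str, provider: dict) -> dict:
    # placeholder for the actual provider call (vLLM, Ollama, ...)
    return {"perspective": perspective, "summary": "placeholder"}


@app.post("/caption", response_model=CaptionResult)
def caption(req: CaptionRequest) -> CaptionResult:
    result = run_inference(req.image_path, req.perspective, req.provider)
    return CaptionResult(image_path=req.image_path, perspective=req.perspective, caption=result)
```

Because all configuration arrives with each request, any number of identical bridges can sit behind a load balancer or, later, behind Kafka.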

Evolution with Kafka

As we scale beyond single-machine deployments, we're integrating Kafka to connect system components and enable distributed processing:

┌─────────────────┐                         ┌─────────────────┐
│                 │                         │                 │
│  React Client   │─────────────────────────│ Inference Bridge│
│  (Orchestrator) │                         │     Pool        │
│                 │◀────────────┐           └────────┬────────┘
└─────┬───────────┘             │                    │
      │                         │                    │
      │                         │                    │
      │                         │                    │ Process
      │                         │                    │ Requests
      │                         │                    │
      ▼                         │                    ▼
┌─────────────────┐    ┌────────┴────────┐    ┌─────────────────┐
│                 │    │                 │    │                 │
│  Data Service   │───▶│    Kafka        │◀──│  Inference      │
│                 │    │  Event Bus      │    │  Results        │
└─────┬───────────┘    └────────┬────────┘    └─────────────────┘
      │                         │                                
      │                         │                                
      ▼                         │                               
┌─────────────────┐             │                               
│   PostgreSQL    │◀────────────┘                               
│                 │                                            
└─────────────────┘                                             

Why Kafka for Our Perspective System

Using Kafka as our event backbone provides several advantages:

  1. Scalable Perspective Processing: Multiple Inference Bridges can process perspectives in parallel

  2. Decoupled Components: Services can scale independently based on demand

  3. Support for Perspective Pipelines: Process complex perspective chains (such as generating several perspective results and then synthesizing captions from them across all images in a dataset)

  4. Batch Job Processing: Our batch system can queue large numbers of images and perspectives for processing across distributed workers

  5. Fault Tolerance: Work can be retried and resumed if any component fails

The batch job system we're building will track progress in PostgreSQL while allowing distributed processing through Kafka, making it possible to process thousands of images with multiple perspectives efficiently.
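As a rough illustration of that split (progress tracked relationally, work fanned out through Kafka), creating a batch job could look like the sketch below. The table, columns, and message fields are assumptions; the topic name matches the structure described in the next section:

```python
# Sketch only: record a batch job in PostgreSQL, then publish one
# job_items.pending message per (image, perspective) pair for distributed workers.
# Table name, columns, and message fields are illustrative assumptions.
import json
import uuid

import psycopg2
from confluent_kafka import Producer

conn = psycopg2.connect("dbname=graphcap")
producer = Producer({"bootstrap.servers": "localhost:9092"})


def create_batch_job(images: list[str], perspectives: list[str]) -> str:
    job_id = str(uuid.uuid4())
    with conn, conn.cursor() as cur:
        # progress lives in PostgreSQL so the client can follow it via jobs.status
        cur.execute(
            "INSERT INTO batch_jobs (id, total_items, status) VALUES (%s, %s, 'pending')",
            (job_id, len(images) * len(perspectives)),
        )
    for image in images:
        for perspective in perspectives:
            item = {"job_id": job_id, "image_path": image, "perspective": perspective}
            producer.produce("job_items.pending", json.dumps(item).encode("utf-8"))
    producer.flush()
    return job_id
```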

As we continue alpha testing, this architecture will help us move from browser-based storage to a fully distributed system that maintains performance as we scale.

Topic Structure

Kafka Topics for Graphcap

Core Event Topics

┌─────────────────────────────────────────────────────────────┐
│                     CAPTION PROCESSING                      │
├─────────────────────────────────────────────────────────────┤
│ captions.request                                            │
│ - Single image caption requests                             │
│ - Includes image path, perspective, provider info           │
├─────────────────────────────────────────────────────────────┤
│ captions.result                                             │
│ - Completed caption results                                 │
│ - Contains structured output based on perspective schema    │
├─────────────────────────────────────────────────────────────┤
│ captions.failed                                             │
│ - Failed caption attempts                                   │
│ - Includes error details for retry or reporting             │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                       BATCH PROCESSING                      │
├─────────────────────────────────────────────────────────────┤
│ jobs.created                                                │
│ - New batch job metadata                                    │
│ - Includes job type, configuration, priority                │
├─────────────────────────────────────────────────────────────┤
│ jobs.status                                                 │
│ - Job status changes (running, completed, failed)           │
│ - Progress updates and statistics                           │
├─────────────────────────────────────────────────────────────┤
│ job_items.pending                                           │
│ - Individual work items ready for processing                │
│ - High volume, partitioned by image collections             │
├─────────────────────────────────────────────────────────────┤
│ job_items.completed                                         │
│ - Processed work items with results                         │
├─────────────────────────────────────────────────────────────┤
│ job_items.failed                                            │
│ - Failed work items with error information                  │
└─────────────────────────────────────────────────────────────┘

Supporting Topics

┌─────────────────────────────────────────────────────────────┐
│                      MEDIA MANAGEMENT                       │
├─────────────────────────────────────────────────────────────┤
│ media.uploaded                                              │
│ - New images added to the system                            │
│ - Triggers preprocessing and thumbnail generation           │
├─────────────────────────────────────────────────────────────┤
│ media.processed                                             │
│ - Images that have completed preprocessing                  │
│ - Ready for captioning                                      │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│                       SYSTEM EVENTS                         │
├─────────────────────────────────────────────────────────────┤
│ system.logs                                                 │
│ - Centralized logging from all components                   │
│ - Structured with service name, level, message              │
├─────────────────────────────────────────────────────────────┤
│ system.metrics                                              │
│ - Performance and health metrics                            │
│ - Aggregated for monitoring and dashboards                  │
├─────────────────────────────────────────────────────────────┤
│ perspectives.updated                                        │
│ - Changes to perspective library                            │
│ - Triggers inference bridge reloads                         │
└─────────────────────────────────────────────────────────────┘
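For reference, the topics above could be provisioned along these lines; the partition counts and replication factor are placeholder assumptions, not settled values:

```python
# Sketch: provisioning the Graphcap topics with confluent-kafka's AdminClient.
# Partition counts and replication factor are placeholder assumptions.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

topics = [
    # high-volume topics get more partitions so bridges can consume in parallel
    NewTopic("captions.request", num_partitions=12, replication_factor=1),
    NewTopic("captions.result", num_partitions=12, replication_factor=1),
    NewTopic("captions.failed", num_partitions=3, replication_factor=1),
    NewTopic("jobs.created", num_partitions=1, replication_factor=1),
    NewTopic("jobs.status", num_partitions=3, replication_factor=1),
    NewTopic("job_items.pending", num_partitions=12, replication_factor=1),
    NewTopic("job_items.completed", num_partitions=12, replication_factor=1),
    NewTopic("job_items.failed", num_partitions=3, replication_factor=1),
    NewTopic("media.uploaded", num_partitions=3, replication_factor=1),
    NewTopic("media.processed", num_partitions=3, replication_factor=1),
    NewTopic("system.logs", num_partitions=3, replication_factor=1),
    NewTopic("system.metrics", num_partitions=3, replication_factor=1),
    NewTopic("perspectives.updated", num_partitions=1, replication_factor=1),
]

# create_topics is asynchronous; wait on each future to surface errors
for topic, future in admin.create_topics(topics).items():
    future.result()
```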

Topic Usage Patterns

For the distributed captioning workflow (a worker-side sketch follows the steps):

  1. React Client or batch job system publishes to captions.request
  2. Inference Bridges consume from captions.request (partitioned for parallel processing)
  3. After processing, bridges publish to captions.result or captions.failed
  4. Data Service consumes results and updates database
  5. React Client (or notification system) receives updates via jobs.status
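A worker-side sketch of steps 2 and 3 might look like the following; the message field names and the `run_inference` stub are assumptions:

```python
# Sketch of an inference-bridge worker for steps 2-3: consume captions.request,
# caption the image, then publish to captions.result or captions.failed.
# Message field names and the run_inference stub are illustrative assumptions.
import json

from confluent_kafka import Consumer, Producer


def run_inference(image_path, perspective, provider):
    # placeholder for the actual provider call (vLLM, Ollama, ...)
    return {"perspective": perspective, "summary": "placeholder"}


consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "inference-bridge",  # all bridges share one group, so partitions spread across them
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["captions.request"])
producer = Producer({"bootstrap.servers": "localhost:9092"})

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    request = json.loads(msg.value())
    try:
        caption = run_inference(request["image_path"], request["perspective"], request["provider"])
        producer.produce("captions.result", json.dumps({**request, "caption": caption}).encode("utf-8"))
    except Exception as err:
        producer.produce("captions.failed", json.dumps({**request, "error": str(err)}).encode("utf-8"))
    producer.flush()
```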

For perspective dependency chains (a chain-processor sketch follows the steps):

  1. Use job_items.completed as trigger for dependent perspectives
  2. Chain processors subscribe to job_items.completed and look for prerequisite perspectives
  3. When prerequisites are found, new requests are published to captions.request
  4. Chain recorded in batch system via batchJobDependencies table
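A chain processor for those steps could be sketched like this; the CHAINS mapping and message fields are hypothetical:

```python
# Sketch of a chain processor: when a prerequisite perspective completes,
# request the dependent perspective for the same image.
# The CHAINS mapping and message fields are illustrative assumptions.
import json

from confluent_kafka import Consumer, Producer

CHAINS = {"object_detection": "synthesized_caption"}  # prerequisite -> dependent

consumer = Consumer({"bootstrap.servers": "localhost:9092", "group.id": "chain-processor"})
consumer.subscribe(["job_items.completed"])
producer = Producer({"bootstrap.servers": "localhost:9092"})

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    item = json.loads(msg.value())
    dependent = CHAINS.get(item.get("perspective"))
    if dependent:
        # publish a follow-up request; the batch system records the link
        # in its batchJobDependencies table
        follow_up = {"image_path": item["image_path"], "perspective": dependent,
                     "provider": item.get("provider")}
        producer.produce("captions.request", json.dumps(follow_up).encode("utf-8"))
        producer.flush()
```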

This topic structure balances:

  • Throughput (high-volume topics partitioned for scale)
  • Organization (logical grouping by domain)
  • Observability (dedicated topics for failures and monitoring)
  • Flexibility (supporting different usage patterns)