- What are Vector Databases, Embeddings, and RAG?
- Why they matter in modern AI applications
- How the DevOps AI Toolkit uses these technologies for intelligent pattern matching
- Traditional databases vs Vector databases
- Storing and searching high-dimensional data
- Similarity search using distance metrics (cosine similarity)
- Key benefits: semantic search, scalability, performance
- Converting text to numerical representations
- Capturing semantic meaning in vector space
- Embedding models (OpenAI's text-embedding-3-small)
- Dimension reduction and information density
- Combining retrieval systems with generative AI
- Enhancing AI responses with relevant context
- Reducing hallucinations and improving accuracy
- Dynamic knowledge base integration
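
The similarity search mentioned above relies on cosine similarity between embedding vectors. A minimal sketch of that metric follows; the three-dimensional vectors are made-up toy values for illustration (real `text-embedding-3-small` vectors have 1536 dimensions), and the function name is hypothetical rather than taken from the toolkit.

```typescript
// Cosine similarity between two equal-length vectors.
// Returns a value in [-1, 1]; values near 1 mean the vectors point the same way.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    return 0; // A zero vector has no direction; treat it as "no similarity".
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy "embeddings" (assumed values, for illustration only).
const goApiVector = [0.9, 0.1, 0.3];
const goMicroserviceVector = [0.8, 0.2, 0.4];
const postgresVector = [0.1, 0.9, 0.2];

const similarityClose = cosineSimilarity(goApiVector, goMicroserviceVector);
const similarityFar = cosineSimilarity(goApiVector, postgresVector);
console.log(similarityClose > similarityFar); // true
```

Because the metric depends only on direction, not magnitude, two texts of very different length can still score as close neighbours.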
graph TB
subgraph "Developer Interaction"
D[Developer] -->|1 Deploy a stateless Go API| AI[AI Assistant]
end
subgraph "Pattern Creation Flow - Happens First"
PE[Platform Engineer] -->|A1 Creates Pattern| MCP[MCP Tool]
MCP -->|A2 Step-by-Step Workflow| PS[Pattern Service]
PS -->|A3 Generate Embedding| ES[Embedding Service]
PS -->|A5 Store Pattern + Vector| VDB[(Qdrant Vector DB)]
end
subgraph "RAG Recommendation Flow - When User Requests Deployment"
AI -->|2 Extract Intent| RR[Resource Recommender]
%% Pattern Search Branch
RR -->|3a Extract Concepts| CE[Concept Extractor]
CE -->|5 Concepts| RR
RR -->|6 Search Patterns| PVS[Pattern Vector Service]
PVS -->|7a Keyword Search| VDB
PVS -->|7b Semantic Search| ES2[Embedding Service]
ES2 -->|10 Query Vector| VDB
VDB -->|11 Similar Patterns| PVS
PVS -->|12 Ranked Patterns| RR
RR -->|15 Selected Resources| RR
%% Final Recommendation
AI -->|20 Deployment Manifests| D
end
%% External Services
Claude[Claude AI]
OAI[OpenAI API]
K8S[(Kubernetes API)]
%% Pattern Creation connections to AI
ES -->|A4 Text to Vector| OAI
%% RAG Flow connections to AI
CE -->|3b Extract Concepts| Claude
Claude -->|4 Returns concepts| CE
ES2 -->|8 Generate Query Embedding| OAI
OAI -->|9 Vector Response| ES2
RR -->|14 Select Candidates| Claude
Claude -->|15 Candidates| RR
RR -->|18 Generate Recommendation| Claude
Claude -->|19 Enhanced Response| AI
%% Kubernetes API connections - INEFFICIENT!
RR -->|3c Get ALL Resources| K8S
K8S -->|13 Resource List| RR
RR -->|16 For EACH Resource| K8S
K8S -->|17 Detailed Schema| RR
style VDB fill:#8b008b,stroke:#333,stroke-width:2px,color:#fff
style ES fill:#4169e1,stroke:#333,stroke-width:2px,color:#fff
style ES2 fill:#4169e1,stroke:#333,stroke-width:2px,color:#fff
style PVS fill:#228b22,stroke:#333,stroke-width:2px,color:#fff
style K8S fill:#dc143c,stroke:#333,stroke-width:3px,color:#fff
style Claude fill:#4b0082,stroke:#333,stroke-width:2px,color:#fff
style OAI fill:#4b0082,stroke:#333,stroke-width:2px,color:#fff
The diagram highlights two major inefficiencies (shown in red):
- Step 3c (Get ALL Resources): The system fetches ALL available Kubernetes resources using `kubectl api-resources`, even though only a few will be selected. In a typical cluster this means hundreds of resources.
- Steps 16-17 (Individual Schema Fetching): For EACH selected resource, the system makes a separate call to `kubectl explain resource --recursive`. This results in multiple round trips to the Kubernetes API.
Critical finding: The current pattern matching system (steps 3a-12) does NOT reduce Kubernetes API calls. It only:
- Provides additional context to Claude AI for better selection
- Influences which resources are selected from the FULL list
- Adds organizational knowledge to the final recommendation
But the system STILL:
- Fetches ALL resources from Kubernetes (step 3c)
- Makes individual schema calls for EACH selected resource (steps 16-17)
A proper RAG implementation for Kubernetes resources would:
- Pre-index all Kubernetes resources in Vector DB:
  - Store resource definitions with embeddings
  - Include schema information to avoid runtime fetching
  - Update periodically or on-demand
- Semantic search for resources:
  - User intent: "deploy a stateless Go API"
  - Vector search returns: Deployment, Service, HPA (without calling K8s API)
  - Skip the "fetch ALL resources" step entirely
- Benefits:
  - No need to fetch ALL resources
  - No need for individual schema calls
  - Faster response times
  - Reduced load on Kubernetes API
graph LR
subgraph "Current: Inefficient"
A1[User Intent] --> B1[Fetch ALL Resources]
B1 --> C1[AI Selects Few]
C1 --> D1[Fetch Each Schema]
end
subgraph "Proposed: Efficient RAG"
A2[User Intent] --> B2[Vector Search]
B2 --> C2[Returns Relevant Resources]
C2 --> D2[Schemas Already Cached]
end
style B1 fill:#dc143c,color:#fff
style D1 fill:#dc143c,color:#fff
style B2 fill:#228b22,color:#fff
style D2 fill:#228b22,color:#fff
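
The proposed flow can be sketched with a small in-memory index. This is only an illustration of the idea, not the toolkit's implementation: `ResourceIndex`, `IndexedResource`, the toy vectors, and the schema strings are all hypothetical, and a real version would store the points in Qdrant and obtain the vectors from the embedding service.

```typescript
// One pre-indexed Kubernetes resource kind (hypothetical shape).
interface IndexedResource {
  kind: string;
  vector: number[]; // embedding of the resource description (toy values here)
  schema: string;   // schema text cached at index time, so no runtime lookup is needed
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

class ResourceIndex {
  private readonly resources: IndexedResource[] = [];

  // Insert or replace a resource; this is what a periodic re-index would call.
  upsert(resource: IndexedResource): void {
    const position = this.resources.findIndex(r => r.kind === resource.kind);
    if (position >= 0) {
      this.resources[position] = resource;
    } else {
      this.resources.push(resource);
    }
  }

  // Return the `limit` resources closest to the query vector.
  search(queryVector: number[], limit: number): IndexedResource[] {
    return this.resources
      .map(resource => ({ resource, score: cosine(queryVector, resource.vector) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, limit)
      .map(entry => entry.resource);
  }
}

const resourceIndex = new ResourceIndex();
resourceIndex.upsert({ kind: "Deployment", vector: [0.9, 0.2, 0.1], schema: "cached Deployment schema" });
resourceIndex.upsert({ kind: "Service", vector: [0.7, 0.4, 0.1], schema: "cached Service schema" });
resourceIndex.upsert({ kind: "StatefulSet", vector: [0.1, 0.2, 0.9], schema: "cached StatefulSet schema" });

// Assumed embedding of "deploy a stateless Go API".
const intentVector = [0.8, 0.3, 0.1];
const relevantResources = resourceIndex.search(intentVector, 2);
console.log(relevantResources.map(r => `${r.kind}: ${r.schema}`));
```

Both the candidate selection and the schema lookup are served from the index, so neither step needs a call to the Kubernetes API at request time.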
classDiagram
class VectorDBService {
-client: QdrantClient
-collectionName: string
+initializeCollection(vectorSize)
+upsertDocument(document)
+searchSimilar(vector, options)
+searchByKeywords(keywords, options)
+healthCheck()
}
class EmbeddingService {
-provider: OpenAIEmbeddingProvider
+generateEmbedding(text)
+isAvailable()
+getDimensions()
+createPatternSearchText(pattern)
}
class PatternVectorService {
-vectorDB: VectorDBService
-embeddingService: EmbeddingService
+storePattern(pattern)
+searchPatterns(query, options)
+hybridSearch(query, keywords)
+keywordOnlySearch(keywords)
}
class OrganizationalPattern {
+id: string
+description: string
+triggers: string[]
+suggestedResources: string[]
+rationale: string
+createdAt: string
+createdBy: string
}
PatternVectorService --> VectorDBService
PatternVectorService --> EmbeddingService
PatternVectorService --> OrganizationalPattern
- Platform engineer initiates pattern creation via MCP tool
- Step-by-step workflow collects pattern details
- Pattern data structure includes:
  - Description and rationale
  - Trigger keywords for matching
  - Suggested Kubernetes resources
  - Metadata (creator, timestamp)
Platform Engineer: I want to create a new organizational deployment pattern
AI Assistant: I'll help you create a new organizational deployment pattern. Let me guide you through the process.
**Pattern Description**
Please provide a clear description of this deployment pattern. What type of application or service does it support?
Platform Engineer: This is a pattern for deploying stateless web applications that need high availability and auto-scaling
AI Assistant: Great! Now I need some trigger keywords.
**Trigger Keywords**
What keywords or phrases should trigger this pattern? When users mention these terms, this pattern will be suggested.
Please provide keywords or phrases, for example:
- "stateless app"
- "web application"
- "REST API"
Platform Engineer: stateless app, web application, api service, microservice, rest api, http service
AI Assistant: Perfect! Now let's define the Kubernetes resources.
**Suggested Resources**
Which Kubernetes resources should be included in this pattern? These will be recommended when this pattern matches.
Common examples: Deployment, Service, ConfigMap, Secret, HorizontalPodAutoscaler, NetworkPolicy, Ingress
Platform Engineer: Deployment, Service, HorizontalPodAutoscaler, NetworkPolicy
AI Assistant: Excellent choices! Finally, I need a rationale.
**Pattern Rationale**
Please explain why this combination of resources is recommended. This helps users understand the pattern's value.
Platform Engineer: This pattern provides a robust foundation for stateless applications with automatic scaling based on load, service discovery, and network security. The HPA ensures the application scales with demand, while NetworkPolicy provides defense-in-depth security.
AI Assistant: Perfect! I've created your organizational pattern:
✅ **Pattern created successfully!**
- **ID**: pat_2025-01-30T154823_a7b9c3d2
- **Description**: Pattern for deploying stateless web applications that need high availability and auto-scaling
- **Triggers**: stateless app, web application, api service, microservice, rest api, http service
- **Resources**: Deployment, Service, HorizontalPodAutoscaler, NetworkPolicy
- **Rationale**: Provides robust foundation with auto-scaling and security
This pattern will now be used to enhance deployment recommendations when users mention stateless applications.
// From src/core/pattern-types.ts
interface OrganizationalPattern {
  id: string;
  description: string;
  triggers: string[];           // ["stateless app", "web api", "microservice"]
  suggestedResources: string[]; // ["Deployment", "Service", "HPA"]
  rationale: string;
  createdAt: string;
  createdBy: string;
}
sequenceDiagram
participant PS as Pattern Service
participant ES as Embedding Service
participant OAI as OpenAI API
participant VDB as Vector DB
PS->>ES: 1 createPatternSearchText(pattern)
Note over ES: Combines description,<br/>triggers, resources,<br/>rationale into text
ES->>OAI: 2 generateEmbedding(searchText)
OAI-->>ES: 3 Returns 1536-dimension vector
ES-->>PS: 4 embedding vector
PS->>VDB: 5 upsertDocument with<br/>id, payload, vector
Note over VDB: Stores pattern with<br/>semantic embedding
- Text preparation combines all pattern fields
- OpenAI's text-embedding-3-small model generates 1536-dimension vectors
- Graceful fallback to keyword-only search if embedding fails
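
The text-preparation step can be illustrated as follows. The method name `createPatternSearchText` appears in the class diagram, but its body is not shown in this document, so the joining logic below is an assumption: a minimal sketch that concatenates the four fields named in the sequence diagram.

```typescript
interface OrganizationalPattern {
  id: string;
  description: string;
  triggers: string[];
  suggestedResources: string[];
  rationale: string;
  createdAt: string;
  createdBy: string;
}

// Hypothetical body: combine the meaning-bearing fields into one text block.
// Metadata (id, createdAt, createdBy) is left out because it carries no semantics.
function createPatternSearchText(pattern: OrganizationalPattern): string {
  return [
    pattern.description,
    pattern.triggers.join(", "),
    pattern.suggestedResources.join(", "),
    pattern.rationale,
  ]
    .map(part => part.trim())
    .filter(part => part.length > 0)
    .join("\n");
}

const statelessPattern: OrganizationalPattern = {
  id: "pat_2025-01-30T154823_a7b9c3d2",
  description: "Pattern for deploying stateless web applications",
  triggers: ["stateless app", "web application", "rest api"],
  suggestedResources: ["Deployment", "Service", "HorizontalPodAutoscaler"],
  rationale: "Provides robust foundation with auto-scaling and security",
  createdAt: "2025-01-30T15:48:23Z",
  createdBy: "platform-team",
};

const patternSearchText = createPatternSearchText(statelessPattern);
console.log(patternSearchText);
```

The resulting string is what would be sent to the embedding model, so every field included here influences where the pattern lands in vector space.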
graph LR
subgraph "Search Input"
Q[Query: deploy golang api] -->|1 Process| QP[Query Processor]
end
subgraph "Dual Search Strategy"
QP -->|2a Keywords| KS[Keyword Search]
QP -->|2b Full Text| SS[Semantic Search]
KS -->|3a Extract Keywords| KW[golang, api]
SS -->|3b Generate Embedding| QE[Query Embedding]
KW -->|4a Match Triggers| VDB1[(Vector DB)]
QE -->|4b Cosine Similarity| VDB2[(Vector DB)]
end
subgraph "Result Combination"
VDB1 -->|5a Keyword Results| RC[Result Combiner]
VDB2 -->|5b Semantic Results| RC
RC -->|6 Weighted Scoring| FR[Final Rankings]
end
style RC fill:#ffa,stroke:#333,stroke-width:2px
- Semantic Search (70% weight): Uses vector similarity for conceptual matching
- Keyword Search (30% weight): Exact and partial trigger matching; a pattern found only by keywords keeps its full keyword score
- Hybrid Scoring: A pattern found by both searches scores 0.7 × semantic score + 0.3 × keyword score
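
The document does not show how exact and partial trigger matches are scored, so the following is a sketch under stated assumptions: an exact match counts 1.0, a substring match counts 0.5, and the result is averaged over the query keywords. The function name and both values are hypothetical.

```typescript
const EXACT_MATCH_SCORE = 1.0;   // assumed value
const PARTIAL_MATCH_SCORE = 0.5; // assumed value

// Score how well a set of query keywords matches a pattern's triggers (0..1).
function keywordTriggerScore(queryKeywords: string[], triggers: string[]): number {
  const normalize = (values: string[]): string[] =>
    values.map(value => value.trim().toLowerCase()).filter(value => value.length > 0);

  const keywords = normalize(queryKeywords);
  const normalizedTriggers = normalize(triggers);
  if (keywords.length === 0) {
    return 0;
  }

  let total = 0;
  for (const keyword of keywords) {
    let best = 0;
    for (const trigger of normalizedTriggers) {
      if (trigger === keyword) {
        best = EXACT_MATCH_SCORE;
        break; // Nothing can beat an exact match.
      }
      if (trigger.includes(keyword) || keyword.includes(trigger)) {
        best = Math.max(best, PARTIAL_MATCH_SCORE);
      }
    }
    total += best;
  }
  return total / keywords.length;
}

const exampleTriggers = ["stateless app", "rest api", "api service"];
console.log(keywordTriggerScore(["rest api", "golang"], exampleTriggers)); // 0.5
console.log(keywordTriggerScore(["api"], exampleTriggers));                // 0.5
```

In the first call one keyword matches exactly and the other not at all, which averages to 0.5; in the second, the single keyword is only a substring of the triggers.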
// From src/core/pattern-vector-service.ts (abridged)
private combineHybridResults(
  semanticResults: any[],
  keywordResults: any[],
  queryKeywords: string[],
  options: { limit: number; scoreThreshold: number }
): PatternSearchResult[] {
  // ...
  // Semantic results get 70% weight
  resultMap.set(result.id, {
    pattern,
    score: semanticScore * 0.7,
    matchType: 'semantic'
  });
  // ...
  // Keyword results get 30% weight (or 100% if keyword-only)
  if (resultMap.has(result.id)) {
    // Hybrid result - combine scores
    existing.score + (keywordScore * 0.3)
  } else {
    // Keyword-only result - full score
    score: keywordScore
  }
  // ...
}
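
Since the excerpt above omits its loops, here is a self-contained sketch of the same weighting scheme. The 0.7 and 0.3 weights and the full score for keyword-only results come from the excerpt; the type names, the `combineHybrid` function, and the threshold and limit handling are assumptions made for the example.

```typescript
type MatchType = "semantic" | "keyword" | "hybrid";

interface ScoredHit {
  id: string;
  score: number;
}

interface CombinedResult extends ScoredHit {
  matchType: MatchType;
}

const SEMANTIC_WEIGHT = 0.7;
const KEYWORD_WEIGHT = 0.3;

function combineHybrid(
  semanticHits: ScoredHit[],
  keywordHits: ScoredHit[],
  options: { limit: number; scoreThreshold: number }
): CombinedResult[] {
  const byId = new Map<string, CombinedResult>();

  // Semantic results enter with 70% of their score.
  for (const hit of semanticHits) {
    byId.set(hit.id, { id: hit.id, score: hit.score * SEMANTIC_WEIGHT, matchType: "semantic" });
  }

  for (const hit of keywordHits) {
    const existing = byId.get(hit.id);
    if (existing) {
      // Found by both searches: add 30% of the keyword score.
      existing.score += hit.score * KEYWORD_WEIGHT;
      existing.matchType = "hybrid";
    } else {
      // Found by keywords only: keep the full keyword score.
      byId.set(hit.id, { id: hit.id, score: hit.score, matchType: "keyword" });
    }
  }

  return Array.from(byId.values())
    .filter(result => result.score >= options.scoreThreshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, options.limit);
}

const combinedResults = combineHybrid(
  [{ id: "A", score: 0.9 }, { id: "B", score: 0.6 }],
  [{ id: "A", score: 1.0 }, { id: "C", score: 0.8 }],
  { limit: 10, scoreThreshold: 0.5 }
);
console.log(combinedResults.map(r => `${r.id}:${r.matchType}`).join(", ")); // A:hybrid, C:keyword
```

Note one consequence of this scheme: a keyword-only hit (C, 0.8) can outrank a semantic-only hit (B, 0.6 × 0.7 = 0.42), which here also falls below the threshold.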
sequenceDiagram
participant U as User
participant RR as Resource Recommender
participant CE as Concept Extractor
participant PS as Pattern Service
participant Claude as Claude AI
U->>RR: 1 Deploy a stateless Golang API
RR->>CE: 2 extractDeploymentConcepts(intent)
CE->>Claude: 3 Analyze intent for concepts
Claude-->>CE: 4 Returns concepts:<br/>stateless application<br/>golang application
CE-->>RR: 5 Deployment concepts
loop For each concept
RR->>PS: 6 searchPatterns(keywords)
PS-->>RR: 7 Matching patterns
end
RR->>RR: 8 deduplicateAndRankPatterns()
RR->>Claude: 9 Generate recommendation with<br/>user intent, resources, patterns
Claude-->>RR: 10 Enhanced recommendation
RR-->>U: 11 Deployment solution
- Patterns provide organizational context to AI
- AI considers but isn't constrained by patterns
- Pattern suggestions appear in recommendation prompts
- Multi-concept matching for comprehensive coverage
## Organizational Patterns
The following organizational patterns may be relevant:
- ID: pat_2025-01-30T123456_abc123
  Description: Standard pattern for stateless web applications
  Suggested Resources: Deployment, Service, HorizontalPodAutoscaler
  Rationale: Provides scalability and high availability for stateless workloads
  Triggers: stateless app, web application, api service
- ID: pat_2025-01-30T123457_def456
  Description: Network security pattern for public-facing services
  Suggested Resources: NetworkPolicy, Ingress
  Rationale: Enforces network isolation and secure external access
  Triggers: external access, public api, network security
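
A prompt section like the one above could be generated from matched patterns roughly as follows. The function name `formatPatternsForPrompt` is hypothetical; only the output layout is taken from the example above.

```typescript
interface PromptPattern {
  id: string;
  description: string;
  triggers: string[];
  suggestedResources: string[];
  rationale: string;
}

// Render matched patterns as a prompt section; returns "" when nothing matched,
// so the prompt simply omits the section in that case.
function formatPatternsForPrompt(patterns: PromptPattern[]): string {
  if (patterns.length === 0) {
    return "";
  }
  const entries = patterns.map(pattern =>
    [
      `- ID: ${pattern.id}`,
      `  Description: ${pattern.description}`,
      `  Suggested Resources: ${pattern.suggestedResources.join(", ")}`,
      `  Rationale: ${pattern.rationale}`,
      `  Triggers: ${pattern.triggers.join(", ")}`,
    ].join("\n")
  );
  return [
    "## Organizational Patterns",
    "The following organizational patterns may be relevant:",
    ...entries,
  ].join("\n");
}

const promptSection = formatPatternsForPrompt([
  {
    id: "pat_2025-01-30T123456_abc123",
    description: "Standard pattern for stateless web applications",
    triggers: ["stateless app", "web application", "api service"],
    suggestedResources: ["Deployment", "Service", "HorizontalPodAutoscaler"],
    rationale: "Provides scalability and high availability for stateless workloads",
  },
]);
console.log(promptSection);
```

The wording "may be relevant" matters: it presents patterns as context the model can weigh, not as rules it must follow.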
- Traditional Search: Exact keyword matching only
- Vector Search: Understands "golang api" ≈ "go microservice" ≈ "REST service in Go"
- Real Example: User searches "deploy golang api", finds patterns for "Go microservice" and "REST API deployment"
- Platform teams encode best practices as patterns
- Patterns automatically influence all future deployments
- Consistent application of organizational standards
- Knowledge sharing across teams
graph TD
A[Search Request] -->|1 Check Services| B{Embedding<br/>Available?}
B -->|2a Yes| C[Hybrid Search]
B -->|2b No| D[Keyword Search]
C -->|3a Semantic + Keywords| E[Ranked Results]
D -->|3b Keywords Only| E
E -->|4 Pattern Recommendations| F[Pattern Recommendations]
style B fill:#ff8c00,stroke:#333,stroke-width:2px,color:#fff
- System gracefully degrades without embeddings
- Keyword search still provides valuable results
- No hard dependencies on external services
- Vector Dimensions: Automatic handling of dimension mismatches
- Collection Management: Self-healing on corruption
- Caching: Patterns loaded once per recommendation
- Batch Processing: Multiple concept searches in parallel
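
Running the per-concept searches in parallel can be sketched with `Promise.all`. The types and the `fakePatternSearch` stand-in below are hypothetical; in the toolkit the search function would be the pattern service's own search method.

```typescript
interface DeploymentConcept {
  concept: string;
  keywords: string[];
}

interface PatternHit {
  patternId: string;
  score: number;
}

type PatternSearch = (query: string) => Promise<PatternHit[]>;

// Start every concept search at once and wait for all of them to finish.
async function searchConceptsInParallel(
  concepts: DeploymentConcept[],
  search: PatternSearch
): Promise<PatternHit[]> {
  const perConcept = await Promise.all(
    concepts.map(concept => search(concept.keywords.join(" ")))
  );
  const merged: PatternHit[] = [];
  for (const hits of perConcept) {
    merged.push(...hits);
  }
  return merged;
}

// Stand-in for the real pattern search (assumption for this example).
const fakePatternSearch: PatternSearch = async query =>
  query.includes("stateless")
    ? [{ patternId: "pat-stateless", score: 0.92 }]
    : [{ patternId: "pat-go", score: 0.78 }];

const demoConcepts: DeploymentConcept[] = [
  { concept: "stateless application", keywords: ["stateless", "application", "web"] },
  { concept: "golang application", keywords: ["golang", "go"] },
];

searchConceptsInParallel(demoConcepts, fakePatternSearch).then(hits => {
  console.log(hits.map(hit => hit.patternId).join(", "));
});
```

`Promise.all` preserves input order, so results stay aligned with their concepts. It also rejects as soon as any single search fails; `Promise.allSettled` would be the alternative if partial results are acceptable.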
Developer: I want to deploy a stateless Golang web application
AI Assistant: I'll help you deploy a stateless Golang web application. Let me analyze your requirements and find the best solution.
[Behind the scenes: System extracts concepts, searches patterns, finds 2 matching patterns]
I found 2 deployment solutions that match your needs:
**Solution 1: Kubernetes Stateless Application** (Recommended)
Deploy a stateless Golang web application using Kubernetes native resources with auto-scaling and monitoring.
**Solution 2: Crossplane Composite Application**
Use Crossplane to create a composable infrastructure for your Golang application with cloud-native resources.
Which solution would you like to proceed with?
Developer: Let's go with solution 1
AI Assistant: Great choice! I'll guide you through configuring your Kubernetes stateless application.
**Application Details**
What is the name of your Golang application?
Developer: payment-api
AI Assistant: **Container Image**
What is the container image for your payment-api? (e.g., myregistry/payment-api:v1.0.0)
Developer: gcr.io/myproject/payment-api:v2.1.0
AI Assistant: **Namespace**
Which Kubernetes namespace should the payment-api be deployed to?
Developer: production
AI Assistant: **Port Configuration**
What port does your payment-api listen on?
Developer: 8080
AI Assistant: Now let's configure some optional settings.
**Replicas**
How many replicas would you like to start with? (default: 2)
Developer: 3
AI Assistant: **Auto-scaling**
Would you like to enable auto-scaling? (yes/no)
Developer: yes
AI Assistant: **Auto-scaling Configuration**
- Target CPU utilization percentage? (default: 70)
Developer: 80
AI Assistant: - Maximum number of replicas? (default: 10)
Developer: 20
AI Assistant: Perfect! Configuration complete. Generating your deployment manifests...
[Behind the scenes: Patterns influenced the inclusion of HPA and NetworkPolicy]
Here are your deployment manifests:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
  namespace: production
  labels:
    app: payment-api
    tier: backend
    managed-by: dot-ai
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
        - name: payment-api
          image: gcr.io/myproject/payment-api:v2.1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: payment-api
  namespace: production
spec:
  selector:
    app: payment-api
  ports:
    - port: 8080
      targetPort: 8080
      protocol: TCP
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: payment-api
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: payment-api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: production
      ports:
        - protocol: TCP
          port: 8080
```
Would you like me to deploy these manifests to your cluster?
### Example 2: Pattern Matching Behind the Scenes
#### Developer Intent Analysis
1. **Developer Request**: "I want to deploy a stateless Golang web application"
2. **Concept Extraction**:
```json
{
  "concepts": [
    {
      "category": "application_architecture",
      "concept": "stateless application",
      "importance": "high",
      "keywords": ["stateless", "application", "web"]
    },
    {
      "category": "programming_language",
      "concept": "golang application",
      "importance": "medium",
      "keywords": ["golang", "go"]
    }
  ]
}
```
3. **Pattern Search Process**:

       🔍 Searching for patterns matching: "stateless application web"
       ✅ Found Pattern: pat_2025-01-30T154823_a7b9c3d2 (Score: 0.92)
          - Matched triggers: ["stateless app", "web application"]
          - Match type: hybrid (semantic + keyword)

       🔍 Searching for patterns matching: "golang go"
       ✅ Found Pattern: pat_2025-01-30T160234_b8c4d5e3 (Score: 0.78)
          - Matched triggers: ["go microservice"] via semantic similarity
          - Match type: semantic

4. **AI Prompt Enhancement**:

       ## Organizational Patterns
       The following organizational patterns may be relevant:
       - ID: pat_2025-01-30T154823_a7b9c3d2
         Description: Pattern for stateless web applications with HA and auto-scaling
         Suggested Resources: Deployment, Service, HorizontalPodAutoscaler, NetworkPolicy
         Rationale: Provides robust foundation with auto-scaling and security
       - ID: pat_2025-01-30T160234_b8c4d5e3
         Description: Go microservice deployment pattern
         Suggested Resources: Deployment, ConfigMap, Service
         Rationale: Optimized for Go applications with proper config management

Developer: Show me patterns for database deployments
AI Assistant: Searching for relevant deployment patterns...
[Behind the scenes: OpenAI API unavailable, falling back to keyword search]
Found 1 organizational pattern:
**PostgreSQL Database Pattern**
- **Description**: Production-ready PostgreSQL deployment with persistence
- **Triggers**: database, postgresql, postgres, persistent storage
- **Resources**: StatefulSet, PersistentVolumeClaim, Service, Secret
- **Rationale**: Ensures data persistence and ordered pod management for databases
[Note: Using keyword matching - semantic search temporarily unavailable]
Platform Engineer: Check the system health
AI Assistant: Running comprehensive system diagnostics...
## System Health Report
### Core Services
✅ **Kubernetes API**: Connected (v1.28.3)
✅ **Anthropic Claude API**: Available
### Pattern Management
✅ **Vector Database (Qdrant)**: Connected
- URL: http://localhost:6333
- Collection: patterns
- Documents: 15
- Health: Operational
### Embedding Service
✅ **OpenAI Embeddings**: Available
- Model: text-embedding-3-small
- Dimensions: 1536
- Status: Operational
### Search Capabilities
✅ **Hybrid Search**: Enabled (Semantic + Keyword)
- Semantic weight: 70%
- Keyword weight: 30%
All systems operational. Pattern-enhanced recommendations fully available.
Platform Engineer: Show me all organizational patterns
AI Assistant: Here are all organizational patterns in the system:
**1. Stateless Application Pattern**
- **ID**: pat_2025-01-30T154823_a7b9c3d2
- **Description**: Pattern for stateless web applications with HA and auto-scaling
- **Triggers**: stateless app, web application, api service, microservice, rest api
- **Resources**: Deployment, Service, HorizontalPodAutoscaler, NetworkPolicy
- **Created**: 2025-01-30 by platform-team
**2. Go Microservice Pattern**
- **ID**: pat_2025-01-30T160234_b8c4d5e3
- **Description**: Go microservice deployment pattern
- **Triggers**: go microservice, golang service, go api
- **Resources**: Deployment, ConfigMap, Service
- **Created**: 2025-01-30 by platform-team
**3. PostgreSQL Database Pattern**
- **ID**: pat_2025-01-30T162145_c9d5e6f4
- **Description**: Production-ready PostgreSQL deployment
- **Triggers**: database, postgresql, postgres, persistent storage
- **Resources**: StatefulSet, PersistentVolumeClaim, Service, Secret
- **Created**: 2025-01-30 by dba-team
Total: 3 patterns stored
Platform Engineer: Delete the Go microservice pattern
AI Assistant: I'll delete the Go Microservice Pattern (ID: pat_2025-01-30T160234_b8c4d5e3).
✅ Pattern successfully deleted.
This pattern will no longer influence deployment recommendations. The remaining patterns are still active.
// From src/core/vector-db-service.ts
async upsertDocument(document: VectorDocument): Promise<void> {
  const point: any = {
    id: document.id,
    payload: document.payload
  };
  // Handle documents with or without embeddings
  if (document.vector && document.vector.length > 0) {
    point.vector = document.vector;
  } else {
    // Zero vector placeholder for keyword-only patterns
    point.vector = new Array(384).fill(0);
  }
  await this.client.upsert(this.collectionName, {
    wait: true,
    points: [point]
  });
}
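
The excerpt above hard-codes a 384-element placeholder, while the embedding model described in this document produces 1536-dimension vectors. One way to keep the two in step, sketched here as an assumption and not as the toolkit's code, is to size the placeholder from the collection's configured dimension and reject mismatched vectors early. `resolvePointVector` and its parameters are hypothetical names.

```typescript
// Pick the vector to store for a document, given the collection's vector size.
// - No embedding: return a zero vector of the collection's size (keyword-only pattern).
// - Embedding of the wrong size: fail early instead of letting the database reject it.
function resolvePointVector(
  vector: number[] | undefined,
  collectionDimensions: number
): number[] {
  if (!Number.isInteger(collectionDimensions) || collectionDimensions <= 0) {
    throw new Error(`Invalid collection dimension: ${collectionDimensions}`);
  }
  if (!vector || vector.length === 0) {
    return new Array<number>(collectionDimensions).fill(0);
  }
  if (vector.length !== collectionDimensions) {
    throw new Error(
      `Vector has ${vector.length} dimensions, collection expects ${collectionDimensions}`
    );
  }
  return vector;
}

const placeholderVector = resolvePointVector(undefined, 1536);
console.log(placeholderVector.length); // 1536
```

A caller would pass the same dimension that was used when the collection was initialized, so the placeholder can never drift from the collection's configuration.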
// From src/core/embedding-service.ts
async generateEmbedding(text: string): Promise<number[] | null> {
  if (!this.isAvailable()) {
    return null; // Graceful fallback
  }
  try {
    return await this.provider!.generateEmbedding(text);
  } catch (error) {
    console.warn('Embedding generation failed, falling back to keyword search:', error);
    return null;
  }
}
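
On the search side, a `null` embedding is the signal to switch strategies. The sketch below shows one way a caller might make that decision; `planSearch`, `SearchPlan`, and the whitespace keyword split are hypothetical and stand in for the toolkit's own query processing.

```typescript
type EmbedFn = (text: string) => Promise<number[] | null>;

interface SearchPlan {
  mode: "hybrid" | "keyword";
  queryVector: number[] | null;
  keywords: string[];
}

// Decide between hybrid and keyword-only search based on embedding availability.
async function planSearch(query: string, embed: EmbedFn): Promise<SearchPlan> {
  const keywords = query
    .toLowerCase()
    .split(/\s+/)
    .filter(word => word.length > 0);

  let queryVector: number[] | null = null;
  try {
    queryVector = await embed(query);
  } catch {
    queryVector = null; // Treat provider errors the same as "not available".
  }

  if (queryVector && queryVector.length > 0) {
    return { mode: "hybrid", queryVector, keywords };
  }
  return { mode: "keyword", queryVector: null, keywords };
}

// Simulate an unavailable embedding provider.
const unavailableEmbed: EmbedFn = async () => null;

planSearch("deploy golang api", unavailableEmbed).then(plan => {
  console.log(plan.mode, plan.keywords); // keyword [ 'deploy', 'golang', 'api' ]
});
```

Keeping the decision in one place means the rest of the search code never needs to know why embeddings were unavailable.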
// From src/core/schema.ts
private async searchRelevantPatterns(intent: string): Promise<OrganizationalPattern[]> {
  // Extract multiple concepts from user intent
  const concepts = await this.extractDeploymentConcepts(intent);
  // Search patterns for each concept
  const allPatternMatches: PatternMatch[] = [];
  for (const concept of concepts) {
    const searchResults = await this.patternService.searchPatterns(
      concept.keywords.join(' '),
      { limit: 10 }
    );
    // Weight by concept importance
    const matches = searchResults.map(result => ({
      pattern: result.pattern,
      score: result.score * this.getConceptImportanceWeight(concept.importance),
      matchedConcept: concept,
      matchType: result.matchType
    }));
    allPatternMatches.push(...matches);
  }
  // Deduplicate and rank by combined score
  return this.deduplicateAndRankPatterns(allPatternMatches);
}
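
The excerpt calls `getConceptImportanceWeight` and `deduplicateAndRankPatterns` without showing them. The sketch below is one plausible reading, not the actual implementation: the weight values are invented for illustration, and deduplication here keeps the best-scoring match per pattern (summing the scores would be another reasonable interpretation of "combined score").

```typescript
type Importance = "high" | "medium" | "low";

interface WeightedMatch {
  patternId: string;
  score: number;
  matchedConcept: string;
}

// Assumed weights; the real values are not given in this document.
const IMPORTANCE_WEIGHTS: Record<Importance, number> = {
  high: 1.0,
  medium: 0.7,
  low: 0.4,
};

function conceptImportanceWeight(importance: Importance): number {
  return IMPORTANCE_WEIGHTS[importance];
}

// Keep the highest-scoring match for each pattern, then sort by score descending.
function deduplicateAndRank(matches: WeightedMatch[]): WeightedMatch[] {
  const bestById = new Map<string, WeightedMatch>();
  for (const match of matches) {
    const current = bestById.get(match.patternId);
    if (!current || match.score > current.score) {
      bestById.set(match.patternId, match);
    }
  }
  return Array.from(bestById.values()).sort((a, b) => b.score - a.score);
}

const rankedMatches = deduplicateAndRank([
  { patternId: "stateless", score: 0.92 * conceptImportanceWeight("high"), matchedConcept: "stateless application" },
  { patternId: "go-service", score: 0.78 * conceptImportanceWeight("medium"), matchedConcept: "golang application" },
  { patternId: "stateless", score: 0.5 * conceptImportanceWeight("medium"), matchedConcept: "golang application" },
]);
console.log(rankedMatches.map(m => m.patternId).join(", ")); // stateless, go-service
```

Weighting before deduplication means a pattern matched through a high-importance concept is preferred over the same pattern matched through a lower-importance one.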
- Problem: Existing collection has different vector dimensions
- Solution: Automatic collection recreation with correct dimensions
- Code Reference: `src/core/vector-db-service.ts:88-104`
- Problem: OpenAI API key not configured
- Solution: System falls back to keyword-only search
- Impact: Still functional, but matching relies on trigger keywords alone, without semantic similarity
- Problem: User intent doesn't match any patterns
- Solution: AI provides recommendations without pattern context
- Behavior: Pure AI recommendations based on cluster resources
- Pattern learning from successful deployments
- Feedback loop for pattern effectiveness
- Cross-cluster pattern sharing
- Domain-specific embedding models
- Pattern caching strategies
- Embedding batch generation
- Distributed vector search
- Query result caching
- Multi-language pattern support
- Fuzzy matching improvements
- Contextual re-ranking
- User preference learning
- Vector DBs enable semantic understanding beyond keywords
- Embeddings capture meaning in mathematical space
- RAG combines retrieval with generation for better AI responses
- DevOps AI Toolkit demonstrates practical implementation
- Hybrid search provides best of both worlds
- Graceful degradation ensures reliability
- Organizational patterns enhance without constraining AI