Self-hosted search engine + conservative/Christian news aggregator with AI-powered ranking, semantic search, and instant answers.
```mermaid
graph TB
    subgraph Internet
        USER[👤 Users]
        BRAVE[Brave Search API]
        RSS[RSS/Atom Feeds<br/>20+ Sources]
        WEBSITES[201 Conservative/Christian<br/>Web Domains]
    end

    subgraph "Traefik Ingress"
        SEARCH_DOMAIN[search.nashgo.org]
        NEWS_DOMAIN[news.nashgo.org]
    end

    subgraph "Kubernetes — hoarder.local"
        subgraph "Search & Serve"
            API1[kestrel-api<br/>Replica 1]
            API2[kestrel-api<br/>Replica 2]
        end
        subgraph "Index Management"
            WRITER[kestrel-writer<br/>Singleton]
        end
        subgraph "Data Ingestion"
            CRAWLER0[kestrel-crawler<br/>Shard 0 · 100 domains]
            CRAWLER1[kestrel-crawler<br/>Shard 1 · 101 domains]
            NEWS[kestrel-news<br/>RSS Fetcher]
        end
        subgraph "Storage (NFS PVC)"
            PAGE_IDX[(Tantivy<br/>PageIndex<br/>2,100+ pages)]
            NEWS_IDX[(Tantivy<br/>NewsIndex<br/>686+ articles)]
            FRONTIER0[(SQLite<br/>Frontier DB<br/>Shard 0)]
            FRONTIER1[(SQLite<br/>Frontier DB<br/>Shard 1)]
            NEWS_STATE[(SQLite<br/>News State)]
        end
    end

    subgraph "coredump.local — 5090 GPU"
        RANKER[kestrel-ranker<br/>Python/FastAPI]
        OLLAMA[Ollama]
        subgraph "Models"
            QWEN3B[qwen2.5:3b<br/>Batch Reranking]
            QWEN7B[qwen2.5:7b<br/>Instant Answers]
            NOMIC[nomic-embed-text<br/>Semantic Search]
        end
    end

    USER --> SEARCH_DOMAIN
    USER --> NEWS_DOMAIN
    SEARCH_DOMAIN --> API1
    SEARCH_DOMAIN --> API2
    NEWS_DOMAIN --> API1
    NEWS_DOMAIN --> API2

    API1 & API2 -->|keyword search| PAGE_IDX
    API1 & API2 -->|keyword search| NEWS_IDX
    API1 & API2 -->|meta-search| BRAVE
    API1 & API2 -->|rerank + answer + embed| RANKER
    RANKER --> OLLAMA
    OLLAMA --> QWEN3B & QWEN7B & NOMIC

    CRAWLER0 & CRAWLER1 -->|POST /index/page| WRITER
    NEWS -->|POST /index/news| WRITER
    WRITER -->|write| PAGE_IDX
    WRITER -->|write| NEWS_IDX
    CRAWLER0 --> FRONTIER0
    CRAWLER1 --> FRONTIER1
    CRAWLER0 & CRAWLER1 -->|fetch| WEBSITES
    NEWS -->|fetch| RSS
    NEWS --> NEWS_STATE

    style API1 fill:#4a9eff,color:#fff
    style API2 fill:#4a9eff,color:#fff
    style WRITER fill:#ff9f43,color:#fff
    style CRAWLER0 fill:#2ecc71,color:#fff
    style CRAWLER1 fill:#2ecc71,color:#fff
    style NEWS fill:#9b59b6,color:#fff
    style RANKER fill:#e74c3c,color:#fff
    style OLLAMA fill:#e74c3c,color:#fff
```
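Every crawler and the news fetcher send documents to one `kestrel-writer`. That arrangement fits Tantivy's rule that an index accepts only one `IndexWriter` at a time. The sketch below shows the many-producers, one-writer shape. It swaps the real `POST /index/page` and `POST /index/news` HTTP hops for an in-process `std::sync::mpsc` channel. `IndexJob`, `run_writer`, `simulate` and the batch size are hypothetical names. No Tantivy calls are made; the point where a real writer would commit is only recorded.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical job type: which index a document is destined for.
#[derive(Debug, Clone, PartialEq)]
pub enum IndexJob {
    Page { url: String },
    News { url: String },
}

/// Single consumer loop: drains jobs from every producer and groups them into
/// commit batches of at most `batch_size`. A real writer would add documents
/// and commit where this sketch only records the batch.
pub fn run_writer(rx: Receiver<IndexJob>, batch_size: usize) -> Vec<Vec<IndexJob>> {
    let batch_size = batch_size.max(1);
    let mut commits = Vec::new();
    let mut pending = Vec::with_capacity(batch_size);
    // `recv` returns Err once every Sender has been dropped, ending the loop.
    while let Ok(job) = rx.recv() {
        pending.push(job);
        if pending.len() == batch_size {
            commits.push(std::mem::take(&mut pending));
        }
    }
    if !pending.is_empty() {
        commits.push(pending); // flush the final partial batch
    }
    commits
}

/// Spawns `producers` threads (stand-ins for crawler shards) that each send
/// `per_producer` page jobs, then runs the single writer on this thread.
pub fn simulate(producers: usize, per_producer: usize, batch_size: usize) -> Vec<Vec<IndexJob>> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();
    for p in 0..producers {
        let tx = tx.clone();
        handles.push(thread::spawn(move || {
            for i in 0..per_producer {
                let url = format!("https://example.com/{p}/{i}");
                tx.send(IndexJob::Page { url }).unwrap();
            }
        }));
    }
    drop(tx); // drop the original sender so the writer can observe end-of-stream
    let commits = run_writer(rx, batch_size);
    for h in handles {
        h.join().unwrap();
    }
    commits
}
```

Batching also has a Tantivy-specific benefit. Each commit publishes new segments, and many small commits leave many small segments for the merge policy to clean up later.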
```mermaid
sequenceDiagram
    participant U as User
    participant C as Cache
    participant API as kestrel-api
    participant T as Tantivy
    participant B as Brave API
    participant R as Ranker
    participant O as Ollama

    U->>API: GET /search?q=education policy
    API->>C: Check cache (1hr TTL)
    alt Cache hit
        C-->>API: Cached response
        API-->>U: Results (instant)
    else Cache miss
        par Fan-out search
            API->>T: Search PageIndex
            T-->>API: Local results (5)
        and
            API->>T: Search NewsIndex
            T-->>API: News results (3)
        and
            API->>B: Web search
            B-->>API: Brave results (20)
        end
        API->>API: RRF Merge + Dedup + Blacklist
        alt Query ≥ 3 words & no site: filter
            par AI Enhancement
                API->>R: POST /rerank (batch scoring)
                R->>O: Single prompt, all results
                O-->>R: Score array [8,3,9,1,5...]
                R-->>API: Re-ranked results
            and
                API->>R: POST /answer (if question)
                R->>O: qwen2.5:7b with context
                O-->>R: Direct answer
                R-->>API: Instant answer
            end
        end
        API->>C: Store in cache
        API-->>U: Results + Answer
    end
```
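The `RRF Merge + Dedup + Blacklist` step can be sketched with standard Reciprocal Rank Fusion. In RRF, each result list contributes 1/(k + rank) to every URL it contains, so a page returned by both Tantivy and Brave gets a larger fused score. This is a minimal sketch that makes three assumptions: k = 60 (the value commonly used for RRF), a simple URL normaliser as the dedup key, and a host-suffix blacklist. `rrf_merge`, `normalize`, `host` and `should_rerank` are hypothetical names, not Kestrel's code. `should_rerank` mirrors the diagram's gate: three or more words and no `site:` filter.

```rust
use std::collections::HashMap;

/// Conventional RRF constant; whether Kestrel uses 60 is an assumption.
const RRF_K: f64 = 60.0;

/// Hypothetical dedup key: drops any `#fragment` and a trailing slash so
/// near-identical links from different sources merge into one entry.
fn normalize(url: &str) -> String {
    let no_frag = url.split('#').next().unwrap_or(url);
    no_frag.trim_end_matches('/').to_string()
}

/// Extracts the host portion of an absolute URL (no port handling).
fn host(url: &str) -> &str {
    let rest = url.split_once("://").map(|(_, r)| r).unwrap_or(url);
    rest.split('/').next().unwrap_or(rest)
}

/// Reciprocal Rank Fusion: each list adds 1 / (k + rank) per URL, rank
/// starting at 1. Duplicates across lists accumulate score; URLs whose host
/// equals or is a subdomain of a blacklisted domain are dropped.
pub fn rrf_merge(lists: &[Vec<&str>], blacklist: &[&str]) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, url) in list.iter().enumerate() {
            let h = host(url);
            let blocked = blacklist
                .iter()
                .any(|b| h == *b || h.ends_with(&format!(".{b}")));
            if blocked {
                continue;
            }
            *scores.entry(normalize(url)).or_insert(0.0) += 1.0 / (RRF_K + (i + 1) as f64);
        }
    }
    let mut merged: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by URL for deterministic output.
    merged.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    merged
}

/// Gate from the diagram: rerank only queries of three or more words that
/// do not use a `site:` filter.
pub fn should_rerank(query: &str) -> bool {
    let words: Vec<&str> = query.split_whitespace().collect();
    words.len() >= 3 && !words.iter().any(|w| w.starts_with("site:"))
}
```

RRF needs only ranks, not raw scores. That is why it suits this step: BM25 scores from Tantivy and Brave's ordering are not on comparable scales.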
```mermaid
graph LR
    subgraph "StatefulSet (2 shards)"
        C0[Crawler Shard 0<br/>Domains 0,2,4,6...]
        C1[Crawler Shard 1<br/>Domains 1,3,5,7...]
    end
    subgraph "Per-Shard State"
        F0[(Frontier DB 0)]
        F1[(Frontier DB 1)]
    end
    subgraph "Anti-Detection"
        UA[12 Browser UAs<br/>Chrome/Firefox/Safari/Edge]
        HEADERS[Realistic Headers<br/>sec-ch-ua, Accept-Language]
        RATE[Per-Host Rate Limit<br/>2s delay per domain]
        CF[Cloudflare Detection<br/>403 + challenge HTML → 7d backoff]
    end
    subgraph "Crawl Pipeline"
        ROBOTS[robots.txt<br/>Compliance]
        FETCH[HTTP Fetch<br/>100 concurrent]
        EXTRACT[Content Extraction<br/>readability + scraper]
        WRITER[kestrel-writer<br/>POST /index/page]
    end

    C0 --> F0
    C1 --> F1
    C0 & C1 --> UA & HEADERS & RATE & CF
    C0 & C1 --> ROBOTS --> FETCH --> EXTRACT --> WRITER

    style C0 fill:#2ecc71,color:#fff
    style C1 fill:#2ecc71,color:#fff
    style CF fill:#e74c3c,color:#fff
```
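The crawler's sharding and politeness rules can be sketched as follows. The sketch assumes round-robin assignment by domain index (as the `Domains 0,2,4,6...` labels suggest), a 2 s per-host delay, and a 7-day backoff after a Cloudflare challenge. `shard_for`, `HostLimiter`, `is_cloudflare_challenge`, `CF_BACKOFF` and the challenge marker strings are hypothetical. The current time is passed in explicitly so the limiter can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Round-robin shard assignment matching the diagram (shard 0 gets domains
/// 0, 2, 4, ...). Assumes the domain list is kept in a stable order.
pub fn shard_for(domain_index: usize, shard_count: usize) -> usize {
    domain_index % shard_count
}

/// Per-host politeness gate: a host may be fetched again only once `delay`
/// has elapsed since its previous fetch.
pub struct HostLimiter {
    delay: Duration,
    last_fetch: HashMap<String, Instant>,
}

impl HostLimiter {
    pub fn new(delay: Duration) -> Self {
        Self { delay, last_fetch: HashMap::new() }
    }

    /// Returns `Ok(())` and records the fetch if the host is ready,
    /// otherwise `Err(remaining)` with the time still to wait.
    pub fn try_acquire(&mut self, host: &str, now: Instant) -> Result<(), Duration> {
        if let Some(&prev) = self.last_fetch.get(host) {
            let elapsed = now.saturating_duration_since(prev);
            if elapsed < self.delay {
                return Err(self.delay - elapsed);
            }
        }
        self.last_fetch.insert(host.to_string(), now);
        Ok(())
    }
}

/// Heuristic challenge check: a 403 whose body carries challenge markers.
/// The marker strings are assumptions, not Kestrel's actual list.
pub fn is_cloudflare_challenge(status: u16, body: &str) -> bool {
    status == 403 && (body.contains("cf-chl") || body.contains("Just a moment..."))
}

/// Backoff applied to a domain after a detected challenge (7 days).
pub const CF_BACKOFF: Duration = Duration::from_secs(7 * 24 * 60 * 60);
```

Keying the limiter by host rather than by shard means the 2 s delay holds regardless of how many of the ~100 concurrent fetches target the same domain.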
```mermaid
graph LR
    subgraph "Feed Sources (20+)"
        F1[Daily Wire]
        F2[National Review]
        F3[Christian Post]
        F4[Gospel Coalition]
        FN[... 16 more]
    end
    subgraph "Categories"
        CAT_F[faith]
        CAT_P[politics]
        CAT_O[opinion]
        CAT_C[culture]
    end
    subgraph "kestrel-news"
        FETCH[RSS/Atom Fetch<br/>15 min cycle]
        ETAG[ETag/If-Modified<br/>Caching]
        DEDUP[URL Dedup<br/>SQLite seen_urls]
        EXTRACT[Content Extract<br/>readability]
    end

    F1 & F2 & F3 & F4 & FN --> FETCH
    FETCH --> ETAG --> DEDUP --> EXTRACT
    EXTRACT -->|POST /index/news| WRITER[kestrel-writer]
    F1 & F2 --> CAT_P
    F3 & F4 --> CAT_F

    style FETCH fill:#9b59b6,color:#fff
```
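The feed cycle relies on two cheap filters, sketched below. The sketch assumes that the `ETag/If-Modified` step uses standard HTTP conditional requests: a stored `ETag` is echoed in `If-None-Match`, a stored `Last-Modified` in `If-Modified-Since`, and a `304 Not Modified` reply means the feed can be skipped. An in-memory set stands in for the SQLite `seen_urls` table. `FeedValidators`, `conditional_headers` and `SeenUrls` are hypothetical names.

```rust
use std::collections::HashSet;

/// Validators remembered from a feed's previous response.
#[derive(Default, Clone)]
pub struct FeedValidators {
    pub etag: Option<String>,
    pub last_modified: Option<String>,
}

/// Builds conditional-request headers from the stored validators.
pub fn conditional_headers(v: &FeedValidators) -> Vec<(&'static str, String)> {
    let mut headers = Vec::new();
    if let Some(etag) = &v.etag {
        headers.push(("If-None-Match", etag.clone()));
    }
    if let Some(lm) = &v.last_modified {
        headers.push(("If-Modified-Since", lm.clone()));
    }
    headers
}

/// In-memory stand-in for the `seen_urls` table.
#[derive(Default)]
pub struct SeenUrls {
    seen: HashSet<String>,
}

impl SeenUrls {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns only links not seen before, in feed order, and marks them seen.
    /// `HashSet::insert` returns false for duplicates, which filters them out.
    pub fn filter_new(&mut self, links: &[&str]) -> Vec<String> {
        links
            .iter()
            .copied()
            .filter(|l| self.seen.insert(l.to_string()))
            .map(str::to_string)
            .collect()
    }
}
```

Only links that pass both filters would go on to content extraction and `POST /index/news`. This keeps a 15-minute cycle cheap when most feeds have not changed.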
```mermaid
graph TB
    subgraph "kestrel-ranker (Python/FastAPI)"
        RERANK[POST /rerank<br/>Batch scoring]
        ANSWER[POST /answer<br/>Instant answers]
        EMBED[POST /embed<br/>Vector embeddings]
        SEMANTIC[POST /semantic-search<br/>Cosine similarity]
    end
    subgraph "Ollama (coredump.local:11434)"
        QWEN3B[qwen2.5:3b<br/>1.9 GB · Reranking]
        QWEN7B[qwen2.5:7b<br/>4.7 GB · Answers]
        NOMIC[nomic-embed-text<br/>0.3 GB · Embeddings]
    end
    subgraph "Fallback"
        OR[OpenRouter API<br/>If Ollama down]
    end

    RERANK -->|1 prompt, N results<br/>~500ms| QWEN3B
    ANSWER -->|query + top-3 context<br/>~2-3s| QWEN7B
    EMBED -->|batch text → vectors<br/>~100ms| NOMIC
    SEMANTIC --> EMBED
    RERANK -.->|fallback| OR
    ANSWER -.->|fallback| OR

    style RERANK fill:#e74c3c,color:#fff
    style ANSWER fill:#e74c3c,color:#fff
    style EMBED fill:#e74c3c,color:#fff
```
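The rerank path sends one prompt for all N results and gets back a score array such as `[8,3,9,1,5...]`. Semantic search ranks documents by cosine similarity of their embeddings. The real ranker is Python/FastAPI; this sketch uses Rust only to keep the examples in one language. It assumes the score array is the first `[...]` in the model reply. It also treats a malformed or wrong-length reply as a failure, leaving the caller to keep the fused order or try the fallback. `parse_scores`, `apply_scores` and `cosine` are hypothetical.

```rust
/// Extracts the first `[...]` array of numbers from a model reply such as
/// "Scores: [8, 3, 9]". Returns `None` if no array is found, any value fails
/// to parse, or the count differs from the number of results sent.
pub fn parse_scores(reply: &str, expected: usize) -> Option<Vec<f64>> {
    let start = reply.find('[')?;
    let end = start + reply[start..].find(']')?;
    let scores: Option<Vec<f64>> = reply[start + 1..end]
        .split(',')
        .map(|s| s.trim().parse::<f64>().ok())
        .collect();
    scores.filter(|v| v.len() == expected)
}

/// Reorders results by descending score. The sort is stable, so ties keep
/// their fused (pre-rerank) order.
pub fn apply_scores<T: Clone>(results: &[T], scores: &[f64]) -> Vec<T> {
    assert_eq!(results.len(), scores.len(), "one score per result");
    let mut idx: Vec<usize> = (0..results.len()).collect();
    idx.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    idx.into_iter().map(|i| results[i].clone()).collect()
}

/// Cosine similarity between two embedding vectors (0.0 if either is all zeros).
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}
```

Scoring all results in one prompt costs a single model call per query. Scoring each result separately would need N calls, which is consistent with the ~500 ms figure on the rerank edge.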
```mermaid
graph TB
    subgraph "hoarder.local (MicroK8s)"
        subgraph "kestrel namespace"
            API[API x2]
            WRITER[Writer x1]
            CRAWLERS[Crawlers x2]
            NEWS_POD[News x1]
            RANKER_POD[Ranker x1]
        end
        subgraph "Observability"
            PROMETHEUS[Prometheus]
            GRAFANA[Grafana]
            TEMPO[Tempo<br/>Distributed Tracing]
            LOKI[Loki<br/>Log Aggregation]
        end
        subgraph "Ingress"
            TRAEFIK[Traefik]
        end
        subgraph "Storage"
            NFS[(Synology NFS<br/>20TB backing)]
        end
        subgraph "Secrets"
            BSM[Bitwarden Secrets Manager<br/>CSI Provider]
        end
    end
    subgraph "coredump.local (Void Linux)"
        GPU[NVIDIA RTX 5090<br/>32GB VRAM]
        OLLAMA_SVC[Ollama Service]
    end
    subgraph "External"
        GHCR[GitHub Container Registry<br/>5 Docker images]
        BRAVE_EXT[Brave Search API]
        GH[GitHub<br/>kvncrw/kestrel]
    end

    API --> TEMPO
    CRAWLERS --> TEMPO
    NEWS_POD --> TEMPO
    RANKER_POD --> OLLAMA_SVC
    OLLAMA_SVC --> GPU
    BSM -->|Brave API key| API

    style GPU fill:#76b900,color:#fff
    style TEMPO fill:#ff6b35,color:#fff
    style NFS fill:#0078d4,color:#fff
```
| Layer | Technology | Purpose |
|---|---|---|
| Language | Rust (2024 edition) | API, Writer, Crawler, News |
| AI Service | Python 3.12 / FastAPI | Ranker, Embeddings, Answers |
| Web Framework | Axum | Async HTTP server |
| Search Index | Tantivy | Full-text search (BM25) |
| State Management | SQLite (rusqlite) | Frontier queues, feed state |
| Templates | Askama | Compile-time HTML rendering |
| LLM Inference | Ollama | Local GPU model serving |
| Models | qwen2.5 (3b/7b), nomic-embed-text | Ranking, Answers, Embeddings |
| Container Runtime | MicroK8s / containerd | Kubernetes orchestration |
| Ingress | Traefik | TLS termination, routing |
| Storage | Synology NFS (20TB) | Persistent index + state |
| Secrets | Bitwarden Secrets Manager | CSI-mounted API keys |
| Tracing | OpenTelemetry → Tempo | Distributed trace pipeline |
| CI/CD | GitHub Actions | Lint, test, build, push |
| Registry | GitHub Container Registry | 5 Docker images |
| Metric | Value |
|---|---|
| Search latency (cached) | <5ms |
| Search latency (uncached, no rerank) | ~200-500ms |
| Search latency (with AI rerank) | ~700-1500ms |
| Rerank throughput | ~2-4 req/sec (batch scoring) |
| Crawl concurrency | ~100 concurrent fetches per shard × 2 shards |
| Index size | 2,100+ pages, 686+ articles (growing) |
| Domains tracked | 201 |
| News feeds | 20+ (15 min cycle) |
| Test suite | 179 tests (123 Rust + 56 Python) |
| API replicas | 2 (HPA scales up to 6) |
```mermaid
pie title Crawl Domains by Category (201 total)
    "Conservative News" : 50
    "Christian / Faith" : 30
    "Think Tanks" : 25
    "Alt Media" : 27
    "Homeschool" : 17
    "Reference" : 18
    "2nd Amendment" : 6
    "Pro-Life" : 7
    "Legal / Policy" : 6
    "Tech Conservative" : 15
```
- GitHub: kvncrw/kestrel (private)
- Live: search.nashgo.org · news.nashgo.org
- Branding: configurable via `KESTREL_BRAND_NAME` and `KESTREL_BRAND_ACCENT_COLOR`
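A service could read the two branding variables as sketched below. The sketch assumes that blank or invalid values fall back to defaults. The default values, the `#rrggbb` validation rule and the names `Branding`, `load_branding` and `branding_from_env` are all hypothetical. Taking a lookup closure keeps the logic testable without mutating the process environment, which is `unsafe` in Rust 2024.

```rust
/// Branding settings resolved from the environment.
#[derive(Debug, PartialEq)]
pub struct Branding {
    pub name: String,
    pub accent_color: String,
}

// Hypothetical defaults for illustration only.
const DEFAULT_NAME: &str = "Kestrel";
const DEFAULT_ACCENT: &str = "#4a9eff";

/// Accepts only `#rrggbb` hex colours (an assumed validation rule).
fn is_hex_color(s: &str) -> bool {
    s.len() == 7 && s.starts_with('#') && s[1..].chars().all(|c| c.is_ascii_hexdigit())
}

/// Reads KESTREL_BRAND_NAME and KESTREL_BRAND_ACCENT_COLOR through `lookup`,
/// falling back to the defaults for blank names or invalid colours.
pub fn load_branding<F: Fn(&str) -> Option<String>>(lookup: F) -> Branding {
    let name = lookup("KESTREL_BRAND_NAME")
        .filter(|v| !v.trim().is_empty())
        .unwrap_or_else(|| DEFAULT_NAME.to_string());
    let accent_color = lookup("KESTREL_BRAND_ACCENT_COLOR")
        .filter(|v| is_hex_color(v))
        .unwrap_or_else(|| DEFAULT_ACCENT.to_string());
    Branding { name, accent_color }
}

/// Production entry point: resolves branding from the real environment.
pub fn branding_from_env() -> Branding {
    load_branding(|key| std::env::var(key).ok())
}
```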