"Securing the future through integrated intelligence." - Tanya Mushonga
Sky Marshal is a full-stack, end-to-end smart surveillance platform built around a swarm of low-cost ESP32-CAM edge nodes. The system combines embedded firmware, event-driven microservices, a real-time computer vision pipeline, and cross-platform client applications into a cohesive, production-grade architecture.
This document is the canonical technical reference for the entire ecosystem.
- Project Overview
- Ecosystem & Repositories
- Tech Stack & Rationale
- System Architecture
- Data Flow: End-to-End
- API Reference
- Data Models
- Edge Firmware (ESP32-CAM)
- Core Design Decisions
- Security Model
- Deployment Overview
- Roadmap
Sky Marshal addresses a common gap in surveillance systems: the disconnect between low-cost edge hardware, cloud processing, and real-time operator interfaces. Most solutions are either expensive proprietary systems or fragile DIY setups with no coherent architecture.
Sky Marshal bridges this gap by:
- Using ESP32-CAM modules (~$5 each) as distributed edge nodes that are Wi-Fi enabled and require no special drivers.
- Routing all data through an event-driven Kafka pipeline to decouple ingestion from processing.
- Delivering annotated, real-time video to operators via WebSocket on both a web dashboard and a mobile app.
- Keeping control simple: the edge nodes poll for their configuration, making them resilient to network interruptions.
Key Capabilities:
| Capability | Details |
|---|---|
| Live Video Streaming | JPEG frames streamed from ESP32-CAM via HTTP POST → WebSocket to clients |
| Object Detection | CV worker runs inference on ingested frames and annotates metadata |
| Remote Patrol Control | Operators activate/deactivate nodes from the dashboard or mobile app |
| GPS Tagging | Each frame is tagged with coordinates from the drone node |
| Multi-Client Broadcast | Both the web dashboard and mobile app receive the same live feed simultaneously |
The project is partitioned into five specialized repositories, each owning a distinct layer of responsibility.
| # | Repository | Role | Language |
|---|---|---|---|
| 1 | skymarshal-api | Central orchestration hub, state management, auth | Node.js / TypeScript |
| 2 | skymarshal-mobile | Field monitoring and tactical control (mobile) | React Native / Expo |
| 3 | skymarshal-admin-dashboard | Centralized command center and historical analysis (web) | Next.js |
| 4 | iatos-camera | High-performance ingest gateway for raw video frames | Node.js / TypeScript |
| 5 | sky_marshal_firmware | Low-level firmware for ESP32-CAM edge nodes | C++ / Arduino |
Each repository is independently deployable and communicates only through defined interfaces (REST, Kafka topics, WebSocket), ensuring strong separation of concerns.
| Technology | Usage | Why |
|---|---|---|
| Next.js | Admin Dashboard | SSR for fast initial load, API routes for BFF pattern, strong TypeScript support |
| React Native + Expo | Mobile App | Write once, deploy to iOS and Android; shares component logic with the web layer |
| NativeWind / Tailwind CSS | Styling (both platforms) | Single utility-first design language across web and mobile eliminates context switching |
Strategic rationale: Anchoring all clients on a React-based stack maximises code reuse (hooks, utilities, design tokens) and means a single developer can work across all client surfaces without a context shift.
| Technology | Usage | Why |
|---|---|---|
| Node.js + TypeScript | API & Ingest services | Non-blocking I/O is ideal for high-concurrency tasks like frame ingestion and WebSocket broadcasting |
| Express.js | HTTP API server | Lightweight, well-understood, flexible middleware model |
| Apache Kafka | Event bus between ingest and processing | Durable message queue that absorbs frame bursts and decouples producers from consumers |
| WebSockets | Real-time frame delivery to clients | Low-latency, full-duplex channel required for live video; REST polling is too slow |
| PostgreSQL | Relational data (users, drones, sessions, patrol logs) | ACID-compliant, ideal for structured operational data |
| MongoDB | Frame metadata & event logs | Schema-flexible document store fits the variability of frame annotation payloads |
| Component | Details | Why |
|---|---|---|
| ESP32-CAM (AI Thinker) | Microcontroller + camera module | Built-in Wi-Fi, OV2640 camera sensor, and deep-sleep support in a single ~$5 unit |
| C++ / Arduino Framework | Firmware language | Maximum hardware control, rich library ecosystem for the ESP32, real-time performance |
| HTTP POST | Uplink transport | Stateless and firewall-friendly; nodes can recover from any network interruption without a persistent connection |
The system is divided into three logical zones: Edge, Cloud/Server, and Clients.
┌─────────────────────────────────────────────────────────────────┐
│                             CLIENTS                             │
│                                                                 │
│  ┌──────────────────────────┐    ┌──────────────────────────┐   │
│  │ Next.js Admin Dashboard  │    │ React Native Mobile App  │   │
│  │          (Web)           │    │     (iOS / Android)      │   │
│  └────────────┬─────────────┘    └────────────┬─────────────┘   │
│               │ REST (Control)                │ WebSocket (Live)│
└───────────────┼───────────────────────────────┼─────────────────┘
                │                               │
┌───────────────┼───────────────────────────────┼─────────────────┐
│                         CLOUD / SERVER                          │
│                                                                 │
│  ┌─────────────┐    ┌──────────────┐    ┌────────────────┐      │
│  │ SkyMarshal  │    │ IATOS Ingest │    │  CV Pipeline   │      │
│  │     API     │    │   Gateway    │───▶│ (Object Det.)  │      │
│  │ (Port 3000) │    │ (Port 3003)  │    └───────┬────────┘      │
│  └──────┬──────┘    └──────┬───────┘            │               │
│         │ DB R/W           │ Kafka Publish      │ Broadcast     │
│  ┌──────▼──────┐    ┌──────▼───────┐            │               │
│  │ PostgreSQL  │    │ Apache Kafka │────────────┘               │
│  │   MongoDB   │    │ (Event Bus)  │                            │
│  └─────────────┘    └──────────────┘                            │
└────────────────────────────────┬────────────────────────────────┘
                                 │ HTTP Poll (Config)
                                 │ HTTP POST (Frames)
┌────────────────────────────────┼────────────────────────────────┐
│                              EDGE                               │
│                                                                 │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐           │
│  │ ESP32-CAM #1 │  │ ESP32-CAM #2 │  │ ESP32-CAM #N │  ...      │
│  └──────────────┘  └──────────────┘  └──────────────┘           │
└─────────────────────────────────────────────────────────────────┘
How an operator command propagates from the UI to the physical device.
Operator Action
       │
       ▼
[Next.js Dashboard] ──── HTTPS POST ────▶ [SkyMarshal API]
                                                  │
                                        Validate session token
                                                  │
                                        Update drone config in DB
                                          SET is_active = true
                                          WHERE drone_id = "xyz"
                                                  │
                                                  ▼
                                        [ESP32 Node polls /config]
                                        GET /api/v1/streams/config/
                                        every 10 seconds
                                                  │
                                                  ├── { "is_active": true,
                                                  │     "stream_id": "xyz" }
                                                  │
                                        ESP32 transitions:
                                        IDLE ──▶ STREAMING
Step-by-step:
- Action: An operator on the Admin Dashboard or Mobile App activates "Active Patrol" for a specific drone node.
- Request: The client sends an authenticated REST request (POST /api/v1/drones/{drone_id}/activate) to the SkyMarshal API with a valid session token.
- Validation: The API verifies the token and the operator's permissions for that drone.
- State Update: The API sets is_active = true for the specified drone_id in the PostgreSQL database.
- Edge Sync: The ESP32 node, which polls /api/v1/streams/config/{drone_id} every 10 seconds, receives the updated configuration on its next poll.
- Transition: The firmware parses the JSON response and transitions its internal state machine from IDLE to STREAMING.
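The control path described in these steps can be modelled as two operations over a drone registry. The following is a minimal TypeScript sketch under stated assumptions, not the skymarshal-api implementation: the in-memory Map stands in for the PostgreSQL drones table, and `DroneRegistry`, `activate`, and `getConfig` are hypothetical names chosen for illustration.

```typescript
// Hypothetical in-memory stand-in for the PostgreSQL `drones` table.
interface DroneRecord {
  droneId: string;
  isActive: boolean;
  streamId: string | null;
}

// Shape of the config document a node receives when it polls.
interface StreamConfig {
  drone_id: string;
  is_active: boolean;
  stream_id: string | null;
  poll_interval_ms: number;
}

const POLL_INTERVAL_MS = 10000; // the 10-second poll interval described above

class DroneRegistry {
  private drones = new Map<string, DroneRecord>();

  register(droneId: string): void {
    this.drones.set(droneId, { droneId, isActive: false, streamId: null });
  }

  // Models POST /drones/{drone_id}/activate: set the flag and assign a stream.
  activate(droneId: string, streamId: string): boolean {
    const drone = this.drones.get(droneId);
    if (!drone) return false; // unknown drone; a real API would likely answer 404
    drone.isActive = true;
    drone.streamId = streamId;
    return true;
  }

  // Models GET /streams/config/{drone_id}: what the node sees on its next poll.
  getConfig(droneId: string): StreamConfig | undefined {
    const drone = this.drones.get(droneId);
    if (!drone) return undefined;
    return {
      drone_id: drone.droneId,
      is_active: drone.isActive,
      stream_id: drone.streamId,
      poll_interval_ms: POLL_INTERVAL_MS,
    };
  }
}
```

Because the node only ever reads state, a missed poll loses nothing: the next successful poll returns whatever the operator last set.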
How captured video frames travel from the camera sensor to the operator's screen.
[OV2640 Camera Sensor]
        │
        ▼  JPEG Capture
[ESP32 Firmware]
        │
        ▼  Base64 Encode + JSON Wrap
        │  { drone_id, frame_number, gps, frame_data }
        │
        ▼  HTTP POST
[IATOS Ingest Gateway :3003]
        │
        ▼  Validate + Publish
[Apache Kafka Topic: "raw-frames"]
        │
        ▼  Consume
[CV Pipeline Worker]
        - Decode Base64
        - Run Object Detection (persons, vehicles, etc.)
        - Annotate frame metadata
        │
        ▼  Publish annotated event
[Frame Service]
        │
        ├──── WebSocket ──▶ [Next.js Dashboard]
        └──── WebSocket ──▶ [React Native Mobile App]
                                    │
                            Render on Canvas
                            with real-time overlays
Step-by-step:
- Capture: The OV2640 camera sensor on the ESP32-CAM captures a JPEG frame.
- Encoding: The firmware encodes the binary frame as Base64 and wraps it in a JSON payload containing drone_id, frame_number, timestamp, and gps coordinates.
- Ingestion: The ESP32 POSTs the JSON packet to the IATOS Ingest Gateway (Port 3003) via HTTP.
- Queuing: IATOS validates the payload, strips sensitive routing headers, and publishes the message to the raw-frames Kafka topic.
- Annotation (Async): A CV worker consumes the Kafka message, decodes the Base64 frame, runs object detection inference (e.g., detecting persons and vehicles), and publishes an annotated event to a downstream topic.
- Distribution: The Frame Broadcast Service consumes the annotated events and pushes them over WebSocket to all subscribed clients.
- Render: The Admin Dashboard and Mobile App receive each frame event and render it on an HTML/Native Canvas element, overlaying bounding boxes, drone ID, GPS, and timestamp.
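Two of these steps, payload validation at ingest and fan-out to subscribed clients, lend themselves to a short illustration. The sketch below is an assumption-laden TypeScript model rather than the IATOS or Frame Broadcast Service code: `parseFramePayload` and `FrameBroadcaster` are hypothetical names, the field list follows the Encoding step, and plain callbacks stand in for WebSocket connections.

```typescript
interface Gps { lat: number; lng: number; alt_m: number; }

interface FramePayload {
  drone_id: string;
  frame_number: number;
  timestamp: string;
  gps: Gps;
  frame_data: string; // Base64-encoded JPEG
}

interface Detection {
  label: string;
  confidence: number;
  bbox: [number, number, number, number];
}

interface AnnotatedFrame extends FramePayload { detections: Detection[]; }

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Ingest-side check: return a typed payload, or null if the shape is wrong.
function parseFramePayload(body: unknown): FramePayload | null {
  if (!isRecord(body)) return null;
  const gps = body.gps;
  if (!isRecord(gps)) return null;
  const { drone_id, frame_number, timestamp, frame_data } = body;
  const { lat, lng, alt_m } = gps;
  if (typeof drone_id !== "string" || drone_id.length === 0) return null;
  if (typeof frame_number !== "number" || !Number.isInteger(frame_number)) return null;
  if (typeof timestamp !== "string" || Number.isNaN(Date.parse(timestamp))) return null;
  if (typeof frame_data !== "string" || frame_data.length === 0) return null;
  if (typeof lat !== "number" || typeof lng !== "number" || typeof alt_m !== "number") return null;
  return { drone_id, frame_number, timestamp, gps: { lat, lng, alt_m }, frame_data };
}

type FrameListener = (frame: AnnotatedFrame) => void;

// Stand-in for the WebSocket fan-out: every subscriber receives every frame.
class FrameBroadcaster {
  private listeners = new Set<FrameListener>();

  // Returns a function that removes the listener again.
  subscribe(listener: FrameListener): () => void {
    this.listeners.add(listener);
    return () => { this.listeners.delete(listener); };
  }

  // Delivers the frame to all listeners and reports how many received it.
  broadcast(frame: AnnotatedFrame): number {
    this.listeners.forEach((listener) => listener(frame));
    return this.listeners.size;
  }
}
```

In this model the dashboard and the mobile app are simply two listeners on the same broadcaster, which is why both see the same feed.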
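All API routes are served from the skymarshal-api service, except the frame ingest endpoint, which is served by the IATOS Ingest Gateway on port 3003. Base URL for skymarshal-api: https://<host>/api/v1
Authentication uses a Bearer token (JWT) in the Authorization header.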
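As a small illustration of the header format, the sketch below pulls the token out of an Authorization header value. `extractBearerToken` is a hypothetical helper, not part of the skymarshal-api code, and it only parses the header: verifying the JWT signature and expiry would be a separate step.

```typescript
// Returns the token from an `Authorization: Bearer <token>` header value,
// or null if the header is missing or uses another scheme.
// The scheme name is matched case-insensitively.
function extractBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}
```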
| Method | Endpoint | Description |
|---|---|---|
| GET | /drones | List all registered drone nodes |
| GET | /drones/{drone_id} | Get status and metadata for a specific drone |
| POST | /drones/{drone_id}/activate | Activate a drone's patrol mode |
| POST | /drones/{drone_id}/deactivate | Deactivate a drone (transitions to IDLE) |
| Method | Endpoint | Description |
|---|---|---|
| GET | /streams/config/{drone_id} | Returns current config for the drone (is_active, stream_id, etc.) |
Example Response:
{
"drone_id": "drone-003",
"is_active": true,
"stream_id": "stream-abc-xyz",
"poll_interval_ms": 10000,
"resolution": "SVGA"
}
| Method | Endpoint | Description |
|---|---|---|
| POST | /ingest/frame | Accepts a frame payload from an ESP32 node |
Frame Payload Schema:
{
"drone_id": "drone-003",
"stream_id": "stream-abc-xyz",
"frame_number": 1042,
"timestamp": "2025-04-10T14:32:00Z",
"gps": {
"lat": -17.8292,
"lng": 31.0522,
"alt_m": 45.2
},
"frame_data": "<Base64-encoded JPEG string>"
}
| Method | Endpoint | Description |
|---|---|---|
| POST | /auth/login | Returns a JWT for an operator |
| POST | /auth/refresh | Refreshes an expiring JWT |
| POST | /auth/logout | Invalidates the current session |
CREATE TABLE drones (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(100) NOT NULL,
is_active BOOLEAN DEFAULT FALSE,
stream_id VARCHAR(100),
last_seen_at TIMESTAMP,
created_at TIMESTAMP DEFAULT NOW()
);
{
"_id": "ObjectId",
"drone_id": "drone-003",
"stream_id": "stream-abc-xyz",
"frame_number": 1042,
"timestamp": "2025-04-10T14:32:00.000Z",
"gps": { "lat": -17.8292, "lng": 31.0522, "alt_m": 45.2 },
"detections": [
{ "label": "person", "confidence": 0.94, "bbox": [120, 80, 200, 340] },
{ "label": "vehicle", "confidence": 0.87, "bbox": [400, 150, 580, 290] }
],
"raw_stored": false
}
The firmware (sky_marshal_firmware) operates as a deterministic state machine.
          ┌───────────┐
Boot ───▶ │   INIT    │
          └─────┬─────┘
                │ Wi-Fi connected
          ┌─────▼─────┐
          │  POLLING  │◀────────────────────┐
          └─────┬─────┘                     │
                │ is_active == true         │
          ┌─────▼─────┐  is_active == false │
          │ STREAMING ├─────────────────────┘
          └───────────┘
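The transitions in this diagram can be written as a pure function, which makes them easy to test off-device. This is a TypeScript model for illustration only, not the C++ firmware; the state names come from the diagram, while the event names (`WIFI_CONNECTED`, `CONFIG_RECEIVED`) and `nextState` are assumptions.

```typescript
type NodeState = "INIT" | "POLLING" | "STREAMING";

type NodeEvent =
  | { type: "WIFI_CONNECTED" }
  | { type: "CONFIG_RECEIVED"; isActive: boolean };

// Returns the state that follows `state` when `event` occurs.
function nextState(state: NodeState, event: NodeEvent): NodeState {
  if (state === "INIT" && event.type === "WIFI_CONNECTED") return "POLLING";
  if (event.type === "CONFIG_RECEIVED") {
    if (state === "POLLING" && event.isActive) return "STREAMING";
    if (state === "STREAMING" && !event.isActive) return "POLLING";
  }
  return state; // any other combination leaves the state unchanged
}
```

Keeping the function total (every state and event pair has a defined result) is what makes the machine deterministic: a config response that arrives before Wi-Fi setup completes, for example, is simply ignored.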
| Parameter | Default | Description |
|---|---|---|
| POLL_INTERVAL_MS | 10000 | How often the node polls the API for its config |
| FRAME_QUALITY | 12 | JPEG quality (0-63; lower = higher quality) |
| FRAME_SIZE | FRAMESIZE_SVGA | Resolution (SVGA = 800×600) |
| POST_ENDPOINT | http://<host>:3003/ingest/frame | IATOS ingest URL |
| CONFIG_ENDPOINT | http://<host>:3000/api/v1/streams/config/{id} | API config poll URL |
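// Needs esp_camera.h, HTTPClient.h, ArduinoJson.h (v6), and base64.h.
// Assumes DRONE_ID, POST_ENDPOINT, frameCount, http (HTTPClient), and
// buildGpsPayload() are defined elsewhere in the firmware.
void streamingLoop() {
  camera_fb_t* frame = esp_camera_fb_get();              // Capture frame
  if (!frame) return;                                     // Capture failed; try again on the next loop
  String b64 = base64::encode(frame->buf, frame->len);    // Encode JPEG as Base64
  esp_camera_fb_return(frame);                            // Free the buffer once it is encoded

  StaticJsonDocument<512> payload;
  payload["drone_id"] = DRONE_ID;
  payload["frame_number"] = frameCount++;
  payload["gps"] = buildGpsPayload();
  // Pass a const char* so ArduinoJson 6 stores a pointer instead of copying the
  // frame into the 512-byte document, where it would not fit.
  payload["frame_data"] = b64.c_str();

  String body;
  serializeJson(payload, body);                           // b64 must stay alive until here
  http.begin(POST_ENDPOINT);
  http.addHeader("Content-Type", "application/json");
  int status = http.POST(body);                           // Send to IATOS
  if (status < 0) Serial.printf("Frame POST failed: %d\n", status);
  http.end();
}
Separating the high-bandwidth frame ingestion layer (IATOS, Port 3003) from the business logic API (Port 3000) ensures that a burst of frames from multiple nodes never degrades the responsiveness of the admin interface. Each service can be scaled horizontally on its own.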
Without Kafka, a burst of frames hitting the CV worker directly would cause bottlenecks or dropped frames. Kafka acts as a durable buffer: IATOS writes fast, and the CV workers consume at their own pace. This also makes it trivial to add future consumers (e.g., an archival service or alert engine) without touching existing code.
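The decoupling argument rests on one property of a log-based queue: each consumer tracks its own read position. The toy TypeScript class below illustrates that property only; it is not a Kafka client, it keeps everything in memory, and `FrameLog`, `poll`, and `lag` are hypothetical names.

```typescript
// Toy append-only log with per-consumer offsets, mimicking how a Kafka topic
// lets each consumer read at its own pace.
class FrameLog<T> {
  private entries: T[] = [];
  private offsets = new Map<string, number>();

  // Appends an entry and returns its offset in the log.
  append(entry: T): number {
    this.entries.push(entry);
    return this.entries.length - 1;
  }

  // Returns up to `max` unread entries for this consumer and advances its offset.
  poll(consumer: string, max: number): T[] {
    const start = this.offsets.get(consumer) ?? 0;
    const batch = this.entries.slice(start, start + max);
    this.offsets.set(consumer, start + batch.length);
    return batch;
  }

  // Number of entries this consumer has not read yet.
  lag(consumer: string): number {
    return this.entries.length - (this.offsets.get(consumer) ?? 0);
  }
}
```

A slow CV worker shows up as growing lag rather than as back-pressure on the producer, and a new consumer (say, an archiver) starts from its own offset without affecting the worker's position.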
While Base64 adds ~33% overhead to payload size, it was the correct first-version choice: it eliminates multipart form handling on both the ESP32 and the Node.js ingest server, works natively with JSON, and is trivially debuggable (any HTTP inspector can read it). A future version can move to binary streaming (e.g., a raw application/octet-stream request body) for bandwidth efficiency.
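The overhead figure follows directly from how Base64 works: every group of 3 input bytes becomes 4 output characters, with the last group padded. A short TypeScript sketch of that arithmetic, using hypothetical helper names:

```typescript
// Length of the padded Base64 encoding of `byteLength` raw bytes:
// 4 output characters for every (possibly partial) 3-byte group.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Fractional size increase caused by Base64 encoding.
function base64OverheadRatio(byteLength: number): number {
  return base64Length(byteLength) / byteLength - 1;
}
```

For a 30,000-byte JPEG (an illustrative size, not a measured one), `base64Length` gives 40,000 characters, an increase of one third before the surrounding JSON is counted.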
Rather than maintaining a persistent MQTT or WebSocket connection from the ESP32, nodes poll the API for their configuration every 10 seconds. This makes nodes resilient to network drops (they simply re-poll on reconnect), requires no broker sidecar on the edge, and works behind NAT/firewalls without port forwarding.
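The cost of polling is a bounded delay before a command takes effect, plus a steady stream of small requests. A rough TypeScript sketch of both quantities, ignoring network latency and assuming commands arrive at random moments within a poll cycle (the function names are hypothetical):

```typescript
// Config requests per second the API receives from `nodeCount` nodes
// that each poll once every `pollIntervalMs` milliseconds.
function pollRequestsPerSecond(nodeCount: number, pollIntervalMs: number): number {
  return (nodeCount * 1000) / pollIntervalMs;
}

// A command issued just after a poll waits one full interval.
function worstCaseCommandDelayMs(pollIntervalMs: number): number {
  return pollIntervalMs;
}

// On average a command waits half an interval.
function averageCommandDelayMs(pollIntervalMs: number): number {
  return pollIntervalMs / 2;
}
```

With the 10-second interval, 100 nodes generate about 10 config requests per second, and an activation reaches a node within 10 seconds at worst.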
Structured, relational data (users, drone registry, patrol sessions, audit logs) lives in PostgreSQL for ACID guarantees. Semi-structured, high-volume frame event data lives in MongoDB, where its schema flexibility accommodates evolving detection payload shapes without costly migrations.
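The split can be made concrete with types. In the TypeScript sketch below, `DroneRow` mirrors the fixed columns of the drones table, while `Detection` allows extra fields, which is the flexibility the document store is used for. The `buildFrameEvent` helper and its `minConfidence` threshold are hypothetical and exist only to show a document being assembled.

```typescript
// Relational side: mirrors the columns of the `drones` table.
interface DroneRow {
  id: string; // UUID
  name: string;
  is_active: boolean;
  stream_id: string | null;
  last_seen_at: Date | null;
  created_at: Date;
}

// Document side: detections may carry detector-specific extras
// without requiring a schema migration.
interface Detection {
  label: string;
  confidence: number;
  bbox: [number, number, number, number];
  [extra: string]: unknown;
}

interface FrameEventDoc {
  drone_id: string;
  stream_id: string;
  frame_number: number;
  timestamp: string;
  gps: { lat: number; lng: number; alt_m: number };
  detections: Detection[];
  raw_stored: boolean;
}

// Hypothetical helper: builds the document a CV worker might store, dropping
// detections below `minConfidence` (an illustrative threshold).
function buildFrameEvent(
  frame: Omit<FrameEventDoc, "detections" | "raw_stored">,
  detections: Detection[],
  minConfidence: number,
): FrameEventDoc {
  return {
    ...frame,
    detections: detections.filter((d) => d.confidence >= minConfidence),
    raw_stored: false,
  };
}
```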
| Layer | Mechanism |
|---|---|
| Client → API | JWT Bearer token, HTTPS |
| API → DB | Internal network only, credentials via environment variables |
| ESP32 → IATOS | drone_id + stream_id validated on every ingest request |
| IATOS → Kafka | Internal broker, no public exposure |
| WebSocket | Token validated on connection upgrade |
Note: For production deployments, ESP32 nodes should use TLS-enabled endpoints and a pre-shared device key for additional ingest authentication.
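The ingest check from the table, extended with the pre-shared device key suggested in the note, might look like the TypeScript sketch below. It is an assumption-based illustration, not the IATOS code: `RegisteredDevice`, `authorizeIngest`, and the idea of sending the key alongside each request are hypothetical, and a production design would more likely sign requests than transmit the key itself.

```typescript
interface RegisteredDevice {
  activeStreamId: string | null; // null while the drone is idle
  deviceKey: string;             // hypothetical pre-shared key
}

type IngestDecision = "accept" | "unknown_drone" | "stream_mismatch" | "bad_key";

// Compares two strings without returning early on the first differing character.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Decides whether a frame from `droneId` on `streamId` should be accepted.
function authorizeIngest(
  devices: Map<string, RegisteredDevice>,
  droneId: string,
  streamId: string,
  deviceKey: string,
): IngestDecision {
  const device = devices.get(droneId);
  if (!device) return "unknown_drone";
  if (device.activeStreamId === null || device.activeStreamId !== streamId) {
    return "stream_mismatch";
  }
  if (!constantTimeEqual(device.deviceKey, deviceKey)) return "bad_key";
  return "accept";
}
```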
The server-side services are containerised and can be orchestrated with Docker Compose for development or Kubernetes for production.
# docker-compose.yml (development)
services:
  api:
    build: ./skymarshal-api
    ports: ["3000:3000"]
    environment:
      - DATABASE_URL=postgres://...
      - KAFKA_BROKER=kafka:9092
  iatos:
    build: ./iatos-camera
    ports: ["3003:3003"]
    environment:
      - KAFKA_BROKER=kafka:9092
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    ports: ["9092:9092"]
    # Broker settings (KRaft or ZooKeeper environment variables) are omitted here.
  postgres:
    image: postgres:16
    volumes: ["pgdata:/var/lib/postgresql/data"]
  mongo:
    image: mongo:7
    volumes: ["mongodata:/data/db"]
# Named volumes must be declared at the top level.
volumes:
  pgdata:
  mongodata:

# Using PlatformIO
cd sky_marshal_firmware
pio run --target upload --upload-port /dev/ttyUSB0
| Feature | Status |
|---|---|
| Core ingest pipeline (ESP32 → IATOS → Kafka) | Complete |
| Admin Dashboard (Next.js) | Complete |
| Mobile App (React Native) | Complete |
| Object Detection CV Worker | Complete |
| Binary frame streaming (replace Base64) | Planned |
| MQTT control channel (replace HTTP poll) | Planned |
| Multi-operator role-based access control | Planned |
| Persistent video archival service | Planned |
| Alert engine (motion triggers, geofence breach) | Planned |
| OTA firmware update mechanism | Planned |
| | Name | Role | GitHub | Location |
|---|---|---|---|---|
| 🛰️ | Tanya Mushonga | Lead Architect & Full-Stack Engineer: system design, API, firmware, mobile & web | github.com/TanyaMushonga | Harare, Zimbabwe 🇿🇼 |
| ⚡ | Malwande Moyo | Software Developer & Electronics Engineer: embedded systems, hardware integration | github.com/malwandemoyo | Bulawayo, Zimbabwe 🇿🇼 |
Sky Marshal is an independently designed and built surveillance ecosystem. All architecture, firmware, and application code is original work.