QA Comparison: Main vs Skills Architecture
Side-by-side comparison of the native tool architecture (main) and the skill+bash architecture (skills) across 85+ QA scenarios (86 on main, 85 on skills). Both runs use the same Qwen 3.5 397B model, the same hardware, the same NATS infrastructure, and the same QA evaluator. Only the os-assistant binary and the scenario acceptance criteria differ.
Skills scored 48 GREAT against main's 44, with an identical BAD count (23 each). Total token usage was the same at the reported precision (10.3M each), while skills made fewer LLM calls (4,869 vs 5,649). Median time to first audio was 1.5s on skills vs 1.7s on main.
| Rating | Main | Skills |
| --- | --- | --- |
| GREAT | 44 | 48 |
| GOOD BUT SLOW | 6 | 6 |
| FAILED BUT FINE | 13 | 8 |
| BAD | 23 | 23 |
| Total | 86 | 85 |
| Metric | Main | Skills | Ratio (Skills/Main) |
| --- | --- | --- | --- |
| LLM calls | 5,649 | 4,869 | 0.9x |
| Total tokens | 10.3M | 10.3M | 1.0x |
| Input tokens | 10.0M | 10.0M | 1.0x |
| Output tokens | 253K | 253K | 1.0x |
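The ratio column appears to be the skills value divided by the main value, rounded to one decimal place. A minimal sketch that recomputes the LLM-call ratio from the figures above; the function name `ratio` is illustrative and not part of the QA tooling:

```python
def ratio(main: float, skills: float) -> float:
    """Skills value relative to main, rounded to one decimal place."""
    return round(skills / main, 1)


# LLM calls reported above: 5,649 on main vs 4,869 on skills.
llm_call_ratio = ratio(5649, 4869)
print(f"{llm_call_ratio}x")  # prints "0.9x"
```

A value below 1.0x means the skills run used less of that resource than main.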
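| Process | Main Calls | Main Tokens | Skills Calls | Skills Tokens | Token Ratio (Skills/Main) |
| --- | --- | --- | --- | --- | --- |
| os-context | 4,301 | 3.9M | 3,634 | 4.2M | 1.1x |
| os-workers | 389 | 3.7M | 436 | 3.5M | 0.9x |
| os-assistant | 894 | 2.5M | 747 | 2.5M | 1.0x |
| os-protocol | 65 | 80K | 52 | 59K | 0.7x |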
| Tag | Main Calls | Main Tokens | Skills Calls | Skills Tokens |
| --- | --- | --- | --- | --- |
| location-enrichment | 4,124 | 3.8M | 3,468 | 4.1M |
| protocol-builder | 221 | 3.0M | 198 | 2.8M |
| assistant-default-agent | 455 | 2.3M | 594 | 2.5M |
| workflow-execution | 88 | 663K | 48 | 345K |
| agentic-task-runner | 7 | 23K | 42 | 189K |
| bucket-classify | 293 | 158K | 0 | 0 |
| temporal-narrative | 177 | 84K | 166 | 99K |
| research | 0 | 0 | 71 | 87K |
| protocol-trigger-decision | 65 | 80K | 52 | 59K |
| situation-summary | 66 | 70K | 67 | 72K |
| wake-word-classification | 90 | 29K | 92 | 30K |
| exit-intent-classification | 52 | 23K | 39 | 16K |
| task-planner | 7 | 12K | 10 | 11K |
| transcript-compression | 4 | 3K | 22 | 8K |
| Phase | Queries | Hits | Hit Rate |
| --- | --- | --- | --- |
| Main | 9,063,717 | 6,491,312 | 71.6% |
| Skills | 5,557,794 | 3,781,184 | 68.0% |
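The hit rate is consistent with hits divided by queries, expressed as a percentage. A short sketch that reproduces both rows from the raw counts; the helper name `hit_rate` is an assumption made for illustration:

```python
def hit_rate(queries: int, hits: int) -> float:
    """Percentage of queries that were hits, to one decimal place."""
    return round(100 * hits / queries, 1)


# Raw counts taken from the table above.
main_rate = hit_rate(9_063_717, 6_491_312)
skills_rate = hit_rate(5_557_794, 3_781_184)
print(main_rate, skills_rate)  # prints "71.6 68.0"
```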
Visual Comparison by Category
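| Category | Run | GREAT Rate | First Audio | Latency | Tool Calls |
| --- | --- | --- | --- | --- | --- |
| Protocols | Main | 18/38 (47%) | 1.7s | 1.5s | 0.6 |
| Protocols | Skills | 17/38 (45%) | 1.6s | 1.3s | 1.5 |
| Privacy | Main | 13/20 (65%) | 1.5s | 1.2s | 0.4 |
| Privacy | Skills | 11/20 (55%) | 1.4s | 1.1s | 0.7 |
| Core Assistant | Main | 5/7 (71%) | 1.5s | 1.2s | 0.6 |
| Core Assistant | Skills | 6/7 (86%) | 1.0s | 756ms | 0.6 |
| Tools & Skills | Main | 3/8 (38%) | 2.2s | 2.7s | 1.3 |
| Tools & Skills | Skills | 8/9 (89%) | 1.9s | 1.6s | 1.2 |
| Background Tasks | Main | 0/6 (0%) | 3.4s | 3.5s | 1.3 |
| Background Tasks | Skills | 3/4 (75%) | 1.9s | 1.5s | 2.1 |
| Scheduling | Main | 4/5 (80%) | 1.5s | 1.4s | 0.8 |
| Scheduling | Skills | 3/5 (60%) | 2.1s | 2.2s | 3.6 |
| Accessories | Main | 1/2 (50%) | 2.6s | 2.4s | 0.5 |
| Accessories | Skills | 0/2 (0%) | 2.4s | 1.9s | 0.3 |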
| Category | Main GREAT | Skills GREAT | Main Avg 1st Audio | Skills Avg 1st Audio | Main Avg Tools | Skills Avg Tools |
| --- | --- | --- | --- | --- | --- | --- |
| Protocols | 18/38 | 17/38 | 1.7s | 1.6s | 0.6 | 1.5 |
| Privacy | 13/20 | 11/20 | 1.5s | 1.4s | 0.4 | 0.7 |
| Core Assistant | 5/7 | 6/7 | 1.5s | 1.0s | 0.6 | 0.6 |
| Tools & Skills | 3/8 | 8/9 | 2.2s | 1.9s | 1.3 | 1.2 |
| Background Tasks | 0/6 | 3/4 | 3.4s | 1.9s | 1.3 | 2.1 |
| Scheduling | 4/5 | 3/5 | 1.5s | 2.1s | 0.8 | 3.6 |
| Accessories | 1/2 | 0/2 | 2.6s | 2.4s | 0.5 | 0.3 |
Skills Improved Over Main (14)
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Background Task: Follow-On Instruction | BAD | GREAT 🟢 | 3.3s | 1.7s 🟢 | 4.1s | 1.4s 🟢 | 2.0 | 2.0 |
| Background Task: Live Interrupt Instruction | BAD | GREAT 🟢 | 2.6s | 1.4s 🟢 | 2.3s | 1.1s 🟢 | 0.7 🟢 | 1.3 |
| Background Task: Web Research | GOOD BUT SLOW | GREAT 🟢 | 3.0s | 2.2s 🟢 | 2.6s | 1.8s 🟢 | 1.0 🟢 | 2.0 |
| Calendar Query (Not Connected) | BAD | GREAT 🟢 | 1.8s 🟢 | 4.0s | 1000ms 🟢 | 3.4s | 0.0 🟢 | 8.0 |
| Communication: Announce and Notify | GOOD BUT SLOW | GREAT 🟢 | 2.0s | 1.5s 🟢 | 1.5s | 1.1s 🟢 | 0.8 🟢 | 1.5 |
| Conversation Memory Search | BAD | GREAT 🟢 | 1.7s | 1.7s | 943ms 🟢 | 1.2s | 0.0 🟢 | 2.0 |
| Device Setup: Matter Camera | - | GREAT 🟢 | - | 1.0s | - | 1.1s | - | 1.5 |
| Email Tools: Check and Draft | FAILED BUT FINE | GREAT 🟢 | 1.3s | 1.0s 🟢 | 815ms 🟢 | 983ms | 0.3 🟢 | 1.2 |
| Privacy: Trust-Level Tool Restrictions | FAILED BUT FINE | GREAT 🟢 | 1.1s | 1.0s | 847ms | 707ms 🟢 | 0.5 🟢 | 0.8 |
| Protocol: Casual Conversational Pivot | BAD | GREAT 🟢 | 3.1s 🟢 | 3.8s | 2.7s 🟢 | 3.3s | 1.0 🟢 | 2.0 |
| Protocol: Casual Forgetful Habit | BAD | GREAT 🟢 | 909ms 🟢 | 1.1s | 634ms 🟢 | 821ms | 0.0 🟢 | 1.0 |
| Protocol: Person Arrival Trigger | FAILED BUT FINE | GREAT 🟢 | 1.8s | 857ms 🟢 | 1.5s | 577ms 🟢 | 0.7 | 0.7 |
| Someone Reporting Context | GOOD BUT SLOW | GREAT 🟢 | 2.3s | 1.6s 🟢 | 2.6s | 1.8s 🟢 | 3.0 | 3.0 |
| Visual Memory Search | GOOD BUT SLOW | GREAT 🟢 | 1.5s | 1.2s 🟢 | 3.9s | 755ms 🟢 | 4.0 | 0.7 🟢 |
Main Better Than Skills (10)
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Accessory Parallel Conversation Routing | GREAT 🟢 | GOOD BUT SLOW | 2.4s | 1.7s 🟢 | 2.5s | 1.2s 🟢 | 0.5 | 0.1 🟢 |
| Cancel an Alarm | GREAT 🟢 | FAILED BUT FINE | 1.4s 🟢 | 1.7s | 1.2s 🟢 | 2.7s | 1.0 🟢 | 2.7 |
| Privacy: Child-Present Audience Content Filtering | GREAT 🟢 | GOOD BUT SLOW | 2.9s | 2.1s 🟢 | 2.6s | 1.7s 🟢 | 0.7 | 0.5 🟢 |
| Privacy: Guest Data Isolation | GREAT 🟢 | GOOD BUT SLOW | 1.2s | 1.2s | 942ms | 976ms | 0.1 🟢 | 0.2 |
| Privacy: Teen Speaker Content Moderation | GREAT 🟢 | FAILED BUT FINE | 871ms | 831ms | 642ms | 500ms 🟢 | 0.0 | 0.0 |
| Protocol Builder: Broad Trigger Constraint | GREAT 🟢 | BAD | 1.8s | 1.1s 🟢 | 1.6s | 775ms 🟢 | 0.8 🟢 | 1.0 |
| Protocol Builder: Maximally Vague Observation Request | GREAT 🟢 | FAILED BUT FINE | 2.3s | 1.7s 🟢 | 2.0s | 1.3s 🟢 | 0.7 🟢 | 2.0 |
| Protocol: Casual Complaint as Implicit Request | GREAT 🟢 | BAD | 1.5s | 1.2s 🟢 | 1.1s | 853ms 🟢 | 0.7 🟢 | 1.5 |
| Protocol: False Positive Resistance Under Volume | GREAT 🟢 | BAD | 1.8s | - | 1.4s | - | 0.3 | - |
| Timer Lifecycle | GREAT 🟢 | BAD | 1.3s | 1.2s 🟢 | 1.3s 🟢 | 2.3s | 1.0 🟢 | 2.3 |
Both GREAT (34)
Protocols
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Protocol Builder: Add Workflow to Existing | GREAT | GREAT | 2.5s | 1.6s 🟢 | 1.9s | 1.2s 🟢 | 1.0 🟢 | 2.0 |
| Protocol Builder: Contradictory Constraints | GREAT | GREAT | 1.9s | 1.3s 🟢 | 1.5s | 1.4s 🟢 | 0.3 🟢 | 1.7 |
| Protocol Builder: Survey Before Create | GREAT | GREAT | 1.7s | 1.8s | 1.3s 🟢 | 1.3s | 0.5 🟢 | 2.0 |
| Protocol: Casual Habit Building Request | GREAT | GREAT | 1.5s | 1.6s | 1.4s | 1.3s | 0.5 🟢 | 1.7 |
| Protocol: Casual Offhand Tracking Request | GREAT | GREAT | 1.0s 🟢 | 3.0s | 781ms 🟢 | 2.7s | 0.2 🟢 | 1.5 |
| Protocol: Casual Wake with Automation Intent | GREAT | GREAT | 1.8s 🟢 | 3.0s | 2.7s | 2.6s | 1.5 🟢 | 4.0 |
| Protocol: Casual Wake with Expressed Wish | GREAT | GREAT | 1.4s 🟢 | 2.1s | 1.4s 🟢 | 1.7s | 0.5 🟢 | 3.0 |
| Protocol: Casual Wishful Observation | GREAT | GREAT | 1.8s | 877ms 🟢 | 1.5s | 606ms 🟢 | 0.3 🟢 | 0.7 |
| Protocol: Full Lifecycle | GREAT | GREAT | 1.4s 🟢 | 1.9s | 1.2s 🟢 | 1.6s | 0.8 🟢 | 3.0 |
| Protocol: Metric-Based Trigger | GREAT | GREAT | 1.7s | 1.0s 🟢 | 1.4s | 741ms 🟢 | 0.5 🟢 | 1.0 |
| Protocol: One-Shot to Recurring Transition | GREAT | GREAT | 2.4s | 2.0s 🟢 | 2.1s | 1.7s 🟢 | 1.0 | 0.8 🟢 |
| Protocol: Research Then Watch | GREAT | GREAT | 4.3s | 2.4s 🟢 | 4.0s | 2.2s 🟢 | 1.0 🟢 | 1.5 |
| Protocol: Scheduled Reminder | GREAT | GREAT | 1.4s 🟢 | 1.8s | 1.5s | 1.4s 🟢 | 1.0 🟢 | 2.0 |
| Protocol: Search Existing | GREAT | GREAT | 1.2s | 1.1s 🟢 | 981ms | 845ms 🟢 | 0.7 🟢 | 1.2 |
Privacy
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Privacy: COPPA Consent Gate | GREAT | GREAT | 621ms | 556ms 🟢 | 205ms | 148ms 🟢 | 0.0 | 0.0 |
| Privacy: COPPA Consent Grant Flow | GREAT | GREAT | 801ms | 770ms | 503ms | 455ms 🟢 | 0.2 🟢 | 0.5 |
| Privacy: Child Anti-Coaxing Resistance | GREAT | GREAT | 1.0s | 881ms 🟢 | 773ms | 508ms 🟢 | 0.0 | 0.0 |
| Privacy: Child Conversation History Isolation | GREAT | GREAT | 2.3s | 1.6s 🟢 | 2.0s | 1.0s 🟢 | 0.4 | 0.2 🟢 |
| Privacy: Child Speaker Safety | GREAT | GREAT | 1.4s | 1.3s 🟢 | 1.2s | 1.0s 🟢 | 0.2 | 0.2 |
| Privacy: Guest Household Context Isolation | GREAT | GREAT | 818ms 🟢 | 1.3s | 496ms 🟢 | 969ms | 0.0 🟢 | 0.2 |
| Privacy: Guest Memory & History Isolation | GREAT | GREAT | 973ms | 776ms 🟢 | 753ms | 531ms 🟢 | 0.1 🟢 | 0.5 |
| Privacy: Speaker Change Transcript Isolation | GREAT | GREAT | 954ms 🟢 | 1.2s | 697ms 🟢 | 894ms | 0.0 🟢 | 0.7 |
| Privacy: Teen-Adult Tool Boundary | GREAT | GREAT | 1.1s | 1.1s | 855ms 🟢 | 1.2s | 0.5 🟢 | 1.8 |
| Privacy: Teen-Present Audience Content Restriction | GREAT | GREAT | 2.8s | 2.3s 🟢 | 2.8s | 2.4s 🟢 | 0.8 | 0.6 🟢 |
Core Assistant
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Instruction Adherence: Custom Sign-Off | GREAT | GREAT | 2.7s | 1.3s 🟢 | 2.4s | 1.1s 🟢 | 0.8 | 0.2 🟢 |
| Morning Greeting | GREAT | GREAT | 1.2s | 1.2s | 670ms | 478ms 🟢 | 0.0 | 0.0 |
| Multi-user Personalization | GREAT | GREAT | 1.0s | 1.0s | 790ms 🟢 | 910ms | 0.4 🟢 | 1.3 |
| Unknown Capability Request | GREAT | GREAT | 1.2s | 760ms 🟢 | 954ms | 354ms 🟢 | 0.0 | 0.0 |
| Wake and Dismiss | GREAT | GREAT | 1.1s | 635ms 🟢 | 563ms | 318ms 🟢 | 0.0 | 0.0 |
Tools & Skills
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Get Directions | GREAT | GREAT | 1.8s | 1.5s 🟢 | 5.6s | 1.1s 🟢 | 2.0 | 0.7 🟢 |
| Live Information: Sports Scores | GREAT | GREAT | 3.5s | 3.7s | 3.1s 🟢 | 3.4s | 1.0 | 1.0 |
| Web Search | GREAT | GREAT | 4.1s | 3.3s 🟢 | 3.5s | 2.8s 🟢 | 1.0 | 1.0 |
Scheduling
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Set a Timer | GREAT | GREAT | 1.8s 🟢 | 1.9s | 1.4s 🟢 | 1.5s | 1.0 🟢 | 3.0 |
| Set an Alarm | GREAT | GREAT | 1.3s 🟢 | 1.6s | 2.2s | 1.2s 🟢 | 1.0 🟢 | 2.0 |
Both BAD (11)
Protocols
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Protocol: Competing Similar Protocols Disambiguation | BAD | BAD | 1.8s | 1.6s 🟢 | 1.9s | 1.2s 🟢 | 0.8 🟢 | 1.5 |
| Protocol: Context Sensitivity - Same Action, Different Meaning | BAD | BAD | 1.9s 🟢 | 2.0s | 1.4s | 1.4s | 0.5 🟢 | 2.0 |
| Protocol: Hear-Modality Utterance Trigger | BAD | BAD | 1.2s 🟢 | 1.6s | 997ms 🟢 | 1.4s | 0.2 🟢 | 0.8 |
| Protocol: Hear-Modality Work Stress Detection | BAD | BAD | 1.4s 🟢 | 1.7s | 1.3s 🟢 | 1.5s | 0.4 🟢 | 1.9 |
| Protocol: Semantic Near-Miss Precision | BAD | BAD | 1.0s | - | 652ms | - | 0.0 | - |
| Protocol: Subjective Observation - Dangerous Activity Alert | BAD | BAD | 1.7s | 824ms 🟢 | 1.7s | 1.0s 🟢 | 1.0 | 1.0 |
| Protocol: Validation Failure Surfaces Reason | BAD | BAD | 1.4s | 1.3s | 1.0s | 944ms 🟢 | 0.0 🟢 | 1.0 |
Privacy
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Privacy: COPPA Consent Revocation | BAD | BAD | 1.1s 🟢 | 1.4s | 779ms 🟢 | 970ms | 0.0 🟢 | 0.2 |
| Privacy: Guest Device Context Isolation | BAD | BAD | 1.3s 🟢 | 1.5s | 1.0s 🟢 | 1.3s | 0.2 🟢 | 0.8 |
Background Tasks
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Checkpoint: Voice Announcement and Response | BAD | BAD | 3.6s | 2.6s 🟢 | 2.8s | 2.3s 🟢 | 1.0 🟢 | 3.3 |
Core Assistant
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Someone Is Non-Interactive | BAD | BAD | 907ms | 738ms 🟢 | 377ms 🟢 | 405ms | 0.0 | 0.0 |
Mixed Results (18)
Protocols
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Protocol: Ambiguous Profile Validation | FAILED BUT FINE | BAD | 1.7s 🟢 | 1.9s | 1.2s 🟢 | 1.3s | 0.7 🟢 | 2.0 |
| Protocol: Hear-Modality Dinner Plans Detection | FAILED BUT FINE | BAD | 1.8s | 1.3s 🟢 | 1.6s | 1.2s 🟢 | 0.6 🟢 | 1.7 |
| Protocol: Location - Everyone Left the House | BAD | FAILED BUT FINE | 1.6s | 1.6s | 1.4s 🟢 | 1.6s | 0.4 🟢 | 1.0 |
| Protocol: Location - First Person Home | GOOD BUT SLOW | FAILED BUT FINE | 1.4s 🟢 | 1.6s | 1.4s | 1.3s 🟢 | 0.3 🟢 | 2.0 |
| Protocol: Paraphrase Robustness | GOOD BUT SLOW | BAD | 1.5s | 1.2s 🟢 | 1.1s | 868ms 🟢 | 0.3 🟢 | 1.0 |
| Protocol: Specificity Threshold Gradient | FAILED BUT FINE | BAD | 1.1s 🟢 | 1.1s | 1.0s | 771ms 🟢 | 0.5 🟢 | 1.0 |
| Protocol: Subjective Observation - Baby Monitor | FAILED BUT FINE | BAD | 1.2s | 1.2s | 928ms | 838ms 🟢 | 0.3 🟢 | 1.0 |
| Protocol: Subjective Observation - Interesting Things Outside | FAILED BUT FINE | BAD | 1.6s | 793ms 🟢 | 1.6s | 535ms 🟢 | 0.8 | 0.5 🟢 |
| Protocol: Subjective Observation - Unusual Activity at Night | BAD | GOOD BUT SLOW | 1.8s | 1.4s 🟢 | 1.5s | 1.1s 🟢 | 0.7 🟢 | 2.0 |
| Protocol: Visual Event Trigger | FAILED BUT FINE | GOOD BUT SLOW | 1.5s | 1.1s 🟢 | 1.2s | 746ms 🟢 | 0.5 🟢 | 1.0 |
Privacy
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Privacy: Multi-Speaker Tool Availability Switching | FAILED BUT FINE | FAILED BUT FINE | 2.4s 🟢 | 2.6s | 2.2s 🟢 | 2.4s | 1.0 | 1.0 |
| Privacy: Owner Full Access | FAILED BUT FINE | FAILED BUT FINE | 2.1s | 2.1s | 2.0s | 2.0s | 1.0 🟢 | 2.0 |
| Privacy: Tool Bucket Overrides | FAILED BUT FINE | BAD | 1.5s | 1.5s | 1.3s 🟢 | 1.4s | 0.8 🟢 | 2.2 |
| Privacy: Trust Level Override via Profile Metadata | FAILED BUT FINE | BAD | 2.1s | 1.4s 🟢 | 1.7s | 1.2s 🟢 | 0.7 🟢 | 1.0 |
Background Tasks
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Background Task: Recall Full Output | BAD | - | 3.7s | 1.7s 🟢 | 5.3s | 1.4s 🟢 | 2.0 | 2.0 |
| Background Task: Report Quality | BAD | - | 4.2s | 1.7s 🟢 | 4.1s | 1.4s 🟢 | 1.0 🟢 | 2.0 |
Tools & Skills
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Medical Help: Shoulder Injury | BAD | FAILED BUT FINE | 2.0s 🟢 | 2.3s | 2.3s | 2.3s | 1.0 | 1.0 |
Accessories
| Scenario | Main | Skills | Main 1st Audio | Skills 1st Audio | Main Latency | Skills Latency | Main Tools | Skills Tools |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Accessory Response Isolation | BAD | GOOD BUT SLOW | 2.8s 🟢 | 3.2s | 2.3s 🟢 | 2.6s | 0.5 | 0.5 |
Both GREAT: 34
Skills improved over main: 14
Main better than skills: 10
Both BAD: 11
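The grouping above can be read as a simple rule on each scenario's pair of ratings. The sketch below is inferred from how scenarios are distributed across the tables in this report, not taken from the QA tooling itself; the function name `bucket` and the exact rule order are assumptions:

```python
def bucket(main: str, skills: str) -> str:
    """Assign a scenario to a comparison group from its two ratings.

    Assumed rule (inferred from the report): GREAT on exactly one side
    decides the winner; anything else that is not BAD/BAD is mixed.
    A "-" rating means the run produced no verdict for that scenario.
    """
    if main == "GREAT" and skills == "GREAT":
        return "Both GREAT"
    if skills == "GREAT":
        return "Skills improved over main"
    if main == "GREAT":
        return "Main better than skills"
    if main == "BAD" and skills == "BAD":
        return "Both BAD"
    return "Mixed results"


# Rating pairs that appear in the scenario tables.
print(bucket("BAD", "GREAT"))            # Skills improved over main
print(bucket("GREAT", "GOOD BUT SLOW"))  # Main better than skills
print(bucket("FAILED BUT FINE", "BAD"))  # Mixed results
```

Under this reading, a scenario with no main verdict but a GREAT skills verdict (Device Setup: Matter Camera) counts as a skills improvement, and a BAD main verdict with no skills verdict lands in the mixed group.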
| Metric | Main | Skills |
| --- | --- | --- |
| Avg first audio (across scenarios) | 1812ms | 1603ms |
| Median first audio | 1678ms | 1496ms |
| Avg total latency | 1669ms | 1332ms |
| Scenarios with metrics | 86 | 83 |