Tiros is an AI engineering assistant for process plants — it ingests a project's source documents (drawings, datasheets, vendor packs) and produces structured engineering deliverables grounded in the relevant standards.
We ran the same Tiros pipeline end-to-end against three different customer engagements in a single day:
- A steam-turbine feasibility study for a refinery operator
- A hydrogen-production feasibility study for a renewables developer
- A water-treatment plant automation upgrade for a utility
Each customer arrived with a different shape of input (size, format, language, completeness) and a different engineering domain (refinery processes, hydrogen / electrolysis, water treatment).
| Project | Input | Outcome |
|---|---|---|
| Project A (turbine) | 4 documents, very thin | The question pack is the deliverable |
| Project B (hydrogen) | 184 files, but the wrong domain for our system | Discovered our system doesn't fit; that discovery was itself useful |
| Project C (water) | 69 files, mature engagement, real plant drawings | Genuine workflow output; multiple deliverables ran successfully |
Customers often arrive at the early "feasibility study" stage. They have a brochure, a vendor email, a few operating-condition tables — but no plant drawings, no equipment list, no hazardous-substance register, no piping diagrams. Our system can structure what's there, but it cannot manufacture content from nothing.
What we found: in those cases, the most valuable output isn't a generated deliverable — it's a structured list of questions to send the customer. Our system aggregates 50–250 deduplicated customer questions across all the deliverables it would have produced, prioritised by how many downstream documents each question unblocks. That list collapses what's normally a two-week email thread into one round-trip.
Generalisable lesson: the value proposition for early-stage customers is "we'll send your customer one structured question pack instead of you emailing them ad-hoc for two months." That's different from the value proposition for late-stage customers, which is "we'll generate the deliverables."
The system was designed for refinery / petrochemical process plants. It ships with equipment classes (pumps, vessels, exchangers, columns), safety methodology (HAZOP per IEC 61882, materials per NACE for sour service, relief per API 521), and a ~74-workflow catalogue all targeting that domain.
When we ran it against a hydrogen-production plant (electrolyser + battery storage + photovoltaics), none of the workflows transferred cleanly. The failure modes, standards, and equipment classes are fundamentally different.
What we found:
- Our system honestly identified the domain mismatch instead of churning out irrelevant content. This is the correct behaviour — better to say "this isn't our domain" than to fabricate plausible-looking deliverables.
- The mismatch surfaced exactly what we'd need to add to handle the new domain. That diagnosis became four new "equipment-class starter" packs that any future hydrogen project will benefit from.
For the water-treatment project, the picture was much better. Water treatment shares enough with refinery process (pumps, vessels, tanks, materials, corrosion) that 3 of our 11 workflows ran cleanly and 4 more ran in degraded mode. Different domain, partial fit.
Generalisable lesson: the system has a "native domain" where everything works, an "adjacent domain" where most things work with caveats, and a "wrong domain" where almost nothing fits. Pricing, marketing, and onboarding need to set expectations differently for each.
Three integration issues surfaced during the runs:
- An auth handler silently failed on plant-drawing conversion. Graceful-degradation logic masked it from the user, but two runs were affected before we caught it.
- Missing dependencies in one sub-tool's package definition — a fresh clone would crash on first use.
- Path encoding mismatch (macOS uses one Unicode form, the system used another) — caused workflow-name lookups to fail silently.
All three were edge-case integration issues, not algorithmic ones. But all three would have repeatedly hit any new operator running the system.
Generalisable lesson: as we onboard more operators — internal team → external contractors → customer engineers — the cost of broken setup multiplies. We need a doctor / health-check command that catches these gotchas at setup time instead of letting operators discover them mid-customer-run.
Each customer run revealed a class of problem; we shipped a small fix for each. Six improvements, all small, none algorithmic:
| Improvement | Triggered by |
|---|---|
| Per-machine setup script that installs Python + tool dependencies | Refinery run — needed setup tooling |
| "Equipment class starter" pattern for class-typical engineering knowledge | Refinery run — first starter (steam turbine) |
| Hydrogen-domain class starters (electrolyser, battery, hydrogen storage, PV) | Hydrogen run — surfaced the domain gap |
| Setup-time check that warns if API auth isn't configured | Both feasibility runs — silent auth failures |
| Auth-passthrough fix for the plant-drawing converter | Water run — 2 plant drawings rescued after the fix |
| Cost reductions: cheaper plan-phase dispatch + skip decorative images | All 3 runs — observed wasted spend on low-value content |
Three things would meaningfully improve the next customer run:
- The hydrogen-domain class starters unlock hydrogen-domain projects without re-discovering the same class-typical knowledge each time.
- The cost-reduction work cuts spend per customer roughly in half on the next run.
The thin-pack pattern (Project A) is going to repeat. We should explicitly support "feasibility-stage entry" as a first-class mode that:
- Skips most workflows by default (because they'll all be unanswerable anyway)
- Runs only the question-pack roll-up logic
- Outputs a polished customer-question email in 5 minutes for ~$1
This is the simplest, highest-leverage product to wrap around what the system already does well.
Currently every project runs against all 11 workflows regardless of domain fit. A small "what is this customer's domain?" step at project setup would let us skip wholesale workflow categories that obviously don't apply (e.g., refinery HAZOP for a hydrogen plant). Saves operator time and system tokens.
- 3 customer projects exercised in 1 day
- 187 source files total ingested across the three
- 6 improvements shipped
- 4 real bugs found and fixed
- ~$13.50 total LLM spend across all three runs (closer to $6–7 once the cost reductions ship)
- 5 new equipment classes scaffolded (1 refinery turbine, 4 hydrogen-stack)
- ~250 customer questions surfaced across the three projects