Models don’t fail in notebooks.
They fail in production systems.
Most ML talks focus on:
- Models
- Metrics
- Papers
Most ML failures come from:
- Systems
- Assumptions
- Operations
This talk is about the part we keep ignoring.
ML ≠ Intelligence
ML = A component in a system
A model is useless without:
- Data pipelines
- Serving infrastructure
- Monitoring
- Feedback loops
Intelligence only emerges when the system works.
Great models in bad systems fail.
Average models in great systems succeed.
- Netflix
- Google Search
- Fraud detection systems
None rely on perfect models — they rely on well-engineered systems.
Common mistake:
“Once the model is good, we’re done.”
Reality:
- Training is 10%
- Everything else is 90%
Typical journey:
- Notebook works 🎉
- Model is deployed 🚀
- Traffic increases 📈
- Data changes 🔄
- Predictions degrade 📉
- Nobody notices 😬
This is not a modeling problem.
It’s a systems problem.
Ask system-level questions:
- Where does data come from?
- Who owns data quality?
- What happens when inputs change?
- How do we know when the model is wrong?
- What happens when the model is down?
ML systems are not static software.
They:
- Age
- Drift
- Decay
- Break silently
ML systems behave more like organisms than code.
Every production ML system has:
- Data ingestion
- Feature generation
- Model inference
- Decision logic
- Monitoring & feedback
Ignore one → the system collapses.
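A minimal sketch of how these five stages might line up in a single serving path; every name here (`build_features`, `FraudModel`, `decide`, `handle`) is hypothetical, and real systems split these stages across separate services and jobs.

```python
# Minimal sketch of the five stages in one serving path.
# All names are hypothetical; real systems split these across services.

from dataclasses import dataclass

@dataclass
class Decision:
    action: str
    score: float

def build_features(raw_event: dict) -> dict:
    # Feature generation: the same logic must run in training and serving.
    return {"amount": raw_event.get("amount", 0.0),
            "country": raw_event.get("country", "unknown")}

class FraudModel:
    def predict_proba(self, features: dict) -> float:
        # Model inference: placeholder score instead of a real model.
        return 0.5

def decide(score: float, threshold: float = 0.8) -> Decision:
    # Decision logic: thresholds and business rules live here, not in the model.
    return Decision(action="block" if score >= threshold else "allow", score=score)

def handle(raw_event: dict, model: FraudModel) -> Decision:
    features = build_features(raw_event)    # ingestion feeds feature generation
    score = model.predict_proba(features)   # inference
    decision = decide(score)                # decision logic
    print(f"decision={decision.action} score={decision.score:.2f}")  # monitoring hook
    return decision
```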
Hard truth:
- Your model learns your data bugs
- Your model amplifies data bias
- Your model reflects data staleness
Most “model issues” are actually data issues.
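A hedged sketch of the kind of cheap checks that catch many of these issues before training ever starts; the column names and thresholds below are illustrative, not prescriptive.

```python
# Cheap sanity checks on a training dataframe before any modeling.
# Column names and thresholds here are illustrative only.

import pandas as pd

def basic_data_checks(df: pd.DataFrame) -> list[str]:
    problems = []
    # Missingness: silent nulls become silent model bugs.
    for col, rate in df.isna().mean().items():
        if rate > 0.05:
            problems.append(f"{col}: {rate:.1%} missing")
    # Staleness: a frozen timestamp column means a frozen pipeline.
    if "event_time" in df.columns:
        newest = pd.to_datetime(df["event_time"]).max()
        if newest < pd.Timestamp.now() - pd.Timedelta(days=2):
            problems.append(f"newest event is {newest}; data may be stale")
    # Duplicates: upstream replays quietly bias the training set.
    dup_rate = df.duplicated().mean()
    if dup_rate > 0.01:
        problems.append(f"{dup_rate:.1%} duplicate rows")
    return problems
```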
What you train on ≠ what you serve.
Causes:
- Different pipelines
- Different preprocessing
- Missing values
- Real-world messiness
Same code ≠ same data.
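One common mitigation, sketched below under assumptions: force training and serving through the same feature code, then spot-check parity between what was actually served and what offline recomputation produces. `build_features` is the same hypothetical shared function as in the earlier sketch; `served_features` would come from serving logs.

```python
# Feature-parity spot check: recompute features offline for a sample of
# requests that were already served online, and compare the two.
# `build_features` is assumed to be the single shared implementation.

def feature_parity(raw_event: dict, served_features: dict) -> dict:
    offline = build_features(raw_event)   # same code path as training
    mismatches = {}
    for name, offline_value in offline.items():
        online_value = served_features.get(name)
        if online_value != offline_value:
            mismatches[name] = (offline_value, online_value)
    return mismatches  # non-empty dict = training/serving skew on this event
```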
Users don’t want predictions.
They want:
- Decisions
- Actions
- Outcomes
Examples:
- Fraud score ≠ fraud prevention
- Recommendation ≠ conversion
- Classification ≠ business value
A model output without context is dangerous.
You need:
- Confidence
- Constraints
- Business rules
- Human overrides
Intelligence = Model + Rules + Judgment
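A hedged sketch of what that layering can look like for a fraud score; every threshold, rule, and action name below is invented for illustration.

```python
# Turning a raw fraud score into a decision: model + rules + a path to a human.
# Every number and rule below is illustrative.

def decide_transaction(score: float, amount: float, on_allowlist: bool) -> str:
    if on_allowlist:                      # business rule beats the model
        return "allow"
    if amount > 10_000 and score > 0.3:   # high stakes: don't trust the model alone
        return "escalate_to_human"
    if score > 0.9:                       # confident enough to act automatically
        return "block"
    if score > 0.6:                       # uncertain band: add friction, not a block
        return "require_2fa"
    return "allow"
```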
Expect:
- Bad inputs
- Missing features
- Cold start
- Traffic spikes
- Model timeouts
- Partial outages
If you didn’t design for it, it will happen.
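One way to design for some of these, sketched under assumptions: bound the model call and fall back to a safe default when inputs are missing or the model is slow. The `predict_proba` interface, the timeout, and the fallback score are placeholders.

```python
# Serving wrapper that degrades gracefully instead of failing loudly.
# The model interface, timeout, and default score are placeholder choices.

import concurrent.futures

_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)  # shared scoring pool
FALLBACK_SCORE = 0.0           # "assume not fraud" default; choose per use case
REQUIRED = {"amount", "country"}

def score_with_fallback(model, features: dict, timeout_s: float = 0.1) -> float:
    # Missing features: fall back immediately rather than crash downstream.
    if not REQUIRED.issubset(features):
        return FALLBACK_SCORE
    # Slow or broken model: bound the wait and fall back to the default.
    future = _POOL.submit(model.predict_proba, features)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        return FALLBACK_SCORE
```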
Traditional monitoring is not enough.
You must observe:
- Input distributions
- Output distributions
- Decision rates
- Business impact
Logs and metrics ≠ ML observability.
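As one concrete example of distribution monitoring, a hedged sketch of the population stability index (PSI) between a training-time reference sample and recent live values of a feature; the ten-bucket split and the usual "alert above 0.2" threshold are rules of thumb, not requirements.

```python
# Population Stability Index between a reference sample (e.g. training data)
# and recent live values of the same feature. PSI > 0.2 is a common
# rule-of-thumb alert threshold, not a universal law.

import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, buckets: int = 10) -> float:
    # Equal-width bins over the reference range, opened at both ends
    # so out-of-range live values are still counted.
    edges = np.linspace(reference.min(), reference.max(), buckets + 1)
    edges[0], edges[-1] = -np.inf, np.inf
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    live_pct = np.histogram(live, bins=edges)[0] / len(live)
    ref_pct = np.clip(ref_pct, 1e-6, None)    # avoid log(0)
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - ref_pct) * np.log(live_pct / ref_pct)))
```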
ML systems often:
- Don’t crash
- Don’t alert
- Just get worse
A wrong prediction at scale is worse than downtime.
Without feedback:
- Models never improve
- Errors repeat
- Bias grows
Key question:
How does the system learn it was wrong?
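A hedged sketch of the smallest version of that loop: log every decision with an id, join ground truth when it eventually arrives, and recompute realized precision. Table and column names (`prediction_log`, `predicted_block`, `was_fraud`) are invented.

```python
# Minimum viable feedback loop: every prediction is logged with an id,
# outcomes are joined in later, and realized precision is recomputed.

import pandas as pd

def realized_precision(prediction_log: pd.DataFrame, outcomes: pd.DataFrame) -> float:
    # prediction_log: one row per served decision (id, predicted_block: bool)
    # outcomes: ground truth discovered later (id, was_fraud: bool)
    joined = prediction_log.merge(outcomes, on="id", how="inner")
    blocked = joined[joined["predicted_block"]]
    if blocked.empty:
        return float("nan")  # nothing blocked yet; no signal either way
    return float(blocked["was_fraud"].mean())  # how often a block was justified
```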
Humans:
- Catch edge cases
- Provide corrections
- Create trust
The best ML systems collaborate with humans, not replace them.
MLOps is not:
- Just tools
- Just pipelines
- Just CI/CD
MLOps is:
- Reliability
- Ownership
- Lifecycle management
Same principles as distributed systems.
Ask:
- Who owns data quality?
- Who owns model performance?
- Who responds when it degrades?
If the answer is “the model”, you’ve already lost.
Avoid:
- Over-complex models
- Fragile pipelines
- Unnecessary automation
Prefer:
- Simple models
- Strong defaults
- Clear fallbacks
Boring systems scale. Fancy ones break.
In production:
- Interpretability > +0.5% accuracy
- Stability > novelty
- Maintainability > brilliance
Stop thinking:
“How good is my model?”
Start thinking:
“How resilient is my system?”
- Design for failure
- Monitor business impact
- Ship small, iterate fast
- Treat ML like infrastructure
Intelligence is engineered, not trained.
- Models are components
- Systems create value
- Engineering makes ML real
“In production, intelligence is not about how smart your model is —
it’s about how well your system survives reality.”