Problem: Robot datasets have highly variable episode lengths (from 10 seconds to over 5 minutes), which causes severe load imbalance in distributed training when episodes are split evenly by count. Worker 1 might finish 50 short episodes while Worker 2 is still stuck on 3 long cooking demos.
Solution: Content-aware sharding that balances total compute workload, not episode count.
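One way to picture workload-balanced sharding is a greedy "longest episode first" assignment: sort episodes by estimated cost and always hand the next one to the least-loaded worker. The sketch below is illustrative only and is not necessarily how this project implements it. It assumes episode duration in seconds is a reasonable proxy for compute cost, and the function name `shard_by_workload` is hypothetical.

```python
import heapq


def shard_by_workload(episode_lengths, num_workers):
    """Assign episode indices to workers so total length per worker is balanced.

    Hypothetical helper: uses greedy longest-processing-time scheduling and
    treats episode length as the cost estimate (an assumption).
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")

    # Min-heap of (current_load, worker_id); the least-loaded worker pops first.
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_workers)]

    # Place the longest episodes first so short ones can fill the gaps.
    order = sorted(
        range(len(episode_lengths)),
        key=lambda i: episode_lengths[i],
        reverse=True,
    )
    for idx in order:
        load, worker = heapq.heappop(heap)
        shards[worker].append(idx)
        heapq.heappush(heap, (load + episode_lengths[idx], worker))
    return shards


# Example: 3 long demos (about 5 minutes each) plus 50 short 10-second episodes.
lengths = [300, 280, 310] + [10] * 50
shards = shard_by_workload(lengths, num_workers=2)
loads = [sum(lengths[i] for i in shard) for shard in shards]
print(loads)
```

With a split by episode count, one worker could receive all three long demos. Here the long demos are spread out first and the short episodes level the remaining difference, so per-worker totals end up close even though the episode counts differ.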
# Install dependencies