Hussein Lezzaik HusseinLezzaik

Smart Robot Episode Dataloader

Problem: Robot datasets have wildly variable episode lengths (10 seconds to 5+ minutes), causing massive load imbalance in distributed training. Worker 1 might process 50 short episodes while Worker 2 gets stuck on 3 long cooking demos.

Solution: Content-aware sharding that balances total compute workload, not episode count.
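One way to picture workload-balanced sharding is a greedy longest-first assignment: sort episodes by cost and always hand the next one to the least-loaded worker. This is a minimal sketch, not the gist's actual implementation. The function name `shard_by_workload`, the use of episode duration in seconds as the cost, and the example episode ids are all assumptions made for illustration.

```python
import heapq


def shard_by_workload(episode_costs, num_workers):
    """Assign episodes to workers so total cost per worker is roughly equal.

    episode_costs: dict mapping episode id -> cost (assumed here to be
    duration in seconds; frame count would work the same way).
    Returns (shards, loads): a list of episode-id lists and the total
    cost assigned to each worker.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")

    shards = [[] for _ in range(num_workers)]
    loads = [0.0] * num_workers
    # Min-heap of (current load, worker index): the lightest worker pops first.
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)

    # Place the most expensive episodes first, short ones fill the gaps.
    ordered = sorted(episode_costs.items(), key=lambda kv: kv[1], reverse=True)
    for ep_id, cost in ordered:
        load, w = heapq.heappop(heap)
        shards[w].append(ep_id)
        loads[w] = load + cost
        heapq.heappush(heap, (loads[w], w))
    return shards, loads


# Hypothetical dataset: 3 long demos (300 s each) and 50 short clips (10 s each).
episodes = {f"long_{i}": 300.0 for i in range(3)}
episodes.update({f"short_{i}": 10.0 for i in range(50)})

shards, loads = shard_by_workload(episodes, num_workers=2)
print([len(s) for s in shards], loads)
```

Splitting by episode count would give one worker far more seconds of data than the other; balancing on cost keeps the per-worker totals close even though the episode counts differ.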

Quick Demo

# Install dependencies

What happens in an ML Interview?

  • Background and culture fit
  • Whiteboard Coding (similar to SWE interviews)
  • Pair Coding (similar to SWE interviews)
  • Pair debugging (often ML-specific code)
  • Math puzzles (e.g. involving Linear Algebra)
  • Take-home ML project
  • Applied ML (e.g. explain how you'd solve this problem with ML)
  • Previous ML projects (e.g. probing on what you tried, why things did/didn't work)
  • ML Theory (e.g. bias-variance tradeoff, overfitting, underfitting, understanding of specific algorithms)

Strategy

  • Gather System Requirements
  • Plan
  • Estimate
  • Communicate
  • Diagram

Estimation Cheatsheet

Units

  • 1 kB = 1000 bytes
# Vanilla Crew Scheduling
# Step 1: Define the flights and crew members, along with the flight times and max work hours.
# Flight times in hours
flight_times = {
    "flight_1": 5,
    "flight_2": 3,
    "flight_3": 4,
}
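
The preview stops before the crew members and work-hour limits are defined. To show how flight times and a max-hours limit interact, here is a small greedy sketch. The crew names (`crew_a`, `crew_b`), the 8-hour limits, and the `assign_flights` helper are hypothetical and not taken from the original gist, which may use a different method (for example an optimization solver).

```python
# Flight times in hours (same values as in the snippet above).
flight_times = {"flight_1": 5, "flight_2": 3, "flight_3": 4}

# Hypothetical crew members and max work hours, assumed for illustration.
max_work_hours = {"crew_a": 8, "crew_b": 8}


def assign_flights(flight_times, max_work_hours):
    """Greedily assign each flight to the least-busy crew member who has room."""
    hours_used = {crew: 0 for crew in max_work_hours}
    assignment = {}
    # Longest flights first, so they are placed while capacity is still free.
    ordered = sorted(flight_times.items(), key=lambda kv: kv[1], reverse=True)
    for flight, hours in ordered:
        candidates = [
            crew for crew in max_work_hours
            if hours_used[crew] + hours <= max_work_hours[crew]
        ]
        if not candidates:
            raise ValueError(f"no crew member has {hours} free hours for {flight}")
        chosen = min(candidates, key=lambda crew: hours_used[crew])
        assignment[flight] = chosen
        hours_used[chosen] += hours
    return assignment, hours_used


assignment, hours_used = assign_flights(flight_times, max_work_hours)
print(assignment)
print(hours_used)
```

A greedy pass like this is only a baseline: it respects the hour limits but does not guarantee an optimal schedule when the constraints get tighter.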