@qpwo
Created January 22, 2025 02:35
o1 fail
Why does this run instantly after the first trial?
import subprocess
import time
import shutil
import os


def run_rclone_command(command, num_trials=3, dest_path_base="/home/ubuntu/hff/405b-base"):
    """
    Runs an rclone command multiple times and calculates the average execution time.
    Simplified version with the command executed directly.
    """
    times = []
    dest_path = os.path.expanduser(dest_path_base)
    for i in range(num_trials):
        print(f"Running trial {i+1} for command: {command}")
        # Delete destination directory before each trial
        if os.path.exists(dest_path):
            print(f"Deleting destination directory: {dest_path}")
            try:
                shutil.rmtree(dest_path)
            except Exception as e:
                print(f"Error deleting destination directory: {e}")
                return None
        start_time = time.time()
        process = subprocess.run(command, shell=True, capture_output=True, text=True)
        end_time = time.time()
        if process.returncode != 0:
            print(f"Command failed with return code: {process.returncode}")
            print(f"Stdout: {process.stdout}")
            print(f"Stderr: {process.stderr}")
            return None
        times.append(end_time - start_time)
        print(f"Trial {i+1} completed in {times[-1]:.2f} seconds\n")
    average_time = sum(times) / len(times)
    return average_time


if __name__ == "__main__":
    source_path = os.path.expanduser("/home/ubuntu/shared/hff/70b-base")
    dest_path_base = "/home/ubuntu/hff/70b-base"
    num_trials = 3
    commands = [
        # Variation 1: Baseline (Concurrency Focused - your original settings)
        f"rclone sync --exclude-if-present=.dontclone --transfers=64 --checkers=64 --multi-thread-streams=128 --cache-workers=32 --buffer-size=16M --fast-list --stats=1s -q --stats-one-line --size-only --ignore-errors --exclude '**/node_modules/**' --use-mmap --metadata --use-server-modtime --progress --links '{source_path}' '{dest_path_base}'",
        # Variation 2: Low Concurrency
        f"rclone sync --exclude-if-present=.dontclone --transfers=8 --checkers=8 --buffer-size=16M --fast-list --stats=1s -q --stats-one-line --size-only --ignore-errors --exclude '**/node_modules/**' --use-mmap --metadata --use-server-modtime --progress --links '{source_path}' '{dest_path_base}'",
        # Variation 3: High Concurrency (Be cautious with very high values)
        f"rclone sync --exclude-if-present=.dontclone --transfers=128 --checkers=128 --multi-thread-streams=256 --cache-workers=64 --buffer-size=16M --fast-list --stats=1s -q --stats-one-line --size-only --ignore-errors --exclude '**/node_modules/**' --use-mmap --metadata --use-server-modtime --progress --links '{source_path}' '{dest_path_base}'",
        # Variation 4: No fast-list (Default Listing)
        f"rclone sync --exclude-if-present=.dontclone --transfers=64 --checkers=64 --multi-thread-streams=128 --cache-workers=32 --buffer-size=16M --stats=1s -q --stats-one-line --size-only --ignore-errors --exclude '**/node_modules/**' --use-mmap --metadata --use-server-modtime --progress --links '{source_path}' '{dest_path_base}'",
        # Variation 5: No size-only (Modtime & Size check)
        f"rclone sync --exclude-if-present=.dontclone --transfers=64 --checkers=64 --multi-thread-streams=128 --cache-workers=32 --buffer-size=16M --fast-list --stats=1s -q --stats-one-line --ignore-errors --exclude '**/node_modules/**' --use-mmap --metadata --use-server-modtime --progress --links '{source_path}' '{dest_path_base}'",
    ]
    print("Starting rclone performance comparison...\n")
    results = {}
    for i, command in enumerate(commands):
        avg_time = run_rclone_command(command, num_trials, dest_path_base)
        if avg_time is not None:
            results[f"Variation {i+1}"] = avg_time
        else:
            results[f"Variation {i+1}"] = "Error during execution"
    print("\n--- Average Execution Times ---")
    for variation, avg_time_or_error in results.items():
        if isinstance(avg_time_or_error, (int, float)):
            print(f"{variation}: {avg_time_or_error:.2f} seconds")
        else:
            print(f"{variation}: {avg_time_or_error}")
    print("\nComparison complete.")
ChatGPT
When you copy files locally on a Linux system, the kernel will cache recently accessed data in memory (the page cache). On your first run, rclone must read every file from disk; but on subsequent runs, those same files are typically still “hot” in the page cache, so file I/O is dramatically faster (effectively in-memory), making it look “instant.”
Even though your code deletes the destination directory each time, the source directory remains cached. Deleting files from the destination does not clear the kernel’s page cache for the source. Hence, the second and third rclone sync commands don’t need to do as much physical disk I/O, which is why the runs appear to complete almost immediately.
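The page-cache effect is easy to reproduce in a few lines of Python. This sketch (the file size and temp-file usage are illustrative) evicts a file's pages with os.posix_fadvise — Linux-only, but it needs no root — and then times a cold read against a warm one:

```python
import os
import tempfile
import time

# Write a moderately large file so a real disk read is measurable.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(16 * 1024 * 1024))  # 16 MiB of random bytes
    f.flush()
    os.fsync(f.fileno())  # force the data to disk; DONTNEED only drops clean pages
    path = f.name

def evict(p):
    """Ask the kernel to drop this file's pages from the page cache.
    Note: on tmpfs this is a no-op, so both reads below would be 'warm'."""
    fd = os.open(p, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)

def timed_read(p):
    start = time.time()
    with open(p, "rb") as fh:
        n = len(fh.read())
    return time.time() - start, n

evict(path)
cold, size = timed_read(path)  # forced to come from disk
warm, _ = timed_read(path)     # served from the page cache
print(f"{size} bytes: cold={cold:.4f}s warm={warm:.4f}s")
os.unlink(path)
```

On a normal disk-backed filesystem the warm read is typically an order of magnitude faster, which is the same effect inflating trials 2 and 3 of the rclone benchmark.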
If your goal is to measure true end-to-end performance (particularly read performance from disk), you’ll want to clear the page cache before each trial, for example:
    sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
However, be aware that clearing the cache can have side effects on other running processes. If you’re just interested in how rclone performs in a “normal” workload situation, the current behavior is actually typical, since repeated reads are almost always cached in real-world scenarios as well.
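To fold that into the benchmark above, one option is a small helper called at the top of each trial (the helper name is mine, and it assumes passwordless sudo on the benchmark host):

```python
import subprocess

# 'sync' first so dirty pages are written back; drop_caches can only discard
# clean pages, and an implicit flush would otherwise leak into the timed trial.
DROP_CACHES_CMD = 'sync && sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"'

def drop_page_cache():
    """Drop the kernel page cache (plus dentries and inodes) before a trial.
    Requires root or passwordless sudo; Linux only."""
    subprocess.run(DROP_CACHES_CMD, shell=True, check=True)
```

In run_rclone_command, calling drop_page_cache() immediately before start_time = time.time() would start every trial with a cold cache, so trial 1 and trial 3 measure the same thing.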