@nousr
Last active July 29, 2023 23:16
How to compute CLIP embeddings easily with clip-retrieval & Slurm

Install stuff

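  1. create a virtual environment with `python3 -m venv .env`, then activate it with `source .env/bin/activate`
  2. install PyTorch (choose the build that matches your CUDA version; see pytorch.org)
  3. install clip-retrieval: `pip install clip-retrieval`
  4. install s3fs (needed to read from or write to `s3://` paths): `pip install s3fs`
  5. (optional) install wandb with `pip install wandb` and log in with `wandb login`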
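Once those steps are done, it can be worth confirming that the packages are visible inside the activated `.env` before launching anything on the cluster. The helper below is a hypothetical sketch (not part of clip-retrieval); it uses Python's `importlib.util.find_spec`, so it only locates each module and does not actually import heavy packages like torch.

```shell
#!/bin/bash
# Hypothetical helper: report which top-level Python modules can be found
# by the python3 on PATH (the venv's python3 once .env is activated).
# Returns 1 if any module is missing, 0 otherwise.
check_python_modules() {
  local missing=0
  local mod
  for mod in "$@"; do
    # find_spec returns None when a top-level module is not installed.
    if python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)" "$mod"; then
      echo "ok: $mod"
    else
      echo "missing: $mod"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_python_modules torch clip_retrieval s3fs wandb`. Note that the import names use underscores (`clip_retrieval`), unlike the pip package name.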

Create a script that points to your data & output folder

  1. create a folder of image & txt pairs that share the same filename (apart from the extension)
     • example: img0.png, img0.txt
  2. fill out the input_dataset and output_folder fields of the script below
  3. change the wandb project name (wandb_project), or set enable_wandb=False if you aren't using wandb
  4. choose your preferred CLIP model (clip_model)
  5. set slurm_job_comment to your team's account, and slurm_partition to a partition that exists on your cluster
  6. set your Slurm cache path (slurm_cache_path); it can be any path you'd like
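Step 1 can be sanity-checked before submitting a job. A minimal sketch: the hypothetical `find_unpaired_images` helper below lists every image under a folder (including subfolders) that has no `.txt` caption with the same name. The set of image extensions is an assumption; adjust it to whatever your dataset actually uses.

```shell
#!/bin/bash
# Hypothetical helper: print every image under DIR (recursively) that has
# no caption file with the same name and a .txt extension.
# Assumption: images use one of the extensions in the find filter below
# (matched case-insensitively).
find_unpaired_images() {
  local dir="$1"
  find "$dir" -type f \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \
    -o -iname '*.bmp' -o -iname '*.webp' \) | sort |
  while IFS= read -r img; do
    # Strip the extension and look for the matching caption,
    # e.g. img0.png -> img0.txt
    [ -f "${img%.*}.txt" ] || printf '%s\n' "$img"
  done
}
```

An empty result means every image found has a caption next to it.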

more notes & advanced usage at: https://github.com/rom1504/clip-retrieval

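#!/bin/bash
# Compute CLIP embeddings for every image/caption pair in input_dataset,
# distributing the work as a Slurm job (distribution_strategy="slurm").
clip-retrieval inference \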
--input_dataset="<parent folder containing images>" \
--output_folder="<output s3 bucket or local folder>" \
--input_format="files" \
--enable_metadata=False \
--write_batch_size=500 \
--num_prepro_workers=2 \
--batch_size=64 \
--enable_wandb=True \
--wandb_project="<project name>" \
--clip_model="open_clip:ViT-H-14" \
--use_jit=False \
--distribution_strategy="slurm" \
--slurm_job_name="shot-deck-embed" \
--slurm_partition="g40423" \
--slurm_nodes=1 \
--slurm_job_comment="<your account>" \
--slurm_job_timeout=350000 \
--cache_path=None \
--clip_cache_path=None \
--slurm_cache_path="<your cache path>" \
--slurm_verbose_wait=False
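Before running the script, it helps to make sure none of the `<...>` placeholders are left in it. A minimal sketch, assuming you saved the script to a file and that placeholders keep the `="<...` form used above; `find_placeholders` is a hypothetical helper, not part of clip-retrieval. Matching only the opening `="<` also catches a placeholder whose closing `>` was dropped.

```shell
#!/bin/bash
# Hypothetical pre-flight helper: print (with line numbers) every line of
# a script that still contains a ="<..." placeholder.
# Returns 1 if placeholders remain, 0 if none, 2 if grep itself failed.
find_placeholders() {
  local script="$1"
  local status=0
  grep -n '="<' "$script" || status=$?
  case "$status" in
    0) return 1 ;;  # placeholders found (printed above)
    1) return 0 ;;  # no placeholders left
    *) return 2 ;;  # grep error, e.g. unreadable file
  esac
}
```

For example, `find_placeholders embed.sh` (the filename is just an illustration) prints each line you still need to fill in.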