Mehdi Cherti mehdidc

import ast
import json
import logging
import math
import os
import random
import sys
import braceexpand
from dataclasses import dataclass
from multiprocessing import Value
language,acc1,model,mean_per_class_recall,dataset,pretrained,image_retrieval_recall@5,acc,mean_average_precision,acc5,task,model_fullname,text_retrieval_recall@5
en,0.18418107200596942,RN50-quickgelu,0.18626162135734856,cars,cc12m,,,,0.5200845665961945,zeroshot_classification,RN50-quickgelu cc12m,
en,0.17945529163039423,RN50,0.1817410995349502,cars,cc12m,,,,0.5139907971645318,zeroshot_classification,RN50 cc12m,
en,0.7211789578410646,ViT-B-16,0.7212942008721658,cars,commonpool_l_basic_s1b_b8k,,,,0.9759980101977366,zeroshot_classification,ViT-B-16 commonpool_l_basic_s1b_b8k,
en,0.8166894664842681,ViT-B-16,0.8155803538003477,cars,commonpool_l_clip_s1b_b8k,,,,0.9907971645317747,zeroshot_classification,ViT-B-16 commonpool_l_clip_s1b_b8k,
en,0.6628528789951499,ViT-B-16,0.6613039344249736,cars,commonpool_l_image_s1b_b8k,,,,0.9621937569953986,zeroshot_classification,ViT-B-16 commonpool_l_image_s1b_b8k,
en,0.6775276706877255,ViT-B-16,0.6744836925516351,cars,commonpool_l_laion_s1b_b8k,,,,0.9602039547319985,zeroshot_cla
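import sys

# Convert a caption file into "<image_id>.jpg,<caption>" lines.
# Usage: python <this_script> <input_file> <output_file>
# Each input line is expected to start with "<image_id>#<index>", followed by
# a space and the caption.
with open(sys.argv[1]) as src, open(sys.argv[2], "w") as fd:
    for l in src:
        toks = l.split(" ")
        t = toks[0]
        image_id = t.split("#")[0]
        # The caption keeps the trailing newline of the input line.
        caption = " ".join(toks[1:])
        L = f"{image_id}.jpg,{caption}"
        fd.write(L)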
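To illustrate what the script above does to a single line, here is a minimal sketch that wraps the same steps in a function. The name `convert_line` and the sample line are made up for this example; the real input format is only inferred from the `#` and space splitting in the script.

```python
def convert_line(line: str) -> str:
    """Apply the same per-line transformation as the script above."""
    toks = line.split(" ")
    image_id = toks[0].split("#")[0]  # drop the "#<index>" suffix
    caption = " ".join(toks[1:])      # keeps the trailing newline, if any
    return f"{image_id}.jpg,{caption}"


# Hypothetical sample line, not taken from a real dataset file.
sample = "12345#0 A dog runs across a field .\n"
print(convert_line(sample), end="")
```

Captions are written unquoted, so a caption that itself contains a comma may be split by a strict CSV reader; Python's `csv` module would take care of quoting if that matters.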
import io
import tarfile
import random
from collections import defaultdict
from lxml import etree
import uuid
from PIL import Image, ImageDraw
from glob import glob
import time
import os
mehdidc / pytorch_performance_profiling.md
Created February 16, 2023 08:44 — forked from mingfeima/pytorch_performance_profiling.md
How to do performance profiling on PyTorch

(Internal Training Material)

Performance optimization usually starts with profiling, i.e. identifying the performance hotspots of a workload. This gist covers the basics of performance profiling on PyTorch. You will learn:

  • How to find the bottleneck operator
  • How to trace the source file of a particular operator
  • How to identify threading issues (oversubscription)
  • How to tell whether a specific operator is running efficiently

This tutorial uses one of my recent projects, pssp-transformer, as an example to guide you through PyTorch CPU performance optimization. The focus is on Part 1 and Part 2.

model_fullname,model_fullname_pretty,model_arch,samples_seen,gmacs_per_sample,gmacs_total,upstream_dataset,downstream_dataset,acc1,acc5,mean_per_class_recall,image_retrieval_recall@5,text_retrieval_recall@5
ViT-g-14 /fsx/rom1504/open_clip/good_models/g_90.pt,g/14 2B,ViT-g-14,12208147020,290.74,3549396664594.8003,LAION-2B,vtab+,0.5654112282297443,0.8329414582676622,0.56279878057792,,
ViT-g-14 /fsx/rom1504/open_clip/good_models/g_90.pt,g/14 2B,ViT-g-14,12208147020,290.74,3549396664594.8003,LAION-2B,vtab/caltech101,0.8522353714661407,0.963346482577252,0.944284654839904,,
ViT-g-14 /fsx/rom1504/open_clip/good_models/g_90.pt,g/14 2B,ViT-g-14,12208147020,290.74,3549396664594.8003,LAION-2B,imagenet1k,0.76664,0.9485,0.76656,,
ViT-g-14 /fsx/rom1504/open_clip/good_models/g_90.pt,g/14 2B,ViT-g-14,12208147020,290.74,3549396664594.8003,LAION-2B,vtab/cifar100,0.8391,0.9729,0.8388,,
ViT-g-14 /fsx/rom1504/open_clip/good_models/g_90.pt,g/14 2B,ViT-g-14,12208147020,290.74,3549396664594.8003,LAION-2B,imagenetv2,0.6961,0.9086,0.6
mehdidc / example.sbatch
Created September 27, 2022 10:17
Content of the files
#!/bin/bash -x
#SBATCH --account=cstdl
#SBATCH --nodes=8
#SBATCH --gres=gpu:4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=12
#SBATCH --wait-all-nodes=1
#SBATCH --time=00:30:00
#SBATCH --partition=batch
#SBATCH --job-name=open_clip
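For orientation, the resource totals implied by the `#SBATCH` lines above can be worked out with a few multiplications. This is only a sketch of the arithmetic: the variable names are illustrative, and it assumes `--gres=gpu:4` is counted per node, which is how Slurm treats `--gres`.

```python
# Values copied from the #SBATCH lines above.
nodes = 8
tasks_per_node = 4   # --ntasks-per-node
gpus_per_node = 4    # --gres=gpu:4 (per node)
cpus_per_task = 12   # --cpus-per-task

total_tasks = nodes * tasks_per_node
total_gpus = nodes * gpus_per_node
total_cpus = total_tasks * cpus_per_task

print(f"tasks={total_tasks} gpus={total_gpus} cpus={total_cpus}")
```

The task count matching the GPU count suggests one task per GPU, although the visible part of the file does not say so explicitly.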
{"info": {"description": "COCO 2014 Dataset", "url": "http://cocodataset.org", "version": "1.0", "year": 2014, "contributor": "COCO Consortium", "date_created": "2017/09/01"}, "images": [{"license": 3, "file_name": "COCO_val2014_000000391895.jpg", "coco_url": "http://images.cocodataset.org/val2014/COCO_val2014_000000391895.jpg", "height": 360, "width": 640, "date_captured": "2013-11-14 11:18:45", "flickr_url": "http://farm9.staticflickr.com/8186/8119368305_4e622c8349_z.jpg", "id": 391895}, {"license": 4, "file_name": "COCO_val2014_000000060623.jpg", "coco_url": "http://images.cocodataset.org/val2014/COCO_val2014_000000060623.jpg", "height": 427, "width": 640, "date_captured": "2013-11-14 17:24:15", "flickr_url": "http://farm7.staticflickr.com/6080/6113512699_37b4c98473_z.jpg", "id": 60623}, {"license": 3, "file_name": "COCO_val2014_000000483108.jpg", "coco_url": "http://images.cocodataset.org/val2014/COCO_val2014_000000483108.jpg", "height": 640, "width": 428, "date_captured": "2013-11-14 18:27:53", "flickr_u
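A file with this layout can be read with the standard `json` module. The sketch below uses a cut-down document rather than the real file: the two image records are copied from the preview above with some fields dropped, and the name `id_to_file` is only illustrative.

```python
import json

# A cut-down document with the same top-level keys as the preview above.
doc = json.loads("""
{"info": {"description": "COCO 2014 Dataset", "year": 2014},
 "images": [
   {"file_name": "COCO_val2014_000000391895.jpg",
    "height": 360, "width": 640, "id": 391895},
   {"file_name": "COCO_val2014_000000060623.jpg",
    "height": 427, "width": 640, "id": 60623}
 ]}
""")

# Index the image records by their numeric id.
id_to_file = {img["id"]: img["file_name"] for img in doc["images"]}
print(id_to_file[391895])
```

In these two records the file name is the `id` zero-padded to 12 digits; whether that holds for every record in the file is not checked here.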
import matplotlib as mpl
mpl.use('Agg')  # select the non-interactive Agg backend before importing pyplot
import argparse
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
def plot_scaling_and_efficiency(df):
    """