@TACIXAT
Last active August 3, 2024 12:48
Throughput benchmarks for all OpenCLIP models

OpenCLIP Throughput Benchmark (image embeddings per second)

This should give an idea of the relative throughput of the models. I could not tell which would be fastest from the names alone.

This is only a speed test. Larger models generally score better on evaluation benchmarks, at the cost of speed. Find the models that meet your throughput requirements, then benchmark those for accuracy on your own task.

Tested on an NVIDIA RTX 3090. The CPU is an AMD Ryzen 9 7950X, though that should not affect the benchmark much.

Code

Images are pre-loaded into memory. The benchmark times preprocessing plus encoding, one image at a time (batch size 1).

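    import time
    import torch

    # images, clip_preprocess, clip_model, device, arch and pretrained are set up earlier.
    t0 = time.time()

    # Preprocess each image into a batch of one and move it to the device.
    preprocd = []
    for image in images:
        preprocd.append(clip_preprocess(image).unsqueeze(0).to(device))

    # Encode one image at a time; copying to the CPU waits for the GPU to finish.
    with torch.no_grad():
        embeds = []
        for image in preprocd:
            embeds.append(clip_model.encode_image(image).to("cpu"))

    t1 = time.time()
    print(arch, pretrained, len(images) / (t1 - t0))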
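The snippet above times a single model. A minimal sketch of how it might be wrapped to cover every architecture and pretrained tag follows. The helper names `images_per_second` and `benchmark_all` are hypothetical, and the untimed warm-up and the use of `time.perf_counter` are my additions, not part of the original benchmark. It also preprocesses and encodes each image in turn rather than in two passes, so its numbers would not be directly comparable to the tables below. It assumes the `open_clip` package's `list_pretrained()` and `create_model_and_transforms()` functions.

```python
import time


def images_per_second(encode_one, images, warmup=3):
    """Time encode_one over images and return throughput in images/second."""
    for image in images[:warmup]:
        encode_one(image)  # untimed warm-up calls (assumption, not in the gist)
    t0 = time.perf_counter()
    for image in images:
        encode_one(image)
    return len(images) / (time.perf_counter() - t0)


def benchmark_all(images, device="cuda"):
    """Yield (arch, pretrained, images/sec) for each OpenCLIP pretrained pair."""
    # Imported lazily so the timing helper above can be used without these packages.
    import torch
    import open_clip

    for arch, pretrained in open_clip.list_pretrained():
        model, _, preprocess = open_clip.create_model_and_transforms(
            arch, pretrained=pretrained, device=device
        )
        model.eval()

        def encode_one(image):
            # Preprocess + encode a single image (batch size 1), as in the gist.
            with torch.no_grad():
                batch = preprocess(image).unsqueeze(0).to(device)
                return model.encode_image(batch).to("cpu")

        yield arch, pretrained, images_per_second(encode_one, images)

        # Release the model before loading the next one.
        del model
        if str(device).startswith("cuda"):
            torch.cuda.empty_cache()
```

Calling `for row in benchmark_all(images): print(*row)` would print one line per model in the same shape as the raw data table. Note that this downloads the weights for every pretrained tag, which takes considerable time and disk space.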

Average throughput per architecture

Architecture Images Per Second
ViT-B-32 170.317
nllb-clip-base 166.847
xlm-roberta-base-ViT-B-32 163.626
ViT-B-32-256 161.603
ViT-B-32-quickgelu 161.004
roberta-ViT-B-32 159.02
coca_ViT-B-32 156.964
ViT-B-16-SigLIP 152.068
ViT-B-16 150.31
RN50-quickgelu 147.837
ViT-B-16-quickgelu 145.091
ViT-B-16-SigLIP-256 141.253
ViT-B-16-SigLIP-i18n-256 140.578
RN50 129.033
ViT-B-16-plus-240 126.04
RN50x4 110.681
EVA02-B-16 100.629
RN101-quickgelu 93.8347
convnext_base 93.7734
RN101 92.2537
convnext_base_w 90.138
convnext_base_w_320 89.1247
convnext_large_d 83.5527
ViT-B-16-SigLIP-384 79.4187
RN50x16 65.2571
convnext_large_d_320 60.0695
ViT-L-16-SigLIP-256 59.8122
ViT-L-14-CLIPA 57.4581
ViT-L-14 54.3296
coca_ViT-L-14 53.7369
ViT-L-14-quickgelu 52.6421
ViT-SO400M-14-SigLIP 50.0526
EVA02-L-14 43.3144
ViT-B-16-SigLIP-512 42.4944
ViT-H-14-CLIPA 31.9688
ViT-L-14-CLIPA-336 31.7083
convnext_xxlarge 31.5165
ViT-L-16-SigLIP-384 30.9679
xlm-roberta-large-ViT-H-14 30.9487
nllb-clip-large 30.9176
ViT-H-14 30.8775
ViT-H-14-quickgelu 30.0928
ViT-L-14-336 29.4929
EVA02-L-14-336 26.5657
RN50x64 26.1571
EVA01-g-14 20.4507
EVA01-g-14-plus 20.4077
ViT-g-14 20.0696
ViT-SO400M-14-SigLIP-384 17.6165
ViT-H-14-CLIPA-336 16.354
ViT-bigG-14-CLIPA 12.1527
ViT-bigG-14 12.0347
ViT-bigG-14-CLIPA-336 6.52044
EVA02-E-14-plus 5.52984
EVA02-E-14 5.5032
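
To shortlist architectures against a throughput requirement, the table above can be filtered by a minimum images-per-second figure. This is a small sketch only: the `AVERAGES` dictionary holds a handful of rows copied from the table, and the function name `meets_budget` and the 50 images/second threshold are illustrative assumptions.

```python
# A few (architecture -> images/sec) rows copied from the table above.
AVERAGES = {
    "ViT-B-32": 170.317,
    "ViT-B-16": 150.31,
    "RN50": 129.033,
    "ViT-L-14": 54.3296,
    "ViT-H-14": 30.8775,
    "ViT-bigG-14": 12.0347,
}


def meets_budget(averages, min_images_per_second):
    """Return (arch, images/sec) pairs at or above the threshold, fastest first."""
    fast = [
        (arch, ips)
        for arch, ips in averages.items()
        if ips >= min_images_per_second
    ]
    return sorted(fast, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for arch, ips in meets_budget(AVERAGES, 50.0):
        print(f"{arch:12s} {ips:8.2f}")
```

The survivors of this filter are the candidates worth evaluating for accuracy on the actual task. These numbers come from one RTX 3090 at batch size 1, so the threshold should be re-measured on the target hardware.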

Raw data for all pretrained weights

Architecture Pretrained Images Per Second
RN50 openai 107.945
RN50 yfcc15m 139.889
RN50 cc12m 139.266
RN50-quickgelu openai 164.015
RN50-quickgelu yfcc15m 139.801
RN50-quickgelu cc12m 139.694
RN101 openai 100.867
RN101 yfcc15m 83.64
RN101-quickgelu openai 102.801
RN101-quickgelu yfcc15m 84.868
RN50x4 openai 110.681
RN50x16 openai 65.2571
RN50x64 openai 26.1571
ViT-B-32 openai 163.519
ViT-B-32 laion400m_e31 171.204
ViT-B-32 laion400m_e32 170.896
ViT-B-32 laion2b_e16 169.895
ViT-B-32 laion2b_s34b_b79k 173.671
ViT-B-32 datacomp_xl_s13b_b90k 171.88
ViT-B-32 datacomp_m_s128m_b4k 172.488
ViT-B-32 commonpool_m_clip_s128m_b4k 169.047
ViT-B-32 commonpool_m_laion_s128m_b4k 170.736
ViT-B-32 commonpool_m_image_s128m_b4k 172.236
ViT-B-32 commonpool_m_text_s128m_b4k 173.43
ViT-B-32 commonpool_m_basic_s128m_b4k 168.535
ViT-B-32 commonpool_m_s128m_b4k 170.343
ViT-B-32 datacomp_s_s13m_b4k 170.372
ViT-B-32 commonpool_s_clip_s13m_b4k 170.721
ViT-B-32 commonpool_s_laion_s13m_b4k 170.097
ViT-B-32 commonpool_s_image_s13m_b4k 170.926
ViT-B-32 commonpool_s_text_s13m_b4k 169.276
ViT-B-32 commonpool_s_basic_s13m_b4k 169.176
ViT-B-32 commonpool_s_s13m_b4k 167.898
ViT-B-32-256 datacomp_s34b_b86k 161.603
ViT-B-32-quickgelu openai 163.239
ViT-B-32-quickgelu laion400m_e31 159.452
ViT-B-32-quickgelu laion400m_e32 161.956
ViT-B-32-quickgelu metaclip_400m 159.77
ViT-B-32-quickgelu metaclip_fullcc 160.604
ViT-B-16 openai 148.401
ViT-B-16 laion400m_e31 151.745
ViT-B-16 laion400m_e32 151.355
ViT-B-16 laion2b_s34b_b88k 147.765
ViT-B-16 datacomp_xl_s13b_b90k 148.236
ViT-B-16 datacomp_l_s1b_b8k 152.87
ViT-B-16 commonpool_l_clip_s1b_b8k 150.399
ViT-B-16 commonpool_l_laion_s1b_b8k 149.891
ViT-B-16 commonpool_l_image_s1b_b8k 150.263
ViT-B-16 commonpool_l_text_s1b_b8k 151.263
ViT-B-16 commonpool_l_basic_s1b_b8k 151.263
ViT-B-16 commonpool_l_s1b_b8k 150.274
ViT-B-16-quickgelu metaclip_400m 145.243
ViT-B-16-quickgelu metaclip_fullcc 144.938
ViT-B-16-plus-240 laion400m_e31 125.802
ViT-B-16-plus-240 laion400m_e32 126.279
ViT-L-14 openai 53.6898
ViT-L-14 laion400m_e31 54.5316
ViT-L-14 laion400m_e32 54.1111
ViT-L-14 laion2b_s32b_b82k 54.339
ViT-L-14 datacomp_xl_s13b_b90k 54.6971
ViT-L-14 commonpool_xl_clip_s13b_b90k 54.8216
ViT-L-14 commonpool_xl_laion_s13b_b90k 54.2918
ViT-L-14 commonpool_xl_s13b_b90k 54.155
ViT-L-14-quickgelu metaclip_400m 52.2084
ViT-L-14-quickgelu metaclip_fullcc 53.0757
ViT-L-14-336 openai 29.4929
ViT-H-14 laion2b_s32b_b79k 30.8775
ViT-H-14-quickgelu metaclip_fullcc 30.0928
ViT-g-14 laion2b_s12b_b42k 20.0461
ViT-g-14 laion2b_s34b_b88k 20.093
ViT-bigG-14 laion2b_s39b_b160k 12.0347
roberta-ViT-B-32 laion2b_s12b_b32k 159.02
xlm-roberta-base-ViT-B-32 laion5b_s13b_b90k 163.626
xlm-roberta-large-ViT-H-14 frozen_laion5b_s13b_b90k 30.9487
convnext_base laion400m_s13b_b51k 93.7734
convnext_base_w laion2b_s13b_b82k 88.6564
convnext_base_w laion2b_s13b_b82k_augreg 90.0901
convnext_base_w laion_aesthetic_s13b_b82k 91.6674
convnext_base_w_320 laion_aesthetic_s13b_b82k 89.222
convnext_base_w_320 laion_aesthetic_s13b_b82k_augreg 89.0274
convnext_large_d laion2b_s26b_b102k_augreg 83.5527
convnext_large_d_320 laion2b_s29b_b131k_ft 60.3519
convnext_large_d_320 laion2b_s29b_b131k_ft_soup 59.7872
convnext_xxlarge laion2b_s34b_b82k_augreg 31.4495
convnext_xxlarge laion2b_s34b_b82k_augreg_rewind 31.504
convnext_xxlarge laion2b_s34b_b82k_augreg_soup 31.5961
coca_ViT-B-32 laion2b_s13b_b90k 156.336
coca_ViT-B-32 mscoco_finetuned_laion2b_s13b_b90k 157.592
coca_ViT-L-14 laion2b_s13b_b90k 53.6466
coca_ViT-L-14 mscoco_finetuned_laion2b_s13b_b90k 53.8271
EVA01-g-14 laion400m_s11b_b41k 20.4507
EVA01-g-14-plus merged2b_s11b_b114k 20.4077
EVA02-B-16 merged2b_s8b_b131k 100.629
EVA02-L-14 merged2b_s4b_b131k 43.3144
EVA02-L-14-336 merged2b_s6b_b61k 26.5657
EVA02-E-14 laion2b_s4b_b115k 5.5032
EVA02-E-14-plus laion2b_s9b_b144k 5.52984
ViT-B-16-SigLIP webli 152.068
ViT-B-16-SigLIP-256 webli 141.253
ViT-B-16-SigLIP-i18n-256 webli 140.578
ViT-B-16-SigLIP-384 webli 79.4187
ViT-B-16-SigLIP-512 webli 42.4944
ViT-L-16-SigLIP-256 webli 59.8122
ViT-L-16-SigLIP-384 webli 30.9679
ViT-SO400M-14-SigLIP webli 50.0526
ViT-SO400M-14-SigLIP-384 webli 17.6165
ViT-L-14-CLIPA datacomp1b 57.4581
ViT-L-14-CLIPA-336 datacomp1b 31.7083
ViT-H-14-CLIPA datacomp1b 31.9688
ViT-H-14-CLIPA-336 laion2b 16.3764
ViT-H-14-CLIPA-336 datacomp1b 16.3315
ViT-bigG-14-CLIPA datacomp1b 12.1527
ViT-bigG-14-CLIPA-336 datacomp1b 6.52044
nllb-clip-base v1 166.847
nllb-clip-large v1 30.9176
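
The per-architecture averages listed earlier appear to be plain means of the rows above (for example, the three RN50 rows average to 129.033). A short sketch of that aggregation is shown here. `RAW_ROWS` contains only a few rows copied from this table, and `average_by_architecture` is a hypothetical helper name.

```python
from collections import defaultdict

# (architecture, pretrained, images/sec) rows copied from the table above.
RAW_ROWS = [
    ("RN50", "openai", 107.945),
    ("RN50", "yfcc15m", 139.889),
    ("RN50", "cc12m", 139.266),
    ("ViT-H-14-CLIPA-336", "laion2b", 16.3764),
    ("ViT-H-14-CLIPA-336", "datacomp1b", 16.3315),
]


def average_by_architecture(rows):
    """Return {architecture: mean images/sec}, ordered fastest first."""
    grouped = defaultdict(list)
    for arch, _pretrained, ips in rows:
        grouped[arch].append(ips)
    means = {arch: sum(values) / len(values) for arch, values in grouped.items()}
    return dict(sorted(means.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    for arch, mean_ips in average_by_architecture(RAW_ROWS).items():
        print(arch, round(mean_ips, 3))
```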