@Lyken17
Created March 13, 2024 01:18
WIDS Usage example
import matplotlib.pyplot as plt  # needed for plt.imshow below

from llava.wids import ShardListDataset
train_url = "https://storage.googleapis.com/webdataset/fake-imagenet/imagenet-train.json"
# Excerpt of the shard index JSON (truncated), kept as a bare string for reference only:
'''
{
  "__kind__": "wids-shard-index-v1",
  "wids_version": 1,
  "shardlist": [
    {
      "url": "imagenet-train-000000.tar",
      "md5sum": "8b661e95d30727b7bebdfdebdd5c5b5a",
      "nsamples": 100,
      "filesize": 24729600
    },
    {
      "url": "imagenet-train-000001.tar",
      "md5sum": "2ddf4be55441d2847db40ec7c6f650c5",
      "nsamples": 100,
      "filesize": 24545280
    },
    {
      "url": "imagenet-train-000002.tar",
      "md5sum": "136f7d9e595cc5c2dd762c2ad0be1718",
      "nsamples": 100,
      "filesize": 23695360
    },
    ......
'''
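# ShardListDataset is imported directly above, so no `wids.` prefix is needed.
dataset = ShardListDataset(train_url)
# Samples are addressed by a global index across all shards.
sample = dataset[1900]
# Each sample is a dict keyed by file extension.
print(sample.keys())
print(sample[".txt"])
plt.imshow(sample[".jpg"])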
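
The index excerpt above lists four fields per shard (`url`, `md5sum`, `nsamples`, `filesize`). As a rough illustration of where those values could come from, here is a minimal standard-library sketch that builds a dictionary of the same shape from local tar files. The helper names (`count_samples`, `md5_of_file`, `build_shard_index`) are hypothetical and not part of `wids`, and the sketch assumes a sample is the group of tar members sharing the same name up to the first dot in the basename. It is not the library's own indexing code.

```python
import hashlib
import json
import os
import tarfile


def count_samples(tar_path):
    """Count samples, assuming members that share the name before the
    first dot in their basename belong to one sample."""
    keys = set()
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not member.isfile():
                continue
            dirname, basename = os.path.split(member.name)
            keys.add(os.path.join(dirname, basename.split(".", 1)[0]))
    return len(keys)


def md5_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large shards are not read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_shard_index(tar_paths):
    """Return a dict with the same top-level fields as the index excerpt (hypothetical helper)."""
    shardlist = []
    for path in tar_paths:
        shardlist.append({
            "url": os.path.basename(path),
            "md5sum": md5_of_file(path),
            "nsamples": count_samples(path),
            "filesize": os.path.getsize(path),
        })
    return {
        "__kind__": "wids-shard-index-v1",
        "wids_version": 1,
        "shardlist": shardlist,
    }
```

The result can be written out with `json.dump(build_shard_index(paths), f, indent=2)`, where `paths` is a list of local shard files. Whether `ShardListDataset` accepts an index produced this way without further fields is an assumption that should be checked against the library.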