
@neelabalan
Created September 2, 2024 11:06
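#!/bin/bash
# Download shards train_0 .. train_<max_suffix_number> of the mteb/raw_arxiv
# dataset from Hugging Face and decompress each one, keeping the .gz archive.
set -u

# Require exactly one argument, and require it to be a non-negative integer.
if [ $# -ne 1 ] || ! [[ $1 =~ ^[0-9]+$ ]]; then
    echo "Usage: $0 <max_suffix_number>" >&2
    exit 1
fi

MAX_SUFFIX_NUMBER=$1
BASE_URL="https://huggingface.co/datasets/mteb/raw_arxiv/resolve/main/train_"
failures=0

for ((i = 0; i <= MAX_SUFFIX_NUMBER; i++)); do
    URL="${BASE_URL}${i}.jsonl.gz"
    FILENAME="filename_${i}.jsonl.gz"

    echo "Downloading $URL..."
    # -f: exit non-zero on HTTP errors (e.g. 404) instead of saving the error
    #     page and reporting success; -L: follow redirects.
    if curl -fL -o "$FILENAME" "$URL"; then
        echo "Downloaded $FILENAME successfully."
        echo "Extracting $FILENAME..."
        # -d: decompress; -k: keep the original .gz file (needs gzip >= 1.6).
        if gzip -dk "$FILENAME"; then
            echo "Extraction completed for $FILENAME."
        else
            echo "Error extracting $FILENAME." >&2
            failures=$((failures + 1))
        fi
    else
        echo "Failed to download $URL." >&2
        failures=$((failures + 1))
    fi
done

# Exit non-zero if any shard failed, so callers can detect a partial run.
[ "$failures" -eq 0 ]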
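Before fetching a large range of shards, it can help to check which URLs the loop above will request, and to confirm that each archive is a valid gzip stream before extracting it. The sketch below is illustrative only and does not come from the original gist. The functions `list_shards` and `extract_checked` are hypothetical helpers. It assumes the remote shards follow the `train_<i>.jsonl.gz` pattern built from `BASE_URL`, and it swaps in a `gzip -t` integrity test that the original script does not perform.

```shell
#!/bin/bash
# Hypothetical helpers (not part of the original gist): a dry run of the
# URL/filename scheme, plus an extraction step guarded by an integrity test.
set -euo pipefail

# Assumption: shards are named train_<i>.jsonl.gz under this prefix.
BASE_URL="https://huggingface.co/datasets/mteb/raw_arxiv/resolve/main/train_"

# list_shards N: print "<url> <local filename>" for i = 0..N inclusive,
# using the same bounds and local naming as the download loop. No network.
list_shards() {
    local max=$1 i
    for ((i = 0; i <= max; i++)); do
        printf '%s %s\n' "${BASE_URL}${i}.jsonl.gz" "filename_${i}.jsonl.gz"
    done
}

# extract_checked FILE: run `gzip -t` first so that a truncated download or
# a saved HTML error page is reported rather than partially extracted.
extract_checked() {
    local file=$1
    if gzip -t "$file" 2>/dev/null; then
        gzip -dk "$file"    # keep the .gz, as the main script does
    else
        echo "Corrupt or non-gzip file: $file" >&2
        return 1
    fi
}
```

For example, `list_shards 2` prints three lines, for `train_0` through `train_2`, because the upper bound is inclusive in the same way as the original loop. The `gzip -t` check costs one extra read of each archive. In exchange, a bad file is flagged before any partial `.jsonl` output is written.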