@andrefs
Created December 11, 2017 01:04
Split a dataset into smaller train, eval and dev subsets
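#!/bin/bash
# Move random samples of files from ./_/<category>/ into dev/, train/ and eval/.
# Requires bash: the CATEGORIES array is not valid POSIX sh.
CATEGORIES=(desporto economia)
mkdir -p dev train eval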
# move_sample DEST N: move up to N randomly chosen files per category into DEST,
# renaming each to <category>_<original name>. Files are moved, not copied, so
# a later sample never overlaps an earlier one.
# Paths are NUL-delimited so names with spaces or newlines survive; GNU shuf -z
# does the random selection.
move_sample() {
  local dest=$1 n=$2 c f
  for c in "${CATEGORIES[@]}"; do
    find "./_/$c" -type f -print0 | shuf -z -n "$n" |
      while IFS= read -r -d '' f; do
        mv -- "$f" "$dest/${c}_${f##*/}"
      done
  done
}

move_sample dev 10
move_sample train 2000
move_sample eval 700
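To check that each split received the intended number of files (10 in dev, 2000 in train and 700 in eval per category, when enough source files exist), a small report can be run afterwards. This is a sketch under stated assumptions, not part of the original gist: the function name `report_split_counts` is hypothetical, it expects the dev/, train/ and eval/ directories and the `<category>_<name>` file naming produced by the script, and `find -printf` requires GNU findutils.

```shell
#!/bin/bash
# Hypothetical helper (not in the original gist): print "<split> <category> <count>"
# for every split/category pair. Run it from the directory holding dev/, train/, eval/.
report_split_counts() {
  local split c n
  for split in dev train eval; do
    for c in "$@"; do
      # GNU find's -printf . writes one byte per matching file, so names that
      # contain newlines are still counted once; a missing split directory counts as 0.
      n=$(( $(find "$split" -maxdepth 1 -type f -name "${c}_*" -printf . 2>/dev/null | wc -c) ))
      printf '%s %s %d\n' "$split" "$c" "$n"
    done
  done
}

report_split_counts desporto economia
```

Counting bytes from `-printf .` instead of lines from `wc -l` keeps the report consistent with the NUL-safe handling of file names.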