Last active June 28, 2018 03:40
Manage (create and delete) a throwaway subset/ directory containing some files copied from an existing source training/ directory.
#!/bin/bash
# create-subset.sh
#
# Create a throwaway subset/ directory containing files copied from an existing source
# training/ directory. Assumes .jpg. Once you are done with the (redundant) subset/ and
# want to clean it up, use remove-subset.sh rather than manual commands: these scripts
# operate close to the training data, and a mistake should not delete the wrong directory.

size=100  # number of images per category; change this as needed

# Perl one-liner: read all lines, Fisher-Yates shuffle them, print them.
# The $i>0 guard keeps the loop from running on empty input.
fisher_yates_shuffle='for(@l=<>,$i=@l;--$i>0;){$j=int rand($i+1);next if $i==$j;@l[$i,$j]=@l[$j,$i];}print(@l);'

command -v perl > /dev/null || { echo "perl was not found. This tool relies on perl to randomly partition the data. Exiting without making any changes."; exit 1; }

if [[ -d "subset" ]]; then
    echo "subset/ already exists."
    echo "Exiting without making any changes."
    exit 1
fi

if [[ ! -d "training" ]]; then
    echo "training/ source directory not found."
    echo "Exiting without making any changes."
    exit 1
fi

cd training || exit 1
for category in *; do
    if [[ -d "$category" ]]; then
        echo "doing [$category]"
        mkdir -p "../subset/training/$category"
        # Read one path per line so file names containing spaces survive.
        find "$category" -type f -name "*.jpg" -print | perl -e "$fisher_yates_shuffle" | head -n "$size" |
        while IFS= read -r file; do
            cp "$file" "../subset/training/$category"
        done
    fi
done
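The Perl one-liner in create-subset.sh is dense, so here is a rough sketch of the same Fisher-Yates shuffle written with awk instead of Perl. The function name `shuffle_lines` is hypothetical and not part of the original scripts. Note that awk's `srand()` seeds from the clock, so two calls within the same second may produce the same order.

```shell
# shuffle_lines: hypothetical helper, not part of the original gist.
# Reads lines on stdin and prints them in random order (Fisher-Yates in awk).
shuffle_lines() {
    awk '
        BEGIN { srand() }                 # seed from the current time
        { lines[NR] = $0 }                # slurp every input line
        END {
            # Walk from the last slot down to the second, swapping each
            # with a randomly chosen slot at or below it.
            for (i = NR; i > 1; i--) {
                j = int(rand() * i) + 1   # random index in 1..i
                tmp = lines[i]; lines[i] = lines[j]; lines[j] = tmp
            }
            for (i = 1; i <= NR; i++) print lines[i]
        }
    '
}
```

Under these assumptions it could stand in for the Perl step, for example `find . -type f -name "*.jpg" | shuffle_lines | head -n 100`.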
#!/bin/bash
# remove-subset.sh
#
# Remove the throwaway subset/ directory created by create-subset.sh.
if [[ -d "subset" ]]; then
    /bin/rm -r subset
fi
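Since the point of a dedicated removal script is to avoid deleting the wrong directory, a slightly more defensive variant might look like the sketch below. The function name `safe_remove_subset` and its checks are assumptions for illustration, not part of the original gist: it refuses any path whose last component is not literally `subset`, and refuses symlinks.

```shell
# safe_remove_subset: hypothetical helper, not part of the original gist.
# Runs in a subshell (parentheses body) so "exit" only ends the function.
safe_remove_subset() (
    target=${1:-subset}
    # Refuse anything not named exactly "subset".
    if [ "$(basename "$target")" != "subset" ]; then
        echo "Refusing to remove $target: not a subset/ directory." >&2
        exit 1
    fi
    # Refuse symlinks, which could point back at the real training data.
    if [ -L "$target" ]; then
        echo "Refusing to remove $target: it is a symlink." >&2
        exit 1
    fi
    # Nothing to do if the directory is absent.
    [ -d "$target" ] || exit 0
    rm -r -- "$target"
)
```

Called with no argument it acts on `subset` in the current directory, matching the behavior of the script above.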