

@carlosmcevilly
Last active June 28, 2018 03:40
Manage (create and delete) a throwaway subset/ directory containing a random sample of files copied from an existing source training/ directory.
#!/bin/bash
# create-subset.sh
#
# Create a throwaway subset/ directory containing a random sample of files copied from
# the existing source training/ directory: up to $size images from each category
# subdirectory. Only files ending in .jpg are considered. When you are done with the
# (redundant) subset/ and want to clean it up, use remove-subset.sh rather than manual
# commands. These scripts operate right next to the training data, and a mistyped
# command could delete the wrong directory.
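size=100 # number of images to sample per category; change this as needed
# Perl Fisher-Yates shuffle of the lines on stdin. The loop visits i = n-1 down to 1
# and swaps line i with a random line j, 0 <= j <= i. The "--$i > 0" test also makes
# empty input (a category with no .jpg files) a no-op instead of a Perl error.
fisher_yates_shuffle='@l=<>;for($i=@l;--$i>0;){$j=int rand($i+1);@l[$i,$j]=@l[$j,$i];}print @l;'
command -v perl > /dev/null || { echo "perl was not found. This tool relies on perl to randomly sample the data. Exiting without making any changes."; exit 1; }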
if [[ -d "subset" ]]; then
    echo "subset/ already exists."
    echo "Exiting without making any changes."
    exit 1
fi
if [[ ! -d "training" ]]; then
    echo "training/ source directory not found."
    echo "Exiting without making any changes."
    exit 1
fi
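cd training || exit 1
for category in *; do
    if [[ -d "$category" ]]; then
        echo "doing [$category]"
        mkdir -p "../subset/training/$category"
        # Shuffle this category's .jpg paths, keep the first $size, and copy each one.
        # Reading line by line (instead of a backquoted for-loop) keeps file names
        # that contain spaces intact.
        find "$category" -type f -name "*.jpg" -print \
            | perl -e "$fisher_yates_shuffle" \
            | head -n "$size" \
            | while IFS= read -r file; do
                cp "$file" "../subset/training/$category/"
            done
    fi
done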
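#!/bin/bash
# remove-subset.sh
#
# Delete the throwaway subset/ directory created by create-subset.sh.
# This only ever removes ./subset; it never touches training/.
if [[ -d "subset" ]]; then
    /bin/rm -r subset
fi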
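Before deleting subset/ with remove-subset.sh, it can be worth checking what create-subset.sh actually copied. The sketch below assumes the subset/training/&lt;category&gt;/ layout that the script produces; count_subset is a hypothetical helper, and the cats/dogs demo tree is made up for illustration.

```shell
#!/bin/bash
# Sketch: report how many .jpg files each category under subset/training holds,
# warning about any category with more than $size images.
# count_subset is a hypothetical helper; size=100 mirrors create-subset.sh.
size=100

count_subset() {
    local root="${1:-subset/training}" dir n
    for dir in "$root"/*/; do
        [[ -d "$dir" ]] || continue
        # Count regular .jpg files anywhere under this category directory.
        n=$(find "$dir" -type f -name "*.jpg" | grep -c .)
        printf '%s\t%s\n' "$(basename "$dir")" "$n"
        if (( n > size )); then
            echo "warning: $(basename "$dir") has more than $size images" >&2
        fi
    done
}

# Demo on a throwaway tree with hypothetical categories "cats" and "dogs".
demo=$(mktemp -d)
mkdir -p "$demo/subset/training/cats" "$demo/subset/training/dogs"
touch "$demo/subset/training/cats/"{a,b,c}.jpg "$demo/subset/training/dogs/d.jpg"
count_subset "$demo/subset/training"
rm -r "$demo"
```

A category can legitimately show fewer than $size images: either its source directory had fewer .jpg files, or, because files are copied flat into one directory, two source files with the same name in different subdirectories overwrote each other.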