@RicherMans
Created April 19, 2020 06:24
Download AudioSet V2 with FFmpeg. Requires proxychains, FFmpeg, GNU parallel, and youtube-dl.
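Because the script depends on several external tools, a quick pre-flight check can report which ones are missing before any download starts. The sketch below is not part of the original gist: the helper name `missing_tools` is hypothetical, and the tool list simply mirrors the commands the script invokes.

```shell
# Hypothetical helper (not in the original gist): print every command
# from the argument list that cannot be found on PATH.
missing_tools() {
    for _tool in "$@"; do
        # `command -v` succeeds only if the shell can resolve the name.
        command -v "$_tool" > /dev/null 2>&1 || echo "$_tool"
    done
}

# Tools invoked by the download script.
missing=$(missing_tools youtube-dl ffmpeg proxychains parallel)
if [ -n "$missing" ]; then
    # Report on stderr; $missing is left unquoted so the names join on one line.
    echo "Missing required tools:" $missing >&2
fi
```

The check only reports; whether to abort when something is missing is left to the caller.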
# @Author: richman
# @Date: 2018-03-15
# @Last Modified by: richman
# @Last Modified time: 2018-03-30
# Require at least one argument: the segments CSV file.
if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <segments.csv> [njobs], e.g. $0 balanced_train_segments.csv" >&2
    exit 1
fi
inp=$1
njobs=${2:-4}
SAMPLE_RATE=16000
EXTENSION="flac"
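fetch_clip() {
    # Arguments: $1 = YouTube video ID, $2 = start time, $3 = end time.
    # echo "Fetching $1 ($2 to $3)..."
    local outname="$1_$2_$3"
    # Skip clips that were already downloaded.
    if [ -f "${outname}.${EXTENSION}" ]; then
        return
    fi
    # youtube-dl -g prints one stream URL per line; take the second line
    # (typically the audio stream when video and audio are listed separately).
    local link
    link=$(youtube-dl -g "https://youtube.com/watch?v=$1" | awk 'NR==2{print}')
    # $? would reflect awk, not youtube-dl, so test for a non-empty URL instead.
    if [ -n "$link" ]; then
        proxychains -q ffmpeg -loglevel quiet -i "$link" -ar "$SAMPLE_RATE" \
            -ss "$2" -to "$3" "./${outname}.${EXTENSION}"
    fi
}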
export SAMPLE_RATE
export EXTENSION
export -f fetch_clip
#until proxychains -q wget -q --spider http://google.com
#do
#echo "No connection. Sleeping 1 minute."
#sleep 1m;
#done
# Skip comment lines, then run fetch_clip on the first three CSV columns of each row.
grep "^[^#;]" "$inp" | parallel --resume --joblog job.log -j "$njobs" --colsep=, fetch_clip {1} {2} {3} > /dev/null
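
The final pipeline drops comment lines, splits each remaining CSV row on commas, and passes the first three columns to `fetch_clip`. As a rough illustration of that split, the sketch below previews the fields for a single line. It assumes rows laid out as `ID, start, end, labels` with a space after each comma; the helper name `segment_fields` and the sample row are made up, and the whitespace trimming is an addition of this sketch rather than something taken from the original pipeline.

```shell
# Hypothetical helper: split one segment line on commas and trim the
# spaces around the first three fields (video ID, start, end).
# Lines starting with '#' or ';' produce no output, mirroring the grep filter.
segment_fields() {
    printf '%s\n' "$1" | awk -F',' '
        /^[^#;]/ {
            for (i = 1; i <= 3; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            print $1, $2, $3
        }'
}

# Made-up sample row in the assumed layout (ID, start, end, labels).
segment_fields 'abcdefghijk, 30.000, 40.000, "/m/xxxx,/m/yyyy"'
```

Running the helper over a few rows before starting a long download is a cheap way to confirm that the ID and timestamps arrive in the expected columns.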