#!/usr/bin/env bash
# Download a podcast episode from anchor.fm
#
# Usage:
# grab-anchor-episode "https://anchor.fm/emerge/episodes/Robert-MacNaughton---Learnings-from-the-Life-and-Death-of-the-Integral-Center-e31val" # (m4a example)
# grab-anchor-episode "https://anchor.fm/free-chapel/episodes/Are-You-Still-In-Love-With-Praise--Pastor-Jentezen-Franklin-e19u4i8" # (mp3 example)
#
# anchor.fm serves a list of m4a or mp3 files that need to be concatenated with ffmpeg.
#
# For debugging, uncomment:
# set -o verbose
set -eu -o pipefail
url=$1
json=$(curl -sL "$url" | grep -P -o 'window.__STATE__ = .*' | cut -d ' ' -f 3- | sed -r 's/;$//g')
ymd=$(echo -E $json | jq -r '.episodePreview.publishOn' | cut -d 'T' -f 1)
extension=$((echo -E $json | jq -r '.[].episodeEnclosureUrl' | grep -F --max-count=1 :// | grep -oP '\.[0-9a-z]+$' | cut -d . -f 2) || echo m4a)
output_basename=$ymd-$(basename -- "$url").$extension
if [[ -f "$output_basename" ]]; then
    echo "$output_basename already exists; skipping download"
    exit
fi
temp_dir="$(mktemp -d)"
cd "$temp_dir"
audio_urls=$(echo -E $json | jq -r '.station.audios|map(.audioUrl)|.[]')
for i in $audio_urls; do
    output_file=$(basename -- "$i")
    wget "$i" -O "$output_file"
    echo "file '$output_file'" >> .copy_list
done
ffmpeg -f concat -safe 0 -i .copy_list -c copy "$output_basename"
cd -
mv "$temp_dir/$output_basename" ./
rm -rf "$temp_dir"
If there is no `.copy_list`, the issue is that the script did not find any `audio_urls`. You can add some debug prints, e.g. `echo -E $json` before the `audio_urls=$(echo -E $json | jq -r '.station.audios|map(.audioUrl)|.[]')` line, if you would like to investigate the JSON. Which URL causes that?
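To investigate without editing the script, the same extraction pipeline can be run standalone and pretty-printed with jq. A minimal sketch; the `page` variable here is a made-up stand-in for the output of `curl -sL "$url"` on a real anchor.fm page:

```shell
#!/usr/bin/env bash
set -eu -o pipefail

# Stand-in for the fetched page; a real page embeds the state the same way.
page='window.__STATE__ = {"station":{"audios":[{"audioUrl":"https://example.com/part1.m4a"}]}};'

# Same extraction the script uses: keep everything after "window.__STATE__ = ",
# then strip the trailing semicolon.
json=$(echo "$page" | grep -P -o 'window.__STATE__ = .*' | cut -d ' ' -f 3- | sed -r 's/;$//g')

# Pretty-print to eyeball the structure, then pull out the audio URLs.
echo "$json" | jq .
urls=$(echo "$json" | jq -r '.station.audios|map(.audioUrl)|.[]')
echo "$urls"
```

If `urls` comes back empty for a given page, the JSON shape has changed and the jq filter needs adjusting.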
I have a 2-step hybrid solution on Mac that I think is a bit easier; it only requires GNU grep (`ggrep`, via `brew install grep`):

1. From a terminal, insert your anchor URL into the following command and run it: `curl -sL "<insert-url-here>" | ggrep -P -o 'window.__STATE__ = .*' | cut -d ' ' -f 3- | sed -r 's/;$//g'`
2. Cmd-F the printed output for `"episodeEnclosureUrl":` and copy the string that follows it (e.g. `"https: ..."`).
3. Replace any `\u002F` in that string with `/`, and paste the resulting URL into your web browser. Then click the 3 dots and click download!
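The `\u002F`-to-`/` replacement in the manual method above can also be done in the terminal with sed instead of by hand. A small sketch; the `escaped` value is a hypothetical example of the string copied next to `"episodeEnclosureUrl":`:

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical JSON-escaped URL as copied from the page state.
escaped='https:\u002F\u002Fanchor.fm\u002Fs\u002F123\u002Fpodcast\u002Fplay\u002F456\u002Fepisode.m4a'

# Replace every literal \u002F escape with a plain slash.
url=$(printf '%s' "$escaped" | sed 's/\\u002F/\//g')
echo "$url"
```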
The whole point of the script is to deal with anchor.fm's multi-file serving: for many podcasts, anchor publishes audio as multiple files that need to be concatenated. I believe the segments are split up as they were originally edited using their software.
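For reference, the `.copy_list` file the script builds is just the input format for ffmpeg's concat demuxer: one `file '...'` line per segment, in playback order. A minimal sketch of how it is assembled (the segment filenames are made up):

```shell
#!/usr/bin/env bash
set -eu

# Build a concat list the same way the script does (hypothetical segment names).
: > .copy_list
for f in part1.m4a part2.m4a part3.m4a; do
    echo "file '$f'" >> .copy_list
done
cat .copy_list

# ffmpeg would then stitch the segments together without re-encoding:
#   ffmpeg -f concat -safe 0 -i .copy_list -c copy output.m4a
```

`-c copy` is what makes the join lossless; ffmpeg only rewrites the container, not the audio streams.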
Great point. I guess my solution is only useful for single-file podcasts from anchor.fm.
Thank you so much for this!
In my case, I'm trying to download all episodes of a certain podcast. After poking around this one for a bit, I found that the JSON returned by the curl request contains URLs for other episodes. (Maybe all of them? It seemed so in my case.)
For those looking to do the same, here is a helper script that works with this one: `./grab-all-anchor-episodes.sh`
#!/bin/bash
# Downloads all(?) episodes from a podcaster
#
# Usage:
#   ./grab-all-anchor-episodes.sh "https://anchor.fm/emerge/episodes/Robert-MacNaughton---Learnings-from-the-Life-and-Death-of-the-Integral-Center-e31val"
#
# Must be run from the same directory as ./grab-anchor-episode.sh
#
# The URL for one episode seems to contain information about other episodes too.
# Writes the JSON to a file in /tmp, iterates through each 'shareLinkPath' to
# build a URL list, then runs ./grab-anchor-episode.sh for each URL in the list.
url=$1
echo "$url"
json=$(curl -sL "$url" | grep -P -o 'window.__STATE__ = .*' | cut -d ' ' -f 3- | sed -r 's/;$//g')
echo "$json" > /tmp/json
python3 - <<END
import json

# Collect the full share URL of every episode listed in the page state.
with open("/tmp/json") as f:
    data = json.load(f)
with open("/tmp/urlList", "w") as out:
    for episode in data["episodePreview"]["episodes"]:
        out.write("https://anchor.fm%s\n" % episode["shareLinkPath"])
END
while read -r episode_url; do
    ./grab-anchor-episode.sh "$episode_url"
done < /tmp/urlList
# cleanup
rm /tmp/json /tmp/urlList
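If you'd rather avoid the Python heredoc, the same `shareLinkPath` extraction can be done with jq alone, assuming the JSON shape described above. A sketch; the `json` value is a made-up stand-in for a real response:

```shell
#!/usr/bin/env bash
set -eu -o pipefail

# Stand-in for the JSON pulled out of window.__STATE__ (hypothetical paths).
json='{"episodePreview":{"episodes":[{"shareLinkPath":"/emerge/episodes/episode-one-e1"},{"shareLinkPath":"/emerge/episodes/episode-two-e2"}]}}'

# Prefix each shareLinkPath with the site origin to get full episode URLs.
urls=$(echo "$json" | jq -r '.episodePreview.episodes[] | "https://anchor.fm" + .shareLinkPath')
echo "$urls"

# Feed the list into the downloader:
#   while read -r u; do ./grab-anchor-episode.sh "$u"; done <<< "$urls"
```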
@Potatrix Did the script actually produce incorrect audio files? If it needs to be fixed, it would really help to have the URL for testing.
This seems to have stopped working at some point. I have the latest version, and anything I try to download, even the provided examples, just results in `.copy_list: No such file or directory`.

The command I used: `bash grab-anchor-episode.sh "https://anchor.fm/emerge/episodes/Robert-MacNaughton---Learnings-fr om-the-Life-and-Death-of-the-Integral-Center-e31val"`
With the help of a friend, I've managed to modify the script so that it works again (at least for my purposes). I've forked it here: https://gist.github.com/viocar/a6b6a0f485b3f400b8bcb0f8334b454d
@ivan the script downloaded the audio files fine. Sometimes it didn't convert to mp3, but that wasn't really an issue for me. I had a task to download all of the recordings for an anchor podcast I manage and needed a quick way to download all of them, which is why I made the modification.
@viocar I notice a space in your URL but I assume it wasn't like this when you tried to run the script?
No, I tried several URLs that I copied directly from my browser. I'm not sure why there's a space in my post.
Could anyone suggest script code for tracking anchor.fm podcast audio plays in a Tag Manager tool?
Where can I change the output location? Sorry, I am new to Linux.
Can't think why it's complaining, but: `.copy_list: No such file or directory`