@davidlj95
Last active May 4, 2023 02:21
FlaixFM podcast URL extractor
// Returns the "next page" arrow of the podcast pagination
function getNextButton() {
  return document.querySelector('.podcast-pagination .right_arrow');
}

// Treats an arrow with opacity 0 as "there is no next page"
function hasNextButton() {
  const nextButtonStyle = window.getComputedStyle(getNextButton());
  return nextButtonStyle.opacity !== '0';
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Reads the URL currently loaded in the player, stores it, then pauses
async function grabAudioUrlAndSleep() {
  const audioItem = document.querySelector('audio#soundId');
  const audioUrl = audioItem.src;
  console.log(`Found audio URL: ${audioUrl}`);
  audioUrls.push(audioUrl);
  // To avoid getting banned from the audio server
  await sleep(1000);
}

// Collected audio URLs, in the order they were found
const audioUrls = [];

while (true) {
  const podcastPlayButtons = document.querySelectorAll('.podcast-right-bottom .llista-button-component-wrapper > div');
  for (const podcastPlayButton of podcastPlayButtons) {
    podcastPlayButton.click();
    // Podcast item may be divided by hours
    const hours = document.querySelectorAll('.player .time_handlers-hours');
    // Not divided by hours, grab audio and go
    if (hours.length === 0) {
      await grabAudioUrlAndSleep();
    }
    // Divided by hours, loop them and grab audio URLs
    for (const hour of hours) {
      hour.click();
      await grabAudioUrlAndSleep();
    }
  }
  // Check after processing the page: checking first would skip the last page
  if (!hasNextButton()) {
    break;
  }
  getNextButton().click();
}

davidlj95 commented May 3, 2023

FlaixFM podcast audio URL extractor

A small snippet that extracts podcast audio URLs from FlaixFM's podcast website.

Extracting audio file URLs

Open the podcast website, then open DevTools and paste the snippet into the console. The script plays each podcast item in turn and, for items split by hour, steps through every hour. Each audio file's URL is logged to the console and pushed into the audioUrls array.
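The snippet reads the player's src right after clicking and relies on a fixed one-second pause. A possible refinement, sketched below, is to poll until the value actually changes. waitForChange and its options are hypothetical names, not part of the original snippet, and the sketch assumes the player swaps the src of audio#soundId for every item or hour.

```javascript
// Resolves after `ms` milliseconds (same helper as in the snippet)
function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Hypothetical helper: polls `readValue` until it returns a non-empty value
// different from `previous`, or gives up once `timeoutMs` has elapsed.
async function waitForChange(readValue, previous, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const current = readValue();
    if (current && current !== previous) {
      return current;
    }
    await sleep(intervalMs);
  }
  // Nothing new showed up in time
  return null;
}
```

In the console this could be called as waitForChange(() => document.querySelector('audio#soundId').src, lastUrl), where lastUrl is the previously stored URL. If two consecutive items really share the same URL, the call would time out and return null, so the caller would need to handle that case.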

Copying audio file URLs to the clipboard

Once all podcast URLs have been stored, you can copy the contents of audioUrls to the clipboard so you can download them. Type the following into DevTools and then switch to the website window before the 2-second timeout fires (otherwise you'll get an error).

setTimeout(async()=> navigator.clipboard.writeText(audioUrls.join('\n')), 2000)
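The error comes from the Clipboard API refusing to write while the page is not focused, which is why the one-liner delays the copy. As an alternative sketch, the copy can wait for the page to regain focus instead of racing a timer. copyWhenFocused is a hypothetical helper, and the win parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: copies `text` as soon as the page has focus.
// `win` defaults to the browser's window object.
function copyWhenFocused(text, win = window) {
  const write = () => win.navigator.clipboard.writeText(text);
  // Already focused: copy right away
  if (win.document.hasFocus()) {
    return write();
  }
  // Otherwise wait for the next focus event, then copy once
  return new Promise((resolve, reject) => {
    win.addEventListener('focus', () => write().then(resolve, reject), { once: true });
  });
}
```

After pasting the helper, running copyWhenFocused(audioUrls.join('\n')) in the console and then clicking on the page should trigger the copy, assuming the browser fires a focus event on the window at that point.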

Downloading audio files using their URLs

Paste the URLs into a file. If using macOS:

pbpaste > audio_urls.txt

Then you can use something like aria2c to download them all:

aria2c -i audio_urls.txt

🎉 You've downloaded all podcast items from FlaixFM

⚠️ After doing so, it seems that some audio files are duplicates 🤔 Maybe the script isn't perfect and not everything was grabbed properly 🤷
☝️ After testing manually, it seems that at some point it just grabbed the first podcast item on the page :S so I started copy-pasting an excerpt of this script on every page manually
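To at least avoid downloading the same file twice, the collected list can be deduplicated before copying it. This is a minimal sketch; uniqueUrls is a hypothetical helper, not part of the original snippet.

```javascript
// Hypothetical helper: drops empty entries and repeated URLs,
// keeping the order in which each URL was first seen.
function uniqueUrls(urls) {
  return [...new Set(urls.filter(url => url))];
}
```

For example, uniqueUrls(audioUrls).join('\n') could replace audioUrls.join('\n') when copying. Note that this only removes the repeats: if a duplicate was stored in place of another item, that item is still missing and has to be grabbed again.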

Bonus: Merging audio files

Some podcast items are divided into several audio files, one per hour of the podcast. To join them again, you can use mp3wrap:

mp3wrap 2023-03-05_Podcast.mp3 20230430*.mp3

The Title and Album tags will be a bit weird. You can fix that with id3v2:

id3v2 -a "FlaixFM" -A "Podcast" -t "2023-03-05" 2023-03-05_Podcast.mp3

Or if you want to add a cover image too, try eyeD3:

eyeD3 --add-image "cover.jpg:FRONT_COVER" 2023-03-05_Podcast.mp3
