@yukiarimo
Created March 14, 2024 20:52
Web novel Downloader
function extractAndDownloadAllChapters() {
    // Find all containers that hold chapter text
    const chapterContainers = document.querySelectorAll('.cha-words');
    // Initialize an array to hold all chapter texts
    let allChaptersText = [];
    // Iterate over each chapter container
    chapterContainers.forEach(container => {
        // Get all paragraph elements within the container
        const paragraphs = container.querySelectorAll('p');
        // Extract the text from each paragraph and join them with a newline character
        const chapterText = Array.from(paragraphs).map(p => p.textContent.trim()).join('\n');
        // Add the chapter text to the array
        allChaptersText.push(chapterText);
    });
    // Join all chapters with two newline characters to separate them
    const allText = allChaptersText.join('\n\n');
    // Create a Blob with the combined text content
    const blob = new Blob([allText], {
        type: 'text/plain'
    });
    // Create an anchor element and use it to trigger the download
    const anchor = document.createElement('a');
    anchor.href = URL.createObjectURL(blob);
    anchor.download = 'allChaptersText.txt';
    document.body.appendChild(anchor);
    anchor.click();
    document.body.removeChild(anchor);
}
@aarongithu

This seems really close to what I've been looking for. Thanks so much!

Is there a way to get it to save chapter numbers and titles, too? As it is, the text is just all one continuous block.

@yukiarimo
Author

Doesn't it save them if they are written in the novel text itself?

@aarongithu

I'm not sure what you mean, but I started (as an example) here:

https://www.webnovel.com/book/i-can-copy-curses_29252483808354105/

...then I paged down as far as it would let me. Then I pasted the code in Console and used extractAndDownloadAllChapters();

It generated allChaptersText.txt, but none of the chapter titles were there; it was just a single uninterrupted flow of text.
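
A possible reason for the missing titles: the gist only reads `<p>` elements inside `.cha-words`, so any heading that sits outside those containers is never collected. The sketch below is one way to include titles, not the author's method. It assumes chapter headings match the selector `.cha-tit`; that selector is a guess and should be checked in DevTools before use. `collectChapters` and `formatChapters` are hypothetical helper names introduced here for illustration.

```javascript
// ASSUMPTION: chapter headings match '.cha-tit'. This selector is a guess;
// inspect the page in DevTools and adjust it if needed.
const TITLE_SELECTOR = '.cha-tit';
// Taken from the original gist.
const BODY_SELECTOR = '.cha-words';

// Walk title and body elements in document order and group them into chapters.
function collectChapters(root, titleSelector = TITLE_SELECTOR, bodySelector = BODY_SELECTOR) {
  const chapters = [];
  const nodes = root.querySelectorAll(`${titleSelector}, ${bodySelector}`);
  Array.from(nodes).forEach(node => {
    if (node.matches(titleSelector)) {
      // A heading starts a new chapter; collapse internal whitespace.
      const title = node.textContent.replace(/\s+/g, ' ').trim();
      chapters.push({ title, paragraphs: [] });
      return;
    }
    // Body text that appears before any heading goes into an untitled chapter.
    if (chapters.length === 0) {
      chapters.push({ title: '', paragraphs: [] });
    }
    const current = chapters[chapters.length - 1];
    node.querySelectorAll('p').forEach(p => {
      current.paragraphs.push(p.textContent.trim());
    });
  });
  return chapters;
}

// Render each chapter as "title, then paragraphs", with a blank line between chapters.
function formatChapters(chapters) {
  return chapters
    .map(({ title, paragraphs }) => [title, ...paragraphs].filter(Boolean).join('\n'))
    .join('\n\n');
}

// Usage in the browser console (replaces the allText line in the gist):
//   const allText = formatChapters(collectChapters(document));
```

Keeping the formatting step separate from the DOM walk makes it easy to change the output layout (for example, adding a separator line under each title) without touching the selectors.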

@callofdady2k

Hi, I’m facing an issue where the text downloaded using my script is coming out in Chinese for the initial chapters, even though it is supposed to be in English. After scrolling down on the page, I noticed that the text in the later chapters is in English as expected.

It seems like the content for earlier chapters might not be fully loaded when the script runs. Could this be related to lazy loading or dynamic content that only appears as I scroll? If so, how can I ensure all the content is properly loaded before extracting it?

Any guidance would be greatly appreciated. Thank you!
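
If the cause really is lazy loading (nothing in this thread confirms it), one option is to keep scrolling until the number of `.cha-words` containers stops growing, and only then run the extraction. The following is a minimal sketch under that assumption. `hasStabilized` and `scrollUntilLoaded` are hypothetical names, and the default delay and round counts are arbitrary values that may need tuning. It may not fix the Chinese text if that text comes from something other than late-loading content.

```javascript
// Returns true once the last `rounds + 1` counts are identical, i.e. the
// most recent `rounds` scrolls produced no new chapter containers.
function hasStabilized(counts, rounds) {
  if (counts.length <= rounds) return false;
  const tail = counts.slice(-(rounds + 1));
  return tail.every(c => c === tail[0]);
}

// Scroll to the bottom repeatedly until the container count stops changing.
// ASSUMPTION: new chapters are appended as '.cha-words' elements on scroll.
async function scrollUntilLoaded({ rounds = 3, delayMs = 1500, maxSteps = 200 } = {}) {
  const counts = [];
  for (let step = 0; step < maxSteps; step++) {
    counts.push(document.querySelectorAll('.cha-words').length);
    if (hasStabilized(counts, rounds)) break;
    window.scrollTo(0, document.body.scrollHeight);
    // Give the page time to fetch and render the next chunk.
    await new Promise(resolve => setTimeout(resolve, delayMs));
  }
  return counts[counts.length - 1];
}

// Usage in the browser console:
//   scrollUntilLoaded().then(() => extractAndDownloadAllChapters());
```

The `maxSteps` cap is there so the loop ends even on a page that keeps loading content indefinitely.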
