
@miy4
Last active November 23, 2024 09:15
HTML to Markdown: @mozilla/readability + pandoc
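```ts
#!/usr/bin/env -S deno run -A --ext ts
import $ from "jsr:@david/dax";
import { parseArgs } from "node:util";
import { Readability } from "npm:@mozilla/readability";
import { JSDOM } from "npm:jsdom";

// Bail out early if pandoc is not on PATH.
const { code } = await $`/bin/which pandoc`.noThrow().quiet();
if (code !== 0) {
  console.error("Command not found: pandoc\nPlease install pandoc first.");
  Deno.exit(1);
}

// The first positional argument is the URL of the page to convert.
const { positionals } = parseArgs({
  args: Deno.args,
  allowPositionals: true,
  options: {},
});
const url = positionals[0];
if (!url) {
  console.error("Missing argument: URL of the page to convert");
  Deno.exit(1);
}

// Fetch the page and fail loudly on HTTP errors.
const res = await fetch(url);
if (!res.ok) {
  console.error(`Failed to fetch ${url}: ${res.status} ${res.statusText}`);
  Deno.exit(1);
}
const html = await res.text();

// Passing `url` sets the document's base URI, so Readability can turn
// relative links and image sources into absolute ones.
const doc = new JSDOM(html, { url }).window.document;
const article = new Readability(doc).parse();
if (!article) {
  console.error(`Failed to parse article: ${url}`);
  Deno.exit(1);
}

// Prepend the title as an <h1> so pandoc emits it as a top-level heading.
const buf = `<header><h1>${article.title}</h1></header>${article.content}`;
// gfm-raw_html: GitHub-Flavored Markdown output with raw HTML disabled.
const md = await $`pandoc -f html -t gfm-raw_html`.stdinText(buf).text();
console.log(md);
```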
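The pandoc check shells out to `/bin/which`, which assumes a Unix-like system where that binary lives in `/bin`. If portability matters, the lookup could be done in-process instead. The sketch below assumes POSIX semantics (a `:`-separated `PATH` and an executable bit); `findOnPath` is a hypothetical helper, not part of dax or Deno, and it does not handle Windows `PATHEXT`.

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

/**
 * Hypothetical helper: return the first executable named `cmd` found in the
 * directories listed in `pathEnv`, or null if there is none.
 * Assumes POSIX semantics; Windows PATHEXT is not handled.
 */
function findOnPath(cmd: string, pathEnv: string): string | null {
  for (const dir of pathEnv.split(delimiter)) {
    if (dir === "") continue; // skip empty entries such as a trailing ':'
    const candidate = join(dir, cmd);
    try {
      // Must be a regular file (not a directory) with execute permission.
      if (statSync(candidate).isFile()) {
        accessSync(candidate, constants.X_OK); // throws if not executable
        return candidate;
      }
    } catch {
      // Missing or not executable: keep searching the next directory.
    }
  }
  return null;
}
```

In the script, `findOnPath("pandoc", Deno.env.get("PATH") ?? "") === null` would then take the place of the non-zero exit code check.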
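The HTML handed to pandoc is built by string interpolation, and Readability's `title` is plain text, so a title such as `A < B & C` would be read as markup. A minimal sketch of escaping the title first; `escapeHtml` and `buildPandocInput` are hypothetical names, and `content` is assumed to be the already-serialized HTML that Readability returns, so it is passed through unescaped.

```typescript
/** Escape the characters that are significant in HTML text and attributes. */
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;") // first, so entities added below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

/** Hypothetical helper: wrap an escaped title and Readability's HTML for pandoc. */
function buildPandocInput(
  title: string | null | undefined,
  content: string | null | undefined,
): string {
  // Omit the heading entirely when Readability found no title.
  const heading = title ? `<header><h1>${escapeHtml(title)}</h1></header>` : "";
  return `${heading}${content ?? ""}`;
}
```

With this helper, the buffer line becomes `const buf = buildPandocInput(article.title, article.content);`.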
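The script passes the first positional argument straight to `fetch`. A stricter variant could parse it with `new URL` and accept only `http:` and `https:`, so a typo fails with a clear message rather than a fetch error. The sketch below assumes that policy; `parseUrlArg` is a hypothetical helper built on the same `parseArgs` call the script already uses.

```typescript
import { parseArgs } from "node:util";

/**
 * Hypothetical helper: read the first positional argument and return it as a
 * URL, throwing a descriptive error if it is missing, malformed, or not http(s).
 */
function parseUrlArg(args: string[]): URL {
  const { positionals } = parseArgs({ args, allowPositionals: true, options: {} });
  const raw = positionals[0];
  if (raw === undefined) {
    throw new Error("Missing argument: URL of the page to convert");
  }
  let url: URL;
  try {
    url = new URL(raw); // throws TypeError on strings without a scheme
  } catch {
    throw new Error(`Not a valid URL: ${raw}`);
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`Unsupported protocol: ${url.protocol}`);
  }
  return url;
}
```

`fetch` accepts a `URL` object directly, so `fetch(parseUrlArg(Deno.args))` would work, with `url.href` passed to JSDOM's `url` option.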