@hubgit
hubgit / main.ts
Created December 11, 2023 12:18
Fetch tracks played on the Independent Music Podcast
import { DOMParser, type Element, } from "https://deno.land/x/[email protected]/deno-dom-wasm.ts";
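const parser = new DOMParser()

// Fetch a URL and parse the response body as an HTML document
const fetchDOM = async (url: string) => {
  const response = await fetch(url)
  if (!response.ok) {
    throw new Error(`Response was not ok: ${response.status} ${response.statusText}`)
  }
  const html = await response.text()
  return parser.parseFromString(html, 'text/html')
}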
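`fetchDOM` fetches a page, checks `response.ok` and reads the body as text before parsing it. A minimal sketch of just the fetch-and-check step, assuming you want to inject a fake `fetch` when testing; `fetchHTML`, `FetchLike` and the `fetchImpl` parameter are hypothetical names, not part of the gist:

```typescript
// Hypothetical helper: fetch a URL and return its body as text,
// throwing with the HTTP status when the response is not 2xx.
type FetchLike = (url: string) => Promise<Response>;

async function fetchHTML(
  url: string,
  // defaults to the global fetch; tests can pass a stub instead
  fetchImpl: FetchLike = (u) => fetch(u),
): Promise<string> {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Request for ${url} failed: ${response.status}`);
  }
  return await response.text();
}
```

The returned string can then be handed to `parser.parseFromString(html, 'text/html')` as in the gist.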
hubgit / chat.ts
Last active May 11, 2023 08:35
Vercel Edge Function for an OpenAI API request
import type { NextRequest } from 'next/server'
import { createParser } from 'eventsource-parser'
export const config = {
  runtime: 'edge',
}

export default async function handler(req: NextRequest) {
  const encoder = new TextEncoder()
  const decoder = new TextDecoder()
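The handler's `TextEncoder`, `TextDecoder` and `createParser` suggest the usual streaming pattern: decode the upstream server-sent-event bytes, pull the text delta out of each `data:` event, and re-encode it as plain text for the client. A minimal sketch of that transform, with a hand-rolled line-based parser standing in for `eventsource-parser`. It assumes each event is a single `data:` line whose JSON has a `choices[0].delta.content` field, ending with `data: [DONE]`, as OpenAI's streaming chat completions send; other SSE fields are ignored, and `sseToTextStream` is a hypothetical name:

```typescript
// Hypothetical sketch: turn an OpenAI-style SSE byte stream into a byte
// stream containing only the concatenated text deltas.
function sseToTextStream(): TransformStream<Uint8Array, Uint8Array> {
  const decoder = new TextDecoder();
  const encoder = new TextEncoder();
  let buffer = "";

  const handleLine = (
    line: string,
    controller: TransformStreamDefaultController<Uint8Array>,
  ) => {
    if (!line.startsWith("data:")) return; // skip blank lines and other fields
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") return; // end-of-stream marker
    const json = JSON.parse(data) as {
      choices?: { delta?: { content?: string } }[];
    };
    const text = json.choices?.[0]?.delta?.content;
    if (text) controller.enqueue(encoder.encode(text));
  };

  return new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      // stream: true keeps multi-byte characters split across chunks intact
      buffer += decoder.decode(chunk, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop() ?? ""; // the last piece may be an incomplete line
      for (const line of lines) handleLine(line, controller);
    },
    flush(controller) {
      buffer += decoder.decode();
      if (buffer) handleLine(buffer, controller);
    },
  });
}
```

In an Edge handler the upstream body could then be returned as `new Response(upstream.body!.pipeThrough(sseToTextStream()))`, where `upstream` is the `fetch` response from the API.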
hubgit / textract-pdf-tables.sh
Last active June 15, 2023 13:31
Extract tabular data from a PDF to CSV
# brew install awscli
# aws configure
aws s3 cp your-file.pdf s3://your-bucket/your-file.pdf
# https://pypi.org/project/amazon-textract-helper/
# https://github.com/aws-samples/amazon-textract-textractor/tree/master/helper
# pip install amazon-textract-helper
amazon-textract --input-document s3://your-bucket/your-file.pdf --features TABLES --pretty-print TABLES --pretty-print-table-format=csv
# https://aws.amazon.com/blogs/machine-learning/automatically-extract-text-and-structured-data-from-documents-with-amazon-textract/
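With `--pretty-print-table-format=csv` each table is printed as CSV, and table cells can contain commas or quotes, so downstream processing should honour CSV quoting rather than splitting on commas. A minimal RFC 4180-style parser sketch; `parseCSV` is a hypothetical helper that handles quoted fields, doubled quotes and `\n` or `\r\n` row endings, but not every CSV dialect:

```typescript
// Parse CSV text into rows of fields. Fields may be wrapped in double quotes;
// inside quotes, "" stands for a literal quote and commas/newlines are kept.
function parseCSV(text: string): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (inQuotes) {
      if (c === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // escaped quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += c;
      }
    } else if (c === '"') {
      inQuotes = true;
    } else if (c === ",") {
      row.push(field);
      field = "";
    } else if (c === "\n" || c === "\r") {
      if (c === "\r" && text[i + 1] === "\n") i++; // treat \r\n as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = "";
    } else {
      field += c;
    }
  }
  // keep the last row when the text does not end with a newline
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}
```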
// Let containers grow to fit their content instead of scrolling
[...document.querySelectorAll('div,main,body')].forEach(node => {
  node.style.position = 'relative'
  node.style.height = 'auto'
  node.style.overflowY = 'visible'
});

// Remove all buttons from the page
[...document.querySelectorAll('button')].forEach(node => {
  node.remove()
});
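# Download BBC iPlayer programme m001d2h4, with subtitles, into the m001d2h4 directory
get_iplayer --pid m001d2h4 --subtitles --output "m001d2h4"
# Burn the subtitles into the video and write a 5-second clip starting at 17:49
ffmpeg -i m001d2h4/Only_Connect_Series_18_-_07._Scrummagers_v_Crustaceans_m001d2h4_original.mp4 -vf "subtitles=m001d2h4/Only_Connect_Series_18_-_07._Scrummagers_v_Crustaceans_m001d2h4_original.srt" -ss 17:49 -t 5 -copyts output.mov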
hubgit / line-reader-transform-stream.js
Created September 18, 2022 19:32
LineReader TransformStream
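// Split a stream of text chunks into lines
// (input chunks must be strings, e.g. from a TextDecoderStream)
const lineReader = () => {
  let buffer = "";
  return new TransformStream({
    transform(chunk, controller) {
      buffer += chunk;
      const parts = buffer.split("\n");
      // emit every complete line; keep the trailing partial line buffered
      parts.slice(0, -1).forEach((part) => controller.enqueue(part));
      buffer = parts[parts.length - 1];
    },
    // emit whatever remains when the input ends
    flush(controller) {
      if (buffer) controller.enqueue(buffer);
    },
  });
};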
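A typed sketch of the same buffering idea, which also accepts `\r\n` line endings (an addition, not necessarily in the gist) and shows the stream in use on raw bytes by decoding them with `TextDecoderStream` first; `lineStream` and `readLines` are hypothetical names:

```typescript
// Split a stream of text chunks into lines, buffering any partial line
// until the next chunk (or the end of the stream) arrives.
function lineStream(): TransformStream<string, string> {
  let buffer = "";
  return new TransformStream<string, string>({
    transform(chunk, controller) {
      buffer += chunk;
      const parts = buffer.split(/\r?\n/);
      buffer = parts.pop() ?? ""; // incomplete last line waits for more input
      for (const part of parts) controller.enqueue(part);
    },
    flush(controller) {
      if (buffer) controller.enqueue(buffer); // final line without a newline
    },
  });
}

// Collect every line from a byte stream
async function readLines(bytes: ReadableStream<Uint8Array>): Promise<string[]> {
  const reader = bytes
    .pipeThrough(new TextDecoderStream())
    .pipeThrough(lineStream())
    .getReader();
  const lines: string[] = [];
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    lines.push(value);
  }
  return lines;
}
```

Because a trailing `\r` stays in the buffer until the next chunk arrives, a `\r\n` pair split across two chunks still produces a single line break.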
hubgit / genbank-to-sqlite.ts
Last active September 5, 2022 22:04
A ReadableStream created from an async iterator which fetches paginated data, piped into a WritableStream which inserts items into an SQLite database.
import { parse } from 'https://deno.land/x/[email protected]/mod.ts'
import { readableStreamFromIterable } from 'https://deno.land/[email protected]/io/streams.ts'
import { Database } from 'https://deno.land/x/[email protected]/mod.ts'
import ProgressBar from 'https://deno.land/x/[email protected]/mod.ts'
let counter = 0

const progress = new ProgressBar({
  title: 'processing:',
  interval: 100,
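The pipeline in the description (paginated fetches exposed as an async iterator, wrapped in a `ReadableStream`, piped into a `WritableStream` that stores each item) can be sketched without dependencies. In this sketch `fetchPage` and the `Page` shape are hypothetical stand-ins for the GenBank API call, the stream is built with a `pull` callback rather than std's `readableStreamFromIterable`, and the sink pushes into an array where the gist inserts into SQLite:

```typescript
// One page of results plus the cursor for the next page (hypothetical shape)
interface Page<T> {
  items: T[];
  next: number | null;
}

// Yield items page by page until the source reports no next page
async function* paginate<T>(
  fetchPage: (cursor: number) => Promise<Page<T>>,
): AsyncGenerator<T> {
  let cursor: number | null = 0;
  while (cursor !== null) {
    const page: Page<T> = await fetchPage(cursor);
    yield* page.items;
    cursor = page.next;
  }
}

// Wrap an async iterator in a ReadableStream; pull() runs only when the
// consumer wants more data, so pages are fetched on demand
function streamFrom<T>(iterable: AsyncIterable<T>): ReadableStream<T> {
  const iterator = iterable[Symbol.asyncIterator]();
  return new ReadableStream<T>({
    async pull(controller) {
      const result = await iterator.next();
      if (result.done) controller.close();
      else controller.enqueue(result.value);
    },
    async cancel() {
      await iterator.return?.(); // stop fetching if the consumer gives up
    },
  });
}

// Sink that stores each item; a database version would run an INSERT here
function collectInto<T>(target: T[]): WritableStream<T> {
  return new WritableStream<T>({
    write(item) {
      target.push(item);
    },
  });
}
```

Because `pipeTo` respects backpressure, a slow sink (such as a database insert) delays the next page fetch instead of letting pages pile up in memory.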
hubgit / README.md
Last active September 2, 2022 07:05
Processing the Crossref Public Data File

First, download the data files using a BitTorrent client:

aria2c https://academictorrents.com/download/4dcfdf804775f2d92b7a030305fa0350ebef6f3e.torrent

Next, convert the data files to a single newline-delimited JSON file:

deno run process.ts
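`process.ts` itself isn't shown here, but the conversion step can be sketched as follows, assuming each downloaded file is gzip-compressed JSON of the form `{ "items": [...] }` (check this against the files you actually download). `fileToJsonLines` is a hypothetical helper that uses the built-in `DecompressionStream`:

```typescript
// Decompress one .json.gz file and return each record as a single JSON line.
// Assumes the file holds a JSON object with an `items` array.
async function fileToJsonLines(gzipped: Uint8Array): Promise<string> {
  const stream = new Response(gzipped).body!.pipeThrough(
    new DecompressionStream("gzip"),
  );
  const { items } = JSON.parse(await new Response(stream).text()) as {
    items: unknown[];
  };
  // JSON.stringify escapes newlines inside strings, so each item stays on one line
  return items.map((item) => JSON.stringify(item) + "\n").join("");
}
```

Appending the result for every file, in order, to one output file gives the single newline-delimited JSON file.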
hubgit / deno-cloud-storage-web-streams.ts
Last active August 27, 2022 20:58
Write to a file in a Google Cloud Storage bucket by piping a Web Stream to the stdin of a `gcloud alpha storage cp` process.
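export const cloudStorageJsonLinesWriter = (url: string) => {
  // gcloud components install alpha
  // Note: Deno.run is deprecated (removed in Deno 2); Deno.Command replaces it
  const process = Deno.run({
    cmd: [
      'gcloud',
      'alpha',
      'storage',
      'cp',
      '-', // read the object contents from stdin
      url,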
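A sketch of the same idea that doesn't depend on `Deno.run`: serialise items as JSON lines and pipe the encoded bytes into any `WritableStream<Uint8Array>`. In Deno that sink could be the `stdin` of a child spawned with `new Deno.Command('gcloud', { args: ['alpha', 'storage', 'cp', '-', url], stdin: 'piped' }).spawn()`, after which awaiting the child's `status` reports gcloud's exit code. `pipeJsonLines` is a hypothetical helper, shown here against an in-memory sink:

```typescript
// Serialise each item as one line of JSON and write the UTF-8 bytes to `sink`.
// pipeTo closes the sink when the source ends; for a child process's stdin
// that signals end-of-file, so the upload can finish.
async function pipeJsonLines(
  items: ReadableStream<unknown>,
  sink: WritableStream<Uint8Array>,
): Promise<void> {
  await items
    .pipeThrough(
      new TransformStream<unknown, string>({
        transform(item, controller) {
          controller.enqueue(JSON.stringify(item) + "\n");
        },
      }),
    )
    .pipeThrough(new TextEncoderStream())
    .pipeTo(sink);
}
```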
hubgit / deno-web-streams.ts
Last active December 10, 2023 22:31
Reader and Writer web streams for Deno
import { TextLineStream } from 'https://deno.land/[email protected]/streams/mod.ts'
// const input = await jsonLinesReader('input.jsonl.gz')
// const output = await jsonLinesWriter('output.jsonl.gz')
// for await (const item of input) {
//   // do something
//   await output.write(item)
// }
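The commented usage calls `await output.write(item)` on a writer for a `.jsonl.gz` file. One way such a writer could be built, assuming it gzips on the fly with the built-in `CompressionStream`; `createJsonLinesWriter` and its `close()` method are hypothetical, and the destination is any `WritableStream<Uint8Array>` (in Deno, for example, the `writable` of a file opened with `Deno.open(path, { write: true, create: true, truncate: true })`):

```typescript
// Hypothetical writer: each write() appends one JSON line to a gzip stream
// that feeds `destination`; close() flushes the gzip trailer and closes it.
function createJsonLinesWriter(destination: WritableStream<Uint8Array>) {
  const gzip = new CompressionStream("gzip");
  // compressed bytes flow to the destination in the background
  const piped = gzip.readable.pipeTo(destination);
  const writer = gzip.writable.getWriter();
  const encoder = new TextEncoder();

  return {
    async write(item: unknown): Promise<void> {
      await writer.write(encoder.encode(JSON.stringify(item) + "\n"));
    },
    async close(): Promise<void> {
      await writer.close();
      await piped; // wait until every compressed byte reaches the destination
    },
  };
}
```

Calling `close()` matters: without it the gzip trailer is never written and the output file is truncated.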