
@zacharysyoung
zacharysyoung / main.go
Created September 22, 2022 06:23
ChromeDP, get whole response body
// Get the entire response body of the Navigate() (?) as a string.
package main
import (
"context"
"fmt"
"log"
"time"
"github.com/chromedp/cdproto/cdp"
@zacharysyoung
zacharysyoung / README.md
Last active November 1, 2022 17:09
A suggestion for SO-74269825

I want to suggest that you first combine all your CSVs into a single CSV, make sure that's correct, then convert the single CSV to XML:

  1. you'll be able to verify the intermediate, combined result
  2. issues like overwriting data simply disappear when any file is only being written to once

This takes more lines of code, but I find it's easier to get correct:
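A minimal sketch of the combine-first, convert-second approach follows. It is not the code from the original answer: the function names, the `rows`/`row` element names, and the two-column sample data are all assumptions made for illustration, and it works on in-memory strings so it can be run without any files.

```python
import csv
import io
import xml.etree.ElementTree as ET


def combine_csvs(sources):
    """Merge several CSV texts that share a header into one list of row dicts."""
    combined = []
    for text in sources:
        combined.extend(csv.DictReader(io.StringIO(text)))
    return combined


def rows_to_csv(rows, fieldnames):
    """Write the combined rows exactly once, as a single CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def csv_to_xml(csv_text):
    """Convert the single combined CSV to XML (element names are hypothetical)."""
    root = ET.Element("rows")
    for row in csv.DictReader(io.StringIO(csv_text)):
        item = ET.SubElement(root, "row")
        for key, value in row.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample inputs standing in for the separate CSV files.
part_a = "id,name\n1,alpha\n"
part_b = "id,name\n2,beta\n"

combined_csv = rows_to_csv(combine_csvs([part_a, part_b]), ["id", "name"])
xml_text = csv_to_xml(combined_csv)
```

With real files, `combined_csv` would be saved to disk and inspected before the XML step, which is the point of the intermediate result.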

@zacharysyoung
zacharysyoung / getForks.jq
Created January 3, 2023 01:52
Get forks JQ
# $ageThresholdInSeconds passed in from command-line, like `jq --arg ageThresholdInSeconds 10 -f <this-file>`
def selectOnlyModifiedForks:
((.updated_at | fromdate) - (.created_at | fromdate)) as $ageDiffInSeconds
| select(
$ageDiffInSeconds > ($ageThresholdInSeconds | tonumber)
)
;
[
.[]
@zacharysyoung
zacharysyoung / gentree.go
Last active January 26, 2023 21:51
Print directory/file tree.
package main
// Print directory tree from first arg.
//
// https://gist.github.com/zacharysyoung/64b6593f7d0314d0eb29bbc9ef121f1e
import (
"fmt"
"log"
"os"
@zacharysyoung
zacharysyoung / run_time.py
Last active February 8, 2023 18:51
Iteratively run and time a program, extracting results from `/usr/bin/time -l ...`
#!/usr/bin/env python3
import csv
import re
import subprocess
from typing import TypedDict
import glob, os, sys
@zacharysyoung
zacharysyoung / README.md
Last active February 20, 2023 17:58
Trying to help OP solve SO-75413294

Establish CSV read/process/write baseline

  • gen.py: pass N as a cmd arg for the number of rows to create and save as test-N.csv, incrementing a date and time column for each row by 1 hour
  • filter.py: pass N as cmd arg to filter test-N.csv by some date criteria and write test-N-out.csv
  • run_test.py: run gen and filter together for a few Ns and get their timings:
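As a rough illustration of what the gen and filter steps described above do, here is a small sketch. It is not the gist's gen.py or filter.py: the start date, the column layout, and the cutoff criterion are assumptions chosen only to make the example runnable.

```python
from datetime import date, datetime, timedelta


def gen_rows(n, start=datetime(2023, 1, 1)):
    """Yield n rows whose date and time columns advance by 1 hour per row.

    The start value is an assumption; the original scripts may use another.
    """
    for i in range(n):
        dt = start + timedelta(hours=i)
        yield [i + 1, dt.date().isoformat(), dt.time().isoformat()]


def filter_rows(rows, cutoff):
    """Keep rows dated on or after cutoff (a hypothetical date criterion)."""
    return [r for r in rows if date.fromisoformat(r[1]) >= cutoff]


rows = list(gen_rows(48))                     # two days of hourly rows
kept = filter_rows(rows, date(2023, 1, 2))    # only the second day survives
```

Timing each step for several values of N would then give the read/process/write baseline.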

When I run python3 run_test.py I get:

Test N row specs DT Start DT End time (s)
  1. Download all the .py scripts and run.sh
  2. pip install json-stream
  3. sh run.sh

run.sh calls the run_*.py scripts, which will run gen_json.py to generate three JSON test files of varying size.

The generated JSON looks like:

@zacharysyoung
zacharysyoung / README.md
Last active March 4, 2023 20:36
Answering SO-75608149

For CSV to dataclass...

I originally had this logic to check if the row of a CSV contained any blank values:

n_cols = len(rows[0])
for row in rows:
 if len([x for x in row if x]) != n_cols:
@zacharysyoung
zacharysyoung / README.md
Last active March 11, 2023 01:21
Trying to help answer SO-75698546
  • input.xml: a sample of OP's XML. The downloaded XML incorrectly declares its encoding as ISO-8859-1; it is really encoded as Windows-1252. I've tried viewing the Raw representation in this Gist and copy-pasting it over my original file; after doing so, Git doesn't report any modifications, so I presume the copy-paste preserves the Windows-1252 encoding.
  • main.py: OP's original program with some small tweaks for style and type correction, and I fixed the issue with not iterating the rupture nodes.
  • output.csv: what main.py generates given input.xml
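The difference between the declared and the actual encoding can be seen with a few bytes. The sketch below uses made-up sample bytes, not OP's data: bytes in the 0x80 to 0x9F range are printable characters in Windows-1252 (curly quotes, the euro sign) but invisible C1 control characters in ISO-8859-1.

```python
# Hypothetical sample bytes; 0x93/0x94 are curly quotes and 0x80 is the
# euro sign in Windows-1252.
raw = b"\x93rupture\x94 \x80 5"

as_declared = raw.decode("iso-8859-1")  # what the XML declaration claims
as_actual = raw.decode("cp1252")        # what the file really contains

print(repr(as_declared))
print(repr(as_actual))
```

Decoding with the declared encoding never raises an error, which is why a mismatch like this tends to show up only as odd characters in the output.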
@zacharysyoung
zacharysyoung / README.md
Last active March 30, 2023 21:33
Airtable Rate-limiting, concurrent requests

Testing Airtable Rate-limiting

The following code has revealed different behavior over time. When I first tested this, I saw behavior like the following: 50 requests were kicked off at once and all of them completed (with a 200), but subsequent requests took longer and longer. The following output also shows a minimum response time of 1.5s, which I think (if I remember correctly) shows that Airtable's dynamic back-offs were sticky:

Submitting 50 requests "at once"

All 50 are sent in the span of one second, but each subsequent request takes progressively longer to respond:

started request 42 at Jul 21 11:23:27.565001, ended in 1.50s
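A harness that produces log lines of this shape could look roughly like the sketch below. It is not the gist's code and makes no real Airtable calls: `fake_request` is a hypothetical stand-in that sleeps briefly, and the thread pool size of 50 is an assumption matching the "50 requests at once" description.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime


def fake_request(i):
    """Hypothetical stand-in for an HTTP call; sleeps a little longer per request."""
    time.sleep(0.01 + 0.001 * i)
    return 200


def timed_request(i):
    """Record when request i started and how long it took."""
    started = datetime.now()
    t0 = time.perf_counter()
    status = fake_request(i)
    elapsed = time.perf_counter() - t0
    return i, started, elapsed, status


# Submit all 50 requests "at once" by giving each its own worker thread.
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(timed_request, range(50)))

for i, started, elapsed, status in results:
    print(f"started request {i} at {started:%b %d %H:%M:%S.%f}, ended in {elapsed:.2f}s")
```

Replacing `fake_request` with a real HTTP call would be needed to observe actual rate-limiting behavior.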