@zacharysyoung
zacharysyoung / getForks.jq
Created January 3, 2023 01:52
Get forks JQ
# $ageThresholdInSeconds passed in from command-line, like `jq --arg ageThresholdInSeconds 10 -f <this-file>`
def selectOnlyModifiedForks:
  ((.updated_at | fromdate) - (.created_at | fromdate)) as $ageDiffInSeconds
  | select(
      $ageDiffInSeconds > ($ageThresholdInSeconds | tonumber)
    )
;

[
  .[]
zacharysyoung / README.md
Last active November 1, 2022 17:09
A suggestion for SO-74269825

I want to suggest that you first combine all your CSVs into a single CSV, make sure that's correct, then convert the single CSV to XML:

  1. you'll be able to verify the intermediate, combined result
  2. issues like overwriting data simply disappear when any file is only being written to once

This takes more lines of code, but I find it's easier to get correct:
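
The gist's own code is not reproduced in this listing. As a rough illustration of the two-step flow only, here is a minimal sketch, assuming every input CSV shares the same header and that the column names are valid XML tag names. The function names `combine_csvs` and `csv_to_xml` and the `rows`/`row` element names are hypothetical and not taken from the original question.

```python
import csv
import io
import xml.etree.ElementTree as ET


def combine_csvs(sources, out):
    """Append the rows of every source to `out`, writing the header once."""
    writer = csv.writer(out)
    header = None
    for src in sources:
        reader = csv.reader(src)
        this_header = next(reader, None)
        if this_header is None:
            continue  # skip empty inputs
        if header is None:
            header = this_header
            writer.writerow(header)
        elif this_header != header:
            raise ValueError(f"header mismatch: {this_header} != {header}")
        writer.writerows(reader)


def csv_to_xml(src):
    """Convert one CSV (with a header row) into an XML string."""
    root = ET.Element("rows")  # hypothetical element names
    for record in csv.DictReader(src):
        row = ET.SubElement(root, "row")
        for name, value in record.items():
            ET.SubElement(row, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # In-memory stand-ins for real files opened with newline="".
    part_a = io.StringIO("id,name\n1,a\n")
    part_b = io.StringIO("id,name\n2,b\n")

    combined = io.StringIO()
    combine_csvs([part_a, part_b], combined)  # step 1: one combined CSV
    print(combined.getvalue())                # inspect the intermediate result

    combined.seek(0)
    print(csv_to_xml(combined))               # step 2: convert it once
```

Because the combined CSV is written exactly once and the XML is derived from it in a separate step, the intermediate result can be checked before any conversion happens.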

zacharysyoung / main.go
Created September 22, 2022 06:23
ChromeDP, get whole response body
// Get the entire response body of the Navigate() (?) as a string.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/chromedp/cdproto/cdp"
zacharysyoung / main.go
Created June 16, 2022 10:47
Go VS Python, silly but real metrics
package main

func main() {
	for i := 0; i < N; i++ {
	}
}
zacharysyoung / README.md
Last active July 5, 2023 17:27
Start an HTTP server and listen for a response, but only for so long

Shutting down HTTP servers

Shutdown with WaitGroup and Context

wg_context.go

From

zacharysyoung / make_editable.js
Last active May 11, 2022 17:34
Make all PDF form fields editable, in Acrobat.
/* globals getField */
// From https://answers.acrobatusers.com/Script-change-fields-read-specific-fields-q296813.aspx
for (var i = 0; i < this.numFields; i++) {
  var fieldName = this.getNthFieldName(i);
  getField(fieldName).readonly = false;
  getField(fieldName).locked = false;
}
console.println('\nDone');
#!/usr/bin/env python3
# https://stackoverflow.com/a/71784820/246801
# Note: func_chain misses "interior" extraneous whitespace (e.g. the leading space in " Line 2")
block = ["Line 1\n", " Line 2\n", "Line 3\n"]
list_comp = [x.strip() for x in block]
func_chain = "".join(block).strip().split("\n")
#!/bin/sh
# Join Part-A and Part-B
gocsv join -c 'label' -outer file1.csv file2.csv > joined.csv
echo 'Joined'
gocsv view joined.csv
# Rename the two identically named 'label' columns to unique names
gocsv rename -c 1 -names 'Label_A' joined.csv | gocsv rename -c 3 -names 'Label_B' > renamed.csv
echo 'Renamed key cols'

Merge hundreds of CSVs, each with millions of rows

500 CSVs, each with over 1 million rows, need to be merged together into one CSV.

  • each CSV represents a sensor that recorded a value and the timestamp of each recording, with millions of timestamp/value rows
  • all CSVs have the same number of rows

How can we "merge" the CSVs such that each sensor's value-column is added to the merged CSV (500 value columns), and the timestamps for each row for each sensor are averaged into a single column?
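
One possible streaming approach, sketched under assumptions the listing does not confirm: each sensor CSV has a header row followed by two columns (timestamp, then value), and timestamps are numeric (for example epoch seconds). The names `merge_rows`, `merge_files`, and the `value_N` output columns are hypothetical. Reading all files in lockstep keeps only one row per sensor in memory at a time.

```python
import csv
from contextlib import ExitStack


def merge_rows(readers):
    """Yield [mean_timestamp, value_1, ..., value_n] for each aligned row.

    Assumes every reader yields [timestamp, value] rows and that all
    readers have the same number of rows.
    """
    for rows in zip(*readers):
        timestamps = [float(row[0]) for row in rows]
        values = [row[1] for row in rows]
        yield [sum(timestamps) / len(timestamps), *values]


def merge_files(paths, out_path):
    """Merge the sensor CSVs at `paths` into one CSV at `out_path`."""
    with ExitStack() as stack:
        readers = []
        for path in paths:
            f = stack.enter_context(open(path, newline=""))
            reader = csv.reader(f)
            next(reader)  # skip the assumed header row
            readers.append(reader)

        out = stack.enter_context(open(out_path, "w", newline=""))
        writer = csv.writer(out)
        header = ["timestamp"] + [f"value_{i}" for i in range(1, len(paths) + 1)]
        writer.writerow(header)
        writer.writerows(merge_rows(readers))
```

Holding 500 files open at once is usually within default operating-system limits, but that limit varies by system, so treat it as something to verify rather than a given.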

sensor1.csv
#!/usr/bin/env python3
import csv
import random
# Used to characterize answer for https://stackoverflow.com/questions/75578992
with open("input.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["RowNum", "ID"])
    for i in range(20_000_000):