@zacharysyoung
zacharysyoung / README.md
Last active April 20, 2023 17:02
Answering SO-76062643

The simplest and (probably) most efficient way to read/write any-sized CSV in Python will always be to use the csv module—bar none.

Its reader provides a very simple interface for iterating over the CSV one row at a time (so it never holds more than one row's worth of data in memory), and that row can be passed directly to the writer (which will probably be buffered, so minimal system calls). But the documentation doesn't show you how to do this, even though it's so simple:

reader → row → process(row) → writerow(row)
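
The pipeline above can be sketched in a few lines of Python. This is a minimal illustration rather than the answer's original code: `process` is a hypothetical stand-in for whatever per-row work is needed, and the sample data is made up.

```python
import csv
import io


def process(row):
    # Hypothetical per-row step: strip surrounding whitespace from each field.
    return [field.strip() for field in row]


def copy_csv(src, dst):
    """Stream rows from src to dst, holding only one row in memory at a time."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow(process(row))


if __name__ == "__main__":
    # In-memory stand-ins for real files, to keep the sketch self-contained.
    src = io.StringIO("a, b\n c ,d\n")
    dst = io.StringIO()
    copy_csv(src, dst)
    print(repr(dst.getvalue()))
```

With real files, open both with `newline=""` (as the csv module documentation recommends) and pass the file objects to `copy_csv` in the same way.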
@zacharysyoung
zacharysyoung / README.md
Last active June 30, 2023 03:18
SO-76508000

Making it run not so slow

I mocked up a 60 MB XML by taking all the small samples in your original ZIP archive and just copying them all 200 times, which ended up with over 425k tok elements.

I then profiled your code and found a really bad culprit for chewing up time.

To process that XML took about 35 seconds:

Thu Jun 29 10:50:59 2023 profile.stats
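
A header line like the one above is what the standard-library pstats module prints for a saved stats file. A minimal sketch of collecting and viewing such a profile, assuming a hypothetical `slow_work` function in place of the XML-processing code:

```python
import cProfile
import pstats


def slow_work(n=200_000):
    # Hypothetical stand-in for the code being profiled.
    total = 0
    for i in range(n):
        total += len(str(i))
    return total


def profile_to_file(func, stats_path):
    """Run func under cProfile and save the raw stats to stats_path."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    profiler.dump_stats(stats_path)
    return result


def print_top(stats_path, limit=10):
    """Print the most expensive calls, sorted by cumulative time."""
    stats = pstats.Stats(stats_path)
    stats.sort_stats("cumulative").print_stats(limit)


if __name__ == "__main__":
    profile_to_file(slow_work, "profile.stats")
    print_top("profile.stats")
```

Sorting by cumulative time tends to surface the call that dominates the run, which is usually the quickest way to spot a single bad culprit.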
@zacharysyoung
zacharysyoung / README.md
Last active November 17, 2023 20:22
To Go's encoding/csv: let my data be.

Let my data be

Go's encoding/csv Reader type takes the novel (to me) approach of deciding that carriage return line feeds (CRLFs) should be replaced with newlines (LFs).

It not only replaces CRLFs that mark the end of one record and the beginning of the next (the encoding of the data), it also replaces all CRLFs at the end of any line of text (the data itself).

The CSV:

ID,Data
@zacharysyoung
zacharysyoung / README.md
Last active August 20, 2023 20:46
SO-76931363

I went for a solution that doesn't presuppose any kind of sorting: it just looks for a value and remembers in which column (on any row) it appeared.

Starting with this input:

a,b,a
c,c,b
d,e,e
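
As a rough sketch of the "remember which column a value appeared in" idea (one possible reading, not necessarily the answer's code), the hypothetical helper `columns_by_value` below maps every value to the column indices where it was seen, on any row and in any order:

```python
import csv
import io
from collections import defaultdict


def columns_by_value(rows):
    """Map each value to the sorted list of column indices it appeared in."""
    seen = defaultdict(set)
    for row in rows:
        for col, value in enumerate(row):
            seen[value].add(col)
    return {value: sorted(cols) for value, cols in seen.items()}


if __name__ == "__main__":
    sample = "a,b,a\nc,c,b\nd,e,e\n"
    for value, cols in columns_by_value(csv.reader(io.StringIO(sample))).items():
        print(value, cols)
```

Because the lookup is keyed on the value itself, the rows never need to be sorted first.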
@zacharysyoung
zacharysyoung / README.md
Last active October 19, 2023 00:00
SO-77312927

I recommend restructuring your filters from only proceeding (and indenting) if the criterion passes, to skipping the row if any criterion fails. This has a few benefits:

  • keeping the code from creeping to the right
  • you can add debug messages to print when a row doesn't match
  • you can comment out any single criterion without affecting the others
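
A minimal sketch of that skip-on-failure shape follows. The column names (`participant_id`, `score`), the two criteria, and the sample data are all hypothetical and only illustrate the structure:

```python
import csv
import io


def filter_rows(rows, wanted_ids, min_score, debug=False):
    """Return the rows that pass every criterion, skipping on the first failure."""
    kept = []
    for row in rows:
        # Criterion 1 (hypothetical): participant must be in the wanted set.
        if row["participant_id"] not in wanted_ids:
            if debug:
                print(f"skip {row}: participant not wanted")
            continue

        # Criterion 2 (hypothetical): score must meet the minimum.
        if float(row["score"]) < min_score:
            if debug:
                print(f"skip {row}: score too low")
            continue

        kept.append(row)
    return kept


SAMPLE = "participant_id,score\np1,0.9\np2,0.4\np3,0.8\n"

if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(SAMPLE))
    for row in filter_rows(rows, {"p1", "p2"}, 0.5, debug=True):
        print(row)
```

Each criterion is a flat, self-contained block, so commenting one out leaves the others untouched and the indentation never grows.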

I test the participant IDs differently than you did, but your method of:

@zacharysyoung
zacharysyoung / main.go
Last active October 24, 2023 16:35
The "PIN code problem": combinatoric iteration, with recursion and attempt at something like a cartesian product
package main

import (
	"fmt"
	"slices"
	"strings"
)

// <https://codereview.stackexchange.com/questions/229042/find-neighboring-pins-on-a-numeric-keypad>
// Your colleague forgot the pin code from the door to the office.
@zacharysyoung
zacharysyoung / README.md
Last active December 5, 2023 22:07
Single-byte encodings

Character abbreviations

Abbrev  Description       Decimal  Hex
NUL     null character    0        00
SOH     start of heading  1        01
STX     start of text     2        02
@zacharysyoung
zacharysyoung / open-tabs.js
Created November 7, 2023 04:38
Open multiple tabs from JavaScript
/**
 * Make sure to check the tab you run this script from
 * for any kind of notification about pop-ups being blocked,
 * then allow them for this site/page only.
 *
 * https://stackoverflow.com/questions/63237482/open-multiple-tabs-with-javascript
 */
const anchors = document.getElementsByTagName('a');