
Anders dataders

@dfalster
dfalster / addNewData.R
Last active February 19, 2023 00:29
The function addNewData.R modifies a data frame with a lookup table. This is useful where you want to supplement data loaded from file with other data, e.g. to add details, change treatment names, or similar. The function readNewData is also included. This function runs some checks on the new table to ensure it has correct variable names and val…
##' Modifies 'data' by adding new values supplied in newDataFileName
##'
##' newDataFileName is expected to have columns
##' c(lookupVariable,lookupValue,newVariable,newValue,source)
##'
##' Within the column 'newVariable', replace the values in rows where
##' column 'lookupVariable' matches 'lookupValue' with the value
##' 'newValue'. If 'lookupVariable' is NA, then replace *all* elements
##' of 'newVariable' with the value 'newValue'.
##'
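To make the lookup rule above concrete, here is a minimal Python sketch of the same semantics. It is not dfalster's R implementation: `add_new_data` is a hypothetical name, the data is assumed to be column-oriented (a dict of equal-length lists), the lookup table is assumed to be a list of dicts with the column names from the comment, `None` stands in for R's `NA`, and creating the target column when it does not exist yet is an assumption.

```python
from typing import Any


def add_new_data(data: dict[str, list[Any]],
                 lookup_table: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Hypothetical Python port of the addNewData lookup rule.

    For each lookup row, set column newVariable to newValue in every row where
    column lookupVariable equals lookupValue; if lookupVariable is None (R's NA),
    set newVariable to newValue in *all* rows.
    """
    out = {col: list(vals) for col, vals in data.items()}  # copy; do not mutate input
    n_rows = len(next(iter(out.values()))) if out else 0
    for row in lookup_table:
        lookup_var = row["lookupVariable"]
        # Assumption: a missing target column is created and filled with None.
        column = out.setdefault(row["newVariable"], [None] * n_rows)
        for i in range(n_rows):
            if lookup_var is None or out[lookup_var][i] == row["lookupValue"]:
                column[i] = row["newValue"]
    return out


# Hypothetical usage: annotate species "a" with a genus and tag every row with a site.
data = {"species": ["a", "b", "a"]}
table = [
    {"lookupVariable": "species", "lookupValue": "a",
     "newVariable": "genus", "newValue": "Acacia", "source": "field notes"},
    {"lookupVariable": None, "lookupValue": None,
     "newVariable": "site", "newValue": "S1", "source": "field notes"},
]
result = add_new_data(data, table)
# result["genus"] == ["Acacia", None, "Acacia"]
# result["site"] == ["S1", "S1", "S1"]
```

Rows of the lookup table are applied in order, so a later row can overwrite a value set by an earlier one; the `source` column is carried along for provenance but not used in the replacement itself.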
@scmx
scmx / using-details-summary-github.md
Last active March 10, 2025 06:40
Using <details> <summary> expandable content on GitHub with Markdown #details #summary #markdown #gfm #html

How to use <details> <summary> expandable content on GitHub with Markdown

Firstly, what is <details> <summary>?

The HTML Details Element (<details>) creates a disclosure widget in which information is visible only when the widget is toggled into an "open" state. A summary or label can be provided using the <summary> element. https://developer.mozilla.org/en-US/docs/Web/HTML/Element/details.

Example

@tamuhey
tamuhey / tokenizations_post.md
Last active July 27, 2024 14:46
How to calculate the alignment between BERT and spaCy tokens effectively and robustly

site: https://tamuhey.github.io/tokenizations/

Natural Language Processing (NLP) has made great progress in recent years thanks to neural networks, which allow us to solve various tasks with end-to-end architectures. However, many NLP systems still require language-specific pre- and post-processing, especially for tokenization. In this article, I describe an algorithm that simplifies one such step: calculating the correspondence between tokens produced by different tokenizers (e.g. BERT vs. spaCy). I also introduce Python and Rust libraries that implement this algorithm. Here are the library and the demo site links:
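As a rough illustration of what "alignment between tokens" means, here is a minimal Python sketch of the easy case only. It is not the algorithm from the post or the tokenizations library: it assumes that, after stripping BERT's `##` subword prefix and lowercasing, both token lists concatenate to exactly the same string, and it raises an error otherwise. The function names and the example tokens are hypothetical.

```python
def _normalize(tokens: list[str], subword_prefix: str = "##") -> list[str]:
    """Strip the (assumed) subword prefix and lowercase each token."""
    pieces = []
    for tok in tokens:
        if tok.startswith(subword_prefix):
            tok = tok[len(subword_prefix):]
        pieces.append(tok.lower())
    return pieces


def _spans(pieces: list[str]) -> list[tuple[int, int]]:
    """Character span (start, end) of each piece in the concatenated text."""
    spans, pos = [], 0
    for piece in pieces:
        spans.append((pos, pos + len(piece)))
        pos += len(piece)
    return spans


def align_tokens(a: list[str], b: list[str]) -> tuple[list[list[int]], list[list[int]]]:
    """Return (a2b, b2a): for each token, the indices of overlapping tokens on the other side."""
    pa, pb = _normalize(a), _normalize(b)
    if "".join(pa) != "".join(pb):
        raise ValueError("normalized texts differ; this simple sketch cannot align them")
    sa, sb = _spans(pa), _spans(pb)
    # Two half-open spans overlap iff each one starts before the other ends.
    a2b = [[j for j, (s2, e2) in enumerate(sb) if s1 < e2 and s2 < e1] for s1, e1 in sa]
    b2a = [[i for i, (s1, e1) in enumerate(sa) if s2 < e1 and s1 < e2] for s2, e2 in sb]
    return a2b, b2a


# Hypothetical BERT-style and spaCy-style tokenizations of the same text.
bert = ["john", "johan", "##son", "'", "s", "house"]
spacy = ["John", "Johanson", "'s", "house"]
a2b, b2a = align_tokens(bert, spacy)
# a2b == [[0], [1], [1], [2], [2], [3]]
# b2a == [[0], [1, 2], [3, 4], [5]]
```

The overlap test is quadratic in the number of tokens, which is fine for a sketch. The hard part the article addresses is when the normalized characters do not line up exactly (different casing rules, accents, special symbols, dropped whitespace characters); this sketch simply refuses those inputs.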

@swanjson
swanjson / functionUpload.ipynb
Last active August 5, 2021 19:57
Local Blob Upload
@dataders
dataders / README.md
Created August 29, 2021 19:15
Stopping Junk Mail

swanderz's War on Junk Mail

1: Blanket Unsubscribes

Make sure everyone in your household has signed up for these two things. They'll prolly take a few months to kick in.

DMA Choice: pay $2 for 10 years of opt-outs

Moved to repo: /quenhus/uBlock-Origin-dev-filter

In order to keep filters up to date, please use this repo.

Evolving the dbt-materialize adapter

Tracking issue: #10600

Some things to consider:

Although sources and sinks are inverse concepts, their cardinalities are not symmetric: a source has a one-to-many relationship with downstream relations, while a sink has a one-to-one relationship with its upstream relation. A relation, in turn, can have zero or more downstream sinks, which gets in the way of implementing sources and sinks as inverse dbt concepts (e.g. using pre- and post-hooks).

Something else to consider is that source and sink configuration might have different ownership than model development in the wild (e.g. data engineers vs. analytics engineers), so it'd be preferable not to tightly couple them.

@ScottMaclure
ScottMaclure / slack_draft_deleter.js
Last active May 9, 2024 14:58
Slack Draft Deleter
// Remove all drafts from your Slack drafts view:
// 1. Navigate to Drafts
// 2. Press F12 to open the dev console
// 3. Paste the code below and press Enter
(async function () {
  const findTrash = () => document.querySelector('[type="trash"]');
  // Keep deleting until no draft with a trash button is left on the page
  for (let e = findTrash(); e != null; e = findTrash()) {
    e.click(); // opens the delete confirmation for this draft
    await new Promise(resolve => setTimeout(resolve, 500));
    document.querySelector('[data-qa="drafts_page_draft_delete_confirm"]').click();
    await new Promise(resolve => setTimeout(resolve, 1500)); // give the list time to re-render
  }
})();
@dataders
dataders / substrait_is_now.md
Created January 24, 2023 18:31
thinking out loud about substrait

working title

Codd, Chomsky, McKinney, and Wickham walk into a bar… (maybe Chamberlain and Wittgenstein should also be included?)

background

  • the dream of substrait is true separation b/w query engines and transformation APIs
  • previously, particular APIs would give better performance due to their inextricable link to the architecture of the underlying compute engine
  • if the above benefit is removed, folks could use the API with which they are most familiar
  • given this, we could see an industry consolidation around the “best” transformation API.

12 Principles of Agile Software Development

background

source: agilemanifesto.org/principles

Writing these up in a numbered, markdown-friendly list, because I'm nitpicky

principles