Dan Nguyen dannguyen
@dannguyen
dannguyen / fix-macos-excel-access-denied-powerquery.md
Last active February 26, 2025 18:14
MacOS Excel Power Query M: How to fix DataSource.Error Access to the path '/yourpath.csv' is denied error

How to fix the "DataSource.Error Access to the path '/yourpath.csv' is denied" error in macOS Excel Power Query

excel error box: [DataSource.Error] Access to the path '/Users/dan/Dropbox/mybook/mydata.csv' is denied.

One of the most incomprehensible errors I have ever run into, with Microsoft forums and ChatGPT/Claude being almost totally useless. Hopefully anyone else running into this situation will come across this gist and save themselves hours of frustration.

Huge thanks to Mr. Excel for the solution, with a major assist by r/excel

Background

@dannguyen
dannguyen / README.openai-structured-output-demo.md
Last active January 3, 2025 19:55
A basic test of OpenAI's Structured Output feature against financial disclosure reports and a newspaper's police blotter. Code examples use the Python SDK and pydantic for the schema definition.

Extracting financial disclosure reports and police blotter narratives using OpenAI's Structured Output

tl;dr this demo shows how to call OpenAI's gpt-4o-mini model, provide it with the URL of a screenshot of a document, and extract data that follows a schema you define. The results are pretty solid even with little effort in defining the data, and no effort doing data prep. OpenAI's API could be a cost-efficient tool for large-scale data gathering projects involving public documents.

OpenAI announced Structured Outputs for its API, a feature that allows users to specify the fields and schema of extracted data, and guarantees that the JSON output will follow that specification.

For example, given a Congressional financial disclosure report, with assets defined in a table like this:

@dannguyen
dannguyen / skimschema.py
Created September 18, 2024 18:03
A command-line python script that reads CSV files, samples their data, and prints the samples in transposed longform, i.e. one column per data row, one row per data attribute
#!/usr/bin/env python3
"""
skimschema.py
==============
Create an excel file of transposed data rows, for easy browsing of
a data file's contents (csvs only for now)
Longer description
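The preview above is cut off before the script's logic, so here is a minimal sketch of the transposition the description mentions (one output row per data attribute, one column per sampled data row). The function name `transpose_sample` and the choice to sample the first rows rather than random ones are assumptions for illustration, not the script's actual implementation.

```python
import csv
import io
from itertools import islice


def transpose_sample(csv_text, sample_size=3):
    """Return one list per CSV column: [fieldname, value_row_1, value_row_2, ...].

    Hypothetical helper: takes the first `sample_size` data rows as the sample.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    sample = list(islice(reader, sample_size))
    transposed = []
    for i, name in enumerate(header):
        # Pad short rows with an empty string instead of failing
        values = [row[i] if i < len(row) else "" for row in sample]
        transposed.append([name, *values])
    return transposed


if __name__ == "__main__":
    text = "act,scene,speaker\n1,5,Horatio\n1,5,Hamlet\n"
    for row in transpose_sample(text, sample_size=2):
        print("\t".join(row))
```

Each returned list could then be written out as a spreadsheet row, which is what makes wide files easier to skim.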
@dannguyen
dannguyen / bq-sfpd-query.sql
Created July 26, 2022 16:00
Example of querying BigQuery's public dataset of SFPD crime incidents
SELECT
unique_key
, pddistrict AS pd_district
, DATE(timestamp) AS incident_date
, category
, descript AS description
, dayofweek AS day_of_week
, resolution
, UPPER(address) AS address
, longitude
@dannguyen
dannguyen / fetch_ghstars.md
Last active January 11, 2025 13:42
fetch_ghstars.py: quick CLI script to fetch from GitHub API all of a user's starred repos and save them as raw JSON and wrangled CSV

fetch_ghstars.py: quick CLI script to fetch and collate from GitHub API all of a user's starred repos

  • Requires Python 3.6+
  • Creates a subdir 'ghstars-USERNAME' at the current working directory
  • The raw JSON of each page request is saved as: 01.json, 02.json, ..., 0n.json
  • A flattened, filtered CSV is also created: wrangled.csv
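The behavior listed above could be sketched roughly as follows. This is not the gist's actual code: the helper names (`fetch_page`, `page_filename`, `wrangle`) and the choice of CSV columns in `FIELDS` are assumptions for illustration. The endpoint is GitHub's public REST route for a user's starred repositories.

```python
import csv
import io
import json
import urllib.request

# GitHub REST endpoint for a user's starred repos, paginated
API_URL = "https://api.github.com/users/{username}/starred?per_page=100&page={page}"
# Assumed subset of repo attributes to keep in the flattened CSV
FIELDS = ["full_name", "html_url", "description", "language", "stargazers_count"]


def fetch_page(username, page):
    """Fetch one page of starred repos as a list of dicts (empty past the last page)."""
    url = API_URL.format(username=username, page=page)
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def page_filename(page):
    """Zero-padded filename for a raw page dump, e.g. 01.json."""
    return f"{page:02d}.json"


def wrangle(repos):
    """Flatten repo dicts into CSV text, keeping only the columns in FIELDS."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for repo in repos:
        writer.writerow({key: repo.get(key) for key in FIELDS})
    return out.getvalue()
```

A driver loop would call `fetch_page` with increasing page numbers until it returns an empty list, write each page to `page_filename(page)`, and pass the combined list to `wrangle`.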

Example usage:

@dannguyen
dannguyen / aws-transcribe-2020-10-biden-palin.md
Last active February 10, 2021 01:29
i only created this gist to respond to someone responding to my older aws-transcribe-via-cli gist

Amazon Transcribe (real-time) streaming sample, with speakers identified (2020-10-09)

Note: This gist refers to this older gist that shows the AWS transcribe API: https://gist.github.com/dannguyen/9b8c51f5bb853209f19f1a0f18f0f74c

I went into the AWS console for Transcription, which has an interface for real-time transcription here: https://console.aws.amazon.com/transcribe/home?region=us-east-1#realTimeTranscription

Then I used my phone to play out this snippet of the 2008 vice presidential debate, featuring speech from Biden and Palin: https://twitter.com/dancow/status/1313951588428517385

fieldname value
act 1
scene 5
speaker Horatio
lines Propose the oath, my lord.
~~~~~~~~~
act 1
scene 5
speaker Hamlet
@dannguyen
dannguyen / README-xsv-split-windows.md
Last active August 27, 2020 07:00
How to install and use xsv to split a large CSV file (Windows)

How to use xsv (in Windows) to split up a CSV file too big for Excel

I wrote these instructions on how to install and use xsv, a powerful CSV-handling command-line tool, because someone asked how to deal with a data file that was too big to open in Excel or even Notepad. I didn't know how familiar the person was with installing/running downloadable .exe files or with PowerShell, so I've tried to include some general instructions that are hopefully useful even to novices.

This mini-guide is not at all meant to be exhaustive as it basically shows just one of xsv's many useful functions. But if you're new to the idea of using command-line tools to do things, hopefully this can be a friendly intro to it.


Here's an example of a CSV that, at 3 million rows, is too big for Excel to open: https://burntsushi.net/stuff/worldcitiespop.csv

@dannguyen
dannguyen / bash-prompt.md
Last active August 19, 2020 00:05
my bash prompt with a ghost and stuff

this goes in my bash profile:

XRESET='\[\033[00m\]'
PROMPT_PATH="\[\033[0;33m\]\W${XRESET} \[\033[1;37m\]\$${XRESET}"
PROMPT_GHOST="༼ つ\[\033[1;33m\]°${XRESET}\[\033[1;31m\]︻\[\033[1;33m\]゜${XRESET}༽つ🐕"

export PS1="${PROMPT_GHOST} ${PROMPT_PATH} "
@dannguyen
dannguyen / normalize-ascii-google-sheet-README.md
Last active August 25, 2020 22:17
A modified Google Apps Script hack to normalize Vietnamese characters into ASCII
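
The gist's own script is not shown in this listing. As a rough illustration of the general idea, here is a minimal Python sketch (not the Apps Script code) that uses Unicode decomposition to strip diacritics. The function name `to_ascii` is hypothetical, and the explicit handling of đ/Đ is needed because those letters do not decompose into a base letter plus a combining mark.

```python
import unicodedata


def to_ascii(text):
    """Strip Vietnamese diacritics, e.g. 'Nguyễn' becomes 'Nguyen'. Hypothetical helper."""
    # đ/Đ have no canonical decomposition, so map them by hand first
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits each accented letter into a base letter plus combining marks
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base letters
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(to_ascii("Đà Nẵng"))
```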