Mark James MarkrJames

  • Cardiff, Wales, UK
@mehd-io
mehd-io / timewrapper.py
Created August 22, 2024 12:31
Measure time wrapper
import logging
from datetime import datetime
import functools
# Setting up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def measure_time(func):
    @functools.wraps(func)
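The preview of timewrapper.py stops at the decorator line. As a minimal sketch of how a timing wrapper of this shape is commonly completed (the `wrapper` body, the log message format, and the `add` demo function are assumptions for illustration, not the gist author's code):

```python
import functools
import logging
from datetime import datetime

# Same logging setup as in the preview
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def measure_time(func):
    """Log how long each call to func takes (hypothetical completion)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = datetime.now()
        try:
            # Pass the call through unchanged and return its result
            return func(*args, **kwargs)
        finally:
            # Runs whether func returned or raised
            elapsed = (datetime.now() - start).total_seconds()
            logger.info("%s took %.4f seconds", func.__name__, elapsed)
    return wrapper


# Hypothetical function used only to demonstrate the decorator
@measure_time
def add(a, b):
    return a + b


result = add(2, 3)
```

Because of `functools.wraps`, the decorated function keeps its original name and docstring, so log lines and tracebacks still refer to `add` rather than `wrapper`.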
import requests
import time
import json
import base64
def get_notebook_content(notebook_id_or_name):
    nb = notebookutils.notebook.get(notebook_id_or_name)
    workspaceId = nb['workspaceId']
    notebookId = nb['id']
    format = 'ipynb'
@marklit
marklit / places.sql
Last active May 19, 2024 22:43
Pull H3s for Overture's Places Dataset for May 2024
COPY (
    WITH a AS (
        SELECT h3_cell_to_parent(h3_string_to_h3(SUBSTR(id, 0, 17)), 2) h3_2,
               COUNT(*) num_recs
        FROM read_parquet('s3://overturemaps-us-west-2/release/2024-05-16-beta.0/theme=places/type=place/*.parquet',
                          filename=true,
                          hive_partitioning=1)
        GROUP BY 1
    )
    SELECT h3_cell_to_boundary_wkt(h3_2),
@Bilbottom
Bilbottom / customers-and-loans.sql
Created May 9, 2024 06:05
Mermaid + DuckDB for generating customer hierarchy diagrams
/*
Mermaid + DuckDB for generating customer hierarchy diagrams
DuckDB version: 0.10.2
Bill Wallis, 2024-05-09
*/
select version();
@Bilbottom
Bilbottom / er-diagram.mermaid
Last active November 21, 2024 14:59
Mermaid + DuckDB for generating entity-relationship diagrams
@mitchellh
mitchellh / merge_vs_rebase_vs_squash.md
Last active February 24, 2025 08:19
Merge vs. Rebase vs. Squash

I get asked pretty regularly what my opinion is on merge commits vs rebasing vs squashing. I've typed up this response so many times that I've decided to just put it in a gist so I can reference it whenever it comes up again.

I use merge, squash, and rebase situationally. I believe they all have their merits, but their usage depends on the context. I think anyone who says any particular strategy is the right answer 100% of the time is wrong, but I think there is considerable acceptable leeway in when you use each. What follows is my personal and professional opinion:

@JHibbard
JHibbard / make_table_uris.py
Last active May 10, 2024 04:10
Example function for generating normalized table URIs reflecting the medallion architecture
from pathlib import Path
def make_table_uris(name: str, basepath: str = '.'):
    """Example function for generating normalized table URIs
    Args:
        name: name of table to generate table URIs for
        basepath: directory to nest table URIs under
@bluet
bluet / Database Naming Convention and Data Warehouse Design Principles.md
Last active March 2, 2025 00:29
Database Naming Convention and Data Warehouse Design Principles
@dgkeyes
dgkeyes / dk_summarize_with_totals
Created August 16, 2019 19:57
summarize with totals
dk_summarize_with_totals <- function(.data, group_by_var, mean_var){
  groups_summary <- .data %>%
    dplyr::group_by({{ group_by_var }}) %>%
    dplyr::summarize(mean = mean({{ mean_var }})) %>%
    dplyr::rename("group" = {{ group_by_var }} )
  overall_summary <- .data %>%
    dplyr::summarize(mean = mean({{ mean_var }})) %>%
    dplyr::mutate(group = "Total")
@rgreenjr
rgreenjr / postgres_queries_and_commands.sql
Last active February 26, 2025 10:52
Useful PostgreSQL Queries and Commands
-- show running queries (pre 9.2)
SELECT procpid, age(clock_timestamp(), query_start), usename, current_query
FROM pg_stat_activity
WHERE current_query != '<IDLE>' AND current_query NOT ILIKE '%pg_stat_activity%'
ORDER BY query_start desc;
-- show running queries (9.2)
SELECT pid, age(clock_timestamp(), query_start), usename, query
FROM pg_stat_activity
WHERE state != 'idle' AND query NOT ILIKE '%pg_stat_activity%'