Skip to content

Instantly share code, notes, and snippets.

# loads the result of a query on Tinybird to a Pandas DataFrame
import pandas as pd
import urllib.parse
def build_tinybird_query_url(query: str, token: str) -> str:
url = f"https://api.tinybird.co/v0/sql?q={urllib.parse.quote(query + ' FORMAT CSVWithNames')}&token={token}"
return url
@xoelop
xoelop / regexp_quote.sql
Last active March 10, 2021 13:23
Function to escape special characters in a pattern in Postgres. This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it
CREATE OR REPLACE FUNCTION public.regexp_quote(IN TEXT)
RETURNS TEXT
LANGUAGE plpgsql
STABLE
AS $$
/*******************************************************************************
* Function Name: regexp_quote
* In-coming Param:
* The string to decoded and convert into a set of text arrays.
* Returns:
@xoelop
xoelop / create_events_table_postgres_inserts_data.sql
Created March 17, 2021 10:33
Creates a postgres db locally, runs postgres server and inserts events data
DROP TABLE IF EXISTS events;
CREATE TABLE events
(
date timestamp,
product_id text,
user_id bigint,
event text,
extra_data json
);
@xoelop
xoelop / Stock charts with dynamic tooltip.json
Created May 10, 2021 12:21
Stock charts with dynamic tooltip
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {"url": "data/stocks.csv"},
"width": 400,
"height": 300,
"encoding": {"x": {"field": "date", "type": "temporal"}},
"layer": [
{
"encoding": {
"color": {"field": "symbol", "type": "nominal"},
@xoelop
xoelop / convert_to_csv_middleware.py
Last active January 17, 2022 16:53
A middleware that transforms the JSON output from a FastAPI endpoint into CSV content
import json
import io
from fastapi.responses import StreamingResponse
from fastapi import Request, Response
from fastapi import FastAPI
app = FastAPI()
@xoelop
xoelop / ingest_data_to_tb_deduplicate.sh
Last active April 23, 2023 19:43
Deduplicating rows on Tinybird almost on ingestion time
# full process. Ingests data from postgres to tb, calculates duplicates, inserts them to a new datasource and removes the rows from that datasource that appear in the original one.
source $(pwd)/.env
# related, to ingest data from postgres: https://blog.tinybird.co/2019/10/14/the-one-cron-job-that-will-speed-up-your-analytical-queries-in-postgres-a-hundred-fold/
echo 'Ingesting most recently update jobs'
psql $HEROKU_POSTGRES_URL -c "COPY (SELECT id, url, job_title, company, description, description_cleaned, date_posted, now() FROM job WHERE COALESCE(description, '') <> '' AND updated_at > now() - interval '70 minutes') TO STDOUT WITH (FORMAT CSV)" | curl -F csv=@- "https://api.tinybird.co/v0/datasources?name=jobs&mode=append&token=$TINYBIRD_ADMIN_TOKEN";
-- s1 is not used
WITH s1 AS (
SELECT
linkedin_slug
, count(*) OVER (PARTITION BY 1)
, max(updated_at) max_updated_at
FROM person
GROUP BY linkedin_slug
HAVING count(*) > 1
)
@xoelop
xoelop / twitter_theirstack_enrich_result.json
Last active June 23, 2022 16:14
This is the result of enriching 'twitter.com' with TheirStack.com
[
{
"name": "Twitter",
"url": "twitter.com",
"industry": "internet",
"country": "United States",
"employee_count": 8200,
"linkedin_url": "http://www.linkedin.com/company/twitter",
"technology_names": [
"4D",
@xoelop
xoelop / Explain SQLAlchemy Postgres queries
Last active August 10, 2022 10:20
Some functions to see the execution plan of a Postgres query emitted by SQLAlchemy
# Source: https://github.com/sqlalchemy/sqlalchemy/wiki/Query-Plan-SQL-construct
# This adds the last function, to print the query plan
# Caveats: stmt has to be built using sqlalchemy.select(...). If you use session.query(...) it'll fail.
# This is Postgres-only
# Guide to migrate to SQLAlchemy 2.0-style (from session.query() to select(...) ): https://docs.sqlalchemy.org/en/14/changelog/migration_20.html#migration-orm-usage
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ClauseElement
from sqlalchemy.sql.expression import Executable
@xoelop
xoelop / download_stripe_invoices.sh
Last active January 25, 2023 15:05
Downloads all paid invoices on Stripe, collected automatically, between 2 dates, in PDF
# if .env, source .env
if test -f .env; then
source .env &&
echo `date`: sourcing .env
fi
# mkdir if not exists
mkdir -p data/script_invoices
# download invoices created later or on this date (yyyy-mm-dd)