Tim Sweña (Swast) tswast

@tswast
tswast / bigquery-to-polars-no-pyarrow.ipynb
Created September 6, 2024 21:16
notebooks demonstrating bigquery and polars integration without pyarrow
{"rowindex": 0, "repeated_struct_col": [{"nested_struct_col": [{"doubly_nested_array": [1, 2, 3], "doubly_nested_field": "a"}, {"doubly_nested_array": [4, 5, 6], "doubly_nested_field": "b"}]}, {"nested_struct_col": [{"doubly_nested_array": [-1, -2, -3], "doubly_nested_field": "z"}, {"doubly_nested_array": [-4, -5, -6], "doubly_nested_field": "y"}]}]}
{"rowindex": 1, "repeated_struct_col": [{"nested_struct_col": [{"doubly_nested_array": [10, 20, 30], "doubly_nested_field": "aa"}, {"doubly_nested_array": [40, 50, 60], "doubly_nested_field": "bb"}]}, {"nested_struct_col": [{"doubly_nested_array": [-10, -22, -33], "doubly_nested_field": "zz"}, {"doubly_nested_array": [-44, -55, -66], "doubly_nested_field": "yy"}]}]}
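The two records above are newline-delimited JSON in which a repeated struct holds a further repeated struct. As a minimal sketch (not taken from the notebook, which could not be displayed here), the following uses only the Python standard library to parse one such record and flatten its doubly nested fields into rows. The function name `flatten_record` and the shortened sample line are hypothetical.

```python
import json


def flatten_record(line):
    """Flatten one newline-delimited JSON record into a list of flat rows.

    Hypothetical helper: emits one row per innermost struct, carrying the
    top-level rowindex along with the innermost field and array.
    """
    record = json.loads(line)
    rows = []
    for outer in record["repeated_struct_col"]:
        for inner in outer["nested_struct_col"]:
            rows.append(
                {
                    "rowindex": record["rowindex"],
                    "doubly_nested_field": inner["doubly_nested_field"],
                    "doubly_nested_array": inner["doubly_nested_array"],
                }
            )
    return rows


# Shortened sample in the same shape as the records above (illustrative only).
sample = (
    '{"rowindex": 0, "repeated_struct_col": [{"nested_struct_col": '
    '[{"doubly_nested_array": [1, 2, 3], "doubly_nested_field": "a"}]}]}'
)
rows = flatten_record(sample)
```

Flattening to one row per innermost struct is only one possible layout; keeping the nesting intact may be preferable when the target DataFrame library supports list and struct columns.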
@tswast
tswast / generate_avro.py
Created July 23, 2020 21:49
Generate Random Data for Google Cloud Spanner Import
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
@tswast
tswast / bigquery_github.py
Created September 12, 2018 21:26 — forked from crwilcox/bigquery_github.py
Scan GitHub using BigQuery
from google.cloud import bigquery
import json
GITHUB_USERNAME = 'crwilcox'
START_DATE = "2018-03-05"
END_DATE = "2018-08-31"
client = bigquery.Client()
query = f"""SELECT repository, type, event AS status, COUNT(*) AS count

Keybase proof

I hereby claim:

  • I am tswast on github.
  • I am timswast (https://keybase.io/timswast) on keybase.
  • I have a public key ASCPBmBWMaMiH6y4FPDP2Z_EbU9q5ASgdl3zoJAXrM5fGwo

To claim this, I am signing this object:

@tswast
tswast / 0-tinyarchive.py
Last active May 20, 2018 16:10
Create a self-contained HTML archive of your TinyLetter newsletter
# coding: utf-8
# Copyright 2018 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
@tswast
tswast / typical-usa-names-by-state.sql
Created April 4, 2017 19:49
Typical USA Names by State
#standardSQL
SELECT
a.name AS name,
a.state AS state,
a.gender AS gender,
a.year AS year,
a.number AS number,
a.name_frequency AS name_frequency
FROM
`usa_names.names_conditional_probabilities` a
@tswast
tswast / usa-names-conditional-probabilities.sql
Created April 4, 2017 19:29
USA Names Conditional Probabilities
#standardSQL
SELECT
a.name AS name,
a.state AS state,
a.gender AS gender,
a.year AS year,
a.number AS number,
(a.number / b.total_number) AS name_frequency
FROM
`bigquery-public-data.usa_names.usa_1910_current` a
@tswast
tswast / count-names.sql
Created March 3, 2017 22:23
Count the number of people with each name in the [USA Names public dataset](https://cloud.google.com/bigquery/public-data/usa-names).
#standardSQL
SELECT
name,
name_total,
SUM(name_total) OVER(ORDER BY name ASC) AS name_cumulative
FROM (
SELECT
name,
SUM(number) AS name_total
FROM
#standardSQL
SELECT
MAX(max) AS highest_high,
stn,
wban
FROM `bigquery-public-data.noaa_gsod.gsod*`
WHERE max != 9999.9
GROUP BY stn, wban
ORDER BY highest_high DESC
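
To make the logic of the query above concrete, here is a small pure-Python sketch of the same aggregation on made-up rows: skip readings equal to the 9999.9 value that the query filters out, take the maximum per (stn, wban) pair, and sort by that maximum in descending order. The sample values and the `highest_highs` helper are hypothetical, and nothing here queries BigQuery.

```python
def highest_highs(rows, sentinel=9999.9):
    """Return (highest_high, stn, wban) tuples, largest first.

    Hypothetical helper mirroring the WHERE / GROUP BY / ORDER BY steps
    of the query above on an in-memory list of (stn, wban, max) tuples.
    """
    best = {}
    for stn, wban, max_temp in rows:
        if max_temp == sentinel:
            continue  # same effect as WHERE max != 9999.9
        key = (stn, wban)
        if key not in best or max_temp > best[key]:
            best[key] = max_temp  # MAX(max) per (stn, wban) group
    # ORDER BY highest_high DESC
    return sorted(
        ((temp, stn, wban) for (stn, wban), temp in best.items()),
        key=lambda row: row[0],
        reverse=True,
    )


# Made-up readings; station "C" has only the filtered value, so it drops out.
sample_rows = [
    ("A", "1", 90.0),
    ("A", "1", 9999.9),
    ("A", "1", 95.5),
    ("B", "2", 101.3),
    ("C", "3", 9999.9),
]
result = highest_highs(sample_rows)
```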