Jan Benetka janbenetka

  • Unacast
  • Pilsen, Czech Republic
@janbenetka
janbenetka / country_codes.txt
Last active October 5, 2020 07:55
[Country codes] #geo
AF Afghanistan
AL Albania
DZ Algeria
AS American Samoa
AD Andorra
AO Angola
AI Anguilla
AQ Antarctica
AG Antigua and Barbuda
AR Argentina
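Each line of the file is a two-letter code (these appear to be ISO 3166-1 alpha-2 codes), one space, then the country name. A minimal sketch for loading it into a lookup table, assuming the file is read into a string; `parse_country_codes` and the sample text are illustrative, not part of the gist:

```python
from typing import Dict

# A few lines copied from the list above; the full file has one entry per line
COUNTRY_CODES_TXT = """\
AF Afghanistan
AL Albania
AG Antigua and Barbuda
"""


def parse_country_codes(text: str) -> Dict[str, str]:
    """Map each two-letter code to its country name (hypothetical helper)."""
    codes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split only on the first space so multi-word names stay intact
        code, name = line.split(" ", 1)
        codes[code] = name
    return codes


codes = parse_country_codes(COUNTRY_CODES_TXT)
# Reverse lookup: country name -> code
names_to_codes = {name: code for code, name in codes.items()}
print(codes["AG"])  # Antigua and Barbuda
```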
janbenetka / pandas_value_counts.py
Created October 3, 2020 22:30
[Pandas value counts] Counts number of observations for distinct values #pandas #dataframes
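import pandas as pd
df = pd.DataFrame({'genre': ['rock', 'jazz', 'rock', 'pop']})  # hypothetical sample data
# number of observations per distinct value
df['genre'].value_counts()
# fraction per value (sums up to 1.0)
df['genre'].value_counts(normalize=True)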
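By default `value_counts` drops missing values, so the fractions are computed over the non-missing rows only. A short sketch, using made-up data, that keeps missing values via `dropna=False` and puts counts and fractions side by side in one table:

```python
import pandas as pd

# Hypothetical data; None marks a missing genre
df = pd.DataFrame({"genre": ["rock", "jazz", "rock", None, "pop", "rock"]})

# dropna=False gives missing values their own row instead of silently dropping them
counts = df["genre"].value_counts(dropna=False)
shares = df["genre"].value_counts(normalize=True, dropna=False)

# Both Series share the same index, so they align row by row
summary = pd.DataFrame({"count": counts, "share": shares})
print(summary)
```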
janbenetka / total_row_col_pandas.py
Created October 3, 2020 22:25
[Total sum row and column in Pandas] Add a total row and column to a Pandas dataframe #pandas #dataframes
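import pandas as pd
df = pd.DataFrame(dict(A=[2, 6, 3],
                       B=[2, 2, 6],
                       C=[3, 2, 3]))
# total per row, added as a new column
df['col_total'] = df.sum(axis=1)
# total per column (the col_total entry becomes the grand total), added as a new row
df.loc['row_total'] = df.sum()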
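The snippet above modifies `df` in place. If the original frame should stay untouched, one option is a small helper that returns a copy with the totals attached; `with_totals` is a hypothetical name and this is only a sketch of that variant:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[2, 6, 3], B=[2, 2, 6], C=[3, 2, 3]))


def with_totals(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with a 'col_total' column and a 'row_total' row."""
    # assign() returns a new frame, so `frame` itself is not modified
    out = frame.assign(col_total=frame.sum(axis=1))
    # One-row frame holding the column sums (including the grand total)
    totals = out.sum().to_frame().T
    totals.index = ["row_total"]
    return pd.concat([out, totals])


result = with_totals(df)
print(result)
```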
janbenetka / pandas_display_options.py
Created October 3, 2020 22:23
[Pandas settings/options] Setting display options in pandas #pandas #dataframes
import pandas as pd
pd.options.display.max_columns = 50  # None removes the limit
pd.options.display.max_rows = 200  # None prints every row; can be very slow on large frames
pd.options.display.max_colwidth = 100
pd.options.display.precision = 3
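The assignments above change the options globally for the whole session. For a one-off view, `pd.option_context` sets options only inside a `with` block and restores the previous values afterwards; a small sketch with an illustrative frame:

```python
import pandas as pd

pd.options.display.max_rows = 200  # global setting, as in the snippet above

df = pd.DataFrame({"value": range(1_000)})

# Temporarily override options; the previous values are restored on exit
with pd.option_context("display.max_rows", 10, "display.precision", 2):
    print(df)  # only a truncated view is printed
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
print(inside, after)  # 10 200
```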
janbenetka / bar_chart_plotly.py
Last active August 20, 2021 05:16
[Bar chart in Plotly] #plotly #python
import plotly.graph_objects as go
# `increase_per_country` is a DataFrame with `country` and `identifier_count_new` columns
countries = increase_per_country.country
fig = go.Figure()
fig.add_trace(go.Bar(
    x=countries,
    y=increase_per_country.identifier_count_new,
    name='New data',
    marker_color='#FF8000'
))
fig.show()
janbenetka / partitioned_table.sql
Last active October 1, 2020 17:18
[BigQuery Partitioned Table DDL] #sql #bigquery
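-- Partition the new table by the calendar date of a TIMESTAMP column
CREATE TABLE dataset.new_table
PARTITION BY DATE(timestamp_column) AS
SELECT x, y, z, timestamp_column
FROM dataset.existing_table;

-- Partitioning directly on a column works when it is already a DATE (assumed for minProcessingDate)
CREATE OR REPLACE TABLE `uc-prox-core-dev.14_days_retention.home_od_flux_2020_aggregated_condensed`
PARTITION BY minProcessingDate
AS
SELECT...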
janbenetka / unacast_template.py
Created September 30, 2020 22:02
[Plotly Unacast Template] #plotly
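!pip install plotly==4.8  # 4.8+ is needed for the pandas "plotly" plotting backend
# Quick way to install the Hind font (notebook shell commands)
!npm install -g google-font-installer
!gfi install hind -v 300
import pandas as pd
import plotly.graph_objs as go
import plotly.io as pio
# Make DataFrame.plot() render with Plotly instead of Matplotlib
pd.options.plotting.backend = "plotly"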
janbenetka / counties_fips_codes.md
Last active July 21, 2021 17:27
[Important Counties (FIPS codes)] FIPS codes #fips #geo #newyork #sf

San Francisco: 06075
Chicago (Cook County): 17031
Los Angeles County: 06037
Seattle (King County): 53033

New York:
New York County (Manhattan): 36061
Kings County (Brooklyn): 36047
Bronx County (The Bronx): 36005
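A 5-digit county FIPS code is the 2-digit state code followed by the 3-digit county code, so 06075 is state 06 (California), county 075. Because several codes start with 0, they are safest kept as strings; an integer column would turn 06075 into 6075. A small sketch using the codes listed above (`split_county_fips` is an illustrative helper):

```python
from typing import Tuple

# County FIPS codes from the list above, kept as strings so leading zeros survive
COUNTY_FIPS = {
    "San Francisco": "06075",
    "Cook County (Chicago)": "17031",
    "Los Angeles County": "06037",
    "King County (Seattle)": "53033",
    "New York County (Manhattan)": "36061",
    "Kings County (Brooklyn)": "36047",
    "Bronx County (The Bronx)": "36005",
}


def split_county_fips(fips: str) -> Tuple[str, str]:
    """Split a 5-digit county FIPS code into (state FIPS, county FIPS)."""
    if len(fips) != 5 or not fips.isdigit():
        raise ValueError(f"expected a 5-digit FIPS code, got {fips!r}")
    return fips[:2], fips[2:]


print(split_county_fips(COUNTY_FIPS["San Francisco"]))  # ('06', '075')

# Restoring a code that was accidentally stored as an integer
restored = str(6075).zfill(5)  # '06075'
```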

janbenetka / airflow_slack_messaging.py
Created September 24, 2020 12:15
[Airflow > Slack message] Airflow task that posts a notification to a Slack channel #Airflow
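import json
from airflow.operators.http_operator import SimpleHttpOperator  # Airflow 1.10.x import path

# The Jinja expression is rendered at run time because `data` is a templated field
slack_message = "Home & Work for bundleId {} is ready for shipping!".format(
    "{{ ti.xcom_pull(task_ids='get_bundle_id') }}")
notify_slack = SimpleHttpOperator(
    task_id="notify_slack",
    http_conn_id="http_slack_hook",  # Airflow HTTP connection; `endpoint` is appended to its base URL
    endpoint=slack_webhook_url,  # webhook path, defined elsewhere in the DAG file
    data=json.dumps(
        {"attachments": [{"fallback": slack_message, "text": slack_message, "color": "good"}]}),
    headers={"Content-Type": "application/json"},
    depends_on_past=False,
    response_check=lambda response: response.status_code == 200,
)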
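When debugging the webhook, it can help to send the same payload outside Airflow. A minimal standard-library sketch, assuming a Slack incoming-webhook URL is available; `build_slack_payload`, `post_to_slack` and the bundle id are hypothetical, and only the attachment layout is taken from the task above:

```python
import json
import urllib.request


def build_slack_payload(message: str, color: str = "good") -> str:
    """JSON body in the same attachment format the Airflow task sends."""
    return json.dumps(
        {"attachments": [{"fallback": message, "text": message, "color": color}]})


def post_to_slack(webhook_url: str, message: str) -> int:
    """POST the payload to a Slack incoming-webhook URL and return the HTTP status.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = urllib.request.Request(
        webhook_url,
        data=build_slack_payload(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


payload = build_slack_payload("Home & Work for bundleId 123 is ready for shipping!")
# post_to_slack("https://hooks.slack.com/services/...", "test message")  # needs a real webhook
```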
janbenetka / bigquery_centroid.sql
Last active September 21, 2021 04:15
[BigQuery Centroid] #bigquery #geo #coordinates
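SELECT
  ST_Y(ST_CENTROID(geog)) AS tract_lat,  -- latitude of the centroid
  ST_X(ST_CENTROID(geog)) AS tract_lon   -- longitude of the centroid
FROM dataset.tracts  -- hypothetical table with a GEOGRAPHY column `geog`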