Sofia meddulla

  • Cloudflare
  • Portugal
import org.apache.kafka.streams.scala.ImplicitConversions._
import Serdes.{String, Long, sessionWindowedSerde}
case class PageInfo(mean: Long, max: Long, startTs: Long, endTs: Long, totalDuration: Long, totalPages: Int,
clickRate: Long)
implicit val pageInfoFormat = Json.format[PageInfo] // PlayJson
// explicit manifests
val printer: JsValue => String = (x: JsValue) => Json.stringify(x)
val pageManifest: Manifest[PageInfo] = ManifestFactory.classType(classOf[PageInfo])

Keybase proof

I hereby claim:

  • I am meddulla on github.
  • I am medula (https://keybase.io/medula) on keybase.
  • I have a public key ASB8Z86k4-JtpFzgN38CDY4UwcWGVuDKfSdFfMM17JmGlQo

To claim this, I am signing this object:

{"version":"2.0","label":"Fonte; Métrica; Periodicidade; Rácios; Rubrica; Setorização adicional; Atividade económica de referência REV3; Setor institucional de referência; Território de referência; Unidade de medida","id":["18","29","40","47","48","50","52","56","63","70","reference_date"],"size":[1,1,1,5,4,1,1,1,1,1,13],"extension":{"series":[{"id":202354,"label":"CP (% ativo)-SNF priv-exceto sedes sociais-Exportadoras","dimension-member":[{"member_id":35,"dimension_id":18},{"member_id":3609,"dimension_id":29},{"member_id":4267,"dimension_id":40},{"member_id":3502,"dimension_id":47},{"member_id":3704,"dimension_id":48},{"member_id":3169,"dimension_id":50},{"member_id":3156,"dimension_id":52},{"member_id":3021,"dimension_id":56},{"member_id":349,"dimension_id":63},{"member_id":3327,"dimension_id":70}]},{"id":202579,"label":"Fornecedores (% ativo)-SNF priv-exceto sedes sociais-Exportadoras","dimension-member":[{"member_id":35,"dimension_id":18},{"member_id":3609,"dimension_id":29},{"member_id":4267,"dimension_
meddulla / grep.py
Last active June 20, 2019 22:19
example beam pipelines
#!/usr/bin/env python
"""
Copyright Google Inc. 2016
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
meddulla / bigquery_notes.md
Created June 17, 2019 08:17 — forked from robcowie/bigquery_notes.md
BigQuery Notes

Require a partition filter on an existing table

bq update --require_partition_filter --time_partitioning_field ts -t page_impressions.raw

Copy a table

meddulla / complete_shannon_entropy.py
Last active March 8, 2019 06:58
Claude Shannon's entropy
import math
import itertools as it
def window(iterable, size):
    shiftedStarts = [it.islice(iterable, s, None) for s in range(size)]
    return zip(*shiftedStarts)
def calculate_shannon_entropy(mystring, ngram=1):
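The preview cuts off at the function signature above. As a rough illustration only, and not the gist's actual body, here is a minimal sketch of Shannon entropy over the n-grams of a string. The names ngrams and shannon_entropy are hypothetical, and the choice of bits (log base 2) is an assumption.

```python
import math
from collections import Counter


def ngrams(text, size):
    """Return the overlapping n-grams of `text` as a list of substrings."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def shannon_entropy(text, ngram=1):
    """Shannon entropy, in bits, of the n-gram frequency distribution of `text`.

    Hypothetical helper: the unit (bits) and the empty-input result (0.0)
    are assumptions, not taken from the original gist.
    """
    grams = ngrams(text, ngram)
    if not grams:
        return 0.0
    counts = Counter(grams)
    total = len(grams)
    # H = sum over distinct n-grams of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

Calling shannon_entropy("aabb") treats each character as a symbol, while shannon_entropy("abab", ngram=2) measures the distribution of the pairs "ab", "ba", "ab".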
meddulla / issue.py
Last active February 28, 2019 18:37
jsonschema issue
import json
from jsonschema import ValidationError, validate
from jsonschema import Draft4Validator, Draft6Validator, Draft7Validator
schema = {
    "type" : "object",
    "required": ["name"],
    "properties" : {
        "name" : {
            "type" : "string"
meddulla / flatten_json_cols.py
Created October 9, 2018 08:19
flatten json into pandas columns
import os
import json
import numpy as np
import pandas as pd
from pandas.io.json import json_normalize
# from https://www.kaggle.com/julian3833/1-quick-start-read-csv-and-flatten-json-fields
def load_df(csv_path='../input/train.csv', nrows=None):
    JSON_COLUMNS = ['device', 'geoNetwork', 'totals', 'trafficSource']
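The preview stops before the body of load_df. To illustrate the general idea of expanding JSON-valued CSV columns into flat fields, here is a standard-library sketch. It deliberately swaps pandas and json_normalize for csv and json, so it is not the gist's method. The helpers flatten and load_rows and the dotted key naming are assumptions made for this example.

```python
import csv
import io
import json


def flatten(prefix, value, out):
    """Recursively copy nested dict values into `out` under dotted keys."""
    if isinstance(value, dict):
        for key, sub in value.items():
            flatten(f"{prefix}.{key}", sub, out)
    else:
        out[prefix] = value


def load_rows(csv_text, json_columns):
    """Parse CSV text and expand each JSON column into dotted-key fields.

    Hypothetical helper: columns named in `json_columns` are parsed with
    json.loads; all other cells are kept as plain strings.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        flat = {}
        for name, cell in row.items():
            if name in json_columns:
                flatten(name, json.loads(cell), flat)
            else:
                flat[name] = cell
        rows.append(flat)
    return rows
```

With a column named device holding {"browser": "Chrome"}, each row gains a device.browser field, which mirrors the column-per-key layout the gist description refers to.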
meddulla / impute.py
Last active May 2, 2018 23:26
impute data with statsmodels MICE
import pandas as pd
import numpy as np
import statsmodels as sm
from statsmodels.imputation import mice
# also see https://pypi.org/project/fancyimpute/
data = pd.read_csv("Howell1.csv")
data.head()