Hamel Husain hamelsmu

@hamelsmu
hamelsmu / load_csv_data.py
Created May 11, 2022 03:03
Load CSV Data in Metaflow steps
from metaflow import FlowSpec, step

url = "https://raw.githubusercontent.com/Netflix/metaflow/master/metaflow/tutorials/02-statistics/movies.csv"
local_path = "./movies.csv"

class CSVFlow(FlowSpec):

    @step
    def start(self):
        self.next(self.get_csv_from_web)

    @step
@hamelsmu
hamelsmu / ghissues.sql
Last active April 28, 2022 01:47
Get GitHub Issues from BigQuery
-- Access this query at https://console.cloud.google.com/bigquery?sq=235037502967:a71a4b32d74442558a2739b581064e5f
SELECT url, title, body
FROM (
  SELECT url, title, body
       , RANK() OVER (PARTITION BY SUBSTR(body, 75, 125) ORDER BY url) AS count_body_beg
  FROM (
    SELECT url, title, body
         , RANK() OVER (PARTITION BY SUBSTR(body, 50, 100) ORDER BY url) AS count_body_beg
    FROM (
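The visible layers of the query rank rows within groups that share the same slice of `body`, which looks like a way to drop near-duplicate issues. The outer filter is cut off here, so that reading is an assumption. Below is a minimal sketch of the pattern using Python's built-in `sqlite3` as a stand-in for BigQuery. The `issues` table, the `dedupe_by_body_slice` helper, the sample rows, and the `rank_in_group = 1` filter are all hypothetical and not taken from the original query. It needs an SQLite build with window-function support (3.25 or newer).

```python
import sqlite3


def dedupe_by_body_slice(rows, start=1, length=20):
    """Keep one row per distinct body slice: the row with the lowest url in each group."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the GitHub issues source.
    conn.execute("CREATE TABLE issues (url TEXT, title TEXT, body TEXT)")
    conn.executemany("INSERT INTO issues VALUES (?, ?, ?)", rows)
    query = """
        SELECT url, title, body
        FROM (
            SELECT url, title, body,
                   RANK() OVER (
                       PARTITION BY SUBSTR(body, ?, ?)
                       ORDER BY url
                   ) AS rank_in_group
            FROM issues
        )
        WHERE rank_in_group = 1  -- assumed filter; the original query is truncated
        ORDER BY url
    """
    result = conn.execute(query, (start, length)).fetchall()
    conn.close()
    return result


# Made-up sample data: the first two bodies share their first 20 characters.
sample_rows = [
    ("https://example.com/1", "Bug A", "Steps to reproduce: open the app and click save"),
    ("https://example.com/2", "Bug A again", "Steps to reproduce: open the app and click load"),
    ("https://example.com/3", "Feature", "Please add dark mode to the settings page"),
]
deduped = dedupe_by_body_slice(sample_rows)
```

Stacking several such layers with different `SUBSTR` offsets, as the original query appears to do, would catch duplicates whose bodies differ at the start but match further in.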
@hamelsmu
hamelsmu / docs_example.html
Created February 25, 2022 18:49
Example of Schema For Docs
<DocSection type="class" name="ExampleError" module="test_lib.example" link="test_lib/example.py#L155">
  <SigArgSection>
    <SigArg name="msg" type="str" default="foo"/>
    <SigArg name="code" type="str" default="foo"/>
  </SigArgSection>
  <Description summary="Exceptions are documented in the same way as classes." extended_summary="The __init__ method may be documented in either the class level\ndocstring, or as a docstring on the __init__ method itself.\n\nEither form is acceptable, but the two should not be mixed. Choose one\nconvention to document the __init__ method and be consistent with it." />
  <ParamSection name="Parameters">
    <Parameter name="msg" type="str" desc="Human readable string describing the exception." />
    <Parameter name="code" type=":obj:`int`, optional" desc="Numeric error code." />
  </ParamSection>
@hamelsmu
hamelsmu / execute_nb.py
Created February 24, 2022 21:08
How to use nbconvert's ExecutePreprocessor
# From nbdev.export2html
class ExecuteShowDocPreprocessor(ExecutePreprocessor):
    "An `ExecutePreprocessor` that only executes `show_doc` and `import` cells"
    def preprocess_cell(self, cell, resources, index):
        if not check_re(cell, _re_notebook2script):
            if check_re(cell, _re_show_doc):
                return super().preprocess_cell(cell, resources, index)
            elif check_re_multi(cell, [_re_import, _re_lib_import.re]):
                if check_re_multi(cell, [_re_export, 'show_doc', r'^\s*#\s*import']):
                    # r = list(filter(_non_comment_code, cell['source'].split('\n')))
@hamelsmu
hamelsmu / predict_batch.py
Created February 20, 2022 07:15
How to make batch predictions in fastai
@patch
def predict_batch(self:Learner, item, rm_type_tfms=None, with_input=False):
    dl = self.dls.test_dl(item, rm_type_tfms=rm_type_tfms, num_workers=0)
    inp,preds,_,dec_preds = self.get_preds(dl=dl, with_input=True, with_decoded=True)
    i = getattr(self.dls, 'n_inp', -1)
    inp = (inp,) if i==1 else tuplify(inp)
    dec_inp, nm = zip(*self.dls.decode_batch(inp + tuplify(dec_preds)))
    res = preds,nm,dec_preds
    if with_input: res = (dec_inp,) + res
    return res
@hamelsmu
hamelsmu / example_nb_flow.py
Last active February 2, 2022 06:02
For Outerbounds blog post
from metaflow import step, current, FlowSpec, Parameter, card
from mymodel import train_model

class NBFlow(FlowSpec):
    "A toy example of using the notebook card."

    @step
    def start(self):
        # Train a model, save the results in `model_results`
        self.model_results = train_model(...)
@hamelsmu
hamelsmu / pytorch_lightning_metaflow.md
Last active January 18, 2022 19:48
Feedback on @pytorch distributed training re: Metaflow

Feedback on @pytorch_parallel

Related to metaflow #907 (Netflix/metaflow#907) and the related docs draft.

1. Developer Ergonomics

To use the feature, the user must learn a brand-new way of doing `foreach`. This adds a high degree of cognitive load: for this use case, and this use case only, the user must remember to call `self.next(..., num_parallel=...)`.

The API makes it unclear where the parallelization happens. For example, `pytorch_lightning.trainer` has an argument `gpus=-1`, which means it will use all available GPUs. In that case, what does `num_parallel` add? The user is left to reason about where, and what kind of, parallelization is happening.

@hamelsmu
hamelsmu / upload.py
Created May 26, 2021 03:51
How to upload data to Azure Blob Storage
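import os
from azure.storage.blob import BlobServiceClient

blob_service_client = BlobServiceClient.from_connection_string(os.getenv('AZURE_STORAGE_CONNECTION_STRING'))
# `container_client` is not defined in the original snippet; the container name here is a placeholder (assumption).
container_client = blob_service_client.get_container_client("my-container")

####### Uploading files ##########
with open('gh_repo_topics.parquet', 'rb') as data:
    blob_client = container_client.upload_blob(name="sample_data/gh_repo_topics.parquet", data=data, overwrite=True)

####### Downloading files ##########
f = container_client.download_blob("sample_data/gh_repo_topics.parquet")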
@hamelsmu
hamelsmu / fastai-azureml.ipynb
Last active May 22, 2021 16:15
fastai example doesn't work