
maneesh disodia (maneeshdisodia)

  • AI Architect @ Altimetrik
  • India
elavenrac / Offline-Dataflow.md
Last active June 29, 2022 07:02
GCP Dataflow processing with no external IPs

GCP Dataflow Pipelines

This gist is a detailed walkthrough of how to deploy Python Dataflow pipelines in GCP so they run without external IPs. Full code samples are available below.

This walkthrough assumes you have already authenticated with the gcloud login commands and have the appropriate IAM privileges to execute these operations.
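For reference, the authentication referred to here is typically done with commands like the following (a minimal sketch; YOUR_PROJECT_ID is a placeholder):

gcloud auth login
gcloud config set project YOUR_PROJECT_ID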

Step 1 - Gather application dependencies

Since we are planning to run our Dataflow worker nodes without external IPs, we must package up all of our application dependencies for an offline deployment. I highly recommend using a virtual environment, as your global environment will contain far more dependencies than this single application requires.

Dump your application dependencies into a single file.
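For example, from inside the activated virtual environment (a minimal sketch; the requirements.txt name is just the usual convention):

pip freeze > requirements.txt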

widdowquinn / jupyterhub_aws.md
Created December 9, 2017 16:33
Set up JupyterHub on AWS

JupyterHub on AWS

EC2 Setup

  • Log in to AWS
  • Go to a sensible region
  • Start a new instance with Ubuntu Trusty (14.04) - compute-optimised instances have a high vCPU:memory ratio, and the lowest-cost CPU time. c4.2xlarge is a decent choice.
  • Set security group (firewall) to have ports 22, 80, and 443 open (SSH, HTTP, HTTPS); a rough CLI sketch of these setup steps follows this list
  • If you want a static IP address (for long-running instances) then select Elastic IP for this VM
  • If you want to use HTTPS, you'll probably need a paid certificate, or to use Amazon's Route 53 to get a non-Amazon domain (to avoid region blocking).
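The list above describes the console flow; as a rough sketch only, an AWS CLI equivalent of the firewall and launch steps might look like this (the AMI, key pair, and resource IDs are placeholders to substitute, not values from this walkthrough):

# Open SSH, HTTP and HTTPS on the instance's security group
aws ec2 authorize-security-group-ingress --group-id sg-PLACEHOLDER --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-PLACEHOLDER --protocol tcp --port 80 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id sg-PLACEHOLDER --protocol tcp --port 443 --cidr 0.0.0.0/0

# Launch a c4.2xlarge instance from an Ubuntu 14.04 AMI
aws ec2 run-instances --image-id ami-PLACEHOLDER --instance-type c4.2xlarge --key-name my-key --security-group-ids sg-PLACEHOLDER

# Optional: allocate an Elastic IP and attach it for a stable public address
aws ec2 allocate-address
aws ec2 associate-address --instance-id i-PLACEHOLDER --allocation-id eipalloc-PLACEHOLDER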
w121211 / spikes.ipynb
Last active March 5, 2025 16:45 — forked from jkibele/spikes
Identifying Spikes in timeseries data with Pandas
yong27 / apply_df_by_multiprocessing.py
Last active April 12, 2023 04:35
pandas DataFrame apply multiprocessing
import multiprocessing
import pandas as pd
import numpy as np

def _apply_df(args):
    # Unpack the (chunk, func, kwargs) tuple and apply func to this chunk.
    df, func, kwargs = args
    return df.apply(func, **kwargs)

def apply_by_multiprocessing(df, func, **kwargs):
    workers = kwargs.pop('workers')  # number of worker processes / chunks
    # Preview truncated here; the remainder is the standard split / pool.map / concat pattern.
    pool = multiprocessing.Pool(processes=workers)
    result = pool.map(_apply_df, [(chunk, func, kwargs) for chunk in np.array_split(df, workers)])
    pool.close()
    return pd.concat(result)
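A minimal usage sketch (the square helper, the example frame, and workers=4 are illustrative assumptions, not part of the original gist preview):

def square(x):
    return x ** 2

if __name__ == '__main__':
    # Build a small frame, square every value across 4 worker processes,
    # then reassemble the chunks into a single DataFrame.
    df = pd.DataFrame({'a': range(10), 'b': range(10)})
    result = apply_by_multiprocessing(df, square, workers=4)
    print(result.head())

Because the frame is split with np.array_split, each process only sees its own chunk, so the speed-up is roughly bounded by the number of workers and the cost of pickling the chunks.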