Kamil Sindi ksindi

ksindi / k8s-svc-annotations.md
Created December 8, 2017 18:14 — forked from mgoodness/k8s-svc-annotations.md
AWS ELB-related annotations for Kubernetes Services (v1.5)
  • service.beta.kubernetes.io/aws-load-balancer-access-log-emit-interval (in minutes)
  • service.beta.kubernetes.io/aws-load-balancer-access-log-enabled (true|false)
  • service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name
  • service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix
  • service.beta.kubernetes.io/aws-load-balancer-backend-protocol (http|https|ssl|tcp)
  • service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled (true|false)
  • service.beta.kubernetes.io/aws-load-balancer-connection-draining-timeout (in seconds)
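A minimal sketch of how the annotations listed above might be attached to a Service of type LoadBalancer. The Service name, selector, ports, bucket name, and bucket prefix are all hypothetical placeholders; only the annotation keys come from the list. Annotation values must be strings, so booleans and numbers are quoted.

```python
import json

PREFIX = "service.beta.kubernetes.io/aws-load-balancer-"

# Annotation values are always strings in Kubernetes metadata.
annotations = {
    PREFIX + "access-log-enabled": "true",
    PREFIX + "access-log-emit-interval": "60",             # minutes
    PREFIX + "access-log-s3-bucket-name": "my-elb-logs",   # hypothetical bucket
    PREFIX + "access-log-s3-bucket-prefix": "prod/web",    # hypothetical prefix
    PREFIX + "backend-protocol": "http",
    PREFIX + "connection-draining-enabled": "true",
    PREFIX + "connection-draining-timeout": "30",          # seconds
}

# "web", the selector, and the ports are placeholders for illustration.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "web", "annotations": annotations},
    "spec": {
        "type": "LoadBalancer",
        "selector": {"app": "web"},
        "ports": [{"port": 80, "targetPort": 8080}],
    },
}

manifest = json.dumps(service, indent=2)
print(manifest)
```

Since kubectl accepts JSON manifests as well as YAML, the printed output could be piped to `kubectl apply -f -`.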
ksindi / pagerank.py
Created December 7, 2017 15:00 — forked from diogojc/pagerank.py
Python implementation of PageRank
import numpy as np
from scipy.sparse import csc_matrix
def pageRank(G, s = .85, maxerr = .001):
    """
    Computes the pagerank for each of the n states.
    Used in webpage ranking and text summarization using unweighted
    or weighted transitions respectively.
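The preview above is cut off before the function body. As a rough illustration of the general technique, here is a power-iteration sketch. It is not the gist's code: it works on a dense NumPy array rather than a `csc_matrix`, the name `pagerank_sketch` is made up, and the reading of `s` as the damping factor and `maxerr` as the convergence threshold is an assumption based on the signature.

```python
import numpy as np


def pagerank_sketch(G, s=0.85, maxerr=1e-3):
    """Power-iteration PageRank; G[i, j] > 0 means state i links to state j."""
    G = np.asarray(G, dtype=float)
    n = G.shape[0]
    out = G.sum(axis=1)
    # Row-normalise transitions; rows with no out-links become uniform.
    safe_out = np.where(out > 0, out, 1.0)
    M = np.where(out[:, None] > 0, G / safe_out[:, None], 1.0 / n)
    r = np.full(n, 1.0 / n)
    while True:
        # Follow a link with probability s, otherwise jump to a random state.
        r_next = s * (r @ M) + (1.0 - s) / n
        if np.abs(r_next - r).sum() < maxerr:
            return r_next
        r = r_next


# Tiny example graph: 0 -> 1, 1 -> 0, 2 -> 0.
links = np.array([[0, 1, 0],
                  [1, 0, 0],
                  [1, 0, 0]])
ranks = pagerank_sketch(links, maxerr=1e-8)
print(ranks)
```

Weighted transitions work the same way: put the weights in `G` instead of 0/1 entries.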
ksindi / serialize-numpy-array.py
Created November 20, 2017 13:16 — forked from alexland/serialize-numpy-array.py
Serialize, persist, retrieve, and de-serialize a NumPy array as a binary string (any dimension, any dtype). Example use case: a web app calculates some result, e.g. from a machine learning algorithm, using NumPy, and the result is a NumPy array; it is efficient to just return that result rather than persist the array and then retrieve it via a query.
import time
import numpy as NP
from redis import StrictRedis as redis
# a 2D array to serialize
A = 10 * NP.random.randn(10000).reshape(1000, 10)
# flatten the 2D NumPy array and save it as a binary string
array_dtype = str(A.dtype)
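The preview stops after recording the dtype. A minimal sketch of the full round trip follows, assuming the array is flattened to raw bytes and rebuilt from its dtype and shape. A plain dict stands in for Redis so the example runs without a server, and the key format is a hypothetical choice, not taken from the gist.

```python
import numpy as np

# A 2D array to serialize.
A = 10 * np.random.randn(10000).reshape(1000, 10)

# Metadata needed to rebuild the array later: dtype and shape.
array_dtype = str(A.dtype)
rows, cols = A.shape

# ravel() flattens the array; tobytes() returns its raw buffer as bytes.
payload = A.ravel().tobytes()

# A dict stands in for the key-value store; the key format is hypothetical.
store = {}
key = "{0}|{1}#{2}".format(array_dtype, rows, cols)
store[key] = payload

# Retrieval: parse the metadata out of the key, then rebuild the array.
dtype_part, shape_part = key.split("|")
r, c = (int(x) for x in shape_part.split("#"))
B = np.frombuffer(store[key], dtype=dtype_part).reshape(r, c)
```

Note that `np.frombuffer` returns a read-only view over the bytes; call `.copy()` if the result needs to be modified.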
ksindi / pdio.py
Created November 19, 2017 14:42 — forked from luispedro/pdio.py
Save & load from a pandas DataFrame/Series
import numpy.lib
import numpy as np
import pandas as pd
import pickle  # cPickle on Python 2
def save_pandas(fname, data):
    '''Save DataFrame or Series
    Parameters
    ----------
"""
Notes:
- Takes about 2 ms for (100, (10000, 100)) shape inputs on my i7 laptop
- It's 2x faster without vector normalization (it might make sense to pre-normalize the vectors)
"""
import numpy as np
import numba
ksindi / gist:09a0fb58d479b55d6168d9094ce38d42
Created November 14, 2017 15:38 — forked from FedericoV/gist:0e7d6d8c8794a99a7a42
Cosine Similarity that handles NaN with Numba
import numba
import numpy as np

@numba.jit(nopython=True)
def fast_cosine(u, v):
    m = u.shape[0]
    udotv = 0
    u_norm = 0
    v_norm = 0
    for i in range(m):
        if (np.isnan(u[i])) or (np.isnan(v[i])):
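The preview ends at the NaN check, so what the loop does next is not shown. One plausible reading, sketched below in plain NumPy without Numba, is to skip every position where either vector is NaN and compute the cosine over the rest. The function name `nan_cosine` and the choice to return NaN when nothing remains are assumptions, not the gist's behavior.

```python
import numpy as np


def nan_cosine(u, v):
    """Cosine similarity over positions where neither u nor v is NaN."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Keep only the positions that are valid in both vectors.
    valid = ~(np.isnan(u) | np.isnan(v))
    u, v = u[valid], v[valid]
    denom = np.sqrt(u @ u) * np.sqrt(v @ v)
    # No overlapping entries or a zero vector: similarity is undefined.
    return float(u @ v / denom) if denom > 0 else float("nan")


sim = nan_cosine([1.0, np.nan, 2.0], [2.0, 5.0, 4.0])
print(sim)
```

The vectorized mask trades the explicit loop for temporary arrays; a Numba-compiled loop such as the one in the preview avoids those allocations.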
ksindi / inception_annoy.py
Created November 13, 2017 16:27 — forked from thomasdullien/inception_annoy.py
Inception for feature extraction, ANNoy for nearest-neighbor search
"""
Simple, hacked-up image similarity search using TensorFlow + the Inception
CNN as feature extractor and ANNoy for nearest neighbor search.
Requires TensorFlow and ANNoy.
Based on gist code under
https://gist.github.com/david90/e98e1c41a0ebc580e5a9ce25ff6a972d
"""
from annoy import AnnoyIndex
"""
Dockerfile
FROM python:3.6
RUN apt-get update && apt-get install -y ghostscript \
    libmagickwand-dev
RUN pip install wand PyPDF2
import os
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
NUM_FIELDS = 600
NUM_TABLES = 100
fields = [
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
fields = [
pa.field('column1', pa.string()),
pa.field('column2', pa.int64()),
pa.field('column3', pa.string()),
]