@achinta
achinta / new_install.sh
Created December 11, 2018 16:08
Mac Post Installations
#Video conversion tools
xcode-select --install
#from https://gist.github.com/clayton/6196167#gistcomment-2777274
brew install ffmpeg --with-aom \
--with-chromaprint \
--with-fdk-aac \
--with-fontconfig \
--with-freetype \
--with-frei0r \
--with-game-music-emu \
achinta / pandas_agg.py
Last active April 1, 2019 04:15
Pandas apply multiple aggregations to multiple columns and rename columns appropriately
"""
Group by, apply aggregations, and rename the columns of a DataFrame. Applying multiple aggregations to a DataFrame
creates two-level column names. Instead, this method returns a DataFrame whose columns are named in the format 'colname_agg'.
example:
df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': range(5),
                   'C': range(5)})
groupByCols = ['A']
agg = {'B': ['sum','mean'], 'C': 'min'}
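A minimal sketch of the flattening described above, assuming pandas is installed. The helper name `agg_flat` and its signature are hypothetical and are not taken from the gist; the data and the `agg` mapping mirror the docstring example.

```python
import pandas as pd


def agg_flat(df, group_by_cols, agg):
    """Group, aggregate, and flatten two-level column names to 'colname_agg'.

    Hypothetical helper for illustration; not the gist's own function.
    """
    # Wrap single aggregations in a list so every output column is two-level.
    agg = {col: fn if isinstance(fn, list) else [fn] for col, fn in agg.items()}
    out = df.groupby(group_by_cols).agg(agg)
    # Join each (column, aggregation) pair into one name such as 'B_sum'.
    out.columns = [f"{col}_{fn}" for col, fn in out.columns]
    return out.reset_index()


df = pd.DataFrame({'A': [1, 1, 1, 2, 2], 'B': range(5), 'C': range(5)})
result = agg_flat(df, ['A'], {'B': ['sum', 'mean'], 'C': 'min'})
```

Here `result` has the flat columns `A`, `B_sum`, `B_mean`, and `C_min`, one row per value of `A`.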
achinta / biquery.sh
Created February 27, 2019 05:05
export-data-to-bigquery
# Export the data from the database as CSV and copy it to Google Cloud Storage
gsutil cp abhyasis.csv gs://my-bucket
# Create the dataset in BigQuery (one-time; here named 'aims')
bq --location=US mk --dataset [PROJECT_ID]:aims
# Create the table in BigQuery (one-time; here named 'abhyasis')
bq mk --table [PROJECT_ID]:aims.abhyasis
# Load from Google Cloud Storage into BigQuery
achinta / pyspark_fill.py
Last active November 7, 2024 23:16
Forward Fill in Pyspark
import pyspark.sql.functions as F
from pyspark.sql import Window
df = spark.createDataFrame([
    ('d1', None),
    ('d2', 10),
    ('d3', None),
    ('d4', 30),
    ('d5', None),
    ('d6', None),
achinta / CategoryEncoder.py
Last active July 18, 2022 11:55
Category Encoder - fit partial
from collections.abc import Iterable
class CategoryEncoder(object):
    """
    Once its fit method has been called, sklearn.preprocessing.LabelEncoder cannot encode new categories.
    In this category encoder, fit can be called any number of times. It encodes categories it has not seen before,
    without changing the encoding of existing categories.
    By default the first category is encoded as zero; this can be overridden with the 'start' value.
    """
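A minimal sketch of the behaviour this docstring describes, using only the standard library. The class name `IncrementalCategoryEncoder`, the `mapping` attribute, and the method bodies are assumptions for illustration; the gist's actual implementation is not shown in the preview.

```python
from collections.abc import Iterable


class IncrementalCategoryEncoder:
    """Hypothetical sketch: a label encoder whose fit() can be called repeatedly."""

    def __init__(self, start=0):
        self.start = start   # code assigned to the first category seen
        self.mapping = {}    # category -> integer code

    def fit(self, categories):
        # Accept a single category or an iterable of categories.
        if isinstance(categories, str) or not isinstance(categories, Iterable):
            categories = [categories]
        for cat in categories:
            # Only unseen categories get a new code; existing codes never change.
            if cat not in self.mapping:
                self.mapping[cat] = self.start + len(self.mapping)
        return self

    def transform(self, categories):
        return [self.mapping[cat] for cat in categories]


enc = IncrementalCategoryEncoder(start=1)
enc.fit(['red', 'green'])
enc.fit(['green', 'blue'])   # 'green' keeps its code, 'blue' gets the next one
codes = enc.transform(['red', 'green', 'blue'])
```

Storing the codes in a dictionary keeps earlier encodings stable, because a new category is always assigned the next unused integer.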
achinta / coursera_rename.py
Created January 7, 2020 16:12
Copy downloaded pdfs(and other docs) to a single folder after coursera-dl
"""
Run this in the root of the downloaded folder. It copies the PDF files from the subfolders into the base folder.
"""
import os
from glob import glob
import shutil
cwd = os.getcwd()
if not os.path.isdir('renamed'):
    os.mkdir('renamed')
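A minimal sketch of the copy step, assuming the goal is to gather every PDF below the current folder into `renamed`. The helper `collect_pdfs` and the naming scheme (sub-folder path joined with underscores) are hypothetical; the gist preview does not show how it names the copies.

```python
import shutil
import tempfile
from pathlib import Path


def collect_pdfs(root, dest_name='renamed'):
    """Copy every PDF below `root` into `root/dest_name` (hypothetical helper)."""
    root = Path(root)
    dest = root / dest_name
    dest.mkdir(exist_ok=True)
    copied = []
    # sorted() materialises the listing before any file is copied.
    for pdf in sorted(root.rglob('*.pdf')):
        # Skip files that already live in the destination folder.
        if dest in pdf.parents:
            continue
        # Prefix the name with its sub-folder path so equal names do not collide.
        target = dest / '_'.join(pdf.relative_to(root).parts)
        shutil.copy2(pdf, target)
        copied.append(target.name)
    return copied


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    for week in ('week1', 'week2'):
        (Path(tmp) / week).mkdir()
        (Path(tmp) / week / 'notes.pdf').write_bytes(b'%PDF-1.4')
    copied = collect_pdfs(tmp)
```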
from sklearn.base import TransformerMixin
class CuCategoryEncoder(TransformerMixin):
    """
    Runs on the GPU using cudf.
    Once its fit method has been called, sklearn.preprocessing.LabelEncoder cannot encode new categories.
    In this category encoder, fit can be called any number of times. It encodes categories it has not seen before,
    without changing the encoding of existing categories.
    """
    # categories as series
achinta / lock.py
Created January 29, 2021 13:06
Lock File Decorator
from pathlib import Path
import time
import json
from typing import Dict
def lock_file(func):
    '''
    Decorator which
    '''
    def wrapper(path: Path, data):
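The preview stops before the wrapper body, so the following is only a sketch of one way a lock-file decorator can work, not the gist's implementation. It assumes the lock is a sibling file named `<path>.lock` that is created exclusively before the wrapped function runs and removed afterwards; the `write_json` function and the retry interval are hypothetical.

```python
import functools
import json
import tempfile
import time
from pathlib import Path


def lock_file(func):
    """Hold '<path>.lock' while `func` runs (assumed behaviour, not the gist's)."""
    @functools.wraps(func)
    def wrapper(path: Path, data):
        lock = Path(str(path) + '.lock')
        while True:
            try:
                # Mode 'x' fails if the lock already exists, so creation is exclusive.
                lock.open('x').close()
                break
            except FileExistsError:
                time.sleep(0.05)  # another caller holds the lock; retry shortly
        try:
            return func(path, data)
        finally:
            lock.unlink()  # always release the lock, even if func raises
    return wrapper


@lock_file
def write_json(path: Path, data: dict):
    path.write_text(json.dumps(data))


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / 'state.json'
    write_json(target, {'count': 1})
    saved = json.loads(target.read_text())
    lock_left_behind = Path(str(target) + '.lock').exists()
```

Creating the lock with exclusive mode avoids the gap between checking for the file and creating it. The sketch has no timeout, so a stale lock left by a crashed process would block callers indefinitely.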
achinta / index_gita_in_es.ipynb
Last active June 18, 2022 06:52
Elastic Search functions
achinta / postman.js
Last active April 22, 2022 16:41
Postman Snippets - creating variables in pre-request script
// create random variable with timestamp as suffix
var ts = pm.variables.replaceIn("{{$timestamp}}");
pm.collectionVariables.set("name_ts", 'test ' + ts);
// create timestamp variables
var moment = require('moment');
pm.collectionVariables.set('arrival_date', moment().format("YYYY-MM-DD HH:mm:ss"));
pm.collectionVariables.set('departure_date', moment().add(5, 'days').format("YYYY-MM-DD HH:mm:ss"));