Anirban Das akaanirban

@akaanirban
akaanirban / README.md
Created November 8, 2023 18:39
Reverse Engineer docker image to build Dockerfile using docker history

The following shows how you can roughly reverse engineer a Docker image to recover its Dockerfile using `docker history`. You could use a more sophisticated tool like dive, but that has its own problems.

Let's assume you have the image `nvcr.io/nvidia/pytorch:23.10-py3`.

Run the following command to create a semi-correct Dockerfile: `docker history --no-trunc nvcr.io/nvidia/pytorch:23.10-py3 --format '{{ .CreatedBy }}' | tail -r > Dockerfile`. Note that `tail -r` is BSD/macOS-specific; on Linux, pipe through `tac` instead to reverse the line order.

The resulting Dockerfile starts with lines like:

/bin/sh -c #(nop)  ARG RELEASE
/bin/sh -c #(nop)  ARG LAUNCHPAD_BUILD_ARCH
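
These lines are raw `CreatedBy` strings, so the file is not directly buildable. A small, hypothetical cleanup pass (my sketch, not part of the gist) that rewrites them into Dockerfile-like instructions:

```python
import re

def clean_history_line(line: str) -> str:
    """Rewrite a raw `docker history` CreatedBy string into a Dockerfile-like line."""
    line = re.sub(r"^/bin/sh -c #\(nop\)\s*", "", line)  # metadata steps: ARG, ENV, LABEL, ...
    line = re.sub(r"^/bin/sh -c\s+", "RUN ", line)       # shell layers become RUN instructions
    return line.rstrip()

with open("Dockerfile") as f:
    for raw in f:
        print(clean_history_line(raw))
```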
@akaanirban
akaanirban / README.md
Created September 26, 2023 03:40
Download Models from Huggingface to a local directory and exclude `*.bin` files

Run the following code to download a model from Hugging Face to a local directory, excluding `*.bin` files:

import huggingface_hub

huggingface_hub.snapshot_download(
    repo_id="meta-llama/Llama-2-7b-chat-hf",
    local_dir="./meta-llama_Llama-2-7b-chat-hf",
    local_dir_use_symlinks=False,  # write real files instead of symlinks into the HF cache
    resume_download=True,          # resume partially downloaded files
    ignore_patterns=["*.msgpack", "*.h5", "*.bin"],  # skip Flax, TF, and PyTorch .bin weights
)
print("done")
@akaanirban
akaanirban / parallel.py
Created August 31, 2021 02:33 — forked from thomwolf/parallel.py
Data Parallelism in PyTorch for modules and losses
##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
## Created by: Hang Zhang, Rutgers University, Email: [email protected]
## Modified by Thomas Wolf, HuggingFace Inc., Email: [email protected]
## Copyright (c) 2017-2018
##
## This source code is licensed under the MIT-style license found in the
## LICENSE file in the root directory of this source tree
##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
"""Encoding Data Parallel"""
@akaanirban
akaanirban / different-ways-to-perform-gradient-accumulation.ipynb
Created August 30, 2021 21:26
Different ways to perform gradient accumulation.ipynb
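
The notebook itself cannot be rendered here, so as a stand-in, a minimal sketch of the core technique (my illustration, not necessarily one of the notebook's variants): scale each mini-batch loss by the accumulation factor and step the optimizer only every few batches, simulating a larger effective batch size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
accum_steps = 4

# synthetic stand-in for a real DataLoader
loader = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(16)]

for step, (x, y) in enumerate(loader):
    loss = F.cross_entropy(model(x), y) / accum_steps  # average over the accumulation window
    loss.backward()                                    # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
```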
@akaanirban
akaanirban / spacy_preprocessor.py
Created August 5, 2021 14:52 — forked from omri374/spacy_preprocessor.py
Text preprocessing using spaCy
import re
from typing import List
import spacy
from spacy.tokens import Doc
from tqdm import tqdm
class SpacyPreprocessor:
    def __init__(
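
The preview cuts off mid-signature; for flavor, a minimal sketch of the same idea (mine, not omri374's implementation): lemmatize, lowercase, and drop stopwords and punctuation. It assumes the `en_core_web_sm` model is installed.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes: python -m spacy download en_core_web_sm

def preprocess(text: str) -> str:
    doc = nlp(text)
    return " ".join(
        tok.lemma_.lower()
        for tok in doc
        if not (tok.is_stop or tok.is_punct or tok.is_space)
    )

print(preprocess("The quick brown foxes were jumping over the lazy dogs!"))
```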
@akaanirban
akaanirban / README.md
Last active July 16, 2021 17:50
Setup script to configure a (GCP/AWS) Ubuntu VM with NVIDIA drivers and NVIDIA docker container toolkit.
#!/bin/bash

set -e

# More details for other OSes: https://cloud.google.com/compute/docs/gpus/install-drivers-gpu

# install docker 
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add - 
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" 
import dask
import dask.dataframe as dd
import pandas as pd
import numpy as np
from pandas.tseries.holiday import USFederalHolidayCalendar
import os
import time
import pyarrow.dataset as ds
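
This import block appears without its gist header in the page capture. The combination of `dask.dataframe`, `USFederalHolidayCalendar`, and `pyarrow.dataset` suggests a holiday-aware Dask workflow; a hypothetical illustration of how these imports compose (data and column names are mine):

```python
import dask.dataframe as dd
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

pdf = pd.DataFrame({
    "ts": pd.date_range("2021-01-01", periods=365, freq="D"),
    "x": np.arange(365),
})
ddf = dd.from_pandas(pdf, npartitions=4)

holidays = USFederalHolidayCalendar().holidays(start="2021-01-01", end="2021-12-31")
ddf["is_holiday"] = ddf["ts"].isin(list(holidays))  # flag US federal holidays

print(ddf["is_holiday"].sum().compute())
```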
@akaanirban
akaanirban / setup.md
Created April 23, 2021 22:47
How to set up Kind with multiple nodes and connect from a remote computer

The following shows how to set up Kind locally with multiple nodes and connect to it from a remote computer.

WARNING: DO NOT DO THIS UNLESS YOU KNOW WHAT YOU ARE DOING, OR UNLESS YOU ARE ON A PRIVATE SUBNET. KIND HAS VERY LITTLE SECURITY, AND EXPOSING IT TO THE OUTSIDE MAY COMPROMISE YOUR SYSTEM!

Step 1:

  • Install Kind on the local computer. Let's assume the IP of the local computer is a.b.c.d and you want the Kubernetes control plane to run on port 4321.
  • Let's further suppose you want a Kind deployment with 1 control-plane (master) node and 3 worker nodes. Some of this is taken from kubernetes-sigs/kind#873 (comment).
  • Make a file `kind_node_config` and paste the following into it:
    # four node (three workers) cluster config
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    networking:
      # listen on all interfaces so a remote computer can reach the API server on a.b.c.d
      apiServerAddress: "0.0.0.0"
      apiServerPort: 4321
    nodes:
    - role: control-plane
    - role: worker
    - role: worker
    - role: worker

@akaanirban
akaanirban / get_matplotlib_cmap_color_list.md
Created March 13, 2021 21:14
Get Color list from matplotlib Cmap
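
No preview is shown for this gist; a minimal sketch of the usual technique (my illustration): sample N evenly spaced colors from a named colormap.

```python
import matplotlib.pyplot as plt

cmap = plt.get_cmap("viridis")
n = 10
colors = [cmap(i / (n - 1)) for i in range(n)]  # n RGBA tuples spanning the colormap
print(colors[:3])
```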
@akaanirban
akaanirban / read_cuda_tensor_in_cpu.md
Created January 24, 2021 16:16
How to read a pickled collection (list or dictionary etc.) of PyTorch CUDA tensors on CPU

What if you saved some loss or accuracy values as a list of PyTorch tensors on a system with CUDA, and are then trying to plot the losses on a system with no GPU?

With some googling I found that the following code from (pytorch/pytorch#16797 (comment)) works fine! You just need to define the custom unpickler and use it in place of `pickle.load`!

import io
import pickle
import torch

class CPU_Unpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == 'torch.storage' and name == '_load_from_bytes':
            # remap CUDA tensor storages onto the CPU while unpickling
            return lambda b: torch.load(io.BytesIO(b), map_location='cpu')
        return super().find_class(module, name)
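
Hypothetical usage (the file name is illustrative):

```python
with open("losses.pkl", "rb") as f:
    losses = CPU_Unpickler(f).load()  # CUDA tensors come back as CPU tensors
```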