Shubhanshu Mishra napsternxg

"""Faster Implementation of Unsupervised Query Segmentation.
Uses vectorized operations
- author: @napsternxg
Unsupervised Query Segmentation Using only Query Logs [Mishra et al. 2011]
https://www.microsoft.com/en-us/research/wp-content/uploads/2011/01/pp0295-mishra.pdf
napsternxg / app.py
Created June 15, 2023 07:27
Queued Map with retries
from flask import Flask, jsonify, request, render_template
from queued_map import example_items
app = Flask(__name__)
@app.get("/")
@app.get("/<int:n>")
def home(n: int = 10):
    output = example_items(n)
napsternxg / async_decorator.py
Created June 15, 2023 07:22
Async Decorator
import asyncio
def async_decorator(acreate_fn):
    async def _f(*args, **kwargs):
        print(f"Decorated fn: {args=}, {kwargs=}. Sleeping.")
        await asyncio.sleep(0.1)
        return await acreate_fn(*args, **kwargs)
    return _f
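A minimal usage sketch for the decorator above. It assumes the wrapped function is a coroutine. The `fake_create` coroutine and the `run_demo` helper are hypothetical names introduced only for illustration; they are not part of the gist.

```python
import asyncio


def async_decorator(acreate_fn):
    # Same wrapper as above: log the call, pause briefly, then await the wrapped coroutine.
    async def _f(*args, **kwargs):
        print(f"Decorated fn: {args=}, {kwargs=}. Sleeping.")
        await asyncio.sleep(0.1)
        return await acreate_fn(*args, **kwargs)
    return _f


# Hypothetical coroutine used only for illustration; not part of the original gist.
@async_decorator
async def fake_create(prompt, temperature=0.0):
    return {"prompt": prompt, "temperature": temperature}


def run_demo():
    # asyncio.run drives the decorated coroutine to completion from synchronous code.
    return asyncio.run(fake_create("hello", temperature=0.5))


if __name__ == "__main__":
    print(run_demo())
```

Because `_f` is itself a coroutine function, the decorated name must still be awaited (or passed to `asyncio.run`), exactly like the function it wraps.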
napsternxg / prune_sklearn_model.py
Last active June 14, 2023 21:41
Prune Sklearn TF-IDF Logistic Regression model
from copy import deepcopy
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy import sparse
from joblib import dump, load
import joblib
import time
napsternxg / food.com.download.sh
Last active June 10, 2023 13:29
Food.com sitemap
# Create a working directory and fetch the top-level sitemap index.
mkdir -p food.com
cd food.com || exit 1
wget https://www.food.com/sitemap.xml
# Dry run: list every child sitemap URL found in the index.
for url in $(grep "<loc>https://www.food.com/sitemap-" sitemap.xml | sed -n 's:.*<loc>\(.*\)</loc>.*:\1:p');
do echo "Download: $url";
done
# Download each child sitemap.
for url in $(grep "<loc>https://www.food.com/sitemap-" sitemap.xml | sed -n 's:.*<loc>\(.*\)</loc>.*:\1:p');
do wget "$url";
done
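The `<loc>` extraction done by the shell loops above can also be sketched in Python. This is an illustrative equivalent, not the gist's method: the `child_sitemap_urls` helper and the `SAMPLE` index are assumptions made up for the example, and the real `sitemap.xml` contents are not shown in the source.

```python
import re

# Mirrors the grep + sed pipeline: keep <loc> entries that point at child sitemaps.
LOC_PATTERN = re.compile(r"<loc>(https://www\.food\.com/sitemap-[^<]*)</loc>")


def child_sitemap_urls(sitemap_xml: str) -> list:
    """Return child sitemap URLs found in the text of a sitemap index."""
    return LOC_PATTERN.findall(sitemap_xml)


# Made-up sample index for illustration only; file names here are hypothetical.
SAMPLE = """<sitemapindex>
<sitemap><loc>https://www.food.com/sitemap-1.xml</loc></sitemap>
<sitemap><loc>https://www.food.com/other.xml</loc></sitemap>
<sitemap><loc>https://www.food.com/sitemap-2.xml</loc></sitemap>
</sitemapindex>"""

if __name__ == "__main__":
    for url in child_sitemap_urls(SAMPLE):
        print("Download:", url)
```

Unlike the `sed` expression, which yields at most one URL per input line, the regular expression also handles several `<loc>` elements on a single line.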
napsternxg / gen_clip_embeddings.py
Created April 20, 2023 12:12
Gen Text Embeddings
from pathlib import Path
import torch
from transformers import CLIPProcessor, CLIPTextModelWithProjection
from accelerate import Accelerator
from datasets import Dataset
import pandas as pd
import numpy as np
napsternxg / merge_pdfs.py
Last active April 18, 2023 20:46
Merge PDFs
"""
pip install pypdf
"""
from pypdf import PdfWriter
def main(args):
    merger = PdfWriter()
    file_paths = args.input_files
    for pdf in file_paths:
napsternxg / accelerated_sentence_transformer.diff
Last active November 7, 2023 16:20
accelerate support for sentence_transformer
diff --git a/sentence_transformers/SentenceTransformer.py b/sentence_transformers/SentenceTransformer.py
index e44e573..ae4dea4 100644
--- a/sentence_transformers/SentenceTransformer.py
+++ b/sentence_transformers/SentenceTransformer.py
@@ -16,6 +16,7 @@ from torch.optim import Optimizer
 from torch.utils.data import DataLoader
 import torch.multiprocessing as mp
 from tqdm.autonotebook import trange
+from tqdm.autonotebook import tqdm
 import math
napsternxg / spacy_transformer.py
Last active April 15, 2023 06:21
Spacy sklearn Transformer - Use spacy embeddings in Sklearn model pipelines
"""Spacy Embedding Transformer for Sklearn pipeline
Install spacy and floret
```bash
pip install spacy floret scikit-learn
```
First download the vectors from:
```bash