Abhishek Divekar adivekar-utexas

@adivekar-utexas
adivekar-utexas / realnews_download_split.py
Last active May 21, 2023 17:51
Download and split REALNEWS into multiple small parquet files
"""
REALNEWS is a big dataset of several million news articles obtained from Common Crawl.
It was used to train the Grover news generation language model.
Details here: https://arxiv.org/abs/1905.12616
In this script, we download it following instructions from https://github.com/rowanz/grover/tree/master/realnews
(please make sure to fill in the survey in the link above!)
After downloading, the file is a .tar.gz containing an enormous .jsonl file.
To split it into multiple small .parquet files, I've written the script below.
"""
"""A collection of utilities to augment the Python language:"""
from typing import *
import time, traceback, random, sys
import math, gc
from datetime import datetime
from math import inf
import numpy as np
from threading import Semaphore
import multiprocessing as mp
from concurrent.futures import Future
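
The preview above is cut off before the splitting logic, so here is a minimal sketch of the chunking step the docstring describes. It is not the gist's actual code. The function name `iter_jsonl_chunks`, the in-memory sample, and the output file naming in the comment are all assumptions made for illustration.

```python
import io
import json
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def iter_jsonl_chunks(
    lines: Iterable[str], chunk_size: int
) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `chunk_size` parsed records from JSONL lines."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    # Parse lazily so the whole file never has to fit in memory.
    records = (json.loads(line) for line in lines if line.strip())
    while True:
        chunk = list(islice(records, chunk_size))
        if not chunk:
            return
        yield chunk


# Hypothetical in-memory stand-in for the enormous .jsonl file.
sample = io.StringIO('{"title": "a"}\n{"title": "b"}\n\n{"title": "c"}\n')
chunks = list(iter_jsonl_chunks(sample, chunk_size=2))

# Each chunk could then be written to its own small file, for example:
#   pandas.DataFrame(chunk).to_parquet(f"part-{i:05d}.parquet")
# (assumes pandas and pyarrow are installed; the file name is made up).
```

Because the records are read lazily, the same generator should work on an open file handle for the extracted .jsonl without loading it all at once.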
adivekar-utexas / gist:95758673d5014a9556a027a1712a80ca
Created February 12, 2024 05:24
How to ensure you kill a ThreadPoolExecutor or ProcessPoolExecutor in Python
from typing import *
import ctypes
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
def stop_executor(
    executor: Optional[Union[ThreadPoolExecutor, ProcessPoolExecutor]],
    force: bool = True,  ## Forcefully terminate, might lead to work being lost.
):
    if executor is not None:
        if isinstance(executor, ThreadPoolExecutor):
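
The preview stops before the body of the function, so the author's actual approach is not visible here. As a rough illustration of the general idea, the sketch below shuts an executor down using only the public `concurrent.futures` API. It assumes Python 3.9+ for `cancel_futures`, the name `stop_executor_sketch` is hypothetical, and it does not interrupt a task that is already running; it only cancels work that is still queued.

```python
import threading
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Optional


def stop_executor_sketch(executor: Optional[Executor], force: bool = True) -> None:
    """Shut down an executor using only the public concurrent.futures API."""
    if executor is None:
        return
    # force=True: return immediately and cancel queued futures (Python 3.9+).
    # force=False: block until all submitted work has finished.
    executor.shutdown(wait=not force, cancel_futures=force)


started, release = threading.Event(), threading.Event()


def blocking_task() -> str:
    started.set()
    release.wait(timeout=5)
    return "done"


pool = ThreadPoolExecutor(max_workers=1)
running = pool.submit(blocking_task)
started.wait(timeout=5)  # make sure the single worker is busy
queued = pool.submit(lambda: "never runs")  # waits behind the running task
stop_executor_sketch(pool, force=True)
release.set()  # let the running task finish; it was not interrupted
```

With `force=True`, the queued future ends up cancelled while the running one still completes, which is the "work being lost" trade-off the comment in the signature warns about.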
adivekar-utexas / arxiv-preparation.md
Created July 3, 2024 11:44 — forked from xiaohk/arxiv-preparation.md
Prepare for an arXiv submission

Submission Steps

  1. Download source code from Overleaf if you use it: menu -> download -> source.

  2. Strip comments and combine all tex files (f01-main.tex, f02-intro.tex, etc.) into one file arxiv_main.tex.

# Replace f01-main.tex with the main tex file in your overleaf project
latexpand --empty-comments f01-main.tex > arxiv_main.tex
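
To make the comment-stripping part of this step concrete, here is a small Python sketch of the idea. It is not how `latexpand` is implemented; it is a simplified stand-in that assumes `--empty-comments` leaves a bare `%` wherever comment text was removed. The function name `empty_comments` is hypothetical, and the sketch ignores verbatim environments and does not merge included files.

```python
import re

# A '%' starts a comment unless an odd number of backslashes precedes it,
# so '\%' is a literal percent sign but '\\%' is a line break plus a comment.
_COMMENT = re.compile(r"(?<!\\)((?:\\\\)*)%.*")


def empty_comments(tex: str) -> str:
    """Drop comment text on each line but keep the bare '%' marker."""
    # '.' does not match newlines, so each match stops at the end of its line.
    return _COMMENT.sub(r"\1%", tex)


sample = (
    "Intro text % TODO: rewrite\n"
    "50\\% of cases % check number\n"
    "line one \\\\% trailing note\n"
)
cleaned = empty_comments(sample)
```

Keeping the `%` marker matters because a trailing `%` in LaTeX also suppresses the whitespace produced by the line break, so deleting it outright could change spacing in the compiled output.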
adivekar-utexas / tips-for-successful-referrals-adivekar.md
Last active October 27, 2024 18:10
Tips for a successful referral at Amazon (Abhishek's guide)

I am always happy to provide referrals for folks applying to Amazon for a variety of roles. Amazon is a FAANG company, so I know that landing a job there can make a big difference to someone's career, and I am happy to spend the time to provide a referral.

If you are interested, please read the points below very carefully before reaching out (either on LinkedIn or on [email protected]). This is to save time on both sides 😄 I will not entertain requests from those who have clearly not read these points, regardless of how great your profile is.

  1. It is up to you to visit amazon.jobs and search for open positions.

  2. ‼️ Once you find a job, DO NOT apply for the job yourself; if you do so, the portal does not allow me to refer you for the same job‼️. I will not go forward with a candidate who has done this.

  3. I can manage up to 5-6 referrals per person, for at most 2 different roles (e.g. Software Engineer and Data Scientist). Beyond this does not make sense. Pick jobs wh