Abhishek Divekar adivekar-utexas

adivekar-utexas / good-pdf-printing-css
Last active May 10, 2025 10:59
Good PDF Printing CSS Style
@media print {
  /* ---------- wrap long code lines ---------- */
  pre,        /* standalone code blocks */
  code,       /* inline snippets <code>like this</code> */
  pre code {  /* <pre><code> … </code></pre> combos */
    white-space: pre-wrap !important;
adivekar-utexas / abhishek-divekar-thesis-experience.md
Last active May 2, 2025 04:10
Abhishek's experience doing a Thesis while working full-time as an ML Scientist at Amazon

(Reproduced from https://mscshub.com/)

Background: I work as an ML Scientist at FAANG. I started the UT Austin MSCSO program in Fall'2020, and this review was written in Dec'2023. This review is very specific, so a motivated person can probably find me. Let me save you the trouble, here is my LinkedIn: linkedin.com/in/ardivekar/ . Feel free to reach out with questions. I'm usually also on the MSCSO Slack.

I applied for UT Austin's MSCS Thesis in Dec'2022, and completed the Thesis over 3 semesters (Spring'23 to Spring'24). My thesis was in the NLP domain, during 2023 when AI went mainstream via ChatGPT, and new LLMs were released every single week. As you can expect, this makes my experience unique. But I expect everyone's thesis experience will be unique since it depends on your personal strengths and interests.

Another thing I did differently from other students is that I framed my own research problem. This is recommended only if you're really dead-set on one idea, like I was. Otherwise it's okay t

adivekar-utexas / tips-for-successful-referrals-adivekar.md
Last active May 9, 2025 09:40
Tips for a successful resume & referral at Amazon (Abhishek's guide)

I am always happy to provide referrals for folks applying to Amazon for a variety of roles. Amazon is a FAANG, so I know that landing a job there can make a big difference to someone's career, and I am happy to spend the time to provide a referral.

If you are interested, please read the points below very carefully before reaching out (I only accept requests via LinkedIn). This is to save time on both sides 😄 I will not entertain requests from those who have clearly not read these points, regardless of how great your accomplishments are. No exceptions.

I usually check my personal email around 9am EST / 8pm IST. If you reach out, please give me up to 48 hours to respond.

The tips:

  1. It is up to you to visit amazon.jobs and search for open positions.
adivekar-utexas / arxiv-preparation.md
Created July 3, 2024 11:44 — forked from xiaohk/arxiv-preparation.md
Prepare for an arXiv submission

Submission Steps

  1. Download source code from Overleaf if you use it: menu -> download -> source.

  2. Strip comments and combine all tex files (f01-main.tex, f02-intro.tex, etc.) into one file arxiv_main.tex.

# Replace f01-main.tex with the main tex file in your overleaf project
latexpand --empty-comments f01-main.tex > arxiv_main.tex
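To make the comment-stripping part of step 2 concrete, here is a minimal Python sketch of the idea. It is not how latexpand is implemented and it does not expand \input files; the function names are hypothetical. It assumes a % starts a comment unless the character before it is consumed by a backslash escape, and it keeps the bare % so that TeX's end-of-line spacing is unchanged. Verbatim environments are not handled.

```python
def strip_comment(line: str) -> str:
    """Drop the text of a TeX comment on one line, keeping the bare '%'."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            # A backslash escapes the next character (e.g. \% or \\), so skip both.
            i += 2
        elif line[i] == "%":
            # Unescaped percent sign: everything after it is a comment.
            return line[: i + 1]
        else:
            i += 1
    return line


def strip_comments(tex: str) -> str:
    """Apply strip_comment to every line of a TeX source string."""
    return "\n".join(strip_comment(line) for line in tex.split("\n"))


if __name__ == "__main__":
    sample = "We cover 50\\% of cases. % TODO: double-check this number\nNext line."
    print(strip_comments(sample))
```

For a real submission, the latexpand command above is the safer choice; the sketch only illustrates why escaped percent signs such as \% must survive the stripping.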
adivekar-utexas / gist:95758673d5014a9556a027a1712a80ca
Created February 12, 2024 05:24
How to ensure you kill a ThreadPoolExecutor or ProcessPoolExecutor in Python
from typing import *
import ctypes
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def stop_executor(
    executor: Optional[Union[ThreadPoolExecutor, ProcessPoolExecutor]],
    force: bool = True,  ## Forcefully terminate, might lead to work being lost.
):
    if executor is not None:
        if isinstance(executor, ThreadPoolExecutor):
"""A collection of utilities to augment the Python language:"""
from typing import *
import time, traceback, random, sys
import math, gc
from datetime import datetime
from math import inf
import numpy as np
from threading import Semaphore
import multiprocessing as mp
from concurrent.futures._base import Future
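The stop_executor preview above is cut off before its body, so the author's actual approach is not shown here. As a rough sketch of the same goal using only the public concurrent.futures API, the function below (hypothetical name shutdown_executor_sketch) cancels queued work and returns without waiting. It assumes Python 3.9+ for the cancel_futures argument. Note that tasks already running are not interrupted; they simply run to completion in the background.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Optional, Union

AnyExecutor = Union[ThreadPoolExecutor, ProcessPoolExecutor]


def shutdown_executor_sketch(
    executor: Optional[AnyExecutor],
    force: bool = True,
) -> None:
    """Shut down an executor; with force=True, queued (not yet started) work is dropped."""
    if executor is None:
        return
    if force:
        # cancel_futures=True (Python 3.9+) cancels pending futures.
        # wait=False returns immediately instead of blocking on running tasks.
        executor.shutdown(wait=False, cancel_futures=True)
    else:
        # Graceful path: block until all submitted work has finished.
        executor.shutdown(wait=True)
```

After this call, submitting new work to the executor raises RuntimeError, and any future that was still queued reports cancelled() as True.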
adivekar-utexas / realnews_download_split.py
Last active May 21, 2023 17:51
Download and split REALNEWS into multiple small parquet files
"""
REALNEWS is a big dataset of several million news articles obtained from Common Crawl.
It was used to train the Grover news generation language model.
Details here: https://arxiv.org/abs/1905.12616
In this script, we download it following instructions from https://github.com/rowanz/grover/tree/master/realnews
(please make sure to fill in the survey in the link above!)
After downloading, the file is a .tar.gz containing an enormous .jsonl file.
To split it into multiple small .parquet files, I've written the script below.
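The script itself is not shown in this preview, so the following is only a sketch of the splitting idea described above, not the author's code. The function names, the part-file naming, and the default chunk size are all hypothetical. It assumes the archive holds a single .jsonl member and that pandas with a parquet engine (pyarrow or fastparquet) is installed for the final write.

```python
import json
import os
import tarfile
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def iter_jsonl_from_tar(tar_path: str, suffix: str = ".jsonl") -> Iterator[Dict[str, Any]]:
    """Stream records from the first .jsonl member of a .tar.gz without unpacking it to disk."""
    with tarfile.open(tar_path, mode="r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(suffix):
                fileobj = tar.extractfile(member)
                for raw_line in fileobj:
                    line = raw_line.strip()
                    if line:
                        yield json.loads(line)
                return  # only the first matching member is read


def chunked(records: Iterable[Dict[str, Any]], chunk_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Group a stream of records into lists of at most chunk_size items."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def split_to_parquet(tar_path: str, out_dir: str, chunk_size: int = 100_000) -> List[str]:
    """Write one parquet file per chunk and return the written paths."""
    import pandas as pd  # assumption: pandas and a parquet engine are available

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, chunk in enumerate(chunked(iter_jsonl_from_tar(tar_path), chunk_size)):
        path = os.path.join(out_dir, f"part-{i:05d}.parquet")
        pd.DataFrame(chunk).to_parquet(path, index=False)
        paths.append(path)
    return paths
```

Streaming line by line keeps memory use bounded by chunk_size records, which matters when the .jsonl file is far larger than RAM.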