Skip to content

Instantly share code, notes, and snippets.

View Beomi's full-sized avatar

Junbum Lee Beomi

View GitHub Profile
A::B is a system with 4 tokens: `A#`, `#A`, `B#` and `#B`.
An A::B program is a sequence of tokens. Example:
B# A# #B #A B#
To *compute* a program, we must rewrite neighbor tokens, using the rules:
A# #A ... becomes ... nothing
A# #B ... becomes ... #B A#
@padeoe
padeoe / README_hfd.md
Last active August 2, 2025 11:33
CLI-Tool for download Huggingface models and datasets with aria2/wget: hfd

🤗Huggingface Model Downloader

Note

(2025-01-08) Add feature for 🏷️Tag(Revision) Selection, contributed by @Bamboo-D.
(2024-12-17) Add feature for ⚡Quick Startup and ⏭️Fast Resume, enabling skipping of downloaded files, while removing the git clone dependency to accelerate file list retrieval.

Considering the lack of multi-threaded download support in the official huggingface-cli, and the inadequate error handling in hf_transfer, This command-line tool leverages curl and aria2c for fast and robust downloading of models and datasets.

Features

  • ⏯️ Resume from breakpoint: You can re-run it or Ctrl+C anytime.
@datasciencemonkey
datasciencemonkey / gptq_lora.py
Created September 13, 2023 21:57
train a gptq model using peft/lora
# %%
# this is run from /notebooks on paperspace
from huggingface_hub import login
from dotenv import load_dotenv
load_dotenv("/notebooks/.env")
import os
os.environ["TOKENIZERS_PARALLELISM"]="false"
login(token=os.getenv("HUGGINGFACE_TOKEN"))
@onelharrison
onelharrison / pandas_write_to_s3_using_s3fs.py
Last active May 10, 2022 01:23
Demo script for writing a pandas data frame to a CSV file on S3 using s3fs-supported pandas APIs
"""
Demo script for writing a pandas data frame to a CSV file on S3 using s3fs-supported pandas APIs
"""
import os
import pandas as pd
AWS_S3_BUCKET = os.getenv("AWS_S3_BUCKET")
AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID")
@Beomi
Beomi / .vimrc
Last active August 7, 2023 02:08
기본적인 VIM용 세팅
set backspace=2
set autoindent
set smartindent
set expandtab
set et
set tabstop=4
set shiftwidth=4
set showmatch
set ignorecase
set incsearch
@aditya-malte
aditya-malte / smallberta_pretraining.ipynb
Created February 22, 2020 13:41
smallBERTa_Pretraining.ipynb
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@Beomi
Beomi / pandas_multicolum_index_plot.py
Last active August 9, 2019 17:50
Pandas Multi Column 그리기
def plot_multi(data, cols=None, spacing=.1, **kwargs):
from pandas import plotting
# Get default color style from pandas - can be changed to any other color list
if cols is None: cols = data.columns
if len(cols) == 0: return
# colors = getattr(getattr(plotting, '_style'), '_get_standard_colors')(num_colors=len(cols))
# colors = [rand_color.generate() for _ in range(len(cols))]
# colors = rand_color.generate(hue="blue", count=len(cols))
@haven-jeon
haven-jeon / naver_review_classifications_gluon_bert.ipynb
Last active February 25, 2023 08:36
BERT with Naver Sentiment Movie Corpus
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@nat-n
nat-n / Recipe-bundling-fonts-with-headless-chrome.md
Last active February 28, 2024 19:23
How to build a fontconfig bundle for adding arbitrary fonts to headless chrome independent of the OS. This is specifically useful for deploying headless chrome to AWS lambda where it is necessary to include fonts for rendering CJK (Chinese, Japanese, Korean) characters into the deployed bundle.

Building fontconfig

Start up a lambda-like docker container:

docker run -i -t -v /tmp:/var/task lambci/lambda:build /bin/bash

Install some dependencies inside the container:

yum install gperf freetype-devel libxml2-devel git libtool -y

easy_install pip

# Sorry for the scrappiness of this. It's supposed to be scrappy.
import time
from botocore.exceptions import ClientError
import boto3
import boto3.session
# boto3.set_stream_logger('')
session = boto3.session.Session()
s3 = session.client('s3')