curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
git lfs install
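import glob
import json
import multiprocessing
from tqdm import tqdm
from transformers import AutoTokenizer

# Local tokenizer directory or Hugging Face Hub model ID
model_id = "tokenzer_model"
# trust_remote_code=True executes tokenizer code shipped with the model; enable it only for trusted sources
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)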
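The preview stops right after the tokenizer is loaded. Judging by the imports, the script probably goes on to tokenize JSONL chunk files. Here is a minimal sketch of that step. It assumes each line is a JSON object with a `text` field (a hypothetical field name). It also takes the encoder as a plain callable (`iter_texts` and `count_tokens` are hypothetical helpers, not the author's code), so it runs without `transformers`:

```python
import glob
import json
from typing import Callable, Iterable, Iterator, Sequence


def iter_texts(paths: Iterable[str], field: str = "text") -> Iterator[str]:
    """Yield the `field` value of every non-empty JSON line in the given files."""
    for path in paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    yield json.loads(line)[field]


def count_tokens(pattern: str, encode: Callable[[str], Sequence], field: str = "text") -> int:
    """Total number of tokens across all files matching a glob pattern."""
    paths = sorted(glob.glob(pattern))
    return sum(len(encode(text)) for text in iter_texts(paths, field))
```

Usage with the tokenizer above would be `count_tokens("chunk_*.jsonl", tokenizer.encode)`. By default `encode` adds special tokens. To count only content tokens, pass `lambda t: tokenizer.encode(t, add_special_tokens=False)` instead. The original presumably imports `multiprocessing` to parallelize the `encode` calls (for example with `Pool.imap`). This sketch stays sequential for clarity.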
# Given list of filenames
file_names = ['chunk_1.jsonl', 'chunk_2.jsonl']

# Function to convert file names to the required format
def convert_filenames(filenames):
    total_files = len(filenames)
    new_file_names = []
    for i, filename in enumerate(filenames, start=1):
        # Extract the base name without extension and the chunk number
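The preview is cut off before the target naming scheme appears. Here is a minimal sketch that assumes the "required format" is a `<base>-NNNNN-of-NNNNN.jsonl` shard scheme. That scheme is a common convention for sharded datasets, but the source does not confirm it. The helper name `to_sharded_names` is hypothetical. Like the original comment, it extracts the base name and chunk number from each `<base>_<number>.<ext>` file name:

```python
import os
from typing import List


def to_sharded_names(filenames: List[str]) -> List[str]:
    """Map '<base>_<n>.<ext>' names to a hypothetical '<base>-<n>-of-<total>.<ext>' scheme."""
    total_files = len(filenames)
    new_file_names = []
    for filename in filenames:
        # 'chunk_1.jsonl' -> stem 'chunk_1', ext '.jsonl'
        stem, ext = os.path.splitext(filename)
        # Split off the trailing chunk number: 'chunk_1' -> base 'chunk', number '1'
        base, number = stem.rsplit("_", 1)
        new_file_names.append(f"{base}-{int(number):05d}-of-{total_files:05d}{ext}")
    return new_file_names


file_names = ['chunk_1.jsonl', 'chunk_2.jsonl']
print(to_sharded_names(file_names))
# ['chunk-00001-of-00002.jsonl', 'chunk-00002-of-00002.jsonl']
```

Zero-padding the numbers keeps the renamed files in the correct order when sorted lexicographically.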
A lot of GitHub projects need to have pretty math formulas in READMEs, wikis, or other Markdown pages. The desired approach would be to just write inline LaTeX-style formulas like this:
$e^{i \pi} = -1$
Unfortunately, GitHub did not support inline formulas when this was written (native math rendering with `$...$` delimiters arrived in May 2022). The issue was tracked here.
import matplotlib.pyplot as plt
from matplotlib import font_manager
import seaborn as sns

# Path to your font file; this example uses a Japanese font (IPAGothic)
font_path = './font/IPAGothic_24302.ttf'
# Register the font file with Matplotlib's font manager
font_manager.fontManager.addfont(font_path)
prop = font_manager.FontProperties(fname=font_path)
plt.rcParams['font.family'] = 'sans-serif'
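The snippet stops after setting `font.family`. Matplotlib resolves the generic `'sans-serif'` family through the `font.sans-serif` list, so the registered font presumably also needs to go at the front of that list. Here is a minimal sketch of that step, wrapped in a hypothetical `register_font` helper (not part of the original):

```python
import matplotlib
from matplotlib import font_manager


def register_font(font_path: str) -> str:
    """Register a font file with Matplotlib and make it the preferred sans-serif font."""
    # Make the file known to Matplotlib's font manager for this session
    font_manager.fontManager.addfont(font_path)
    # Read the family name embedded in the font file
    family = font_manager.FontProperties(fname=font_path).get_name()
    # 'sans-serif' is looked up via the 'font.sans-serif' list, so put the new family first
    matplotlib.rcParams["font.family"] = "sans-serif"
    matplotlib.rcParams["font.sans-serif"] = [family] + [
        f for f in matplotlib.rcParams["font.sans-serif"] if f != family
    ]
    return family
```

Usage: `family = register_font('./font/IPAGothic_24302.ttf')`. Seaborn's `sns.set_theme()` rewrites the font rcParams. Call it before `register_font`, or pass `font=family` to it, so the custom font is not overridden.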
"""Script for downloading all GLUE data.

Modified by: Sagor Sarker

Dependencies:
    pip install wget
    pip install wasabi
"""
import os
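The task URLs and download loop are not shown in the preview. A GLUE downloader typically fetches one zip archive per task and unpacks it. Here is a minimal sketch of that step. It swaps the `wget` dependency for the standard library's `urllib.request` and `zipfile`, and `download_and_extract` is a hypothetical helper name:

```python
import os
import urllib.parse
import urllib.request
import zipfile
from typing import List


def download_and_extract(url: str, data_dir: str) -> List[str]:
    """Download a zip archive into data_dir, unpack it there, and delete the archive.

    Returns the member names listed in the archive.
    """
    os.makedirs(data_dir, exist_ok=True)
    # Name the local archive after the last path segment of the URL
    archive_name = os.path.basename(urllib.parse.urlparse(url).path) or "download.zip"
    archive_path = os.path.join(data_dir, archive_name)
    # Standard-library download (the original script lists `wget` instead)
    urllib.request.urlretrieve(url, archive_path)
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()
        zf.extractall(data_dir)
    os.remove(archive_path)
    return names
```

A call looks like `download_and_extract("https://example.com/TASK.zip", "glue_data")`. The URL there is a placeholder, not a real GLUE download link.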