
Yuichi Tateno (secon) hotchpotch

@hotchpotch
hotchpotch / cross_encoder_to_onnx_pr.py
Created May 9, 2025 00:15
cross_encoder_to_onnx_pr.py
from sentence_transformers import CrossEncoder, export_dynamic_quantized_onnx_model, export_optimized_onnx_model
# Define the model name
MODEL_NAME = "hotchpotch/japanese-reranker-xsmall-v2"
# Load the base model (CPU, ONNX backend)
model = CrossEncoder(MODEL_NAME, device="cpu", backend="onnx")
# 1. Base model (model.onnx)
# Push to the Hub, opening a PR if needed
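The import line above pulls in export_dynamic_quantized_onnx_model, which produces an int8 variant of the ONNX model. To show numerically what int8 weight quantization trades off, here is a minimal numpy sketch of symmetric per-tensor quantization. It is not the onnxruntime quantizer that sentence-transformers calls (that one rewrites the ONNX graph, may quantize per channel, and quantizes activations on the fly at inference time); quantize_int8 and dequantize_int8 are hypothetical helpers.

```python
import numpy as np


def quantize_int8(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization, so that w is approximately scale * q."""
    max_abs = float(np.max(np.abs(w)))
    # Map the largest magnitude to 127; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 values and the scale."""
    return q.astype(np.float32) * np.float32(scale)


rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)  # stand-in for a weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

print(q.nbytes / w.nbytes)  # 0.25: int8 storage is a quarter of float32
print(float(np.max(np.abs(w - w_hat))), scale / 2)  # rounding error stays within about scale / 2
```

The 4x size reduction is exact; the accuracy cost depends on how the real quantizer chooses scales, which this sketch does not model.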
hotchpotch / query-crafter-japanese-example.py
Created May 4, 2025 02:22
query-crafter-japanese-example.py
"""
Sample code for query-crafter-japanese.
When processing large volumes in practice, an inference engine such as vllm makes much faster processing possible.
"""
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "hotchpotch/query-crafter-japanese-Qwen3-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
hotchpotch / xlm_roberta_embeddings_convert.rb
Last active March 2, 2025 03:55
Shrink a transformer model's embeddings in a sensible way
from transformers import AutoModel, AutoTokenizer
import torch
from tqdm import tqdm
import numpy as np
def adapt_model_to_new_tokenizer(model_name, new_tokenizer_name):
    # Load the original model and tokenizer
    original_model = AutoModel.from_pretrained(model_name)
    original_tokenizer = AutoTokenizer.from_pretrained(model_name)
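One common way to shrink embeddings for a new, smaller tokenizer (assumed here; not necessarily what adapt_model_to_new_tokenizer does, since the preview is truncated) is to build a fresh embedding matrix row by row: copy the old vector when a token exists in both vocabularies, otherwise average the old vectors of the pieces the original tokenizer splits it into. The sketch uses plain dicts and a toy tokenizer; build_new_embeddings, the toy vocabularies, and the zero fallback are hypothetical.

```python
from typing import Callable

import numpy as np


def build_new_embeddings(
    old_emb: np.ndarray,
    old_vocab: dict[str, int],
    new_vocab: dict[str, int],
    old_tokenize: Callable[[str], list[str]],
) -> np.ndarray:
    """Hypothetical helper: initialise an embedding matrix for new_vocab from old_emb."""
    new_emb = np.zeros((len(new_vocab), old_emb.shape[1]), dtype=old_emb.dtype)
    for token, new_id in new_vocab.items():
        if token in old_vocab:
            # The token exists in both vocabularies: copy its vector as-is.
            new_emb[new_id] = old_emb[old_vocab[token]]
            continue
        # Otherwise average the vectors of the pieces the old tokenizer produces.
        ids = [old_vocab[p] for p in old_tokenize(token) if p in old_vocab]
        if ids:
            new_emb[new_id] = old_emb[ids].mean(axis=0)
        # Tokens with no known pieces are left as zeros (an assumed fallback).
    return new_emb


# Toy data: four old tokens with 2-dimensional embeddings.
old_vocab = {"▁東": 0, "京": 1, "▁大": 2, "阪": 3}
old_emb = np.arange(8, dtype=np.float32).reshape(4, 2)
new_vocab = {"▁東": 0, "▁東京": 1}
toy_splits = {"▁東京": ["▁東", "京"]}  # how the old tokenizer would split new tokens


def old_tokenize(s: str) -> list[str]:
    return toy_splits.get(s, [s])


new_emb = build_new_embeddings(old_emb, old_vocab, new_vocab, old_tokenize)
# Row 0 copies [0, 1]; row 1 averages [0, 1] and [2, 3] into [1, 2].
```

With a real Hugging Face model, a matrix like this could be written into model.get_input_embeddings().weight after calling model.resize_token_embeddings(len(new_tokenizer)); how the gist itself does this is not visible in the preview.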
hotchpotch / spm_train_jp_tokenizer_xlm_roberta.py
Created July 7, 2024 05:13
Train an XLMRobertaTokenizer on Japanese text and run it
# %%
from datasets import load_dataset
dataset = load_dataset("hpprc/jawiki-paragraphs", split="train")
# %%
len(dataset)
# %%
# head N
hotchpotch / onnx_to_fp16.py
Created April 9, 2024 00:40
ONNX model to float16 precision
"""
This script converts an ONNX model to float16 precision using the onnxruntime transformers package.
It takes an input ONNX model file as a mandatory argument. The output file name is optional; if not provided,
the script generates the output file name by appending "_fp16" to the base name of the input file.
"""
import argparse
import onnx
from onnxruntime.transformers.float16 import convert_float_to_float16
import os
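The docstring above describes the command-line behaviour. A minimal sketch of that argument handling, assuming a positional input path, an optional positional output path, and that the file extension is preserved (so model.onnx becomes model_fp16.onnx). Passing keep_io_types=True to convert_float_to_float16 is an assumed choice that keeps graph inputs and outputs in float32; the gist may use different options.

```python
import argparse
import os


def default_output_path(input_path: str) -> str:
    """Append "_fp16" to the base name, keeping the extension."""
    base, ext = os.path.splitext(input_path)
    return f"{base}_fp16{ext}"


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Convert an ONNX model to float16.")
    parser.add_argument("input", help="path to the float32 ONNX model")
    parser.add_argument("output", nargs="?", default=None, help="optional output path")
    args = parser.parse_args(argv)
    if args.output is None:
        args.output = default_output_path(args.input)
    return args


def main(argv=None) -> None:
    # Imported here so the argument handling above works without onnx installed.
    import onnx
    from onnxruntime.transformers.float16 import convert_float_to_float16

    args = parse_args(argv)
    model = onnx.load(args.input)
    model_fp16 = convert_float_to_float16(model, keep_io_types=True)
    onnx.save(model_fp16, args.output)
```

Calling main(["model.onnx"]) would write model_fp16.onnx next to the input.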
hotchpotch / bench_gpu_sift1m_ivf_hnsw.py
Created November 18, 2023 08:44
IVF, HNSW, PQ benchmark
# base: https://github.com/facebookresearch/faiss/blob/main/benchs/bench_gpu_sift1m.py
# base code License: MIT License
#
# Copyright (c) Facebook, Inc. and its affiliates.
#
# This source code is licensed under the MIT license found in the
# LICENSE file in the root directory of this source tree.
import os
import time
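Benchmarks of approximate indexes such as IVF, HNSW, and PQ are usually scored by recall against exact nearest neighbours, in the style of the R@k figures faiss's SIFT1M benchmarks print. A minimal numpy sketch of that metric, assuming random stand-in vectors instead of SIFT1M and brute-force search as the ground truth; exact_knn and recall_at are hypothetical helpers, not faiss APIs.

```python
import numpy as np


def exact_knn(xb: np.ndarray, xq: np.ndarray, k: int) -> np.ndarray:
    """Brute-force k nearest neighbours of each query under squared L2 distance."""
    # ||q - b||^2 = ||q||^2 - 2 q.b + ||b||^2, computed for all pairs at once.
    d = (xq ** 2).sum(axis=1)[:, None] - 2.0 * (xq @ xb.T) + (xb ** 2).sum(axis=1)[None, :]
    return np.argsort(d, axis=1)[:, :k]


def recall_at(I: np.ndarray, gt: np.ndarray, rank: int) -> float:
    """Fraction of queries whose true nearest neighbour appears in the top `rank` results."""
    hits = (I[:, :rank] == gt[:, :1]).sum()
    return float(hits) / I.shape[0]


rng = np.random.default_rng(0)
xb = rng.random((1000, 16), dtype=np.float32)  # stand-in database vectors
xq = rng.random((20, 16), dtype=np.float32)    # stand-in queries
gt = exact_knn(xb, xq, 1)                      # ground-truth nearest neighbour per query

I = exact_knn(xb, xq, 10)          # an "index" that happens to be exact
print(recall_at(I, gt, 1))         # 1.0
print(recall_at(I[:, 1:], gt, 9))  # 0.0: the true neighbour was dropped from every list
```

With faiss, I would be the second return value of index.search(xq, k) on the approximate index.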
#!ruby
require "tmpdir"
require "pathname"
require "base64"
target_nb = ARGF.read
cmd = %w(jupyter nbconvert --to markdown)
png_optimizer_cmd = ["optipng", "-quiet"] # set to nil to skip PNG optimization
hotchpotch / results.md
Last active November 12, 2020 10:23
E-M1 mark III get_commandlist.cgi results

http://192.168.0.10/get_commandlist.cgi

<oishare>
<version>4.40</version>
<oitrackversion>3.10</oitrackversion>
<support func="web"/>
<support func="remote"/>
<support func="gps"/>
<support func="release"/>
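The response above is a small XML document. A minimal sketch of reading it with Python's standard xml.etree.ElementTree; the sample string reproduces only the elements shown here (closed with </oishare> so it is well-formed), and the real response likely contains more. fetch_commandlist is hypothetical and assumes the camera answers a plain HTTP GET while your machine is on its Wi-Fi network.

```python
import urllib.request
import xml.etree.ElementTree as ET

SAMPLE = """<oishare>
<version>4.40</version>
<oitrackversion>3.10</oitrackversion>
<support func="web"/>
<support func="remote"/>
<support func="gps"/>
<support func="release"/>
</oishare>"""


def parse_commandlist(xml_text) -> dict:
    """Extract version strings and the list of supported functions."""
    root = ET.fromstring(xml_text)
    return {
        "version": root.findtext("version"),
        "oitrackversion": root.findtext("oitrackversion"),
        "support": [el.get("func") for el in root.findall("support")],
    }


def fetch_commandlist(url: str = "http://192.168.0.10/get_commandlist.cgi") -> bytes:
    # Hypothetical: plain GET; the camera may require extra headers.
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read()


info = parse_commandlist(SAMPLE)
print(info["support"])  # ['web', 'remote', 'gps', 'release']
```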
hotchpotch / user_profile.ps1
Last active December 21, 2022 06:28
PowerShell Profile
# PS Modules
Import-Module posh-git
Import-Module oh-my-posh
Import-Module ZLocation
Set-Theme robbyrussell # any theme you like
Set-PSReadLineOption -EditMode Emacs
hotchpotch / settings.json
Created August 2, 2020 03:37
windows terminal settings.json
// For documentation on these settings, see: https://aka.ms/terminal-documentation
{
"$schema": "https://aka.ms/terminal-profiles-schema",
"defaultProfile": "{574e775e-4f2a-5b96-ac1e-a2962a402336}",
"copyOnSelect": true,
"copyFormatting": false,
"profiles": {
"defaults": {
"closeOnExit": "always",
"colorScheme": "One Half Dark",