Voice: serena Device: Apple Silicon (MLX) Date: 2025-02-28
"""
Benchmark TriAttention on MATH 500 — matching the paper's evaluation protocol.
Paper settings: max_tokens=32768, temp=0.6, top_p=0.95, budget=512/1024/2048
We use max_tokens=4096 for practical runtime on Apple Silicon.

USAGE
    python bench_triattention_math.py \
        --model /tmp/gemma-4-26b-a4b-it-5bit \
        --calib /tmp/gemma4_26b_5bit_calib.safetensors \
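The paper settings quoted in the docstring can be held in a small config object so the local override stays explicit. A minimal sketch; the class and method names are assumptions, only the numeric values come from the docstring above:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvalConfig:
    # Values from the paper's evaluation protocol (see docstring above).
    max_tokens: int = 32768
    temp: float = 0.6
    top_p: float = 0.95
    budget: int = 1024  # one of 512 / 1024 / 2048

    def for_local_run(self) -> "EvalConfig":
        # The benchmark caps generation at 4096 tokens for practical
        # runtime on Apple Silicon, keeping sampling settings unchanged.
        return replace(self, max_tokens=4096)
```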
"""
Benchmark TurboQuant (TBQ) vs baseline on MM-NIAH (Multimodal Needle-in-a-Haystack).

INSTALL
    pip install -U mlx-vlm
    # or
    uv pip install -U mlx-vlm

SETUP — Extract images (one-time)
    huggingface-cli download OpenGVLab/MM-NIAH mm_niah_val/images.tar.gz --repo-type dataset
"""Benchmark TurboQuant vs baseline on LongBench-v2.

Usage:
    python scripts/bench_longbench_v2.py --model google/gemma-4-e4b-it --num-samples 10 --max-tokens-ctx 260000
    python scripts/bench_longbench_v2.py --model google/gemma-4-26b-a4b-it --num-samples 5 --max-tokens-ctx 128000 --kv-bits 4
"""
import argparse
import importlib
import time
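The usage lines imply the script's CLI surface. A sketch of the corresponding `argparse` wiring, where the flags are taken from the examples above but the defaults and help strings are assumptions:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Flags inferred from the usage examples; defaults are assumed.
    p = argparse.ArgumentParser(
        description="Benchmark TurboQuant vs baseline on LongBench-v2"
    )
    p.add_argument("--model", required=True,
                   help="Model id, e.g. google/gemma-4-e4b-it")
    p.add_argument("--num-samples", type=int, default=10,
                   help="Number of LongBench-v2 samples to evaluate")
    p.add_argument("--max-tokens-ctx", type=int, default=128000,
                   help="Truncate each sample's context to this many tokens")
    p.add_argument("--kv-bits", type=int, default=None,
                   help="KV-cache quantization bit-width; omit for baseline")
    return p
```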
#!/usr/bin/env python3
"""
Benchmark for Qwen3-TTS: measures TTFB, inter-chunk latency, and throughput.

Usage:
    # Sequential only (short/medium/long)
    python qwen3_tts_benchmark.py --model mlx-community/Qwen3-TTS-12Hz-0.6B-Base-bf16

    # Sequential + batched (1,2,3,4,8)
    python qwen3_tts_benchmark.py --batch-size 1 2 3 4 8
#!/usr/bin/env python3
"""
Benchmark for Qwen3-TTS: measures TTFB, inter-chunk latency, and throughput.

Usage:
    python benchmarks/qwen3_tts_benchmark.py
    python benchmarks/qwen3_tts_benchmark.py --model mlx-community/Qwen3-TTS-0.6B-bf16
    python benchmarks/qwen3_tts_benchmark.py --num-trials 3 --streaming-interval 1.0
    python benchmarks/qwen3_tts_benchmark.py --prompts short medium long
"""
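The three metrics in the docstring can all be read off one timestamped pass over the model's streaming output. A model-agnostic sketch, where `chunks` stands in for the TTS streaming generator and the function name is an assumption:

```python
import time


def measure_streaming(chunks):
    """Measure TTFB, inter-chunk latency, and throughput over any iterator.

    `chunks` stands in for the model's streaming audio generator; this is
    a sketch of the measurement pattern, not the benchmark's actual code.
    """
    start = time.perf_counter()
    ttfb = None          # time from request to first chunk
    gaps = []            # latency between consecutive chunks
    last = start
    n = 0
    for _ in chunks:
        now = time.perf_counter()
        if ttfb is None:
            ttfb = now - start
        else:
            gaps.append(now - last)
        last = now
        n += 1
    total = last - start
    return {
        "ttfb_s": ttfb,
        "mean_gap_s": sum(gaps) / len(gaps) if gaps else 0.0,
        "chunks_per_s": n / total if total > 0 else float("inf"),
    }
```

In a real run the throughput would be reported in audio-seconds per wall-clock second rather than chunks, but the timing structure is the same.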
//
//  BaseConfiguration.swift
//  mlx-test
//
//  Created by Prince Canuma on 29/12/25.
//

import Foundation
import MLX
mlx_audio.tts.generate \
    --model mlx-community/chatterbox-turbo-fp16 \
    --text 'Abstract
The dominant sequence transduction models are based on complex recurrent or
convolutional neural networks that include an encoder and a decoder. The best
performing models also connect the encoder and decoder through an attention
mechanism. We propose a new simple network architecture, the Transformer,
based solely on attention mechanisms, dispensing with recurrence and convolutions
entirely. Experiments on two machine translation tasks show these models to
be superior in quality while being more parallelizable and requiring significantly
#!/usr/bin/env python3
# Copyright (c) 2025 Resemble AI
# MIT License
# Weight conversion script: PyTorch -> MLX
"""
Converts Chatterbox Turbo weights from PyTorch to MLX format.

Usage:
    python convert_weights.py --output model.safetensors
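The core of any PyTorch-to-MLX conversion is renaming parameter keys to the target naming scheme (real converters also transpose convolution weights and cast dtypes). A framework-free sketch of the key-remapping step; the rename rules here are illustrative assumptions, not Chatterbox's actual mapping:

```python
def remap_keys(state_dict: dict) -> dict:
    """Rename PyTorch-style parameter keys to MLX-style ones.

    The substitution table below is a made-up example of the kind of
    naming differences such scripts handle; the real Chatterbox Turbo
    mapping is defined in the conversion script itself.
    """
    renames = {
        ".gamma": ".weight",  # hypothetical LayerNorm naming difference
        ".beta": ".bias",
    }
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in renames.items():
            new_key = new_key.replace(old, new)
        out[new_key] = value
    return out
```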
import json
from functools import partial
from json import JSONDecodeError
from typing import List

from transformers import AutoTokenizer
import tokenizers

REPLACEMENT_CHAR = "\ufffd"
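`REPLACEMENT_CHAR` (U+FFFD) is commonly used in streaming detokenizers to detect that a decode step has split a multi-byte UTF-8 sequence. A minimal, tokenizer-independent sketch of that pattern; the function name and byte-level framing are assumptions:

```python
REPLACEMENT_CHAR = "\ufffd"


def stream_decode(byte_chunks):
    """Yield text incrementally, holding back bytes that decode to U+FFFD.

    A trailing replacement character usually means the current chunk
    ends mid-way through a multi-byte UTF-8 sequence, so those bytes
    are buffered until the next chunk completes them. (A sketch: input
    that legitimately ends in U+FFFD would stay buffered.)
    """
    buf = b""
    for chunk in byte_chunks:
        buf += chunk
        text = buf.decode("utf-8", errors="replace")
        if text.endswith(REPLACEMENT_CHAR):
            # Incomplete sequence at the tail: emit nothing yet.
            continue
        yield text
        buf = b""
```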