@vifly
Last active April 7, 2025 14:09

Build

Install the dependencies (Arch):

sudo pacman -S blas64-openblas blas-openblas

Clone the source code and its submodules:

git clone https://github.com/mozilla/bergamot-translator.git

cd bergamot-translator
git submodule update --init --recursive

Building on the latest version of Arch fails with errors. Apply the fix-build.patch I attached here to fix them.

git apply fix-build.patch

mkdir -p build
cd build
cmake ../ -DUSE_WASM_COMPATIBLE_SOURCES=off -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)
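
The patch step above fails if the patch is already applied, which is easy to hit when re-running the build in the same checkout. Below is a minimal sketch of a re-runnable variant. The function name apply_patch_once is made up for this example. It relies on the --check and --reverse options of git apply, and it has to be run from the repository root, like the plain git apply command.

```shell
# Apply a patch only if it is not applied yet.
# apply_patch_once is a hypothetical helper name, not part of bergamot-translator.
apply_patch_once() {
  patch_file="$1"
  if git apply --reverse --check "$patch_file" 2>/dev/null; then
    # The patch can be reverted cleanly, so it must already be applied.
    echo "already applied: $patch_file"
  elif git apply --check "$patch_file"; then
    # Dry run succeeded, so the real application should succeed too.
    git apply "$patch_file"
    echo "applied: $patch_file"
  else
    echo "patch does not apply cleanly: $patch_file" >&2
    return 1
  fi
}
```

Usage, from the bergamot-translator directory: apply_patch_once fix-build.patch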

Configuration and usage

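This example translates English into Simplified Chinese. For a different language pair, download the corresponding model and adjust the configuration file as described below.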

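Download all the files from https://github.com/mozilla/firefox-translations-models/tree/main/models/prod/enzh, decompress every one of them, and put them in a single folder. The rest of this guide assumes that folder is /data/firefox-translations-models.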
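
The decompression step can be scripted. This is a minimal sketch, assuming the downloaded files are gzip archives whose names end in .gz. Check the actual file names first. The function name decompress_models is made up for this example.

```shell
# Decompress every .gz file in the given directory, in place.
# Assumption: the model files were downloaded as gzip archives (*.gz).
decompress_models() {
  for f in "$1"/*.gz; do
    # If nothing matches, the glob stays a literal string; skip it.
    [ -e "$f" ] || continue
    # -f overwrites an existing decompressed file of the same name.
    gunzip -f "$f"
  done
}
```

Usage: decompress_models /data/firefox-translations-models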

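An example configuration file is available at https://github.com/mozilla/firefox-translations-models/blob/main/evals/translators/bergamot.config.yml. Adapt it to your setup, or simply copy the content below (remember to check the model file paths):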

bergamot-mode: wasm
models:
  - /data/firefox-translations-models/model.enzh.intgemm.alphas.bin
vocabs:
  - /data/firefox-translations-models/srcvocab.enzh.spm
  - /data/firefox-translations-models/trgvocab.enzh.spm
# TODO: enable back when the issues with Chinese shortlist and document level shortlisting on inference are fixed
#shortlist:
#    - /data/firefox-translations-models/lex.50.50.enzh.s2t.bin
#    - false
beam-size: 1
normalize: 1.0
word-penalty: 0
max-length-break: 128
mini-batch-words: 1024
workspace: 128
max-length-factor: 2.0
skip-cost: true
cpu-threads: 0
quiet: false
quiet-translation: false
gemm-precision: int8shiftAlphaAll
alignment: soft
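
Since wrong paths are the most likely mistake in this file, a quick check before running the translator can help. This is a rough sketch, not a YAML parser. It only understands the layout used above (a models: or vocabs: key followed by "- /path" lines) and ignores commented-out lines. The function names are made up for this example.

```shell
# Print the paths listed under "models:" and "vocabs:" in a config file
# laid out like the one above. Line-based, so other YAML styles are not handled.
list_model_paths() {
  awk '
    /^[ \t]*#/           { next }               # skip commented-out lines
    /^(models|vocabs):/  { in_list = 1; next }  # a list we care about starts
    /^[^ \t-]/           { in_list = 0 }        # any other top-level key ends it
    in_list && /^[ \t]*-[ \t]*/ {
      sub(/^[ \t]*-[ \t]*/, "")                 # drop the leading "- "
      print
    }
  ' "$1"
}

# Return non-zero and report on stderr if any listed file is missing.
check_config_paths() {
  missing=$(list_model_paths "$1" | while IFS= read -r p; do
    [ -f "$p" ] || printf '%s\n' "$p"
  done)
  if [ -n "$missing" ]; then
    printf 'missing files:\n%s\n' "$missing" >&2
    return 1
  fi
}
```

Usage: check_config_paths /data/firefox-translations-models/bergamot.config.yml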

Usage example (assuming the configuration file from the previous step is at /data/firefox-translations-models/bergamot.config.yml):

cd bergamot-translator/build/app

echo "hello world" | ./bergamot --model-config-paths /data/firefox-translations-models/bergamot.config.yml
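
For longer input it is more convenient to feed a file than to use echo. This is a minimal sketch, assuming the binary reads all of its input from standard input, as the pipeline above suggests. BERGAMOT_BIN, BERGAMOT_CONFIG and translate_file are names made up for this example. Only the --model-config-paths option comes from the command above.

```shell
# Hypothetical wrapper around the command shown above.
# Both variables can be overridden from the environment.
BERGAMOT_BIN="${BERGAMOT_BIN:-./bergamot}"
BERGAMOT_CONFIG="${BERGAMOT_CONFIG:-/data/firefox-translations-models/bergamot.config.yml}"

# Translate the contents of a text file; the result goes to stdout.
translate_file() {
  if [ ! -r "$1" ]; then
    echo "cannot read input file: $1" >&2
    return 1
  fi
  "$BERGAMOT_BIN" --model-config-paths "$BERGAMOT_CONFIG" < "$1"
}
```

Usage, from bergamot-translator/build/app: translate_file input.txt > output.txt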

fix-build.patch

Submodule 3rd_party/marian-dev contains modified content
diff --git a/3rd_party/marian-dev/src/3rd_party/faiss/Index.h b/3rd_party/marian-dev/src/3rd_party/faiss/Index.h
index deaabcaa..6281ff9a 100644
--- a/3rd_party/marian-dev/src/3rd_party/faiss/Index.h
+++ b/3rd_party/marian-dev/src/3rd_party/faiss/Index.h
@@ -12,6 +12,7 @@
#include "utils/misc.h"
#include <cstdio>
+#include <cstdint>
#include <typeinfo>
#include <string>
#include <sstream>
Submodule src/3rd_party/sentencepiece contains modified content
diff --git a/3rd_party/marian-dev/src/3rd_party/sentencepiece/CMakeLists.txt b/3rd_party/marian-dev/src/3rd_party/sentencepiece/CMakeLists.txt
index 0353f85..3a6f585 100644
--- a/3rd_party/marian-dev/src/3rd_party/sentencepiece/CMakeLists.txt
+++ b/3rd_party/marian-dev/src/3rd_party/sentencepiece/CMakeLists.txt
@@ -12,7 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.!
-cmake_minimum_required(VERSION 3.1 FATAL_ERROR)
+cmake_minimum_required(VERSION 3.5 FATAL_ERROR)
file(STRINGS "VERSION.txt" SPM_VERSION)
message(STATUS "VERSION: ${SPM_VERSION}")
project(sentencepiece VERSION ${SPM_VERSION} LANGUAGES C CXX)
diff --git a/CMakeLists.txt b/CMakeLists.txt
index dc51acf..6724673 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,4 +1,5 @@
cmake_minimum_required(VERSION 3.5.1)
+add_compile_options(-Wno-error=template-id-cdtor)
set(CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake)
if (POLICY CMP0074)