@laubonghaudoi
Last active November 20, 2024 01:49
How to Fine-tune GPT-SoVITS to Build a Zoeng Jyut Gaai (張悦楷) Cantonese TTS

https://github.com/RVC-Boss/GPT-SoVITS

1 Set up the environment

  1. Create a new pod at https://www.runpod.io/console/pods . I used an RTX 4000 Ada with 20 GB of VRAM. Note that the disk volume should be at least 80 GB, and the PyTorch 2.1 template works fine.

  2. After connecting to the pod, install Miniconda following https://docs.anaconda.com/miniconda/install/#quick-command-line-install

    mkdir -p ~/miniconda3
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
    bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
    rm ~/miniconda3/miniconda.sh
    ~/miniconda3/bin/conda init bash
    ~/miniconda3/bin/conda init zsh
  3. Then open a new terminal and run the commands below to set up the environment:

    apt update
    apt install git-lfs
    
    # Clone the repo (the Cantonese fork, cloned into GPT-SoVITS so that
    # the paths used later in this guide line up)
    # git clone https://github.com/RVC-Boss/GPT-SoVITS
    git clone https://github.com/hon9kon9ize/GPT-SoVITS-Cantonese GPT-SoVITS
    cd GPT-SoVITS
    git lfs install
    git lfs pull
    
    # Install the dependencies
    conda create -n GPTSoVits python=3.9
    conda activate GPTSoVits
    bash install.sh
    

Then move on to the next step and prepare the pretrained models.
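Before creating the conda environment, it can be worth verifying that the clone and the LFS pull actually completed, since a half-pulled LFS checkout tends to fail much later in confusing ways. A minimal sketch, assuming the clone location used above; `REPO_DIR` is a hypothetical override variable:

```shell
# Sanity check (optional): verify the checkout before creating the conda env.
cd "${REPO_DIR:-/workspace/GPT-SoVITS}"

# install.sh must exist for the install step below to work.
[ -f install.sh ] && echo "install.sh found" || echo "install.sh MISSING"

# After `git lfs pull`, no tracked file should still be an LFS pointer stub;
# pointer files start with this header line.
if grep -rl --exclude-dir=.git "^version https://git-lfs" . >/dev/null 2>&1; then
  echo "WARNING: some LFS files were not pulled"
else
  echo "LFS files look pulled"
fi
```

If anything is reported missing, re-run `git lfs install && git lfs pull` before continuing.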

2 Download the pretrained models

At this point you will have a GPT-SoVITS/GPT_SoVITS/pretrained_models directory, but it is empty. You can delete it first, then use the commands below to download the pretrained models from https://huggingface.co/lj1995/GPT-SoVITS into it:

cd GPT_SoVITS/pretrained_models
wget https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/s1bert25hz-2kh-longer-epoch%3D68e-step%3D50232.ckpt
wget https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/s2D488k.pth
wget https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/s2G488k.pth

mkdir chinese-hubert-base
mkdir chinese-roberta-wwm-ext-large
mkdir gsv-v2final-pretrained

cd chinese-hubert-base
wget https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/chinese-hubert-base/config.json
wget https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/chinese-hubert-base/preprocessor_config.json
wget https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/chinese-hubert-base/pytorch_model.bin

cd ../chinese-roberta-wwm-ext-large
# Original Chinese RoBERTa weights, replaced here by the Cantonese BERT below:
# wget https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/chinese-roberta-wwm-ext-large/config.json
# wget https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/chinese-roberta-wwm-ext-large/pytorch_model.bin
# wget https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/chinese-roberta-wwm-ext-large/tokenizer.json
wget https://huggingface.co/hon9kon9ize/bert-large-cantonese/resolve/main/config.json
wget https://huggingface.co/hon9kon9ize/bert-large-cantonese/resolve/main/pytorch_model.bin
wget https://huggingface.co/hon9kon9ize/bert-large-cantonese/resolve/main/tokenizer.json


cd ../gsv-v2final-pretrained
wget https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/gsv-v2final-pretrained/s1bert25hz-5kh-longer-epoch%3D12-step%3D369668.ckpt
wget https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/gsv-v2final-pretrained/s2D2333k.pth
wget https://huggingface.co/lj1995/GPT-SoVITS/resolve/main/gsv-v2final-pretrained/s2G2333k.pth
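With this many wget calls, one failed download is easy to miss in the scrollback. The loop below checks that each expected file exists and is non-empty. It assumes wget decoded the `%3D` in the URLs to `=` when naming the .ckpt files (check with `ls` and adjust if yours kept the encoding); `PRETRAINED` is a hypothetical override for the directory populated above.

```shell
# Check that every expected pretrained file exists and is non-empty.
cd "${PRETRAINED:-/workspace/GPT-SoVITS/GPT_SoVITS/pretrained_models}"
for f in \
  "s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt" \
  s2D488k.pth \
  s2G488k.pth \
  chinese-hubert-base/pytorch_model.bin \
  chinese-roberta-wwm-ext-large/pytorch_model.bin \
  "gsv-v2final-pretrained/s1bert25hz-5kh-longer-epoch=12-step=369668.ckpt" \
  gsv-v2final-pretrained/s2D2333k.pth \
  gsv-v2final-pretrained/s2G2333k.pth
do
  [ -s "$f" ] && echo "ok       $f" || echo "MISSING  $f"
done
```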

Next, open webui.py and GPT_SoVITS/inference_webui.py and change the launch() call near the bottom of each to pass share=True, so that you can operate the web UI directly from your own browser later. Then move on to the next step and prepare the training data.
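The exact launch() call differs between versions of the repo, so it is safer to locate it first than to patch a hard-coded line number:

```shell
# Find the launch() calls that need share=True; their position varies by version.
cd /workspace/GPT-SoVITS
grep -n "launch(" webui.py GPT_SoVITS/inference_webui.py
# Then edit each file and add share=True inside the call, e.g.
#   ...launch(share=True, ...)
```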

3 Prepare the training data

We will use the Zoeng Jyut Gaai (張悦楷) speech dataset.

  1. First run the commands below to pull the dataset:

    cd /workspace
    
    # Clone the dataset
    git clone --filter=blob:none --sparse https://huggingface.co/datasets/laubonghaudoi/zoengjyutgaai_saamgwokjinji
    cd zoengjyutgaai_saamgwokjinji
    git sparse-checkout init --cone
    git sparse-checkout set wav
    git checkout
    # Move the wavs over
    mv wav/ ../GPT-SoVITS
  2. Then go back to GPT-SoVITS and rename the wav/ folder to a dataset name, e.g. zoengjyutgaai/

  3. Then convert the metadata.csv inside it into a metadata.list file in the following format:

    /workspace/GPT-SoVITS/zoengjyutgaai/001/001_001.wav|zoengjyutgaai|yue|各位朋友,喺講《三國演義》之前啊,我唸一首詞畀大家聽下吓。
    /workspace/GPT-SoVITS/zoengjyutgaai/001/001_002.wav|zoengjyutgaai|yue|滾滾長江東逝水,浪花淘盡英雄。
    ...
    

    Note that there is no header row; the data starts on the very first line.

The data is now ready, and you can start training in the next step.
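Step 3 above can also be scripted. The sketch below assumes metadata.csv has exactly two ASCII-comma-separated columns, `file_name` and `transcript` (the full-width commas , inside Cantonese transcripts do not clash with the delimiter); check your actual header and adjust the field numbers before running. `DATASET` and `ROOT` are hypothetical variables defaulting to the names used in this guide.

```shell
# Hypothetical conversion script: metadata.csv -> metadata.list.
# Assumes columns "file_name,transcript"; adjust $1/$2 if yours differ.
DATASET="${DATASET:-zoengjyutgaai}"
ROOT="${ROOT:-/workspace/GPT-SoVITS}"
awk -F',' -v root="$ROOT" -v name="$DATASET" 'NR > 1 {
  # absolute wav path | speaker | language | transcript
  print root "/" name "/" $1 "|" name "|yue|" $2
}' "$DATASET/metadata.csv" > "$DATASET/metadata.list"
head -n 2 "$DATASET/metadata.list"
```

The `NR > 1` skips the CSV header row, so the generated metadata.list starts directly with data, as required.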

4 Start training

cd /workspace/GPT-SoVITS
python3 webui.py

Then open the gradio page and go straight to the 1-GPT-SOVITS-TTS tab; you will see three sub-tabs underneath.

  1. First enter an Experiment/model name such as exp1, then put zoengjyutgaai/metadata.list in the Text labelling file field, and leave the Audio dataset folder field empty.
  2. Then press Start one-click formatting at the bottom and it will start preprocessing the data. Note that the second stage takes quite a while, so be patient. You can also run the three stages separately with their individual buttons.
  3. Next go to the 1B-Fine-tuned training tab, pick the batch size, epoch, text model learning rate weighting and save frequency parameters, then press the button to start training SoVITS. For reference, with the RTX 4000 Ada's 20 GB of VRAM I could go up to batch size=16. Epochs generally should not exceed 20.
  4. After SoVITS finishes training, pick the parameters below it and train the GPT in the same way.