Yunchao He candlewill

  • Beijing, China

A Survey of Deep Learning for Speech Synthesis

This article surveys some recent deep-learning methods for speech synthesis.

WaveNet

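While autoregressive generative models were being widely applied to images and text, WaveNet [4] tried to bring the same ideas to speech. Following the image-generation approach of PixelRNN (van den Oord et al., 2016), WaveNet generates each audio sample conditioned on the samples before it. The model that predicts the next sample is a CNN. To generate the voice of a specified speaker and speech for a specified text, global and local conditioning are introduced to control the synthesized content. To enlarge the receptive field, dilated convolutions are used, so that the span covered by the filters grows exponentially with depth.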
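The effect of exponentially growing dilations on the receptive field can be checked with a short calculation. The sketch below assumes a filter width of 2 and dilations that double at every layer (1, 2, 4, ..., 512). These values are illustrative assumptions, not taken from the text above, and `causal_dilated_conv` is a hypothetical helper, not WaveNet's implementation.

```python
def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of a stack of dilated causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution: y[t] = sum_k weights[k] * x[t - k * dilation].

    Positions before the start of the signal are treated as zeros.
    """
    y = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                total += w * x[idx]
        y.append(total)
    return y


# Assumed configuration: filter width 2, ten layers.
doubling = [2 ** i for i in range(10)]      # 1, 2, 4, ..., 512
exponential_rf = receptive_field(2, doubling)
plain_rf = receptive_field(2, [1] * 10)     # same depth, no dilation
```

Under these assumptions, ten layers with doubling dilations cover 1024 past samples, whereas ten undilated layers cover only 11.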

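WaveNet has several problems: 1) it predicts one sample at a time, which is too slow; 2) when it is used for TTS, the choice of the initial samples becomes very important; 3) it depends on a text front end, so errors in front-end analysis directly degrade the synthesized speech.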
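The first two problems follow from the shape of the generation loop, which can be sketched as below. `toy_predictor` is a hypothetical stand-in for the trained network, and the 16 kHz sampling rate and context size are assumptions chosen for illustration.

```python
def generate(predict_next, seed_samples, num_samples, context_size):
    """Generate samples one at a time; every step depends on earlier outputs."""
    samples = list(seed_samples)
    for _ in range(num_samples):
        context = samples[-context_size:]
        samples.append(predict_next(context))
    return samples[len(seed_samples):]


def toy_predictor(context):
    """Hypothetical stand-in for the network: a damped average of the context."""
    return 0.9 * sum(context) / len(context)


# One second of 16 kHz audio needs 16000 sequential predictor calls,
# and the result depends on the seed samples the loop starts from.
sample_rate = 16000
audio = generate(toy_predictor, seed_samples=[1.0], num_samples=sample_rate, context_size=4)
```

Because each call needs the output of the previous one, the loop cannot be parallelized across time steps, and a different seed yields a different waveform.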

TensorFlow Serving in 10 minutes!

TensorFlow Serving is Google's recommended way to deploy TensorFlow models. Without a proper computer engineering background, it can be quite intimidating, even for people who feel comfortable with TensorFlow itself. A few things that I found particularly hard were:

  • Tutorial examples have C++ code (which I don't know)
  • Tutorials have Kubernetes, gRPC, Bazel (some of which I saw for the first time)
  • It needs to be compiled. That process takes forever!

In the end, it worked just fine. Here I present the easiest possible way to deploy your models with TensorFlow Serving. You will have your self-built model running inside TF-Serving by the end of this tutorial. It will be scalable, and you will be able to query it via REST.
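As a rough idea of what querying a served model over REST can look like, the sketch below builds a predict request. It assumes the REST API built into later TensorFlow Serving releases, reachable on port 8501 as in the official Docker image; the setup in this tutorial may expose a different endpoint. The model name `my_model` and the input values are placeholders.

```python
import json
import urllib.request


def build_predict_request(model_name, instances, host="localhost", port=8501):
    """Build (but do not send) a POST request for a TensorFlow Serving predict call."""
    url = "http://{}:{}/v1/models/{}:predict".format(host, port, model_name)
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


# "my_model" and the input row are placeholders.
request = build_predict_request("my_model", [[1.0, 2.0, 3.0]])

# Sending requires a running server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     predictions = json.loads(response.read())["predictions"]
```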

@candlewill
candlewill / waya-dl-setup.sh
Last active December 20, 2017 08:25 — forked from mjdietzx/waya-dl-setup.sh
Install CUDA Toolkit v8.0 and cuDNN v6.0 on Ubuntu 16.04
#!/bin/bash
# install CUDA Toolkit v8.0
# instructions from https://developer.nvidia.com/cuda-downloads (linux -> x86_64 -> Ubuntu -> 16.04 -> deb (network))
CUDA_REPO_PKG="cuda-repo-ubuntu1604_8.0.61-1_amd64.deb"
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/${CUDA_REPO_PKG}
sudo dpkg -i ${CUDA_REPO_PKG}
sudo apt-get update
sudo apt-get -y install cuda
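After the install finishes, one way to confirm which toolkit release is active is to parse the output of `nvcc --version`. The sketch below is an assumption-laden check, not part of the original script: it assumes `nvcc` is on `PATH` (it is often installed under /usr/local/cuda/bin and may need to be added), and both helper functions are hypothetical.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Return the CUDA release string (e.g. "8.0") from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Query nvcc if it is on PATH; return None when it is not found."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], stdout=subprocess.PIPE, universal_newlines=True
    )
    return parse_cuda_release(result.stdout)


# Example of the line nvcc prints for the toolkit installed above.
sample_line = "Cuda compilation tools, release 8.0, V8.0.61"
sample_release = parse_cuda_release(sample_line)
```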
candlewill / TensorFlow Severing.md
Last active April 9, 2018 06:11
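TensorFlow Serving

TensorFlow Serving

This article explains how to use TensorFlow Serving to deploy a trained model.

Installation

Bazel (optional; only needed when building from source)

# Download the Bazel installer from https://github.com/bazelbuild/bazel/releases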
cd ~/Downloads
chmod +x bazel-0.4.5-installer-linux-x86_64.sh
./bazel-0.4.5-installer-linux-x86_64.sh --user
candlewill / nohup-output-to-file.sh
Created August 29, 2017 13:48 — forked from umidjons/nohup-output-to-file.sh
Redirect nohup output to a file
# redirect output and errors into file output.log:
nohup some_command > output.log 2>&1&
# abbreviated syntax for bash version >= ver.4:
nohup some_command &> output.log &
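A roughly comparable effect can be had from Python with `subprocess.Popen`. This is a sketch under assumptions: `run_detached` is a hypothetical helper, and `start_new_session=True` (POSIX only) detaches the child from the terminal's session rather than ignoring SIGHUP the way `nohup` does.

```python
import subprocess


def run_detached(command, log_path):
    """Start `command` in its own session with stdout and stderr written to `log_path`."""
    log_file = open(log_path, "wb")  # like `>`: truncate any existing log
    try:
        process = subprocess.Popen(
            command,
            stdout=log_file,
            stderr=subprocess.STDOUT,   # same effect as 2>&1
            stdin=subprocess.DEVNULL,
            start_new_session=True,     # run outside the terminal's session
        )
    finally:
        log_file.close()  # the child keeps its own copy of the descriptor
    return process


# Usage (some_command is a placeholder, as in the shell lines above):
# process = run_detached(["some_command"], "output.log")
```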
candlewill / 1_Ossian初探.md
Last active July 12, 2021 05:05
Chinese TTS based on Ossian

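A First Look at Ossian

This article first runs a complete synthesis pipeline by following the official tutorial exactly, and then attempts synthesis in Chinese.

Installation

Although the official documentation provides a one-step installation method, ./scripts/setup_tools.sh $HTK_USERNAME $HTK_PASSWORD, it did not succeed in our attempts.

The debugging process follows.

Running the script directly produced this error: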

candlewill / 多说话人合成论文.md
Created August 14, 2017 10:09
Multi-speaker TTS papers

A Collection of Multi-Speaker Synthesis Papers

Baidu

  • Deep Voice 2
  • Deep Voice 1

Facebook

  • in the wild

Japan

Merlin for Chinese

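Merlin for Chinese speech synthesis. This article uses Merlin to synthesize Chinese speech.

Data preparation

To test whether the approach is feasible, we use only 100 utterances. Once it is confirmed to work, we will use the full dataset.

Because we lack a Chinese front end, we use only phonemes.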
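Picking the small feasibility subset can be as simple as taking the first 100 utterance IDs in sorted order. The sketch below is illustrative only: the `utt_0001`-style IDs and the `select_subset` helper are hypothetical, not names from the dataset or from Merlin.

```python
def select_subset(utterance_ids, limit=100):
    """Return the first `limit` utterance IDs in sorted order, for a quick feasibility run."""
    return sorted(utterance_ids)[:limit]


# Hypothetical IDs standing in for the full dataset.
all_ids = ["utt_{:04d}".format(i) for i in range(1, 1001)]
subset = select_subset(all_ids)
```

Sorting first keeps the subset reproducible between runs, so results from the trial run can be compared later against the full-data run.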

candlewill / extract_features_for_merlin.md
Last active November 2, 2022 08:34
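Analysis of the Merlin source code

Acoustic Feature Extraction

This article describes how to extract acoustic features for Merlin training. In speech synthesis, this is the job of the vocoder.

Merlin can use two vocoders, STRAIGHT and WORLD. WORLD aims to extract 60-dim MGC, variable-dim BAP (BAP dim: 1 for 16 kHz, 5 for 48 kHz), and 1-dim LF0; STRAIGHT aims to extract 60-dim MGC, 25-dim BAP, and 1-dim LF0.

The newer WORLD_v2 is still under development; it aims to extract 60-dim MGC, 5-dim BAP, and 1-dim LF0 (the MGC and BAP dimensions can be adjusted).

Because the use of STRAIGHT is subject to strict license restrictions, this article mainly covers WORLD.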
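The feature dimensions listed above can be collected into a small lookup, which makes the per-frame size of each configuration easy to compare. This is a sketch for illustration: the dictionary layout and function names are assumptions, not Merlin's configuration format, and the totals count static features only (no deltas or voicing flag).

```python
# Per-stream dimensions for the vocoders with a fixed BAP size.
VOCODER_DIMS = {
    "STRAIGHT": {"mgc": 60, "bap": 25, "lf0": 1},
    "WORLD_v2": {"mgc": 60, "bap": 5, "lf0": 1},
}

# WORLD's BAP dimension depends on the sampling rate (Hz).
WORLD_BAP_DIMS = {16000: 1, 48000: 5}


def feature_dims(vocoder, sample_rate=None):
    """Return the per-stream feature dimensions for a vocoder."""
    if vocoder == "WORLD":
        if sample_rate not in WORLD_BAP_DIMS:
            raise ValueError("BAP dimension not listed for {} Hz".format(sample_rate))
        return {"mgc": 60, "bap": WORLD_BAP_DIMS[sample_rate], "lf0": 1}
    return dict(VOCODER_DIMS[vocoder])


def frame_size(vocoder, sample_rate=None):
    """Total number of static features per frame."""
    return sum(feature_dims(vocoder, sample_rate).values())
```

For example, `frame_size("WORLD", 16000)` adds 60 + 1 + 1, while `frame_size("STRAIGHT")` adds 60 + 25 + 1.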

candlewill / keras_models.md
Last active December 9, 2022 02:43
A collection of various Keras model examples

Keras Models Examples

Keras implementations of a set of commonly used models

DNN

Multilayer Perceptron (MLP) for multi-class softmax classification

from keras.models import Sequential