FastPitch
nickovchinnikov/84894395ee9c9387d53cc80e6c4e253d, last active June 10, 2024 08:24
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let $x = (x_1, \\dots, x_n)$ be the sequence of input lexical units, and let $y = (y_1, \\dots, y_t)$ be the sequence of target mel-scale spectrogram frames. The first feed-forward Transformer (FFTr) stack produces the hidden representation $h = \\text{FFTr}(x)$. From $h$, two 1-D convolutional (CNN) predictors estimate the duration and the average pitch of every input unit:\n",
"\n",
"$$\\hat{d} = \\text{DurationPredictor}(h), \\quad \\hat{p} = \\text{PitchPredictor}(h)$$\n",
"\n",
"where $\\hat{d} \\in \\mathbb{N}^n$ and $\\hat{p} \\in \\mathbb{R}^n$. Next, the pitch is projected to match the dimensionality of the hidden representation $h \\in \\mathbb{R}^{n \\times k}$ (where $k$ is the hidden size) and added to $h$. The resulting sum $g$ is discretely up-sampled, with each $g_i$ repeated $d_i$ times, and passed to the output FFTr, which produces the output mel-spectrogram sequence:\n",
"\n",
"$$g = h + \\text{PitchEmbedding}(p)$$\n",
"\n",
"$$\\hat{y} = \\text{FFTr}([\\underbrace{g_1, \\dots, g_1}_{d_1}, \\dots, \\underbrace{g_n, \\dots, g_n}_{d_n}])$$\n",
"\n",
"Ground-truth $p$ and $d$ are used during training, and the predicted $\\hat{p}$ and $\\hat{d}$ are used during inference. The model optimizes the mean-squared error (MSE) between the predicted and ground-truth modalities, with the pitch and duration terms weighted by $\\alpha$ and $\\gamma$:\n",
"\n",
"$$\\mathcal{L} = \\lVert\\hat{y} - y\\rVert_2^2 + \\alpha\\lVert\\hat{p} - p\\rVert_2^2 + \\gamma\\lVert\\hat{d} - d\\rVert_2^2$$"
]
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
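The pipeline described in the notebook cell, from the predictor outputs to the loss, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the FastPitch implementation:

- The FFTr stacks and the 1-D CNN predictors are taken as already run, so their outputs `h`, `p_hat` and `d_hat` are passed in as plain lists.
- `PitchEmbedding` is assumed to be a per-token linear projection with hypothetical weights `w` and bias `b`.
- Predicted durations are assumed to be rounded and clamped to non-negative integers, so they lie in the natural numbers.

All function names here are illustrative.

```python
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[Vector]  # one row per input unit (n rows), hidden size k columns


def pitch_embedding(p: Sequence[float], w: Vector, b: Vector) -> Matrix:
    """Hypothetical PitchEmbedding: map each scalar pitch p_i to a k-dim vector.

    Assumption: a per-token linear map p_i * w + b; the real model may differ.
    """
    return [[p_i * w_j + b_j for w_j, b_j in zip(w, b)] for p_i in p]


def add_pitch(h: Matrix, p: Sequence[float], w: Vector, b: Vector) -> Matrix:
    """g = h + PitchEmbedding(p), computed element-wise row by row."""
    emb = pitch_embedding(p, w, b)
    return [[h_ij + e_ij for h_ij, e_ij in zip(h_i, e_i)] for h_i, e_i in zip(h, emb)]


def round_durations(d_hat: Sequence[float]) -> List[int]:
    """Assumption: turn real-valued predictions into frame counts.

    Python's round() uses round-half-to-even; negatives are clamped to 0.
    """
    return [max(0, round(x)) for x in d_hat]


def regulate_length(g: Matrix, d: Sequence[int]) -> Matrix:
    """Discrete up-sampling: repeat row g_i exactly d_i times, preserving order."""
    if len(g) != len(d):
        raise ValueError("need exactly one duration per input unit")
    # list(g_i) copies each row so repeated frames are not aliased.
    return [list(g_i) for g_i, d_i in zip(g, d) for _ in range(d_i)]


def decoder_input(
    h: Matrix,
    p_hat: Sequence[float],
    d_hat: Sequence[float],
    w: Vector,
    b: Vector,
    p: Optional[Sequence[float]] = None,
    d: Optional[Sequence[int]] = None,
) -> Matrix:
    """Input to the output FFTr: ground-truth p, d if given (training), else predictions."""
    pitch = p if p is not None else p_hat
    durations = d if d is not None else round_durations(d_hat)
    return regulate_length(add_pitch(h, pitch, w, b), durations)


def sq_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared L2 norm ||a - b||_2^2 of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fastpitch_loss(y_hat: Matrix, y: Matrix, p_hat, p, d_hat, d,
                   alpha: float, gamma: float) -> float:
    """L = ||y_hat - y||^2 + alpha * ||p_hat - p||^2 + gamma * ||d_hat - d||^2."""
    if len(y_hat) != len(y):
        raise ValueError("predicted and target mel sequences differ in length")
    mel = sum(sq_l2(f_hat, f) for f_hat, f in zip(y_hat, y))
    return mel + alpha * sq_l2(p_hat, p) + gamma * sq_l2(d_hat, d)


# Usage: n = 2 input units, hidden size k = 2, hypothetical embedding weights.
h = [[0.0, 1.0], [2.0, 3.0]]
w, b = [1.0, 0.5], [0.0, 0.0]
frames = decoder_input(h, p_hat=[1.0, 2.0], d_hat=[1.4, 2.6], w=w, b=b)
print(frames)  # [[1.0, 1.5], [4.0, 4.0], [4.0, 4.0], [4.0, 4.0]]
```

Passing `p` and `d` switches `decoder_input` to the training path, which uses ground-truth pitch and durations. Omitting them gives the inference path. `fastpitch_loss` sums squared errors, matching the squared L2 norms in the formula. Dividing each term by its element count would give a per-element MSE instead.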