Hui Zhang zh794390558

Image-to-Image Translation with Conditional Adversarial Networks

Notes from arXiv:1611.07004v1 [cs.CV] 21 Nov 2016

Minimizing the Euclidean (L2) distance between predicted and ground-truth pixels is a poor way to judge similarity: it is minimized by averaging all plausible outputs, which yields blurry images.
GANs learn a loss function rather than using a hand-designed one.
The learned loss tries to classify whether the output image is real or fake, while a generative model is simultaneously trained to minimize this loss.
Conditional GANs (cGANs) learn a mapping from an observed image x and a random noise vector z to an output y: G: {x, z} → y.
The generator G is trained to produce outputs that cannot be distinguished from "real" images by an adversarially trained discriminator D, which is trained to do as well as possible at detecting the generator's "fakes".
The discriminator D learns to classify between real and synthesized pairs; the generator learns to fool the discriminator.
Unlike an unconditional GAN, both the generator and the discriminator observe the input image x.
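To make the adversarial setup concrete, the two losses can be computed from the discriminator's outputs on a batch. This is a minimal sketch, assuming D returns the probability that an (x, y) pair is real; the function names are hypothetical, and the generator uses the common non-saturating form (maximize log D(x, G(x, z)) instead of minimizing log(1 - D(x, G(x, z)))).

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0) when D saturates


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Mean of -[log D(x, y) + log(1 - D(x, G(x, z)))] over a batch.

    d_real: D's probabilities for real pairs (x, y).
    d_fake: D's probabilities for synthesized pairs (x, G(x, z)).
    """
    terms = [
        -(math.log(r + EPS) + math.log(1.0 - f + EPS))
        for r, f in zip(d_real, d_fake)
    ]
    return sum(terms) / len(terms)


def generator_loss(d_fake: Sequence[float]) -> float:
    """Non-saturating generator loss: mean of -log D(x, G(x, z))."""
    return sum(-math.log(f + EPS) for f in d_fake) / len(d_fake)


# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
print(round(discriminator_loss([0.5], [0.5]), 4))  # 1.3863 (= 2 ln 2)
print(round(generator_loss([0.5]), 4))             # 0.6931 (= ln 2)
```

D lowers its loss by pushing d_real toward 1 and d_fake toward 0, while G lowers its loss by pushing d_fake toward 1; that tug of war is the learned loss described above.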

Creating Input Frames from a Video

First create an image sequence from a video with:

	ffmpeg -i path/to/video.mp4 -r 30 path/to/output/folder/%06d.png

Here, -r sets the output frame rate, i.e. how many images are saved per second of video (in Hz, e.g. 30 means 30 fps), and %06d.png names each frame with a six-digit, zero-padded number (000001.png, 000002.png, ...).
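The same command can be driven from Python. This is a minimal sketch, assuming ffmpeg is on PATH; the helper names `ffmpeg_frames_cmd` and `extract_frames` are hypothetical. It also creates the output folder first, in case it does not exist yet.

```python
import subprocess
from pathlib import Path


def ffmpeg_frames_cmd(video: str, out_dir: str, fps: int = 30) -> list:
    """Build the argv for: ffmpeg -i VIDEO -r FPS OUT_DIR/%06d.png"""
    pattern = str(Path(out_dir) / "%06d.png")  # ffmpeg fills %06d with the frame number
    return ["ffmpeg", "-i", video, "-r", str(fps), pattern]


def extract_frames(video: str, out_dir: str, fps: int = 30) -> None:
    """Create out_dir if needed, then run ffmpeg (raises if ffmpeg fails)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(ffmpeg_frames_cmd(video, out_dir, fps), check=True)


# %06d zero-pads to six digits, exactly like Python's % formatting:
print("%06d.png" % 1)    # 000001.png
print("%06d.png" % 123)  # 000123.png
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.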

Next, the images must be cropped and scaled. In my case, I need to generate new images from 512x512 inputs, so I crop the largest possible (720x720) square from the center of each 1280x720 frame and then scale it to 512x512. mogrify, unlike convert, edits images in place without creating copies :)
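For reference, the center-crop window can be computed directly. This sketch uses Pillow instead of ImageMagick (a substitution, not the original mogrify command), and the function names are hypothetical. Like mogrify, `crop_and_scale` overwrites each file in place.

```python
def center_square_box(width: int, height: int) -> tuple:
    """(left, top, right, bottom) of the largest centered square in a width x height frame."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


def crop_and_scale(path: str, size: int = 512) -> None:
    """Center-crop the image at `path` to a square, resize to size x size, overwrite it."""
    from PIL import Image  # imported lazily so the box math above runs without Pillow

    with Image.open(path) as img:
        square = img.crop(center_square_box(*img.size)).resize((size, size), Image.LANCZOS)
    square.save(path)  # written after the source file is closed


print(center_square_box(1280, 720))  # (280, 0, 1000, 720)
```

For a 1280x720 frame the 720x720 square starts 280 px from the left edge, leaving 280 px trimmed on each side.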

Acoustic Feature Extraction

This post describes how to extract acoustic features for training Merlin. In speech synthesis, this is the job of the vocoder.

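Merlin can use either of two vocoders, STRAIGHT or WORLD. WORLD is set up to extract 60-dim MGC (mel-generalized cepstrum), variable-dim BAP (band aperiodicity; 1 dim at 16 kHz, 5 dims at 48 kHz) and 1-dim LF0 (log F0); STRAIGHT is set up to extract 60-dim MGC, 25-dim BAP and 1-dim LF0.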
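The "variable-dim BAP" comes from WORLD coding aperiodicity in fixed-width frequency bands, so the number of bands grows with the sampling rate. The sketch below mirrors my reading of WORLD's band-count rule (one band per 3000 Hz, capped at 15000 Hz); treat both constants as assumptions, and `world_bap_dim` as a hypothetical helper.

```python
def world_bap_dim(fs: int) -> int:
    """Number of band-aperiodicity coefficients for sampling rate fs (Hz).

    Assumed rule: floor(min(15000, fs / 2 - 3000) / 3000).
    """
    upper_limit = 15000.0  # Hz, assumed cap on the highest coded band
    interval = 3000.0      # Hz, assumed band spacing
    return int(min(upper_limit, fs / 2.0 - interval) // interval)


for fs in (16000, 22050, 44100, 48000):
    print(fs, world_bap_dim(fs))
# 16000 1
# 22050 2
# 44100 5
# 48000 5
```

This reproduces the 1-dim (16 kHz) and 5-dim (48 kHz) BAP figures quoted above.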

A new version, WORLD_v2, is still under development; it aims to extract 60-dim MGC, 5-dim BAP and 1-dim LF0 (the MGC and BAP dimensions can be adjusted).

Because STRAIGHT is subject to strict licensing restrictions, this post focuses on WORLD.

Merlin for Chinese

Merlin for Mandarin Chinese speech synthesis: this post uses Merlin to synthesize Chinese speech.

Data Preparation

To check whether the approach works, we use only 100 utterances; once it is confirmed to work, we will switch to the full dataset.

Because we lack a Chinese text-processing front end, we use phonemes only.
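Picking the 100-utterance trial subset reproducibly can be sketched as follows. This assumes the training recipe reads a plain text list of utterance IDs, one per line; the helper names and the list-file layout are hypothetical, not Merlin's documented API.

```python
import random
from pathlib import Path
from typing import Iterable, List, Optional


def select_subset(utt_ids: Iterable[str], n: int = 100, seed: Optional[int] = None) -> List[str]:
    """Pick n utterance IDs for a quick feasibility run.

    seed=None keeps the first n IDs in sorted order (fully deterministic);
    an integer seed draws a reproducible random sample instead.
    """
    ids = sorted(utt_ids)
    if seed is not None:
        ids = sorted(random.Random(seed).sample(ids, min(n, len(ids))))
    return ids[:n]


def write_id_list(ids: List[str], out_path: str) -> None:
    """Write one utterance ID per line (hypothetical list-file layout)."""
    Path(out_path).write_text("".join(i + "\n" for i in ids), encoding="utf-8")


all_ids = [f"utt{i:04d}" for i in range(300)]  # hypothetical IDs
print(select_subset(all_ids)[:3])  # ['utt0000', 'utt0001', 'utt0002']
```

A random sample tends to cover more speakers and sentence types than the first 100 files, while the fixed seed keeps the trial run repeatable.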

	BasedOnStyle: WebKit
	BreakBeforeBraces: Allman
	BreakConstructorInitializersBeforeComma: false
	ConstructorInitializerAllOnOneLineOrOnePerLine: true
	Cpp11BracedListStyle: true
	IndentCaseLabels: true
	MaxEmptyLinesToKeep: 2
	PointerBindsToType: false
	SpacesBeforeTrailingComments: 2
	Standard: Cpp11

	#coding=utf8
	import os

	import itchat
	from NetEaseMusicApi import interact_select_song
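	# Install the third-party packages with: pip install itchat NetEaseMusicApi

	# HELP_MSG reads: "Welcome to NetEase Cloud Music on WeChat" / "帮助 (help): show help"
	HELP_MSG = u'''\
	欢迎使用微信网易云音乐
	帮助：显示帮助
	'''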

	import datetime as dt
	import threading

	import tensorflow as tf
	import tensorflow.contrib.slim as slim
	from tensorflow.contrib.slim.nets import resnet_v1

	from PoseDataset import PoseDataset
	from TrainParams import TrainParams

	// A simple quickref for Eigen. Add anything that's missing.
	// Main author: Keir Mierle

	#include <Eigen/Dense>
	using namespace Eigen;  // the declarations below use unqualified Eigen names

	Matrix<double, 3, 3> A; // Fixed rows and cols. Same as Matrix3d.
	Matrix<double, 3, Dynamic> B; // Fixed rows, dynamic cols.
	Matrix<double, Dynamic, Dynamic> C; // Full dynamic. Same as MatrixXd.
	Matrix<double, 3, 3, RowMajor> E; // Row major; default is column-major.
	Matrix3f P, Q, R; // 3x3 float matrix.

	development:
	  adapter: mysql2
	  encoding: utf8
	  database: my_database
	  username: root
	  password:
	apt:
	  - somepackage
	  - anotherpackage

	# based on https://github.com/google/seq2seq/blob/master/bin/tools/generate_beam_viz.py

	# Extracts probabilities and sequences from the .npz file generated during beam search,
	# and pickles a list of length n_samples whose entries hold the beam_width most probable
	# (path, logprob, prob) tuples, where the probs are normalized to sum to 1.

	import pickle

	import networkx as nx
	import numpy as np
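The selection step described in the header comment can be sketched independently of the .npz layout, which is not shown here: given candidate (path, logprob) pairs for one sample, keep the beam_width best and renormalize their probabilities. The input format and the name `top_beams` are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def top_beams(candidates: Sequence[Tuple[Sequence[str], float]], beam_width: int) -> List[tuple]:
    """Keep the beam_width highest-logprob (path, logprob) candidates.

    Returns (path, logprob, prob) tuples, where prob is renormalized so the
    kept probabilities sum to 1.
    """
    best = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    if not best:
        return []
    top = max(lp for _, lp in best)  # subtract the max for numerical stability
    weights = [math.exp(lp - top) for _, lp in best]
    total = sum(weights)
    return [(path, lp, w / total) for (path, lp), w in zip(best, weights)]


cands = [(["a"], math.log(0.5)), (["b"], math.log(0.3)), (["c"], math.log(0.2))]
for path, lp, p in top_beams(cands, beam_width=2):
    print(path, round(p, 3))
# ['a'] 0.625
# ['b'] 0.375
```

Working in log space and subtracting the maximum before exponentiating keeps long sequences, whose raw probabilities underflow, from collapsing to zero.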