Hui Zhang zh794390558

  • Baidu
  • Beijing
@zh794390558
zh794390558 / .clang-format
Created August 23, 2016 11:00 — forked from kristopherjohnson/.clang-format
Script that runs clang-format on files in a set of directories
BasedOnStyle: WebKit
BreakBeforeBraces: Allman
BreakConstructorInitializersBeforeComma: false
ConstructorInitializerAllOnOneLineOrOnePerLine: true
Cpp11BracedListStyle: true
IndentCaseLabels: true
MaxEmptyLinesToKeep: 2
PointerBindsToType: false
SpacesBeforeTrailingComments: 2
Standard: Cpp11
@littlecodersh
littlecodersh / PCMusicViaWechat.py
Created September 28, 2016 02:12
Demo of controlling music player through wechat.
#coding=utf8
import os
import itchat
from NetEaseMusicApi import interact_select_song
# Install the third-party packages with: pip install itchat NetEaseMusicApi
HELP_MSG = u'''\
欢迎使用微信网易云音乐
帮助: 显示帮助
@eldar
eldar / tf-resnet-fcn.py
Last active September 11, 2017 06:20
import datetime as dt
import tensorflow as tf
import tensorflow.contrib.slim as slim
from tensorflow.contrib.slim.nets import resnet_v1
import threading
from PoseDataset import PoseDataset
from TrainParams import TrainParams
@brannondorsey
brannondorsey / pix2pix_paper_notes.md
Last active January 3, 2022 09:57
Notes on the Pix2Pix (pixel-level image-to-image translation) arXiv paper

Image-to-Image Translation with Conditional Adversarial Networks

Notes from arXiv:1611.07004v1 [cs.CV] 21 Nov 2016

  • Euclidean distance between predicted and ground truth pixels is not a good method of judging similarity because it yields blurry images.
  • GANs learn a loss function rather than using an existing one.
  • GANs learn a loss that tries to classify if the output image is real or fake, while simultaneously training a generative model to minimize this loss.
  • Conditional GANs (cGANs) learn a mapping from observed image x and random noise vector z to y: y = f(x, z)
  • The generator G is trained to produce outputs that cannot be distinguished from "real" images by an adversarially trained discriminator, D, which is trained to do as well as possible at detecting the generator's "fakes".
  • The discriminator D learns to classify between real and synthesized pairs. The generator learns to fool the discriminator.
  • Unlike an unconditional GAN, both th
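
For reference, the adversarial part of the objective described in these notes can be written as below. This is a sketch in the paper's notation (x is the observed image, z the noise vector, y the target image); it shows only the conditional GAN term and omits any additional reconstruction term.

```latex
\mathcal{L}_{cGAN}(G, D) =
  \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big]

G^{*} = \arg\min_{G}\,\max_{D}\; \mathcal{L}_{cGAN}(G, D)
```

D maximizes the expression by scoring real pairs (x, y) high and synthesized pairs (x, G(x, z)) low, while G minimizes it by making its outputs harder to tell apart from real ones.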

Creating Input Frames from a Video

First create an image sequence from a video with:

ffmpeg -i path/to/video.mp4 -r 30 path/to/output/folder/%06d.png

Here -r sets the rate at which frames are saved (in Hz, so 30 means 30 fps), and %06d.png names each frame with its sequence number zero-padded to six digits (000001.png, 000002.png, ...).
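
As a rough illustration, the same command can be assembled from Python, which also makes the filename pattern easy to check. This is only a sketch: the helper names `ffmpeg_frame_command` and `frame_filename` are hypothetical, and nothing here runs ffmpeg itself.

```python
import shlex


def ffmpeg_frame_command(video_path, output_dir, fps=30, digits=6):
    """Build the ffmpeg argument list that saves `fps` frames per second."""
    # %0Nd is a printf-style placeholder: the frame number, zero-padded to N digits.
    pattern = f"{output_dir}/%0{digits}d.png"
    return ["ffmpeg", "-i", video_path, "-r", str(fps), pattern]


def frame_filename(index, digits=6):
    """Expand the same printf-style pattern for a single frame number."""
    return f"%0{digits}d.png" % index


cmd = ffmpeg_frame_command("path/to/video.mp4", "path/to/output/folder")
print(shlex.join(cmd))      # the command line as it would be typed in a shell
print(frame_filename(1))    # 000001.png
print(frame_filename(1234)) # 001234.png
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming ffmpeg is installed and on the PATH.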

Next, the images must be cropped and scaled. In my case I need 512x512 input images, so I crop the largest possible square (720x720) out of the exact center of each 1280x720 frame and then scale it to 512x512. mogrify, unlike convert, edits images in place without creating copies :)
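
The crop arithmetic can be sketched as follows. The helper names are hypothetical, and the mogrify invocation is one plausible way to express the crop and resize with ImageMagick's `-crop`, `+repage`, and `-resize` options, not necessarily the command used originally.

```python
def center_square_crop(width, height):
    """Return (side, x_offset, y_offset) of the largest centered square."""
    side = min(width, height)
    return side, (width - side) // 2, (height - side) // 2


def mogrify_args(width, height, files, target=512):
    """Build a mogrify argument list: center-crop to a square, then resize."""
    side, x, y = center_square_crop(width, height)
    return [
        "mogrify",
        "-crop", f"{side}x{side}+{x}+{y}",  # WxH+X+Y geometry of the square
        "+repage",                          # reset the canvas after cropping
        "-resize", f"{target}x{target}",
        *files,
    ]


# A 1280x720 frame gives a 720x720 square starting 280 px from the left edge.
print(center_square_crop(1280, 720))
print(mogrify_args(1280, 720, ["000001.png"]))
```

Because mogrify overwrites its inputs, it is worth running this on a copy of the frame folder first.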

@gocarlos
gocarlos / Eigen Cheat sheet
Last active January 19, 2025 21:34
Cheat sheet for the linear algebra library Eigen: http://eigen.tuxfamily.org/
// A simple quickref for Eigen. Add anything that's missing.
// Main author: Keir Mierle
#include <Eigen/Dense>
Matrix<double, 3, 3> A; // Fixed rows and cols. Same as Matrix3d.
Matrix<double, 3, Dynamic> B; // Fixed rows, dynamic cols.
Matrix<double, Dynamic, Dynamic> C; // Full dynamic. Same as MatrixXd.
Matrix<double, 3, 3, RowMajor> E; // Row major; default is column-major.
Matrix3f P, Q, R; // 3x3 float matrix.
@richtr
richtr / config.yml
Last active November 28, 2017 21:36
Parse YAML from bash with sed and awk.
development:
adapter: mysql2
encoding: utf8
database: my_database
username: root
password:
apt:
- somepackage
- anotherpackage
@MInner
MInner / top_k_seq2seq.py
Last active October 25, 2017 02:46
This snippet extracts top k beams from the beam search output of github.com/google/seq2seq.
# Based on https://github.com/google/seq2seq/blob/master/bin/tools/generate_beam_viz.py
# Extracts probabilities and sequences from the .npz file generated during beam search,
# then pickles a list of length n_samples in which each entry holds the beam_width
# most probable tuples (path, logprob, prob),
# where probs are scaled to 1.
import numpy as np
import networkx as nx
import pickle
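
The ranking and rescaling step described in the header comment might look roughly like the sketch below. It assumes that "scaled to 1" means the kept probabilities are normalized to sum to 1; the function name `top_k_beams` and the toy inputs are hypothetical, and the sketch does not read the .npz file or use the seq2seq library.

```python
import math


def top_k_beams(paths, logprobs, k):
    """Return the k most probable (path, logprob, prob) tuples.

    prob is rescaled so the k kept entries sum to 1 (an assumption
    about what "scaled to 1" means). Requires k >= 1.
    """
    ranked = sorted(zip(paths, logprobs), key=lambda pair: pair[1], reverse=True)[:k]
    peak = ranked[0][1]
    # Subtract the largest log-prob before exponentiating to avoid underflow.
    weights = [math.exp(lp - peak) for _, lp in ranked]
    total = sum(weights)
    return [(path, lp, w / total) for (path, lp), w in zip(ranked, weights)]


beams = top_k_beams(
    [["a", "b"], ["a", "c"], ["d"]],
    [math.log(0.2), math.log(0.1), math.log(0.05)],
    k=2,
)
for path, logprob, prob in beams:
    print(path, round(logprob, 3), round(prob, 3))
```

A list of such results, one per sample, could then be written out with `pickle.dump`.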
@candlewill
candlewill / extract_features_for_merlin.md
Last active November 2, 2022 08:34
Analysis of the Merlin source code

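Acoustic Feature Extraction

This article describes how to extract acoustic features for Merlin training. In speech synthesis, this is the job of the vocoder.

Merlin can use two vocoders, STRAIGHT and WORLD. WORLD extracts 60-dim MGC, variable-dim BAP (BAP dim: 1 for 16 kHz, 5 for 48 kHz), and 1-dim LF0; STRAIGHT extracts 60-dim MGC, 25-dim BAP, and 1-dim LF0.

The new WORLD_v2 is still under development; its goal is to extract 60-dim MGC, 5-dim BAP, and 1-dim LF0 (the MGC and BAP dimensions can be fine-tuned).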
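
To make the dimensions concrete, the table below restates them as a small Python lookup and sums the per-frame feature size. It is only an illustrative sketch: the names `FEATURE_DIMS` and `static_dim` are hypothetical and are not part of Merlin, and only the static dimensions quoted above are counted.

```python
# Per-frame feature dimensions as listed above, keyed by (vocoder, sample rate in Hz).
# STRAIGHT is listed without a sample rate, so None is used as its key.
FEATURE_DIMS = {
    ("WORLD", 16000): {"mgc": 60, "bap": 1, "lf0": 1},
    ("WORLD", 48000): {"mgc": 60, "bap": 5, "lf0": 1},
    ("STRAIGHT", None): {"mgc": 60, "bap": 25, "lf0": 1},
}


def static_dim(vocoder, sample_rate=None):
    """Sum the MGC, BAP, and LF0 dimensions for one vocoder setting."""
    return sum(FEATURE_DIMS[(vocoder, sample_rate)].values())


print(static_dim("WORLD", 16000))  # 62
print(static_dim("WORLD", 48000))  # 66
print(static_dim("STRAIGHT"))      # 86
```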

Because STRAIGHT is subject to strict license restrictions, this article mainly covers WORLD.

Merlin for Chinese

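Merlin for Chinese speech synthesis. This article mainly uses Merlin to synthesize Chinese speech.

Data Preparation

To test whether the approach is feasible, we use only 100 utterances. Once it is confirmed to work, we will switch to the full dataset.

Because no Chinese front end is available, we use only phonemes.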