Yun YAN Puriney

@Puriney
Puriney / PrettyGitLogCommand
Created May 13, 2014 22:25
Git: pretty log
# Pretty git log
# https://github.com/tiimgreen/github-cheat-sheet/blob/master/README.zh-cn.md#%E6%9B%B4%E7%9B%B4%E8%A7%82%E7%9A%84git-log
# To load it in every new shell, append the alias line below to ~/.bash_profile
alias gitlog="git log --all --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset' --abbrev-commit --date=relative"
Puriney / AddColumn.R
Created July 30, 2014 17:03
R: GenomicRanges elementMetadata add column
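library(GenomicRanges)
# three toy ranges, one per chromosome
sampleChr <- paste0("Chr", 1:3)
sampleStarts <- sample(1:100, size = 3, replace = FALSE)
sampleEnds <- sampleStarts + 10
sampleStrand <- sample(c("+", "-"), size = 3, replace = TRUE)
sampleValue <- rnorm(3)
# extra named arguments such as `value` become metadata columns (see mcols() / elementMetadata())
sampleGR1 <- GRanges(seqnames = Rle(sampleChr), ranges = IRanges(sampleStarts, sampleEnds), strand = Rle(sampleStrand), value = sampleValue)
sampleGR2 <- GRanges(seqnames = Rle(sampleChr), ranges = IRanges(sampleStarts + 10, sampleEnds + 100), strand = Rle(sampleStrand), value = sampleValue)
# add a new metadata column to an existing GRanges (column name is illustrative)
mcols(sampleGR1)$score <- abs(sampleValue)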
Puriney / vimrc
Created July 31, 2014 10:52
Vim: Vimrc
" don't bother with vi compatibility
set nocompatible
" Color Scheme
" enable syntax highlighting
syntax enable
set background=dark
" configure Vundle
filetype on " without this, vim exits with a non-zero status later because of :ft off
filetype off
set rtp+=~/.vim/bundle/vundle/
Puriney / ChemmineOB and open-babela.md
Last active August 29, 2015 14:07
ChemmineOB and open-babel.md

install open-babel

brew install open-babel

locate OPEN_BABEL_INCDIR and OPEN_BABEL_LIBDIR

OPEN_BABEL_INCDIR is /usr/local/Cellar/open-babel/2.3.2/lib/openbabel/2.3.2, the directory containing headers such as alias.h, descriptor.h, pointgroup.h, and atom.h.

OPEN_BABEL_LIBDIR is /usr/local/Cellar/open-babel/2.3.2/lib/openbabel/2.3.2, the directory containing files such as APIInterface.so, fastsearchformat.so, outformat.so, and CSRformat.so.
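Rather than hard-coding a version-specific Cellar path, each directory can be located by searching the install prefix for one of the files listed above. A minimal stdlib Python sketch; `find_dir_containing` is a hypothetical helper, and the prefix /usr/local/Cellar/open-babel is assumed from the paths above.

```python
import os
from typing import Optional


def find_dir_containing(prefix: str, marker: str) -> Optional[str]:
    """Return the first directory under `prefix` that holds a file named `marker`.

    Hypothetical helper: walks top-down, visiting subdirectories in
    alphabetical order so the result is deterministic.
    """
    for dirpath, dirnames, filenames in os.walk(prefix):
        dirnames.sort()  # os.walk honours in-place edits to dirnames
        if marker in filenames:
            return dirpath
    return None  # marker not found anywhere under prefix
```

For example, searching the assumed prefix for "atom.h" gives a candidate OPEN_BABEL_INCDIR, and searching for "fastsearchformat.so" gives a candidate OPEN_BABEL_LIBDIR.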

Puriney / README.md
Last active August 29, 2015 14:07 — forked from psychemedia/README.md

A minimal R/Shiny app demonstrating:

  1. how to upload a CSV file into an R/shiny app
  2. how to automatically populate list selectors based on column headers
  3. how to use optional list selectors
  4. how to populate a list selector with column names of numerical columns only
  5. how to use an action button to trigger an event when you're ready to do so
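Items 2 to 4 boil down to reading the header row and testing which columns parse as numbers. The app itself is R/Shiny; as an illustration of that logic only, here is a stdlib Python sketch. It assumes a comma-separated file with a header row and treats empty cells as missing; the helper names and the "(none)" placeholder for the optional selector are hypothetical.

```python
import csv
import io
from typing import Dict, List, Tuple

NONE_CHOICE = "(none)"  # hypothetical placeholder meaning "no column selected"


def load_table(text: str) -> Tuple[List[str], List[Dict[str, str]]]:
    """Parse CSV text into (header, rows); each row maps column name -> cell."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    return list(reader.fieldnames or []), rows


def is_number(value: str) -> bool:
    try:
        float(value)
    except ValueError:
        return False
    return True


def numeric_columns(header: List[str], rows: List[Dict[str, str]]) -> List[str]:
    """Columns whose non-empty cells all parse as numbers (item 4)."""
    numeric = []
    for col in header:
        values = [r[col] for r in rows if r.get(col) not in (None, "")]
        if values and all(is_number(v) for v in values):
            numeric.append(col)
    return numeric


def selector_choices(columns: List[str], optional: bool = False) -> List[str]:
    """Choices for a list selector; optional selectors get a leading 'none' entry (item 3)."""
    return ([NONE_CHOICE] if optional else []) + list(columns)
```

In the Shiny app the equivalent values would feed the selector widgets once the upload completes.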

Puriney / README.md
Last active August 29, 2015 14:10 — forked from yihui/README.md

See https://yihui.shinyapps.io/voice for the live demo. Make sure your microphone is turned on and allow the web browser to access it. Credits go to annyang and to @wch for showing me a minimal Shiny example. You can do four things to the scatterplot using voice input:

  • say "title something" to change the plot title, e.g. title good morning
  • say "color a color name" to change the color, e.g. color blue
  • say "bigger" or "smaller" to change the size of points
  • say "regression" to add a linear regression line
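The step from a recognized phrase to a plot change can be sketched independently of annyang and Shiny. This is a minimal stdlib Python sketch, assuming the plot is described by a small state dict; the keys, default values, and the 1.5x size step are hypothetical, not taken from the demo.

```python
from typing import Any, Dict

# Hypothetical plot state; the real demo keeps this in Shiny reactives.
DEFAULT_STATE: Dict[str, Any] = {"title": "", "color": "black", "size": 2.0, "regression": False}
SIZE_STEP = 1.5  # hypothetical scaling factor for "bigger"/"smaller"


def apply_command(state: Dict[str, Any], phrase: str) -> Dict[str, Any]:
    """Return a new plot state after one spoken phrase; unknown phrases change nothing."""
    words = phrase.strip().split()
    new = dict(state)  # never mutate the caller's state
    if not words:
        return new
    verb, rest = words[0].lower(), " ".join(words[1:])
    if verb == "title" and rest:
        new["title"] = rest              # "title good morning"
    elif verb == "color" and rest:
        new["color"] = rest.lower()      # "color blue"
    elif verb == "bigger":
        new["size"] = state["size"] * SIZE_STEP
    elif verb == "smaller":
        new["size"] = state["size"] / SIZE_STEP
    elif verb == "regression":
        new["regression"] = True         # draw a linear regression line
    return new
```

Feeding phrases one at a time, e.g. "title good morning", then "color blue", then "bigger", folds them into a single state that the plotting code would render.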
Puriney / note.md
Last active August 29, 2015 14:14
note

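Taking sporadic breast cancer as a case study, the paper proposes integrating big data across multiple layers to analyze biological data and gain a more accurate understanding of nature.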

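The paper introduces some fairly "trendy" concepts such as meta-dimensional analysis and multi-staged analysis (or systems genomics approaches). Although the questions come from biology, this is still data analysis: once stripped of the biological background, each one is a classic textbook machine-learning case.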

Handling individual datasets

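A journey of a thousand miles begins with a single step: before integrating big data, you must first examine each individual dataset carefully. For individual datasets, the paper mentions at least the following aspects to consider: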

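  • Data Quality Control. As the Chinese saying goes, dragons beget dragons, phoenixes beget phoenixes, and a mouse's son can only dig holes: garbage data will certainly give garbage results (garbage in, garbage out).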

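  • Data Reduction. Once big data arrives, someone wants to examine all pairwise interactions among 5 million SNPs: aren't you computer people supposed to be really good at this? A well-funded senior biologist slams the table: just buy a server and brute-force it, it's only five million choose two combinations. Enviable yet a pity: the privileged upgrade their hardware; pitiable yet lucky: the underdogs upgrade their algorithms. Back to the root of the problem: you have used far too many independent variables and pushed the dimensionality far too high. Dilated pupils don't mean you can see more; no wonder the elders say never do a PhD in biology.

    The paper lists several classic methods for data reduction, such as ReliefF, chi-square statistics, PCA, factor analysis, genetic algorithms, and linkage disequilibrium. Incidentally, each of the corresponding cited papers is itself an introductory read in computational biology.

    It is easy to see that the paper lumps dimension reduction and feature selection together, and I think the two concepts deserve to be distinguished. Unlike the internet industry, biological research places more weight on the interpretability of models and predictions. For example, it is hard to ask a biologist to interpret the principal components (usually two) reported by a dimension-reduction method like PCA, because a component that blends many independent variables has no direct biological interpretation. By contrast, if you can discard irrelevant independent variables through computational methods that may rival black magic but do have a logical explanation, that is, feature selection, then congratulations: you have taken one more step on the underdog's path of upgrading algorithms. On feature selection, here are a few related resources I have read: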
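The brute-force count and the idea of a filter-style feature selection can both be made concrete with a small stdlib Python sketch. It assumes binary (0/1) features and a binary label and scores each feature with the 2x2 Pearson chi-square statistic, one of the methods named above; the function names are hypothetical and this is not the paper's pipeline. Unlike PCA, the selected items are original variables, so each one keeps its biological meaning.

```python
import math
from typing import List, Sequence


def pairwise_tests(n_features: int) -> int:
    """Number of unordered feature pairs, i.e. n choose 2."""
    return math.comb(n_features, 2)


def chi2_2x2(feature: Sequence[int], label: Sequence[int]) -> float:
    """Pearson chi-square statistic of a binary feature against a binary label."""
    a = b = c = d = 0  # cells of the 2x2 contingency table
    for x, y in zip(feature, label):
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a constant feature or label carries no information
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(features: List[Sequence[int]], label: Sequence[int], k: int) -> List[int]:
    """Indices of the k original features with the largest chi-square scores."""
    scores = [chi2_2x2(f, label) for f in features]
    return sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:k]


print(pairwise_tests(5_000_000))  # 12499997500000 pairs to brute-force
```

Scoring each feature once is linear in the number of SNPs, whereas testing every pair grows quadratically.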

OS X Screencast to animated GIF

This gist shows how to create a GIF screencast using only free OS X tools: QuickTime, ffmpeg, and gifsicle.
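The conversion step can be sketched as a pipe from ffmpeg into gifsicle. The frame size, frame rate, and delay defaults below are assumptions to tune, not values from this gist, and gif_pipeline, shell_command, and convert are hypothetical helper names. Nothing runs until convert is called, and it requires both tools on PATH.

```python
import shlex
import subprocess
from typing import List, Tuple


def gif_pipeline(src: str, size: str = "600x400", fps: int = 10,
                 delay: int = 3) -> Tuple[List[str], List[str]]:
    """Argument lists for: ffmpeg (video -> GIF on stdout) | gifsicle (optimize)."""
    ffmpeg = ["ffmpeg", "-i", src,
              "-s", size,          # output frame size, WxH
              "-r", str(fps),      # output frame rate
              "-f", "gif", "-"]    # write GIF data to stdout
    gifsicle = ["gifsicle", "--optimize=3",  # strongest optimization level
                f"--delay={delay}"]          # frame delay in hundredths of a second
    return ffmpeg, gifsicle


def shell_command(src: str, dst: str, **kwargs) -> str:
    """The same pipeline as one shell line, for pasting into Terminal."""
    ffmpeg, gifsicle = gif_pipeline(src, **kwargs)
    return f"{shlex.join(ffmpeg)} | {shlex.join(gifsicle)} > {shlex.quote(dst)}"


def convert(src: str, dst: str, **kwargs) -> None:
    """Run the pipeline; raises CalledProcessError if either tool fails."""
    ffmpeg, gifsicle = gif_pipeline(src, **kwargs)
    with open(dst, "wb") as out:
        producer = subprocess.Popen(ffmpeg, stdout=subprocess.PIPE)
        consumer = subprocess.Popen(gifsicle, stdin=producer.stdout, stdout=out)
        producer.stdout.close()  # ffmpeg gets SIGPIPE if gifsicle exits early
        consumer.wait()
        producer.wait()
    for proc, args in ((producer, ffmpeg), (consumer, gifsicle)):
        if proc.returncode != 0:
            raise subprocess.CalledProcessError(proc.returncode, args)
```

Piping avoids writing an intermediate unoptimized GIF to disk.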


Instructions

To capture the video (file size: 19 MB) using the free QuickTime Player application:

Puriney / bst.py
Created December 18, 2015 03:15
Python: BST (Optimal BST)
#!/usr/bin/env python
import numpy as np
from graphviz import Graph
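#==== Generate data ====
# nodes = [str(k) for k in [2, 4, 5, 6, 7, 8]]
n = 20
# draw n random keys in [1, 100), then keep the unique values in sorted order
nodes = np.random.randint(1, 100, n)
# build a real list: under Python 3, map() returns a lazy iterator that cannot be indexed
nodes = [str(k) for k in sorted(set(nodes))]
nInf = np.inf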
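The preview ends at data generation. For reference, the textbook dynamic program for an optimal BST can be sketched as below. This is an assumption about what the gist computes, not the author's code: it scores only successful searches (one access frequency per sorted key, without the dummy-key probabilities of the CLRS formulation) and returns nested tuples instead of drawing with graphviz. optimal_bst and build_tree are hypothetical names.

```python
from typing import List, Optional, Sequence, Tuple


def optimal_bst(freq: Sequence[float]) -> Tuple[float, List[List[int]]]:
    """Minimum expected search cost and root table for keys sorted ascending.

    cost[i][j]: optimal cost of a BST over keys i..j (inclusive);
    root[i][j]: index of that subtree's root.  O(n^3) time.
    """
    n = len(freq)
    if n == 0:
        return 0, []
    prefix = [0]  # prefix sums so that weight(i, j) = sum(freq[i..j]) is O(1)
    for f in freq:
        prefix.append(prefix[-1] + f)
    cost = [[0] * n for _ in range(n)]
    root = [[-1] * n for _ in range(n)]
    for i in range(n):
        cost[i][i], root[i][i] = freq[i], i
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best, best_r = float("inf"), -1
            for r in range(i, j + 1):  # try every key as the subtree root
                left = cost[i][r - 1] if r > i else 0
                right = cost[r + 1][j] if r < j else 0
                if left + right < best:
                    best, best_r = left + right, r
            # every key in i..j sits one level deeper under the chosen root
            cost[i][j] = best + prefix[j + 1] - prefix[i]
            root[i][j] = best_r
    return cost[0][n - 1], root


def build_tree(keys: Sequence, root: List[List[int]], i: int, j: int) -> Optional[tuple]:
    """Rebuild the tree over keys i..j as nested (key, left, right) tuples."""
    if i > j:
        return None
    r = root[i][j]
    return (keys[r], build_tree(keys, root, i, r - 1), build_tree(keys, root, r + 1, j))
```

With keys 10, 12, 20 accessed 34, 8, and 50 times, the sketch puts 20 at the root, giving a total cost of 142.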
Puriney / RNN.cpp
Last active April 17, 2022 22:36
RNN to learn binary addition implemented in Rcpp
//
// Yun Yan
//
// [[Rcpp::plugins(cpp11)]]
#include <bitset>
#include <unordered_set>
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// #include <RcppEigen.h>
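The preview stops at the includes, so the training code is not shown. To make the task concrete, here is a stdlib Python sketch of one common way to encode binary-addition examples, least significant bit first, plus a hand-written recurrence whose one-bit hidden state is the carry. A trained RNN has to learn something functionally equivalent. The names and the 8-bit default width are hypothetical.

```python
import random
from typing import List, Tuple


def int_to_bits(x: int, width: int) -> List[int]:
    """Bits of x, least significant first (the order in which carries propagate)."""
    return [(x >> k) & 1 for k in range(width)]


def bits_to_int(bits: List[int]) -> int:
    return sum(b << k for k, b in enumerate(bits))


def make_example(width: int = 8, rng=random) -> Tuple[List[int], List[int], List[int]]:
    """One (a, b, a + b) training triple; operands < 2**(width-1) so the sum fits."""
    half = 2 ** (width - 1)
    a, b = rng.randrange(half), rng.randrange(half)
    return int_to_bits(a, width), int_to_bits(b, width), int_to_bits(a + b, width)


def carry_recurrence(a_bits: List[int], b_bits: List[int]) -> List[int]:
    """Hand-written 'RNN': input (a_t, b_t), hidden state = carry, output = sum bit."""
    carry = 0
    out = []
    for x1, x2 in zip(a_bits, b_bits):
        total = x1 + x2 + carry
        out.append(total & 1)  # output bit at step t
        carry = total >> 1     # hidden state passed on to step t + 1
    return out
```

Because the target at each step depends only on the current input bits and the carry, the network needs very little memory, which is why this is a popular first RNN exercise.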