Ming Tang crazyhottommy

library(tidyverse)

## read in the mutect files (dir() takes a regular expression, not a glob)
mix.files <- dir(".", pattern = "\\.tsv$")

## need to add the file name into a column
mix_mutect_datlist <- lapply(mix.files, function(f) {
        dat <- read.table(f, header = TRUE, sep = "\t", quote = "\"")
        dat$file_name <- f  # record the source file for every row
        dat
})
crazyhottommy / bkup_dotfiles_configs.md
Last active November 27, 2022 07:31 — forked from sbamin/bkup_dotfiles_configs.md
How to rsync dotfiles and directories from a remote server

Back up dotfiles

  • The following will copy all dot files and directories (~/.*), including their contents, directly underneath the home directory.
  • To avoid copying caches and other local configs (e.g., those of web browsers or Java apps), first check directory sizes across your entire home directory with ncdu $HOME or a similar tool.
  • Exclude those large directories with rsync's --exclude option, e.g., --exclude=.local --exclude=.cache.
  • Do not rsync passwords, SSH keys, .bash_history, etc. if you plan to upload the backup to GitHub or a similar service.
  • rsync home dotfiles and configs as follows:
# on your local machine
crazyhottommy / maf_legacy.R
Created February 22, 2017 15:39 — forked from tiagochst/maf_legacy.R
Get MAF files aligned against hg19
library(TCGAbiolinks)

query.maf.hg19 <- GDCquery(project = "TCGA-COAD",
                           data.category = "Simple nucleotide variation",
                           data.type = "Simple somatic mutation",
                           access = "open",
                           legacy = TRUE)  # legacy archive holds the hg19 files
# Check which MAF files are available
knitr::kable(getResults(query.maf.hg19)[, c("created_datetime", "file_name")])

Use DiffBind to get differentially bound sites

library(DiffBind)
UCI.H3K27ac.dba <- dba(sampleSheet = "H3K27ac_diffbind.csv", scoreCol = 7,
                       filter = 80, peakFormat = "macs")

# count reads in consensus peaks present in at least 2 peaksets, scored as RPKM
UCI_H3K27ac_RPKM <- dba.count(UCI.H3K27ac.dba, minOverlap = 2,
                              fragmentSize = 200, bParallel = TRUE,
                              score = DBA_SCORE_RPKM)
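The `score = DBA_SCORE_RPKM` setting reports length- and depth-normalized counts. As a rough guide, here is a minimal R sketch of the textbook RPKM formula (reads per kilobase of peak per million mapped reads). `rpkm()` is a hypothetical helper, and DiffBind's internal computation (for example, which library size it divides by) may differ in detail.

```r
# Hypothetical helper: textbook RPKM for a set of peaks.
#   reads    - reads overlapping each peak
#   width_bp - width of each peak in base pairs
#   lib_size - total mapped reads in the library
rpkm <- function(reads, width_bp, lib_size) {
  stopifnot(length(reads) == length(width_bp), all(width_bp > 0), lib_size > 0)
  # equivalent to reads * 1e9 / (width_bp * lib_size)
  reads / (width_bp / 1e3) / (lib_size / 1e6)
}

# a 1 kb peak with 100 reads and a 500 bp peak with 50 reads, 20M mapped reads
rpkm(reads = c(100, 50), width_bp = c(1000, 500), lib_size = 20e6)  # both give 5
```

Dividing by width makes wide and narrow peaks comparable; dividing by library size makes samples of different sequencing depth comparable.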

This is the default behavior of SSH (the StrictModes option of sshd). It protects user keys by expecting rwx------ on $HOME/.ssh and by ensuring that only the owner has write permission on $HOME. If a user other than the owner has write permission on the $HOME directory, they could maliciously modify the permissions on $HOME/.ssh, potentially hijacking the user's keys, known_hosts, or similar files. In summary, any of the following permissions on $HOME is sufficient for SSH to work:

  • rwx------
  • rwxr-x---
  • rwxr-xr-x

> SSH will not work correctly and will send warnings to the log facilities if any variation of g+w or o+w exists on the $HOME directory.
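A quick way to apply the rule above is to test the group-write and other-write bits of a directory. The sketch below uses base R only; `home_perms_ok()` is a hypothetical helper. It inspects permission bits alone and does not check ownership, which sshd's StrictModes also verifies.

```r
# Hypothetical helper: TRUE when `path` has neither g+w nor o+w set,
# i.e. it passes the $HOME permission rule described above.
home_perms_ok <- function(path = Sys.getenv("HOME")) {
  mode <- file.info(path)$mode                     # class "octmode", NA if missing
  if (is.na(mode)) stop("cannot stat ", path)
  group_other_write <- strtoi("022", base = 8L)    # the g+w and o+w bits
  bitwAnd(as.integer(mode), group_other_write) == 0L
}

home_perms_ok()   # checks your own $HOME
```

rwx------ (700), rwxr-x--- (750) and rwxr-xr-x (755) all return TRUE; anything like rwxrwxr-x (775) returns FALSE.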

# This code will get all indexed clinical data from TCGA
library(TCGAbiolinks)
library(data.table)
library(magrittr)  # provides %>%

clinical <- TCGAbiolinks:::getGDCprojects()$project_id %>%
  grep("TCGA", ., value = TRUE) %>%
  sort() %>%
  lapply(GDCquery_clinic, type = "clinical") %>%
  rbindlist(fill = TRUE)  # clinical columns can differ between projects
readr::write_csv(clinical, file = "all_clin_indexed.csv")
library(DESeq2)

## DESeq2 built-in function (vsd.fast: a DESeqTransform, e.g. from vst())
plotPCA(vsd.fast, intgroup = c("subtype"))

## SVD to get PCs manually
X <- assay(vsd.fast)

## center each gene (row) of X
X <- t(scale(t(X), center = TRUE, scale = FALSE))
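The manual route can be finished with `svd()`. Below is a minimal sketch, assuming `X` is a genes-by-samples matrix such as the one returned by `assay()`. The hypothetical `pca_by_svd()` also keeps only the `ntop` most variable rows, because DESeq2's `plotPCA()` uses `ntop = 500` by default, so the two agree only after the same subsetting.

```r
# Hypothetical helper: PCA of a genes x samples matrix via SVD.
pca_by_svd <- function(X, ntop = 500) {
  rv <- apply(X, 1, var)                                   # per-gene variance
  keep <- order(rv, decreasing = TRUE)[seq_len(min(ntop, nrow(X)))]
  # center each gene (row), as prcomp() does on the transposed matrix
  Xc <- t(scale(t(X[keep, , drop = FALSE]), center = TRUE, scale = FALSE))
  s <- svd(Xc)                                             # Xc = U D V^T
  # sample coordinates: t(Xc) %*% U = V D
  scores <- s$v %*% diag(s$d, nrow = length(s$d))
  rownames(scores) <- colnames(X)
  colnames(scores) <- paste0("PC", seq_len(ncol(scores)))
  list(scores = scores, percentVar = s$d^2 / sum(s$d^2))
}

# usage with the objects above (columns may differ from plotPCA in sign only)
# pcs <- pca_by_svd(assay(vsd.fast))
# plot(pcs$scores[, "PC1"], pcs$scores[, "PC2"])
```

The squared singular values are proportional to the variance captured by each PC, which is why `percentVar` is just their normalized share.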

From Mike Love: https://gist.github.com/mikelove/f539631f9e187a8931d34779436a1c01

An R implementation of the rule:

Archive-generated FASTQ files are organised by run accession number under the vol1/fastq directory on ftp.sra.ebi.ac.uk:

ftp://ftp.sra.ebi.ac.uk/vol1/fastq/<dir1>[/<dir2>]/<run accession>/

<dir1> is the first 6 letters and numbers of the run accession (e.g. ERR000 for ERR000916),
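The R implementation itself is cut off above. A minimal sketch follows, assuming the remainder of the rule as ENA documents it: <dir2> is absent for six-digit run accessions and is otherwise the last one, two or three digits (for 7, 8 or 9 digits) zero-padded to three characters. `ena_fastq_dir()` is a hypothetical name, and this is not Mike Love's code.

```r
# Hypothetical helper: ENA FTP directory for one run accession.
ena_fastq_dir <- function(run) {
  stopifnot(length(run) == 1L, grepl("^[A-Z]{3}[0-9]{6,9}$", run))
  dir1 <- substr(run, 1L, 6L)              # e.g. "ERR000" for "ERR000916"
  n_digits <- nchar(run) - 3L              # digits after the 3-letter prefix
  dir2 <- if (n_digits == 6L) {
    NULL                                   # six digits: no <dir2> level
  } else {
    # 7-9 digits: the trailing (n_digits - 6) digits, zero-padded to 3 places
    sprintf("%03d", as.integer(substr(run, 10L, nchar(run))))
  }
  paste0(paste(c("ftp://ftp.sra.ebi.ac.uk/vol1/fastq", dir1, dir2, run),
               collapse = "/"), "/")
}

ena_fastq_dir("ERR000916")   # "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR000/ERR000916/"
ena_fastq_dir("SRR1016916")  # "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR101/006/SRR1016916/"
```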

Some reading for the basics

Cores, CPUs and threads:
http://www.slac.stanford.edu/comp/unix/package/lsf/currdoc/lsf_admin/index.htm?lim_core_detection.html~main
Traditionally, the value of ncpus has been equal to the number of physical CPUs. However, many CPUs consist of multiple cores and threads, so the traditional 1:1 mapping is no longer useful. A more useful approach is to set ncpus to equal one of the following:

  • Processors: the number of processors
  • Cores: the number of cores (per processor) * the number of processors (this is the ncpus default setting)
  • Threads: the number of threads (per core) * the number of cores (per processor) * the number of processors
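The three conventions are simple products of the machine topology. The sketch below makes the arithmetic concrete; `ncpus_options()` is a hypothetical helper, not part of LSF.

```r
# Hypothetical helper: the three ncpus conventions from the list above.
ncpus_options <- function(processors, cores_per_processor, threads_per_core) {
  c(
    processors = processors,
    cores      = cores_per_processor * processors,                    # LSF default
    threads    = threads_per_core * cores_per_processor * processors
  )
}

# two sockets, 8 cores each, 2 hardware threads per core:
ncpus_options(2, 8, 2)   # processors = 2, cores = 16, threads = 32
```

From inside R, `parallel::detectCores()` reports logical CPUs (threads) by default; `detectCores(logical = FALSE)` asks for physical cores but is honoured only on some platforms.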