Ming Tang crazyhottommy

An error

I was using TEQC to do quality control on my WES BAM files, which were aligned with bwa-mem. My data are paired-end, so I called the function reads2pairs to merge each read pair into a single fragment. I then got this error:

> readpairs <- reads2pairs(reads)
Error in reads2pairs(reads) : read pair IDs do not seem to be unique

I asked on the Bioconductor support site and then looked at the source code of that function.

The rm command is very dangerous because once you remove something, you cannot recover it. There is no trash bin on a Unix system. If you have raw data (e.g. fastq files), you'd better keep them safe by changing the file permissions. In an empty directory, make a folder foo:

mkdir test
cd test
mkdir foo
cd foo
touch {1..4}.fastqs
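As a minimal sketch of the permission change described above, the function below removes write permission from the files in a directory and from the directory itself. `protect_raw` is a hypothetical helper name, not an existing command, and the sketch assumes a `find` that supports `-maxdepth` (GNU and BSD find do). The directory is included because deleting a file is governed by the permissions of its parent directory, not of the file. None of this protects against root.

```shell
# protect_raw DIR: hypothetical helper that makes DIR and its files read-only.
protect_raw() {
    local dir="$1"
    # Remove write permission from every regular file directly inside DIR.
    find "$dir" -maxdepth 1 -type f -exec chmod a-w {} +
    # Remove write permission from DIR itself, so that its entries
    # cannot be deleted or renamed by a non-root user.
    chmod a-w "$dir"
}
```

For example, after `protect_raw foo`, an `rm` on the files inside foo should fail with a permission error for a non-root user; `chmod u+w foo` makes the directory writable again.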
crazyhottommy / mount smb on ubuntu.md
Created July 13, 2017 20:28
mounting smb on ubuntu

I use sshfs to mount remote servers, but I also want to connect Windows servers to my Ubuntu machine.

If there's one good thing I can say about Windows XP, it is that it supports the SMB protocol. This enables a computer running Windows to share files, folders, and more with another PC. All that other PC needs is the right software to take advantage of the SMB protocol. Luckily, that software is available for GNU/Linux.

On a Mac, I can click the Finder bar ---> Go ---> Connect to Server and then type in the address.
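On Ubuntu, one common way to do the same from the command line is `mount -t cifs`, which needs the cifs-utils package. This is an assumption about the setup, since the note itself does not show the command. The sketch below only prints the command so it can be reviewed before running it; `smb_mount_cmd` is a hypothetical helper, and the server, share, mount point and user names are placeholders.

```shell
# smb_mount_cmd SERVER SHARE MOUNTPOINT USER
# Hypothetical helper: prints (does not run) a CIFS mount command.
smb_mount_cmd() {
    # //SERVER/SHARE is the remote share; MOUNTPOINT must be an existing
    # local directory; mount will prompt for USER's password when run.
    printf 'sudo mount -t cifs //%s/%s %s -o username=%s\n' "$1" "$2" "$3" "$4"
}
```

Calling `smb_mount_cmd winserver data /mnt/data alice` prints `sudo mount -t cifs //winserver/data /mnt/data -o username=alice`, which can then be pasted into a terminal once the mount point exists.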

Some reading for the basics

Cores, CPUs and threads:
http://www.slac.stanford.edu/comp/unix/package/lsf/currdoc/lsf_admin/index.htm?lim_core_detection.html~main
Traditionally, the value of ncpus has been equal to the number of physical CPUs. However, many CPUs consist of multiple cores and threads, so the traditional 1:1 mapping is no longer useful. A more useful approach is to set ncpus to equal one of the following:

  • The number of processors
  • Cores—the number of cores (per processor) * the number of processors (this is the ncpus default setting)
  • Threads—the number of threads (per core) * the number of cores (per processor) * the number of processors
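The three definitions above are simple products, which a small sketch makes concrete. `ncpus` here is a hypothetical shell function written for illustration, not an LSF command, and the mode names `procs`, `cores` and `threads` are labels chosen for this example.

```shell
# ncpus MODE PROCESSORS CORES_PER_PROCESSOR THREADS_PER_CORE
# Hypothetical helper: computes ncpus under one of the three definitions.
ncpus() {
    local mode="$1" procs="$2" cores="$3" threads="$4"
    case "$mode" in
        procs)   echo "$procs" ;;                           # number of processors
        cores)   echo $(( cores * procs )) ;;               # cores per processor * processors
        threads) echo $(( threads * cores * procs )) ;;     # threads per core * cores * processors
        *)       echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For a machine with 2 processors, 8 cores per processor and 2 threads per core, `ncpus procs 2 8 2` gives 2, `ncpus cores 2 8 2` gives 16, and `ncpus threads 2 8 2` gives 32.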

From Mike Love: https://gist.github.com/mikelove/f539631f9e187a8931d34779436a1c01

An R implementation of the rule:

Archive-generated fastq files are organised by run accession number under the vol1/fastq directory on ftp.sra.ebi.ac.uk:

ftp://ftp.sra.ebi.ac.uk/vol1/fastq/[/]/

The first directory level is the first 6 letters and numbers of the run accession (e.g. ERR000 for ERR000916),

## DESeq2 built-in function
plotPCA(vsd.fast, intgroup=c("subtype"))

## SVD to get PCs manually
X <- assay(vsd.fast)

## center X
X <- t(scale(t(X), center=TRUE, scale=FALSE))
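
# This code gets all indexed clinical data from TCGA
library(TCGAbiolinks)
library(data.table)
library(magrittr)  # provides the %>% pipe used below
# Keep only the TCGA project IDs, query clinical data per project, then stack the tables
clinical <- TCGAbiolinks:::getGDCprojects()$project_id %>%
  regexPipes::grep("TCGA", value = TRUE) %>%
  sort %>%
  plyr::alply(1, GDCquery_clinic, .progress = "text") %>%
  rbindlist
# Write the combined table to a CSV file in the working directory
readr::write_csv(clinical, "all_clin_indexed.csv")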

This is the default behavior for SSH. It protects user keys by enforcing rwx------ on $HOME/.ssh and ensuring only the owner has write permissions to $HOME. If a user other than the respective owner has write permission on the $HOME directory, they could maliciously modify the permissions on $HOME/.ssh, potentially hijacking the user keys, known_hosts, or something similar. In summary, the following permissions on $HOME will be sufficient for SSH to work.

  • rwx------
  • rwxr-x---
  • rwxr-xr-x

> SSH will not work correctly and will send warnings to the log facilities if any variation of g+w or o+w exists on the $HOME directory.
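The rule above can be checked and repaired with a short sketch. Both function names are hypothetical helpers written for illustration; they only rely on `ls -ld` and `chmod`, and they take the directory as an argument so they can be tried on a scratch directory before being pointed at $HOME.

```shell
# home_perms_ok DIR: succeed if DIR has neither g+w nor o+w set.
# Hypothetical helper; it reads the mode string printed by `ls -ld`.
home_perms_ok() {
    local mode
    mode=$(ls -ld "$1" | cut -c1-10)      # e.g. drwxr-xr-x
    # Character 6 is the group write bit, character 9 the other write bit.
    [ "$(printf '%s' "$mode" | cut -c6)" != "w" ] &&
        [ "$(printf '%s' "$mode" | cut -c9)" != "w" ]
}

# fix_home_perms DIR: drop group/other write permission on DIR and,
# if it exists, set DIR/.ssh to rwx------.
fix_home_perms() {
    chmod go-w "$1"
    if [ -d "$1/.ssh" ]; then
        chmod 700 "$1/.ssh"
    fi
}
```

A typical use would be `home_perms_ok "$HOME" || fix_home_perms "$HOME"`, which leaves a home directory that already satisfies the rule untouched.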