cgpu’s gists

cgpu / cohort_parsed_file.phe

Created September 29, 2020 19:19

	FID IID PAT MAT SEX PHE year_of_birth participant_ethnic_category death_date sars.cov.2_positive height_hcm
	10_10 10_10 0 0 2 1 25 8 20200929 2 1.65869684755143
	11_11 11_11 0 0 1 0 32 8 20200929 1 2.05760365489074
	13_13 13_13 0 1 2 1 8 8 20200929 1 1.91661311219814
	15_15 15_15 0 1 1 0 9 8 20200929 2 1.70644503349575
	21_21 21_21 0 1 2 0 26 8 20200929 2 1.68937434957125
	26_26 26_26 0 1 2 0 25 8 20200929 2 1.62607132142262
	40_40 40_40 0 0 2 0 31 8 20200929 2 1.98534853658691
	52_52 52_52 0 1 2 0 70 8 20200929 1 1.63362615482562
	53_53 53_53 0 1 1 0 11 8 20200929 2 1.84182132278676

cgpu / IQR.csv

Created May 28, 2020 16:37

	S1,S2,S3
	ARHGEF10L,11.1818,11.0186,11.243
	HIF3A,5.2482,5.3847,4.0013
	RNF17,4.1956,0,0
	RNF10,11.504,11.669,12.0791
	RNF11,9.5995,11.398,9.8248
	RNF13,9.6257,10.8249,10.5608
	GTF2IP1,11.8053,11.5487,12.1228
	REM1,5.6835,3.5408,3.5582
	MTVR2,0,1.4714,0

cgpu / Dockerfile

Created May 25, 2020 03:25

jupyterlab/jupyterlab-monaco labextension

	# Install nvm with node and npm
	RUN curl https://raw.githubusercontent.com/creationix/nvm/v0.30.1/install.sh | bash && \
	source /home/jovyan/.nvm/nvm.sh && \
	nvm install 8.4 && \
	npm install -g yarn && \
	yarn config set ignore-engines true && \
	cd /tmp && \
	git clone https://github.com/jupyterlab/jupyterlab-monaco.git && \
	cd jupyterlab-monaco && \
	yarn install && \

cgpu / gist:52272fa205958c5afd00da18aa9e6f2f

Created May 5, 2020 21:27

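	# Read one SRA accession per line from standard input.
	while read -r line
	do
	srr=$line
	# Dump only the first spot; --split-spot emits one 4-line FASTQ record per read.
	numLines=$(fastq-dump -X 1 -Z --split-spot "$srr" | wc -l)
	if [ $numLines -eq 4 ]
	then
	echo "$srr,single_end"
	echo "$srr,single_end" >> file.log
	else
	echo "$srr,paired_end"
	fi
	done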

cgpu / sra-paired.sh

Created May 1, 2020 22:59 — forked from slowkow/sra-paired.sh

Check if an SRA file contains paired-end data.

	#!/usr/bin/env bash
	# sra-paired.sh
	# Kamil Slowikowski
	# April 23, 2014
	#
	# Check if an SRA file contains paired-end sequencing data.
	#
	# See documentation for the SRA Toolkit:
	# http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump
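
The check described above can be sketched as a small line-count classifier. This is only an illustration, not the gist's actual script: the function name `classify_layout` is hypothetical, and it assumes the input is the FASTQ text for exactly one spot, as produced by `fastq-dump -X 1 -Z --split-spot`, where each read contributes one 4-line record.

```shell
#!/usr/bin/env bash
# classify_layout (hypothetical helper): reads the FASTQ text of ONE spot
# on stdin and prints the inferred library layout.
#   4 lines -> one read in the spot  -> single_end
#   8 lines -> two reads in the spot -> paired_end
classify_layout() {
  local n
  # Count input lines with awk so the result has no padding whitespace.
  n=$(awk 'END { print NR }')
  if [ "$n" -eq 4 ]; then
    echo "single_end"
  elif [ "$n" -eq 8 ]; then
    echo "paired_end"
  else
    # Anything else (empty output, failed download, more reads) is not classified.
    echo "unknown"
  fi
}
```

With the SRA Toolkit installed, it could be used as `fastq-dump -X 1 -Z --split-spot "$srr" | classify_layout`. Keeping the classification separate from the download makes the rule easy to check on plain text.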

cgpu / cohort_18.csv

Last active April 30, 2020 02:26

	sample_id	status
	SRR3623945	case
	SRR10503928	control
	SRR10503929	control
	SRR3623941	case

cgpu / read-multiple-csv-files.R

Created April 29, 2020 22:31 — forked from apreshill/read-multiple-csv-files

Read multiple csv files into R

	# stack overflow answer from Joran Ellis:
	# http://stackoverflow.com/questions/5319839/read-multiple-csv-files-into-separate-data-frames

	# Set full.names = TRUE so the returned paths work even when
	# "path/to/files" is not your working directory.
	my_files <- list.files("path/to/files", pattern = "\\.csv$", full.names = TRUE)

	# Further arguments to read.csv (e.g. stringsAsFactors = FALSE) can be
	# passed after the function name.
	all_csv <- lapply(my_files, read.csv)

cgpu / ggpubrViolin.R

Created April 29, 2020 10:48

	# Adding roxygen comments
	#' A function for generating an interactive violin plot and a non-interactive annotated gg violin plot
	#'
	#' This function generates a customisable annotated violin plot in PNG format from a dataframe with a two-level outcome variable
	#' @param data dataframe with features as columns and observations as rows. All features must be continuous variables. A column denoting the case/control status must be present
	#' @param group The name of the column holding the case/control status information
	#' @param feature The name of the column holding the feature of interest for comparison between groups
	#' @param ctrl_id The value that denotes the controls in the "group" column (e.g. "0", "Control"). Must be provided in double quotes
	#' @param case_id The value that denotes the cases in the "group" column (e.g. "1", "Case"). Must be provided in double quotes
	#' @param SAVEDIR absolute path to the output directory

cgpu / gtf2last_col_tsv.awk

Created April 28, 2020 10:30

Gencode GTF https://www.biostars.org/p/140471/

	awk 'BEGIN{FS="\t"}{split($9,a,";"); if($3~"gene") print a[1]"\t"a[3]"\t"$1":"$4"-"$5"\t"a[2]"\t"$7}' gencode.v33.primary_assembly.annotation.gtf | sed 's/gene_id "//' | sed 's/gene_type "//' | sed 's/gene_name "//' | sed 's/"//g' | awk 'BEGIN{FS="\t"}{split($3,a,"[:-]"); print $1"\t"$2"\t"a[1]"\t"a[2]"\t"a[3]"\t"$4"\t"$5"\t"a[3]-a[2];}' | sed "1i\Geneid\tGeneSymbol\tChromosome\tStart\tEnd\tClass\tStrand\tLength" | less -S
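
The one-liner above is dense, so here is a rough single-pass sketch of the same idea: pull `gene_id`, `gene_type` and `gene_name` out of column 9 of each gene record and print them with the coordinates. The function name `gtf_gene_to_tsv` is hypothetical. Unlike the original, it looks attributes up by key rather than by position, matches the feature column exactly (`$3 == "gene"`), and does not print a header row. Length is `End - Start`, as in the one-liner.

```shell
#!/usr/bin/env bash
# gtf_gene_to_tsv (hypothetical helper): reads GTF on stdin and prints, for
# each "gene" record, a tab-separated row:
#   gene_id, gene_name, chromosome, start, end, gene_type, strand, end - start
gtf_gene_to_tsv() {
  awk 'BEGIN { FS = "\t"; OFS = "\t" }
  $3 == "gene" {
    id = ""; gtype = ""; name = ""
    # Column 9 holds semicolon-separated attributes: key "value"
    n = split($9, attrs, ";")
    for (i = 1; i <= n; i++) {
      kv = attrs[i]
      gsub(/^ +| +$/, "", kv)                       # trim surrounding spaces
      if (kv ~ /^gene_id /)   { id = kv;    sub(/^gene_id /, "", id) }
      if (kv ~ /^gene_type /) { gtype = kv; sub(/^gene_type /, "", gtype) }
      if (kv ~ /^gene_name /) { name = kv;  sub(/^gene_name /, "", name) }
    }
    # Strip the double quotes around each attribute value.
    gsub(/"/, "", id); gsub(/"/, "", gtype); gsub(/"/, "", name)
    print id, name, $1, $4, $5, gtype, $7, $5 - $4
  }'
}
```

It could be run as `gtf_gene_to_tsv < gencode.v33.primary_assembly.annotation.gtf | less -S`. Header lines starting with `#` have no third column and are skipped.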

cgpu / ggextras.R

Created April 25, 2020 18:09

	theme(text = element_text( color = "#4A637B", face = "bold", family = 'Helvetica')
	,plot.caption = element_text(size = 9, color = "#8d99ae", face = "plain" )
	,plot.title = element_text(size = 18, color = "#2b2d42", face = "bold", hjust=0.15 )
	,axis.text.y = element_text(size = 10, color = "#8d99ae", face = "bold", hjust=1.1 )
	,axis.title.x = element_text(size = 14 , hjust = 0.15 )
	,axis.text.x = element_blank()
	,axis.title.y = element_blank()
	,axis.ticks.x = element_blank()
	,axis.ticks.y = element_blank()
	,plot.margin = unit(c(1,1,1,1),"cm")

Christina Chatzipantsiou cgpu