Christina Chatzipantsiou cgpu

FID IID PAT MAT SEX PHE year_of_birth participant_ethnic_category death_date sars.cov.2_positive height_hcm
10_10 10_10 0 0 2 1 25 8 20200929 2 1.65869684755143
11_11 11_11 0 0 1 0 32 8 20200929 1 2.05760365489074
13_13 13_13 0 1 2 1 8 8 20200929 1 1.91661311219814
15_15 15_15 0 1 1 0 9 8 20200929 2 1.70644503349575
21_21 21_21 0 1 2 0 26 8 20200929 2 1.68937434957125
26_26 26_26 0 1 2 0 25 8 20200929 2 1.62607132142262
40_40 40_40 0 0 2 0 31 8 20200929 2 1.98534853658691
52_52 52_52 0 1 2 0 70 8 20200929 1 1.63362615482562
53_53 53_53 0 1 1 0 11 8 20200929 2 1.84182132278676
cgpu / IQR.csv
Created May 28, 2020 16:37
IQR.csv
S1,S2,S3
ARHGEF10L,11.1818,11.0186,11.243
HIF3A,5.2482,5.3847,4.0013
RNF17,4.1956,0,0
RNF10,11.504,11.669,12.0791
RNF11,9.5995,11.398,9.8248
RNF13,9.6257,10.8249,10.5608
GTF2IP1,11.8053,11.5487,12.1228
REM1,5.6835,3.5408,3.5582
MTVR2,0,1.4714,0
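The IQR.csv preview has a header naming three samples while every data row has four fields. One plausible reading, and it is only an assumption, is that the first field is an unnamed row label (the gene symbol), as happens when a table is exported with row names. A minimal Python sketch of reading such a file under that assumption follows; `read_matrix` is a hypothetical helper and the sample text is copied from the first three rows of the preview.

```python
import csv
import io

# First three data rows copied from the IQR.csv preview.
SAMPLE = """S1,S2,S3
ARHGEF10L,11.1818,11.0186,11.243
HIF3A,5.2482,5.3847,4.0013
RNF17,4.1956,0,0
"""


def read_matrix(handle):
    """Parse a CSV whose header omits the name of the row-label column.

    Assumption: the first field of every data row is a row label and the
    remaining fields line up with the header names.
    """
    rows = csv.reader(handle)
    header = next(rows)
    matrix = {}
    for row in rows:
        if not row:
            continue  # skip blank lines
        label, values = row[0], row[1:]
        if len(values) != len(header):
            raise ValueError(
                f"{label}: expected {len(header)} values, got {len(values)}"
            )
        matrix[label] = dict(zip(header, map(float, values)))
    return matrix


matrix = read_matrix(io.StringIO(SAMPLE))
print(matrix["RNF17"])
```

Keying the result by row label keeps each gene's values addressable by sample name, and the length check fails loudly if a row does not match the assumed layout.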
cgpu / Dockerfile
Created May 25, 2020 03:25
jupyterlab/jupyterlab-monaco labextension
# Install nvm with node and npm
RUN curl https://raw.githubusercontent.com/creationix/nvm/v0.30.1/install.sh | bash && \
. /home/jovyan/.nvm/nvm.sh && \
nvm install 8.4 && \
npm install -g yarn && \
yarn config set ignore-engines true && \
cd /tmp && \
git clone https://github.com/jupyterlab/jupyterlab-monaco.git && \
cd jupyterlab-monaco && \
yarn install && \
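while read -r line
do
    srr=$line
    # Dump only the first spot; 4 lines means one read (single-end),
    # any other count is treated as paired-end.
    numLines=$(fastq-dump -X 1 -Z --split-spot "$srr" | wc -l)
    if [ "$numLines" -eq 4 ]
    then
        echo "$srr,single_end"
        echo "$srr,single_end" >> file.log
    else
        echo "$srr,paired_end"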
cgpu / sra-paired.sh
Created May 1, 2020 22:59 — forked from slowkow/sra-paired.sh
Check if an SRA file contains paired-end data.
#!/usr/bin/env bash
# sra-paired.sh
# Kamil Slowikowski
# April 23, 2014
#
# Check if an SRA file contains paired-end sequencing data.
#
# See documentation for the SRA Toolkit:
# http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=toolkit_doc&f=fastq-dump
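The shell loop earlier on this page decides the layout by counting the lines that `fastq-dump -X 1 -Z --split-spot` prints for the first spot. The same decision can be sketched in Python as below. `classify_layout` and `dump_first_spot` are hypothetical names, the FASTQ records are invented stand-ins for real output, and `dump_first_spot` is defined but not called because it needs the SRA Toolkit installed.

```python
import subprocess


def classify_layout(fastq_text: str) -> str:
    """Mirror the shell test: 4 lines means single-end, anything else paired-end."""
    num_lines = len(fastq_text.splitlines())
    return "single_end" if num_lines == 4 else "paired_end"


def dump_first_spot(accession: str) -> str:
    """Run the same fastq-dump command as the shell loop.

    Requires the SRA Toolkit on PATH; not executed in this sketch.
    """
    result = subprocess.run(
        ["fastq-dump", "-X", "1", "-Z", "--split-spot", accession],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Invented records standing in for fastq-dump output.
single = "@r1\nACGT\n+\nIIII\n"
paired = single + "@r1\nTGCA\n+\nIIII\n"

print(classify_layout(single))
print(classify_layout(paired))
```

As in the shell version, any line count other than 4 falls into the paired-end branch, including empty output, so a stricter check for exactly 8 lines may be worth adding before trusting the label.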
sample_id status
SRR3623945 case
SRR10503928 control
SRR10503929 control
SRR3623941 case
cgpu / read-multiple-csv-files.R
Created April 29, 2020 22:31 — forked from apreshill/read-multiple-csv-files
Read multiple csv files into R
# stack overflow answer from Joran Ellis:
# http://stackoverflow.com/questions/5319839/read-multiple-csv-files-into-separate-data-frames
# If the path is different from your working directory,
# set full.names = TRUE to get the full paths.
my_files <- list.files("path/to/files", full.names = TRUE)
# Further arguments to read.csv can be passed after the function name,
# e.g. lapply(my_files, read.csv, stringsAsFactors = FALSE)
all_csv <- lapply(my_files, read.csv)
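For comparison, the same list-then-read pattern can be sketched in Python with only the standard library. `read_all_csv` is a hypothetical helper, and the two demo files are invented so the example is self-contained. `Path.glob` yields paths that already include the directory, which plays the role of `full.names = TRUE` in the R snippet.

```python
import csv
import tempfile
from pathlib import Path


def read_all_csv(directory):
    """Return {file name: list of row dicts} for every .csv file in directory."""
    tables = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="") as handle:
            tables[path.name] = list(csv.DictReader(handle))
    return tables


# Demo with two throwaway files; names and contents are invented.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.csv").write_text("x,y\n1,2\n")
    Path(tmp, "b.csv").write_text("x,y\n3,4\n")
    tables = read_all_csv(tmp)

print(sorted(tables))
```

Keying the result by file name keeps each table identifiable, which mirrors the goal of reading the files into separate data frames. Values stay as strings here; converting types is left to the caller.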
# Adding roxygen comments
#' A function for generating an interactive violin plot and a non-interactive annotated gg violin plot
#'
#' This function generates a customisable annotated violin plot in PNG format from a data frame with a two-level outcome variable
#' @param data Data frame with features as columns and observations as rows. All features must be continuous variables. A column denoting the case/control status must be present
#' @param group The name of the column holding the case/control status information
#' @param feature The name of the column holding the feature of interest for comparison between groups
#' @param ctrl_id The value that denotes the controls in the "group" column (e.g. "0", "Control"). Must be provided in double quotes
#' @param case_id The value that denotes the cases in the "group" column (e.g. "1", "Case"). Must be provided in double quotes
#' @param SAVEDIR Absolute path to the output directory
cat gencode.v33.primary_assembly.annotation.gtf | awk 'BEGIN{FS="\t"}{split($9,a,";"); if($3~"gene") print a[1]"\t"a[3]"\t"$1":"$4"-"$5"\t"a[2]"\t"$7}' | sed 's/gene_id "//' | sed 's/gene_type "//' | sed 's/gene_name "//' | sed 's/"//g' | awk 'BEGIN{FS="\t"}{split($3,a,"[:-]"); print $1"\t"$2"\t"a[1]"\t"a[2]"\t"a[3]"\t"$4"\t"$5"\t"a[3]-a[2];}' | sed "1i\Geneid\tGeneSymbol\tChromosome\tStart\tEnd\tClass\tStrand\tLength" | less -S
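The one-liner above is dense, so here is a rough Python sketch of the same extraction: keep GTF lines whose feature column is `gene`, pull `gene_id`, `gene_name` and `gene_type` from the attribute column, and emit the same eight columns. `gene_rows` and the demo record are hypothetical. The sketch tests the feature with equality where the awk uses a regex match, and it looks attributes up by key instead of by position, so it assumes quoted `key "value"` attributes.

```python
import re

HEADER = ["Geneid", "GeneSymbol", "Chromosome", "Start", "End",
          "Class", "Strand", "Length"]

# Matches quoted attributes such as: gene_id "ENSG..."
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def gene_rows(lines):
    """Yield one annotation row per GTF line whose feature column is 'gene'."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        start, end = int(fields[3]), int(fields[4])
        yield [
            attrs.get("gene_id", ""),
            attrs.get("gene_name", ""),
            fields[0],
            start,
            end,
            attrs.get("gene_type", ""),
            fields[6],
            end - start,  # same arithmetic as the awk one-liner
        ]


# Invented GTF-style record for illustration only.
demo = [
    'chr1\tHAVANA\tgene\t100\t250\t.\t+\t.\t'
    'gene_id "G1.1"; gene_type "protein_coding"; gene_name "ABC1";\n'
]
rows = list(gene_rows(demo))

print("\t".join(HEADER))
for row in rows:
    print("\t".join(map(str, row)))
```

GTF coordinates are 1-based and inclusive, so `end - start + 1` would give the span in bases. The sketch keeps `end - start` to match the Length column the one-liner produces.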
theme(text = element_text( color = "#4A637B", face = "bold", family = 'Helvetica')
,plot.caption = element_text(size = 9, color = "#8d99ae", face = "plain" )
,plot.title = element_text(size = 18, color = "#2b2d42", face = "bold", hjust=0.15 )
,axis.text.y = element_text(size = 10, color = "#8d99ae", face = "bold", hjust=1.1 )
,axis.title.x = element_text(size = 14 , hjust = 0.15 )
,axis.text.x = element_blank()
,axis.title.y = element_blank()
,axis.ticks.x = element_blank()
,axis.ticks.y = element_blank()
,plot.margin = unit(c(1,1,1,1),"cm")