E-GEOD-67812 RNASEQ_MRNA_DIFFERENTIAL Homo sapiens
E-GEOD-63452 RNASEQ_MRNA_DIFFERENTIAL Homo sapiens
E-GEOD-1571 MICROARRAY_1COLOUR_MRNA_DIFFERENTIAL Mus musculus
E-MTAB-2746 MICROARRAY_1COLOUR_MRNA_DIFFERENTIAL Homo sapiens
E-MEXP-2505 MICROARRAY_1COLOUR_MRNA_DIFFERENTIAL Mus musculus
E-GEOD-52797 MICROARRAY_1COLOUR_MRNA_DIFFERENTIAL Mus musculus
E-GEOD-49155 RNASEQ_MRNA_DIFFERENTIAL Homo sapiens
E-GEOD-75431 RNASEQ_MRNA_DIFFERENTIAL Mus musculus
E-GEOD-30526 MICROARRAY_1COLOUR_MRNA_DIFFERENTIAL Rattus norvegicus
E-MTAB-2751 MICROARRAY_1COLOUR_MRNA_DIFFERENTIAL Mus musculus
/var/tmp/E-GEOD-59831/E-GEOD-59831.g2_g1.go.gsea.tsv:GO:0003713 transcription coactivator activity 29 0.619039162220047 0.912487999596444 12 17 1987 2686 0.973314243328561
/var/tmp/new-annotations/E-GEOD-59831/E-GEOD-59831.g2_g1.go.gsea.tsv:GO:0003713 transcription coactivator activity 29 0.619039162220047 0.912487999596444 12 17 1987 2686 0.973314243328561
/ebi/ftp/pub/databases/microarray/data/atlas/experiments/E-GEOD-59831/archive/E-GEOD-59831.g2_g1.go.gsea.tsv.1:GO:0003713 transcription coactivator activity 100 0.00246585317392181 0.0472275722799049 51 49 5635 9730 1.38711748153359
/ebi/ftp/pub/databases/microarray/data/atlas/experiments/E-GEOD-59831/archive/E-GEOD-59831.g2_g1.go.gsea.tsv.2:GO:0003713 transcription coactivator activity 100 0.00061011434295709 0.0236835294947888 53 47 5588 9693 1.44512143237015
/var/tmp/previous/E-GEOD-59831/E-GEOD-59831.g2_g1.go.gsea.tsv:GO:0003713 transcription coactivator activity 29 0.631141987994359 0.922293749207595 12 17 1954 2610 0.966709930894166
/var/tmp/new-a
#ensembl_version rank Accession Genes (tot) Stat (non-dir.) p p adj (non-dir.) Significant..in.gene.set. Non.significant..in.gene.set. Significant..not.in.gene.set. Non.significant..not.in.gene.set. effect.size
ensembl_75_21 513 GO:0000421 autophagosome membrane 13 0.380691116137044 0.594954040594745 6 7 7178 11485 1.19984581120439
ensembl_75_22 513 GO:0000421 autophagosome membrane 13 0.380691116137044 0.594954040594745 6 7 7178 11485 1.19984581120439
ensembl_76_23 523 GO:0000421 autophagosome membrane 13 0.380259824703545 0.586722950053927 6 7 7136 11425 1.20031019106908
ensembl_77_23 527 GO:0000421 autophagosome membrane 13 0.380259824703545 0.584789281887842 6 7 7136 11425 1.20031019106908
ensembl_77_24 527 GO:0000421 autophagosome membrane 13 0.380259824703545 0.584789281887842 6 7 7136 11425 1.20031019106908
ensembl_78_24 314 GO:0000421 autophagosome membrane 7 0.532909459319941 0.705255440732802 3 4 6482 10637 1.13179865623967
ensembl_78_25 314 GO:0000421 autophagosome membrane
# Annotations changed a lot - mostly lots of new mappings gene -> term:
[fg_atlas@ebi-cli-002 ~]$ comm -12 <( sort $ATLAS_PROD/bioentity_properties/archive/ensembl_88_35/mus_musculus.ensgene.go.tsv ) <( sort $ATLAS_PROD/bioentity_properties/archive/ensembl_89_36/mus_musculus.ensgene.go.tsv) | wc -l
304880
[fg_atlas@ebi-cli-002 ~]$ comm -13 <( sort $ATLAS_PROD/bioentity_properties/archive/ensembl_88_35/mus_musculus.ensgene.go.tsv ) <( sort $ATLAS_PROD/bioentity_properties/archive/ensembl_89_36/mus_musculus.ensgene.go.tsv) | wc -l
41488
[fg_atlas@ebi-cli-002 ~]$ comm -23 <( sort $ATLAS_PROD/bioentity_properties/archive/ensembl_88_35/mus_musculus.ensgene.go.tsv ) <( sort $ATLAS_PROD/bioentity_properties/archive/ensembl_89_36/mus_musculus.ensgene.go.tsv) | wc -l
3566
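The three counts above (lines in both files, lines only in the newer file, lines only in the older file) can be gathered in one helper. This is a minimal sketch: `compare_lines` is a hypothetical name, and forcing `LC_ALL=C` is an assumption made here so that `sort` and `comm` agree on ordering.

```shell
#!/usr/bin/env bash
# compare_lines OLD NEW
# Hypothetical helper: prints one count per comm mode, in the order used above.
#   -12  lines present in both files
#   -13  lines only in NEW
#   -23  lines only in OLD
compare_lines() {
  local old=$1 new=$2 flag
  for flag in -12 -13 -23; do
    # comm requires sorted input, so both files are sorted on the fly
    printf '%s\t%d\n' "$flag" \
      "$(LC_ALL=C comm "$flag" <(LC_ALL=C sort "$old") <(LC_ALL=C sort "$new") | wc -l)"
  done
}
```

It would be called with the two archived annotation files, for example `compare_lines "$ATLAS_PROD/bioentity_properties/archive/ensembl_88_35/mus_musculus.ensgene.go.tsv" "$ATLAS_PROD/bioentity_properties/archive/ensembl_89_36/mus_musculus.ensgene.go.tsv"`.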
# But it's not because there are new GO terms:
[fg_atlas@ebi-cli-002 ~]$ comm -12 <( cut -f2 $ATLAS_PROD/bioentity_properties/archive/ensembl_88_35/mus_musculus.ensgene.go.tsv | sort -u ) <( cut -f2 $ATLAS_PROD/bioentity_properties/archive/ensem
wbazant / 88-89.sh
Last active September 19, 2017 16:26
analyse(){
  geneSetFilePath=$1
  outputFile=$2
  analyticsFile=$ATLAS_EXPS/E-GEOD-59831/E-GEOD-59831-analytics.tsv
  # Build the command once, print it for the log, then run it
  cmd=(irap_GSE_piano --tsv="$analyticsFile" --pvalue-col=3 --foldchange-col=4 --title="title" --pvalue=0.05 --gs_fdr=0.1 --method=fisher-exact --dup-use-best --plot-annot-only --top=10 --minsize 0 --maxsize 100 --descr "$ATLAS_PROD/bioentity_properties/go/goIDToTerm.tsv.decorate.aux" --go="$geneSetFilePath" --out="$outputFile")
  echo "${cmd[@]}"
  "${cmd[@]}"
}
# comm expects sorted input, so sort both annotation files first
comm -12 <(sort /nfs/production3/ma/home/atlas3-production/bioentity_properties/archive/ensembl_88_35/mus_musculus.ensgene.go.tsv) <(sort /nfs/production3/ma/home/atlas3-production/bioentity_properties/archive/ensembl_89_36/mus_musculus.ensgene.go.tsv) > annotations/ensembl-88
ensembl_88_35 GO:0001786 phosphatidylserine binding 20 0.0288898209629177 0.159687286242028 12 8 5708 9851 1.63416083916084
89-without-new-genes GO:0001786 9 1.71609768345493e-05 0.00427621175382238 9 0 2783 6648 3.3810888252149
ensembl_89_36 GO:0001786 phosphatidylserine binding 9 0.000449020640444589 0.0838321535710048 9 0 1990 2703 2.35217608804402
ensembl_88_35 GO:0000421 autophagosome membrane 16 0.0321846660623193 0.161396928797408 10 6 5710 9853 1.70225087412587
89-without-new-genes GO:0000421 6 0.000666831275793296 0.0499810879684444 6 0 2786 6648 3.3810888252149
ensembl_89_36 GO:0000421 autophagosome membrane 6 0.00587902331165058 0.236541893799376 6 0 1993 2703 2.35217608804402
89-without-new-genes GO:0098609 6 0.0102106295997413 0.239044298531841 5 1 2787 6647 2.81757402101242
ensembl_89_36 GO:0098609 cell-cell adhesion 6 0.0536954726939721 0.466276500091376 5 1 1994 2702 1.96014674003668
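The last value on each row above appears to be the fraction of significant genes inside the gene set divided by the fraction of significant genes overall. That reading is inferred from the numbers themselves, not taken from the irap_GSE_piano documentation, so treat it as an assumption. A minimal sketch with a hypothetical helper name:

```shell
# effect_size SIG_IN NONSIG_IN SIG_OUT NONSIG_OUT
# Hypothetical helper; the formula is an assumption inferred from the rows above.
effect_size() {
  awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
    in_set  = a / (a + b)                 # fraction significant inside the gene set
    overall = (a + c) / (a + b + c + d)   # fraction significant among all genes
    printf "%.6f\n", in_set / overall
  }'
}

effect_size 12 8 5708 9851   # counts from the ensembl_88_35 row for GO:0001786
effect_size 9 0 1990 2703    # counts from the ensembl_89_36 row for GO:0001786
```

The two calls print 1.634161 and 2.352176, which agree with the last column of the corresponding rows after rounding.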
grep GO:0003713 /nfs/production3/ma/home/atlas3-production/bioentity_properties/archive/ensembl_89_36/mus_musculus.ensgene.go.tsv | cut -f 1 | sort -u > $ATLAS_PROD/go-unstable/pvalue-by-hand/genes-for-GO_0003713.txt
head -n 1 $ATLAS_EXPS/E-GEOD-59831/E-GEOD-59831-analytics.tsv > $ATLAS_PROD/go-unstable/pvalue-by-hand/E-GEOD-59831-analytics.tsv
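# Join on the tab-separated first column so the appended rows stay tab-delimited like the header
join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 $ATLAS_EXPS/E-GEOD-59831/E-GEOD-59831-analytics.tsv) $ATLAS_PROD/go-unstable/pvalue-by-hand/genes-for-GO_0003713.txt >> $ATLAS_PROD/go-unstable/pvalue-by-hand/E-GEOD-59831-analytics.tsv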
# (with headers)
wc -l $ATLAS_EXPS/E-GEOD-59831/E-GEOD-59831-analytics.tsv $ATLAS_PROD/go-unstable/pvalue-by-hand/E-GEOD-59831-analytics.tsv
# 45514 /nfs/public/ro/fg/atlas/experiments/E-GEOD-59831/E-GEOD-59831-analytics.tsv
# 177 /nfs/production3/ma/home/atlas3-production/go-unstable/pvalue-by-hand/E-GEOD-59831-analytics.tsv
analyse(){
  geneSetFilePath=$1
  outputFile=$2
  analyticsFile=$ATLAS_EXPS/E-GEOD-59831/E-GEOD-59831-analytics.tsv
  irap_GSE_piano --tsv="$analyticsFile" --pvalue-col=3 --foldchange-col=4 --title="title" --pvalue=0.05 --gs_fdr=0.1 --method=fisher-exact --dup-use-best --plot-annot-only --top=10 --minsize 5 --maxsize 100 --descr "$ATLAS_PROD/bioentity_properties/go/goIDToTerm.tsv.decorate.aux" --go="$geneSetFilePath" --out="$outputFile"
}
# a "normal" analysis
wbazant / re-script.md
Last active September 29, 2017 12:01
  1. Could you drop the exit at the bottom and put echo Finished! there instead? That makes it easier for us to tell that the program ran to completion. Add -S or --show-error to curl so that we see what went wrong, and make the output include the query, so that when a request fails we know which one it failed for. Imagine having just an HTTP failure code in the logs: not helpful to work with.

  2. Instead of tr -d '"', use jq -r ("raw" output). It does what you want :)

  3. https://www.shellcheck.net/ tells you things about the program. It says the script is mostly fine, with no warnings and merely notes. Could you apply at least the one about using $() instead of ``?

  4. I've been collecting useful jq programs on https://www.ebi.ac.uk/seqdb/confluence/display/GXA/JQ+tips and have just added exactly the one you want (get all baseline experiments) as:
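
The jq program referred to in point 4 is not reproduced in this note. Separately, points 1 and 3 could look roughly like the sketch below. The function names and the overridable `CURL` variable are assumptions made for illustration; they are not taken from the script under review.

```shell
#!/usr/bin/env bash
# CURL can be overridden so the sketch runs without a network (an assumption of this example).
CURL=${CURL:-curl}

# fetch URL: print the response body, or name the failed query on stderr.
fetch() {
  local url=$1
  # -s: no progress meter, -S: still show errors, -f: treat HTTP errors as failures
  if ! "$CURL" -sS -f "$url"; then
    echo "Request failed for query: $url" >&2
    return 1
  fi
}

# main URL...: fetch each URL, then report completion instead of calling exit.
main() {
  local url body
  for url in "$@"; do
    body=$(fetch "$url") || return 1   # $() rather than backticks
    printf '%s\n' "$body"
  done
  echo "Finished!"
}
```

With this shape, a failing request leaves both curl's own error message and the offending query in the log, and "Finished!" appears only when every request succeeded.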

[fg_atlas@ebi-cli-001 analysis_archive]$ git show --name-status dc1f641b89a31d1d4f6754d53fc6b9aa6f27d945 | grep "^D" | grep configs | sed "s@.*configs@$ATLAS_PROD@" | tail -n +3 | while read -r path ; do { exp=$(basename $(dirname $path)) ; mkdir -p $(dirname $path) ; cp -v $ATLAS_EXPS/$exp/$(basename $path) $path ; } ; done
‘/nfs/public/ro/fg/atlas/experiments/E-GEOD-29134/E-GEOD-29134-configuration.xml’ -> ‘/nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-29134/E-GEOD-29134-configuration.xml’
‘/nfs/public/ro/fg/atlas/experiments/E-GEOD-29134/E-GEOD-29134-factors.xml’ -> ‘/nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-29134/E-GEOD-29134-factors.xml’
‘/nfs/public/ro/fg/atlas/experiments/E-GEOD-29163/E-GEOD-29163-configuration.xml’ -> ‘/nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-29163/E-GEOD-29163-configuration.xml’
‘/nfs/public/ro/fg/atlas/experiments/E-GEOD-29163/E-GEOD-29163-factors.xm