@wbazant
wbazant / asciimath.txt
Created October 20, 2016 11:01
Differential analytics download
Abbreviations
----
v value
exp experiment
c contrast
id identifier
as analytics search, which the query either matches or it doesn't
cs conditions search, which the query either matches or it doesn't
Sets considered
echo "Clear previous data from all public subdirectories of ${ATLAS_FTP}/bioentity_properties"
for dir in ensembl mirbase reactome go interpro wbps; do
# ${ATLAS_FTP:?} aborts if ATLAS_FTP is unset or empty, instead of deleting from /bioentity_properties
rm -rf "${ATLAS_FTP:?}/bioentity_properties/${dir}/"*
done
echo "Copy all array design mapping files into the public Ensembl data directory (this directory is used only for the Solr index build)"
cp "${ATLAS_PROD}"/bioentity_properties/ensembl/*.A-*.tsv "${ATLAS_FTP}/bioentity_properties/ensembl/"
pushd "${ATLAS_PROD}/bioentity_properties/ensembl"
echo "Copy all Ensembl matrices to the public Ensembl data directory"
#!/bin/bash
# Take the file's header line, split it into one column name per line, then use grep -n to find the position of mgi_id
head -n1 "$ATLAS_FTP/bioentity_properties/ensembl/mus_musculus.ensgene.tsv" | tr '\t' '\n' | grep -n mgi_id | cut -f 1 -d :
# 16
head -n1 "$ATLAS_PROD/bioentity_properties/ensembl/mus_musculus.ensgene.tsv" | tr '\t' '\n' | grep -n mgi_id | cut -f 1 -d :
# 16
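The same lookup can be wrapped in a small helper. This is a minimal sketch that assumes a tab-separated file with a single header line. The function name `column_index` is hypothetical and not part of the Atlas scripts. It matches the column name exactly. `grep -n mgi_id` would also report any other header that merely contains `mgi_id`.

```bash
#!/bin/bash
# column_index FILE NAME  (hypothetical helper)
# Print the 1-based position of the column whose header is exactly NAME
# in the first line of the tab-separated FILE; print nothing if it is absent.
column_index() {
  local file="$1" name="$2"
  head -n1 -- "$file" | awk -F'\t' -v col="$name" '
    { for (i = 1; i <= NF; i++) if ($i == col) { print i; exit } }'
}
```

Usage: `column_index "$ATLAS_FTP/bioentity_properties/ensembl/mus_musculus.ensgene.tsv" mgi_id`. Running it against both the FTP and the production copy and comparing the results performs the same check as the two commands above.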
[fg_atlas@ebi-005 ~]$ ls $ATLAS_EXPS | grep 'E-.*' | tee /var/tmp/atlas-exps-exps | wc -l
3165
[fg_atlas@ebi-005 ~]$ ls $ATLAS_PROD/analysis/*/*/experiments | grep 'E-.*' | tee /var/tmp/atlas-prod-exps | wc -l
3244
[fg_atlas@ebi-005 ~]$ sort /var/tmp/atlas-prod-exps -o /var/tmp/atlas-prod-exps
[fg_atlas@ebi-005 ~]$ sort /var/tmp/atlas-exps-exps -o /var/tmp/atlas-exps-exps
# experiments in $ATLAS_EXPS but not in $ATLAS_PROD (comm -13 prints only lines unique to the second file)
[fg_atlas@ebi-005 ~]$ comm -13 /var/tmp/atlas-prod-exps /var/tmp/atlas-exps-exps
E-GEOD-53580
E-GEOD-73312
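The listing and sorting steps above can be folded into one function, so that no temporary files under /var/tmp are needed. This is a sketch that assumes experiment directories are named with an `E-` prefix. The function name `list_experiments` is hypothetical. It anchors the pattern as `^E-`, whereas the original `grep 'E-.*'` matches `E-` anywhere in a name.

```bash
#!/bin/bash
# list_experiments DIR...  (hypothetical helper)
# Print the sorted, de-duplicated experiment accessions (entries whose
# names start with "E-") found directly inside each DIR.
list_experiments() {
  local dir
  for dir in "$@"; do
    ls -1 -- "$dir"
  done | grep '^E-' | sort -u
}
```

Usage, equivalent to the session above: `comm -13 <(list_experiments "$ATLAS_PROD"/analysis/*/*/experiments) <(list_experiments "$ATLAS_EXPS")`. Both inputs to `comm` go through the same `sort`, so they share a collation order, which `comm` requires.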
[fg_atlas@ebi-005 E-ATMX-20]$ for exp in $(comm -13 <( ls $ATLAS_EXPS | grep 'E-.*' | sort ) <( ls $ATLAS_FTP/experiments | grep 'E-.*' | sort )) ; do find $ATLAS_FTP/experiments/$exp -name '*-analytics.tsv' ; done | xargs ls -ld
-rw-r--r-- 1 fg_atlas microarray 5561657 Mar 2 2014 /ebi/ftp/pub/databases/microarray/data/atlas/experiments/E-GEOD-10041/E-GEOD-10041_A-AFFY-44-analytics.tsv
-rw-r--r-- 1 fg_atlas microarray 1062342 Feb 19 2014 /ebi/ftp/pub/databases/microarray/data/atlas/experiments/E-GEOD-12332/E-GEOD-12332_A-AFFY-17-analytics.tsv
-rw-r--r-- 1 fg_atlas microarray 7589235 Apr 22 2014 /ebi/ftp/pub/databases/microarray/data/atlas/experiments/E-GEOD-12667/E-GEOD-12667_A-AFFY-44-analytics.tsv
-rw-r--r-- 1 fg_atlas microarray 3393121 Mar 2 2014 /ebi/ftp/pub/databases/microarray/data/atlas/experiments/E-GEOD-12767/E-GEOD-12767_A-AFFY-44-analytics.tsv
-rw-r--r-- 1 fg_atlas microarray 1330410 Mar 14 2016 /ebi/ftp/pub/databases/microarray/data/atlas/experiments/E-GEOD-14895/E-GEOD-14895_A-AFFY-33-ana
[fg_atlas@ebi-005 archive]$ mkdir /var/tmp/array-designs
[fg_atlas@ebi-005 archive]$ cp ensembl_87_34/homo_sapiens*A-*.tsv /var/tmp/array-designs
[fg_atlas@ebi-005 archive]$ cd /var/tmp/array-designs
[fg_atlas@ebi-005 array-designs]$ for file in `ls ` ; do cat <(echo -e "ensgene\tdesign_element") <(grep -E "\w+\W\w+" $file) > $file.tmp ; mv $file.tmp $file ; done
[fg_atlas@ebi-005 array-designs]$ wc -l *
8684 homo_sapiens.A-AFFY-10.tsv
6938 homo_sapiens.A-AFFY-11.tsv
5155 homo_sapiens.A-AFFY-12.tsv
6835 homo_sapiens.A-AFFY-13.tsv
25196 homo_sapiens.A-AFFY-141.tsv
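The header-adding loop above can also be written as a function. This sketch keeps the same `grep -E '\w+\W\w+'` filter, which is a GNU grep extension. It iterates over a glob instead of parsing `ls`, and it leaves the file untouched if grep reports an error. The name `add_header` is hypothetical. Like the original loop, it is not idempotent: the header line itself passes the filter, so running it twice produces two header lines.

```bash
#!/bin/bash
# add_header FILE  (hypothetical helper)
# Rewrite FILE in place: first the header "ensgene<TAB>design_element",
# then only the lines matching grep -E '\w+\W\w+' (two word tokens separated
# by a non-word character), the same filter as the loop above.
add_header() {
  local file="$1" tmp="$1.tmp"
  {
    printf 'ensgene\tdesign_element\n'
    grep -E '\w+\W\w+' "$file"
    # grep exits 1 when nothing matches; only exit status 2 (an error) counts as failure
    [ $? -le 1 ]
  } > "$tmp" && mv -- "$tmp" "$file" || { rm -f -- "$tmp"; return 1; }
}

# Apply it to every array design file in the current directory:
# for file in *.tsv; do add_header "$file"; done
```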
--- Analytics index build started Tue Jan 31 10:01:04 GMT 2017 ---
Experiments to index: [E-MTAB-2770, E-MTAB-2706, E-MTAB-4840, E-GEOD-48433, E-GEOD-26284, E-MTAB-4342, E-GEOD-18858, E-MTAB-2512, E-MTAB-3579, E-MTAB-4404, E-MTAB-5214, E-MTAB-2919, E-MTAB-76, E-GEOD-53960, E-GEOD-36272, E-MTAB-4045, E-MTAB-1354, E-GEOD-16837, E-GEOD-49284, E-MTAB-4395, E-MTAB-4484, E-MTAB-4222, E-MTAB-3838, E-MTAB-2836, E-MTAB-1729, E-MTAB-3358, E-MTAB-3827, E-GEOD-60424, E-GEOD-53197, E-GEOD-18344, E-MTAB-5140, E-TABM-234, E-MTAB-4270, E-MTAB-1027, E-GEOD-66354, E-GEOD-16879, E-ERAD-401, E-GEOD-18791, E-TABM-142, E-MTAB-4444, E-MTAB-3871, E-GEOD-3307, E-MTAB-2801, E-MTAB-2812, E-GEOD-41678, E-MTAB-2137, E-GEOD-30495, E-GEOD-62673, E-GEOD-15129, E-GEOD-33643, E-GEOD-38351, E-GEOD-63252, E-GEOD-25250, E-MTAB-513, E-GEOD-16992, E-GEOD-63362, E-GEOD-45757, E-JJRD-1, E-GEOD-37940, E-GEOD-38023, E-TABM-1216, E-TABM-1205, E-MTAB-4260, E-TABM-82, E-GEOD-58603, E-MTAB-4289, E-GEOD-49050, E-GEOD-40844, E-MTAB-4308, E-GEOD-13637, E-GE
Gene ID	Gene Name	g1_g2.p-value	g1_g2.log2foldchange	g1_g3.p-value	g1_g3.log2foldchange	g1_g4.p-value	g1_g4.log2foldchange	g5_g6.p-value	g5_g6.log2foldchange	g5_g7.p-value	g5_g7.log2foldchange	g5_g8.p-value	g5_g8.log2foldchange
ENSG00000000003	TSPAN6	0.00455588947600315	-0.2	1.47389153905488e-05	0.3	5.25735911012297e-74	1.1	0.137744420192993	-0.2	0.0192978221738748	-0.2	1.69287277016467e-07	0.6
ENSG00000000005	TNMD	NA	0	NA	0.2	NA	0.3	NA	0	NA	-0.1	NA	0
ENSG00000124635	HIST1H2BJ	2.062881127556e-210	-1.2	0	-1.5	0	-2.6	2.46220174669173e-223	-1.5	0	-1.5	0	-2.7
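In this wide layout, each contrast (`g1_g2`, `g1_g3`, ...) contributes a `.p-value` column and a `.log2foldchange` column. The sketch below reshapes such a file into one line per gene and contrast, which is easier to filter. It assumes the file is tab-separated and that every contrast's p-value column comes directly before its log2foldchange column, as in the sample above. The function name `to_long` and the output column names are hypothetical.

```bash
#!/bin/bash
# to_long FILE  (hypothetical helper)
# Convert a wide differential TSV (Gene ID, Gene Name, then
# <contrast>.p-value / <contrast>.log2foldchange column pairs) into long form:
# gene_id, gene_name, contrast, p_value, log2foldchange.
to_long() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 {
      print "gene_id", "gene_name", "contrast", "p_value", "log2foldchange"
      for (i = 3; i < NF; i += 2) {
        contrast[i] = $i
        sub(/\.p-value$/, "", contrast[i])   # "g1_g2.p-value" -> "g1_g2"
      }
      next
    }
    { for (i = 3; i < NF; i += 2) print $1, $2, contrast[i], $i, $(i + 1) }
  ' "$1"
}
```

`NA` p-values, as in the TNMD row, are passed through unchanged.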
cd ~/dev/atlas-annotations
# modify the annsrcs to be (much) smaller
# test environment
source util/create_test_env.sh /var/tmp/will-it-go
# use the whole file from prod
scp ebi-cli:/nfs/production3/ma/home/atlas3-production/bioentity_properties/go/go.alternativeID2CanonicalID.tsv /var/tmp/will-it-go/bioentity_properties/go/go.alternativeID2CanonicalID.tsv
# get some properties populated in test environment
export JAVA_OPTS=-Xmx3000M