@wbazant
Last active October 3, 2017 15:32

Summary

These experiments have transcript TPM files with columns that are entirely NA, and our processing fails on them:

  • E-MTAB-2039 Oryza sativa, Nipponbare RNA-Seq - Conserved Poaceae Specific Genes Project
  • E-MTAB-4400 Sorghum bicolor, BTx623 RNA-Seq - Conserved Poaceae Specific Genes Project
  • E-MTAB-4401 Brachypodium distachyon, Bd21 RNA-Seq - Conserved Poaceae Specific Genes Project
  • E-MTAB-4818 Solanum lycopersicum Transcriptome or Gene expression
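
To catch affected experiments before normalization runs, something like the following could flag columns that are entirely NA. This is a minimal sketch, not part of the existing pipeline. It assumes a tab-separated matrix whose first line is the header and whose first column is the row ID, as in the `*-transcripts-tpms.tsv.undecorated` files below. The function name `all_na_columns` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the header name of every column whose data
# rows are all "NA". Assumes a tab-separated file with a header line and
# the row ID (e.g. "Transcript ID") in column 1; column 1 is not checked.
# Note: a file with a header but no data rows reports every column.
all_na_columns() {
    awk '
        BEGIN { FS = "\t" }
        NR == 1 {
            ncol = NF
            for (i = 2; i <= NF; i++) { name[i] = $i; allna[i] = 1 }
            next
        }
        {
            # Any value other than the literal string NA clears the flag
            for (i = 2; i <= ncol; i++)
                if ($i != "NA") allna[i] = 0
        }
        END {
            for (i = 2; i <= ncol; i++)
                if (allna[i]) print name[i]
        }
    ' "$1"
}

# Example: list the TPM files that have at least one all-NA column
# for f in /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/*/*-transcripts-tpms.tsv.undecorated; do
#     [ -n "$(all_na_columns "$f")" ] && echo "$f"
# done
```

The check compares against the literal string `NA` rather than testing numerically, so genuine zeros (common in these files) are not mistaken for missing values.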

This one probably also needs an ISL rerun, because not all of its assays are present in the output files:

  • E-GEOD-42871

Details

E-MTAB-2039

Most columns in the transcript TPM data file are NA:

[wbazant@ebi-cli-001 ~]$ head /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-MTAB-2039/E-MTAB-2039-transcripts-tpms.tsv.undecorated
Transcript ID	SRR352184	SRR352187	SRR352189	SRR352190	SRR352192	SRR352194	SRR352204	SRR352206	SRR352207	SRR352209	SRR352211
OS11T0599200-01	33.18	107.81	NA	NA	NA	NA	NA	NA	NA	NA	NA
OS11T0599101-01	0	0.09	NA	NA	NA	NA	NA	NA	NA	NA	NA
OS11T0599100-01	0.01	0.06	NA	NA	NA	NA	NA	NA	NA	NA	NA
OS11T0599000-00	0	0.03	NA	NA	NA	NA	NA	NA	NA	NA	NA
OS11T0598900-00	0	0.67	NA	NA	NA	NA	NA	NA	NA	NA	NA
OS11T0598800-01	0	3.67	NA	NA	NA	NA	NA	NA	NA	NA	NA
OS11T0598700-00	0	0.14	NA	NA	NA	NA	NA	NA	NA	NA	NA
OS11T0598500-00	0.3	0.32	NA	NA	NA	NA	NA	NA	NA	NA	NA
OS11T0598300-00	3.38	7.71	NA	NA	NA	NA	NA	NA	NA	NA	NA

The quantile normalization step then fails like this:

INFO  - Reading XML config from /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-MTAB-2039/E-MTAB-2039-configuration.xml ...
INFO  - Successfully read XML config.
INFO  - Reading matrix from /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-MTAB-2039/E-MTAB-2039-transcripts-tpms.tsv.undecorated ...
INFO  - Successfully read /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-MTAB-2039/E-MTAB-2039-transcripts-tpms.tsv.undecorated
INFO  - Running quantile normalization...
INFO  - Running quantile normalization in R...
FATAL - Errors during R script execution, details below:
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
Calls: lapply ... normalizeQuantiles -> approx -> regularize.values -> xy.coords
Execution halted
Errors during R script execution, details below:
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
Calls: lapply ... normalizeQuantiles -> approx -> regularize.values -> xy.coords
Execution halted

I think the NAs in the data file are there because the raw counts for these assays are all zero in /nfs/production3/ma/home/irap_prod/single_lib/studies/E-MTAB-2039/oryza_sativa/transcripts.raw.kallisto.tsv

This only affects the transcript-level results; the gene-level pipeline does pick up counts: /nfs/production3/ma/home/irap_prod/single_lib/studies/E-MTAB-2039/oryza_sativa/genes.raw.tsv
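
To test the all-zero hypothesis, one could list the assays whose raw transcript counts are zero in every row of `transcripts.raw.kallisto.tsv` and compare that list with the NA columns of the TPM file (and with `genes.raw.tsv`, where the same assays should have counts). A minimal sketch, assuming the iRAP raw-count file has a header line and the feature ID in column 1; `zero_count_columns` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the header name of every column in which all
# data rows are numerically zero, e.g. assays with no reads assigned in a
# raw-count matrix. Assumes a tab-separated file with a header line and the
# feature ID in column 1. Caveat: awk converts non-numeric strings such as
# "NA" to 0, so they do not clear the all-zero flag.
zero_count_columns() {
    awk '
        BEGIN { FS = "\t" }
        NR == 1 {
            ncol = NF
            for (i = 2; i <= NF; i++) { name[i] = $i; allzero[i] = 1 }
            next
        }
        {
            # Any non-zero value (counts may be fractional) clears the flag
            for (i = 2; i <= ncol; i++)
                if ($i + 0 != 0) allzero[i] = 0
        }
        END {
            for (i = 2; i <= ncol; i++)
                if (allzero[i]) print name[i]
        }
    ' "$1"
}

# Example (path from the notes above):
# zero_count_columns /nfs/production3/ma/home/irap_prod/single_lib/studies/E-MTAB-2039/oryza_sativa/transcripts.raw.kallisto.tsv
```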

E-MTAB-4400

Probably the same cause as E-MTAB-2039, since the error is the same. The transcript TPM file looks like this:

Transcript ID	SRR349643	SRR349644	SRR349645	SRR349646	SRR349754	SRR349767	SRR349768	SRR349769	SRR349771	SRR349772
EES11166	2.97	11.41	28.81	NA	NA	NA	NA	NA	NA	NA
EES11195	4.06	4.12	1.93	NA	NA	NA	NA	NA	NA	NA
EES11310	1.29	5.78	9.52	NA	NA	NA	NA	NA	NA	NA
EES12864	106.56	332.3	537.51	NA	NA	NA	NA	NA	NA	NA
KXG27123	13.73	22.64	3.17	NA	NA	NA	NA	NA	NA	NA
KXG27122	3.36	8.08	8	NA	NA	NA	NA	NA	NA	NA
EES12867	0	0	0	NA	NA	NA	NA	NA	NA	NA
KXG25909	102.97	56.82	46.25	NA	NA	NA	NA	NA	NA	NA
KXG25908	0	0	39.23	NA	NA	NA	NA	NA	NA	NA

E-MTAB-4401

Same:

[fg_atlas@ebi-cli-001 ~]$ head /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-MTAB-4401/E-MTAB-4401-transcripts-tpms.tsv.undecorated
Transcript ID	SRR349785	SRR349786	SRR349787	SRR352137	SRR352138	SRR352139	SRR352140	SRR352141	SRR352142	SRR352143	SRR352144
BRADI0007S00200.2	0.03	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
BRADI0007S00210.2	565.61	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
BRADI0007S00220.2	0.03	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
BRADI0007S00233.1	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
BRADI0009S00203.1	0.07	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
BRADI0009S00210.5	25.52	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
BRADI0009S00210.6	0.52	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
BRADI0009S00210.7	172.63	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA
BRADI0009S00220.2	0	NA	NA	NA	NA	NA	NA	NA	NA	NA	NA

E-MTAB-4818

Same:

head /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-MTAB-4818/E-MTAB-4818-transcripts-tpms.tsv.undecorated
Transcript ID	SRR346617	SRR346618	SRR346619	SRR346620	SRR346621	SRR346622	SRR346623	SRR346624	SRR346625	SRR346626	SRR346627	SRR346628	SRR346629	SRR346630	SRR346631	SRR346632	SRR346633	SRR346634	SRR346635	SRR346636
Solyc02g062170.2.1	NA	NA	NA	3.18	NA	NA	NA	0	NA	NA	NA	0	NA	0	0.41	0	0.18	0	2.17	6.13
Solyc02g070370.2.1	NA	NA	NA	1.19	NA	NA	NA	1.69	NA	NA	NA	0.43	NA	0.57	0.43	1.56	0	0.22	0.28	0
Solyc02g085580.2.1	NA	NA	NA	0.45	NA	NA	NA	1.37	NA	NA	NA	2.05	NA	13.51	12.12	18.79	0.08	0.08	2.64	4.33
Solyc02g088390.2.1	NA	NA	NA	200.07	NA	NA	NA	41.31	NA	NA	NA	20.48	NA	33.14	51.74	113.17	549.54	247.02	247.46	44.25
Solyc03g097980.2.1	NA	NA	NA	6.05	NA	NA	NA	7.77	NA	NA	NA	7.31	NA	6.97	7.28	8.83	0.48	0.2	1.46	0
Solyc03g111820.2.1	NA	NA	NA	0.37	NA	NA	NA	0	NA	NA	NA	0	NA	0	0.86	0	2.3	3.36	46.51	0
Solyc03g121880.2.1	NA	NA	NA	34.2	NA	NA	NA	82.24	NA	NA	NA	80.96	NA	90.29	94.3	86.36	140.75	21.66	60.32	58.36
Solyc04g071620.2.1	NA	NA	NA	38.52	NA	NA	NA	582.73	NA	NA	NA	192.3	NA	17.05	36.8	66.95	11.58	8.73	217.61	530.28
Solyc04g076870.2.1	NA	NA	NA	50.14	NA	NA	NA	75.43	NA	NA	NA	33.47	NA	52.65	43.39	31.55	76.18	14.76	42.83	77.61

E-GEOD-42871

The ISL results are missing columns, and were not synced to $ATLAS_PROD/analysis.

configuration.xml lists 16 assays and dates from February 2016:

[fg_atlas@ebi-cli-001 analysis_archive]$ ls -latorh /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/E-GEOD-42871-configuration.xml
-rw-r--r-- 1 fg_atlas 1.8K Feb 8 2016 /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/E-GEOD-42871-configuration.xml
[fg_atlas@ebi-cli-001 analysis_archive]$ grep '' /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/E-GEOD-42871-configuration.xml | wc -l
16

The undecorated data files in the folder match configuration.xml: 17 columns, one of which is the gene name.

find /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871 -type f -name '*tsv*' | while read -r file ; do ls -latorh "$file"; head -n1 "$file" | tr $'\t' $'\n' | wc -l ; done
-rw-r--r-- 1 fg_atlas 2.7K Feb  2  2016 /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/qc/E-GEOD-42871-findCRAMFiles-report.tsv
5
-rw-r--r-- 1 fg_atlas 1.1K Feb  2  2016 /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/E-GEOD-42871-analysis-methods.tsv
2
-rw-r--r-- 1 fg_atlas 3.4M Feb  2  2016 /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/E-GEOD-42871-raw-counts.tsv.undecorated
17
-rw-r--r-- 1 fg_atlas 4.5M May 19 12:58 /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/E-GEOD-42871-tpms.tsv.undecorated
17
-rw-r--r-- 1 fg_atlas 6.9M May 19 12:59 /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/E-GEOD-42871-tpms.tsv.undecorated.aggregated
10
-rw-r--r-- 1 fg_atlas 6.8M Sep 27 04:12 /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/E-GEOD-42871-fpkms.tsv
11
-rw-r--r-- 1 fg_atlas 62M Jul 11 12:06 /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/E-GEOD-42871-coexpressions.tsv.gz
1
-rw-r--r-- 1 fg_atlas 6.8M Feb  2  2016 /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/E-GEOD-42871-fpkms.tsv.undecorated.aggregated
10
-rw-r--r-- 1 fg_atlas 6.9M Sep 27 04:14 /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/E-GEOD-42871-tpms.tsv
11
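ISL has files that are too short: the ones dated Sep 4 have only 14 columns, where the analysis files have 17 (the dexseq exon files from June 2016 still have 17).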

Missing these runs: SRR639163 SRR639165

find /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max -type f -name '*.tsv' | while read -r file ; do ls -latorh "$file"; head -n1 "$file" | tr $'\t' $'\n' | wc -l ; done
-rw-r--r-- 1 fg_atlas 26M Jun 30  2016 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/exons.tpm.dexseq.tsv
17
-rw-r--r-- 1 fg_atlas 4.9M Sep  4 15:14 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/transcripts.fpkm.kallisto.tsv
14
-rw-r--r-- 1 fg_atlas 3.7M Sep  4 15:14 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/genes.fpkm.htseq2.tsv
14
-rw-r--r-- 1 fg_atlas 27M Jun 30  2016 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/exons.fpkm.dexseq.tsv
17
-rw-r--r-- 1 fg_atlas 1.2K Sep  4 15:12 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/irap.versions.tsv
4
-rw-r--r-- 1 fg_atlas 19M Jun 30  2016 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/exons.raw.dexseq.tsv
17
-rw-r--r-- 1 fg_atlas 3.7M Sep  4 15:14 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/genes.tpm.htseq2.tsv
14
-rw-r--r-- 1 fg_atlas 2.9M Sep  4 15:14 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/genes.raw.htseq2.tsv
14
-rw-r--r-- 1 fg_atlas 5.0M Sep  4 15:14 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/transcripts.tpm.kallisto.tsv
14
-rw-r--r-- 1 fg_atlas 3.7M Sep  4 15:14 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/genes.tpm.kallisto.tsv
14
-rw-r--r-- 1 fg_atlas 3.7M Sep  4 15:14 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/genes.fpkm.kallisto.tsv
14
-rw-r--r-- 1 fg_atlas 3.6M Sep  4 15:14 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/transcripts.riu.kallisto.tsv
14
-rw-r--r-- 1 fg_atlas 6.0M Sep  4 15:14 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/transcripts.raw.kallisto.tsv
14
-rw-r--r-- 1 fg_atlas 4.4M Sep  4 15:14 /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/genes.raw.kallisto.tsv
14
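
Counting columns shows that runs are missing but not which ones. To name them, the header of a complete matrix can be compared with the header of an ISL output. A minimal sketch, assuming both files are tab-separated and label every column after the first (ID) column with a run accession; `missing_runs` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the header columns (after the first, ID column)
# that appear in the reference file $1 but not in the file to check $2.
# Assumes both files are tab-separated and label their columns with run
# accessions; only the first line of each file is read.
missing_runs() {
    { head -n 1 "$1"; head -n 1 "$2"; } | awk '
        BEGIN { FS = "\t" }
        NR == 1 { nref = NF; for (i = 2; i <= NF; i++) ref[i] = $i; next }
        NR == 2 { for (i = 2; i <= NF; i++) have[$i] = 1 }
        END {
            # Report reference columns in their original order
            for (i = 2; i <= nref; i++)
                if (!(ref[i] in have)) print ref[i]
        }
    '
}

# Example, comparing the 17-column analysis file with a 14-column ISL file:
# missing_runs \
#   /nfs/production3/ma/home/atlas3-production/analysis/baseline/rna-seq/experiments/E-GEOD-42871/E-GEOD-42871-raw-counts.tsv.undecorated \
#   /nfs/production3/ma/home/irap_prod/single_lib/studies/E-GEOD-42871/glycine_max/genes.raw.htseq2.tsv
```

Reading only the header lines keeps this cheap even for the multi-megabyte matrices listed above.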