SRA download to Google Cloud Storage
name: sra_download
description: "Use the Google Genomics Pipelines API to download an SRA run, reformat it as unaligned BAM, and upload it to Google Cloud Storage. Run it like this: gcloud alpha genomics pipelines run --inputs SAMPLE=XXXXX --inputs RUN=XXXXX --outputs OUTPUT_FILE=gs://XXXXX --pipeline-file=sra_download.yaml"
resources:
  #increase boot disk from 10GB to 50GB to accommodate intermediate files
  bootDiskSizeGb: 50
  #list several zones so the pipeline can be scheduled in any of them, which helps when many runs are launched in parallel
  zones:
  - us-west1-a
  - us-west1-b
  - us-east1-b
  - us-east1-c
  - us-east1-d
  - us-central1-a
  - us-central1-b
  - us-central1-c
  - us-central1-f
inputParameters:
- name: SAMPLE
- name: RUN
outputParameters:
- name: OUTPUT_FILE
  defaultValue: gs://allenday-dev/oryza/primary/${SAMPLE}/${RUN}.bam
  localCopy:
    disk: boot
    path: /tmp/output.bam
docker:
  imageName: index.docker.io/allenday/bfx-seq
  #SAMPLE and RUN are double-quoted (escaped for YAML) so the shell expands them; single quotes would pass the literal text ${SAMPLE} and ${RUN} to picard-tools
  cmd: "cd /tmp && /opt/sratoolkit/bin/fastq-dump --split-files -F -O . ${RUN} && picard-tools FastqToSam F1=${RUN}_1.fastq F2=${RUN}_2.fastq O=${RUN}.unsort.bam SAMPLE_NAME=\"${SAMPLE}\" READ_GROUP_NAME=\"${RUN}\" && samtools sort ${RUN}.unsort.bam ${RUN}.sort && picard-tools MarkDuplicates I=${RUN}.sort.bam O=output.bam M=${RUN}.metrics"
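  #
  #A sketch of the same command written as a multi-line script, one tool per
  #line, which may be easier to review. It assumes the Pipelines API accepts a
  #newline-delimited script in cmd and runs it in bash; treat that as an
  #assumption to verify, not something this file demonstrates. The steps are
  #unchanged: fastq-dump writes ${RUN}_1.fastq and ${RUN}_2.fastq (paired-end
  #runs), FastqToSam builds an unaligned BAM, samtools sort (legacy
  #"output prefix" syntax, as in the command above) writes ${RUN}.sort.bam,
  #and MarkDuplicates writes /tmp/output.bam, the path named in localCopy.
  #SAMPLE and RUN are double-quoted here so that bash expands them.
  #
  #cmd: |
  #  cd /tmp &&
  #  /opt/sratoolkit/bin/fastq-dump --split-files -F -O . ${RUN} &&
  #  picard-tools FastqToSam F1=${RUN}_1.fastq F2=${RUN}_2.fastq O=${RUN}.unsort.bam SAMPLE_NAME="${SAMPLE}" READ_GROUP_NAME="${RUN}" &&
  #  samtools sort ${RUN}.unsort.bam ${RUN}.sort &&
  #  picard-tools MarkDuplicates I=${RUN}.sort.bam O=output.bam M=${RUN}.metrics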