@IsmailM
Last active August 24, 2018 19:25
Tutorial on how to download data from PGP-UK - https://www.personalgenomes.org.uk/data/
# Download PGP Data CSV
# Contains three columns:
# - PGP-UK Hex ID
# - Direct Link
# - Type of Data
wget https://www.personalgenomes.org.uk/data/data_file_links.csv
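# Optional sanity check (a sketch, not part of the original workflow).
# The cut and awk commands in this tutorial split each row on every comma, so they
# assume that no field contains a comma of its own. The hypothetical helper
# check_csv_columns prints how many rows do NOT have exactly three fields;
# 0 means the simple comma splitting is safe for that file.
check_csv_columns() {
  awk -F ',' 'NF != 3 { bad++ } END { print bad + 0 }' "$1"
}
# Usage: check_csv_columns data_file_links.csv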
# Display the types of data that are available, with the number of files of each type
cut -d ',' -f 3 data_file_links.csv | sort | uniq -c
# You will see something like this
# (run on Fri, 24 Aug 2018):
# 11 Methylation 450k Array Green Blood IDAT
# 13 Methylation 450k Array Green Saliva IDAT
# 11 Methylation 450k Array Red Blood IDAT
# 13 Methylation 450k Array Red Saliva IDAT
# 10 Transcriptomic - Amplicon Fastq
# 20 Transcriptomic - Proton RNA Sequence Fastq
# 30 Transcriptomic - RNAseq Fastq
# 11 VCF
# 11 VCF MD5
# 11 VCF Tabix Index
# 10 WGBS Bam
# 10 WGBS Bam Index
# 20 WGBS Fastq
# 11 WGS Bam
# 1 WGS Bam Index
# 90 WGS Cram
# 203 WGS Fastq
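# A variation (a sketch): appending `sort -rn` to the counting pipeline lists the
# most common data types first. count_data_types is a hypothetical helper name;
# it takes the path to the CSV file.
count_data_types() {
  cut -d ',' -f 3 "$1" | sort | uniq -c | sort -rn
}
# Usage: count_data_types data_file_links.csv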
### To extract the links for a certain type of data
# Here, we use `awk` to select rows where the third field (Type of Data) is equal to "VCF Tabix Index".
# We then print the second field (the URL) and write the results to a text file.
awk -F ',' '$3 == "VCF Tabix Index" {print $2}' data_file_links.csv > download_urls.txt
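# A reusable version of the awk filter (a sketch, assuming the three-column layout
# described above). The hypothetical helper extract_urls takes the data type as its
# first argument and, optionally, a PGP-UK Hex ID as its second argument to restrict
# the results to one participant. Passing the values in with `awk -v` avoids editing
# the quoted awk program for every new data type.
extract_urls() {
  data_type="$1"
  hex_id="${2:-}"   # optional; empty means "all participants"
  awk -F ',' -v type="$data_type" -v id="$hex_id" \
    '$3 == type && (id == "" || $1 == id) { print $2 }' data_file_links.csv
}
# Usage (HEX_ID is a placeholder, not a real participant ID):
#   extract_urls "VCF Tabix Index" > download_urls.txt
#   extract_urls "VCF" HEX_ID > download_urls.txt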
# To download these URLs sequentially
wget -i download_urls.txt
# Downloading in parallel
## Using xargs (tested with the macOS xargs and GNU xargs on Ubuntu)
# -n 1 passes one URL to each wget call; -P 8 runs up to 8 downloads at once
xargs -n 1 -P 8 wget -q < download_urls.txt
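# A resumable variant (a sketch; download_parallel and DRY_RUN are hypothetical names).
# wget -c continues partially downloaded files, so re-running the command after an
# interruption picks up where it stopped instead of starting large files over.
# Because -q silences wget, check the exit status: xargs exits non-zero if any
# wget invocation failed. Set DRY_RUN=1 to print the commands instead of running them.
download_parallel() {
  url_file="$1"
  jobs="${2:-8}"   # number of simultaneous downloads (default 8)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    xargs -n 1 -P "$jobs" echo wget -q -c < "$url_file"
  else
    xargs -n 1 -P "$jobs" wget -q -c < "$url_file"
  fi
}
# Usage: download_parallel download_urls.txt 8 || echo "some downloads failed"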
## Alternative option with GNU Parallel
# Requires prior installation of GNU Parallel - https://www.gnu.org/software/parallel/
# Set the number of parallel jobs using the -j argument (set to 8 below)
parallel -a download_urls.txt -j 8 wget {}