@IsmailM
Last active January 28, 2025 15:03
Using the PGP API to get FASTQ URLs, MD5s and sizes
# The commands below use jq (https://stedolan.github.io/jq/) and
# the PGP API v1.2 (https://www.personalgenomes.org.uk/api/v1.2/).
curl -X GET "https://www.personalgenomes.org.uk/api/v1.2/all_wgs" -H "accept: application/json" | jq -r '
.[] | [
.hex_id,
(.data[]?.fastq_ftp),
(.data[]?.fastq_md5),
(.data[]?.fastq_bytes | split(";") | .[] | tonumber | . /1024/1024/1024)
] | flatten | @csv' > wgs_fastqs.csv
# Note: some records have three FASTQ files, so the CSV columns do not line up across rows :(
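# One way around the misaligned columns is to emit one CSV row per FASTQ file
# instead of one row per record. The following is only a sketch: it assumes
# that fastq_ftp, fastq_md5 and fastq_bytes are ";"-separated strings with the
# same number of entries. The sample JSON and the rows_per_file helper are
# hypothetical and stand in for the real all_wgs response.

```shell
# Hypothetical sample mimicking the assumed shape of one all_wgs record.
sample='[{"hex_id":"ukTEST01","data":[{"fastq_ftp":"ftp.example.org/a_1.fastq.gz;ftp.example.org/a_2.fastq.gz","fastq_md5":"aaa;bbb","fastq_bytes":"1073741824;2147483648"}]}]'

# Read the JSON array on stdin and print one CSV row per FASTQ file:
# hex_id, url, md5, size in GiB.
rows_per_file() {
  jq -r '
    .[] | .hex_id as $id
    | .data[]?
    | [ (.fastq_ftp   | split(";")),
        (.fastq_md5   | split(";")),
        (.fastq_bytes | split(";")) ]
    | transpose[]
    | [ $id, .[0], .[1], (.[2] | tonumber / 1024 / 1024 / 1024) ]
    | @csv'
}

printf '%s' "$sample" | rows_per_file
```

# Against the live API, the same helper could be fed from curl, e.g.
# curl -s ".../api/v1.2/all_wgs" -H "accept: application/json" | rows_per_file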
# The 3 exome sequencing datasets
# Note this endpoint is not documented, but it exists (sorry)
curl -X GET "https://www.personalgenomes.org.uk/api/v1.2/all_wxs" -H "accept: application/json" | jq -r '
.[] | [
.hex_id,
(.data[]?.fastq_ftp),
(.data[]?.fastq_md5),
(.data[]?.fastq_bytes | split(";") | .[] | tonumber | . /1024/1024/1024)
] | flatten | @csv' > wxs_fastqs.csv
# Note: in the above you can also split the fastq_ftp and fastq_md5 fields on ";"
# by replacing the corresponding lines of the jq program with:
#   (.data[]?.fastq_ftp | split(";")),
#   (.data[]?.fastq_md5 | split(";")),
# The FTP file can then be downloaded using curl
# e.g.
# Download the first file from uk35C650
curl -X GET "https://www.personalgenomes.org.uk/api/v1.3/download_url/uk35C650" -H "accept: application/json" | jq '.[0].download_url'
#   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#                                  Dload  Upload   Total   Spent    Left  Speed
# 100  2157  100  2157    0     0  10722      0 --:--:-- --:--:-- --:--:-- 10731
# "ftp.sra.ebi.ac.uk/vol1/fastq/ERR172/004/ERR1726424/ERR1726424_1.fastq.gz"
curl -LO ftp.sra.ebi.ac.uk/vol1/fastq/ERR172/004/ERR1726424/ERR1726424_1.fastq.gz
#   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#                                  Dload  Upload   Total   Spent    Left  Speed
#  21 32.3G   21 7026M    0     0  90.5M      0  0:06:05  0:01:17  0:04:48  110M
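# Once a download finishes, the file can be checked against the matching
# fastq_md5 value from the API. A minimal sketch, assuming GNU md5sum is
# available (on macOS, `md5 -q` would be the rough equivalent); verify_md5 is
# a hypothetical helper name, not part of the PGP API or of curl.

```shell
# Return success (0) if the MD5 of file $1 equals the expected hex digest $2.
verify_md5() {
  file=$1
  expected=$2
  # md5sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(md5sum "$file" | cut -d' ' -f1)
  [ "$actual" = "$expected" ]
}

# Usage (fill in the digest reported in the fastq_md5 field for this file):
# verify_md5 ERR1726424_1.fastq.gz "<md5 from fastq_md5>" && echo "checksum OK"
```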