@chasemc
Created February 21, 2021 18:56
Extract RefSeq genomic accessions and sequence lengths from an NCBI assembly report, "ftp.ncbi.nlm.nih.gov/genomes............._assembly_report.txt"
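#!/usr/bin/env bash
# Usage: ./ga_and_len.sh <URL of an NCBI *_assembly_report.txt>
# Prints "RefSeq-Accn Sequence-Length" for each sequence: keep the "# Sequence-Name" column-header
# line and everything after it, map column names to field numbers from that header, print the
# two wanted columns, then drop the row printed for the header itself.
curl -s "$1" |
sed -ne '/^# Sequence-Name/,$ p' |
awk -F"\t" 'NR==1 {for (i=1; i<=NF; i++) {f[$i] = i}} {print $(f["RefSeq-Accn"]), $(f["Sequence-Length"])}' |
sed 1d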
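If the report is already on disk, the same column-by-name lookup can be done in a single awk pass over a file or stdin. This is a minimal sketch, not part of the original gist: `accn_lengths` is a hypothetical name, and it assumes (as the script above does) that the column-header line is the only line beginning with `# Sequence-Name`. Unlike the script, it emits tab-separated output.

```shell
# Hypothetical helper (not from the original gist): read an NCBI assembly
# report on stdin and print RefSeq-Accn and Sequence-Length, tab-separated.
accn_lengths() {
  awk -F'\t' -v OFS='\t' '
    # The column-header line is assumed to be the only line starting with "# Sequence-Name".
    /^# Sequence-Name/ {
      for (i = 1; i <= NF; i++) col[$i] = i   # map column name -> field number
      in_table = 1
      next                                    # do not print the header row itself
    }
    # Every line after the header is one sequence record.
    in_table { print $(col["RefSeq-Accn"]), $(col["Sequence-Length"]) }
  '
}
```

Usage: `accn_lengths < GCF_000158815.1_ASM15881v1_assembly_report.txt`, or `curl -s "$url" | accn_lengths`. Doing the header detection inside awk avoids the extra `sed` stages and does not depend on GNU sed's handling of `\t` in regular expressions.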

chasemc commented Feb 21, 2021

Example:

./ga_and_len.sh "ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/158/815/GCF_000158815.1_ASM15881v1/GCF_000158815.1_ASM15881v1_assembly_report.txt"
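The example URL follows a visible pattern: the nine digits of the accession are split into three 3-digit directories, followed by a directory named `<accession>_<assembly name>` that holds the report. A minimal sketch of a hypothetical helper (`report_url`, not part of the gist) that rebuilds such a URL, assuming every report uses this same layout and that the assembly name appears verbatim in the directory name, which may not hold for every assembly:

```shell
# Hypothetical helper: rebuild an assembly-report URL from an accession and
# assembly name, assuming the directory layout seen in the example URL.
report_url() {
  local accession=$1   # e.g. GCF_000158815.1
  local asm_name=$2    # e.g. ASM15881v1
  local prefix digits d1 d2 d3 dir
  prefix=${accession%%_*}   # text before the first "_", e.g. GCF
  digits=${accession#*_}    # text after it, e.g. 000158815.1
  digits=${digits%%.*}      # drop the version suffix: 000158815
  # Split the nine digits into three 3-digit directory names.
  d1=$(printf '%s' "$digits" | cut -c1-3)
  d2=$(printf '%s' "$digits" | cut -c4-6)
  d3=$(printf '%s' "$digits" | cut -c7-9)
  dir="${accession}_${asm_name}"
  printf 'ftp.ncbi.nlm.nih.gov/genomes/all/%s/%s/%s/%s/%s/%s_assembly_report.txt\n' \
    "$prefix" "$d1" "$d2" "$d3" "$dir" "$dir"
}
```

Usage: `./ga_and_len.sh "$(report_url GCF_000158815.1 ASM15881v1)"` passes the same URL as the example above.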