Gist danielecook/8440831, created January 15, 2014 17:43
This chunk of code produces 'kegg_merged.txt', a file of genes and their respective KEGG pathways. It downloads several tables from the UCSC Genome Browser (hg19) and merges them.
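Given how the joins in the script build it, each row of 'kegg_merged.txt' should hold the kgXref fields first (starting with the UCSC kgID), followed by the KEGG map ID and the pathway description as the last two tab-separated fields, with one row per kgID-pathway match. The following is a minimal sketch that assumes that layout: a hypothetical bash function, `pathway_sizes`, which is not part of the original gist and counts distinct kgIDs per pathway with awk.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the gist): count distinct kgIDs per KEGG pathway.
# Assumes tab-separated rows: kgID ... <tab> mapID <tab> pathway description.
pathway_sizes() {
    # seen[] keys on (kgID, mapID) so repeated kgID/pathway rows count once;
    # n[] accumulates a count per (mapID, description) pair.
    awk -F '\t' '
        !seen[$1 FS $(NF-1)]++ { n[$(NF-1) FS $NF]++ }
        END { for (p in n) print n[p] FS p }
    ' "$1" | sort -t $'\t' -k1,1nr -k2,2   # largest pathways first, ties by mapID
}
```

For example, `pathway_sizes kegg/kegg_merged.txt | head` should list the largest pathways first, one line per pathway as count, map ID, description. Because the key is field 1 combined with the map ID, a kgID that appears more than once for the same pathway is counted once.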
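# Download KEGG Data (Pathways)
#==============================#
# Use byte-order collation so that sort and join agree on key ordering.
export LC_ALL=C
# Download select files from UCSC (hg19) into kegg/
for var in keggPathway keggMapDesc knownGene kgXref
do
    # Double quotes so that $var is expanded inside the URL.
    wget --timestamping --directory-prefix kegg "ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/$var.txt.gz"
    gunzip -f "kegg/$var.txt.gz"
done
# Join kegg pathway description with ID; keggMapDesc is already sorted; kgXref has 82,960 lines.
# The join key is mapID (keggPathway column 3, keggMapDesc column 1); join prints it first, then the
# other keggPathway fields, then the description, so cut keeps mapID, kgID, description.
sort -t $'\t' -k3,3 kegg/keggPathway.txt | join -1 3 -2 1 -t $'\t' - kegg/keggMapDesc.txt | cut -f 1,2,4 | sort -t $'\t' -k2,2 > kegg/kegg_tmp.txt # 58,073 lines.
# Attach the kgXref annotations (column 1 = kgID) to each pathway row, joining on kgID.
sort -t $'\t' -k1,1 kegg/kgXref.txt | join -1 1 -2 2 -t $'\t' - kegg/kegg_tmp.txt > kegg/kegg_merged.txt
rm kegg/kegg_tmp.txt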
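The `cut -f 1,2,4` step and the final column order both follow from how join formats its output: the join field first, then the remaining fields of the first file, then those of the second. The following is a minimal sketch that replays both join stages on tiny, made-up inputs to make the intermediate layout visible. The IDs `kgA`, `map1`, the gene names and the descriptions are hypothetical, and the kgXref stand-in is trimmed to two columns.

```bash
#!/usr/bin/env bash
# Toy reproduction of both join stages on made-up, hypothetical rows.
export LC_ALL=C
dir=$(mktemp -d)

# keggPathway-like rows: kgID, locusID, mapID
printf 'kgB\t2\tmap2\nkgA\t1\tmap1\nkgC\t3\tmap1\n' > "$dir/pathway.txt"
# keggMapDesc-like rows, already sorted by mapID: mapID, description
printf 'map1\tPathway one\nmap2\tPathway two\n' > "$dir/desc.txt"
# kgXref-like rows, trimmed to two columns: kgID, gene name
printf 'kgC\tGENEC\nkgA\tGENEA\nkgB\tGENEB\n' > "$dir/xref.txt"

# Stage 1: join emits mapID, kgID, locusID, description; keep fields 1,2,4, sort by kgID.
sort -t $'\t' -k3,3 "$dir/pathway.txt" \
    | join -1 3 -2 1 -t $'\t' - "$dir/desc.txt" \
    | cut -f 1,2,4 \
    | sort -t $'\t' -k2,2 > "$dir/tmp.txt"

# Stage 2: join on kgID; output is kgID, the other xref fields, mapID, description.
merged=$(sort -t $'\t' -k1,1 "$dir/xref.txt" | join -1 1 -2 2 -t $'\t' - "$dir/tmp.txt")
rm -r "$dir"
printf '%s\n' "$merged"
```

It prints three rows of the form kgID, gene name, mapID, description (for example `kgA`, `GENEA`, `map1`, `Pathway one`), which mirrors the layout of 'kegg_merged.txt' with the real kgXref columns in place of the single gene-name column.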