Created
June 17, 2013 15:13
Format to convert...
Example input...
#
# More information on reference human assembly build 36:
# http://www.ncbi.nlm.nih.gov/projects/mapview/map_search.cgi?taxid=9606&build=36
#
# rsid chromosome position genotype
rs4477212 1 72017 AA
rs3094315 1 742429 AA
rs3131972 1 742584 GG
rs12124819 1 766409 GG
rs11240777 1 788822 GG
rs6681049 1 789870 CC
Example output
IndividualID rs123 rs456 rs456 …
N1234 A/C C/G T/T …
N1235 A/C G/G ?/? …
N3455 C/C G/G A/T …
Current solution:
perl -ane '$x{$F[0]}=$F[3]if$F[0]=~/^rs/&&$F[1]==6&&$F[3]=~/[ATCG]{2}/;END{print join(" ", "IndividualID", sort keys %x), "\n"; print join(" ", "NM3", map {substr($x{$_},1,0,"/");$x{$_}} sort keys %x), "\n"}' genome_Dan_Bolser_Full_20110223175600.txt > genome_Dan_Bolser_Full_20110223175600_chr6_unknown_format.txt