Last active
August 29, 2015 14:01
extractSnps.sh
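#!/bin/bash
# Extract a list of SNPs from per-chromosome plink files and merge the
# results into a single bed/bim/fam file set.
# Usage: ./extractSnps.sh <snplistfile> <plinkrt> <outfile>
# set -e

snplistfile=${1}   # file with one SNP ID per line
plinkrt=${2}       # plink fileset root, with CHR standing in for the chromosome number
outfile=${3}       # prefix for the merged output files

# Start with an empty merge list
rm -f "${outfile}_mergelist.txt"
touch "${outfile}_mergelist.txt"

firstchr="1"
flag="0"
for i in {01..23}
do
    # Substitute the chromosome number for every CHR in the root path
    filename=$(sed -e "s/CHR/$i/g" <<< "${plinkrt}")
    echo ""
    echo "$filename"
    echo "${outfile}_${i}"
    echo ""
    plink1.90 --noweb --bfile "${filename}" --extract "${snplistfile}" --make-bed --out "${outfile}_${i}"
    echo "$?"
    # Only chromosomes for which plink produced a .bed file go into the merge list
    if [ -f "${outfile}_${i}.bed" ]; then
        echo "${outfile}_${i}.bed ${outfile}_${i}.bim ${outfile}_${i}.fam" >> "${outfile}_mergelist.txt"
        # Remember the first chromosome that produced output
        if [ "${flag}" == "0" ]; then
            firstchr=${i}
        fi
        flag="1"
    fi
done

# The first fileset is the --bfile base of the merge, so drop it from the list
sed -i 1d "${outfile}_mergelist.txt"
plink1.90 --noweb --bfile "${outfile}_${firstchr}" --merge-list "${outfile}_mergelist.txt" --make-bed --out "${outfile}"
# Remove the per-chromosome intermediates and the merge list
rm "${outfile}"_*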
The 1000 genomes imputed ALSPAC data is split into separate chromosomes because a single file would be very large. This script can be used to extract a list of SNPs from any of the chromosomes and combine them into a single bed/bim/fam plink file set.
It uses plink1.90 because it is much, much faster than the original plink. Download it from here:
https://www.cog-genomics.org/plink2/
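Move the executable to a folder called ~/bin in your home directory, and add this directory to your path by including the line below in .bash_profile. The script calls the executable as plink1.90, so rename the downloaded binary to that (or edit the script) if it has a different name.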
PATH=$PATH:$HOME/bin
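After editing .bash_profile and opening a new shell, it may be worth confirming that the executable is actually found. The following is a minimal sketch; check_tool is a hypothetical helper name, not part of extractSnps.sh, and it relies only on the standard `command -v` lookup.

```shell
# check_tool is a hypothetical helper, not part of extractSnps.sh.
# It reports where a command resolves on PATH, or returns 1 if it does not.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH" >&2
        return 1
    fi
}

# The script invokes plink under this exact name
check_tool plink1.90 || echo "add the directory holding plink1.90 to PATH"
```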
Then to run the script you would save it somewhere, set permissions to executable e.g.
chmod 755 extractSnps.sh
and then run something like:
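./extractSnps.sh <snplistfile> /panfs/panasas01/shared/alspac/deprecated/alspac_combined_1kg_20140424/chrCHR/alspac_1kg_p1v3_CHR <outfile>
where <snplistfile> is just a file with a single SNP per line, and <outfile> is the prefix you'd like to have before .bed/.bim/.fam. The CHR in the path is a placeholder that the script replaces with each chromosome number in turn.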
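To illustrate the two inputs, here is a small sketch that writes a SNP list with one ID per line and shows how a root containing CHR expands for a given chromosome. The rs IDs and the data/ path are made-up placeholders, and expand_root is a hypothetical helper that mirrors the sed substitution used in the script.

```shell
# Write a SNP list, one ID per line (these rs IDs are made-up placeholders)
snplist=$(mktemp)
printf '%s\n' rs0000001 rs0000002 rs0000003 > "$snplist"

# expand_root is a hypothetical helper, not part of extractSnps.sh.
# It replaces every occurrence of CHR in the root with the chromosome label.
expand_root() {
    printf '%s\n' "$1" | sed -e "s/CHR/$2/g"
}

# Example root with a made-up path; the script loops over 01 to 23
for chr in 01 02 23; do
    expand_root "data/chrCHR/geno_CHR" "$chr"
done
```

Each printed line is the fileset root that would be passed to --bfile for that chromosome, so the matching .bed/.bim/.fam files need to exist under that name.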