Skip to content

Instantly share code, notes, and snippets.

@sinamajidian
Last active July 20, 2025 20:44
Show Gist options
  • Save sinamajidian/565d14a3bdbfecafcb42ca8173b30987 to your computer and use it in GitHub Desktop.
Save sinamajidian/565d14a3bdbfecafcb42ca8173b30987 to your computer and use it in GitHub Desktop.
Create a diploid genome using SURVIVOR

A diploid genome using SURVIVOR

Using the following bash code, you can create a diploid genome using SURVIVOR. Finally, you will have three files :

  • Two fasta file: sim1.fasta and sim2.fasta.
  • Truely phased vcf file: sim_e_merg.vcf.

Some lines of the intermediate files:

$ grep -v "#" sim1.vcf | head -n2
22	450	  SNP0SURVIVOR	T	G	.	PASS	PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1	GT:GL:GQ:FT:RC:DR:DV:RR:RV	1/1
22	1043	SNP1SURVIVOR	A	C	.	PASS	PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1	GT:GL:GQ:FT:RC:DR:DV:RR:RV	1/1

$ grep -v "#" sim1_e.vcf | head -n2 
22	450	  SNP0SURVIVOR	T	G	.	PASS	PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1	GT	0|1
22	1043	SNP1SURVIVOR	A	C	.	PASS	PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1	GT	0|1

$ grep -v "#" sim_e.vcf | head -n3 
22	113	SNP0SURVIVOR	A	G	.	PASS	PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1	GT	./.	0|1
22	450	SNP0SURVIVOR	T	G	.	PASS	PRECISE;SVMETHOD=SURVIVOR_sim;SVLEN=1	GT	0|1	./.

$ grep -v "#" sim_e_merg.vcf | head -n2
22	113	SNP0SURVIVOR	A	G	.	PASS	.	GT	1|0
22	450	SNP0SURVIVOR	T	G	.	PASS	.	GT	0|1

ref="22.fa"
samtools faidx ${ref}
SURVIVOR simSV parameter_file
SURVIVOR simSV ${ref} parameter_file ${snp_rate} 0 sim1
sleep 1
SURVIVOR simSV ${ref} parameter_file ${snp_rate} 0 sim2
grep "#" sim1.vcf > sim1_e.vcf
grep -v "#" sim1.vcf | sed 's/GT:GL:GQ:FT:RC:DR:DV:RR:RV/GT/g' | sed 's/1\/1/0|1/g' |grep "SNP" >> sim1_e.vcf
grep "#" sim1.vcf > sim2_e.vcf
grep -v "#" sim1.vcf | sed 's/GT:GL:GQ:FT:RC:DR:DV:RR:RV/GT/g' | sed 's/1\/1/0|1/g' | grep "SNP" >> sim2_e.vcf
bgzip -c sim1_e.vcf > sim1_e.vcf.gz
tabix sim1_e.vcf.gz
bcftools index sim1_e.vcf.gz
bgzip -c sim2_e.vcf > sim2_e.vcf.gz
tabix sim2_e.vcf.gz
bcftools merge --force-samples sim1_e.vcf.gz sim2_e.vcf.gz > sim_e.vcf
python3 combine_simulated_vcfs.py sim_e.vcf
#!/usr/bin/env python
from sys import argv
vcf_file_address= argv[1]
vcf_file = open(vcf_file_address,'r')
first_first= True
vcf_file_out = open(vcf_file_address[:-4]+"_merg.vcf",'w')
for line in vcf_file:
line_strip = line.strip()
if line_strip.startswith('#'):
if line_strip.startswith('#CHROM'):
line_parts=line_strip.split('\t')
vcf_file_out.write("\t".join(line_parts[:-1])+"\n")
else:
vcf_file_out.write(line_strip+"\n")
else:
line_parts=line_strip.split('\t')
#chrom = line_parts[0]
#var_pos = int(line_parts[1]) # genomic position of variants
#ref=line_parts[3]
#alt=line_parts[4]
sim1=line_parts[9]
sim2=line_parts[10]
if sim1 == "0|1" and sim2 == "./.":
gt_both = "0|1"
elif sim1 == "./." and sim2 == "0|1":
gt_both = "1|0"
elif sim1 == "0|1" and sim2 == "0|1":
gt_both = "1|1"
#print(var_pos)
line_parts_ed=line_parts[:7]+[".","GT",gt_both]
line_strip_ed='\t'.join(line_parts_ed)
#print(line_strip_ed)
vcf_file_out.write(line_strip_ed+"\n")
vcf_file_out.close()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment