Strategy:
- Extract the chr/pos/rsid from dbSNP into a .bed file. A separate .bed file is made per chromosome so that lookups can be parallelised.
- Create a tabix index for the .bed files.
- Extract the chr/pos from the target VCF file.
- Use tabix to query the target chr/pos against the tabix-indexed .bed files, parallelised across chromosomes using GNU parallel.
- Update the VCF file with the extracted RSIDs.
mamba create -n bcftools -c bioconda -c conda-forge bcftools tabix parallel
mamba activate bcftools
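
With that environment active, the strategy above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function names, directory layout, file names and the CHROMS list are all hypothetical, and it assumes that chromosome names are spelled identically in the dbSNP VCF and the target VCF, that the dbSNP VCF is bgzipped and indexed, and that bgzip is installed alongside tabix. Matching is by position only, so alleles are not checked.

```shell
# Sketch only. File layout, function names and CHROMS are hypothetical.
set -eu

# Chromosome names, assumed to be spelled the same way in both VCF files.
CHROMS="1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y"

# Convert VCF text on stdin to BED lines: chrom, 0-based start, end, ID.
vcf_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" } !/^#/ { print $1, $2 - 1, $2, $3 }'
}

# Steps 1-2: one bgzipped, tabix-indexed BED file per chromosome.
# Usage: build_dbsnp_beds dbsnp.vcf.gz dbdir   (the dbSNP VCF must be indexed)
build_dbsnp_beds() {
  mkdir -p "$2"
  for chr in $CHROMS; do
    bcftools view -H -r "$chr" "$1" | vcf_to_bed | bgzip > "$2/dbsnp.$chr.bed.gz"
    tabix -f -p bed "$2/dbsnp.$chr.bed.gz"
  done
}

# Step 3: write the target positions to one plain BED file per chromosome.
# Usage: split_target_positions target.vcf.gz workdir
split_target_positions() {
  mkdir -p "$2"
  bcftools view -H "$1" | vcf_to_bed |
    awk -v dir="$2" 'BEGIN { FS = OFS = "\t" }
      { print $1, $2, $3 > (dir "/target." $1 ".bed") }'
}

# Step 4: look up each chromosome's positions in parallel, then merge the hits.
# Usage: lookup_rsids dbdir workdir [jobs]
lookup_rsids() {
  parallel -j "${3:-4}" \
    "if [ -s '$2'/target.{}.bed ]; then tabix -R '$2'/target.{}.bed '$1'/dbsnp.{}.bed.gz > '$2'/hits.{}.bed; fi" \
    ::: $CHROMS
  cat "$2"/hits.*.bed | sort -k1,1 -k2,2n | bgzip > "$2/hits.bed.gz"
  tabix -f -p bed "$2/hits.bed.gz"
}

# Step 5: copy the matched rsIDs into the ID column of the target VCF.
# Usage: annotate_vcf target.vcf.gz workdir out.vcf.gz
annotate_vcf() {
  bcftools annotate -a "$2/hits.bed.gz" -c CHROM,FROM,TO,ID -O z -o "$3" "$1"
}
```

The functions are meant to be called in order, for example build_dbsnp_beds dbsnp.vcf.gz db, then split_target_positions target.vcf.gz work, then lookup_rsids db work 8, then annotate_vcf target.vcf.gz work annotated.vcf.gz. The per-chromosome dbSNP files only need to be built once and can be reused for any number of target VCFs.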