The input is the genotype data from Supplementary file S13 from the study by Edwards et al.
There are 14112 genes for the 94 samples. I assume that the first part of a sample name such as AA_SRR23446543#1
is the population ID and the last character (after the #) is the haplotype ID. So there are 15 individuals for AI (A. insularis), 15 for AW (A. woodhouseii), 14 for AC (A. coerulescens), and 1 each for AA (A. californica), CY (Cyanocorax yucatanicus), and CS (Cyanocitta cristata). That sums to 47 individuals, which is consistent with 94 samples if each individual contributes two haplotypes.
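A minimal sketch of the naming assumption above, splitting a sample name into population ID, accession, and haplotype ID. The helper `parse_sample_name` is hypothetical, and the second name (`#2`) is made up for illustration; only `AA_SRR23446543#1` appears in the notes.

```python
from collections import Counter

def parse_sample_name(name):
    """Split a name such as 'AA_SRR23446543#1' into (population, accession, haplotype)."""
    # Assumption: population ID precedes the first '_', haplotype ID follows the last '#'.
    population, rest = name.split("_", 1)
    accession, haplotype = rest.rsplit("#", 1)
    return population, accession, haplotype

# Illustrative names only; the '#2' entry is an assumed second haplotype.
example_names = ["AA_SRR23446543#1", "AA_SRR23446543#2"]
parsed = [parse_sample_name(n) for n in example_names]

# Count haplotype columns per population.
haplotypes_per_population = Counter(pop for pop, _, _ in parsed)
```

Running the same count over the real header of the file would be a quick way to check the per-population numbers quoted above.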
Note that the coordinates are arbitrary: each gene is assigned 100 bases. If we had the gene lengths we could make them accurate, but it probably shouldn't matter. We could double-check whether CNV length matters in the PoMoCNV framework.
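One way to picture the placeholder coordinates is the sketch below. It assumes the genes are laid end to end in file order with 0-based, half-open intervals; the notes only state the 100-base span, so the layout and the helper `placeholder_interval` are hypothetical.

```python
GENE_SPAN = 100  # bases assigned to each gene, as stated in the notes

def placeholder_interval(gene_index, span=GENE_SPAN):
    """Return an assumed 0-based, half-open (start, end) interval for a gene."""
    start = gene_index * span
    return start, start + span

# Intervals for the first three genes under this assumed layout.
intervals = [placeholder_interval(i) for i in range(3)]
```

If real gene lengths became available, only `span` would need to vary per gene.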
f = "data_S13.csv"
f_read = open(f, 'r')