I had a ddRAD project with two restriction enzymes not supported by Stacks - Hinfl and HpyCH4IV. This is how I added them to Stacks:
- Download the code, open src/renz.cc
- Check NEB for the restriction enzyme sequences: https://www.nebiolabs.com.au/products/r0155-hinfi#Product%20Information and https://www.nebiolabs.com.au/products/r0619-hpych4iv#Product%20Information
- Understand how Stacks encode restriction enzymes - Hinfl is G CUT ANTC, which in the comments of renz.cc is encoded as G/ANTC
- Choose the longer piece (in this case, ANTC) and translate that into all four possibilities AGTC AATC ATTC ACTC, and the reverse complement GACT GATT GAAT GAGT. For others like A/CGT it's easier, take the longer piece and its reverse-complement, so it's just CGT and ACG
- Add that to the code, and don't forget your semicolons (I think the first letter of the RE name has to be lower case??)
123 const char *xhoI[] = {"TCGAG", // C/TCGAG, XhoI
124 "CTCGA"};
125 const char *hpyCH4IV[] = {"CGT", // A/CGT, HpyCH4IV
126 "ACG"};
127 const char *hinfI[] = {"AATC", "ATTC", // G/ANTC, hinfI
128 "ACTC", "AGTC",
129 "GAGT", "GACT",
130 "GATT", "GAAT"};
131
Don't forget the count of restriction enzyme sites and the length of each site in the two other data structures. If you have only a sequence and its reverse complement then the count is 1. AGT has a length of 3.
132 void
133 initialize_renz(map &renz, map &renz_cnt, map &renz_len) {
134
135 renz["hpyCH4IV" ] = hpyCH4IV; // A/CGT, hpyCH4IV
136 renz["hinfI"] = hinfI; // // G/ANTC, hinfI
137 renz["sbfI"] = sbfI; // CCTGCA/GG, SbfI
138 renz["pstI"] = pstI; // CTGCA/G, PstI
139 renz["notI"] = notI; // GC/GGCCGC, NotI
....
190 renz_cnt["hpyCH4IV" ] = 1;
191 renz_cnt["hinfI"] = 4;
192 renz_cnt["sbfI"] = 1;
...
245 renz_len["hpyCH4IV" ] = 3;
246 renz_len["hinfI"] = 4;
247 renz_len["sbfI"] = 6;
Now go into the Stacks base directory, run ./compile and ./make, fix any errors you introduced, if you now run process_radtags your new restriction enzymes should be listed:
...
Currently supported enzymes include:
'aciI', 'ageI', 'aluI', 'apaLI', 'apeKI', 'apoI', 'aseI', 'bamHI',
'bbvCI', 'bfaI', 'bfuCI', 'bgIII', 'bsaHI', 'bspDI', 'bstYI', 'cac8I',
'claI', 'csp6I', 'ddeI', 'dpnII', 'eaeI', 'ecoRI', 'ecoRV', 'ecoT22I',
'haeIII', 'hinP1I', 'hindIII', 'hinfI', 'hpaII', 'hpyCH4IV', 'kpnI', 'mluCI',
...