(non-exhaustive list)
(Thoughts resulting from previous discussions (6374, 6543))
(Please also have another look at Roger's talk from last year's UGM (direct link to PDF))
- InChIKey / SMILES (-> identity determination)
- Fingerprints (-> similarity determination, substructure matching, model training)
- QED
- NP Likeness
- LogP
- pKa (not calculable with the RDKit)
- Topological polar surface area (TPSA)
- ...
In addition to this list, everything depending on 2D/3D coordinates (e.g. docking) is influenced by tautomerism.
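A minimal sketch of the effect, assuming a recent RDKit build with InChI support and the `rdFingerprintGenerator` API. The keto and enol forms of acetylacetone (pentane-2,4-dione, an example picked here purely for illustration) describe the same compound, yet the canonical SMILES, the standard InChIKey (standard InChI does not merge keto/enol forms), the Morgan fingerprint and descriptors such as TPSA and logP all come out differently.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import QED, Crippen, rdFingerprintGenerator, rdMolDescriptors

# Two tautomers of acetylacetone (pentane-2,4-dione): same compound, different graphs
TAUTOMERS = {
    "keto": "CC(=O)CC(C)=O",
    "enol": "CC(=O)C=C(C)O",
}

# Morgan fingerprint generator (radius 2, 2048 bits); parameters chosen for illustration
morgan = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def describe(smiles):
    """Compute identity keys, a few descriptors and a fingerprint for one structure."""
    mol = Chem.MolFromSmiles(smiles)
    return {
        "smiles": Chem.MolToSmiles(mol),
        "inchikey": Chem.MolToInchiKey(mol),
        "qed": QED.qed(mol),
        "logp": Crippen.MolLogP(mol),
        "tpsa": rdMolDescriptors.CalcTPSA(mol),
        "fp": morgan.GetFingerprint(mol),
    }


results = {name: describe(smi) for name, smi in TAUTOMERS.items()}

for name, res in results.items():
    print(f"{name}: {res['smiles']} {res['inchikey']} "
          f"QED={res['qed']:.3f} logP={res['logp']:.2f} TPSA={res['tpsa']:.2f}")

# Similarity between the two tautomers of the *same* compound is below 1.0
sim = DataStructs.TanimotoSimilarity(results["keto"]["fp"], results["enol"]["fp"])
print(f"Tanimoto(keto, enol) = {sim:.2f}")
```

Any similarity search, duplicate check or model trained on such representations therefore sees two "different" molecules unless the tautomers are normalized first.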
- Consistency
- "Correctness"
This of course depends on multiple factors (solvent, pH, ...),
but a certain level of (intuitive) "correctness" should be aimed for (avoiding the "Will I be getting yelled at?" effect).
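For the "Consistency" goal, a small sketch (assuming the newer `rdMolStandardize.TautomerEnumerator`): plain canonical SMILES are canonical for the input graph, not for the compound, so 2-hydroxypyridine and 2-pyridone give two strings. Canonicalizing the tautomer first maps both inputs to one string. Note that the canonical tautomer is a deterministic pick from the RDKit's scoring scheme and not necessarily the dominant form under given conditions, which is exactly where "Correctness" remains a separate concern.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

# 2-hydroxypyridine and 2-pyridone: two tautomeric inputs for one compound
INPUTS = ["Oc1ccccn1", "O=c1cccc[nH]1"]

enumerator = rdMolStandardize.TautomerEnumerator()

# Plain canonical SMILES: canonical for the drawn graph only
plain = [Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in INPUTS]

# Canonical tautomer first, then canonical SMILES
canonical = [
    Chem.MolToSmiles(enumerator.Canonicalize(Chem.MolFromSmiles(s))) for s in INPUTS
]

print("plain:    ", plain)      # two different strings
print("canonical:", canonical)  # the same string twice
```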
- Canonical tautomers are probably much more important for your analyses than you think (at least this was the case for me)
- Important to always standardize / canonicalize your input structures
  - Especially when combining data from different sources
- There are two modules to generate (canonical) tautomers in the RDKit (and they give different results):
  - Chem.MolStandardize.tautomer.TautomerCanonicalizer (older implementation)
  - Chem.MolStandardize.rdMolStandardize.TautomerEnumerator (newer implementation)
- For some applications, tautomer enumeration might be a solution
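One possible way to standardize input structures from different sources, sketched under the assumption that a typical `rdMolStandardize` sequence (cleanup, parent fragment, neutralization, canonical tautomer via the newer `TautomerEnumerator`) fits your data; the step order and the two "source" SMILES are illustrative choices, not a prescribed recipe. The same compound, once supplied as a sodium salt in the hydroxypyridine form and once as a neutral Kekulé pyridone, ends up as one identical SMILES.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

_uncharger = rdMolStandardize.Uncharger()
_tautomer_enumerator = rdMolStandardize.TautomerEnumerator()


def standardize(smiles):
    """Illustrative standardization pipeline ending in a canonical tautomer."""
    mol = Chem.MolFromSmiles(smiles)
    mol = rdMolStandardize.Cleanup(mol)           # remove Hs, disconnect metals, normalize, reionize
    mol = rdMolStandardize.FragmentParent(mol)    # keep the parent fragment (drops the Na+ counterion)
    mol = _uncharger.uncharge(mol)                # neutralize charges where possible
    mol = _tautomer_enumerator.Canonicalize(mol)  # canonical tautomer (newer implementation)
    return Chem.MolToSmiles(mol)


# The same compound as it might appear in two hypothetical sources
source_a = "[O-]c1ccccn1.[Na+]"  # sodium salt, hydroxypyridine form
source_b = "O=C1C=CC=CN1"        # neutral, Kekulé pyridone form

print(standardize(source_a))
print(standardize(source_b))
```

Running the older `TautomerCanonicalizer` on the same inputs is a quick way to check on your own data where the two implementations disagree.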
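A sketch of tautomer enumeration for substructure matching, assuming the newer `rdMolStandardize.TautomerEnumerator`; the query SMARTS and the helper `matches_any_tautomer` are hypothetical and only for illustration. A query for an OH group next to a nitrogen misses 2-pyridone as drawn, but hits once all enumerated tautomers are checked. The same "best over all tautomers" idea can be applied to fingerprint similarity. The trade-off is cost: the number of tautomers can grow quickly for larger molecules, whereas a single canonical tautomer per structure is computed once.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

enumerator = rdMolStandardize.TautomerEnumerator()

# Hypothetical query: OH on a carbon bonded to a nitrogen (aromaticity-agnostic SMARTS)
query = Chem.MolFromSmarts("[OX2H1][#6]~[#7]")

# Stored structure: 2-pyridone, which has no OH group as drawn
mol = Chem.MolFromSmiles("O=c1cccc[nH]1")


def matches_any_tautomer(mol, query):
    """Hypothetical helper: True if at least one enumerated tautomer contains the query."""
    return any(t.HasSubstructMatch(query) for t in enumerator.Enumerate(mol))


direct_hit = mol.HasSubstructMatch(query)
tautomer_hit = matches_any_tautomer(mol, query)
tautomers = sorted(Chem.MolToSmiles(t) for t in enumerator.Enumerate(mol))

print("direct match:      ", direct_hit)    # False: the drawn form has C=O, not C-OH
print("any-tautomer match:", tautomer_hit)  # True: 2-hydroxypyridine is among the tautomers
print(tautomers)
```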