
@apahl
Created September 18, 2023 09:54
Properties and Descriptors influenced by Tautomers

(non-exhaustive list)

(Thoughts resulting from previous discussions: 6374, 6543)

(Please also take another look at Roger's talk from last year's UGM; direct link to PDF.)

Identity / Similarity

  • InChIKey / SMILES (-> identity determination)
  • Fingerprints (-> similarity determination, substructure matching, model training)
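The following is a minimal sketch of the identity/similarity problem. It assumes a recent RDKit that provides `rdFingerprintGenerator`. The example pair (2-hydroxypyridine and 2-pyridone) and the Morgan settings (radius 2, 2048 bits) are illustrative choices and do not come from these notes. The two tautomers should get different canonical SMILES and a Tanimoto similarity below 1. After `TautomerEnumerator.Canonicalize`, they should collapse to one structure.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import rdFingerprintGenerator
from rdkit.Chem.MolStandardize import rdMolStandardize

# Illustrative tautomer pair (not from the original notes):
# 2-hydroxypyridine and 2-pyridone describe the same compound
hydroxypyridine = Chem.MolFromSmiles("Oc1ccccn1")
pyridone = Chem.MolFromSmiles("O=c1cccc[nH]1")

enumerator = rdMolStandardize.TautomerEnumerator()
morgan = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=2048)


def compare(mol_a, mol_b):
    """Return (identical canonical SMILES?, Morgan Tanimoto similarity)."""
    same_smiles = Chem.MolToSmiles(mol_a) == Chem.MolToSmiles(mol_b)
    similarity = DataStructs.TanimotoSimilarity(
        morgan.GetFingerprint(mol_a), morgan.GetFingerprint(mol_b)
    )
    return same_smiles, similarity


# Raw input: the two tautomers look like two different compounds
raw_same, raw_sim = compare(hydroxypyridine, pyridone)

# After tautomer canonicalization both inputs map to the same structure
canon_same, canon_sim = compare(
    enumerator.Canonicalize(hydroxypyridine), enumerator.Canonicalize(pyridone)
)

print(f"raw:       same SMILES={raw_same}, Tanimoto={raw_sim:.2f}")
print(f"canonical: same SMILES={canon_same}, Tanimoto={canon_sim:.2f}")
```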

Scores & Descriptors (-> Model Training)

  • QED
  • NP Likeness
  • LogP
  • pKa (not calculable by RDKit)
  • Polar Surface Area (TPSA)
  • ...
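This sketch shows how three of the listed descriptors shift between tautomers. It assumes RDKit's `Descriptors.MolLogP` (Crippen), `Descriptors.TPSA` and `QED.qed`. The keto and enol forms of acetylacetone are an illustrative pair, not taken from these notes. The two forms place hydrogens and double bonds differently, so the values differ even though both describe the same compound. pKa and NP likeness are left out here.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors, QED

# Illustrative keto/enol pair of acetylacetone (pentane-2,4-dione)
TAUTOMERS = {
    "keto": "CC(=O)CC(C)=O",
    "enol": "CC(=O)C=C(C)O",
}


def describe(smiles):
    """Compute a few tautomer-sensitive descriptors for one SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    return {
        "LogP": Descriptors.MolLogP(mol),  # Crippen logP
        "TPSA": Descriptors.TPSA(mol),     # topological polar surface area
        "QED": QED.qed(mol),               # quantitative estimate of drug-likeness
    }


results = {name: describe(smi) for name, smi in TAUTOMERS.items()}
for name, values in results.items():
    formatted = ", ".join(f"{key}={val:.2f}" for key, val in values.items())
    print(f"{name:5s} {formatted}")
```

In this example, TPSA changes because a carbonyl oxygen and a hydroxyl oxygen contribute different amounts to the polar surface area.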

In addition to this list, anything that depends on 2D/3D coordinates (e.g. docking) is also influenced by tautomerism.

Goals of Canonical Tautomer Generation

  1. Consistency
  2. "Correctness"
    This of course depends on multiple factors (solvent, pH, ...),
    but a certain level of (intuitive) "correctness" should be aimed for (avoiding the "Will I get yelled at?" effect).
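One way to check goal 1 (consistency) is sketched below. It assumes that `TautomerEnumerator.Enumerate` and `Canonicalize` from `rdMolStandardize` behave as documented. The approach enumerates all tautomers of a molecule and re-parses each one as if it were a separate input. It then canonicalizes each and counts the distinct canonical SMILES. A consistent canonicalizer should yield exactly one. The helper name `canonicalization_report` is hypothetical. The same `Enumerate` call is what you would use to work with the whole tautomer set instead of a single canonical form.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def canonicalization_report(smiles):
    """Hypothetical helper: canonicalize every enumerated tautomer of `smiles`.

    Returns the enumerated tautomer SMILES and the set of canonical
    tautomer SMILES obtained from them (ideally a single entry).
    """
    enumerator = rdMolStandardize.TautomerEnumerator()
    mol = Chem.MolFromSmiles(smiles)
    tautomers = [Chem.MolToSmiles(t) for t in enumerator.Enumerate(mol)]
    # Re-parse each tautomer so it is treated like an independent input
    canonical = {
        Chem.MolToSmiles(enumerator.Canonicalize(Chem.MolFromSmiles(t)))
        for t in tautomers
    }
    return tautomers, canonical


tautomers, canonical = canonicalization_report("Oc1ccccn1")
print(f"{len(tautomers)} tautomers enumerated")
print(f"canonical forms obtained: {sorted(canonical)}")
print("consistent" if len(canonical) == 1 else "INCONSISTENT")
```

Running this over a larger, more diverse set of your own structures is a cheap way to spot molecules where canonicalization depends on the input tautomer.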

Summary

  • Canonical tautomers are probably much more important for your analyses than you thought (at least this was the case for me)
  • It is important to always standardize / canonicalize your input structures
    • Esp. when different sources are used
  • There are two modules to generate tautomers in the RDKit
    (and they give different results):
    • Chem.MolStandardize.tautomer.TautomerCanonicalizer (older implementation)
    • Chem.MolStandardize.rdMolStandardize.TautomerEnumerator (newer implementation)
  • For some applications, tautomer enumeration might be a solution
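Below is a minimal standardization sketch for the advice above. It assumes a pipeline of `rdMolStandardize.Cleanup` followed by the newer `TautomerEnumerator.Canonicalize`. These notes do not prescribe exact standardization steps, so treat this choice as a judgment call. The sketch also runs the older `TautomerCanonicalizer` when that module can still be imported. In recent RDKit releases that module may be deprecated or removed, so the sketch falls back to `None`. The helper `canonical_tautomers` is a hypothetical name. The two input SMILES stand in for the same compound drawn differently by two sources.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize

try:
    # Older, Python-based implementation (port of MolVS); it may be
    # deprecated or missing depending on the installed RDKit version.
    from rdkit.Chem.MolStandardize.tautomer import TautomerCanonicalizer
except ImportError:
    TautomerCanonicalizer = None

NEW_ENUMERATOR = rdMolStandardize.TautomerEnumerator()


def canonical_tautomers(smiles):
    """Hypothetical helper: clean up one input, return (newer, older) SMILES.

    `older` is None when the older implementation is not available.
    """
    mol = rdMolStandardize.Cleanup(Chem.MolFromSmiles(smiles))
    newer = Chem.MolToSmiles(NEW_ENUMERATOR.Canonicalize(mol))
    older = None
    if TautomerCanonicalizer is not None:
        # Pass a copy so the older code cannot modify `mol` in place
        older = Chem.MolToSmiles(TautomerCanonicalizer().canonicalize(Chem.Mol(mol)))
    return newer, older


# The same compound as it might arrive from two different sources
for smi in ["Oc1ccccn1", "O=c1cccc[nH]1"]:
    newer, older = canonical_tautomers(smi)
    agree = "n/a" if older is None else newer == older
    print(f"{smi:16s} newer={newer:16s} older={older} agree={agree}")
```

Comparing the `newer` and `older` columns over your own compound collection shows where the two implementations disagree. The sketch does not decide which of the two results to prefer.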

How is Everybody Doing It?
