I added a final call to SanitizeMol as a check.
This seems a bit dangerous: it fixes anything with the "wrong" valence, including cases caused by drawing errors or similar. It may be unsafe to apply indiscriminately to large datasets.
Just wanted to clarify that the intended use of this function is to fix issues caused by incorrect charges. We have an alternative function, fix_valence, that breaks bonds instead to correct valence issues (that one is designed mainly for generative models and automated compound enumeration xD). Neither of these functions should be used indiscriminately, but they are often quite useful, especially in the case highlighted by @PatWalters here.
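For anyone worried about applying a charge fix too broadly, here is a minimal sketch of one way to keep it narrow. It is not the code from the gist or from the function mentioned above: the name `fix_quaternary_nitrogen` is made up for illustration, and the rule "only touch a neutral nitrogen whose bond orders sum to 4, and give up on any other problem" is an assumption about what counts as a charge-only error.

```python
from rdkit import Chem


def fix_quaternary_nitrogen(smiles):
    """Hypothetical helper: return a sanitized Mol, or None if the input
    has any problem other than a neutral, 4-valent nitrogen."""
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        return None

    for problem in Chem.DetectChemistryProblems(mol):
        # Anything that is not a valence problem (e.g. kekulization) is left alone.
        if problem.GetType() != "AtomValenceException":
            return None
        atom = mol.GetAtomWithIdx(problem.GetAtomIdx())
        # Sum bond orders by hand so this does not depend on cached valences.
        valence = sum(b.GetBondTypeAsDouble() for b in atom.GetBonds())
        valence += atom.GetNumExplicitHs()
        is_neutral_n4 = (
            atom.GetAtomicNum() == 7
            and atom.GetFormalCharge() == 0
            and valence == 4
        )
        if not is_neutral_n4:
            # A "wrong" valence we do not recognise: do not guess.
            return None
        atom.SetFormalCharge(1)

    # Final check: full sanitization must pass after the fix.
    failed = Chem.SanitizeMol(mol, catchErrors=True)
    if failed != Chem.SanitizeFlags.SANITIZE_NONE:
        return None
    return mol
```

With this sketch, `fix_quaternary_nitrogen("CN(C)(C)C")` should come back with a +1 charge on the nitrogen, while a pentavalent carbon such as `"CC(C)(C)(C)C"` is rejected rather than "fixed".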
To be clear, I'm not recommending this as a general solution for processing millions of molecules. I'm simply highlighting one way of getting around an issue many of us deal with.
One problem I see is that fixed_mol might have other problems, since it was loaded without sanitization and we only corrected a single thing in this molecule.
fixed_mol should probably be read again in RDKit with sanitize=True, so we can check that it is now fully correct.
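A rough sketch of that re-read, assuming a SMILES round trip is an acceptable way to "read it again" (the helper name `reparse_sanitized` is hypothetical, not something from the gist):

```python
from rdkit import Chem


def reparse_sanitized(fixed_mol):
    """Write the molecule out as SMILES and parse it again with full
    sanitization. Returns None if the molecule still has problems."""
    smiles = Chem.MolToSmiles(fixed_mol)
    # sanitize=True is the default; it is spelled out here for clarity.
    return Chem.MolFromSmiles(smiles, sanitize=True)


# Example: a neutral 4-valent nitrogen fails the re-read until its charge is set.
mol = Chem.MolFromSmiles("C[N](C)(C)C", sanitize=False)
before = reparse_sanitized(mol)   # still broken
mol.GetAtomWithIdx(1).SetFormalCharge(1)
after = reparse_sanitized(mol)    # charge corrected
```

The round trip returns a new molecule, so any other issue hidden by sanitize=False shows up as a None result instead of passing silently.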