@PatWalters
Last active December 3, 2024 12:15
@UnixJunkie

One problem I see is that fixed_mol might still have other issues: it was loaded without sanitization, and we only corrected a single thing in it.
fixed_mol should probably be read again in RDKit with sanitize=True, so we can check that it is now fully correct.

@PatWalters
Author

I added a final call to SanitizeMol as a check.
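
For readers following along, a final check of this kind could look roughly like the sketch below. It is not the gist's code: `passes_sanitization` is a hypothetical helper name, and the two SMILES strings are made-up examples of a charge error and its corrected form. Only `Chem.MolFromSmiles`, `Chem.Mol`, and `Chem.SanitizeMol` are RDKit calls.

```python
from rdkit import Chem


def passes_sanitization(mol):
    """Return (ok, message) from a full sanitization pass on a copy of mol.

    Hypothetical helper for illustration; not part of RDKit or the gist.
    """
    candidate = Chem.Mol(mol)  # copy, so the caller's molecule is left untouched
    try:
        Chem.SanitizeMol(candidate)
    except Exception as exc:  # e.g. AtomValenceException, KekulizeException
        return False, str(exc)
    return True, ""


# Neutral nitrogen with four bonds: a typical charge error.
broken = Chem.MolFromSmiles("CN(C)(C)C", sanitize=False)
# The same molecule with the charge written correctly.
repaired = Chem.MolFromSmiles("C[N+](C)(C)C", sanitize=False)

for name, mol in [("broken", broken), ("repaired", repaired)]:
    ok, message = passes_sanitization(mol)
    print(name, ok, message)
```

Running the check on a copy keeps the unsanitized molecule available for inspection if sanitization fails.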

@EBjerrum

EBjerrum commented May 8, 2023

This seems a bit dangerous, as it "fixes" anything with the wrong valence, even when the wrong valence comes from a drawing error or something similar. It may be risky to apply indiscriminately to large datasets.

@maclandrol

Just wanted to clarify that the intended use of this function is to fix issues caused by incorrect charges. We have an alternative function, fix_valence, that breaks bonds instead to correct valence issues (that one is designed mainly for generative models and automated compound enumeration xD). Neither of these functions should be used indiscriminately, but they are often quite useful, especially in the case highlighted by @PatWalters here.
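
One way to keep such a fix from being applied indiscriminately is to list the problems first and only touch the single case you mean to correct. The sketch below is an illustration under stated assumptions, not the gist's function and not the fix_valence function mentioned above: `fix_quaternary_nitrogen` is a hypothetical helper that handles only a neutral nitrogen with four bonds and refuses everything else. It relies on RDKit's `Chem.DetectChemistryProblems` to enumerate the problems.

```python
from rdkit import Chem


def fix_quaternary_nitrogen(mol):
    """Set +1 on neutral four-bonded nitrogens; refuse any other problem.

    Hypothetical helper for illustration; expects a molecule that was
    loaded with sanitize=False.
    """
    fixed = Chem.RWMol(mol)  # editable copy; the input is not modified
    for problem in Chem.DetectChemistryProblems(mol):
        if problem.GetType() != "AtomValenceException":
            raise ValueError(f"Unhandled problem: {problem.Message()}")
        atom = fixed.GetAtomWithIdx(problem.GetAtomIdx())
        # Count bonds by hand: valence is not computed on unsanitized atoms.
        bond_total = sum(b.GetBondTypeAsDouble() for b in atom.GetBonds())
        bond_total += atom.GetNumExplicitHs()
        is_target = (
            atom.GetSymbol() == "N"
            and atom.GetFormalCharge() == 0
            and bond_total == 4
        )
        if not is_target:
            raise ValueError(f"Refusing to fix: {problem.Message()}")
        atom.SetFormalCharge(1)
    Chem.SanitizeMol(fixed)  # final check; raises if anything is still wrong
    return fixed.GetMol()


mol = Chem.MolFromSmiles("CN(C)(C)C", sanitize=False)
fixed_mol = fix_quaternary_nitrogen(mol)
print(Chem.MolToSmiles(fixed_mol))
```

A pentavalent carbon, for example, is rejected with a `ValueError` instead of being silently "repaired", which gives you a list of molecules to inspect by hand.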

@PatWalters
Author

To be clear, I'm not recommending this as a general solution for processing millions of molecules. I'm simply highlighting one way of getting around an issue many of us deal with.
