Created
June 24, 2021 16:55
Use a custom normalization reaction list with the MolStandardizer
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2021.03.1\n"
]
}
],
"source": [
"import rdkit\n",
"print(rdkit.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"import tempfile\n",
"from rdkit import Chem\n",
"from rdkit.Chem.Draw import MolsToGridImage\n",
"from rdkit.Chem.MolStandardize import rdMolStandardize"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can pass the MolStandardizer a custom list of normalization reactions.<br/>\n",
"Here I copied the standard RDKit list and tweaked only the `Pyridine oxide to n+O-` rule: its pattern now requires a neutral aromatic nitrogen with no attached hydrogens (`[nH0+0:1]=[O:2]`), so it is no longer triggered by molecules that are not actually N-oxides (such as yours):"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"custom_normalizations = \"\"\"// Opposite of #2.1 in InChI technical manual? Covered by RDKit\n",
"// Sanitization.\n",
"Nitro to N+(O-)=O\t[N,P,As,Sb;X3:1](=[O,S,Se,Te:2])=[O,S,Se,Te:3]>>[*+1:1]([*-1:2])=[*:3]\n",
"Sulfone to S(=O)(=O)\t[S+2:1]([O-:2])([O-:3])>>[S+0:1](=[O-0:2])(=[O-0:3])\n",
"Pyridine oxide to n+O-\t[nH0+0:1]=[O:2]>>[n+:1][O-:2]\n",
"Azide to N=N+=N-\t[*:1][N:2]=[N:3]#[N:4]>>[*:1][N:2]=[N+:3]=[N-:4]\n",
"Diazo/azo to =N+=N-\t[*:1]=[N:2]#[N:3]>>[*:1]=[N+:2]=[N-:3]\n",
"Sulfoxide to -S+(O-)-\t[!O:1][S+0;X3:2](=[O:3])[!O:4]>>[*:1][S+1:2]([O-:3])[*:4]\n",
"// Equivalent to #1.5 in InChI technical manual\n",
"Phosphate to P(O-)=O\t[O,S,Se,Te;-1:1][P+;D4:2][O,S,Se,Te;-1:3]>>[*+0:1]=[P+0;D5:2][*-1:3]\n",
"// Equivalent to #1.8 in InChI technical manual\n",
"C/S+N to C/S=N+\t[C,S&!$([S+]-[O-]);X3+1:1]([NX3:2])[NX3!H0:3]>>[*+0:1]([N:2])=[N+:3]\n",
"// Equivalent to #1.8 in InChI technical manual\n",
"P+N to P=N+\t[P;X4+1:1]([NX3:2])[NX3!H0:3]>>[*+0:1]([N:2])=[N+:3]\n",
"Normalize hydrazine-diazonium\t[CX4:1][NX3H:2]-[NX3H:3][CX4:4][NX2+:5]#[NX1:6]>>[CX4:1][NH0:2]=[NH+:3][C:4][N+0:5]=[NH:6]\n",
"// Equivalent to #1.3 in InChI technical manual\n",
"Recombine 1,3-separated charges\t[N,P,As,Sb,O,S,Se,Te;-1:1]-[A+0:2]=[N,P,As,Sb,O,S,Se,Te;+1:3]>>[*-0:1]=[*:2]-[*+0:3]\n",
"Recombine 1,3-separated charges\t[n,o,p,s;-1:1]:[a:2]=[N,O,P,S;+1:3]>>[*-0:1]:[*:2]-[*+0:3]\n",
"Recombine 1,3-separated charges\t[N,O,P,S;-1:1]-[a:2]:[n,o,p,s;+1:3]>>[*-0:1]=[*:2]:[*+0:3]\n",
"Recombine 1,5-separated charges\t[N,P,As,Sb,O,S,Se,Te;-1:1]-[A+0:2]=[A:3]-[A:4]=[N,P,As,Sb,O,S,Se,Te;+1:5]>>[*-0:1]=[*:2]-[*:3]=[*:4]-[*+0:5]\n",
"Recombine 1,5-separated charges\t[n,o,p,s;-1:1]:[a:2]:[a:3]:[c:4]=[N,O,P,S;+1:5]>>[*-0:1]:[*:2]:[*:3]:[c:4]-[*+0:5]\n",
"Recombine 1,5-separated charges\t[N,O,P,S;-1:1]-[c:2]:[a:3]:[a:4]:[n,o,p,s;+1:5]>>[*-0:1]=[c:2]:[*:3]:[*:4]:[*+0:5]\n",
"// Conjugated cation rules taken from Francis Atkinson's standardiser. Those\n",
"// that can reduce aromaticity aren't included\n",
"Normalize 1,3 conjugated cation\t[N,O;+0!H0:1]-[A:2]=[N!$(*[O-]),O;+1H0:3]>>[*+1:1]=[*:2]-[*+0:3]\n",
"Normalize 1,3 conjugated cation\t[n;+0!H0:1]:[c:2]=[N!$(*[O-]),O;+1H0:3]>>[*+1:1]:[*:2]-[*+0:3]\n",
"Normalize 1,5 conjugated cation\t[N,O;+0!H0:1]-[A:2]=[A:3]-[A:4]=[N!$(*[O-]),O;+1H0:5]>>[*+1:1]=[*:2]-[*:3]=[*:4]-[*+0:5]\n",
"Normalize 1,5 conjugated cation\t[n;+0!H0:1]:[a:2]:[a:3]:[c:4]=[N!$(*[O-]),O;+1H0:5]>>[n+1:1]:[*:2]:[*:3]:[*:4]-[*+0:5]\n",
"// Equivalent to #1.6 in InChI technical manual. RDKit Sanitization handles\n",
"// this for perchlorate.\n",
"Charge normalization\t[F,Cl,Br,I,At;-1:1]=[O:2]>>[*-0:1][O-:2]\n",
"Charge recombination\t[N,P,As,Sb;-1:1]=[C+;v3:2]>>[*+0:1]#[C+0:2]\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"m = Chem.MolFromSmiles('Cn1c(=O)c2nc[nH][n+](=O)c2n(C)c1=O')"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"params = rdMolStandardize.CleanupParameters()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a one-off cleanup job you can use `rdMolStandardize.Cleanup()`:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Write the rules to a temporary file and flush it, so the file can be read by name while still open\n",
"with tempfile.NamedTemporaryFile() as hnd:\n",
"    hnd.write(custom_normalizations.encode(\"utf-8\"))\n",
"    hnd.flush()\n",
"    params.normalizationsFile = hnd.name\n",
"    clean_mol = rdMolStandardize.Cleanup(m, params)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<rdkit.Chem.rdchem.Mol at 0x7fbf06e2a0d0>"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"clean_mol"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parsing the same file for every molecule is inefficient, so if you need to clean up many molecules you may prefer to create the standardizer objects once..."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"params = rdMolStandardize.CleanupParameters()\n",
"metal_disconnector = rdMolStandardize.MetalDisconnector()\n",
"normalizer = rdMolStandardize.NormalizerFromData(custom_normalizations, params)\n",
"reionizer = rdMolStandardize.Reionizer()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"...and then reuse them (e.g., in a loop) for every molecule you need to standardize. These are the operations that `rdMolStandardize.Cleanup()` carries out in sequence:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"clean_mol = metal_disconnector.Disconnect(m)\n",
"clean_mol = normalizer.normalize(clean_mol)\n",
"clean_mol = reionizer.reionize(clean_mol)\n",
"Chem.AssignStereochemistry(clean_mol)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In either case, the amended pattern for pyridine _N_-oxides no longer causes trouble, and you can enumerate the other tautomers of the molecule correctly:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"taut = rdMolStandardize.TautomerEnumerator()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<IPython.core.display.Image object>"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"MolsToGridImage(list(taut.Enumerate(clean_mol)))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.8"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
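
The custom rule list in the notebook is plain text: lines starting with `//` are comments, and each rule is a name, a tab, and a reaction SMARTS of the form `reactant>>product`. If you edit the list by hand, a quick check of that layout before handing it to RDKit may save some debugging. The sketch below uses only the Python standard library. `parse_normalizations` is a hypothetical helper (not part of RDKit), the layout rules are inferred only from the list shown above, and it does not validate the SMARTS themselves.

```python
def parse_normalizations(text):
    """Split a normalization list into (name, reaction SMARTS) pairs.

    Hypothetical helper: it only checks the line layout seen in the list
    above (// comments, name<TAB>reaction); it is not RDKit's own parser.
    """
    rules = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        # Skip blank lines and // comment lines
        if not line or line.startswith("//"):
            continue
        # The rule name and the reaction SMARTS are separated by a tab
        name, sep, smarts = line.partition("\t")
        if not sep or ">>" not in smarts:
            raise ValueError(
                f"line {lineno}: expected 'name<TAB>reactant>>product'"
            )
        rules.append((name.strip(), smarts.strip()))
    return rules


# Two rules copied from the list above, plus an illustrative comment line
sample = (
    "// Comment lines start with two slashes\n"
    "Pyridine oxide to n+O-\t[nH0+0:1]=[O:2]>>[n+:1][O-:2]\n"
    "Azide to N=N+=N-\t[*:1][N:2]=[N:3]#[N:4]>>[*:1][N:2]=[N+:3]=[N-:4]\n"
)

rules = parse_normalizations(sample)
for name, smarts in rules:
    print(f"{name}: {smarts}")
```

In the notebook you could call `parse_normalizations(custom_normalizations)` before building the normalizer. A `ValueError` then points at the offending line, for instance one where the tab was replaced by spaces.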