Created October 21, 2021 19:26
Simple conversion of a SMILES string to three other molecular string representations: SELFIES, InChI, and DeepSMILES. See Figure 1 of https://iopscience.iop.org/article/10.1088/2632-2153/aba947/meta
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Molecular_Representations.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Lpqzq6S--hXf"
      },
      "source": [
        "### **We start by installing RDKit, SELFIES v2, and DeepSMILES using `!pip`**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "M8nbMHcL-9K5",
        "outputId": "21dbb97f-7cd9-45ee-df7d-7f5400393766"
      },
      "source": [
        "!pip install rdkit-pypi\n",
        "!pip install --upgrade selfies\n",
        "!pip install --upgrade deepsmiles"
      ],
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Requirement already satisfied: rdkit-pypi in /usr/local/lib/python3.7/dist-packages (2021.3.5.1)\n",
            "Requirement already satisfied: numpy>=1.19 in /usr/local/lib/python3.7/dist-packages (from rdkit-pypi) (1.19.5)\n",
            "Requirement already satisfied: selfies in /usr/local/lib/python3.7/dist-packages (2.0.0)\n",
            "Requirement already satisfied: deepsmiles in /usr/local/lib/python3.7/dist-packages (1.0.1)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "psqSkp58_nXR"
      },
      "source": [
        "### **Importing the relevant libraries and drawing a small organic molecule: 3,4-Methylenedioxymethamphetamine**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 317
        },
        "id": "z8XQwFgEAHys",
        "outputId": "525a8261-1b4e-4e3a-84a5-73f15d6946ba"
      },
      "source": [
        "from rdkit import Chem\n",
        "from rdkit.Chem import Draw\n",
        "from rdkit.Chem.Draw import IPythonConsole  # RDKit molecule drawing capabilities\n",
        "import selfies as sf  # SELFIES encoder/decoder\n",
        "import deepsmiles  # DeepSMILES converter\n",
        "\n",
        "IPythonConsole.drawOptions.addAtomIndices = True  # label atoms with their indices\n",
        "IPythonConsole.molSize = (300, 300)\n",
        "\n",
        "# SMILES string for 3,4-Methylenedioxymethamphetamine\n",
        "mol = Chem.MolFromSmiles('CNC(C)CC1=CC=C2C(=C1)OCO2')\n",
        "mol"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "image/png": "\n",
            "text/plain": [
              "<rdkit.Chem.rdchem.Mol at 0x7f0bdd97f080>"
            ]
          },
          "metadata": {},
          "execution_count": 3
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fBGPOIqSBiGt"
      },
      "source": [
        "### **Converting the SMILES string to SELFIES v2 and InChI**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "SYd7RO4UBygy",
        "outputId": "2e0de0f5-fa93-48cd-f8a0-f92c112b85bf"
      },
      "source": [
        "SMILES = \"CNC(C)CC1=CC=C2C(=C1)OCO2\"\n",
        "SELFIES = sf.encoder(SMILES)  # SMILES --> SELFIES v2\n",
        "print(f\"Generated SELFIES: {SELFIES}\")\n",
        "\n",
        "InChI = Chem.MolToInchi(mol)  # RDKit Mol (built from the SMILES above) --> InChI\n",
        "print(f\"Generated InChI: {InChI}\")\n"
      ],
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Generated SELFIES: [C][N][C][Branch1][C][C][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1]\n",
            "Generated InChI: InChI=1S/C11H15NO2/c1-8(12-2)5-9-3-4-10-11(6-9)14-7-13-10/h3-4,6,8,12H,5,7H2,1-2H3\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wb9JG0guCugN"
      },
      "source": [
        "### **Converting the SMILES string to DeepSMILES**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "xI8vJ-rDCzyp",
        "outputId": "b9265321-c0fc-47d0-e61c-a0012178ebe6"
      },
      "source": [
        "# Use the DeepSMILES notation for both ring closures and branches\n",
        "converter = deepsmiles.Converter(rings=True, branches=True)\n",
        "DeepSMILES = converter.encode(\"CNC(C)CC1=CC=C2C(=C1)OCO2\")  # SMILES --> DeepSMILES\n",
        "print(f\"Generated DeepSMILES: {DeepSMILES}\")\n"
      ],
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Generated DeepSMILES: CNCC)CC=CC=CC=C6)OCO5\n"
          ]
        }
      ]
    }
  ]
}
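The notebook prints the SELFIES and InChI strings but does not take them apart. As a rough illustration of how these two outputs are structured, the sketch below splits them using only the Python standard library. The two helper functions are hypothetical names introduced here, not part of RDKit, `selfies`, or `deepsmiles`, and the InChI helper assumes every layer after the formula starts with a single-letter prefix, which holds for the string shown in the output but is not a general InChI parser.

```python
import re
from typing import Dict, List

# Strings copied from the notebook's printed output.
SELFIES = (
    "[C][N][C][Branch1][C][C][C][C][=C][C][=C][C][=Branch1][Ring2]"
    "[=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1]"
)
INCHI = (
    "InChI=1S/C11H15NO2/c1-8(12-2)5-9-3-4-10-11(6-9)14-7-13-10"
    "/h3-4,6,8,12H,5,7H2,1-2H3"
)


def split_selfies_symbols(selfies: str) -> List[str]:
    """Hypothetical helper: return each bracketed SELFIES symbol in order."""
    return re.findall(r"\[[^\[\]]*\]", selfies)


def split_inchi_layers(inchi: str) -> Dict[str, str]:
    """Hypothetical helper: split an InChI into its '/'-separated layers.

    Assumes the first field is 'InChI=<version>', the second is the
    molecular formula, and each later layer begins with a one-letter prefix.
    """
    prefix, formula, *rest = inchi.split("/")
    layers = {"version": prefix.split("=", 1)[1], "formula": formula}
    for layer in rest:
        layers[layer[0]] = layer[1:]  # e.g. 'c' = connectivity, 'h' = hydrogens
    return layers


symbols = split_selfies_symbols(SELFIES)
layers = split_inchi_layers(INCHI)

print(len(symbols))       # 22 symbols in the SELFIES string above
print(layers["formula"])  # C11H15NO2
```

If the `selfies` package is installed, its own `sf.split_selfies` should be preferred over the regular expression above; the regex version is only meant to make the bracketed-symbol structure visible without extra dependencies.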