Created
October 21, 2021 19:26
Simple conversion of a SMILES string to three other molecular representations: SELFIES, InChI, and DeepSMILES. See Figure 1 of https://iopscience.iop.org/article/10.1088/2632-2153/aba947/meta
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"name": "Molecular_Representations.ipynb", | |
"provenance": [] | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
}, | |
"language_info": { | |
"name": "python" | |
} | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "Lpqzq6S--hXf" | |
}, | |
"source": [ | |
"### **We will start by installing RDKit, SELFIES v2, and DeepSMILES using !pip**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "M8nbMHcL-9K5", | |
"outputId": "21dbb97f-7cd9-45ee-df7d-7f5400393766" | |
}, | |
"source": [ | |
"!pip install rdkit-pypi \n", | |
"!pip install selfies --upgrade \n", | |
"!pip install --upgrade deepsmiles" | |
], | |
"execution_count": 2, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"Requirement already satisfied: rdkit-pypi in /usr/local/lib/python3.7/dist-packages (2021.3.5.1)\n", | |
"Requirement already satisfied: numpy>=1.19 in /usr/local/lib/python3.7/dist-packages (from rdkit-pypi) (1.19.5)\n", | |
"Requirement already satisfied: selfies in /usr/local/lib/python3.7/dist-packages (2.0.0)\n", | |
"Requirement already satisfied: deepsmiles in /usr/local/lib/python3.7/dist-packages (1.0.1)\n" | |
] | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "psqSkp58_nXR" | |
}, | |
"source": [ | |
"### **Importing relevant libraries and drawing a small organic molecule: 3,4-Methylenedioxymethamphetamine**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 317 | |
}, | |
"id": "z8XQwFgEAHys", | |
"outputId": "525a8261-1b4e-4e3a-84a5-73f15d6946ba" | |
}, | |
"source": [ | |
"from rdkit import Chem \n", | |
"from rdkit.Chem.Draw import IPythonConsole #RDKit molecule drawing capabilities \n", | |
"from rdkit.Chem import Draw\n", | |
"IPythonConsole.drawOptions.addAtomIndices = True\n", | |
"IPythonConsole.molSize = 300,300\n", | |
"import selfies as sf #importing selfies\n", | |
"import deepsmiles # importing deepsmiles\n", | |
"mol = Chem.MolFromSmiles('CNC(C)CC1=CC=C2C(=C1)OCO2') #SMILES string for 3,4-Methylenedioxymethamphetamine\n", | |
"mol" | |
], | |
"execution_count": 3, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"image/png": "iVBORw0KGgoAAAANSUhEUgAAASwAAAEsCAIAAAD2HxkiAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nO3dd1gU1/oH8HcbHYQsQbESFRRQUbGDejWgUbGgItaQWJCIwRrxJioWrDcaorEQEyP6s2HHhqJigYAK6ioSFUWsSAsCUtxd9vz+GLNBWFbK7pwt7+fJcx+cmZ19vfLdOTtnzjkcQggghOjh0i4AIX2HIUSIMgwhQpRhCBGiDEOIEGUYQoQowxAiRBmGECHKMIQIUYYhRIgyDCFClGEIEaIMQ4gQZRhChCjDECJEGYYQIcowhAhRhiFEiDIMIUKUYQgRogxDiBBlGEKEKMMQIkQZhhAhyjCECFGGIUSIMgwhQpRhCBGiDEOIEGUYQoQowxAiRBmGECHKMIQIUYYhRIgyDCFClGEIEaIMQ4gQZRhChCjDECJEGYYQIcowhAhRhiFEiDIMIUKUYQgRogxDiBBlGEKEKMMQIkQZhhAhyjCECFGGIUSIMgwhQpRhCBGiDEOIEGUYQoQowxAiRBmGECHKMIQIUYYhRIgyDCFClGEIEaIMQ4gQZRhChCjDECJEGYYQIcowhAhRhiFEiDIMIUKUYQgRogxDiBBlGEKEKMMQapaioqIZM2Z4eHgMHz5cLBbTLgexgUMIoV0D+teoUaOGDx/+5ZdfvnnzxtLSknY5iA14JdQg+fn5t27dGjx4sEgkMjQ0pF0OYgmfdgHoX8+fPy8oKJg/f76lpeWlS5diY2OtrKxoF4XUDq+EGsTY2Lh169Y7d+4MCwvr3r37hQsXaFeE2IAh1CB2dna5ubmvXr0qLy+/e/euUCikXRFiAzZHNYhAINi2bZuvr++7d+/69+/fr18/2hUhNuDdUYQow+YoQpRhCBGiTI++E7569crPz0/+x9DQ0O7du1Osp6ysLCkpKTU19d69e8nJyffu3Zs9e7a9vf348eMpVoXYp4/fCWUymaur69mzZ21sbNh835KSkpSUlNu3b4tEIpFIdOfOnaKioooHGBkZSaXSEydOfPHFF2wWhujSoyuhXFRUVM+ePdWdQJlM9tdffyUnJ8uvdZmZmRUPMDQ0dHV1dXJycnV1dXV1bdeu3Y4dO+bNmzdy5MjY2Fi6V2nEJn28Evbt2/e3336zt7dX7WlLSkpu3ryZnJzMBO/hw4eVLnRCobBTp07y1Dk4OAgEgkonmTlz5ubNm21tbRMTE5s3b67aCpFm0rsrYWJiopWVlUoSmJmZmZSUJE/d06dPy8vL5Xs5HI6Tk5Ozs7M8dY0bN/7oOX/++eenT5+ePHly8ODB8fHxDRo0qH+dSMPp3ZXQ19d35syZvXv3ru0LxWLx3bt3mYYlk7r8/PyKBxgaGrZr165i87JuwyCKiorc3d3v3LkzcODAkydP8vl690Gpb/QrhK9fvx43blxsbGxNDs7Ly7t165Y8dWlpaRKJpOIBNWle1s3Lly+7d+/+8uXLqVOnbt++XSXnRBpLv0KoBCEkNTVVfhOl6n0UDofj6OhY2+ZlnSUnJ/ft27e4uHjDhg1z5sxR3xsh6jCEsHjx4uTkZJFI9OrVq4rbORxOq1atOv7DxcWladOmbBZ25MgRHx8fADh06JC3tzebb43YpO8hDAwMPHny5LNnzwDA2tq6R48e8mudCpuXdfa///1vwYIFxsbG2Gmhw/Q6hImJiT179vzkk082btzYsWPHNm3aaOBdkMDAwC1btmCnhQ7T62dH9+3bBwCTJ0+eMGGCs7MzxQQWFhZWt+vnn38eMGBAZmbmoEGDCgoK2KwKsUN/QyiVSvfv3w8A48aNo1vJpUuX7Ozs9uzZo3Avn88/dOhQhw4dUlNTfX19pVIpy+UhddPfEF68eDE7O9vJyalz5850K4mPj8/Pz/f3909KSlJ4gLm5eVRUVMOGDc+ePfvNN9+wXB5SO6KvvvrqKwBYtmwZ7UIIIYSJllAoTEtLq+6YpKQkU1
NTAFi/fj2btSF109MQlpSUWFhYcDicR48e0a6FEEIkEsmAAQMAoG3btn///Xd1hx06dIjL5XK53CNHjrBZHlIrPQ1hZGQkAHTr1o12If8qLCzs0KEDAPTp0+fdu3fVHbZmzRoAMDY2TkxMZLM8pD56GkKm7zssLIx2IR/IyMho1KgRAHz99ddKDgsICACARo0aPX36lLXaVE4mk/n7+3/zzTe0C6FPH0OYn59vaGjI4/EyMzNp11KZ/Ivf6tWrqztGLBZ7enoCgJOTU35+PpvlqdDWrVunTZvm6upKuxD69DGEv/32GwB4enoq3Ltz587BgwdfuHCB5arkDh8+zOVyORzOnj17qjumoKCgffv2ADBw4ECJRMJmeSrx7NmzTp06ZWRkYAiJfobw888/B4AdO3Yo3Nu3b18A2LVrF8tVVbRu3ToAMDIy+vPPP6s75smTJw0bNgSAqVOnslmbSgwZMuTs2bNZWVkYQqKHIXzx4gWXyzU2Ni4oKKi699mzZ1wu18TEpLCwkP3aKpoxYwYAWFtbK+m0uHHjhomJidZ1Whw7dsze3j4yMnL79u0tW7ZMSkqiXRFletdZf+DAAZlMNnjwYAsLi6p79+/fL5PJhg4dam5uzn5tFW3cuNHLyys3N3fYsGGVRg/LdenSJSIigsvlfvfdd0ePHmW5wjqztbWdP39+fn5+QUGBVCotKSmhXRFttD8F2NalSxcAOHz4sMK9nTp1AoBjx46xXJVC8k6Lvn37Kum0WLVqFWhnp0VOTs5//vMf2lXQp18hvH//PgBYWlqWlZVV3ZuSkgIAQqFQyW88y168eMEMYpw8ebKSw6ZPnw4AjRo1ysjIYK02pCr61Rxlhk2MHDlS4RKcBw4cYPYaGBiwXVk1mjRpcvz4cVNT0x07djB3axTatGmTh4fH69evBw8e/ObNm+oOKywszM/Pz8/PxxagZqH9KcCq1q1bA0BMTEzVXTKZrFWrVgBw8eJF9gtT7uTJkzwej8Ph7N27t7pjCgoK2rVrB0o7Lezs7Hx8fHx8fJScB7FPj0J448YNAGjSpEl5eXnVvQkJCQDQtGlThXup+/HHHwHA2Ng4ISGhumPS09OZTospU6ZU3SuRSNq1a6fOGqtVWlp648aN7du3M/Pc7d69Oz4+nkolmkmPQshMlzR79myFe4OCggBg/vz5LFdVc4GBgQBgbW2t5KHz69evM50WP/74Y6VdL168cHR0XL169ebNm4uKitRXp0wmS0lJiYiICA4O9vLysrW1rdT4Mjc3t7S0TE1NVV8N2kVfQiiVSpnfhuvXr1fdK5FImFnxk5OT2a+thqRS6dChQ+FjT6tFRkYqHGlRUlKyZ8+ehISE5cuXe3l5qbaw1NTUffv2BQcHDxw4kHn8tSJmwv/Jkydv3LjxypUrX3/9NQA0btz4+fPnKixDe+lLCM+fPw8ADg4OCveePXuW+eVmuaraKiwsdHFxAYABAwYoeVotNDRUSdtVKpVaWFjUp4zMzMyoqKiQkBAfHx8nJ6dK04IwU4/7+PiEhIRERUW9fPmy0svFYrGHhwcAdO7c+e3bt/WpRDfoSwinTJkCACEhIQr3atQAX+XknRYKv/jJ+fv7w4edFomJiRs2bHj58uWWLVsGDBhQ83eUSqVM8zIoKMjNzc3KykrhhW7SpElhYWFXr16tyTPl8ttIQ4YMkUqlNS9GJ+lFCMvKypgZ6e/fv191r6YN8P2o5ORkZqRF1S9+cmKxmHlEVt52LSsrW7Nmzfjx45cvX56bm6vk/EVFRVevXg0LC5s0aZKTk1PVDhuhUOjh4REUFBQREZGSkiIWi+vwt0hPT2e+AsydO7cOL9clehHCI0eOAEB1zwofPHgQNGyA70cxnRbKh9jn5eW1adNGeacF+ec+SmRkZEhIiML7KB9tXtaZ/DbSL7/8oqpzaiO9COHo0aOVXDeYAb4//fQTy1XV0/r16+FjT6vJrzYV266lpaVXr14NDw9nmpdVV6
0xMjJyc3Pz9/dnmpdv3rxR398iMjKSw+HweLyoqCj1vYuG0/0QvnnzxsjIiMfjKfwI1+QBvh81c+ZM+NgQ+7i4OCMjIwAYNWpUQEBAz549zczMKqXO2Ni4a9eu06ZN27x5c1xcHMsjSFasWMH0W9y+fZvN99Ucuh/CP/74AwD69++vcK/yAb4aTv7Fz9XVVckXsx07dvB4PHn21Ne8rDM/Pz997rTQ/RAys5ht375d4V7lA3w1X2FhYa9evQ4dOqTkGOZhIAsLi5UrV546dUoTUleJ/NNEPzstdDyEmZmZPB7P0NBQ4X1z5QN8tcVHn7NjHrUJDg5mp566kd9G0sNOCx0fRREZGVleXj5o0CCFi+ZGRkYqGeCrLbhcZf+IEomEmd+R+mz/yn3yySdnzpyxsbE5derUd999R7scdtH+FFAvZjmxAwcOKNzbtWtXAFDeltN2p06dAgBaj27X1tWrV5lRZps2baJdC3t0OYQPHz4EgAYNGpSWllbd++DBAwCwtLRUuFdnTJw4EQBWrlxJu5CaOnDggL51Wuhyc5RZdGnEiBHMPfpKmAG+3t7eCvfqhuLi4qNHj3I4nAkTJtCupabGjBmzdOnS8vLyCRMmiEQi2uWwgvangBq1bdsWAKKjoxXuVTLAV2cwHzS9evWiXUjtyGSyL7/8EvSm00LrQ3jnzp2YmJj09PRK22/evAkADRs2VHirTfkAX53BDH3SxofCxGJx//79QQs7LX766acePXr079+/5ov2aHdzdMeOHXPnzn306JG3tzfTGybHXATGjBnD4/GqvpDZ6+Pjo/zWolbLy8uLjo4WCARjxoyhXUutCQSCgwcPOjg43Lx509fXt7y8nHZFNZKXl1dUVJSQkBAZGTl9+nRSw7Xo1fmhoHZ+fn4HDx4khISGhq5atUq+vby8vFmzZgCgcAZr+QDfa9eusVcr67Zt2wYAgwYNol1I3T1+/PjTTz8FgDlz5tCupXauXLnSs2fPGh6s3SG8fPly8+bNFy1a5Onp+erVK/n2S5cuAYC9vb3CV124cAGqH+CrM/r06QMAu3fvpl1IvWhjp8WMGTMcHBxqvpKCdjfGrl+/7u3t3ahRozdv3jBf8xh79+4FAF9fX4WvYtqiY8eOZadIKjIyMq5evWpqasqMEdFe7u7uERERHA5n9uzZJ06coF1OjWzevPnu3btLlizJy8ur0QvU+pGgbk2aNGGeR7ty5YqHhwezsaysjBn9/ddff1V9iXyAr8K9OmPt2rUAMHbsWNqFqMaSJUtAG0Za5OTk7N+/nxDy5s2bpk2bZmdn1+RV2h3CcePGLVy4MD4+fsKECfJFUY4fPw4AHTt2VPiS0tLSLVu2KJ/QWgcwU9HoTH+3TCabNGkSaHynhUQimTdvXq9evdzd3ZWsbFeJdoewtLR0//7969atO336tHwj085cu3YtxcLounv3LgAIhcK6TTyhmd69e9evXz8A6Ny5s1qnbGSfdoewqqKiIhMTEy6X++zZM9q1UPPDDz8AAHOLXJfk5uba29sDwODBg3VppIV235ip5Nq1a6NHj5ZIJA4ODkwXhR4ihPzf//0faPywiToQCoVnzpyxtrY+ffq0To20oP0poAIlJSW///67q6sr8zcyMjLi8/nnzp2jXRcd8fHxAGBnZyeTyWjXohZXrlzRzE6L8vLyhw8fRkZGfv/990OGDFE4zbRC2h1CkUjk7+8vX9DTyckpPDx86dKlAGBmZnbr1i3aBVLATDyzYMEC2oWo0b59+5iRFsePH6dYRk5OTlRU1Jo1a5i5IQUCQcXL28aNG2t4Hq0MoVgsjoiIkF/6+Hy+j49PTEyM/EHQgIAAAGjcuLG+fTMUi8XMIyY6/wG0aNEiYL3TIisr69y5c2vXrh0/fnzVqccBQCgUfv755/Pmzdu9e7eS2bcq0bIQPn36NDg4WL7awaeffhoSEvLixYtKh4nFYk9PTwDo1KmTjt1JU+706dOgPUN460Mmkz
FDJdXXaVFpbsgGDRpUipyhoWHFuSFrMvW4QtoRQplMFhMT4+XlJX/e2sPDIzIyUsmSugUFBe3btweAQYMG6dKdNOWYzrTQ0FDahbChtLS0V69eKvyozc7Orti8rHqhs7a2rv/U41Vpegjz8vLWrFnj4ODA/L9gamrq7+9fw7WTnjx5wqzXN2vWLHXXqQmKi4vNzc05HM7jx49p18KS+nRafHQJN9bmhqw2hBkZGcwABUJIenp6WFjYjh07WB3Z9fDhouBg+fIjzZs3X7ly5evXr2t1jhs3bjATrdf8W7L2YmYSqPnD+7rhr7/+Yn5Jqlt5Uq6kpIRZY8Pf39/Nza3q7F5sTj1ekeIQhoaGDh06tFmzZoSQR48ede3adceOHQsXLhw+fLjaKyopIeHhxNWVAHzbp4/8pkudb7gfPHiQy+XyeLxjx46ptlJNM2zYMA28cc+Cy5cvK+y0yMrK+mjz0svLKzg4WLXNy9pS1hxlQkgIKS4uJoRkZWXZ2dmpsZZXr8jy5aRpUwJAAIil5bNFix48eFD/E69atUrnOy1yc3MFAgGfz6/hQ8M6hplnncfjhYSErF692tfXt02bNlXHc9vY2Hh6ei5YsGDPnj337t3TkJsFNQohIWTx4sXdu3c/c+aM6kuQSklkJPHwIFzu+/h5eJDISFL9TZc6mD59um53WoSHhwPAF198QbsQaubOncvhcCpe7mg1L2urpiG8f//+zp073d3dVfnm2dkkJIQ0afI+e6amxN+f3Lypyrf4h3x1WF3ttOjbty8A1Hwgqe65fPkyADRo0MDf33/r1q1//vmntvxD1yiE8pk5P/3002rvzcTFke+/J/7+xN+fBAeTixeVvW1MDPHxIYaG7+Pn7EzCw4ma56KXrw47aNAgJev1aaNnz55xuVwTExOWF1TSKExj54cffqBdSK0pDuHly5eDg4PNzc2Dg4OvXLni7u6+fv36GTNm+Pr6Kjg6P58MG/Y+Tg0bElvb9z97epKsrA+OLCoiYWHE0fH9AXw+8fEhMTGEracc5ev1BQUFsfOO7Fi3bh0AKP7X0Q/ykdypqanMlpcvX4aGhi5ZskTDxwGT6kKYkZER84/CwsKcnJxTp07FxsYq+CIrlZJevQgAmTyZyB9cyMwkX31FAEiHDqSsjBBC7twh/v7EwuJ9/GxsSEgIobE8kHx1WF3qtOjYsSMA0H2Qki5mJLeLiwvzR5lM1q1bt717954/f75Fixa17dliWb076yMiCAAZPVrBri+/JABk0yYik5GWLd/Hr1Mnsn07KS6u7/vWQ2RkJJfL5XK5utFpkZKSAjo3hLe2qo7kll8wRo0adeHCBUp11Ui9Q/j55wSAKLz1/+gRASBduxJCyPr1ZOJEomgCQipCQ0N1ptOCeZTZ39+fdiHUFBYWVjeS+927d05OThrebVO/EMpkxMyMWFlV+6WuRQvC5xONXHHF399fBzotZDJZy5YtASA2NpZ2LdTs2rULAPr06VNpe3l5+ZgxYzT/6YX6jawvKoK3b6FpU+BwFB/QvDlIpZCTU693UY9ffvnl888/f/Xq1bBhw96+fUu7nDpKTExMT09v1qwZM8uofmLmsBw/fnzFjQUFBd7e3p07d2YGWGqy+oWQmZxcyUzyzCMLGjmHuUAgiIyMbNOmze3bt8eMGSOVSmlX9F5+fn5sbGxYWNhXX3313//+V/nB8jlUdXg+f+WysrJiYmIMDAwqzvZfWlrao0ePBw8epKenT58+nZkMWnPV6zoqlRI+nzRtWu0Bzs4EQN0dgPUh77T49ttvqRTw7t27pKSkiIgIZtCa/IF1RnWTiDMkEgkzhPemep5w0AqbNm0CgKFDh1bcKJVKH1eg4cuhV36ktXZ4PHB2BpEIsrKgYcPKe4uK4OFDaNkSNHgx6s8+++zIkSMeHh6bNm1q3bp1UFCQut9RLBanpKSIRKLbt28z/1tQUFDxAIFA4OTk1LFjRxcXl06dOik51fnz53Nycp
ydnZUfptsUtkV5PB7zVVkr1C+EADB8OIhEsGMHVG04/fEHSCQwcmR930LN3NzcIiIixo4dO2fOnBYtWgwfPly1509PT793717yPzIzMysdYGtr61pB48aNa3hmfZjPX7n09PSEhAQzMzNmBIm2qu+l9PVrYm1NjI1Jhel3CSHk4sX3N04rrNOiyZYtWwYAJiYm9VyqqaysrGLzkplyvyIej+fk5DRp0iTmqeK///67bm+kh0N4q2K6miZOnEi7kHpRxcj6ixeJuTkBIH37ksWLSUgI8fAgHA4xMamcTA0mXx3W1ta2Vp0Wubm5MTExYWFhCqfcAgAzMzM3Nzf5nAhKpuSolQMHDgBAjx49VHI2LeXo6AgAp7Xn10whFU1vkZFBgoKIvT0xMCACAWnVisyYQbTtE1q+Omy7du1qOOzFz8+v6lBRKyurfv36zZ49e+fOnbdu3VLTgyxMs1mXHr6rrdu3bwNAw4YNtf1xfE2fY4ZleXl5bdq0gRqPtAgKCjIwMHB1da1/87JWcnNzDQwM9HYIL2PBggUAEBgYSLuQ+lJ1CK9fJ337Ekq3+1VCvjpsTR4Ey8rKYnOo6OPHj48cORISEtKjRw8OhyNfDU4PyRdjjo+Pp11Lfak6hNeu/fu8qNaSrw77888/UywjJydHybfNgIAAkUhEsTy6EhLuCQSC5s2b68Bs//Xuoqjk008BAHJzVXxadjGrw44bN05NnRbVyc7OFolEt27dEolEIpHowYMHlZ7jsbOzc3Fx6dixY8eOHbt27dqkSRN2CtNAu3c7mZu/Cgx8xKnukUntwSGEqPJ8RUVgYQFmZlBUpMrT0rB06dJly5aZmJjExsZ269ZN5ecvLS1leg5TU1Pv3bt39+7dwsLCigeYmZm5uLjI+w/t7e0NDAzke7OzswGAedxHKpXevn3b0tKydevWKq9TA4nF0KgR5OdDaio4OtKupt5UHUIAMDaGsjIoLQUjIxWfmV2EED8/v927d9va2l67dq3+a629fv36xo0b8tQ9fPiw0oWuZcuWTk5OH+21Ly4unjt37uXLl0eMGLFmzRqxWDxgwAAHB4cnT5706dNn8eLF9axT8504AcOGgYsL3L5NuxRVUHVzFACsreHFC8jNhaZNVX9yFnE4nN9+++3FixexsbGDBg2Kj4+vuhqBEuXl5ffv3694rav0rAxzW1WeOmdn50oPjlbHxMRkxYoV58+fv3PnDgAcPXq0RYsWv/76K7MwY2Bg4CeffFKrv6nW2bcPAEBn1l9UWwhzcrQ9hABgYGBw8ODBXr163bt3b9y4cVFRUVV7BeXevn17+/Zt+eNpjx49EovFFQ+o1Lx0cHCo2rNfExwOh2mFMkQiEbMkg0AgaN++fWpqqru7ex1Oqy3evoXjx4HLhQ8fF9Viagih9t+bKS4u/vvvv5s2bcrhcJjVYXv06HHmzJkZM2b8+uuvzDGEkCdPnih/KLSGzct6kkgk8o8GPp+vOWOy1OT4cSgpgd69QWfWYlbPlRBAMwfy1sS5c+cCAgLatWv34sWLU6dO2dratmzZ8vDhw56entu3b5dKpY0bN2buXj5//rziCzkczmeffcbcumTuYTZv3pyFgj/77LOHDx8yP6elpWnR6IG60bG2KOCVsKqlS5fu3r3bzc1t48aNmzZtYqbQ7927d1hY2DfffHPkyBH5yCOhUNipUyf5ta7Ozcs6iImJuXbt2tOnT6Ojo319fXv16uXq6pqWltaqVSt2kk9Lbi6cOwcCAfj40C5FddR2JdTaEPJ4PGZOxNatW8fExMi3M/0HpqamU6dOZS50jo6OSr4iqlVhYaGzs7Ozs3NhYaFQKDx79uz+/fuFQiGzMJMOO3gQJBIYMuT9b5luwOZoZcuXL580aZKtrS2fz5evKJKenr506VIOh7N3715mwnm6Ro0aVfGPdnZ2CxcupFUMm3SvLQrYHK2qX79+d+7cEYvF+/btE4lEzMbAwMDS0tLx48drQgL1Vk
YGxMWBmRmMGEG7FJXCK2Flz58/z8nJKSkpWb9+/aFDhwDg6NGj0dHRlpaWGzZsoF2dXjtwAAiBoUPB1JR2KSqFV8LKnj9/Hh4ebmVltW/fvrZt2759+3bWrFkAsHz58oZV59FBLNLJtiio5bG1rCxo1AhsbCArS8VnpiE4OHjdunXdunVLSEjQ22kFNUFKCrRvD0IhZGYCWzehWaKG3yqhELhcyMsDmUz1J2fXnTt3NmzYwOVyN2/ejAmki7nvO3q0riUQ1BJCPh8sLaG8HPLzVX9yFhFCAgMDpVLptGnTunTpQrscvUaIzrZFQS0hBK3vKmTs2rUrLi7OxsZm9erVtGvRd4mJkJ4OzZpB7960S1EDtfQ1P+vWrczGpkF+vvbex8jLy5s3bx4ArFu3roaDG5D6MJfBsWOVLbmgvdQSwqCiouNxcUdfv9be7pxFixbl5eX169ePmQcR0TVlCggEMGEC7TrUQy0fLMxESTla21WYmJj466+/CgSCTZs26cDsCZrju+9gyxbIyKi8XSKBLVveZywqCtauhfPnPzjAxQUCAiA6Gq5dY6dSVqklhNbW1gCQq53fCcvLywMDA2Uy2ezZs52dnWmXo1OeP4fAQHj3rvJ2mQwCA+HxYwCAfftg4UIYORJevvzgmNRUWLgQLl9WfGaxWPz777936dJl6dKlqq9bzTCElW3ZsuXmzZt2dnba+M+pM96+hQULanE8IcTU1HT69OmlpaVqK0pdsDn6gVevXjGrT2/YsIEZS4GoGDUK9u6t3ChVwtDQcOzYsWZmZuosSl3wSviBwk2bhFzugAEDvL29adei15YuBQsLmDEDyspol6J+eCWs4MqVtmvXppma/vbLL7RL0XcNG8KKFZCWBitW0C5F/dTSRaGVV8J372DqVCCEFxDQzN6edjW6KTUVAMDHB4TCD7Yz66nfvPnBxm++gfBwWL8e/PzAweHjJy8sLCwuLi4rKyssLLTQ4HVpq8Ir4T9+/BHS0qBt29rdEEC1wbxO26wAAAnGSURBVHS1l5eDVFr5PwD4ZwT1ewIB/PILiMUwY0aNTr5y5cpz585lZmYuWbJE1YWrl1quhGZmZkZGRsXFxaWlpcbGxup4CxVLT4eVK4HDgW3boMIs10i12rYFkQiOHIE2bT7Y/u4dGBmBi0vl4/v1Az8/2LkTDh78+D/L2rVrVVkri9T1FJCWtUgDA6G0FMaOBRw4r2FWrYIGDeCHH0AioV2K2qgrhNrUIj12DKKjwdIScOC85rG1heXLIS0Ntm6lXYraqDeEWnAlfPsWgoIAAJYtg0aNaFeDFJg5E7p3h4sXadehNuptjmrBlTA0FJ4/h06dIDCQdilIMS4XNm+ufNtGl+j3lfDOHVi/HrhcCA/X5X9kjTF/PgiFih/gFgrfD9i1sYGWLSv/a7i6wsyZ0LIlWFoCgA7M2fAhNS0+umLFCgBYtGiRms6vAjIZcXcnAGT6dNqloFrYvJnY25OsLNp1qI4eN0d374a4OLCxARw4rz2kUti5E9LSwNcXPlzzSovpa3M0Lw/mzgUAWLsWcOC89uDz4cwZaN0aLl2CiRN1pF2qrhAKhUIAOH369JQpUyIiIh4zY8U0x6JFkJf3vjMYaRWhEE6cACsrOHgQtO3ZGMXUMO8oAABIpdIDBw5MnDhRvsXc3Lx79+5ubm7u7u69e/c2NDRUx/vWSGIiuLkBjwe3bgEO29VOV66ApyeIxbBtG0yfTrua+qlXCJ8/f86spAkA9+7d43A4jo6O8vkgSkpKrl27duXKlbi4uISEhOLiYvkLrays3Nzcevfu7e7u3qVLFwM2nxQrL4du3eDmTfjuO1i3jr33Rar2xx8weTIIBHD6NHh40K6mHuoYwujo6LVr1yYmJubm5hobGw8fPtzAwIDH4xUXF588eVLhvCzp6ennz5+Pi4u7evVqRoW71Hw+38XFhblC9uvXz1rda17dvQu9eoGFBfz1F2jVs/aoqv/+F9asAQ
sLiI+Hdu1oV1Nndbup+vTp05KSEmapBrFYfPHiRWa7i4tLWlqa8tfKZLKUlJStW7dOmDCh2YdLHvP5/K5du86ZM+fVyZMkJ6dutX3cs2fkyhV1nRyxSCYj48cTAGJnR16/pl1NXdWrOero6JiUlGT6zxo5hBAHB4eEhIRaXc2ys7OvXbsWHx8fFxd3/fp1iUTC4XByrKyEf/8Ntrbg7g5ubuDuDp07Q20nPsvKgn37IDER8vLAzAw6dQIfH3B0rN1JkGYrK4P+/SEhAbp0gcuXQSvnJKlPgpkrofyPP/744+TJk+tzwuzs7CNHjoR+/z3p0oXw+QTg3/+aNycTJ5Jt28i9e0Qm+/i5du0iJibvPyRdXYmjI+FyCZdLvvuuRi9H2iMnh7RqRQDI6NGkvJx2NbWnshBu2bJl2LBhZWVlqqiKEEKIREKSkkhYGPHxIVZWHwTS1JS4uZHgYBITQ0pKFLz29GnC5ZKmTUlc3L8bHz0inTsTALJsmcqKRJohNZVYWhIAsnAh7VJqr44hzM7OjomJadas2YkTJzIyMvz8/Lp37x4dHR0TE5OljgeKpFKSkkLCw8mkSaRZsw8CyecTV1cSFEQiI0leHiGEyGSkbVvC55M7d6rWTaytiaGhTj31hAghhMTGEgMDAkC2bKFdSi3V8Tvho0ePDh8+zPzs5uYWHx8v3+Xt7e1QkylB6kwmg7t34erV9/9lZv67SyCA8HDo2BE6d4YvvoAzZxS8/IcfYNUqXehdQlX8/jtMnQoCAZw6BZ6etKupOdqfAvWWmUmiokhwMHF1JVwuuXqVhIcTALJiheLjo6MJAJkyhd0qEUvmzycAxMKCiES0S6kx7V/kplEjGDoU1qyBpCR48QK6dwfmqfHqlra2tQUA3VhFGFW1Zg0MGQJ8vmzBgiWa++jyh7Q/hBXZ2oJAAEwDGxdy0Us8HuzbB127fnv27Apvb+93VRe+0Dy6FUIG00v5+rXivcx2Gxv26kHsMjeHnTsXt2jRIi4ubsKECTKNH2qhiyHs3BkA4Pp1xXsTEwEAXF3ZqwexrlGjRqdPn7a0tDx8+PD3339Pu5yPUNcoCppkMrC3hxcv4O7dylM3FxeDkxNkZsLTp++/HCLddfbsWS8vL6lUunXr1oCAANrlVEsXr4RcLoSGglgMI0ZAevq/29+8gTFj4NkzCArCBOqDgQMHbt26FQCCgoJiYmJol1MtXbwSMlasgJAQMDSEXr2gZUt4/Rri4yE/H3x9YdcunGZbf8ydO/enn36ysLCIi4tr3759dYcVFRWdOXMGAAYNGmRubs5igTocQgAQiWDrVoiNfb/WQZcuMGUKDBhAuyzEKplMNmrUqGPHjtnZ2SUmJjZU1Hclk8nc3NxGjBjB4XCOHj0aHx/P5bLYSKTcT4mQ+pWUlHTr1g0AunTpUlxcXPWA2NjYIUOGMD97eXnFxsayWZ4ufidE6EPGxsbHjh1r3rx5UlKSn59f1U6Lx48fOzk5MT87OTmlpaWxWR6GEOkFW1vb06dPN2jQ4NChQ4sXL660l8fjlTOLJAKUl5fz+WpZraw6GEKkL5ydnffv38/n81etWhUeHl5xl5OTk0gkYn4WiUTt2J0qQy9CmJeXt3z58g4dOhQVFQHApUuXhg4dunz5ctp1IbZ98cUXW7ZsAYBvv/32woUL8u3dunUzMjKaNWvWrFmzjIyMunbtymZVehHCGzdudO7cWSKRlJeXi8VikUjk4eGRWXEMFNIb06ZNCwoKkkgkI0eOTElJkW8/fPiwl5eXl5fXoUOH2K6JzbtAdHXo0CE/P5/5+dSpUwEBAXTrQbRIJJIBAwaAxqyVwuoXUIQ0AZ/Pj4yM3LVr18yZM2nXAqCmNesR0nANGjT49ttvaVfxnl58J0RIk+n0Y2v/SEhIkH/sLV269MSJE8nJyQBgbW0dHR1NtTSE9COECGkybI4iRBmGECHKMIQIUYYhRIgyDCFClGEIEaIMQ4gQZRhChCjDEC
JEGYYQIcowhAhRhiFEiDIMIUKUYQgRogxDiBBlGEKEKMMQIkQZhhAhyjCECFGGIUSIMgwhQpRhCBGiDEOIEGUYQoQowxAiRBmGECHKMIQIUYYhRIgyDCFClGEIEaIMQ4gQZRhChCjDECJEGYYQIcowhAhRhiFEiDIMIUKUYQgRogxDiBBlGEKEKMMQIkQZhhAhyjCECFGGIUSIMgwhQpRhCBGiDEOIEGUYQoQowxAiRBmGECHKMIQIUYYhRIgyDCFClGEIEaIMQ4gQZRhChCjDECJEGYYQIcowhAhRhiFEiDIMIUKUYQgRogxDiBBlGEKEKMMQIkQZhhAhyjCECFGGIUSIMgwhQpRhCBGi7P8BPyjDL0rGLS0AAAEwelRYdHJka2l0UEtMIHJka2l0IDIwMjEuMDMuNQAAeJx7v2/tPQYg4AFiRgYI4ANifiBuYGRjSACJM7ODaSYonxkuDqGZmNgcNEDiLGwOGSAaqADBgMmgqeBgAAswwYyA8bmBzmBkYmBiZmBiYWBhZWBly2BiY09g58hg4uBM4OTKYOLkVuDmYeDhZeBiTeDlUBBhYmPl4uRgZ2Pl5uTg5RGH+YKBb5dbrMNLbtEDIM75y3IO31xm7AexxfbbOfwqSbMDsZf09TlslGOwB7F1fBgcqheuA7Ofb2e3P5TQDWZ3qD/Zd34mnwOI7Xbs6/6nn36CxbPOSR2Yv48TzP63bcd+yeeSYPM3rGGw9Tm9bB+IrZrOdyD8zAewuOSJ7AOnbI6A2a1FxQeOmkvvBbsHAEm2Rtlq4feIAAABIXpUWHRNT0wgcmRraXQgMjAyMS4wMy41AAB4nJ1SS07FMAzc9xS+wIv8S2qvKSvEQ2LBHdhzf+EkNCoSSOAoimaSeDrjdIM+Xo+n9w9Yg49tA1KgCoA/TneHN0bErd+XonXXADcuwuZ9D0ucIjzAbxLXOVS4qNHeVaiQebuo3P+j4oY6a5kr5bxoJNKvHBWxZhMh6shBRZpKTiUcCNJUQUt294ZFtbbpitlyfYmXcT69eOU9pxIvTWZntrwXVaGZrSFxtrsopFNFuFo2UfxrOF2ZffPy8ncVKc2UpkoVSfZFiqH4TISye8ZLfJrH2kkgWSSQXkldJFBbJNA+bnYSyBYJ5Is4EJ4kENGVLAeBaDkIRMtBL2irRof0OHkGeLwf2yed+cAv+zNrSQAAAQ56VFh0U01JTEVTIHJka2l0IDIwMjEuMDMuNQAAeJxNjr1uwzAMhF+lYwLIAn8lih69J32FwuhYuCgy5uFDNoM8id+Jd7zttl2267bjvu903+477fjxvHAV7SJlocpkpmWlKobNyoIVbbQUhoFggUqkoKFImGSkogASu7GTgxas3ITLGj8MmAxmNsq6QBXRhlIoYizeNQ4MAsRYGko9XFFCMEuk3SDy1oWrAVsrEQDcbby1ZgIjKypzJKTzv1xKZkjveBEGS2eLM0ZZCxgFpafIpL2Xa/l6HD+ff8evQ83xdjy+KzhOQKcJ5DyBXSaI6wT1NqF5n9DdJpiP0x12PHVAcjy3QMdTj+F4KoLwfAEG4HpHEGWvEAAAAABJRU5ErkJggg==\n", | |
"text/plain": [ | |
"<rdkit.Chem.rdchem.Mol at 0x7f0bdd97f080>" | |
] | |
}, | |
"metadata": {}, | |
"execution_count": 3 | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "fBGPOIqSBiGt" | |
}, | |
"source": [ | |
"### **Converting SMILES String to SELFIES v2 and InChI**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "SYd7RO4UBygy", | |
"outputId": "2e0de0f5-fa93-48cd-f8a0-f92c112b85bf" | |
}, | |
"source": [ | |
"SMILES = \"CNC(C)CC1=CC=C2C(=C1)OCO2\"\n", | |
"SELFIES = sf.encoder(SMILES) # SMILES --> SELFIES v2.\n", | |
"print(f\"Generated SELFIES: {SELFIES}\")\n", | |
"\n", | |
"InChI = Chem.MolToInchi(mol) # SMILES --> InChI\n", | |
"print(f\"Generated Inchi: {InChI}\")\n" | |
], | |
"execution_count": 7, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"Generated SELFIES: [C][N][C][Branch1][C][C][C][C][=C][C][=C][C][=Branch1][Ring2][=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1]\n", | |
"Generated Inchi: InChI=1S/C11H15NO2/c1-8(12-2)5-9-3-4-10-11(6-9)14-7-13-10/h3-4,6,8,12H,5,7H2,1-2H3\n" | |
] | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "wb9JG0guCugN" | |
}, | |
"source": [ | |
"### **Converting SMILES String to DeepSMILES**" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "xI8vJ-rDCzyp", | |
"outputId": "b9265321-c0fc-47d0-e61c-a0012178ebe6" | |
}, | |
"source": [ | |
"converter = deepsmiles.Converter(rings=True, branches=True)\n", | |
"DeepSMILES = converter.encode(\"CNC(C)CC1=CC=C2C(=C1)OCO2\")\n", | |
"print(f\"Generated DeepSMILES: {DeepSMILES}\")\n" | |
], | |
"execution_count": 8, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"Generated DeepSMILES: CNCC)CC=CC=CC=C6)OCO5\n" | |
] | |
} | |
] | |
} | |
] | |
} |
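
The SELFIES output printed by the notebook is a flat sequence of bracketed symbols, which is what makes it easy to tokenize for machine-learning use. As a minimal sketch (not part of the original notebook), the snippet below splits that exact output string into symbols using only the standard library. The helper name `split_symbols` is hypothetical and introduced here for illustration; the `selfies` package provides its own `split_selfies` helper, which should be preferred in real code.

```python
import re

# SELFIES string copied verbatim from the notebook output above.
selfies_str = (
    "[C][N][C][Branch1][C][C][C][C][=C][C][=C][C][=Branch1][Ring2]"
    "[=C][Ring1][=Branch1][O][C][O][Ring1][=Branch1]"
)


def split_symbols(s):
    """Hypothetical helper: return the bracketed symbols of a SELFIES string.

    Assumes every symbol has the form "[...]" with no nested brackets.
    """
    return re.findall(r"\[[^\]]*\]", s)


symbols = split_symbols(selfies_str)

# Joining the symbols back together should reproduce the input,
# which confirms the split did not drop or merge anything.
assert "".join(symbols) == selfies_str

print(f"Number of SELFIES symbols: {len(symbols)}")
print(f"First four symbols: {symbols[:4]}")
```

Each symbol is either an atom (such as `[C]` or `[=C]`) or a structural token (such as `[Branch1]` or `[Ring1]`), so the resulting list can serve directly as a token sequence.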