@baoilleach
baoilleach / time_tokenizers.py
Created December 28, 2023 19:12
Code to tokenize a SMILES string
import re
import time
import itertools
import doctest
ITERATIONS = 1000000
# From IBM Research's Rxn4Chemistry:
# https://github.com/rxn4chemistry/rxn-chemutils/blob/main/src/rxn/chemutils/tokenization.py
SMILES_TOKENIZER_PATTERN = r"(\%\([0-9]{3}\)|\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\||\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])"
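A minimal sketch of how a pattern like this might be applied, assuming the usual `re.findall` approach. The function name `tokenize_smiles` and the round-trip check are illustrative assumptions, not necessarily what the rest of this gist does:

```python
import re

# Same pattern as above (from rxn-chemutils), repeated so the sketch is self-contained.
SMILES_TOKENIZER_PATTERN = r"(\%\([0-9]{3}\)|\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\||\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>>?|\*|\$|\%[0-9]{2}|[0-9])"
SMILES_REGEX = re.compile(SMILES_TOKENIZER_PATTERN)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens (hypothetical helper name).

    Raises ValueError if some characters are not covered by the pattern,
    detected by checking that the tokens join back to the input.
    """
    tokens = SMILES_REGEX.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Could not fully tokenize SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    print(tokenize_smiles("CC(=O)Cl"))  # ['C', 'C', '(', '=', 'O', ')', 'Cl']
```

Compiling the pattern once outside the function matters when the tokenizer is called many times, as a timing loop over `ITERATIONS` would do.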
I will have attended 9 out of 13 (if we include the remotes).
13th - 2024 Zurich - Sereina Riniker (ETH)
12th - 2023 Mainz - Paul Czodrowski (Uni Mainz) - Talk on SmiZip
11th - 2022 Berlin - Bayer ML - Shared talk with Jan Jensen on Gabby
10th - 2021 Remote
9th - 2020 Remote - Flash presentation on "An efficient algorithm to find matched pairs of a peptide"
8th - 2019 Hamburg - Emanuel Ehmki (Uni Hamburg) **didn't attend**
7th - 2018 Cambridge - Andreas Bender (Uni Cambridge) - SMILES benchmark and prepared flash presentation on DeepSMILES
6th - 2017 Berlin - Andrea Volkamer (Charite Berlin) and Gerhard Wolber (FU Berlin) **didn't attend**
baoilleach / 2023_Sheffield_Conference.txt
Created June 27, 2023 19:17
Notes from Ninth Joint Sheffield Conference on Chemoinformatics
I have no Twitter notes from the first day. Here are my notes from Days 2 and 3...
#shef2023 Adele Hardie (Uni Edinburgh) on an sMD/MSM approach for rational design of allosteric modulators.
Have come up with a workflow to predict allostery. Examples from two protein systems.
Orthosteric inhibition is where you stick a molecule into the active site, blocking it. Allosteric inhibition is where the molecule interacts somewhere else and affects protein activity. How can we predict this? Using MD.
Diff methods have diff cost. We use classical mechanics to compute the energies of the system, bonds, angles, torsion angles. The constants come from sets of precomputed params called forcefields. We can look at systems as big as protein-ligand, and ns timescales.
We can do Markov State Modelling (MSM), where we model probs of states (conformations). If the probabilities of the active vs inactive state change in the presence of a ligand then it's a modulator. Difficulty is that this is millsec to sec timescale - t
baoilleach / ICCS_2022_Conference_Notes.txt
Created July 3, 2022 20:14
Notes from International Conference on Chemical Structures 2022
Monday morning - Analysis of Large Chemical Datasets
--------------------------------------------
https://twitter.com/ConferenceNoel/status/1536235381313753090
I missed the first tweet as I was setting up this Twitter a/c but it should have been:
#2022iccs Maximilian Beckers (Novartis) on 25 years of small molecule optimization at Novartis: A retrospective analysis of chemical series evolution
#2022iccs A chemical series is a subjective concept. Kruger JCIM 2020 published automated id of chemical series.
#2022iccs Specificity of a scaffold is the probability of a random match of a scaffold. More meaningful scaffolds have fewer random matches per scaffold.
#2022iccs The dataset includes a whole bunch of different properties from their Novartis in-house dataset. Filtering removes bifunctional degraders and others (e.g. >5 amide bonds). 310K cmpds in the end.
#2022iccs Ran the scaffold analysis of the dataset. 72% of the compounds were assigned to a scaffold. Median is 60 cmpds assigned to a scaffold; typical on
baoilleach / _gvimrc
Created September 30, 2019 19:56
My GVIM RC
set encoding=utf-8 " The encoding displayed.
set fileencoding=utf-8 " The encoding written to file.
autocmd FileType python set omnifunc=pythoncomplete#Complete
" If you prefer the Omni-Completion tip window to close when a selection is
" made, these lines close it on movement in insert mode or when leaving
" insert mode
autocmd CursorMovedI * if pumvisible() == 0|pclose|endif
autocmd InsertLeave * if pumvisible() == 0|pclose|endif
" When you hit Enter the chosen omnicompletion is inserted
baoilleach / updatesnap.sh
Created August 20, 2019 13:02
Update Open Babel snap
# Source me
rm -rf baoilleach-repo
git clone https://github.com/baoilleach/openbabel.git baoilleach-repo
cd baoilleach-repo
# git remote add geoffh https://github.com/openbabel/openbabel.git
git checkout snaps
git merge master
git push origin snaps
baoilleach / peekable.py
Created June 27, 2019 13:21
A peekable Python iterator
class Peekable:
    def __init__(self, it):
        self.it = it
        self.finished = False
        self.curr = None
        self.nextval = next(it)

    def __iter__(self):
        return self

    def __next__(self):
        if self.finished:
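The listing above stops partway through `__next__`. As a self-contained illustration of the same idea (look one item ahead without consuming it), here is a minimal sketch. It is not the author's full implementation: the class name `PeekableSketch`, the sentinel object, and the `peek` method with its `default` argument are all assumptions made for this example.

```python
class PeekableSketch:
    """Wrap an iterable so the next item can be inspected without consuming it."""

    _SENTINEL = object()  # marks "no more items"; an assumption of this sketch

    def __init__(self, iterable):
        self._it = iter(iterable)
        # Read one item ahead; the sentinel avoids StopIteration on empty input.
        self._nextval = next(self._it, self._SENTINEL)

    def __iter__(self):
        return self

    def __next__(self):
        if self._nextval is self._SENTINEL:
            raise StopIteration
        curr = self._nextval
        self._nextval = next(self._it, self._SENTINEL)
        return curr

    def peek(self, default=None):
        """Return the upcoming item, or `default` if the iterator is exhausted."""
        if self._nextval is self._SENTINEL:
            return default
        return self._nextval


if __name__ == "__main__":
    chars = PeekableSketch("CCl")
    print(chars.peek())  # C
    print(next(chars))   # C
    print(list(chars))   # ['C', 'l']
```

Using a private sentinel rather than `None` lets the wrapped iterable itself yield `None` without being mistaken for exhaustion.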
baoilleach / gcc2018mainz.txt
Last active November 14, 2018 10:09
#gcc2018mainz tweets from German Conference on Chemoinformatics (Mainz 2018)
AkiraAiren Had a great time at the #gcc2018mainz! Thank all of you with who I had great talks. Still excited that I was awarded one of the two poster prices ☺! See you next year!
--> marwinsegler @AkiraAiren Well deserved!
--> Joao_F_Borges @AkiraAiren Congrats, @AkiraAiren !
--> BZdrazil @AkiraAiren Congratulations, well deserved;)
baoilleach #gcc2018mainz tweets in toto
https://t.co/THVgi8DP4T
--> RSC_CICAG @baoilleach Oddly, our tweets seem to have fallen off the list.
--> baoilleach @RSC_CICAG Yeah, I saw that. Don't know what's going on there. Maybe their fake news detector was triggered by some of the speakers' claims. I'll rerun today....
RSC_CICAG Next conference is already planned for 2019 #gcc2018mainz #gcc2019mainz https://t.co/KwdOxjf7tl
baoilleach / 7thRDKitUGM.txt
Created September 25, 2018 13:07
Twitter #RDKitUGM2018 plus replies
CzodrowskiPaul Very comprehensive summary of the #RDKitUGM2018! Kudos to Pat! https://t.co/dhwZhjFu0u
dr_greg_landrum After a really good #RDkitUGM2018 it was great to get out into the mountains today and concentrate on moving instead of chemInformatics. ;-) https://t.co/QoDAp3g3Qp
CzodrowskiPaul @wpwalters @dr_greg_landrum @AndreasBenderUK And kudos to all of you who came over from North, South America and even Japan! #RDKitUGM2018
dr_greg_landrum @AndreasBenderUK Thanks for hosting Andreas. Cambridge was a great place to have the meeting and the social events were exemplary! #RDKitUGM2018
--> wpwalters @dr_greg_landrum @AndreasBenderUK Loads of fun and lots of great science. Thanks to the organizers and all who participated.
--> CzodrowskiPaul @wpwalters @dr_greg_landrum @AndreasBenderUK And kudos to all of you who came over from North, South America and even Japan! #RDKitUGM2018
baoilleach / aromaticity.rst
Last active July 16, 2018 20:45
Description of aromaticity in Open Babel...to be added to the docs

Handling of aromaticity

The purpose of this section is to give an overview of how Open Babel handles aromaticity. Given that atoms can be aromatic, bonds can be aromatic, and that molecules have a flag indicating that aromaticity has been perceived, it's important to understand how these all work together.

How is aromaticity information stored?

Like many other toolkits, Open Babel stores aromaticity information separately from bond order information. This means that there isn't a special bond order to indicate an aromatic bond. Instead, aromaticity is stored as a flag on an atom as well as a flag on a bond. You can access and set this information using the following API functions: