dehaenw/pxr_report.txt

Last active May 5, 2026 08:43


PXR report


pxr_report.txt

TBD

dehaenw commented May 5, 2026


This is the latest activity submission by the IMG-UCL-UCT joint team. Names of participants will be added here later.

The method is a TabICL ensemble: several TabICL models are built on the following descriptors and on combinations of them:

MOE descriptors
MORDRED descriptors
RDKit descriptors
Count ECFP2
Count ECFP4
Count RDKitFP
CheMeleonFP
LGBM-predicted affinities on unrelated large PubChem HTS assays
UniMol embeddings
pmapper features
RDKit 3D descriptors
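
The remarks below note that fingerprints only work with TabICL after dimension reduction and that counts are needed. The report does not say which reduction was used, so the following is only a minimal sketch under assumptions: it applies a PCA-style projection (via SVD) to log-scaled counts, and the function name `reduce_count_fingerprints`, the `log1p` scaling, the choice of 32 components and the random toy matrices are all illustrative, not taken from the submission.

```python
import numpy as np

def reduce_count_fingerprints(train_counts, test_counts, n_components=32):
    """Project count fingerprints onto the top principal directions of the training set."""
    # log1p dampens large substructure counts (an assumption, not from the report)
    train = np.log1p(np.asarray(train_counts, dtype=float))
    test = np.log1p(np.asarray(test_counts, dtype=float))
    # centre with training statistics only, so no test information leaks in
    mean = train.mean(axis=0)
    # rows of vt are the principal directions of the centred training matrix
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    components = vt[:n_components]
    return (train - mean) @ components.T, (test - mean) @ components.T

rng = np.random.default_rng(0)
# toy stand-ins for 2048-column count fingerprint matrices
train_fp = rng.poisson(0.05, size=(200, 2048))
test_fp = rng.poisson(0.05, size=(50, 2048))
train_red, test_red = reduce_count_fingerprints(train_fp, test_fp, n_components=32)
```

The reduced matrices (here 32 columns instead of 2048) are what would be passed to the tabular model in place of the raw fingerprints.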

remarks:

unimol works badly. HTS data works very badly.
chemeleonFP is worse than mordred.
FPs work with TabICL only when dimension reduction is applied, and using counts (rather than binary bits) is necessary.
rdkit 3d descs do not work well. best desc set is MOE; MORDRED and RDKit descs are good too.
training set was filtered using info from the single-dose and counterscreen data.
isotonic calibration.
ensembling via ridge.
butina splits, 5xCV
internal MAE (too optimistic): 0.4609
internal MAE per isolated feature type:
moe : 0.4847
rdkit : 0.4984
ecfp4 : 0.5024
mordred : 0.5039
ecfp2 : 0.5048
chemeleon : 0.5118
rdkitfp : 0.5378
rdkit3d : 0.5638
unimol : 0.6198
pmapper : 0.6470
hts : 0.6940
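
The "ensembling via ridge" and "isotonic calibration" remarks can be sketched as follows. This is a hedged illustration, not the team's code: it assumes the ridge model is fitted on out-of-fold predictions of the base models and that calibration is applied to the blended output (the report does not state the order). The helper names `fit_ridge`, `fit_isotonic` and `apply_isotonic`, the `alpha` value and the toy data are all assumptions.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef

def fit_isotonic(pred, y):
    """Pool-adjacent-violators: returns sorted predictions and non-decreasing fitted values."""
    order = np.argsort(pred)
    x, t = pred[order], y[order]
    blocks = []  # each block holds [sum of targets, count]
    for value in t:
        blocks.append([value, 1])
        # merge while the previous block mean exceeds the current block mean
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = np.concatenate([np.full(n, s / n) for s, n in blocks])
    return x, fitted

def apply_isotonic(knots_x, knots_y, pred):
    # linear interpolation between knots; values outside the fitted range are clipped
    return np.interp(pred, knots_x, knots_y)

rng = np.random.default_rng(0)
y = rng.normal(6.0, 1.0, size=300)  # toy target values
# toy out-of-fold predictions from three base models with different noise levels
oof = np.column_stack([y + rng.normal(0, s, size=300) for s in (0.4, 0.6, 0.9)])

coef, intercept = fit_ridge(oof, y, alpha=1.0)
blend = oof @ coef + intercept
knots_x, knots_y = fit_isotonic(blend, y)
calibrated = apply_isotonic(knots_x, knots_y, blend)
```

In a real pipeline the calibration map would be fitted on held-out predictions and then applied to the test-set blend, otherwise the reported error is optimistic.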

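For the "butina splits, 5xCV" remark, a dependency-free sketch of cluster-based folds is shown here. It assumes "5xCV" means 5 folds, represents each fingerprint as a plain set of feature ids, and uses an illustrative distance cutoff of 0.5; none of these details come from the report. The helpers `butina_clusters` and `cluster_folds` are hypothetical names, and in practice RDKit's `Butina.ClusterData` would typically be used for the clustering step.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two sets of feature ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def butina_clusters(fps, cutoff=0.5):
    """Butina-style clustering: neighbours within `cutoff` distance of a centroid join its cluster."""
    n = len(fps)
    neighbours = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if 1.0 - tanimoto(fps[i], fps[j]) <= cutoff:
                neighbours[i].append(j)
                neighbours[j].append(i)
    # candidates with the most neighbours become centroids first
    order = sorted(range(n), key=lambda i: len(neighbours[i]), reverse=True)
    assigned = [False] * n
    clusters = []
    for i in order:
        if assigned[i]:
            continue
        members = [i] + [j for j in neighbours[i] if not assigned[j]]
        for j in members:
            assigned[j] = True
        clusters.append(members)
    return clusters

def cluster_folds(clusters, n_folds=5):
    """Greedily place whole clusters into the currently smallest fold."""
    folds = [[] for _ in range(n_folds)]
    for members in sorted(clusters, key=len, reverse=True):
        min(folds, key=len).extend(members)
    return folds

# toy fingerprints: three small chemical series plus three singletons
fps = [
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 6},
    {10, 11, 12, 13}, {10, 11, 12, 14},
    {20, 21, 22, 23}, {20, 21, 22, 24},
    {30, 31}, {40, 41}, {50, 51},
]
clusters = butina_clusters(fps, cutoff=0.5)
folds = cluster_folds(clusters, n_folds=5)
```

Because whole clusters are kept together, close analogues never end up on both sides of a train/validation split, which is the point of using cluster-based rather than random folds.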