@dehaenw
Last active May 5, 2026 08:43
PXR report
dehaenw commented May 5, 2026

This is the latest activity submission by the IMG-UCL-UCT joint team. Names of participants will be added here later.

The method is an ensemble of several TabICL models, built on the following descriptors and combinations of them:

  • MOE descriptors
  • MORDRED descriptors
  • RDKit descriptors
  • Count ECFP2
  • Count ECFP4
  • Count RDKitFP
  • CheMeleonFP
  • LGBM-predicted affinities on large, unrelated PubChem HTS assays
  • UniMol embeddings
  • pmapper features
  • RDKit 3D descriptors
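
The remarks note that fingerprints only work with TabICL after dimension reduction, and that counts are needed. The reduction method is not specified, so the following is only a minimal sketch of one option: a truncated SVD (PCA-style) projection of a count-fingerprint matrix, written with NumPy. The `log1p` damping, the 2048-bit width, the 32 output components, and the random toy counts are all assumptions for illustration, not the team's settings.

```python
import numpy as np


def fit_count_reducer(counts, n_components=32):
    """Fit a truncated-SVD projection on a (n_molecules, n_bits) count matrix.

    Returns the column means and the top principal directions.
    """
    x = np.log1p(np.asarray(counts, dtype=float))  # damp large counts (assumption)
    mean = x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    k = min(n_components, vt.shape[0])
    return mean, vt[:k]


def apply_count_reducer(counts, mean, components):
    """Project counts onto the directions learned by fit_count_reducer."""
    x = np.log1p(np.asarray(counts, dtype=float))
    return (x - mean) @ components.T


# Toy stand-in for count ECFP features; real counts would come from a
# cheminformatics toolkit such as RDKit.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(0.3, size=(200, 2048))

mean, components = fit_count_reducer(toy_counts, n_components=32)
reduced = apply_count_reducer(toy_counts, mean, components)
print(reduced.shape)  # (200, 32)
```

In a cross-validated setting the reducer would be fitted on the training fold only and then applied to the held-out fold, so that no information from the test molecules leaks into the projection.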

remarks:

  • UniMol works badly, and the HTS data works very badly. CheMeleonFP is worse than MORDRED. Fingerprints only work with TabICL when dimension reduction is applied, and using counts is necessary. RDKit 3D descriptors do not work well. The best descriptor set is MOE; MORDRED and RDKit descriptors are good too.

  • The training set was filtered using information from the single-dose and counterscreen data.

  • Predictions are calibrated with isotonic regression.

  • Ensembling is done via ridge regression.

  • Validation uses Butina splits, 5xCV.

  • Internal MAE (too optimistic): 0.4609

  • internal MAE per isolated feature type
    moe : 0.4847
    rdkit : 0.4984
    ecfp4 : 0.5024
    mordred : 0.5039
    ecfp2 : 0.5048
    chemeleon : 0.5118
    rdkitfp : 0.5378
    rdkit3d : 0.5638
    unimol : 0.6198
    pmapper : 0.6470
    hts : 0.6940
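
The ridge ensembling, isotonic calibration, and cluster-based cross-validation mentioned in the remarks can be sketched with scikit-learn as below. This is not the team's pipeline: the targets and base-model predictions are synthetic, the random `groups` array stands in for Butina cluster ids, "5xCV" is read as 5 folds, and the order (ridge blend first, isotonic calibration second) and `alpha=1.0` are assumptions.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n = 500
y = rng.normal(5.0, 1.0, size=n)        # synthetic activity values
groups = rng.integers(0, 50, size=n)    # stand-in for Butina cluster ids

# Stand-ins for the predictions of three base models of varying quality
# (in practice these would be out-of-fold predictions per feature type).
model_names = ("model_a", "model_b", "model_c")
base_preds = np.column_stack(
    [y + rng.normal(0.0, noise, size=n) for noise in (0.5, 0.6, 0.9)]
)

stacked = np.zeros(n)
# GroupKFold keeps every cluster entirely inside one fold, so similar
# molecules never appear on both sides of a split.
for train_idx, test_idx in GroupKFold(n_splits=5).split(base_preds, y, groups):
    # Ridge learns one blending weight per base model.
    blender = Ridge(alpha=1.0).fit(base_preds[train_idx], y[train_idx])
    raw_train = blender.predict(base_preds[train_idx])
    # Isotonic regression learns a monotone map from blended score to target.
    calibrator = IsotonicRegression(out_of_bounds="clip").fit(raw_train, y[train_idx])
    raw_test = blender.predict(base_preds[test_idx])
    stacked[test_idx] = calibrator.predict(raw_test)

for name, column in zip(model_names, base_preds.T):
    print(f"{name}: MAE = {mean_absolute_error(y, column):.4f}")
print(f"stacked: MAE = {mean_absolute_error(y, stacked):.4f}")
```

Fitting the calibrator on the same training predictions the blender was fitted on is the simplest choice but can itself be optimistic; a nested split for the calibrator would be the more careful variant.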
