Preprint

B3clf: A Resampling-Integrated Machine Learning Framework to Predict Blood-Brain Barrier Permeability

Abstract

Developing accurate, computationally efficient, and reliable predictive models for the blood-brain barrier (BBB) permeability of small molecules is challenging because of the class imbalance often found in collections of reference data. We use resampling techniques to address class imbalance and build 24 types of machine learning models, each tuned through comprehensive hyperparameter optimization. We evaluate our models against those from previous studies, which provides insight into optimal classification models and resampling techniques that is relevant beyond BBB permeability. In addition to classifying unknown compounds by BBB permeability, the models report predicted probabilities, which facilitate further improvements and comparative benchmarking and convey the models' confidence in their predictions. To disseminate our findings, we developed B3clf, a highly efficient, user-friendly tool for BBB permeability prediction, which is available as open-source software at https://github.com/theochem/B3clf and as a web app at https://huggingface.co/spaces/QCDevs/b3clf. The newly curated external dataset for BBB is hosted at https://github.com/theochem/B3DB.
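
To illustrate what resampling for class imbalance means, the following is a minimal sketch of random oversampling, one common resampling technique. It is not taken from B3clf, and the abstract does not state which resampling techniques the framework uses. The function name `random_oversample`, the class labels, and the toy descriptor values are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class.

    Hypothetical helper for illustration; not part of B3clf.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    balanced_x, balanced_y = list(features), list(labels)
    for cls, n in counts.items():
        # All original samples that belong to this class.
        pool = [x for x, y in zip(features, labels) if y == cls]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            balanced_x.append(rng.choice(pool))
            balanced_y.append(cls)
    return balanced_x, balanced_y


# Toy, made-up data: six samples of one class and two of the other.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = ["permeable"] * 6 + ["non-permeable"] * 2

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In practice, resampling of this kind is applied only to the training split, so that the test data keep the original class distribution.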

Authors

Meng F; Chen J; Collins-Ramirez JS; Ayers PW

Publication date

August 20, 2025

DOI

10.26434/chemrxiv-2025-xschc

Preprint server

ChemRxiv