Journal article

Selector: A General Python Library for Diverse Subset Selection

Abstract

Selector is a free, open-source Python library for selecting diverse subsets from any dataset, making it a versatile tool across a wide range of application domains. Selector implements different subset sampling algorithms based on sample distance, similarity, and spatial partitioning, along with metrics to quantify subset diversity. It is flexible and integrates seamlessly with popular Python libraries like Scikit-Learn, demonstrating the interoperability of the implemented algorithms with data analysis workflows. Selector is an operating-system agnostic, accessible, and easily extensible package designed with modern software development practices, including version control, unit testing, and continuous integration. Interactive quick-start notebooks, which are also web-accessible, provide user-friendly tutorials for all skill levels, showcasing applications in computational chemistry, drug discovery, and chemical library design. Additionally, a web interface has been developed that allows users to easily upload datasets, configure sampling settings, and run subset selection algorithms, with no programming required. This paper serves as the official release note for the Selector package, offering a technical overview of its features, use cases, and development practices that ensure its quality and maintainability.
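
To make the idea of distance-based subset sampling concrete, here is a minimal sketch of one common greedy strategy, often called MaxMin selection: starting from a seed point, repeatedly add the point that is farthest from everything already chosen. This is an illustration written in plain Python and does not use or reproduce Selector's API; the function names `maxmin_select` and `min_pairwise_distance`, the `start` parameter, and the toy data are all assumptions made for this example.

```python
import math


def maxmin_select(points, k, start=0):
    """Greedy MaxMin sketch: grow a subset by adding the farthest remaining point.

    Hypothetical helper for illustration only; not part of the Selector package.
    """
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        candidates = [i for i in range(n) if i not in selected]
        # Pick the candidate whose nearest selected neighbour is farthest away.
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


def min_pairwise_distance(points, idx):
    """A simple diversity score: the smallest distance between any two chosen points."""
    return min(
        math.dist(points[a], points[b])
        for pos, a in enumerate(idx)
        for b in idx[pos + 1:]
    )


# Toy 2D data (made up): a tight cluster near the origin plus a few distant points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
subset = maxmin_select(points, k=3)
print(subset, min_pairwise_distance(points, subset))
```

On this toy data the greedy subset spreads across the distant points instead of staying in the cluster near the origin, so its minimum pairwise distance is much larger than that of the first three points. The choice of Euclidean distance and of the minimum pairwise distance as a diversity score are simplifications for the sketch; the abstract indicates that the library offers several algorithms and diversity metrics.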

Authors

Meng F; González MM; Chuiko V; Tehrani A; Al Nabulsi AR; Broscius A; Khaleel H; López-Pérez K; Miranda-Quintana RA; Ayers PW

Publisher

Cold Spring Harbor Laboratory

Publication Date

November 22, 2025

DOI

10.1101/2025.11.21.689756

ISSN

2692-8205