A critical step in comparative genomics is identifying differences in the presence or absence of encoded biochemical pathways among organisms. Our library, Pygenprop, facilitates these comparisons using data from the Genome Properties database. Pygenprop is written in Python and, unlike existing libraries, is compatible with a variety of tools in the Python data science ecosystem, such as Jupyter Notebooks for interactive analyses and scikit-learn for machine learning. Pygenprop assigns YES, NO, or PARTIAL support for each property based on InterProScan annotations of open reading frames from an organism’s genome. The library contains classes for representing the Genome Properties database as a whole and methods for detecting differences in property assignments between organisms. As the Genome Properties database grows, we anticipate widespread adoption of Pygenprop for routine genome analyses and integration within third-party bioinformatics software.
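The assignment and comparison idea described above can be illustrated with a minimal, self-contained sketch. It does not use Pygenprop's actual API: the property definitions, signature accessions, function names, and the simplified rule (YES if every step has supporting evidence, NO if none does, PARTIAL otherwise) are all assumptions made for illustration, and the real Genome Properties rules may be more involved.

```python
from typing import Dict, List, Set

# Hypothetical property definitions (not real Genome Properties entries):
# property id -> list of steps, each step being the set of signature
# accessions that can serve as evidence for it.
PROPERTIES: Dict[str, List[Set[str]]] = {
    "PropA": [{"SIG001"}, {"SIG002", "SIG003"}],
    "PropB": [{"SIG004"}, {"SIG005"}, {"SIG006"}],
}


def assign_property(steps: List[Set[str]], found: Set[str]) -> str:
    """Simplified rule (an assumption): YES if all steps are supported,
    NO if none are, PARTIAL otherwise."""
    if not steps:
        return "NO"
    # A step is supported when at least one of its signatures was annotated.
    supported = sum(1 for step in steps if step & found)
    if supported == len(steps):
        return "YES"
    if supported == 0:
        return "NO"
    return "PARTIAL"


def assign_all(found: Set[str]) -> Dict[str, str]:
    """Assign every property for one organism's set of annotated signatures."""
    return {pid: assign_property(steps, found) for pid, steps in PROPERTIES.items()}


def differing_properties(assignments: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the properties whose assignment is not identical across organisms."""
    return sorted(
        pid
        for pid in PROPERTIES
        if len({result[pid] for result in assignments.values()}) > 1
    )


# Hypothetical signature hits per organism, standing in for parsed
# InterProScan output.
annotations = {
    "organism_1": {"SIG001", "SIG003", "SIG004"},
    "organism_2": {"SIG001", "SIG002"},
}
assignments = {name: assign_all(found) for name, found in annotations.items()}
differences = differing_properties(assignments)
```

In this toy data set both organisms fully support PropA, while PropB is only partly supported in one organism and unsupported in the other, so PropB is the only property reported as differing.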
Availability and implementation
Pygenprop is written in Python and is compatible with Python 3.6 or higher. Source code is available under the Apache Licence Version 2 at https://github.com/Micromeda/pygenprop. The package can be installed from both PyPI (https://pypi.org/project/pygenprop) and Anaconda (https://anaconda.org/lbergstrand/pygenprop). Documentation is available on Read the Docs (http://pygenprop.rtfd.io/).