Journal article

A Machine Learning Framework Identifies Plastid-Encoded Proteins Harboring C3 and C4 Distinguishing Sequence Information

Abstract

C4 photosynthesis is known to have at least 61 independent origins across plant lineages, making it one of the most notable examples of convergent evolution. Of these >60 independent origins, a predicted 22-24, encompassing more than 50% of all known C4 species, exist within the Panicoideae, Arundinoideae, Chloridoideae, Micrairoideae, Aristidoideae, and Danthonioideae (PACMAD) clade of the Poaceae family. This clade therefore contains species ideally suited to the study of genomic changes associated with the acquisition of the C4 photosynthetic trait. In this study, we take advantage of the growing availability of sequenced plastid genomes and employ a machine learning (ML) approach to screen for plastid genes harboring C3 and C4 distinguishing information in PACMAD species. We demonstrate that certain plastid-encoded protein sequences carry enough distinguishing sequence information to train accurate ML C3/C4 classification models. Our RbcL-trained model, for example, yields a C3/C4 classifier with greater than 99% accuracy. Accurate prediction of photosynthetic type from individual sequences suggests biologically relevant, and potentially differing, roles of these sequence products in C3 versus C4 metabolism. With this ML framework, we have identified several key sequences and sites that are most predictive of C3/C4 status, including RbcL, subunits of the NAD(P)H dehydrogenase complex, and specific residues within them, further highlighting their potential significance in the evolution and/or maintenance of C4 photosynthetic machinery. This general approach can be applied to uncover intricate associations in other, similar genotype-phenotype relationships.
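The idea of ranking sites by how predictive they are of C3/C4 status can be illustrated with a minimal sketch. This is not the authors' pipeline: it swaps in per-column mutual information as a simple stand-in for a trained model's feature importance, and the function names (`site_information`, `rank_sites`) and the four toy sequences are hypothetical, made up purely for illustration.

```python
import math
from collections import Counter


def site_information(column, labels):
    """Mutual information (in bits) between the residues at one
    alignment column and the class labels (e.g. "C3" / "C4")."""
    n = len(labels)
    joint = Counter(zip(column, labels))
    residue_counts = Counter(column)
    label_counts = Counter(labels)
    mi = 0.0
    for (residue, label), count in joint.items():
        p_joint = count / n
        p_residue = residue_counts[residue] / n
        p_label = label_counts[label] / n
        mi += p_joint * math.log2(p_joint / (p_residue * p_label))
    return mi


def rank_sites(aligned_seqs, labels):
    """Return (score, column_index) pairs, most informative first.
    Assumes all sequences are already aligned to the same length;
    column indices are 0-based."""
    columns = zip(*aligned_seqs)
    scores = [(site_information(col, labels), i) for i, col in enumerate(columns)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Hypothetical toy alignment, NOT real RbcL data: only column 1
# (A vs. G) tracks the photosynthetic type.
toy_seqs = ["MASK", "MASR", "MGSK", "MGSR"]
toy_labels = ["C3", "C3", "C4", "C4"]

ranking = rank_sites(toy_seqs, toy_labels)
for score, index in ranking:
    print(f"column {index}: {score:.2f} bits")
```

In this toy case only the second column separates the two classes, so it ranks first; invariant columns and columns that vary independently of the label score zero. A real analysis would need many sequences, handling of alignment gaps, and correction for shared ancestry among species.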

Authors

Yogadasan N; Doxey AC; Chuong SDX

Journal

Genome Biology and Evolution, Vol. 15, No. 7

Publisher

Oxford University Press (OUP)

Publication Date

July 3, 2023

DOI

10.1093/gbe/evad129

ISSN

1759-6653
