Simple sequences are rare in the Protein Data Bank

Abstract

Simple sequences are abundant in the proteins that have been sequenced to date. But unusual protein features, such as simple sequences, are not present at the same high frequency within structural databases. A subset of these simple sequences, a group with a highly repetitive nature, has been shown to be abundant in eukaryotes but not in prokaryotes. In this study, an examination of the eukaryotic proteins in the Protein Data Bank (PDB) has revealed a large deficiency of low-complexity, highly repetitive protein repeats. Through simulated databases of similar samples of eukaryotic proteins taken from the National Center for Biotechnology Information (NCBI) database, it is shown that the PDB contains significantly less highly repetitive simple sequence than artificial databases of similar composition randomly derived from NCBI. When the structural data for those few PDB sequences that did contain highly repetitive simple sequences are examined in detail, it is found that in most cases the tertiary structure is unknown for the regions consisting of simple sequence. This lack of simple sequences, both in the PDB database and in the structural information, suggests that this type of simple sequence may produce disordered structures that make structural characterization difficult.
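The comparison against simulated databases described in the abstract can be illustrated with a minimal resampling sketch. This is not the authors' procedure: the sliding-window Shannon entropy detector, the window size, the entropy threshold, and the function names (`shannon_entropy`, `has_simple_region`, `empirical_p_value`) are all assumptions introduced here for illustration, standing in for whatever simple-sequence measure the study actually used.

```python
import random
from collections import Counter
from math import log2


def shannon_entropy(seq):
    """Shannon entropy (bits) of the residue composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum(c / n * log2(c / n) for c in counts.values())


def has_simple_region(seq, window=20, threshold=1.5):
    """Assumed detector: True if any window has entropy below threshold.

    The window size and threshold are illustrative values, not taken
    from the paper.
    """
    if len(seq) < window:
        return False
    return any(
        shannon_entropy(seq[i:i + window]) < threshold
        for i in range(len(seq) - window + 1)
    )


def empirical_p_value(observed_count, pool, sample_size,
                      n_samples=1000, seed=0):
    """Fraction of random samples from pool (a stand-in for NCBI) that
    contain as few or fewer simple-sequence proteins as observed in the
    structural set (a stand-in for the PDB)."""
    rng = random.Random(seed)
    flags = [has_simple_region(s) for s in pool]
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(flags, sample_size)  # sample without replacement
        if sum(sample) <= observed_count:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_samples + 1)
```

A small p-value from `empirical_p_value` would indicate that the structural set holds fewer simple-sequence proteins than expected from random samples of the same size, which is the kind of deficit the abstract reports.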

Authors

Huntley MA; Golding GB

Journal

Proteins Structure Function and Bioinformatics, Vol. 48, No. 1, pp. 134–140

Publisher

Wiley

Publication Date

July 1, 2002

DOI

10.1002/prot.10150

ISSN

0887-3585

Associated Experts

Brian Golding

Professor, Faculty of Science
