Protein lipograms

Abstract

Linguistic analysis of protein sequences is an underexploited technique. Here, we capitalize on the concept of the lipogram to characterize sequences at the proteome level. A lipogram is a literary composition that omits one or more letters. A protein lipogram likewise omits one or more types of amino acid. In this article, we establish a usable terminology for decomposing a sequence collection in terms of the lipogram. Next, we characterize Uniref50 using a lipogram decomposition. At the global level, protein lipograms exhibit power-law properties, and a clear correlation with metabolic cost is seen. Finally, we use the lipogram construction to assign proteomes to the four branches of the tree of life: archaea, bacteria, eukaryotes and viruses. We conclude from this pilot study that the lipogram demonstrates considerable potential as an additional tool for sequence analysis and proteome classification.
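
The definition of a protein lipogram given in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`missing_residues`, `lipogram_label`, `decompose`), the restriction to the 20 standard one-letter amino acid codes, and the toy sequences are all assumptions made for illustration. Each sequence is labelled by the set of amino acid types it omits, and a collection is then grouped by that label.

```python
from collections import Counter

# Assumption: only the 20 standard amino acids (one-letter codes) are considered.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def missing_residues(sequence: str) -> frozenset:
    """Return the standard amino acid types that never occur in the sequence."""
    # Non-standard symbols (e.g. X, B, Z) are simply ignored by the set difference.
    return STANDARD_AMINO_ACIDS - set(sequence.upper())


def lipogram_label(sequence: str) -> str:
    """Label a sequence by its omitted residue types, sorted alphabetically.

    An empty label means the sequence uses all 20 types and is not a lipogram.
    """
    return "".join(sorted(missing_residues(sequence)))


def decompose(sequences) -> Counter:
    """Count how many sequences in a collection share each lipogram label."""
    return Counter(lipogram_label(seq) for seq in sequences)


# Hypothetical toy sequences, not taken from Uniref50.
toy_sequences = [
    "ACDEFGHIKLMNPQRSTVWY",  # contains every standard type
    "ACDEFGHIKLMNPQRSTVY",   # omits tryptophan (W)
    "ACDEFGHIKLMNPQRSTVYY",  # also omits only W
]

if __name__ == "__main__":
    for label, count in decompose(toy_sequences).items():
        print(repr(label), count)
```

Representing each label as a sorted string keeps the grouping key hashable and easy to read; a frozenset would serve equally well as a key if the label is only used programmatically.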

Publication DOI: https://doi.org/10.1016/j.jtbi.2017.07.009
Divisions: College of Engineering & Physical Sciences
College of Engineering & Physical Sciences > Systems analytics research institute (SARI)
College of Engineering & Physical Sciences > School of Informatics and Digital Engineering > Mathematics
College of Health & Life Sciences > Aston Pharmacy School
College of Health & Life Sciences
College of Health & Life Sciences > Chronic and Communicable Conditions
Additional Information: © 2017, Elsevier. Licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International http://creativecommons.org/licenses/by-nc-nd/4.0/
Uncontrolled Keywords: Amino Acid Sequence; Archaea; Bacteria; Eukaryota; Evolution, Molecular; Pilot Projects; Proteins/chemistry; Proteome/classification; Viruses; Statistics and Probability; Medicine(all); Modelling and Simulation; Immunology and Microbiology(all); Biochemistry, Genetics and Molecular Biology(all); Agricultural and Biological Sciences(all); Applied Mathematics
Publication ISSN: 1095-8541
Related URLs: http://www.scop ... tnerID=8YFLogxK (Scopus URL)
PURE Output Type: Article
Published Date: 2017-10-07
Published Online Date: 2017-07-15
Accepted Date: 2017-07-12
Submitted Date: 2017-03-10
Authors: Laurie, Jason (ORCID Profile 0000-0002-3621-6052)
Chattopadhyay, Amit K (ORCID Profile 0000-0001-5499-6008)
Flower, Darren R (ORCID Profile 0000-0002-8542-7067)
