Advances in Mathematical Physics
Volume 2013 (2013), Article ID 917153, 10 pages
http://dx.doi.org/10.1155/2013/917153
Research Article

Can Power Laws Help Us Understand Gene and Proteome Information?

1 Institute of Engineering, Polytechnic of Porto, Department of Electrical Engineering, Rua Dr. António Bernardino de Almeida 431, 4200-072 Porto, Portugal
2 National Health Institute, Biochemical Genetics Unit, Medical Genetics Center “Jacinto de Magalhães”, Praça Pedro Nunes 88, 4099-028 Porto, Portugal

Received 11 February 2013; Accepted 27 February 2013

Academic Editor: Dumitru Baleanu

Copyright © 2013 J. A. Tenreiro Machado et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Proteins are biochemical entities consisting of one or more blocks, typically folded in a 3D pattern. Each block (a polypeptide) is a single linear sequence of amino acids that are biochemically bonded together. The amino acid sequence of a protein is defined by the sequence of one or several genes encoded in the DNA-based genetic code. This genetic code typically uses twenty amino acids, but in certain organisms it can also include two other amino acids. Once the amino acids are linked during protein synthesis, each amino acid becomes a residue in a protein, which is then chemically modified, ultimately changing and defining the protein function. In this study, the authors analyze the amino acid sequence using alignment-free methods, aiming to identify structural patterns in sets of proteins and in the proteome without any prior assumptions. The paper starts by analyzing amino acid sequence data by means of histograms of fixed-length amino acid words (tuples). The initial relative frequency histograms are then transformed and processed to generate quantitative results for information extraction and graphical visualization. Selected samples from two reference datasets are used, and the results show that the proposed method generates outputs consistent with current scientific knowledge in domains such as protein sequence and proteome analysis.
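As an illustration of the first step described in the abstract, the following minimal Python sketch builds a relative frequency histogram of fixed-length amino acid words (tuples) from a single sequence. It is not the authors' implementation: the function name `tuple_histogram`, the word length `k`, the use of overlapping windows, and the toy sequence are all assumptions made for this example.

```python
from collections import Counter


def tuple_histogram(sequence, k=2):
    """Return the relative frequency of each k-letter word in `sequence`.

    Assumption: words are read with an overlapping sliding window of
    step 1; the abstract does not say how the tuples are extracted.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Slide a window of length k along the sequence, one residue at a time.
    words = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(words)
    total = sum(counts.values())
    if total == 0:
        # The sequence is shorter than k, so it contains no complete word.
        return {}
    return {word: count / total for word, count in counts.items()}


# Hypothetical toy sequence in one-letter amino acid codes.
histogram = tuple_histogram("MKVLAAGMKV", k=2)
# List the words from most to least frequent.
for word, freq in sorted(histogram.items(), key=lambda item: -item[1]):
    print(word, round(freq, 3))
```

The frequencies of one sequence sum to one, so histograms from proteins of different lengths can be compared directly. The subsequent transformation and processing steps are not specified in the abstract and are therefore left out of this sketch.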