<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JBiSE</journal-id><journal-title-group><journal-title>Journal of Biomedical Science and Engineering</journal-title></journal-title-group><issn pub-type="epub">1937-6871</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jbise.2016.96025</article-id><article-id pub-id-type="publisher-id">JBiSE-67017</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Biomedical&amp;Life Sciences</subject></subj-group></article-categories><title-group><article-title>
 
 
  Hidden Sequence Repeats: Additional Evidence for the Origin of TIM-Barrel Family
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Xiaofeng</surname><given-names>Ji</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Yuan</surname><given-names>Zheng</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Zhipeng</surname><given-names>Wang</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Jun</surname><given-names>Sheng</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, China</addr-line></aff><author-notes><corresp id="cor1">* E-mail:<email>jixf@ysfri.ac.cn(JS)</email>;</corresp></author-notes><pub-date pub-type="epub"><day>19</day><month>05</month><year>2016</year></pub-date><volume>09</volume><issue>06</issue><fpage>307</fpage><lpage>314</lpage><history><date date-type="received"><day>19</day>	<month>April</month>	<year>2016</year></date><date date-type="rev-recd"><day>accepted</day>	<month>28</month>	<year>May</year>	</date><date date-type="accepted"><day>31</day>	<month>May</month>	<year>2016</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Most proteins adopt an approximate structural symmetry. However, they show no detectable symmetry in their sequences, and for most of these proteins it is unclear whether the structural symmetry originates from duplication. As one of the six popular folds (super-folds) possessing an approximate structural symmetry, the triosephosphate isomerase barrel (TIM-barrel) domain has been widely studied. Using a modified recurrence quantification analysis of primary sequences, we identified the same 2-, 3-, and 4-fold symmetry patterns in the sequences as in their tertiary structures. This result indicates that the symmetry in tertiary structure is encoded by symmetry in the primary sequence and that the TIM-barrel adopted a 2-, 3-, or 4-fold repeat pattern during evolution. This discovery should be useful for understanding the evolutionary mechanisms of this protein family, and the symmetry pattern may be a clue to its ancient origin by duplication of half-barrels or of the 
  <em>β</em>α unit.
 
</p></abstract><kwd-group><kwd>TIM-Barrel</kwd><kwd>Hidden Symmetry</kwd><kwd>Primary Sequences</kwd><kwd>Repeat Pattern</kwd><kwd>Recurrence Quantification Analysis</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Proteins are amino acid polymers that can adopt a wide range of structures uniquely determined by sequence. It is well known that the information required for structure formation is contained within their amino acid sequences [<xref ref-type="bibr" rid="scirp.67017-ref1">1</xref>]. Nevertheless, many proteins exhibit obvious symmetry at the level of tertiary structure and yet seldom show periodicity in their primary sequences [<xref ref-type="bibr" rid="scirp.67017-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.67017-ref3">3</xref>]. A detailed analysis of the repeats in protein sequences may help us to better understand the mechanisms by which proteins adapt their structure and function under evolutionary pressure.</p><p>The eight-stranded β/α barrel (triosephosphate isomerase [TIM] barrel) is by far the most common tertiary fold observed in high-resolution protein crystal structures, and it mediates diverse functions while maintaining its overall structure. It is estimated that 10% of all known enzymes have this fold [<xref ref-type="bibr" rid="scirp.67017-ref4">4</xref>]. By itself, the TIM-barrel fold typically has approximately 250 residues, with a minimum of approximately 200 residues required to form its structure; branched hydrophobic side chains dominate the core of β/α barrels [<xref ref-type="bibr" rid="scirp.67017-ref5">5</xref>]. The closed parallel β-domain structure of the (β/α)<sub>8</sub>-barrel is formed from eight parallel (β/α)-units linked by hydrogen bonds (<xref ref-type="fig" rid="fig1">Figure 1</xref>). 
Based on structural [<xref ref-type="bibr" rid="scirp.67017-ref6">6</xref>] and sequence [<xref ref-type="bibr" rid="scirp.67017-ref7">7</xref>] analyses of HisA and HisF, the (β/α)<sub>8</sub>-barrel domain of both of these enzymes appears to be the result of gene duplication and fusion. Richter and colleagues suggested a two-step evolutionary pathway in which a HisF-N1-like predecessor was duplicated and fused twice to yield HisF [<xref ref-type="bibr" rid="scirp.67017-ref8">8</xref>]. Although many experimental studies show that the (β/α)<sub>8</sub>-barrel may have evolved from an ancestral half- or quarter-barrel [<xref ref-type="bibr" rid="scirp.67017-ref6">6</xref>] [<xref ref-type="bibr" rid="scirp.67017-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.67017-ref10">10</xref>] and the structures of this family are approximately symmetrical, evidence for an origin of this common ancestor by 4-fold duplication is lacking.</p><p>Internal repeats in protein sequences have wide-ranging implications for the structure and function of proteins. The ability to detect repeated structures based only on sequence analysis would support the evolutionary hypothesis that a large fraction of modern-day enzymes evolved from a basic structural unit. Several efforts have been made to detect latent symmetries in protein sequences. 
Different methods [<xref ref-type="bibr" rid="scirp.67017-ref11">11</xref>] - [<xref ref-type="bibr" rid="scirp.67017-ref19">19</xref>] have been proposed to detect periodicity in the sequences of beta-trefoil [<xref ref-type="bibr" rid="scirp.67017-ref20">20</xref>], beta-barrel [<xref ref-type="bibr" rid="scirp.67017-ref21">21</xref>], beta-propeller [<xref ref-type="bibr" rid="scirp.67017-ref22">22</xref>] [<xref ref-type="bibr" rid="scirp.67017-ref23">23</xref>], Ig fold [<xref ref-type="bibr" rid="scirp.67017-ref24">24</xref>] [<xref ref-type="bibr" rid="scirp.67017-ref25">25</xref>], and left-handed beta-helix fold [<xref ref-type="bibr" rid="scirp.67017-ref26">26</xref>] proteins, among others. Notably, several popular web tools detect repeats: RADAR [<xref ref-type="bibr" rid="scirp.67017-ref18">18</xref>], TRUST [<xref ref-type="bibr" rid="scirp.67017-ref17">17</xref>], HHrep [<xref ref-type="bibr" rid="scirp.67017-ref16">16</xref>], REPETITA [<xref ref-type="bibr" rid="scirp.67017-ref14">14</xref>], and FAIR [<xref ref-type="bibr" rid="scirp.67017-ref13">13</xref>]. These tools identify repeats in protein and DNA sequences based on suboptimal self-sequence alignment; they are useful for general repeat detection but less useful for symmetric sequence repeats. In our previous paper [<xref ref-type="bibr" rid="scirp.67017-ref27">27</xref>], a modified recurrence plot was used to detect latent periodicities in proteins with an Ig fold. In that work, the amino acids were represented by their Grantham polarity values [<xref ref-type="bibr" rid="scirp.67017-ref28">28</xref>] and Pearson’s correlation coefficients were used to characterize similarity: the higher the correlation between two segments, the more similar they were considered to be. 
In order to understand the evolution of the (β/α)<sub>8</sub>-barrel family, here we propose a fast and sensitive modified recurrence quantification analysis method to detect the hidden symmetries in the primary sequences of non-homologous proteins with CATH [<xref ref-type="bibr" rid="scirp.67017-ref29">29</xref>] Code 3.20.20. In this study, hydrophilic and hydrophobic features were used to represent the amino acids, and the percentage of identical symbols was used to characterize similarity. Our results showed that nearly all members of this family were 2-, 3-, or 4-fold symmetric. This result may improve the understanding of the evolutionary mechanisms of the (β/α)<sub>8</sub>-barrel family.</p></sec><sec id="s2"><title>2. Methods</title><p>A modified recurrence plot, guided by the idea of recurrence quantification analysis [<xref ref-type="bibr" rid="scirp.67017-ref30">30</xref>], was used to identify internal repeats in the TIM-barrel family. The flow chart of this method is shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>.</p><p>Consider an arbitrary sequence <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-9102302x7.png" xlink:type="simple"/></inline-formula>, where N is the length of the sequence and x<sub>i</sub> denotes one of the 20 amino acids. First, the complexity of the protein sequence should be reduced. As noted in the Introduction, the (β/α)<sub>8</sub>-barrel is mainly characterized by α-helices and β-strands, whose structural features are mainly determined by their hydrophilic and hydrophobic regions. Hence, we reduce protein sequence complexity by grouping the 20 amino acids into four groups based on their individual hydrophobicity according to the ranges of the hydropathy scale (<xref ref-type="table" rid="table1">Table 1</xref>) [<xref ref-type="bibr" rid="scirp.67017-ref31">31</xref>]. 
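</p><p>As a minimal illustrative sketch of this reduction step (not the authors' implementation), the grouping in <xref ref-type="table" rid="table1">Table 1</xref> can be written as a lookup table in Python. The names HYDROPATHY_GROUPS, RESIDUE_TO_GROUP, reduce_sequence, and identity_fraction are hypothetical and introduced here only for illustration; identity_fraction is one assumed reading of "the percentage of identical symbols" used to characterize similarity, expressed as a fraction between 0 and 1.</p><preformat>
```python
# Hydropathy groups as listed in Table 1 (abbreviation: member amino acids).
HYDROPATHY_GROUPS = {
    "POL": "RDENQKH",   # strongly hydrophilic (polar)
    "HPO": "LIVAMF",    # strongly hydrophobic
    "AMBI": "STYW",     # weakly hydrophilic or weakly hydrophobic
    "UND": "CGP",       # undefined
}

# Invert the table into a per-residue lookup: one-letter code to group label.
RESIDUE_TO_GROUP = {
    residue: group
    for group, residues in HYDROPATHY_GROUPS.items()
    for residue in residues
}


def reduce_sequence(sequence):
    """Map each residue of a protein sequence to its Table 1 group label."""
    return [RESIDUE_TO_GROUP[residue] for residue in sequence.upper()]


def identity_fraction(segment_a, segment_b):
    """Fraction of positions at which two equal-length segments share a label.

    Assumption: this is how the proportion of identical symbols is computed.
    """
    if len(segment_a) != len(segment_b) or len(segment_a) == 0:
        raise ValueError("segments must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(segment_a, segment_b))
    return matches / len(segment_a)
```
</preformat><p>Under these assumptions, reduce_sequence("MKG") yields the labels HPO, POL, UND, so two segments can be compared on the reduced four-letter alphabet rather than on the 20 amino acids.</p><p>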
After this step, a vector representation of the protein sequence, as<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-9102302x8.png" xlink:type="simple"/></inline-formula>, is achieved. Next, sets of possible segments, as described in our previous paper</p><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title> The topological structure diagram of the eight-stranded β/α barrel</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-9102302x9.png"/></fig><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title> The flow chart of the method</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-9102302x10.png"/></fig><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Hydropathy characteristics</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Hydropathy characteristic</th><th align="center" valign="middle" >Abbreviation</th><th align="center" valign="middle" >Amino acids</th></tr></thead><tr><td align="center" valign="middle" >Strongly hydrophilic (polar)</td><td align="center" valign="middle" >POL</td><td align="center" valign="middle" >R, D, E, N, Q, K, H</td></tr><tr><td align="center" valign="middle" >Strongly hydrophobic</td><td align="center" valign="middle" >HPO</td><td align="center" valign="middle" >L, I, V, A, M, F</td></tr><tr><td align="center" valign="middle" >Weakly hydrophilic or weakly hydrophobic (ambiguous)</td><td align="center" valign="middle" >AMBI</td><td align="center" valign="middle" >S, T, Y, W</td></tr><tr><td align="center" valign="middle" >Undefined</td><td align="center" valign="middle" >UND</td><td align="center" valign="middle" >C, G, P</td></tr></tbody></table></table-wrap><p>[<xref 
ref-type="bibr" rid="scirp.67017-ref27">27</xref>], were constructed. For any segment <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-9102302x11.png" xlink:type="simple"/></inline-formula> (1 ≤ i ≤ N − d + 1), if we can identify another segment <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/4-9102302x12.png" xlink:type="simple"/></inline-formula> (j ≠ i) of the same length in the sequence S and the two segments are similar, we plot points at (i, d) and (j, d) in the modified recurrence plot. Two segments are considered similar if the proportion (s) of their identical symbols is larger than a chosen threshold r (0 &lt; r &lt; 1) and the P-value is lower than 0.01. When this was completed for all possible i and d, the modified recurrence plot was formed. We decreased the value of r gradually to detect symmetries in primary sequences.</p><p>In order to assess the performance of our method for repeat detection, our results were compared with those obtained using the web tools discussed in the Introduction section. Among these tools, HHrep and REPETITA are based on existing knowledge and use information from sequence profiles. Moreover, FAIR can only identify short segments. Hence, only the de novo repeat detection methods REPRO, RADAR, and TRUST were used for the assessment. Compared with these three methods, our method showed high accuracy for all selected proteins (<xref ref-type="table" rid="table2">Table 2</xref>), both for repeats and for residues. Our method also showed a higher sensitivity for repeat prediction, although its sensitivity was lower than that of REPRO when repeat residues were counted.</p></sec><sec id="s3"><title>3. Results and Discussion</title><p>We used typical proteins of the eight-stranded β/α barrel family as examples to demonstrate the effectiveness of our method for detecting symmetries in protein sequences. The TIM-barrel is an ancient fold with considerable sequence diversity. 
It is thought to have evolved from a half- or quarter-barrel. In particular, the prototypical (β/α)<sub>8</sub>-barrel proteins HisA (PDB id: 1QO2) and HisF (PDB id: 1THF) provided evidence that this fold evolved from a (β/α)<sub>4</sub> half-barrel or (β/α)<sub>2</sub> quarter-barrel ancestor. If the chain conformation of a protein is primarily determined by the information contained in its amino acid sequence, there must be signals in the sequences of these proteins that indicate the structural symmetry. Here, we used HisA and HisF as examples.</p><p><xref ref-type="fig" rid="fig3">Figure 3</xref>(c) shows that the entire plot area is partitioned into two main parts. This demonstrates the latent 2-fold periodicity in both of these sequences. For HisF, the recurrence plot shows a sharp boundary line at position 122 that divides the plot into two parts. This means that segments 1 - 122 and 123 - 253 are symmetric. Similarly, a sharp boundary line at x<sub>i</sub> = 118 divides the recurrence plot of HisA into two parts. This result agrees with the experimental findings that the TIM-barrel family evolved from repeated duplication of simpler units.</p><p>It is easy to extend the analysis above to the amino acid sequences of all other proteins in this family. Sixteen proteins were selected from the TIM-barrel fold in CATH. 
Furthermore, among these, identical amino acids between any two sequences</p><table-wrap-group id="2"><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Sensitivity and accuracy for different selected proteins from PROPEAT</title></caption><table-wrap id="2_1"><table><tbody><thead><tr><th align="center" valign="middle"  rowspan="2"  >Folds</th><th align="center" valign="middle"  colspan="4"  >Repeats</th><th align="center" valign="middle"  colspan="4"  >Residues</th></tr></thead><tr><td align="center" valign="middle" >Radar</td><td align="center" valign="middle" >Trust</td><td align="center" valign="middle" >Repro</td><td align="center" valign="middle" >Our</td><td align="center" valign="middle" >Radar</td><td align="center" valign="middle" >Trust</td><td align="center" valign="middle" >Repro</td><td align="center" valign="middle" >Our</td></tr><tr><td align="center" valign="middle" >b-trefoill</td><td align="center" valign="middle" >28.79</td><td align="center" valign="middle" >28.79</td><td align="center" valign="middle" >65.15</td><td align="center" valign="middle" >96.31</td><td align="center" valign="middle" >30.74</td><td align="center" valign="middle" >27.81</td><td align="center" valign="middle" >99.26</td><td align="center" valign="middle" >77.44</td></tr><tr><td align="center" valign="middle" >Jelly-roll</td><td align="center" valign="middle" >22.50</td><td align="center" valign="middle" >13.85</td><td align="center" valign="middle" >96.92</td><td align="center" valign="middle" >97.65</td><td align="center" valign="middle" >-------</td><td align="center" valign="middle" >-------</td><td align="center" valign="middle" >-------</td><td align="center" valign="middle" >-------</td></tr><tr><td align="center" valign="middle" >Ig like</td><td align="center" valign="middle" >9.38</td><td align="center" valign="middle" >15.63</td><td align="center" valign="middle" >90.63</td><td align="center" valign="middle" >94.11</td><td 
align="center" valign="middle" >8.64</td><td align="center" valign="middle" >17.69</td><td align="center" valign="middle" >110.23</td><td align="center" valign="middle" >98.82</td></tr><tr><td align="center" valign="middle" >TIM-barrel</td><td align="center" valign="middle" >23.75</td><td align="center" valign="middle" >22.50</td><td align="center" valign="middle" >50.00</td><td align="center" valign="middle" >91.63</td><td align="center" valign="middle" >19.54</td><td align="center" valign="middle" >57.72</td><td align="center" valign="middle" >107.94</td><td align="center" valign="middle" >97.36</td></tr><tr><td align="center" valign="middle" >Ferredoxin-like</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >62.79</td><td align="center" valign="middle" >88.49</td></tr><tr><td align="center" valign="middle" >Total</td><td align="center" valign="middle" >20.18</td><td align="center" valign="middle" >19.01</td><td align="center" valign="middle" >73.01</td><td align="center" valign="middle" >93.47</td><td align="center" valign="middle" >16.65</td><td align="center" valign="middle" >24.00</td><td align="center" valign="middle" >93.12</td><td align="center" valign="middle" >90.65</td></tr></tbody></table></table-wrap><table-wrap id="2_2"><table><tbody><thead><tr><th align="center" valign="middle"  rowspan="2"  >Folds</th><th align="center" valign="middle"  colspan="4"  >Repeats</th><th align="center" valign="middle"  colspan="4"  >Residues</th></tr></thead><tr><td align="center" valign="middle" >Radar</td><td align="center" valign="middle" >Trust</td><td align="center" valign="middle" >Repro</td><td align="center" valign="middle" >Our</td><td align="center" valign="middle" >Radar</td><td align="center" valign="middle" 
>Trust</td><td align="center" valign="middle" >Repro</td><td align="center" valign="middle" >Our</td></tr><tr><td align="center" valign="middle" >b-trefoill</td><td align="center" valign="middle" >63.16</td><td align="center" valign="middle" >68.42</td><td align="center" valign="middle" >42.50</td><td align="center" valign="middle" >72.52</td><td align="center" valign="middle" >76.16</td><td align="center" valign="middle" >76.88</td><td align="center" valign="middle" >63.20</td><td align="center" valign="middle" >80.76</td></tr><tr><td align="center" valign="middle" >Jelly-roll</td><td align="center" valign="middle" >-------</td><td align="center" valign="middle" >-------</td><td align="center" valign="middle" >-------</td><td align="center" valign="middle" >-------</td><td align="center" valign="middle" >-------</td><td align="center" valign="middle" >-------</td><td align="center" valign="middle" >-------</td><td align="center" valign="middle" >-------</td></tr><tr><td align="center" valign="middle" >Ig like</td><td align="center" valign="middle" >50.00</td><td align="center" valign="middle" >70.00</td><td align="center" valign="middle" >44.83</td><td align="center" valign="middle" >78.96</td><td align="center" valign="middle" >8.33</td><td align="center" valign="middle" >52.34</td><td align="center" valign="middle" >48.29</td><td align="center" valign="middle" >61.03</td></tr><tr><td align="center" valign="middle" >TIM-barrel</td><td align="center" valign="middle" >47.05</td><td align="center" valign="middle" >56.25</td><td align="center" valign="middle" >43.74</td><td align="center" valign="middle" >82.47</td><td align="center" valign="middle" >29.64</td><td align="center" valign="middle" >39.88</td><td align="center" valign="middle" >29.57</td><td align="center" valign="middle" >63.11</td></tr><tr><td align="center" valign="middle" >Ferredoxin-like</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0</td><td align="center" 
valign="middle" >50.00</td><td align="center" valign="middle" >100.00</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >0</td><td align="center" valign="middle" >92.00</td><td align="center" valign="middle" >97.87</td></tr><tr><td align="center" valign="middle" >Total</td><td align="center" valign="middle" >50.00</td><td align="center" valign="middle" >59.57</td><td align="center" valign="middle" >47.37</td><td align="center" valign="middle" >85.45</td><td align="center" valign="middle" >48.15</td><td align="center" valign="middle" >54.58</td><td align="center" valign="middle" >45.06</td><td align="center" valign="middle" >76.62</td></tr></tbody></table></table-wrap></table-wrap-group><p>were less than 30%. Therefore, these proteins can be considered representatives of the TIM-barrel family. We showed that the modified recurrence plot clearly revealed 2-fold, 4-fold, and even 3-fold symmetry in the primary sequence. First, we found 2-fold symmetry in all members of this family at a similarity threshold of r = 0.4 for the alignment, supporting the hypothesis that protein domains originated by duplication and recombination of simpler peptides. <xref ref-type="fig" rid="fig4">Figure 4</xref> shows the modified recurrence plots of typical proteins of the TIM-barrel family, and all of the results are listed in <xref ref-type="table" rid="table2">Table 2</xref>. Based on the partitioned mode of the plot, the modes of origin can be classified into three main categories (<xref ref-type="table" rid="table3">Table 3</xref>).</p><p>Category 1 (e.g., <xref ref-type="fig" rid="fig4">Figure 4</xref>, S1) clearly contained a nearly 4-fold repeat structure with all three sub-optimal alignments visible; the format 4 + 4 indicates that the proteins evolved from an ancestral half-barrel. However, when we tightened the threshold, the multi-fold symmetry of the primary sequence emerged. This result supports the idea that the
This result supports that the</p><fig id="fig3"  position="float"><label><xref ref-type="fig" rid="fig3">Figure 3</xref></label><caption><title> The tertiary structures and recurrence plot of imidazoleglycerol phosphate (PDBid:1thf) and Isomerase (PDBid:1qo2). (a) PDBid of the protein; (b) the tertiary structure. This figure was generated by Pymoland it was shown in rainbow cartoon; (c) the recurrence plot</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-9102302x13.png"/></fig><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Result of all the proteins is classified into three categories<sup>#</sup></title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Categories</th><th align="center" valign="middle" >R</th><th align="center" valign="middle" >Format</th><th align="center" valign="middle" >PDB id</th></tr></thead><tr><td align="center" valign="middle"  rowspan="2"  >S1 (e.g. <xref ref-type="fig" rid="fig4">Figure 4</xref>: S1)</td><td align="center" valign="middle" >0.4 and 0.5</td><td align="center" valign="middle" >4 + 4</td><td align="center" valign="middle"  rowspan="2"  >1eex, 1gk8, 1hzy, 1s2w, 1bd0, 1eye, 1i1w, 1v93</td></tr><tr><td align="center" valign="middle" >0.6</td><td align="center" valign="middle" >2 + 2 + 2 + 2</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >S2 (e.g. <xref ref-type="fig" rid="fig4">Figure 4</xref>: S2)</td><td align="center" valign="middle" >0.4</td><td align="center" valign="middle" >4 + 4</td><td align="center" valign="middle"  rowspan="2"  >1o1z</td></tr><tr><td align="center" valign="middle" >0.5 and 0.6</td><td align="center" valign="middle" >2 + 3 + 3</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >S3 (e.g. 
<xref ref-type="fig" rid="fig4">Figure 4</xref>: S3)</td><td align="center" valign="middle" >0.4 and 0.5</td><td align="center" valign="middle" >4 + 4</td><td align="center" valign="middle"  rowspan="2"  >1luc, 1kko, 1ex1, 1muw, 1req, 1ccw, 1oc7</td></tr><tr><td align="center" valign="middle" >0.6</td><td align="center" valign="middle" >5 + 3</td></tr></tbody></table></table-wrap><p><sup>#</sup>Here, we regard the βα domain as the basic unit that forms the tertiary structure. We use the formula N<sub>1</sub> + N<sub>2</sub> + ∙∙∙ + N<sub>i</sub> + ∙∙∙ + N<sub>n</sub> to express the “format”. In the formula, N<sub>i</sub> (i = 1, 2, 3, ∙∙∙, n) is the number of βα domains that form a beta-domain, and n is the number of beta-domains that form the whole structure. (For example, the format 4 + 4 means that 4 βα domains form a beta-domain and that the whole structure is composed of two such domains.)</p><fig-group id="fig4"><label><xref ref-type="fig" rid="fig4">Figure 4</xref></label><caption><title>Structures and recurrence plots of the representative proteins. (a) The tertiary structures of the proteins. (b)-(d) Modified recurrence plots with r = 0.40, 0.50, and 0.60, respectively. S denotes the category.</title></caption><fig id ="fig4_1"><label> (b)</label><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/4-9102302x14.png"/></fig></fig-group><p>ancient module may have arisen by 2-fold duplication of an αβ precursor, which would have given rise to the 8-fold symmetry. The same is true for the other representative members (1EEX, 1GK8, 1HZY, 1S2W, 1BD0, 1EYE, 1I1W) of this family (not shown here).</p><p>In Category 2 (e.g., <xref ref-type="fig" rid="fig4">Figure 4</xref>, S2), the 3-fold symmetry emerged as the similarity threshold increased. The protein may have had three ancestral segments, but the structure alignment showed that the latter two domains (3 + 3) were similar (RMSD = 3.77). 
One can speculate that the ancient βα domain may have duplicated to form the βαβαβα domain, and that the other domain evolved from it by tandem duplication and fusion.</p><p>In Category 3 (e.g., <xref ref-type="fig" rid="fig4">Figure 4</xref>, S3), with the format 5 + 3, the former domain (N<sub>1</sub> = 5) may have contained one βα domain as its ancestral segment and the latter domain (N<sub>2</sub> = 3) another; therefore, we speculate that these proteins evolved by gene duplication from two ancestral segments, each of which formed its domain by duplication during the early stage of evolution.</p></sec><sec id="s4"><title>4. Conclusion</title><p>Internal repeats are a feature that proteins use to adapt their structures and functions under evolutionary pressure. A detailed analysis of internal repeats within protein sequences may have wide-ranging implications for understanding trends in protein evolution. In this study, we used a modified recurrence analysis method to detect hidden symmetries within proteins from the TIM-barrel family, which clearly accounted for the 2-, 3-, and 4-fold symmetry. This result is consistent with the idea that TIM-barrels evolved from repeated duplication of simpler units. These findings support the hypothesis that protein evolution typically occurs by duplication, mutation, and shuffling of existing protein domains. Occasionally, the domains themselves are produced de novo, but they primarily belong to an established set. This result suggests that the symmetries at the structure level are due to those at the sequence level. We hope that our results will be useful for the development of structural prediction methods and for understanding the mechanisms of protein evolution.</p></sec><sec id="s5"><title>Acknowledgements</title><p>This work is supported by the Special Scientific Research Funds for Central Non-profit Institute, Yellow Sea Fisheries Research Institute (Grant nos. 
20603022015012 and 20603022013016).</p></sec><sec id="s6"><title>Cite this paper</title><p>Xiaofeng Ji,Yuan Zheng,Zhipeng Wang,Jun Sheng, (2016) Hidden Sequence Repeats: Additional Evidence for the Origin of TIM-Barrel Family. Journal of Biomedical Science and Engineering,09,307-314. doi: 10.4236/jbise.2016.96025</p></sec><sec id="s7"><title>NOTES</title></sec></body><back><ref-list><title>References</title><ref id="scirp.67017-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Anfinsen, C.B. (1973) Principles That Govern the Folding of Protein Chains. Science, 181, 223-230. http://dx.doi.org/10.1126/science.181.4096.223</mixed-citation></ref><ref id="scirp.67017-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Soding, J. and Lupas, A.N. (2003) More than the Sum of Their Parts: On the Evolution of Proteins from Peptides. Bioessays, 25, 837-846. http://dx.doi.org/10.1002/bies.10321</mixed-citation></ref><ref id="scirp.67017-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Lupas, A.N., Ponting, C.P. and Russell, R.B. (2001) On the Evolution of Protein Folds: Are Similar Motifs in Different Protein Folds the Result of Convergence, Insertion, or Relics of an Ancient Peptide World? Journal of Structural Biology, 134, 191-203. http://dx.doi.org/10.1006/jsbi.2001.4393</mixed-citation></ref><ref id="scirp.67017-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Nagano, N., Orengo, C.A. and Thornton, J.M. (2002) One Fold with Many Functions: The Evolutionary Relationships between TIM Barrel Families Based on Their Sequences, Structures and Functions. Journal of Molecular Biology, 321, 741-765. http://dx.doi.org/10.1016/S0022-2836(02)00649-6</mixed-citation></ref><ref id="scirp.67017-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Branden, C. and Tooze, J. (1991) Introduction to Protein Structure. 
Garland, New York.</mixed-citation></ref><ref id="scirp.67017-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Lang, D., Thoma, R., Henn-Sax, M., Sterner, R. and Wilmanns, M. (2000) Structural Evidence for Evolution of the Beta/Alpha Barrel Scaffold by Gene Duplication and Fusion. Science, 289, 1546-1550. http://dx.doi.org/10.1126/science.289.5484.1546</mixed-citation></ref><ref id="scirp.67017-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Fani, R., Lio, P., Chiarelli, I. and Bazzicalupo, M. (1994) The Evolution of the Histidine Biosynthetic Genes in Prokaryotes: A Common Ancestor for the hisA and hisF Genes. Journal of Molecular Evolution, 38, 489-495. http://dx.doi.org/10.1007/BF00178849</mixed-citation></ref><ref id="scirp.67017-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Lee, J. and Blaber, M. (2011) Experimental Support for the Evolution of Symmetric Protein Architecture from a Simple Peptide Motif. Proceedings of the National Academy of Sciences of the United States of America, 108, 126-130. http://dx.doi.org/10.1073/pnas.1015032108</mixed-citation></ref><ref id="scirp.67017-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">List, F., Sterner, R. and Wilmanns, M. (2011) Related (Betaalpha)8-Barrel Proteins in Histidine and Tryptophan Biosynthesis: A Paradigm to Study Enzyme Evolution. ChemBioChem, 12, 1487-1494. http://dx.doi.org/10.1002/cbic.201100082</mixed-citation></ref><ref id="scirp.67017-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Richter, M., Bosnali, M., Carstensen, L., Seitz, T., Durchschlag, H., Blanquart, S., Merkl, R. and Sterner, R. (2010) Computational and Experimental Evidence for the Evolution of a (βα)8-Barrel Protein from an Ancestral Quarter-Barrel Stabilised by Disulfide Bonds. Journal of Molecular Biology, 398, 763-773. 
http://dx.doi.org/10.1016/j.jmb.2010.03.057</mixed-citation></ref><ref id="scirp.67017-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Pellegrini, M., Renda, M.E. and Vecchio, A. (2012) Ab Initio Detection of Fuzzy Amino Acid Tandem Repeats in Protein Sequences. BMC Bioinformatics, 13, S8. http://dx.doi.org/10.1186/1471-2105-13-S3-S8</mixed-citation></ref><ref id="scirp.67017-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Luo, H., Lin, K., David, A., Nijveen, H. and Leunissen, J.A. (2012) ProRepeat: An Integrated Repository for Studying Amino Acid Tandem Repeats in Proteins. Nucleic Acids Research, 40, D394-D399. http://dx.doi.org/10.1093/nar/gkr1019</mixed-citation></ref><ref id="scirp.67017-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Senthilkumar, R., Sabarinathan, R., Hameed, B.S., Banerjee, N., Chidambarathanu, N., Karthik, R. and Sekar, K. (2010) FAIR: A Server for Internal Sequence Repeats. Bioinformation, 4, 271-275. http://dx.doi.org/10.6026/97320630004271</mixed-citation></ref><ref id="scirp.67017-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Marsella, L., Sirocco, F., Trovato, A., Seno, F. and Tosatto, S.C. (2009) REPETITA: Detection and Discrimination of the Periodicity of Protein Solenoid Repeats by Discrete Fourier Transform. Bioinformatics, 25, i289-i295. http://dx.doi.org/10.1093/bioinformatics/btp232</mixed-citation></ref><ref id="scirp.67017-ref15"><label>15</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Banerjee</surname><given-names>N.</given-names></name>, <etal>et al</etal>. 
(<year>2008</year>) <article-title>An Algorithm to Find All Identical Internal Sequence Repeats</article-title>. <source>Current Science India</source>, <volume>95</volume>, <fpage>188</fpage>-<lpage>195</lpage>.</mixed-citation></ref><ref id="scirp.67017-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Soding, J., Remmert, M. and Biegert, A. (2006) HHrep: De Novo Protein Repeat Detection and the Origin of TIM Barrels. Nucleic Acids Research, 34, W137-W142. http://dx.doi.org/10.1093/nar/gkl130</mixed-citation></ref><ref id="scirp.67017-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Szklarczyk, R. and Heringa, J. (2004) Tracking Repeats Using Significance and Transitivity. Bioinformatics, 20, i311-i317. http://dx.doi.org/10.1093/bioinformatics/bth911</mixed-citation></ref><ref id="scirp.67017-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Heger, A. and Holm, L. (2000) Rapid Automatic Detection and Alignment of Repeats in Protein Sequences. Proteins: Structure, Function, and Bioinformatics, 41, 224-237. http://dx.doi.org/10.1002/1097-0134(20001101)41:2&lt;224::AID-PROT70&gt;3.0.CO;2-Z</mixed-citation></ref><ref id="scirp.67017-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Rackovsky, S. (1998) “Hidden” Sequence Periodicities and Protein Architecture. Proceedings of the National Academy of Sciences of the United States of America, 95, 8580-8584. http://dx.doi.org/10.1073/pnas.95.15.8580</mixed-citation></ref><ref id="scirp.67017-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Xu, R. and Xiao, Y. (2005) A Common Sequence-Associated Physicochemical Feature for Proteins of Beta-Trefoil Family. Computational Biology and Chemistry, 29, 79-82. 
http://dx.doi.org/10.1016/j.compbiolchem.2004.12.003</mixed-citation></ref><ref id="scirp.67017-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Ji, X., Chen, H. and Xiao, Y. (2007) Hidden Symmetries in the Primary Sequences of Beta-Barrel Family. Computational Biology and Chemistry, 31, 61-63. http://dx.doi.org/10.1016/j.compbiolchem.2007.01.002</mixed-citation></ref><ref id="scirp.67017-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Yadid, I. and Tawfik, D.S. (2011) Functional Beta-Propeller Lectins by Tandem Duplications of Repetitive Units. Protein Engineering, Design and Selection, 24, 185-195. http://dx.doi.org/10.1093/protein/gzq053</mixed-citation></ref><ref id="scirp.67017-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Wang, X., Huang, Y. and Xiao, Y. (2008) Structural-Symmetry-Related Sequence Patterns of the Proteins of Beta-Propeller Family. Journal of Molecular Graphics and Modelling, 26, 829-833. http://dx.doi.org/10.1016/j.jmgm.2007.04.014</mixed-citation></ref><ref id="scirp.67017-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">Ji, X., Wang, H., Hao, J., Zheng, Y., Wang, W. and Sun, M. (2010) Identification of Sequence Repetitions in Immunoglobulin Folds. Journal of Molecular Graphics and Modelling, 28, 788-791. http://dx.doi.org/10.1016/j.jmgm.2010.02.003</mixed-citation></ref><ref id="scirp.67017-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Huang, Y. and Xiao, Y. (2007) Detection of Gene Duplication Signals of Ig Folds from Their Amino Acid Sequences. Proteins: Structure, Function, and Bioinformatics, 68, 267-272. http://dx.doi.org/10.1002/prot.21330</mixed-citation></ref><ref id="scirp.67017-ref26"><label>26</label><mixed-citation publication-type="other" xlink:type="simple">Shen, X. 
(2011) Conformation and Sequence Evidence for Two-Fold Symmetry in Left-Handed Beta-Helix Fold. Journal of Theoretical Biology, 285, 77-83. http://dx.doi.org/10.1016/j.jtbi.2011.06.011</mixed-citation></ref><ref id="scirp.67017-ref27"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">Ji, X., Sheng, J., Wang, F., Zhang, S., Hao, J., Wang, H. and Sun, M. (2011) Identification of Latent Periodicity in Domains of Alkaline Proteases. Biochemistry (Moscow), 76, 1037-1042. http://dx.doi.org/10.1134/S0006297911090082</mixed-citation></ref><ref id="scirp.67017-ref28"><label>28</label><mixed-citation publication-type="other" xlink:type="simple">Yamazaki, T. and Maruyama, T. (1972) Evidence for the Neutral Hypothesis of Protein Polymorphism. Science, 178, 56-58. http://dx.doi.org/10.1126/science.178.4056.56</mixed-citation></ref><ref id="scirp.67017-ref29"><label>29</label><mixed-citation publication-type="other" xlink:type="simple">Sillitoe, I., Cuff, A.L., Dessailly, B.H., Dawson, N.L., Furnham, N., Lee, D., Lees, J.G., Lewis, T.E., Studer, R.A., Rentzsch, R., Yeats, C., Thornton, J.M. and Orengo, C.A. (2013) New Functional Families (FunFams) in CATH to Improve the Mapping of Conserved Functional Sites to 3D Structures. Nucleic Acids Research, 41, D490-D498. http://dx.doi.org/10.1093/nar/gks1211</mixed-citation></ref><ref id="scirp.67017-ref30"><label>30</label><mixed-citation publication-type="other" xlink:type="simple">Konopka, A.K. (2005) Sequence Complexity and Composition. eLS.</mixed-citation></ref><ref id="scirp.67017-ref31"><label>31</label><mixed-citation publication-type="other" xlink:type="simple">Panek, J., Eidhammer, I. and Aasland, R. (2005) A New Method for Identification of Protein (Sub)Families in a Set of Proteins Based on Hydropathy Distribution in Proteins. Proteins: Structure, Function, and Bioinformatics, 58, 923-934. http://dx.doi.org/10.1002/prot.20356</mixed-citation></ref></ref-list></back></article>