<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">AJPS</journal-id><journal-title-group><journal-title>American Journal of Plant Sciences</journal-title></journal-title-group><issn pub-type="epub">2158-2742</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/ajps.2015.619315</article-id><article-id pub-id-type="publisher-id">AJPS-62020</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Biomedical&amp;Life Sciences</subject></subj-group></article-categories><title-group><article-title>
 
 
  <italic>In Silico</italic> Exploration of <italic>Cannabis sativa</italic> L. Genome for Simple Sequence Repeats (SSRs)
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>ncoronata</surname><given-names>Galasso</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Elena</surname><given-names>Ponzoni</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Istituto di Biologia e Biotecnologia Agraria (IBBA-CNR), Milan, Italy</addr-line></aff><pub-date pub-type="epub"><day>02</day><month>12</month><year>2015</year></pub-date><volume>06</volume><issue>19</issue><fpage>3244</fpage><lpage>3250</lpage><history><date date-type="received"><day>19</day>	<month>November</month>	<year>2015</year></date><date date-type="rev-recd"><day>accepted</day>	<month>15</month>	<year>December</year>	</date><date date-type="accepted"><day>18</day>	<month>December</month>	<year>2015</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Simple sequence repeats (SSRs), or microsatellite markers, are a valuable tool for purposes such as the evaluation of genetic diversity, fingerprinting, marker-assisted selection, and breeding. Recent developments in sequencing technologies and bioinformatic analyses provide new opportunities to produce large numbers of SSRs at lower cost. Here, we used for the first time the whole-genome shotgun sequence of the nuclear genome and the transcriptome of hemp (C. sativa L.) to develop microsatellite markers. Hemp is an ancient crop that is widely cultivated as a source of fiber, seeds and medicine. Analysis with the MISA program revealed a total of 407,491 SSRs (from mono-nucleotide to deca-nucleotide) in the hemp genome and 15,655 SSRs in the transcriptome. Analysis of the frequency and distribution of SSRs showed that mono-nucleotide repeats were the most abundant (55.4%) in the genome, whereas tri-nucleotide motifs (30.4%) predominated in the transcriptome. Poly A/T was predominant over poly G/C in both genome and transcriptome sequences. Among the tri-nucleotide repeats, AAG/CTT (34.5%) was the most abundant in the transcriptome. Repeats larger than tri-nucleotide were also observed in the hemp genome and transcriptome. Di-nucleotide and tri-nucleotide repeats with up to 8605 and 1401 iterations, respectively, were observed; no other SSR type exceeded 387 iterations. Primers were designed to amplify a few long microsatellite sequences, which could be used to identify polymorphism and to study genetic diversity among hemp cultivars.
 
</p></abstract><kwd-group><kwd>Microsatellite</kwd><kwd>Relative Density</kwd><kwd>Relative Abundance</kwd><kwd>PCR Amplification</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Repetitive elements are present in large quantities in eukaryotic genomes, in both coding and non-coding regions [<xref ref-type="bibr" rid="scirp.62020-ref1">1</xref>]. Among them, tandemly repeated DNA sequences of 1 - 6 bp are referred to as simple sequence repeats (SSRs), sequence tagged sites (STS) or microsatellites, and have proved very useful for genetic marker development and genome applications [<xref ref-type="bibr" rid="scirp.62020-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.62020-ref3">3</xref>]. Simple sequence repeats are codominant, abundant, multi-allelic, and uniformly distributed over the genome, and can be detected by simple reproducible assays [<xref ref-type="bibr" rid="scirp.62020-ref4">4</xref>]. Traditionally, SSRs were isolated from partially digested genomic DNA libraries, and several thousand clones were screened by colony/plaque hybridization using repetitive DNA probes. Later, several other methods were used to reduce the time and cost invested while increasing the yield of microsatellites. Today, the growing number of whole-genome sequences available for plant species provides a source for in silico SSR mining. The low cost of in silico mining and the high abundance of microsatellites in different sequence resources make this approach extremely attractive for generating microsatellite markers.</p><p>Recently, whole-genome shotgun sequencing of the nuclear genome and transcriptome of hemp was reported by van Bakel et al. (2011) [<xref ref-type="bibr" rid="scirp.62020-ref5">5</xref>]. This project provides the assembled draft genome and transcriptome of Cannabis sativa strain Purple Kush (PK). 
The contig assembly contains 534.0 Mb without gaps and 786.6 Mb including gaps, representing an estimated 65% and 96% coverage, respectively, of the haploid hemp genome (~820 Mb) [<xref ref-type="bibr" rid="scirp.62020-ref5">5</xref>]. A total of 136,290 scaffolds were obtained from the whole-genome shotgun assembly and 40,224 from the transcriptome. The availability of the hemp genome makes it possible to analyse the genome in silico to identify microsatellites that could be useful for cultivar identification, mapping and genetic diversity evaluation. Therefore, in the present study, we analysed the hemp genome and transcriptome sequences using several publicly available software programs with the following objectives: a) to retrieve and characterize microsatellite loci from the genome and transcriptome, b) to develop and characterize a collection of SSR markers for hemp in terms of frequency, information content and genomic distribution, and c) to assess their potential for diversity analysis in a reference set of hemp cultivars of different origin.</p></sec><sec id="s2"><title>2. Material and Methods</title><sec id="s2_1"><title>2.1. Identification of Microsatellites</title><p>Genomic and transcriptomic sequences of hemp in FASTA format were downloaded from the Cannabis Genome Browser database (http://genome.ccbr.utoronto.ca/). The Perl script MIcroSAtellite (MISA) (http://pgrc.ipk-gatersleben.de/misa/) was used to identify microsatellites in both the genome and the coding DNA sequences (CDS) of the transcriptome. Only motifs of 1 to 10 nucleotides were considered, and the minimum number of repeat units was set to 10 for mono-, 6 for di-, 5 for tri-, tetra-, penta- and hexa-, 3 for septa- and 2 for octa- to deca-nucleotides. 
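</p><p>The repeat thresholds listed above can be illustrated with a small regular-expression scan. This is a minimal sketch in Python and not the MISA implementation: the names MIN_REPEATS, is_primitive and find_perfect_ssrs are hypothetical, only perfect repeats are reported, and compound SSRs are not handled.</p><preformat>
```python
import re

# Minimum number of repeat units per motif length, as stated in the text:
# 10 for mono-, 6 for di-, 5 for tri- to hexa-, 3 for septa-, 2 for octa- to deca-.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5, 7: 3, 8: 2, 9: 2, 10: 2}


def is_primitive(motif: str) -> bool:
    """Return True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_perfect_ssrs(seq: str) -> list:
    """Return sorted (start, motif, repeat_count) tuples; start is 0-based."""
    seq = seq.upper()
    hits = []
    for size, min_rep in MIN_REPEATS.items():
        # One motif of the given size followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs such as "AA", which are already counted as mono-.
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)


# Toy sequence (not from the hemp assembly): a poly-A run and an AG repeat.
example = "GGC" + "A" * 12 + "CGT" + "AG" * 7 + "TTC"
print(find_perfect_ssrs(example))
```
</preformat><p>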
Compound SSRs were defined as ≥2 SSRs interrupted by ≤100 bases [<xref ref-type="bibr" rid="scirp.62020-ref6">6</xref>].</p><p>The categorization proposed by Weber (1990) [<xref ref-type="bibr" rid="scirp.62020-ref7">7</xref>] was used. Perfect repeats are formed from identical repetitive units; imperfect repeats are units with small mutations; and compound repeats are composed of sequences in which two or more repeats (perfect or imperfect) are arranged successively, with or without nucleotide bases between them.</p></sec><sec id="s2_2"><title>2.2. Statistical Analysis</title><p>SSR types were analysed for their abundance and density per Mb in both genome and coding sequences. Statistics not present in the MISA output files, such as the relative abundance and the relative density, were calculated using the custom programs statistics_misa.py and statgetlongest.py. The relative abundance and density were calculated using the following formulas:</p><disp-formula id="scirp.62020-formula307"><graphic  xlink:href="http://html.scirp.org/file/24-2602459x6.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.62020-formula308"><graphic  xlink:href="http://html.scirp.org/file/24-2602459x7.png"  xlink:type="simple"/></disp-formula></sec><sec id="s2_3"><title>2.3. Sequence Analysis for Primer Designing</title><p>Genomic and CDS SSRs identified by MISA were analysed to design primers flanking the repeats. Genomic microsatellites were selected using the following criteria: a repeat length between 30 and 200 bp and up- and downstream flanking regions of at least 200 bp each. For CDS microsatellites, the repeat length was set to between 20 and 200 bp, with up- and downstream flanking regions of at least 150 bp each.</p><p>To find microsatellites matching these criteria, the custom programs filterrepeatsmisa.py and getsequences.py were used. 
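</p><p>The selection criteria above amount to a simple length-and-flank filter. The following Python sketch illustrates that logic under stated assumptions; it is not the code of filterrepeatsmisa.py, and the SSRHit class, its field names and the example coordinates are hypothetical.</p><preformat>
```python
from dataclasses import dataclass


@dataclass
class SSRHit:
    """One SSR locus; field names are illustrative, not MISA column names."""
    seq_id: str
    start: int     # 1-based start of the repeat
    end: int       # 1-based inclusive end of the repeat
    seq_len: int   # length of the scaffold or transcript carrying the repeat


# Thresholds taken from the text above (all values in bp).
GENOMIC = {"min_len": 30, "max_len": 200, "min_flank": 200}
CDS = {"min_len": 20, "max_len": 200, "min_flank": 150}


def passes_filter(hit: SSRHit, min_len: int, max_len: int, min_flank: int) -> bool:
    """Keep loci whose repeat length and both flanking regions meet the thresholds."""
    repeat_len = hit.end - hit.start + 1
    upstream = hit.start - 1
    downstream = hit.seq_len - hit.end
    length_ok = repeat_len in range(min_len, max_len + 1)
    flanks_ok = min(upstream, downstream) >= min_flank
    return length_ok and flanks_ok


# Made-up loci for illustration only.
hits = [
    SSRHit("scaffold_a", 301, 348, 700),  # 48 bp repeat, flanks of 300 and 352 bp
    SSRHit("scaffold_b", 101, 148, 700),  # upstream flank of only 100 bp
]
selected = [h.seq_id for h in hits if passes_filter(h, **GENOMIC)]
print(selected)
```
</preformat><p>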
The custom programs used in this study (PySSRstat) were written in Python 3 and are available from http://www.nenno.it/PySSRstat.</p></sec><sec id="s2_4"><title>2.4. Designing SSR Based Primers and Validation of SSR Markers for Amplification</title><p>Primers flanking the microsatellite loci were designed with the Primer3 program (http://bioinfo.ut.ee/primer3-0.4.0/primer3/). The amplicon length was set to 100 - 350 bp. Oligonucleotide parameters for Primer3 were set to a length of 18 - 27 bp with an optimum of 20 bp, a GC content of 20% - 80% with an optimum of 50%, a melting temperature (Tm) of 57˚C - 63˚C with an optimum of 60˚C, and a maximum Tm difference between primers of 1˚C or 2˚C.</p><p>Ten cultivars of industrial non-drug hempseed, which are the most cultivated in Europe (Eletta Campana, Kc Dora, Codimono, Carmaleonte, Felina, Fibranova, Fedora, Futura, Carmagnola and Finola), were used to validate 15 randomly selected SSR markers. Ten SSR markers were chosen from the genomic DNA and five from the transcriptome (<xref ref-type="table" rid="table1">Table 1</xref>). Genomic DNA from all hemp cultivars was isolated from young leaves. Each PCR reaction was performed in a total volume of 15 &#181;l containing 10 ng of genomic DNA, 5 pmol each of forward and reverse primers, 0.1 mM dNTPs, 1 &#215; PCR buffer (10 mM Tris, pH 8.0, 50 mM KCl and 50 mM ammonium sulphate), 1.8 mM MgCl<sub>2</sub>, and 0.2 units of Taq DNA polymerase. The cycling conditions were initial denaturation at 94˚C for 4 min, followed by 36 cycles of denaturation at 94˚C for 1 min, primer annealing at 56˚C for 45 s, and primer extension at 72˚C for 45 s. A final extension at 72˚C for 7 min was performed, and products were stored at 4˚C until electrophoresis. The PCR products were resolved by electrophoresis in 2% agarose gels in 1 &#215; TAE buffer and visualized by ethidium bromide staining.</p></sec></sec><sec id="s3"><title>3. 
Results and Discussion</title><p>Analysis with the MISA program revealed a total of 407,491 SSRs (from mono-nucleotide to deca-nucleotide) in the hemp genome and 15,655 SSRs in the transcriptome (<xref ref-type="table" rid="table2">Table 2</xref>). The relative density and abundance of SSRs were 1527 bp/Mb and 518 SSR/Mb, respectively, for the genome and 1351 bp/Mb and 385 SSR/Mb, respectively, for the CDS (<xref ref-type="table" rid="table2">Table 2</xref>). The relative abundance of SSRs in the hemp genome is in line with that reported by Sonah et al. (2011) [<xref ref-type="bibr" rid="scirp.62020-ref6">6</xref>] for other dicot plant species such as Arabidopsis thaliana (416.6/Mb), Medicago truncatula (405.8/Mb) and Populus trichocarpa (667.9/Mb).</p><p>Using the MISA program, we obtained a detailed analysis of the frequency and distribution of all mono- to deca-nucleotide repeats in the hemp genomic DNA and CDS (<xref ref-type="table" rid="table3">Table 3</xref>). As in other plant genomes studied so far [<xref ref-type="bibr" rid="scirp.62020-ref6">6</xref>], the most frequent microsatellite type in the hemp genome was the mono-nucleotide repeat (55.4%), whereas the most abundant repeats in the CDS were tri-nucleotide repeats (30.4%), followed by mono-nucleotide repeats (27.3%) (<xref ref-type="table" rid="table3">Table 3</xref>). The accumulation of tri-nucleotide repeats in the hemp CDS is consistent with the results of other authors who analysed the CDS of several plant species [<xref ref-type="bibr" rid="scirp.62020-ref6">6</xref>] [<xref ref-type="bibr" rid="scirp.62020-ref8">8</xref>] [<xref ref-type="bibr" rid="scirp.62020-ref9">9</xref>].</p><p>Among the other repeats, the octa-nucleotide showed the highest frequency in both CDS and genomic DNA, 11.4% and 12.9%, respectively. 
Except for the nona-nucleotide repeats, which accounted for 8.3% and 5.8% in the CDS and genome, respectively, all the remaining repeat types (tetra-, penta-, hexa-, septa- and deca-nucleotide) were present at 2.5% or below (<xref ref-type="table" rid="table3">Table 3</xref>).</p><p>Among the mono-nucleotide repeats, the motif A/T was the most common in both the hemp genome and CDS (<xref ref-type="table" rid="table4">Table 4</xref>). AT/AT was the most frequent di-nucleotide motif in the genome (51.1%), whereas AG/CT was the most abundant in the CDS (64.6%). For tri- and tetra-nucleotides, the most frequent motifs were AAT/ATT (45.8%) and AAAT/ATTT (54.5%) in the genome and AAG/CTT (34.5%) and AAAG/CTTT (31.6%) in the CDS. In the hemp genome, the penta- to septa-nucleotide repeats were represented mainly by AATAC/GTATT (23.2%), AATGGG/CCCATT (13.0%) and AAAAAAT/ATTTTTT (15.6%), whereas in the transcriptome the penta-nucleotides AAGAG/CTCTT and AACTC/GAGTT reached 16.7% and the deca-nucleotide AAAGAGAGAG/CTCTCTCTTT reached 14.5%.</p><p>All the remaining motifs were below 10% (<xref ref-type="table" rid="table4">Table 4</xref>). As reported by Grover et al. (2007) [<xref ref-type="bibr" rid="scirp.62020-ref10">10</xref>], microsatellites in the hemp genome and transcriptome also show a decrease in abundance with increasing repeat length. 
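</p><p>The relative abundance and density reported in this section can be cross-checked from the totals given in Table 2. The Python sketch below assumes that relative abundance is the number of SSRs per Mb of examined sequence and that relative density is the total SSR length in bp per Mb; these definitions are inferred from the reported figures rather than read from the formula images, and the function names are hypothetical.</p><preformat>
```python
# Totals as reported in the text and in Table 2 of the article.
TABLE2 = {
    "genome": {"size_mb": 786.6, "n_ssr": 407_491, "ssr_bp": 1_200_858},
    "transcriptome": {"size_mb": 40.63, "n_ssr": 15_655, "ssr_bp": 54_896},
}


def relative_abundance(n_ssr: int, size_mb: float) -> float:
    """SSR count per Mb of examined sequence (assumed definition)."""
    return n_ssr / size_mb


def relative_density(ssr_bp: int, size_mb: float) -> float:
    """Base pairs of SSR sequence per Mb of examined sequence (assumed definition)."""
    return ssr_bp / size_mb


def summarize(entry: dict) -> tuple:
    """Return (abundance, density) rounded to whole units, as printed in Table 2."""
    return (
        round(relative_abundance(entry["n_ssr"], entry["size_mb"])),
        round(relative_density(entry["ssr_bp"], entry["size_mb"])),
    )


for name, entry in TABLE2.items():
    abundance, density = summarize(entry)
    print(f"{name}: {abundance} SSR/Mb, {density} bp/Mb")
```
</preformat><p>Under these assumed definitions, the sketch gives 518 SSR/Mb and 1527 bp/Mb for the genome and 385 SSR/Mb and 1351 bp/Mb for the transcriptome, matching the reported values.</p><p>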
In hemp genome the longest mono-nucleotide repeat was Poly A repeated 294 times followed by Poly T iterated 113 times, similarly in the hemp CDS the longest mono-nucleotide repeat was Poly A repeated 47 times followed by</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Identification number (N), primer sequence and melting temperature (Tm) of primer designed to PCR amplify hemp SSRs</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Identification N.</th><th align="center" valign="middle" >Primer Name</th><th align="center" valign="middle" >Sequence</th><th align="center" valign="middle" >Tm</th><th align="center" valign="middle" >Size</th><th align="center" valign="middle" >Repeat</th></tr></thead><tr><td align="center" valign="middle"  colspan="6"  >Genome</td></tr><tr><td align="center" valign="middle" >Scaffold5651</td><td align="center" valign="middle" >Scaf5651F</td><td align="center" valign="middle" >5’GTGGTGGCATCATTCAACAG3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" >229</td><td align="center" valign="middle" >(TGG) 10</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Scaf5651R</td><td align="center" valign="middle" >5’CAAAGCCAAAACTCCCAAAA3’</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Scaffold30053</td><td align="center" valign="middle" >Scaf30053F</td><td align="center" valign="middle" >5’TGTTGGGTTAAGGGCATTTT3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" >239</td><td align="center" valign="middle" >(TC) 24</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Scaf30053R</td><td align="center" valign="middle" >5’CCTTGTTCTAGCTGCCTTCG3’</td><td align="center" valign="middle" >60</td><td 
align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Scaffold12289</td><td align="center" valign="middle" >Scaf12289F</td><td align="center" valign="middle" >5’GGTGCATTGCAAGAGAACAA3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" >181</td><td align="center" valign="middle" >(GA) 18</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Scaf12289R</td><td align="center" valign="middle" >5’CCCTCAATCCACTCTGAAAAA3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Scaffold103666</td><td align="center" valign="middle" >Scaf103666F</td><td align="center" valign="middle" >5’AGCTTTCGAATTCGTCTGGA3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" >198</td><td align="center" valign="middle" >(GAT) 12</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Scaf103666R</td><td align="center" valign="middle" >5’TCACTCCCATCATTAACCAACTC3’</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Scaffold138656</td><td align="center" valign="middle" >Scaf138656F</td><td align="center" valign="middle" >5’TGGTCCACCAGGTCAAGATT3’</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" >209</td><td align="center" valign="middle" >(CAA) 14</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Scaf138656R</td><td align="center" valign="middle" >5’ATTCCCAACTCCTCCGTTCT3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Scaffold39138</td><td align="center" 
valign="middle" >Scaf39138F</td><td align="center" valign="middle" >5’CTGTCATCACAACCCACCAT3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" >210</td><td align="center" valign="middle" >(TGA) 14</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Scaf39138R</td><td align="center" valign="middle" >5’ACCGATTTCTCCATTGTTGC3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Scaffold42744</td><td align="center" valign="middle" >Scaf42744F</td><td align="center" valign="middle" >5’TTCATCTAGCTGATCTGGCAAA3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" >215</td><td align="center" valign="middle" >(TTG) 11</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Scaf42744R</td><td align="center" valign="middle" >5’CCAACCTCAACTCTCTTCTTCC3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Scaffold143652</td><td align="center" valign="middle" >Scaf143652F</td><td align="center" valign="middle" >5’TGTTGGCGATATTTCCACAGT3’</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" >169</td><td align="center" valign="middle" >(TTG) 12</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Scaf143652R</td><td align="center" valign="middle" >5’GGGAAAATCATGTCTGCTCAA3’</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Scaffold27728</td><td align="center" valign="middle" >Scaf27728F</td><td align="center" valign="middle" >5’GCCAAAAATCAAGCAATTCA3’</td><td align="center" valign="middle" >58</td><td align="center" 
valign="middle" >202</td><td align="center" valign="middle" >(GA) 20</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Scaf27728R</td><td align="center" valign="middle" >5’GCCCTTGTTTGAGTTTGGAA3’</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >Scaffold104423</td><td align="center" valign="middle" >Scaf104423F</td><td align="center" valign="middle" >5’TGGCCTAACACACTTGCGTA3’</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" >245</td><td align="center" valign="middle" >(TTCT) 12</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >Scaf104423R</td><td align="center" valign="middle" >5’CACCACTTAGAGTTTTGAGTGCTTT3’</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle"  colspan="6"  >Transcriptome</td></tr><tr><td align="center" valign="middle" >PK24944</td><td align="center" valign="middle" >PK24944F</td><td align="center" valign="middle" >5’GATCCGACTTCCTGATTTCAA3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" >238</td><td align="center" valign="middle" >(AAG) 11</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >PK24944R</td><td align="center" valign="middle" >5’ACGTTTGTGGAAGCAAGAGC3’</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >PK14152</td><td align="center" valign="middle" >PK14152F</td><td align="center" valign="middle" >5’CCTCCGATTTGATGCTCATT3’</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" >213</td><td align="center" valign="middle" >(AG) 25</td></tr><tr><td align="center" 
valign="middle" ></td><td align="center" valign="middle" >PK14152R</td><td align="center" valign="middle" >5’CAAACACTGGTTCAGCCTCA3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >PK18141</td><td align="center" valign="middle" >PK18141F</td><td align="center" valign="middle" >5’GAAGAACACGCCAAATCCTC3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" >245</td><td align="center" valign="middle" >(ACC) 11</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >PK18141R</td><td align="center" valign="middle" >5’TGAAACTCATCGTCGTCTCG3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >PK13506</td><td align="center" valign="middle" >PK13506F</td><td align="center" valign="middle" >5’ACATTTGTGGATGGGGGTAA3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" >212</td><td align="center" valign="middle" >(AG) 17</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >PK13506R</td><td align="center" valign="middle" >5’GAACCAGCTTTGGAAACCAT3’</td><td align="center" valign="middle" >59</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td></tr><tr><td align="center" valign="middle" >PK27965</td><td align="center" valign="middle" >PK27965F</td><td align="center" valign="middle" >5’CCCACCTCCTTCTCCTCTTC3’</td><td align="center" valign="middle" >60</td><td align="center" valign="middle" >237</td><td align="center" valign="middle" >(CTCCCA) 7</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" >PK27965R</td><td align="center" valign="middle" >5’TTGAGGCATGGTATTGGTGA3’</td><td align="center" valign="middle" >59</td><td align="center" 
valign="middle" ></td><td align="center" valign="middle" ></td></tr></tbody></table></table-wrap><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Number and distribution of SSRs in whole-genome and transcriptome of hemp</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Genome</th><th align="center" valign="middle" >Transcriptome</th></tr></thead><tr><td align="center" valign="middle" >Total size covered by examined sequences (Mb)</td><td align="center" valign="middle" >786.6</td><td align="center" valign="middle" >40.63</td></tr><tr><td align="center" valign="middle" >Total number of sequences examined</td><td align="center" valign="middle" >136,290</td><td align="center" valign="middle" >40,224</td></tr><tr><td align="center" valign="middle" >Total number of SSR identified</td><td align="center" valign="middle" >407,491</td><td align="center" valign="middle" >15,655</td></tr><tr><td align="center" valign="middle" >Total length of SSR (bp)</td><td align="center" valign="middle" >1,200,858</td><td align="center" valign="middle" >54,896</td></tr><tr><td align="center" valign="middle" >Total relative abundance (SSR/Mb)</td><td align="center" valign="middle" >518</td><td align="center" valign="middle" >385</td></tr><tr><td align="center" valign="middle" >Total relative density (bp/Mb)</td><td align="center" valign="middle" >1527</td><td align="center" valign="middle" >1351</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Distribution of SSR motifs in the whole-genome and transcriptome of hemp</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle" >Motif length</th><th align="center" valign="middle" >Number</th><th align="center" valign="middle" >Frequency %</th><th align="center" 
valign="middle" >Longest SSR motifs</th></tr></thead><tr><td align="center" valign="middle"  rowspan="10"  >Genome</td><td align="center" valign="middle" >Mono</td><td align="center" valign="middle" >225,883</td><td align="center" valign="middle" >55.4</td><td align="center" valign="middle" >(A)<sub>294</sub>, (T)<sub>113</sub>, (C)<sub>23</sub>, (G)<sub>24</sub></td></tr><tr><td align="center" valign="middle" >Di</td><td align="center" valign="middle" >60,704</td><td align="center" valign="middle" >14.9</td><td align="center" valign="middle" >(GT)<sub>8605</sub></td></tr><tr><td align="center" valign="middle" >Tri</td><td align="center" valign="middle" >28,125</td><td align="center" valign="middle" >6.9</td><td align="center" valign="middle" >(TTA)<sub>1401</sub></td></tr><tr><td align="center" valign="middle" >Tetra</td><td align="center" valign="middle" >2991</td><td align="center" valign="middle" >0.7</td><td align="center" valign="middle" >(AAGA)<sub>182</sub></td></tr><tr><td align="center" valign="middle" >Penta</td><td align="center" valign="middle" >526</td><td align="center" valign="middle" >0.1</td><td align="center" valign="middle" >(ATCCA)<sub>99</sub></td></tr><tr><td align="center" valign="middle" >Hexa</td><td align="center" valign="middle" >362</td><td align="center" valign="middle" >0.1</td><td align="center" valign="middle" >(ACACAT)<sub>387</sub></td></tr><tr><td align="center" valign="middle" >Septa</td><td align="center" valign="middle" >2634</td><td align="center" valign="middle" >0.6</td><td align="center" valign="middle" >(GAGCAAG)<sub>106</sub></td></tr><tr><td align="center" valign="middle" >Octa</td><td align="center" valign="middle" >52,586</td><td align="center" valign="middle" >12.9</td><td align="center" valign="middle" ><sup>*</sup>(AAAAAAAC)<sub>4</sub>, (ACCACCAC)<sub>4</sub>, (CCTCACTC)<sub>4</sub></td></tr><tr><td align="center" valign="middle" >Nona</td><td align="center" valign="middle" >23,500</td><td align="center" 
valign="middle" >5.8</td><td align="center" valign="middle" >(TCTCATCAG)<sub>39</sub></td></tr><tr><td align="center" valign="middle" >Deca</td><td align="center" valign="middle" >10,180</td><td align="center" valign="middle" >2.5</td><td align="center" valign="middle" >(AGTGCTAGGT)<sub>4</sub>, (CTCTCTCGAA)<sub>4</sub></td></tr><tr><td align="center" valign="middle"  rowspan="10"  >Transcriptome</td><td align="center" valign="middle" >Mono</td><td align="center" valign="middle" >4281</td><td align="center" valign="middle" >27.3</td><td align="center" valign="middle" >(A)<sub>47</sub>, (T)<sub>43</sub>, (G)<sub>10</sub></td></tr><tr><td align="center" valign="middle" >Di</td><td align="center" valign="middle" >2884</td><td align="center" valign="middle" >18.4</td><td align="center" valign="middle" >(AG)<sub>25</sub></td></tr><tr><td align="center" valign="middle" >Tri</td><td align="center" valign="middle" >4762</td><td align="center" valign="middle" >30.4</td><td align="center" valign="middle" >(AGA)<sub>16</sub>, (ATA)<sub>16</sub>, (ATG)<sub>16</sub>, (CAA)<sub>16</sub>, (TCT)<sub>16</sub></td></tr><tr><td align="center" valign="middle" >Tetra</td><td align="center" valign="middle" >187</td><td align="center" valign="middle" >1.2</td><td align="center" valign="middle" >(AGAA)<sub>7</sub>, (AGGA)<sub>7</sub>, (TAGA)<sub>7</sub>, (TTCT)<sub>7</sub>, (TTTC)<sub>7</sub></td></tr><tr><td align="center" valign="middle" >Penta</td><td align="center" valign="middle" >36</td><td align="center" valign="middle" >0.2</td><td align="center" valign="middle" ><sup>**</sup>(AACTC)<sub>6</sub>, (AGAAG)<sub>6</sub>, (CTCAA)<sub>6</sub></td></tr><tr><td align="center" valign="middle" >Hexa</td><td align="center" valign="middle" >47</td><td align="center" valign="middle" >0.3</td><td align="center" valign="middle" >(GATGGT)<sub>8</sub></td></tr><tr><td align="center" valign="middle" >Septa</td><td align="center" valign="middle" >114</td><td align="center" valign="middle" 
>0.7</td><td align="center" valign="middle" >(TCCTTGC)<sub>7</sub></td></tr><tr><td align="center" valign="middle" >Octa</td><td align="center" valign="middle" >1792</td><td align="center" valign="middle" >11.4</td><td align="center" valign="middle" >(CCTCACTC)<sub>4</sub>, (TTTCTTTT)<sub>4</sub></td></tr><tr><td align="center" valign="middle" >Nona</td><td align="center" valign="middle" >1303</td><td align="center" valign="middle" >8.3</td><td align="center" valign="middle" ><sup>***</sup>(AATGATGAT)<sub>3</sub>, (ACACCAAGA)<sub>3</sub>, (CAACCAAAC)<sub>3</sub></td></tr><tr><td align="center" valign="middle" >Deca</td><td align="center" valign="middle" >249</td><td align="center" valign="middle" >1.6</td><td align="center" valign="middle" ><sup>****</sup>(AAAAAAAAAC)<sub>2</sub>, (AAAAAAAAAG)<sub>2</sub>, (AAAAAAGAAA)<sub>2</sub></td></tr></tbody></table></table-wrap><p><sup>*</sup>Three out of twenty-three SSRs; <sup>**</sup>Three out of nine; <sup>***</sup>Three out of ten; <sup>****</sup>Three out of one hundred eighty-three.</p><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title> The most abundant repeat types from mono- to deca-nucleotide. 
Freq = Frequency</title></caption><table><tbody><thead><tr><th align="center" valign="middle" ></th><th align="center" valign="middle"  colspan="2"  >Genome</th><th align="center" valign="middle"  colspan="2"  >Transcriptome</th></tr></thead><tr><td align="center" valign="middle" >Type</td><td align="center" valign="middle" >Repeat</td><td align="center" valign="middle" >Freq.</td><td align="center" valign="middle" >Repeat</td><td align="center" valign="middle" >Freq.</td></tr><tr><td align="center" valign="middle" >Mono</td><td align="center" valign="middle" >A/T</td><td align="center" valign="middle" >99.5%</td><td align="center" valign="middle" >A/T</td><td align="center" valign="middle" >100.0%</td></tr><tr><td align="center" valign="middle" >Di</td><td align="center" valign="middle" >AT/AT</td><td align="center" valign="middle" >51.1%</td><td align="center" valign="middle" >AG/CT</td><td align="center" valign="middle" >64.6%</td></tr><tr><td align="center" valign="middle" >Tri</td><td align="center" valign="middle" >AAT/ATT</td><td align="center" valign="middle" >45.8%</td><td align="center" valign="middle" >AAG/CTT</td><td align="center" valign="middle" >34.5%</td></tr><tr><td align="center" valign="middle" >Tetra</td><td align="center" valign="middle" >AAAT/ATTT</td><td align="center" valign="middle" >54.5%</td><td align="center" valign="middle" >AAAG/CTTT</td><td align="center" valign="middle" >31.6%</td></tr><tr><td align="center" valign="middle" >Penta</td><td align="center" valign="middle" >AATAC/GTATT</td><td align="center" valign="middle" >23.2%</td><td align="center" valign="middle" >AAGAG/CTCTT, AACTC/GAGTT</td><td align="center" valign="middle" >16.7%</td></tr><tr><td align="center" valign="middle" >Hexa</td><td align="center" valign="middle" >AATGGG/CCCATT</td><td align="center" valign="middle" >13.0%</td><td align="center" valign="middle" >ACCATC/GATGGT, AAGATG/CATCTT</td><td align="center" valign="middle" >8.5%</td></tr><tr><td align="center" 
valign="middle" >Septa</td><td align="center" valign="middle" >AAAAAAT/ATTTTTT</td><td align="center" valign="middle" >15.6%</td><td align="center" valign="middle" >AAAAAAG/CTTTTTT, AAGAGAG/CTCTCTT</td><td align="center" valign="middle" >7.9%</td></tr><tr><td align="center" valign="middle" >Octa</td><td align="center" valign="middle" >AAAAAAAT/ATTTTTTT</td><td align="center" valign="middle" >6.6%</td><td align="center" valign="middle" >AAAAAAAG/CTTTTTTT</td><td align="center" valign="middle" >6.1%</td></tr><tr><td align="center" valign="middle" >Nona</td><td align="center" valign="middle" >AAAAAAAAT/ATTTTTTTT</td><td align="center" valign="middle" >6.5%</td><td align="center" valign="middle" >AAGAGAGAG/CTCTCTCTT</td><td align="center" valign="middle" >2.6%</td></tr><tr><td align="center" valign="middle" >Deca</td><td align="center" valign="middle" >AAAAAAAAAT/ATTTTTTTTT</td><td align="center" valign="middle" >6.4%</td><td align="center" valign="middle" >AAAGAGAGAG/CTCTCTCTTT</td><td align="center" valign="middle" >14.5%</td></tr></tbody></table></table-wrap><p>Poly T iterated 43. The longest di-nucleotide repeat in the hemp genome was GT/AC, repeated 8605 times (scaffold81868), whereas in the hemp CDS it was AG/CT, repeated 25 times (PK14152). Tri-nucleotide repeats were the most abundant SSRs within the hemp CDS, and of the 64 triplet repeat types, five (ATG, PK16635; ATA, PK09074; TCT, PK14855; CAA, PK15453; AGA, PK13649) consisted of 16 repeats, while in the genome the longest tri-nucleotide, TTA, was repeated 1401 times (scaffold120259) (<xref ref-type="table" rid="table3">Table 3</xref>).</p><p>Analysis of the 407,491 (genomic SSR) and 15,655 (CDS SSR) repeat motifs using the custom programs filterrepeatsmisa.py and getsequences.py revealed 3353 (0.82%) and 507 (3.24%) repeat motifs, respectively, with up- and downstream flanking regions of at least 200 bp for the genomic SSRs and 150 bp for the CDS SSRs (http://www.hempssr.altervista.org/). 
Screening all SSRs generated by MISA with the above programs was necessary to capture individual microsatellites along with enough flanking sequence for the design of forward and reverse primers for PCR amplification. However, less stringent parameters would probably increase the number of SSRs retained.</p><p>Among all sequences reported (http://www.hempssr.altervista.org/), fifteen sequences (from genomic and CDS DNA) were randomly chosen to design primers flanking di-, tri-, tetra-, and hexa-nucleotide repeats (see <xref ref-type="table" rid="table1">Table 1</xref>) and validated by PCR. After PCR amplification, all SSRs tested showed a prominent PCR product on the agarose gel (<xref ref-type="fig" rid="fig1">Figure 1</xref>(a)). Furthermore, to assess the potential of these markers for genetic variability studies, four of them were tested on ten hemp cultivars. The PCR products after amplification are shown in Figures 1(b)-(e). Although we tested only four SSRs, the CDS-SSRs appeared more polymorphic than the genomic-SSRs (<xref ref-type="fig" rid="fig1">Figure 1</xref>(d) and <xref ref-type="fig" rid="fig1">Figure 1</xref>(e)).</p><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title> (a) PCR-amplified product of 15 SSR markers tested on the C. sativa cv. Futura. PCR-amplified product of Scaf5651 (b); Scaf30053 (c); PK24944 (d) and PK14152 (e) tested on ten hemp cultivars: 1 Eletta Campana; 2 Kc Dora; 3 Codimono; 4 Carmaleonte; 5 Felina; 6 Fibranova; 7 Fedora; 8 Futura; 9 Carmagnola; 10 Finola. M = molecular size marker, in base pairs (bp)</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/24-2602459x8.png"/></fig></sec><sec id="s4"><title>4. Conclusion</title><p>Traditionally, SSR loci have been isolated from partially digested genomic DNA libraries of small-size inserts. 
Conversely, as shown in this study, whole-genome sequence provides a valuable resource for the development of SSR markers, saving both cost and time once a sufficient amount of sequence is available. Indeed, in this study we reported the analysis of the whole hemp genome sequence and the identification of many SSRs that should serve as an important resource for genetic studies.</p></sec><sec id="s5"><title>Acknowledgements</title><p>This study was partially supported by Regione Lombardia/CNR, Research Project FilAgro “Strategie innovative e sostenibili per la filiera agroalimentare”, Accordo Quadro N.18093/RCC del 5/8/2013. We would like to thank Dr. Mario G. Nenno for writing the custom programs (PySSRstat).</p></sec><sec id="s6"><title>Cite this paper</title><p>Incoronata Galasso, Elena Ponzoni (2015) In Silico Exploration of Cannabis sativa L. Genome for Simple Sequence Repeats (SSRs). American Journal of Plant Sciences, 06, 3244-3250. doi: 10.4236/ajps.2015.619315</p></sec></body><back><ref-list><title>References</title><ref id="scirp.62020-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Zane, L., Bargelloni, L. and Patarnello, T. (2002) Strategies for Microsatellite Isolation: A Review. Molecular Ecology, 11, 1-16. http://dx.doi.org/10.1046/j.0962-1083.2001.01418.x</mixed-citation></ref><ref id="scirp.62020-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Tautz, D. (1989) Hypervariability of Simple Sequences as a General Source for Polymorphic DNA Markers. Nucleic Acids Research, 17, 6463-6471. http://dx.doi.org/10.1093/nar/17.16.6463</mixed-citation></ref><ref id="scirp.62020-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Morgante, M. and Olivieri, A.M. (1993) PCR-Amplified Microsatellites as Markers in Plant Genetics. The Plant Journal, 3, 175-182. 
http://dx.doi.org/10.1111/j.1365-313X.1993.tb00020.x</mixed-citation></ref><ref id="scirp.62020-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Powell, W., Machray, G. and Provan, J. (1996) Polymorphism Revealed by Simple Sequence Repeats. Trends in Plant Science, 1, 215-222. http://dx.doi.org/10.1016/S1360-1385(96)86898-0</mixed-citation></ref><ref id="scirp.62020-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">van Bakel, H., Stout, J.M., Cote, A.G., Tallon, C.M., Sharpe, A.G., Hughes, T.R. and Page, J.E. (2011) The Draft Genome and Transcriptome of Cannabis sativa. Genome Biology, 12, R102. http://dx.doi.org/10.1186/gb-2011-12-10-r102</mixed-citation></ref><ref id="scirp.62020-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Sonah, H., Deshmukh, R.K., Sharma, A., Singh, V.P. and Gupta, D.K. (2011) Genome-Wide Distribution and Organization of Microsatellites in Plants: An Insight into Marker Development in Brachypodium. PLoS ONE, 6, e21298. http://dx.doi.org/10.1371/journal.pone.0021298</mixed-citation></ref><ref id="scirp.62020-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Weber, J.L. (1990) Informativeness of Human (dC-dA)n (dG-dT)n Polymorphisms. Genomics, 7, 524-530. http://dx.doi.org/10.1016/0888-7543(90)90195-Z</mixed-citation></ref><ref id="scirp.62020-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Morgante, M., Hanafey, M. and Powell, W. (2002) Microsatellites Are Preferentially Associated with Nonrepetitive DNA in Plant Genomes. Nature Genetics, 30, 194-200. http://dx.doi.org/10.1038/ng822</mixed-citation></ref><ref id="scirp.62020-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Li, Y.C., Korol, A.B., Fahima, T. and Nevo, E. (2004) Microsatellites within Genes: Structure, Function, and Evolution. Molecular Biology and Evolution, 21, 991-1007. 
http://dx.doi.org/10.1093/molbev/msh073</mixed-citation></ref><ref id="scirp.62020-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Grover, A., Aishwarya, V. and Sharma, P.C. (2007) Biased Distribution of Microsatellite Motifs in the Rice Genome. Molecular Genetics and Genomics, 277, 469-480. http://dx.doi.org/10.1007/s00438-006-0204-y</mixed-citation></ref></ref-list></back></article>