<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">APM</journal-id><journal-title-group><journal-title>Advances in Pure Mathematics</journal-title></journal-title-group><issn pub-type="epub">2160-0368</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/apm.2019.93010</article-id><article-id pub-id-type="publisher-id">APM-91519</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Deconvolution of the Error Associated with Random Sampling
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Peter</surname><given-names>L. Irwin</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Yiping</surname><given-names>He</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Chin-Yi</surname><given-names>Chen</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Molecular Characterization of Foodborne Pathogens, United States Department of Agriculture, Wyndmoor, PA, USA</addr-line></aff><pub-date pub-type="epub"><day>29</day><month>03</month><year>2019</year></pub-date><volume>09</volume><issue>03</issue><fpage>205</fpage><lpage>227</lpage><history><date date-type="received"><day>11,</day>	<month>February</month>	<year>2019</year></date><date date-type="rev-recd"><day>26,</day>	<month>March</month>	<year>2019</year>	</date><date date-type="accepted"><day>29,</day>	<month>March</month>	<year>2019</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p><html>
 
  In this work empirical models describing sampling error (
  &#916;) are reported based upon analytical findings elicited from 3 common probability density functions (
  PDF): the Gaussian, representing any real-valued, randomly changing variable 
  x of mean 
  &#956; and standard deviation 
  &#963;; the Poisson, representing counting data: 
  i.e., any integral-valued entity’s count of 
  x (cells, clumps of cells or colony forming units, molecules, mutations, etc.) per tested volume, area, length of time, etc. with population mean of 
  &#956; and 
  <img src="Edit_c80031ba-63b7-456c-b76d-20e546e68215.bmp" alt="" />; binomial data representing the number of successful occurrences of something (
  x<sup>+</sup>) out of 
  n observations or sub-samplings. These data were generated in such a way as to simulate what should be observed in practice but avoid other forms of experimental error. Based upon analyses of 10
  <sup>4</sup> 
  &#916; measurements, we show that the average 
  &#916; (
  <img src="Edit_d28b9c77-c9bd-4467-97f1-83270b4db043.bmp" alt="" />) is proportional to 
  <img src="Edit_47e340e5-fc42-49fe-9d79-09affd0e09b0.bmp" alt="" /> (
  &#963;<sub>x</sub>&#8226;&#956;<sup>-1</sup>; Gaussian) or 
  <img src="Edit_c2251e07-9677-4976-8cad-8b93081cdf65.bmp" alt="" /> (Poisson &amp; binomial). The average proportionality constants associated with these disparate populations were also nearly identical (
  <img src="Edit_3c18ecf1-8065-44b0-bd67-50fbeb0e3fa1.bmp" alt="" />; &#177;
  s). However, since 
  <img src="Edit_221ce9fd-26cc-4a34-a93f-0efb58572a88.bmp" alt="" /> for any Poisson process, 
  <img src="Edit_46105c02-681b-427a-a0d9-857db2186682.bmp" alt="" />. In a similar vein, we have empirically demonstrated that binomial-associated 
  <img src="Edit_e7f6d5bb-8de0-44f2-ba8e-b1419d035cbe.bmp" alt="" /> were also proportional to &#963;<sub>x</sub>&#8226;&#956;<sup>-1</sup>. Furthermore, we established that, when all 
  <img src="Edit_45bbf7f9-63de-4c9c-ba3c-2ab2072a2950.bmp" alt="" /> were plotted against either 
  <img src="Edit_d8425b47-e64a-41ed-ac5b-75e3b4ac4978.bmp" alt="" /> or &#963;<sub>x</sub>&#8226;&#956;<sup>-1</sup>, there was only one relationship with a slope = 
  A (0.767 &#177; 0.0990) and a near-zero intercept. This latter finding also argues that all 
  <img src="Edit_80eaf25f-16c2-434b-8e0b-093c70154816.bmp" alt="" />, regardless of parent 
  PDF, are proportional to &#963;<sub>x</sub>&#8226;&#956;<sup>-1</sup> which is the coefficient of variation for a population of sample means (
  <img src="Edit_a24f0001-4bc6-4516-a6a2-a81a8b19322e.bmp" alt="" />). Lastly, we establish that the proportionality constant 
  A is equivalent to the coefficient of variation associated with 
  &#916; (
  <img src="Edit_40db5300-47cc-46d3-9742-0a0adcbee92f.bmp" alt="" />) measurement and, therefore, 
  <img src="Edit_370a84ef-a79e-454b-a829-575f14a76e3a.bmp" alt="" />. These results are noteworthy inasmuch as they provide a straightforward empirical link between stochastic sampling error and the aforementioned 
  C<sub>v</sub>s. Finally, we demonstrate that all attendant empirical measures of 
  &#916; are reasonably small (e.g., 
  <img src="Edit_ff3b33e1-4f7c-489f-9fb7-e92b15d696c6.bmp" alt="" />) when an environmental microbiome was well-sampled: 
  n = 16 - 18 observations with &#956; &#8764; 3 isolates per observation. These colony counting results were supported by the fact that the two major isolates’ relative abundance was reproducible in the four most probable composition observations from one common population.
 
</html></p></abstract><kwd-group><kwd>Stochastic Sampling Error</kwd><kwd>Modeling</kwd><kwd>Most Probable Composition</kwd><kwd>Quantitative Metagenomics</kwd><kwd>Food-Borne Bacteria</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>There are various analytical procedures for enumerating organisms in environmental samples which diverge in their experimental approach yet are mathematically inter-related. Thus, if V represents the sample volume and V<sub>e</sub> the volume occupied by a test entity of interest (e.g., colony forming units or CFUs), the probability that one particular V<sub>e</sub> will not contain this entity at concentration δ [<xref ref-type="bibr" rid="scirp.91519-ref1">1</xref>] is</p><p><disp-formula><tex-math>\left( \frac{V/V_e - V \cdot \delta}{V/V_e} \right) = \left( 1 - V_e \cdot \delta \right);</tex-math></disp-formula></p><p>i.e., V/V<sub>e</sub> is the maximum possible number of entities in V and V·δ is approximately the actual number of objects present.</p><p>Assuming that many V<sub>e</sub> aliquots have been combined to generate V, the probability that no organism will be contained in V is [<xref ref-type="bibr" rid="scirp.91519-ref1">1</xref>]</p><p><disp-formula><tex-math>P^{-} = \left[ 1 - V_e \cdot \delta \right]^{V/V_e}</tex-math></disp-formula></p><p>therefore</p><p><disp-formula><tex-math>\ln[P^{-}] = \frac{V}{V_e} \cdot \ln\left[ 1 - V_e \cdot \delta \right].</tex-math></disp-formula></p><p>Since</p><p><disp-formula><tex-math>\ln[1 - \psi] \sim -\psi - \frac{\psi^{2}}{2} - \frac{\psi^{3}}{3} - \frac{\psi^{4}}{4} - \cdots</tex-math></disp-formula></p><p>then, if ψ = V<sub>e</sub>·δ,</p><p><disp-formula><tex-math>\ln[P^{-}] \sim \frac{V}{V_e} \left( -V_e \cdot \delta - \frac{V_e^{2} \cdot \delta^{2}}{2} - \frac{V_e^{3} \cdot \delta^{3}}{3} - \frac{V_e^{4} \cdot \delta^{4}}{4} - \cdots \right) \sim -V \cdot \delta \left( 1 + \frac{V_e \cdot \delta}{2} + \frac{V_e^{2} \cdot \delta^{2}}{3} + \frac{V_e^{3} \cdot \delta^{3}}{4} + \cdots \right).</tex-math></disp-formula></p><p>For V<sub>e</sub> → 0 (e.g., E. coli [<xref ref-type="bibr" rid="scirp.91519-ref2">2</xref>] has a V<sub>e</sub> ~ 0.6 μm<sup>3</sup> ~ 6 &#215; 10<sup>−13</sup> mL),</p><p><disp-formula><tex-math>\ln[P^{-}] \sim -V \cdot \delta</tex-math></disp-formula></p><p><disp-formula><tex-math>P^{-} = \exp[-V \cdot \delta] = \exp[-\mu]</tex-math></disp-formula></p><p>therefore</p><p><disp-formula><tex-math>P^{+} = 1 - P^{-} = 1 - \exp[-V \cdot \delta] = 1 - \exp[-\mu].</tex-math></disp-formula> 
(1)</p><p>In certain circumstances it is only possible to determine an organism’s δ by diluting the sample to such an extent that only a fraction of the n “technical” replicates tested are positive ( x + ) for the presence of the entity, or microbe, in question [<xref ref-type="bibr" rid="scirp.91519-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref4">4</xref>]. This technique is referred to as the “dilution method” [<xref ref-type="bibr" rid="scirp.91519-ref1">1</xref>] since it involves diluting a test sample’s content to extinction ( δ → 0 ). This enumeration protocol is also known as the most probable number (MPN) method and entails sampling from a liquid source, making serial dilutions from this, distributing an aliquot of each of these dilutions into separate receptacles, incubating these under suitable growth conditions, and observing if any growth has occurred based upon some organism-specific detection method [<xref ref-type="bibr" rid="scirp.91519-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref6">6</xref>]. The MPN enumeration procedure is particularly useful when sampling from environmental sources, such as foods, since damaged cells frequently recover in liquid media [<xref ref-type="bibr" rid="scirp.91519-ref7">7</xref>].</p><p>For example, were one to obtain a food sample containing ~14 CFU of a particular organism per 50 g, the cells would typically be washed from the food matrix, concentrated to a few mL (e.g., via centrifugation), and brought up to some appropriate volume (say 40 mL = V<sub>sample</sub>) with media [<xref ref-type="bibr" rid="scirp.91519-ref5">5</xref>]. From this, eight 4 mL (V) samples could be randomly selected and distributed into 8 separate receptacles (n = 8 with a dilution factor of 1; i.e., undiluted). Of the remaining 8 mL, 4 could be further diluted with 36 mL (40 mL total) liquid media, mixed and distributed into another set of 8 containers. 
This set of dilutions has a dilution factor of 0.1 relative to the original. With the remaining 8 mL from the 0.1 dilution, 4 mL could be diluted again with 36 mL media, mixed and distributed into yet another eight 4 mL replicates (dilution factor = 0.01). After incubation the most likely number (Equation (2), below) of positive occurrences (e.g., presence of a specific gene [<xref ref-type="bibr" rid="scirp.91519-ref5">5</xref>]) observed would be x<sup>+</sup> = 6, 1, and 0 (out of n = 8 observations per dilution) for dilution factors of 1, 0.1, and 0.01, respectively, and the calculated MPN (&#177;s) per 50 g sample would be 13.8 &#177; 5.56. Note the relatively large error term. For a 4-fold proportional (200 g, 160 mL V<sub>sample</sub>) experiment with n = 32, the calculated MPN is 13.8 &#177; 2.78 per 50 g sample.</p><p>For MPN-based organism detection and subsequent enumeration, the number of positive occurrences of growth in any j<sup>th</sup> experiment out of n observations, <inline-formula><tex-math>x_j^{+} = \sum_{i=1}^{n} \theta_{ij}</tex-math></inline-formula> (θ = either 1 [presence] or 0 [absence]), can be estimated as</p><p><disp-formula><tex-math>x^{+} \sim n \cdot P^{+} = n \left( 1 - \exp[-V \cdot \delta] \right)</tex-math></disp-formula> (2)</p><p>whereupon x<sup>+</sup> is integral (=ROUND(n·P<sup>+</sup>, 0) in Excel). The probability of observing x<sup>+</sup> successes out of n Bernoulli trials [<xref ref-type="bibr" rid="scirp.91519-ref8">8</xref>] each of volume V from a population of δ entities per V is</p><p><disp-formula><tex-math>P_b = \frac{n!}{x^{+}! \, (n - x^{+})!} \left( P^{-} \right)^{n - x^{+}} \left( P^{+} \right)^{x^{+}}</tex-math></disp-formula></p><p>which is also known as the binomial PDF. Since n·P<sup>+</sup> = the population average (real) [<xref ref-type="bibr" rid="scirp.91519-ref9">9</xref>] number of positive responses out of n tests (μ<sup>+</sup>), the above can also be written as</p><p><disp-formula><tex-math>P_b = \frac{n!}{x^{+}! \, (n - x^{+})!} \left( 1 - \frac{\mu^{+}}{n} \right)^{n - x^{+}} \left( \frac{\mu^{+}}{n} \right)^{x^{+}}.</tex-math></disp-formula> 
(3)</p><p>The multiple dilution MPN calculation itself is determined by finding the value of δ at the maximum in the product of the P<sub>b</sub>s from all ℓ<sup>th</sup> dilutions (∏<sub>ℓ</sub> P<sub>b,ℓ</sub>) and is easily achieved by adding the scaled sum of all dilutions’ ∂<sub>δ</sub>P<sub>b</sub> &#247; P<sub>b</sub> values to an initial guess for δ, i.e.,</p><p><disp-formula><tex-math>\delta_{m+1} = \delta_m + \lambda_m \times \sum_{\ell} \left\{ \partial_{\delta} P_{b,\ell} \div P_{b,\ell} \right\}_m = \delta_m + \lambda_m \times \sum_{\ell} \left\{ \left( x_{\ell}^{+} - n + \frac{x_{\ell}^{+}}{\exp[V \cdot \delta_m \cdot 0.1^{\ell}] - 1} \right) \cdot V \cdot 0.1^{\ell} \right\}</tex-math></disp-formula></p><p>for any particular ℓ<sup>th</sup> one-to-ten dilution and m iterations (λ is a scaling function which changes monotonically with m), then solving for the MPN recursively [<xref ref-type="bibr" rid="scirp.91519-ref1">1</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref10">10</xref>] which minimizes the magnitude of the summation.</p><p>At the limit n → ∞, Equation (3) simplifies to what is known as the Poisson PDF</p><p><disp-formula><tex-math>P_P = \frac{\mu^{x} \exp[-\mu]}{x!}.</tex-math></disp-formula> (4)</p><p>Under these circumstances, x is the observed and μ is the population average number of counts in/on the tested volume, surface, chosen time period, etc. This PDF is applicable to all analytical systems involving, essentially, the counting of objects. However this PDF is applied, the most conspicuous aspect [<xref ref-type="bibr" rid="scirp.91519-ref11">11</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref12">12</xref>] of any Poisson process is that the variance (σ<sup>2</sup> or second central moment)</p><p><disp-formula><tex-math>\sigma^{2} = \sum_{x=0}^{\infty} (x - \mu)^{2} P_P = \mu</tex-math></disp-formula></p><p>equals the population mean (μ or first moment)</p><p><disp-formula><tex-math>\mu = \sum_{x=0}^{\infty} x \cdot P_P.</tex-math></disp-formula></p><p>The last probability density function utilized in this stochastic sampling exercise is also related to P<sub>b</sub>, Equation (3). 
This is the Gaussian PDF which we use to quantitatively examine the effects of n and σ (fixed μ) on the variability of sample means (x̄) which have been created by randomly sampling from a population of real-valued variables (x; e.g., doubling time [<xref ref-type="bibr" rid="scirp.91519-ref13">13</xref>]) which are normally distributed as</p><p><disp-formula><tex-math>P_G = \frac{\mathrm{Area}}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2} \right];</tex-math></disp-formula> (5)</p><p>in this relationship the Area term (<inline-formula><tex-math>\sim \Delta x \cdot \sum_{k=1}^{K} f_k</tex-math></inline-formula>; for large K) is the approximate area under the fitting function f (frequently taken to be 1 since Δx is often = 1 and ∑f is always ~1). There are several derivations of P<sub>G</sub> but none are as persuasive as the fact that this PDF is simple and has been experimentally shown to be the most likely probability distribution associated with most experimental observations [<xref ref-type="bibr" rid="scirp.91519-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref12">12</xref>].</p><p>The original purpose of our sampling-related investigations [<xref ref-type="bibr" rid="scirp.91519-ref7">7</xref>] was to estimate a nominal value for n needed to achieve accurate most probable foodborne bacterial isolate enumeration, combined with 16S rDNA-based identification, for quantitative metagenomic purposes. The relationships were developed by examining the results of 6 &#215; 6 colony counting (Poisson PDF) of highly diluted bacteria [<xref ref-type="bibr" rid="scirp.91519-ref14">14</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref15">15</xref>] as a function of n and μ as well as by generating counts (x) derived from P<sub>P</sub> to simulate what occurred in the lab [<xref ref-type="bibr" rid="scirp.91519-ref15">15</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref16">16</xref>] but which avoided other forms of experimentally based error [<xref ref-type="bibr" rid="scirp.91519-ref5">5</xref>]. 
We were able to establish that n<sub>min</sub> = n<sub>μ→1</sub> &#247; ∛μ where n<sub>μ→1</sub> is the number of observations necessary to accurately enumerate a population average of 1 count per volume tested. Based mainly on colony counting experience we estimate that n<sub>μ→1</sub> is somewhere in the range n ~ 20 - 30 observations.</p><p>Herein we model stochastic sampling errors associated with all the aforementioned PDFs and empirically demonstrate that the resultant mathematical models are, in part, a consequence of the “central limit theorem” [<xref ref-type="bibr" rid="scirp.91519-ref17">17</xref>] (CLT). In general, the CLT states that a distribution of sample means (x̄), regardless of parent PDF, approaches a normal distribution analytically equivalent to P<sub>G</sub>, Equation (5), with x = x̄, μ = μ<sub>x̄</sub>, and with the σ<sup>2</sup> term = σ<sub>x̄</sub><sup>2</sup> (= σ<sup>2</sup> &#247; n) as the number of separate n-samplings increases. We have also elaborated on empirical findings developed previously [<xref ref-type="bibr" rid="scirp.91519-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref15">15</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref16">16</xref>] for predicting errors associated with the random sampling of microorganisms and have compared the internal variations associated with the three different sampling error data types derived from the Gaussian, binomial (MPN), and Poisson relationships. Thus, new results have been created using the aforementioned probability distributions, Equations (2), (4), and (5), and have been highly replicated since each “experiment”, comprising n (= 3, 6, 9, 12, or 24) observations, was repeated 100 times.</p></sec><sec id="s2"><title>2. Materials and Methods</title><sec id="s2_1"><title>2.1. 
Poisson-Based Data: Equation (4), <xref ref-type="fig" rid="fig1">Figure 1</xref></title><p>All counting data were created by multiplying Equation (4) by 360 in order to produce a large number of integral-valued repeats (=ROUND(360·P<sub>P</sub>, 0)) for any particular count x: e.g., for μ = 1 particle per test volume, area, length of time, etc., there would be, most probably, 132 repeats of x = 0, 132 repeats of x = 1, 66 repeats of x = 2, 22 repeats of x = 3, 6 repeats of x = 4 and 1 repeat of x = 5 entities per test. From this pool of 360 counts for each μ, n values of x were randomly selected based upon random number tables created with the Mathematica expression</p><p>Table[i = Random[Integer, {1, 360}], {i, n}] (6)</p><p>which generates n random integers between 1 and 360. Thus, 100 such random number sets were utilized for each of the twenty-five n (= 3, 6, 9, 12, 24) &#215; μ (= 1, 2, 4, 8, 16) combinations. Briefly, each procedure involved arranging the aforementioned 360 x values (one set for each μ) in one column of a spreadsheet followed by filling in n adjacent columns with formulae which refer to the calculated x values but where each row’s reference number was taken from the Mathematica-generated random number, Equation (6), next in sequence. MPN- and Gaussian-based data arrays were treated in an identical fashion. The formula (P-I: normalized deviations of s<sub>j</sub> from σ = √μ) for calculating our empirical measure of Poisson stochastic sampling error (Δ) was</p><p><disp-formula><tex-math>\Delta_j = \frac{\left| \sqrt{\mu} - s_j \right|}{\mu}</tex-math></disp-formula> (7)</p><p>whereupon the s<sub>j</sub> term is the experimental standard deviation (<inline-formula><tex-math>\sqrt{(n-1)^{-1} \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^{2}}</tex-math></inline-formula> or “=STDEV.S(x<sub>ij</sub>-array)” in Excel) for each j<sup>th</sup> (j = 1, 2, ⋯, J; J = 100) experiment and i<sup>th</sup> (i = 1, 2, ⋯, n) x. 
The average across 100 experiments, regardless of formulation, was symbolized as Δ̄ (<inline-formula><tex-math>= J^{-1} \cdot \sum_{j=1}^{J} \Delta_j</tex-math></inline-formula> or “=AVERAGE(Δ<sub>j</sub>-array)”). A second form for the Poisson-based measure of Δ was also calculated (P-II: normalized deviations of x̄<sub>j</sub> from known μ) from these same data</p><p><disp-formula><tex-math>\Delta_j = \frac{\left| \mu - \bar{x}_j \right|}{\mu}.</tex-math></disp-formula> (8)</p><p>Here x̄<sub>j</sub> is the observed arithmetic mean for each j<sup>th</sup> counting experiment.</p></sec><sec id="s2_2"><title>2.2. MPN Experiments: Equation (1), <xref ref-type="fig" rid="fig2">Figure 2</xref></title><p>All MPN data were created by multiplying Equation (1) by 360 to produce the number (“=ROUND(360·P<sup>+</sup>, 0)”) of positive responses (θ = 1) for any particular level of V·δ (= μ); e.g., for μ = 0.1 entity per volume tested there would be 34 repeats of θ = 1 and 326 repeats of θ = 0. From such a column of 360 θ values (one column for each μ), n were randomly selected based upon Mathematica tables, Equation (6), and treated similarly to the Poisson data above. Thus, for each combination of n (= 3, 6, 9, 12, or 24) &#215; μ (= 0.1, 0.2, 0.4, 0.8, 1.6), 100 random n-selections were performed. The formula for calculating our empirical measure of MPN sampling error was</p><p><disp-formula><tex-math>\Delta_j = \frac{\left| n \cdot P^{+} - \sum_{i=1}^{n} \theta_{ij} \right|}{n \cdot P^{+}} = \frac{\left| \mu^{+} - x_j^{+} \right|}{\mu^{+}};</tex-math></disp-formula> (9)</p><p>where θ = either a “1” (a positive occurrence) or a “0” (a negative occurrence). As before, the average Δ<sub>j</sub> across J = 100 experiments (each of n observations) = Δ̄. The MPN value x̄<sub>j</sub><sup>+</sup> = ln[n &#247; (n − x<sub>j</sub><sup>+</sup>)] provides the average MPN or CFU per sample; it is a rearrangement of Equation (2).</p></sec><sec id="s2_3"><title>2.3. 
Gaussian-Based Data: Equation (5), <xref ref-type="fig" rid="fig3">Figure 3</xref></title><p>All Gaussian PDF data were produced by multiplying Equation (5) (Δx = 1) by 360, producing an integral number of observations (“=ROUND(360·P<sub>G</sub>, 0)”) for each value of x as a function of μ (fixed at 20) and σ (= 1, 1.5, 2, 3, 4). For instance, for σ = 1 there would be 2 repeats of x = 17, 19 repeats of x = 18, 87 repeats of x = 19, 144 repeats of x = 20, 87 repeats of x = 21, 19 repeats of x = 22, and 2 repeats of x = 23. From this column of 360 values of x, n (= 3, 6, 9, 12, or 24) were randomly selected based upon Equation (6) and treated identically to the Poisson and MPN data sets. Thus, for each combination of n &#215; σ, 100 n-based selections were performed. The formula for calculating our empirical measure of Gaussian sampling error, similar to Equation (7), was</p><p><disp-formula><tex-math>\Delta_j = \frac{\left| \sigma - s_j \right|}{\mu}.</tex-math></disp-formula> (10)</p><p>As usual, the average Δ<sub>j</sub> across J = 100 such sets of experiments, each of n observations, = Δ̄.</p></sec><sec id="s2_4"><title>2.4. Other Calculations</title><p>All curve-fitting was based upon a modified Gauss-Newton algorithm by least squares [<xref ref-type="bibr" rid="scirp.91519-ref18">18</xref>] minimization performed on a Microsoft Excel spreadsheet [<xref ref-type="bibr" rid="scirp.91519-ref19">19</xref>]; some of these results were fit to the algebraic form f[X] = constant·X<sup>a</sup>. However, certain MPN data (x̄<sup>+</sup> and x<sup>+</sup>) were also fit to a Gaussian (Equation (5): P<sub>G</sub>[x̄<sup>+</sup>] or P<sub>G</sub>[x<sup>+</sup>]) with Δx used as one of the parameters to be iteratively resolved (i.e., deconvolved). Where appropriate, confidence limits (CL) have been calculated using an approach applicable to any hypothetical fitting function f<sub>k</sub> = f[X<sub>k</sub>; π<sub>p</sub>]: k = 1, 2, ⋯, K rows of the observed X-Y data sets with up to P (typically ≤ 3) fitting parameters π<sub>p</sub> (p = 1, 2, ⋯, P). 
In this procedure we use the propagation of error method [<xref ref-type="bibr" rid="scirp.91519-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref20">20</xref>] for estimating the standard error associated with each f<sub>k</sub> (s<sub>f<sub>k</sub></sub>; illustrated below for P = 2 fitting parameters) data point</p><p><disp-formula><tex-math>CL = t \cdot s_{f_k} = t \sqrt{ s_{\pi_1}^{2} \left[ \partial_{\pi_1} f_k \right]^{2} + s_{\pi_2}^{2} \left[ \partial_{\pi_2} f_k \right]^{2} + 2 \cdot s_{\pi_1 \pi_2}^{2} \cdot \partial_{\pi_1} f_k \cdot \partial_{\pi_2} f_k }</tex-math></disp-formula></p><p>where, for any particular fitting parameter ω, <inline-formula><tex-math>s_{\omega} = \sqrt{ s_Y^{2} \cdot \left[ Z^{T} Z \right]^{-1}_{\omega\omega} }</tex-math></inline-formula> = “asymptotic standard error” [<xref ref-type="bibr" rid="scirp.91519-ref19">19</xref>] (ASE; s<sub>Y</sub><sup>2</sup> = residual sum of squares &#247; [K − P]), and the ∂<sub>π<sub>p</sub></sub>f<sub>k</sub> terms symbolize ∂f<sub>k</sub>/∂π<sub>p</sub>. The above equation simplifies to</p><p><disp-formula><tex-math>CL = t_{0.01} \cdot s_{f_k} = t_{0.01} \sqrt{ s_Y^{2} \left( Z_k \left[ Z^{T} Z \right]^{-1} Z_k^{T} \right) }.</tex-math></disp-formula></p><p>In all the above relationships Z is the partial first derivative matrix of f<sub>k</sub> with respect to the parameters π<sub>1</sub> and π<sub>2</sub> (i.e., a 2-parameter fit) such that</p><p><disp-formula><tex-math>Z = \begin{bmatrix} \partial_{\pi_1} f_1 &amp; \partial_{\pi_2} f_1 \\ \partial_{\pi_1} f_2 &amp; \partial_{\pi_2} f_2 \\ \vdots &amp; \vdots \\ \partial_{\pi_1} f_K &amp; \partial_{\pi_2} f_K \end{bmatrix},</tex-math></disp-formula></p><p>Z<sup>T</sup> is the transpose of Z, Z<sub>k</sub> = [∂<sub>π<sub>1</sub></sub>f<sub>k</sub> ∂<sub>π<sub>2</sub></sub>f<sub>k</sub>] (K row vectors), and s<sub>Y</sub><sup>2</sup>·[Z<sup>T</sup>Z]<sup>−1</sup> is the variance-covariance matrix [<xref ref-type="bibr" rid="scirp.91519-ref21">21</xref>]. CL were not used for all results since they might have muddled analytical aspects of the compositions.</p></sec><sec id="s2_5"><title>2.5. Microbiome Sampling Data</title><p>For the food microbiome sampling experiment ~25 g of commercial, pre-thawed (~15 min at room temperature), frozen vegetables were washed with a volume of phosphate buffered saline (PBS; 10 mM Na<sub>2</sub>HPO<sub>4</sub> + 2 mM NaH<sub>2</sub>PO<sub>4</sub> + 137 mM NaCl; pH 7.4 &#177; 0.2; Boston BioProducts, 159 Chestnut Street, Ashland, MA 01721) equivalent to double the mass of the sample. In order to assist in the detachment of plant tissue-bound cells, 0.075% [w/v] Tween-20 (Sigma-Aldrich, 3050 Spruce St., St. 
Louis, MO 63103) was added to the PBS and filter sterilized. All washing was performed in sanitized plastic zip-lock bags wherein the formerly frozen vegetables and buffer wash were gently agitated at 80 rpm for approximately 20 min and immediately passed through a 40 μm nylon filter (BD Falcon; Becton Dickinson Biosciences, Bedford, MA) to remove large particles.</p><p>Directly sampled washes (5 mL Control = Observation I [cultured at 30˚C] and III [cultured at 37˚C]) as well as hollow fiber microfilter-concentrated (each 5 mL sample was diluted to ~100 mL PBS + Tween, concentrated, then washed with another 100 mL buffer, and eluted with ~5 mLs PBS + Tween = Observation II [cultured at 30˚C] and IV [cultured at 37˚C]) samples were collected and enumerated using the 6 &#215; 6 drop plate method [<xref ref-type="bibr" rid="scirp.91519-ref14">14</xref>] but using 1:2 serial dilutions for colony selection on Brain Heart Infusion agar (BHI + 2% [w/v] agar). Briefly, this drop plate method involved loading 400 μL of each wash (either control or concentrated samples brought back to the control sample’s original volume = 5 mL) filtrate into the first well (row A) of a 96-well microtiter plate. Two-fold serial dilutions were made by transferring 200 μL (multichannel pipette, Rainin, Emeryville, CA) from the first row (row A; dilution 0) into 200 μL of diluent (PBS) in the 2nd row (row B; dilution 1), mixing 10 times while continuously stirring, and repeating the process until five 1:2 dilutions were produced; pipette tips were changed between dilutions. Based on a previous analysis of 6 &#215; 6 drop plate sampling error [<xref ref-type="bibr" rid="scirp.91519-ref15">15</xref>] , we sampled n = 16 - 18 seven μL volumes from each of the 6 dilutions (dilutions 0 - 5; overall dilution factors of 0.5<sup>0</sup> = 1 to 0.5<sup>5</sup> = 0.03125) and drop-plated these onto BHI agar media using a multichannel pipette. 
After plating, the droplets were allowed to dry and the plates were inverted and then incubated at two temperatures (either 30˚C or 37˚C; 3 plates for each temperature and treatment combination). Colonies were counted after 16 - 24 hours. Colony collection for our 16S rDNA bacterial identification protocol [<xref ref-type="bibr" rid="scirp.91519-ref7">7</xref>] involved selecting all colonies from dilution 2 (0.5<sup>2</sup> = 0.25 dilution; x̄ = 2.79 &#177; 1.52 colonies per drop; &#177;s; the fact that √x̄ = 1.67 ~ s might argue for an appropriately sampled population).</p><p>Each colony (n total) was carefully removed from the agar plate’s surface using a Rainin L20 tip, dispersed into 200 μL BHI in a 96-well plate and incubated at 30˚C for 16 - 24 hours. These cultures were restreaked onto solid media and incubated at 30˚C overnight. One colony from each of the original n plates was selected, suspended into 25 μL of Ultra PrepMan (Applied Biosystems, Foster City, CA) in a PCR tube and heated in a thermocycler at 99˚C for 15 min. Upon cooling, samples were centrifuged for 10 min to separate the DNA solution from the cell debris. A sample of supernatant was transferred to a new tube for the DNA amplification step (end-point PCR). Once the 16S rRNA “gene” amplification, sequencing reactions (EubA and EubB primers) and Sanger sequencing were performed, DNA sequences were edited, and contigs assembled using Sequencher software as explained in detail previously [<xref ref-type="bibr" rid="scirp.91519-ref7">7</xref>].</p></sec></sec><sec id="s3"><title>3. Results and Discussion</title><p><xref ref-type="fig" rid="fig1">Figure 1</xref> shows results related to averages of 100 Δ<sub>j</sub> values (Δ̄) derived from Equations (7) (P-I, black data) or (8) (P-II, red data) as a function of n (<xref ref-type="fig" rid="fig1">Figure 1</xref>(A)) and μ (<xref ref-type="fig" rid="fig1">Figure 1</xref>(B)). 
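</p><p>The construction of the Poisson pool and the two sampling-error measures described in Section 2.1 can be sketched in Python as follows. This is only an illustrative sketch and not the authors' spreadsheet: the function names (poisson_pool, delta_p1, delta_p2, mean_delta), the seed, and the use of Python's random module in place of Mathematica random number tables are all assumptions. Equation (7) is taken here in the form |√μ − s| ÷ μ, which is an assumed reading, and Python's round() stands in for Excel's ROUND (the two differ only for exact halves).</p>
<preformat>
```python
import math
import random
import statistics


def poisson_pool(mu, scale=360):
    """Pool of counts holding round(scale * P_P(x)) repeats of each count x."""
    pool = []
    # The upper limit is an assumption: far enough into the tail that repeats are 0.
    for x in range(int(mu + 10 * math.sqrt(mu)) + 10):
        # Poisson probability computed via logarithms to avoid overflow of x!.
        p = math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))
        pool.extend([x] * round(scale * p))
    return pool


def delta_p1(sample, mu):
    """P-I measure (assumed form): |sqrt(mu) - s| / mu, s = sample std. dev."""
    return abs(math.sqrt(mu) - statistics.stdev(sample)) / mu


def delta_p2(sample, mu):
    """P-II measure, Equation (8): |mu - sample mean| / mu."""
    return abs(mu - statistics.mean(sample)) / mu


def mean_delta(mu, n, experiments=100, seed=0):
    """Average P-I and P-II errors over repeated random n-selections."""
    rng = random.Random(seed)
    pool = poisson_pool(mu)
    p1 = []
    p2 = []
    for _ in range(experiments):
        # Draw n pool entries with replacement, like n random row indices.
        sample = [rng.choice(pool) for _ in range(n)]
        p1.append(delta_p1(sample, mu))
        p2.append(delta_p2(sample, mu))
    return statistics.mean(p1), statistics.mean(p2)
```
</preformat>
<p>For μ = 1 the pool built this way reproduces the repeat counts quoted in Section 2.1 (132, 132, 66, 22, 6, and 1 repeats of x = 0 to 5); a call such as mean_delta(4, 12) then returns one pair of simulated averages for the P-I and P-II measures.</p><p>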
The least squares curve-fitting results show that the <xref ref-type="fig" rid="fig1">Figure 1</xref>(A) data follow the general form Δ̄ = Δ<sub>n</sub>·n<sup>ā</sup> whereupon ā (averaged across 5 n-based fits) = −0.556 &#177; 0.00986 (black data sets; &#177;s) or ā = −0.529 &#177; 0.0387 (red data). These findings suggest that Δ̄ changes as the inverse square root of n for all values of μ. <xref ref-type="fig" rid="fig1">Figure 1</xref>(C) displays these same results on a linearized scale (X-axis = n<sup>−1/2</sup>) whereupon the slopes (∂Δ̄/∂[n<sup>−1/2</sup>]) ~ Δ<sub>n</sub>. <xref ref-type="fig" rid="fig1">Figure 1</xref>(B) illustrates that the Δ<sub>n</sub> values derived from <xref ref-type="fig" rid="fig1">Figure 1</xref>(A) non-linear regression change as the inverse square root of μ: i.e., Δ<sub>n</sub> = A·μ<sup>a</sup> where a = −0.547 &#177; 0.0179 (black data) or −0.503 &#177; 0.0374 (red data); a &#177; ASE. <xref ref-type="fig" rid="fig1">Figure 1</xref>(D) shows <xref ref-type="fig" rid="fig1">Figure 1</xref>(B) results plotted on an appropriately linearized scale (X-axis = μ<sup>−1/2</sup>) as indicated by the above analysis whereupon the slope (∂Δ<sub>n</sub>/∂[μ<sup>−1/2</sup>]) ~ A. Combining results from <xref ref-type="fig" rid="fig1">Figure 1</xref>(A) and <xref ref-type="fig" rid="fig1">Figure 1</xref>(B) we see that Δ̄ ~ A·(n·μ)<sup>−1/2</sup>. The average value for A was 0.804 &#177; 0.0460 (P-I &amp; P-II curve-fitting results &#177;s).</p><p><xref ref-type="fig" rid="fig2">Figure 2</xref> displays MPN-based enumeration data, Equation (9), manipulated in a similar fashion to that of the above Poisson-based results with a nearly identical result. The least squares curve-fitting shows that the data in <xref ref-type="fig" rid="fig2">Figure 2</xref>(A) once again follow the general form Δ̄ = Δ<sub>n</sub>·n<sup>ā</sup> with ā = −0.554 &#177; 0.0499 (&#177;s) which is the average a from 5 μ-based data sets. 
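</p><p>The power-law form Δ̄ = Δ<sub>n</sub>·n<sup>a</sup> used in these fits can be illustrated with a short Python sketch. Note that it substitutes an ordinary log-log linear regression for the modified Gauss-Newton non-linear least squares described in Section 2.4, so the two approaches weight residuals differently and need not give identical parameters; the function name fit_power_law and the synthetic input values are assumptions made purely for illustration.</p>
<preformat>
```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**a by least squares on ln(y) versus ln(x); return (c, a)."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    # The slope of the log-log regression line is the exponent a.
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    a = sxy / sxx
    # The intercept of the regression line is ln(c).
    c = math.exp(mean_y - a * mean_x)
    return c, a


# Synthetic, noise-free example (not data from the paper): y = 0.8 * n**-0.5.
n_values = [3, 6, 9, 12, 24]
errors = [0.8 * n ** -0.5 for n in n_values]
prefactor, exponent = fit_power_law(n_values, errors)
```
</preformat>
<p>With noise-free input the sketch recovers the prefactor and exponent used to generate the data; with simulated Δ̄ values the fitted exponent would be expected to scatter around −1/2.</p><p>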
<xref ref-type="fig" rid="fig2">Figure 2</xref>(C) shows these same findings graphed on a linearized scale (X = n<sup>−1/2</sup>) whereupon the slopes = Δ<sub>n</sub>. <xref ref-type="fig" rid="fig2">Figure 2</xref>(B) also shows that the Δ<sub>n</sub> values, derived from <xref ref-type="fig" rid="fig2">Figure 2</xref>(A) non-linear regression, change as the inverse square root of μ: Δ<sub>n</sub> = A·μ<sup>a</sup> where a = −0.515 &#177; 0.0910 (&#177;ASE). As previously observed, when these results are presented on a linearized scale (X = μ<sup>−1/2</sup>; <xref ref-type="fig" rid="fig2">Figure 2</xref>(D)) the slope is equivalent to the parameter A. Combining fitting results from <xref ref-type="fig" rid="fig2">Figure 2</xref>(A) and <xref ref-type="fig" rid="fig2">Figure 2</xref>(B) we again note that Δ̄ ~ A·(n·μ)<sup>−1/2</sup> (A = 0.807 &#177; 0.139; &#177;ASE).</p><p>Completely homologous relationships to the Poisson and MPN findings were also noted with Gaussian-based data (<xref ref-type="fig" rid="fig3">Figure 3</xref>) whereupon the least squares curve-fitting in <xref ref-type="fig" rid="fig3">Figure 3</xref>(A) shows that these data obey, again, the general form Δ̄ = Δ<sub>n</sub>·n<sup>ā</sup> whereupon ā = −0.561 &#177; 0.0276 (&#177;s; averaged across all σ since μ was fixed). <xref ref-type="fig" rid="fig3">Figure 3</xref>(C) has these same findings plotted on a linearized scale (X = n<sup>−1/2</sup>) where the slopes = Δ<sub>n</sub>. <xref ref-type="fig" rid="fig3">Figure 3</xref>(B) and <xref ref-type="fig" rid="fig3">Figure 3</xref>(D) also show that the Δ<sub>n</sub> values derived from <xref ref-type="fig" rid="fig3">Figure 3</xref>(A) and <xref ref-type="fig" rid="fig3">Figure 3</xref>(C) non-linear regression change linearly with σ &#247; μ: i.e., Δ<sub>n</sub> = A·σ &#247; μ (A = 0.725 &#177; 0.0977; &#177;ASE). 
All Gaussian-based data fitting results combined indicate that Δ̄ = A ⋅ σ ⋅ n<sup>−1/2</sup> ⋅ μ<sup>−1</sup> = A ⋅ σ<sub>x̄</sub> ⋅ μ<sup>−1</sup> = A ⋅ CV[x̄], where CV[x̄] is the coefficient of variation for a population of means associated with x.</p><sec id="s3_1"><title>3.1. Equivalence of Sampling Errors Associated with Any PDF</title><p>The counting results alluded to above (P-I, P-II, &amp; MPN) are similar to those observed previously [<xref ref-type="bibr" rid="scirp.91519-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref15">15</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref16">16</xref>]: i.e., stochastic sampling errors associated with microbiological colony counting and MPN data are proportional to the inverse square root of n &#215; μ. Also, the Poisson population-based results compare favorably with those obtained from actual colony counting experiments [<xref ref-type="bibr" rid="scirp.91519-ref14">14</xref>]. Thus, for all Poisson-based data (<xref ref-type="fig" rid="fig1">Figure 1</xref>)</p><p><disp-formula><label>(11)</label><tex-math>\bar{\Delta} \propto \frac{1}{\sqrt{n \cdot \mu}} = \frac{1}{\sqrt{n}} \cdot \frac{\sqrt{\mu}}{\mu} = \frac{\sigma}{\sqrt{n}} \cdot \frac{1}{\mu} = \frac{\sigma_{\bar{x}}}{\mu} = CV[\bar{x}]</tex-math></disp-formula></p><p>because σ = √μ. We have simplified the expression by utilizing the term σ<sub>x̄</sub> [<xref ref-type="bibr" rid="scirp.91519-ref22">22</xref>] (= σ &#247; √n), which can be derived using the propagation of errors method [<xref ref-type="bibr" rid="scirp.91519-ref20">20</xref>]. Such nomenclature exemplifies the utilization of P<sub>G</sub>, as an approximation for P<sub>P</sub>, associated with a population of sample means (x̄) of mean μ<sub>x̄</sub> and standard deviation σ<sub>x̄</sub>. However, for MPN results, does σ ~ √μ hold as an approximation? 
This question is addressed in detail (Figures 4-6).</p><p>In <xref ref-type="fig" rid="fig4">Figure 4</xref>(A) and <xref ref-type="fig" rid="fig4">Figure 4</xref>(C), we have examined some of our MPN data (μ = 0.8 per sample in <xref ref-type="fig" rid="fig4">Figure 4</xref>(A) and μ = 0.4 per sample in <xref ref-type="fig" rid="fig4">Figure 4</xref>(C) at the various levels of n-sampling) by converting the total number of positive occurrences (x<sub>j</sub><sup>+</sup>) in n observations to the most probable number of entities in the hypothetical sampled aliquot (x̄<sub>j</sub><sup>+</sup> = ln[n/(n − x<sub>j</sub><sup>+</sup>)]) and curve-fitting the frequency of occurrence of each x̄<sub>j</sub><sup>+</sup> to Gaussian PDFs (Equation (5); P<sub>G</sub>[x̄<sup>+</sup>]). From these curve fits we extracted the parameters σ<sub>fit</sub> and μ<sub>fit</sub>. In <xref ref-type="fig" rid="fig4">Figure 4</xref>(B) and <xref ref-type="fig" rid="fig4">Figure 4</xref>(D) we show that the average σ<sub>fit</sub> ⋅ √n ~ √μ<sub>fit</sub> (i.e., σ<sub>fit</sub> = √μ<sub>fit</sub> &#247; √n = σ<sub>x̄</sub>) and, therefore, σ = √μ. This finding indicates that Equation (11) can be applied to both Poisson and MPN results as a reasonable approximation. 
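</p><p>The conversion of positive responses to a most probable number, and the comparison of its spread with the square root of its mean, can be sketched as follows. This is a minimal Python illustration (standard library only) and not the curve-fitting procedure used for Figure 4: it assumes that each observation is positive with probability 1 − exp(−μ), it summarizes the simulated estimates by their sample mean and standard deviation instead of fitting Gaussian PDFs, and the names mpn_per_aliquot and simulate_mpn are hypothetical.</p><preformat>
```python
import math
import random
import statistics


def mpn_per_aliquot(n, positives):
    """Most probable number per aliquot: ln[n / (n - x_plus)]."""
    if positives >= n or 0 > positives:
        raise ValueError("need 0 to n - 1 positive responses for a finite estimate")
    return math.log(n / (n - positives))


def simulate_mpn(rng, n, mu, experiments=2000):
    """Simulate the MPN estimate for many experiments of n presence/absence tests.

    ASSUMPTION: each test is positive with probability 1 - exp(-mu),
    the Poisson chance that an aliquot holds at least one entity.
    Experiments in which every test is positive are skipped.
    """
    p_positive = 1.0 - math.exp(-mu)
    estimates = []
    for _ in range(experiments):
        positives = sum(1 for _ in range(n) if p_positive > rng.random())
        if positives != n:
            estimates.append(mpn_per_aliquot(n, positives))
    return estimates


# Illustrative settings: mu = 0.4 per sample and n = 48 observations.
rng = random.Random(48)
estimates = simulate_mpn(rng, n=48, mu=0.4)
mu_hat = statistics.fmean(estimates)
sigma_hat = statistics.stdev(estimates)
print(f"mean = {mu_hat:.3f}, sd * sqrt(n) = {sigma_hat * math.sqrt(48):.3f}")
print(f"sqrt(mean) = {math.sqrt(mu_hat):.3f}")
```
</preformat><p>Comparing the two printed quantities, sd ⋅ √n and √mean, gives a rough indication of how well σ ~ √μ holds at the chosen μ and n.</p><p>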
We have confirmed the MPN results in <xref ref-type="fig" rid="fig2">Figure 2</xref> and <xref ref-type="fig" rid="fig4">Figure 4</xref> by showing that the frequency distribution of x<sup>+</sup> observed in these experiments closely follows Equation (3) (compare <xref ref-type="fig" rid="fig5">Figure 5</xref>(A) with <xref ref-type="fig" rid="fig5">Figure 5</xref>(B)). We thereby establish that σ<sub>fit</sub><sup>+</sup>, the standard deviation associated with the distribution of x<sup>+</sup> via the Gaussian approximation, was proportional to μ<sub>fit</sub><sup>+</sup> (<xref ref-type="fig" rid="fig5">Figure 5</xref>(C)) for both observed (red data) and calculated (blue data) x<sup>+</sup>, with a proportionality constant (0.735 &#177; 0.0543; &#177;ASE) numerically similar to the A values alluded to above.</p><p>The equality in Equation (11) is also visually confirmed by the results shown in <xref ref-type="fig" rid="fig6">Figure 6</xref>, where one can see that all values of Δ̄ closely follow the linear expression Δ̄ = A ⋅ X (for X = σ<sub>x̄</sub> &#247; μ or (n ⋅ μ)<sup>−1/2</sup>; A = 0.781 &#177; 0.0107; &#177;ASE), showing that</p><p><disp-formula><tex-math>\frac{\partial \bar{\Delta}}{\partial \left[ (n \cdot \mu)^{-1/2} \right]} = \frac{\partial \bar{\Delta}}{\partial \left[ \sigma_{\bar{x}} / \mu \right]} .</tex-math></disp-formula></p><p>Since the combined data in <xref ref-type="fig" rid="fig6">Figure 6</xref> are linear with a near-zero intercept (−0.0168 &#177; 0.00443), then</p><p><disp-formula><tex-math>\frac{\bar{\Delta}}{(n \cdot \mu)^{-1/2}} = \frac{\bar{\Delta}}{\sigma_{\bar{x}} / \mu}</tex-math></disp-formula></p><p>therefore cross-multiplying gives</p><p><disp-formula><tex-math>\bar{\Delta} \cdot \frac{\sigma_{\bar{x}}}{\mu} = \bar{\Delta} \cdot (n \cdot \mu)^{-1/2}</tex-math></disp-formula></p><p>and dividing both sides by Δ̄ produces the equality</p><p><disp-formula><tex-math>\frac{\sigma_{\bar{x}}}{\mu} = (n \cdot \mu)^{-1/2} .</tex-math></disp-formula></p><p>All sampling error-related findings are summarized in <xref ref-type="fig" rid="fig7">Figure 7</xref>.</p></sec><sec id="s3_2"><title>3.2. 
Demonstration That A = ∂s<sub>Δj</sub>/∂Δ̄ = CV[Δ<sub>j</sub>]</title><p>Lastly, all these assertions are substantiated by the observation (<xref ref-type="fig" rid="fig8">Figure 8</xref>) that the standard deviations associated with all our sampling error measurements (s<sub>Δj</sub>) change linearly as a function of the 4 (P-I, P-II, MPN, Gaussian) sets of Δ̄ data with an average slope (i.e., average of the 4 ∂s<sub>Δj</sub>/∂Δ̄ values = 0.716 &#177; 0.0739) equivalent to the various values for A in Figures 1-3, <xref ref-type="fig" rid="fig5">Figure 5</xref> and <xref ref-type="fig" rid="fig6">Figure 6</xref>. In fact, the slope in <xref ref-type="fig" rid="fig8">Figure 8</xref> defines the coefficient of variation in Δ̄ (CV[Δ<sub>j</sub>]) and, if equal to A, then</p><p><disp-formula><label>(12)</label><tex-math>\frac{\partial s_{\Delta_j}}{\partial \bar{\Delta}} = \frac{\partial \bar{\Delta}}{\partial X}</tex-math></disp-formula></p><p>where X = either (n ⋅ μ)<sup>−1/2</sup> or σ<sub>x̄</sub> &#247; μ. Since s<sub>Δj</sub> in <xref ref-type="fig" rid="fig8">Figure 8</xref> and Δ̄ in <xref ref-type="fig" rid="fig6">Figure 6</xref> are linear functions with a near-zero intercept then, assuming Equation (12) is true,</p><p><disp-formula><tex-math>\frac{s_{\Delta_j}}{\bar{\Delta}} = \frac{\bar{\Delta}}{X} .</tex-math></disp-formula></p><p>Substituting Δ̄ with A ⋅ X</p><p><disp-formula><tex-math>\frac{s_{\Delta_j}}{A \cdot X} = \frac{A \cdot X}{X}</tex-math></disp-formula></p><p><disp-formula><tex-math>\frac{s_{\Delta_j}}{X} = A^2</tex-math></disp-formula></p><p><disp-formula><tex-math>s_{\Delta_j} = A^2 \cdot X = A (A \cdot X) = A \cdot \bar{\Delta}</tex-math></disp-formula></p><p>and therefore</p><p><disp-formula><tex-math>\frac{s_{\Delta_j}}{\bar{\Delta}} = CV[\Delta_j] = A</tex-math></disp-formula></p><p>The above equality establishes that the coefficient of variation associated with Δ̄ (CV[Δ<sub>j</sub>]) is equivalent to the proportionality constant A seen in Figures 1-3 and <xref ref-type="fig" rid="fig6">Figure 6</xref>. Thus sampling errors can be estimated from the relationship Δ̄ = CV[Δ<sub>j</sub>] &#215; CV[x̄], where CV[Δ<sub>j</sub>] ~ 0.75 for all PDFs we have tested.</p></sec><sec id="s3_3"><title>3.3. 
Minimized Errors Associated with a Well-Sampled Food Microbiome via Most Probable Composition [<xref ref-type="bibr" rid="scirp.91519-ref7">7</xref>]</title><p>Based upon these results, the estimation of CV[x̄] (i.e., s<sub>x̄</sub> &#247; x̄) should be germane in determining whether data have been appropriately sampled. <xref ref-type="fig" rid="fig9">Figure 9</xref> illustrates that all stochastic errors associated with native aerobic bacteria surviving on commercially available, frozen vegetables were sufficiently sampled using n = 16 - 18 inasmuch as the CV[x̄] values associated with the normalized colony counts (CFU g<sup>−1</sup> averaged across all l dilutions = x̄<sub>l</sub> &#247; 0.007 mL per drop &#247; 0.5<sup>l</sup> dilution factor &#215; 57.2 mL total original sample volume &#247; 28.6 g total frozen vegetable mass) were appropriately small (ranging from ca. 2% to 4%). In a similar vein, it is pertinent that the observed (s) and calculated (√x̄<sub>l</sub>) standard deviations associated with the counts per drop were comparable since the average deviation (|s − √x̄<sub>l</sub>|) from ideality varied only 15.7% &#177; 3.54% (&#177;s<sub>x̄</sub>). Lastly, it is also significant that the dilution factors calculated from the ratios of average plate counts (x̄<sub>l</sub> &#247; x̄<sub>l−1</sub>) were very close to &#189; (average 0.523 &#177; 0.0172), which also argues for a minimized Δ.</p><p>Across the 4 observational sets (I, II, III, and IV) depicted in <xref ref-type="fig" rid="fig9">Figure 9</xref>, the total number of collected colonies (from l = 2) was 55 (n = 16), 49 (n = 16), 42 (n = 17), and 41 (n = 18), respectively. Bacterial identifications for each of these colonies were based upon matching 1200 - 1400 base-pair rDNA sequence contigs against NCBI’s GenBank database. 
The rRNA “gene” sequencing results for the 2 major isolates (making up 88.3% &#177; 3.28% of the total sampled colonies) show that the 4 sets of observed bacterial compositions were nearly identical (43.6% &#177; 8.05% Leuconostoc and 44.6% &#177; 13.3% Lactococcus; &#177;s) [<xref ref-type="bibr" rid="scirp.91519-ref23">23</xref>]. The remaining colonies were mainly Acinetobacter (3.74% &#177; 3.34%) and Streptococcus (4.17% &#177; 2.75%) with small amounts of diverse isolates (e.g., Staphylococcus, Arthrobacter, Sphingobacterium, Enterococcus, Kocuria, Raoultella, and Bacillus: averaging 1.49% &#177; 1.09% each). Such variability is expected for the relatively rare isolates (≤4%) due to errors associated with random sampling. The two major genera sampled were relatively repeatable because of their abundance, adequate sampling, and very little treatment effect. The minor constituents would have to have been sampled 2.77 &#177; 0.647-fold more (n &gt; 44) for an accuracy equivalent to that of the Leuconostoc and Lactococcus fractions, since the requisite number of samplings for the low count fractions, above, is proportional to the inverse cube root [<xref ref-type="bibr" rid="scirp.91519-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.91519-ref16">16</xref>] of the number of counts per sampled volume (~ (x̄<sub>major</sub> &#247; x̄<sub>minor</sub>)<sup>1/3</sup>).</p></sec></sec><sec id="s4"><title>4. Summary</title><p>We have performed analyses associated with empirical stochastic sampling errors linked to data generated from 3 common probability density functions. We have used these to describe the limiting behavior of Δ by generating models which suggest a generalized, and facile, mathematical solution. Based upon all our experiments, the common algebraic solution, regardless of parent distribution, is that experimental sampling errors are proportional to σ<sub>x̄</sub> &#247; μ. 
This generalized relationship is intuitively reasonable inasmuch as this is the CV for any population of sample means (CV[x̄]) and describes how closely x̄ values approach μ as n increases. The proportionality constant for all these findings was found to be mathematically related to CV[Δ<sub>j</sub>] or ∂s<sub>Δj</sub>/∂Δ̄, which is the coefficient of variation associated with the error measurement itself. Lastly, using estimates of these sampling-associated errors (CV[x̄] ~ s<sub>x̄</sub> &#247; x̄), we show that when a test microbiome was sufficiently sampled, several measures of stochastic sampling error were reasonably small for both counting and DNA sequence-based results.</p></sec><sec id="s5"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s6"><title>Cite this paper</title><p>Irwin, P.L., He, Y. and Chen, C.-Y. (2019) Deconvolution of the Error Associated with Random Sampling. Advances in Pure Mathematics, 9, 205-227. 
https://doi.org/10.4236/apm.2019.93010</p></sec><sec id="s7"><title>Definitions</title><p>Indices = i (= 1, 2, ⋯, n) observations per experiment; j (= 1, 2, ⋯, J = 100) experiments with n observations each; k (= 1, 2, ⋯, K) rows of X-Y values; l (= 1, 2, ⋯, L) dilutions; m (= 1, 2, ⋯, M) iterations; p (= 1, 2, ⋯, P) parameters</p><p>Δ<sub>j</sub> = j<sup>th</sup> experimental measure of sampling error out of J = 100 experiments: Equations (7)-(10).</p><p>Δ̄ = average sampling error in J = 100 observations of Δ<sub>j</sub></p><p>A = proportionality constant associated with Δ̄ curve-fitting to n, μ (or σ)</p><p>s<sub>Δj</sub> = standard deviation associated with Δ<sub>j</sub> measurement; for this work there are 25 (n &#215; μ or n &#215; σ for the Gaussian populations) such s<sub>Δj</sub> for each PDF type (2 types of Poisson, MPN or binomial, Gaussian)</p><p>μ = for either Poisson PDF or MPN assays (μ = V ⋅ δ), the population average number of biological entities, or other analytes, per test; for Gaussian PDF, the population’s average of any real-valued, randomly changing variable</p><p>V = the sample volume to be tested</p><p>V<sub>e</sub> = volume of the biological entity, or other analyte, being tested</p><p>δ = concentration of the biological entity (count &#247; V) or other analyte</p><p>μ<sup>+</sup> = population average number of positive growth responses (MPN) out of n observations; μ<sup>+</sup> = n ⋅ P<sup>+</sup></p><p>σ<sup>+</sup> = the standard deviation associated with the probability density of x<sup>+</sup>; the Gaussian approximations for σ<sup>+</sup> are plotted in <xref ref-type="fig" rid="fig5">Figure 5</xref>(C) as a function of Gaussian best fits for μ<sup>+</sup></p><p>P<sup>−</sup> = probability that V<sub>e</sub> will NOT contain the biological entity, or other analyte, being tested</p><p>P<sup>+</sup> = probability that V<sub>e</sub> will contain the biological entity, or other analyte, being tested; P<sup>+</sup> = 1 − P<sup>−</sup>; Equation (1)</p><p>∂<sub>X</sub> f[X] = ∂f[X]/∂X</p><p>x<sub>ij</sub> = for Poisson populations, the i<sup>th</sup> 
observation’s number of counts per tested volume, surface area, etc. for each j<sup>th</sup> experiment; for Gaussian populations, any real-valued, randomly changing variable</p><p><disp-formula><tex-math>\bar{x}_j = \frac{1}{n} \cdot \sum_{i=1}^{n} x_{ij}</tex-math></disp-formula></p><p>x<sub>j</sub><sup>+</sup> = j<sup>th</sup> experiment’s number of positive growth responses out of n observations; x<sub>j</sub><sup>+</sup> = ∑<sub>i=1</sub><sup>n</sup> θ<sub>ij</sub> where θ = 1 (positive) or 0 (negative)</p><p>x̄<sub>j</sub><sup>+</sup> = j<sup>th</sup> experiment’s number of positive counts in V volume; x̄<sub>j</sub><sup>+</sup> = ln[n &#247; (n − x<sub>j</sub><sup>+</sup>)]; the x-bar symbol is used here because this relation contains a parameter, x<sub>j</sub><sup>+</sup>, which is the result of a summation across all θ<sub>ij</sub>; it just isn’t normalized to n</p><p>n = number of technical replicates in each j<sup>th</sup> experiment; for MPN, number of observations each of volume V; for Poisson populations we have found [<xref ref-type="bibr" rid="scirp.91519-ref15">15</xref>] that the minimal number of replicates per assay was n<sub>calc</sub> = n<sub>μ→1</sub> ⋅ μ<sup>−1/3</sup> where n<sub>μ→1</sub> is the number of replicates necessary to enumerate a population with μ = 1</p><p>σ = population standard deviation associated with μ</p><p>σ<sub>x̄</sub> = standard deviation of a population of sample means (x̄); the formula for the σ<sub>x̄</sub> statistic can be derived from the propagation of errors method [<xref ref-type="bibr" rid="scirp.91519-ref20">20</xref>] without covariance</p><p><disp-formula><tex-math>\sigma_{\bar{x}} = \sqrt{\left(\frac{\partial \bar{x}}{\partial x_1}\right)^2 \sigma_{x_1}^2 + \left(\frac{\partial \bar{x}}{\partial x_2}\right)^2 \sigma_{x_2}^2 + \cdots + \left(\frac{\partial \bar{x}}{\partial x_n}\right)^2 \sigma_{x_n}^2} = \sqrt{\frac{n \sigma^2}{n^2}} = \frac{\sigma}{\sqrt{n}}</tex-math></disp-formula></p><p>since</p><p><disp-formula><tex-math>\frac{\partial \bar{x}}{\partial x_1} = \frac{\partial \bar{x}}{\partial x_2} = \cdots = \frac{\partial \bar{x}}{\partial x_n} = \frac{1}{n}</tex-math></disp-formula></p><p>and</p><p><disp-formula><tex-math>\sigma_{x_1}^2 = \sigma_{x_2}^2 = \cdots = \sigma_{x_n}^2 = \sigma^2 .</tex-math></disp-formula></p><p>s<sub>j</sub> = any j<sup>th</sup> experiment’s estimation of population standard deviation</p><p>s<sub>x̄</sub> = estimation of σ<sub>x̄</sub> from a limited number of x̄<sub>j</sub>; s<sub>x̄</sub> = s<sub>j</sub> &#247; √n</p><p>CV[x̄] = coefficient of variation for a population of means; CV[x̄
] = σ<sub>x̄</sub> &#247; μ<sub>x̄</sub> = σ<sub>x̄</sub> &#247; μ estimated as s<sub>x̄</sub> &#247; x̄</p><p>CV[x] = coefficient of variation for any set of observations x; CV[x] = σ &#247; μ estimated as s &#247; x̄</p><p>CV[Δ<sub>j</sub>] = ∂s<sub>Δj</sub>/∂Δ̄ ~ s<sub>Δj</sub> &#247; Δ̄ if the s<sub>Δj</sub> vs. Δ̄ intercept ~ 0</p><p>CLT = central limit theorem: the mean (μ<sub>x̄</sub>) of a population of observed means (x̄) will be approximately equal to the mean of the sampled population (μ) and the standard deviation of this population of means will be approximately equal to σ<sub>x̄</sub>; Equation (5) with x = x̄, μ = μ<sub>x̄</sub> = μ, and σ = σ<sub>x̄</sub></p><p>PDF = probability density function or probability distribution function</p><p>P<sub>b</sub> = binomial PDF: Equation (3)</p><p>P<sub>P</sub> = Poisson PDF: Equation (4)</p><p>P<sub>G</sub> = Gaussian PDF: Equation (5)</p><p>CL = confidence limit = t-statistic &#215; s<sub>fk</sub> = t ⋅ s<sub>fk</sub></p><p>ASE = asymptotic standard error [<xref ref-type="bibr" rid="scirp.91519-ref19">19</xref>]; for any fitting parameter ω,</p><p><disp-formula><tex-math>ASE = s_{\omega} = \sqrt{s_Y^2 \cdot \left[ Z^T Z \right]^{-1}_{\omega\omega}}</tex-math></disp-formula></p><p>s<sub>Y</sub><sup>2</sup> = residual sum of squares &#247; (K − M), where M = the number of fitting parameters π<sub>p</sub> (p = 1, 2, ⋯, P)</p><p>s<sub>fk</sub> = k<sup>th</sup> row standard error of fitting function f<sub>k</sub>; s<sub>fk</sub> = √(s<sub>Y</sub><sup>2</sup> ⋅ Z<sub>k</sub>[Z<sup>T</sup>Z]<sup>−1</sup>Z<sub>k</sub><sup>T</sup>)</p><p>Z = partial first derivative matrix of f<sub>k</sub> with respect to associated fitting parameters π<sub>1</sub>, π<sub>2</sub>, ⋯, π<sub>P</sub></p><p>Z<sup>T</sup> = transposition of Z</p><p>Z<sub>k</sub> = [∂<sub>π1</sub>f<sub>k</sub> ∂<sub>π2</sub>f<sub>k</sub>] for f<sub>k</sub> = f[X<sub>k</sub>; π<sub>p</sub>]</p></sec></body><back><ref-list><title>References</title><ref id="scirp.91519-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Halvorson, H.O. and Ziegler, N.R. (1933) Application of Statistics to Problems in Bacteriology. I. A Means of Determining Bacterial Population by the Dilution Method. 
Journal of Bacteriology, 25, 101-121.</mixed-citation></ref><ref id="scirp.91519-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Kubitschek, H.E. (1990) Cell Volume Increase in Escherichia coli after Shifts to Richer Media. Journal of Bacteriology, 172, 94-101. https://doi.org/10.1128/jb.172.1.94-101.1990</mixed-citation></ref><ref id="scirp.91519-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Barkworth, H. and Irwin, J.O. (1938) Distribution of Coliform Organisms in Milk and the Accuracy of the Presumptive Coliform Test. Journal of Hygiene, 38, 446-457. https://doi.org/10.1017/S0022172400011311</mixed-citation></ref><ref id="scirp.91519-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Best, D.J. (1990) Optimal Determination of Most Probable Numbers. International Journal of Food Microbiology, 11, 159-166. https://doi.org/10.1016/0168-1605(90)90051-6</mixed-citation></ref><ref id="scirp.91519-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Irwin, P., Reed, S., Nguyen, L., Brewster, J. and He, Y. (2013) Non-Stochastic Sampling Error in Quantal Analyses for Campylobacter Species on Poultry Products. Analytical and Bioanalytical Chemistry, 405, 2353-2369. https://doi.org/10.1007/s00216-012-6659-2</mixed-citation></ref><ref id="scirp.91519-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Irwin, P., Gehring, A., Tu, S.-I., Brewster, J., Fanelli, J. and Ehrenfeld, E. (2000) Minimum Detectable Level of Salmonellae Using a Binomial-Based Ice Nucleation Detection Assay. Journal of AOAC International, 83, 1087-1095.</mixed-citation></ref><ref id="scirp.91519-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Irwin, P.L., Nguyen, L.-H.T., Chen, C.-Y. and Paoli, G. (2008) Binding of Nontarget Microorganisms from Food Washes to Anti-Salmonella and anti-E. 
coli O157 Immunomagnetic Beads: Most Probable Composition of Background Eubacteria. Analytical and Bioanalytical Chemistry, 391, 525-536. https://doi.org/10.1007/s00216-008-1959-2</mixed-citation></ref><ref id="scirp.91519-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">de St. Groth, S.F. (1982) The Evaluation of Limiting Dilution Assays. Journal of Immunological Methods, 49, R11-R23. https://doi.org/10.1016/0022-1759(82)90269-1</mixed-citation></ref><ref id="scirp.91519-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Bevington, P.R. and Robinson, D.K. (1992) Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill, Boston, 17-23 and 41-43.</mixed-citation></ref><ref id="scirp.91519-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Irwin, P., Fortis, L. and Tu, S.-I. (2001) A Simple Maximum Probability Resolution Algorithm for Most Probable Number Analysis Using Microsoft Excel. Journal of Rapid Methods and Automation in Microbiology, 9, 33-51. https://doi.org/10.1111/j.1745-4581.2001.tb00226.x</mixed-citation></ref><ref id="scirp.91519-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Gosset, W.S. (1907) “Student” on the Error of Counting with a Haemocytometer. Biometrika, 5, 351-360. https://doi.org/10.1093/biomet/5.3.351</mixed-citation></ref><ref id="scirp.91519-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Fisher, R.A. (1922) On the Mathematical Foundations of Theoretical Statistics. Philosophical Transactions of the Royal Society, London, Series A, 222, 309-368. https://doi.org/10.1098/rsta.1922.0009</mixed-citation></ref><ref id="scirp.91519-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Irwin, P.L., Nguyen, L.-H.T., Paoli, G.C. and Chen, C.-Y. 
(2010) Evidence for a Bimodal Distribution of Escherichia coli Doubling Times below a Threshold Initial Cell Concentration. BMC Microbiology, 10, 207.</mixed-citation></ref><ref id="scirp.91519-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Chen, C.-Y., Nace, G.W. and Irwin, P.L. (2003) A 6×6 Drop Plate Method for Simultaneous Colony Counting and MPN Enumeration of Campylobacter jejuni, Listeria monocytogenes, and Escherichia coli. Journal of Microbiological Methods, 55, 475-479. https://doi.org/10.1016/S0167-7012(03)00194-5</mixed-citation></ref><ref id="scirp.91519-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Irwin, P.L., Nguyen, L.-H.T. and Chen, C.-Y. (2008) Binding of Nontarget Microorganisms from Food Washes to Anti-Salmonella and Anti-E. coli O157 Immunomagnetic Beads: Minimizing the Errors of Random Sampling in Extreme Dilute Systems. Analytical and Bioanalytical Chemistry, 391, 515-524. https://doi.org/10.1007/s00216-008-1961-8</mixed-citation></ref><ref id="scirp.91519-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Irwin, P.L., Nguyen, L.-H.T. and Chen, C.-Y. (2010) The Relationship between Purely Stochastic Sampling Error and the Number of Technical Replicates Used to Estimate Concentration at an Extreme Dilution. Analytical and Bioanalytical Chemistry, 398, 895-903. https://doi.org/10.1007/s00216-010-3967-2</mixed-citation></ref><ref id="scirp.91519-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Trotter, H.F. (1959) An Elementary Proof of the Central Limit Theorem. Archiv der Mathematik, 10, 226-234. https://doi.org/10.1007/BF01240790</mixed-citation></ref><ref id="scirp.91519-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Hartley, H.O. (1961) The Modified Gauss-Newton Method for Fitting of Non-Linear Regression Functions by Least Squares. 
Technometrics, 3, 269-280. https://doi.org/10.1080/00401706.1961.10489945</mixed-citation></ref><ref id="scirp.91519-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Irwin, P.L., Damert, W.C. and Doner, L.W. (1994) Curve Fitting in Nuclear Magnetic Resonance Spectroscopy: Illustrative Examples Using a Spreadsheet and Microcomputer. Concepts in Magnetic Resonance, 6, 57-67. https://doi.org/10.1002/cmr.1820060105</mixed-citation></ref><ref id="scirp.91519-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Beers, Y. (1957) Introduction to the Theory of Error. Addison-Wesley Publishing Company, Inc., Reading, 29-30.</mixed-citation></ref><ref id="scirp.91519-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Salter, C. (2000) Error Analysis Using the Variance-Covariance Matrix. Journal of Chemical Education, 77, 1239-1243. https://doi.org/10.1021/ed077p1239</mixed-citation></ref><ref id="scirp.91519-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Steel, R.G.D. and Torrie, J.H.D. (1960) Principles and Procedures of Statistics. McGraw-Hill, New York, 409.</mixed-citation></ref><ref id="scirp.91519-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Irwin, P., Capobianco, J., Nguyen, L., He, Y., Gehring, M., Gehring, A. and Chen, C.-Y. (2019) Bacterial Cell Recovery after Hollow Fiber Microfiltration Sample Concentration and Washing: Most Probable Bacterial Composition in Frozen Vegetables.</mixed-citation></ref></ref-list></back></article>