<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">OALibJ</journal-id><journal-title-group><journal-title>Open Access Library Journal</journal-title></journal-title-group><issn pub-type="epub">2333-9705</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/oalib.1109474</article-id><article-id pub-id-type="publisher-id">OALibJ-121609</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Biomedical&amp;Life Sciences</subject><subject> Business&amp;Economics</subject><subject> Chemistry&amp;Materials Science</subject><subject> Computer Science&amp;Communications</subject><subject> Earth&amp;Environmental Sciences</subject><subject> Engineering</subject><subject> Medicine&amp;Healthcare</subject><subject> Physics&amp;Mathematics</subject><subject> Social Sciences&amp;Humanities</subject></subj-group></article-categories><title-group><article-title>
 
 
  Distribution Estimation of Invasive Species Based on Crowdsourcing Reports
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Shi</surname><given-names>Yuxin</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Liu</surname><given-names>Siyuan</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Liu</surname><given-names>Tingzhen</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Information Science and Technology College, Dalian Maritime University, Dalian, China</addr-line></aff><aff id="aff2"><addr-line>Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China</addr-line></aff><pub-date pub-type="epub"><day>01</day><month>11</month><year>2022</year></pub-date><volume>09</volume><issue>11</issue><fpage>1</fpage><lpage>11</lpage><history><date date-type="received"><day>20</day><month>October</month><year>2022</year></date><date date-type="rev-recd"><day>27</day><month>November</month><year>2022</year></date><date date-type="accepted"><day>30</day><month>November</month><year>2022</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc.</copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
Species invasions can harm local ecosystems. <em>Vespa mandarinia</em>, discovered on Vancouver Island, is harmful to agriculture and preys on European honeybees. The government used a crowdsourcing system to collect sighting information and formulate policies to eradicate <em>Vespa mandarinia</em>. However, the information provided by local residents about <em>Vespa mandarinia</em> is not entirely accurate. To address this problem, we build a method to mine trustworthy information from large numbers of crowdsourced <em>Vespa mandarinia</em> reports. We use the date and location given in each report to establish a credibility calculation model for further analysis. For the report date, we estimate normal distribution parameters from the number of reports in each month to measure the reliability of a single report. For the report location, we use K-means cluster analysis to find cluster centers, each regarded as a hive, collect the reports within each hive's range, and use these reports to estimate two-dimensional normal distribution parameters that normalize the data and correct statistical errors. We take the probability density of a report at its location as the reliability of that report. Using the resulting credibility, we can screen out the reports most likely to be positive and prioritize them for investigation. To analyze newly submitted reports and keep the model up to date, we set up a distributed incremental adjustment model that modifies the normal distribution parameters and updates the existing model.
 
</p></abstract><kwd-group><kwd>Data Mining</kwd><kwd>Public Health</kwd><kwd>Biotechnology</kwd><kwd>Cluster Analysis</kwd><kwd>Crowdsourcing Data</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Vespa mandarinia, discovered on Vancouver Island in the fall of 2019, is harmful to agriculture and preys on European honeybees. It is therefore necessary to study this hornet and how it spreads over time, and officials developed a crowdsourcing system [<xref ref-type="bibr" rid="scirp.121609-ref1">1</xref>] for reporting Vespa mandarinia sightings. However, because citizens are not familiar with Vespa mandarinia, many witnesses provided wrong information. Some reports have been tested and verified by a laboratory, but many others cannot be verified because they lack information, and some have not yet been processed.</p><p>Eliminating errors in crowdsourced data to extract trustworthy information is a topic of social data analysis research [<xref ref-type="bibr" rid="scirp.121609-ref2">2</xref>]. Willett et al. [<xref ref-type="bibr" rid="scirp.121609-ref3">3</xref>] used clustering analysis and user interface optimization to improve the yield of crowdsourced data. Koswatte et al. [<xref ref-type="bibr" rid="scirp.121609-ref4">4</xref>] used a naive Bayesian network to assess the credibility of crowdsourced rescue information in the 2011 Australian flood event. Loganathan et al. [<xref ref-type="bibr" rid="scirp.121609-ref5">5</xref>] used logistic regression to classify whether crowdsourced data are reliable. Shamir et al. [<xref ref-type="bibr" rid="scirp.121609-ref6">6</xref>] used the performance of a supervised learning model to judge the noise level in crowdsourced data. Silverman 
[<xref ref-type="bibr" rid="scirp.121609-ref7">7</xref>] evaluated, based on the maximum entropy principle, the conditions under which the sample mean of crowdsourced data can measure data reliability.</p><p>In this paper, we build a model to mine the information in crowdsourced Vespa mandarinia reports [<xref ref-type="bibr" rid="scirp.121609-ref8">8</xref>]. The information available from each report includes its submission date, longitude and latitude, and we build two models to analyze them. For the submission date, since the habits of Vespa mandarinia depend on the season, we model the number of reports in each month with a normal distribution and use its probability density as the season credibility.</p><p>For the longitude and latitude coordinates, we use K-means [<xref ref-type="bibr" rid="scirp.121609-ref9">9</xref>] to find cluster centers and use them to estimate the distribution of hives. Since the range of a hive is about 30 km, we first find the reports within 30 km of each hive and then use them to calculate the parameters of a two-dimensional normal distribution [<xref ref-type="bibr" rid="scirp.121609-ref10">10</xref>] for each hive. For each unverified or unprocessed report, we then find the nearest hive and calculate the probability density of the report's position under that hive's two-dimensional normal distribution; this density is regarded as the location credibility. The season credibility and the location credibility are combined into a final credibility, so that public health agencies can investigate the reports with the highest credibility first. When a new report is received, we quickly update the model by incrementally estimating the parameters of the normal distributions.</p></sec><sec id="s2"><title>2. 
Data Pre-Processing</title><p>Because we are handling a large dataset, the data are of diverse types, and the columns depend on one another to some degree. We therefore analyze the data carefully to determine the meaning of each column and the validity of each record.</p><p>To analyze the distribution of hornets over time, we first preprocess the data: we select the reports from the past two years and exclude negative reports. We use Python to plot the distribution of hornet reports over time (including positive, unverified and unprocessed reports), as shown in <xref ref-type="fig" rid="fig1">Figure 1</xref> and <xref ref-type="fig" rid="fig2">Figure 2</xref>.</p><p>We found that there are very few positive reports and that most of them are concentrated in one area, so the information they can provide is very limited. We therefore need a way to mine information from unverified and unprocessed reports. In addition, no obvious trend can be seen from year to year, so we build models to find seasonal and geographic trends.</p></sec><sec id="s3"><title>3. Credibility Calculation Model</title><p>To make better use of the information provided in the reports and to judge the correctness of each report more accurately, we calculate each report's credibility from its reported detection date (season) and its reported longitude and latitude (location).</p><sec id="s3_1"><title>3.1. Calculate Reliability Based on Season</title><p>Because climatic conditions strongly influence the survival of Vespa mandarinia, the reported detection date is an important factor for judging an unverified or unprocessed report. Taking 2020 as an example, we compiled the number of reports in each month, as shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>.</p><p>August has the largest number of reports. 
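</p><p>The seasonal credibility developed in the rest of this subsection can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: it takes the normalized frequencies p<sub>k</sub> from Table 1, fixes μ = 8 as in the text, and reads the mirroring rule as pairing September with April, October with March, November with February and December with January. The names P_K, seasonal_variance, normal_pdf and season_credibility are hypothetical.</p><preformat preformat-type="code">
```python
import math

# Normalized monthly report frequencies p_k for months 1..12, copied from Table 1.
P_K = [0.000675676, 0.000900901, 0.002702703, 0.033108108,
       0.168243243, 0.074774775, 0.200675676, 0.324324324,
       0.140315315, 0.054279279, 0.0, 0.0]

MU = 8  # mean month (August), fixed as in the text


def seasonal_variance(p=P_K, mu=MU):
    """sigma^2 = sum over k of (x_k - mu)^2 * p_k, with x_k = k (month number)."""
    return sum((k - mu) ** 2 * pk for k, pk in enumerate(p, start=1))


def normal_pdf(x, mu, var):
    """Probability density of N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def season_credibility(month, var, mu=MU):
    """Season credibility of a report filed in `month` (1..12).

    Months 1-8 use the normal density directly. Months 9-12 reuse the
    density of months 4..1 (assumed pairing: 9->4, 10->3, 11->2, 12->1),
    which makes the curve fall faster after the August peak.
    """
    if month not in range(1, 13):
        raise ValueError("month must be in 1..12")
    if month in (9, 10, 11, 12):
        month = 13 - month
    return normal_pdf(month, mu, var)


var = seasonal_variance()
print(round(var, 4))  # 3.0342, matching the value reported in the text
print([round(season_credibility(m, var), 4) for m in range(1, 13)])
```
</preformat><p>Because the mirrored months reuse densities from the far left tail, September already scores as low as April, which is the intended asymmetry.</p><p>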
Because the number of Vespa mandarinia is affected by many factors, and the reports are numerous and mutually independent, we assume that, in theory, the number of reports per month approximately follows a normal distribution. However, <xref ref-type="fig" rid="fig3">Figure 3</xref> shows fewer reports in June than in the two adjacent months, which may be caused by errors in data collection. To correct this error, we re-normalize the data based on the statistics of the normal distribution.</p><p>We take the mean month to be μ = 8 (August) and compute the variance as <disp-formula><tex-math>\sigma^2 = \sum_{k=1}^{12} (x_k - 8)^2 \, p_k,</tex-math></disp-formula> where x<sub>k</sub> = k is the month and p<sub>k</sub> is the normalized number of reports in month k, listed in <xref ref-type="table" rid="table1">Table 1</xref>.</p><p>The calculation gives σ<sup>2</sup> = 3.034234234, so we use a normal distribution with μ = 8 and σ<sup>2</sup> = 3.034234234.</p><p>Treating the month M as a random variable, the reliability of each month's reports follows this distribution, written as M ~ N(8, 3.03).</p><p>Substituting the months January to August into the probability density function gives the season credibility of these months. In reality, however, the number of Vespa mandarinia declines toward winter faster than it grows in spring, whereas a normal distribution is monotonic on each side of the mean but symmetric about it. We therefore make the curve asymmetric: the probability densities for September through December are replaced by those for April through January, respectively.</p></sec><sec id="s3_2"><title>3.2. Calculate Reliability Based on Location</title><p>Since a new queen usually establishes her hive within an estimated range of 30 km [<xref ref-type="bibr" rid="scirp.121609-ref11">11</xref>], the reported longitude and latitude are also important factors for judging credibility. 
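</p><p>The 30 km range is a ground distance, while reports are given as latitude and longitude, and the text does not say how one is converted to the other. A minimal sketch, assuming the haversine great-circle distance on a spherical Earth with a mean radius of 6371 km; haversine_km and reports_in_range are hypothetical helper names, and the coordinates in the usage lines are illustrative only.</p><preformat preformat-type="code">
```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius
HIVE_RANGE_KM = 30.0      # hive range stated in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp guards against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def reports_in_range(hive, reports, radius_km=HIVE_RANGE_KM):
    """Return the (lat, lon) reports lying within radius_km of hive (lat, lon)."""
    lat0, lon0 = hive
    return [(lat, lon) for lat, lon in reports
            if radius_km >= haversine_km(lat0, lon0, lat, lon)]


hive = (48.98, -122.70)
reports = [(48.98, -122.70), (49.10, -122.70), (48.00, -122.70)]
print(reports_in_range(hive, reports))  # [(48.98, -122.7), (49.1, -122.7)]
```
</preformat><p>One degree of latitude is about 111 km, so the third report (roughly 109 km away) falls outside the range while the second (roughly 13 km away) falls inside.</p><p>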
Next, we build a model based on the reported location.</p><table-wrap id="table1" ><label>Table 1</label><caption><title>Monthly number of reports and normalized number of reports</title></caption><table><thead><tr><th align="center" valign="middle" >Month</th><th align="center" valign="middle" >Number of reports</th><th align="center" valign="middle" >Normalized number of reports (p<sub>k</sub>)</th></tr></thead><tbody><tr><td align="center" valign="middle" >1</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0.000675676</td></tr><tr><td align="center" valign="middle" >2</td><td align="center" valign="middle" >3</td><td align="center" valign="middle" >0.000900901</td></tr><tr><td align="center" valign="middle" >3</td><td align="center" valign="middle" >23</td><td align="center" valign="middle" >0.002702703</td></tr><tr><td align="center" valign="middle" >4</td><td align="center" valign="middle" >188</td><td align="center" valign="middle" >0.033108108</td></tr><tr><td align="center" valign="middle" >5</td><td align="center" valign="middle" >566</td><td align="center" valign="middle" >0.168243243</td></tr><tr><td align="center" valign="middle" >6</td><td align="center" valign="middle" >316</td><td align="center" valign="middle" >0.074774775</td></tr><tr><td align="center" valign="middle" >7</td><td align="center" valign="middle" >980</td><td align="center" valign="middle" >0.200675676</td></tr><tr><td align="center" valign="middle" >8</td><td align="center" valign="middle" >1346</td><td align="center" valign="middle" >0.324324324</td></tr><tr><td align="center" valign="middle" >9</td><td align="center" valign="middle" >637</td><td align="center" valign="middle" >0.140315315</td></tr><tr><td align="center" valign="middle" >10</td><td align="center" valign="middle" >176</td><td align="center" valign="middle" >0.054279279</td></tr><tr><td align="center" valign="middle" >11</td><td align="center" valign="middle" >2</td><td 
align="center" valign="middle" >0</td></tr><tr><td align="center" valign="middle" >12</td><td align="center" valign="middle" >1</td><td align="center" valign="middle" >0</td></tr></tbody></table></table-wrap><p>We distinguish two types of hives. Locations confirmed by positive reports are called hives; cluster centers of the reports are called uncertain hives. The calculation steps are as follows.</p><sec id="s3_2_1"><title>3.2.1. Use K-Means Cluster Analysis to Find Uncertain Hives</title><p>Step 1. Apply the elbow method [<xref ref-type="bibr" rid="scirp.121609-ref12">12</xref>] to determine the number of clusters k, that is, the number of uncertain hives (<xref ref-type="fig" rid="fig4">Figure 4</xref>).</p><p>The rate of change of the slope is largest at about k = 3.75; rounding to the nearest integer gives k = 4, so there are 4 uncertain hives.</p><p>Step 2. Perform K-means clustering with k clusters, which gives <xref ref-type="fig" rid="fig5">Figure 5</xref>.</p><p>The purple cluster in the lower right corner of the figure is the area that actually contains positive reports. The K-means result shows that the purple cluster is the most compact one, meaning that this area is the densest. This is consistent with the existing positive reports and suggests that the model is reasonable.</p><p>Step 3. For each positive report and each cluster center, find all reports within 30 km. This is the maximum range of the hive or uncertain hive. The reports within each hive's range are shown schematically in <xref ref-type="fig" rid="fig6">Figure 6</xref>.</p><p>Step 4. We treat longitude and latitude as independent random variables, so their correlation coefficient is taken to be 0. Computing the mean and variance of the latitudes and longitudes of these points gives a two-dimensional normal distribution for each hive or uncertain hive. 
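</p><p>A minimal sketch of Step 4, assuming population variances (divisor n, the convention used in Section 3.4) and the zero correlation stated above. Because the correlation is 0, the two-dimensional density factorizes into two one-dimensional normal densities, so fitting reduces to a per-axis mean and variance. The function names are hypothetical.</p><preformat preformat-type="code">
```python
import math
from statistics import fmean, pvariance


def fit_hive_distribution(points):
    """Fit an axis-aligned 2D normal to the (lat, lon) reports around one hive.

    With correlation 0 the fit reduces to a mean and a population variance
    per coordinate. Returns (mu1, mu2, s1_sq, s2_sq) for latitude/longitude.
    """
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return fmean(lats), fmean(lons), pvariance(lats), pvariance(lons)


def location_density(lat, lon, params):
    """Density of the fitted 2D normal (correlation 0) at (lat, lon)."""
    mu1, mu2, s1_sq, s2_sq = params
    z = (lat - mu1) ** 2 / s1_sq + (lon - mu2) ** 2 / s2_sq
    return math.exp(-z / 2) / (2 * math.pi * math.sqrt(s1_sq * s2_sq))


params = fit_hive_distribution([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)])
print(params)                              # (1.0, 1.0, 1.0, 1.0)
print(location_density(1.0, 1.0, params))  # 1 / (2 * pi), about 0.159
```
</preformat><p>A hive whose reports all share one coordinate would have zero variance and an undefined density; the text does not say how such cases are handled.</p><p>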
The credibility of each point is taken to be its probability density under this distribution.</p><p>The resulting parameters of the two-dimensional normal distribution for each hive are listed in <xref ref-type="table" rid="table2">Table 2</xref>.</p></sec><sec id="s3_2_2"><title>3.2.2. Calculate Credibility of Each Unverified Report</title><p>Step 1. For each unverified or unprocessed report, find the nearest hive or uncertain hive.</p><table-wrap id="table2" ><label>Table 2</label><caption><title>Parameters of the two-dimensional normal distribution for each hive</title></caption><table><thead><tr><th align="center" valign="middle" >s<sub>1</sub><sup>2</sup> (latitude variance)</th><th align="center" valign="middle" >s<sub>2</sub><sup>2</sup> (longitude variance)</th><th align="center" valign="middle" >μ<sub>1</sub> (mean latitude, °)</th><th align="center" valign="middle" >μ<sub>2</sub> (mean longitude, °)</th></tr></thead><tbody><tr><td align="center" valign="middle" >0.021559586</td><td align="center" valign="middle" >0.02275687</td><td align="center" valign="middle" >47.37780629</td><td align="center" valign="middle" >−122.4103369</td></tr><tr><td align="center" valign="middle" >0.006087037</td><td align="center" valign="middle" >0.01913566</td><td align="center" valign="middle" >47.56529326</td><td align="center" valign="middle" >−117.5211199</td></tr><tr><td align="center" valign="middle" >0.023101033</td><td align="center" valign="middle" >0.015975192</td><td align="center" valign="middle" >48.58362288</td><td align="center" valign="middle" >−122.5277207</td></tr><tr><td align="center" valign="middle" >0.008281684</td><td align="center" valign="middle" >0.01116439</td><td align="center" valign="middle" >47.23487839</td><td align="center" valign="middle" >−120.042999</td></tr><tr><td align="center" valign="middle" >0.014257235</td><td align="center" valign="middle" >0.015585253</td><td align="center" valign="middle" >48.723779</td><td align="center" valign="middle" >−122.354431</td></tr><tr><td align="center" valign="middle" >7.17E−05</td><td align="center" valign="middle" 
>0.004930427</td><td align="center" valign="middle" >49.149394</td><td align="center" valign="middle" >−123.943134</td></tr><tr><td align="center" valign="middle" >0.006557754</td><td align="center" valign="middle" >0.020917926</td><td align="center" valign="middle" >48.993892</td><td align="center" valign="middle" >−122.702242</td></tr><tr><td align="center" valign="middle" >0.007117487</td><td align="center" valign="middle" >0.021407771</td><td align="center" valign="middle" >48.971949</td><td align="center" valign="middle" >−122.700941</td></tr><tr><td align="center" valign="middle" >0.004096438</td><td align="center" valign="middle" >0.019428031</td><td align="center" valign="middle" >49.025831</td><td align="center" valign="middle" >−122.810653</td></tr><tr><td align="center" valign="middle" >0.006893771</td><td align="center" valign="middle" >0.021722665</td><td align="center" valign="middle" >48.980994</td><td align="center" valign="middle" >−122.688503</td></tr><tr><td align="center" valign="middle" >0.003876809</td><td align="center" valign="middle" >0.017631808</td><td align="center" valign="middle" >49.060215</td><td align="center" valign="middle" >−122.641648</td></tr><tr><td align="center" valign="middle" >0.008632947</td><td align="center" valign="middle" >0.020996337</td><td align="center" valign="middle" >48.955587</td><td align="center" valign="middle" >−122.661037</td></tr><tr><td align="center" valign="middle" >0.011103904</td><td align="center" valign="middle" >0.017261207</td><td align="center" valign="middle" >48.777534</td><td align="center" valign="middle" >−122.418612</td></tr><tr><td align="center" valign="middle" >0.015007777</td><td align="center" valign="middle" >0.01841511</td><td align="center" valign="middle" >47.579579</td><td align="center" valign="middle" >−122.124218</td></tr><tr><td align="center" valign="middle" >0.011930149</td><td align="center" valign="middle" >0.009424903</td><td align="center" valign="middle" 
>46.706048</td><td align="center" valign="middle" >−120.481003</td></tr><tr><td align="center" valign="middle" >0.007938466</td><td align="center" valign="middle" >0.020856617</td><td align="center" valign="middle" >48.927519</td><td align="center" valign="middle" >−122.745016</td></tr><tr><td align="center" valign="middle" >0.008228704</td><td align="center" valign="middle" >0.016522836</td><td align="center" valign="middle" >48.984269</td><td align="center" valign="middle" >−122.574809</td></tr><tr><td align="center" valign="middle" >0.017076043</td><td align="center" valign="middle" >0.018369782</td><td align="center" valign="middle" >47.50218</td><td align="center" valign="middle" >−122.16402</td></tr><tr><td align="center" valign="middle" >0.008228655</td><td align="center" valign="middle" >0.01652283</td><td align="center" valign="middle" >48.98422</td><td align="center" valign="middle" >−122.574726</td></tr><tr><td align="center" valign="middle" >0.008228608</td><td align="center" valign="middle" >0.01652283</td><td align="center" valign="middle" >48.984172</td><td align="center" valign="middle" >−122.57472</td></tr><tr><td align="center" valign="middle" >0.008273893</td><td align="center" valign="middle" >0.016495515</td><td align="center" valign="middle" >48.979497</td><td align="center" valign="middle" >−122.581335</td></tr><tr><td align="center" valign="middle" >0.008141993</td><td align="center" valign="middle" >0.016516265</td><td align="center" valign="middle" >48.983375</td><td align="center" valign="middle" >−122.582465</td></tr></tbody></table></table-wrap><p>Step 2. Take the probability density of the report under the distribution of that hive as its location credibility. If no hive or uncertain hive lies within 30 km of the report, its location credibility is 0.</p></sec></sec><sec id="s3_3"><title>3. 
Final Credibility</title><p>Because the seasonal factor is relatively fixed while the location factor discriminates more strongly between reports, we combine the season credibility and the location credibility with weights of 0.4 and 0.6 (a 4:6 ratio) to obtain the final credibility. If the location credibility is 0, the final credibility is set directly to 0.</p><p>This gives the credibility of every report. Because the dataset is large, <xref ref-type="table" rid="table3">Table 3</xref> shows only some of the results.</p></sec><sec id="s3_4"><title>3.4. Distributed Incremental Adjustment Model</title><p>People may continue to discover Vespa mandarinia and submit new reports in the future. For such new reports, we use the following algorithm to update the model online.</p><table-wrap id="table3" ><label>Table 3</label><caption><title>Final credibility of selected reports</title></caption><table><thead><tr><th align="center" valign="middle" >ID</th><th align="center" valign="middle" >Final credibility</th></tr></thead><tbody><tr><td align="center" valign="middle" >{17EBEDE2-7DFF-4342-A66E-376185CD95DE}</td><td align="center" valign="middle" >0</td></tr><tr><td align="center" valign="middle" >{204DA998-2F64-40CD-9A52-30D25D272CF5}</td><td align="center" valign="middle" >0.207743828</td></tr><tr><td align="center" valign="middle" >{47BBB2BB-2996-4BF5-8329-73F28A7E5778}</td><td align="center" valign="middle" >0.065569054</td></tr><tr><td align="center" valign="middle" >{DB01EEBC-66F4-4012-A16B-DDE2365FAE24}</td><td align="center" valign="middle" >0.916002762</td></tr><tr><td align="center" valign="middle" >{589E0AD2-4EEF-4588-8969-158B084727AE}</td><td align="center" valign="middle" >0.207819937</td></tr><tr><td align="center" valign="middle" >{A0D45071-DFD3-4F0E-9851-FFBD637FBF58}</td><td align="center" valign="middle" >0</td></tr><tr><td align="center" valign="middle" >{EB6CACF8-D71E-4C68-AE03-D2AEF2E42421}</td><td 
align="center" valign="middle" >0.207766593</td></tr><tr><td align="center" valign="middle" >{7D0FB65F-A3EA-4436-AA6D-B379565CFE87}</td><td align="center" valign="middle" >0.776898177</td></tr><tr><td align="center" valign="middle" >{7E522AE9-2155-4ECE-A3CD-34C4CB91650C}</td><td align="center" valign="middle" >0.207851304</td></tr><tr><td align="center" valign="middle" >{CD18D749-4333-4A1A-96FC-1287A8F66B32}</td><td align="center" valign="middle" >0.207789183</td></tr></tbody></table></table-wrap><p>Let a<sub>1</sub>, …, a<sub>n</sub> be the n existing values of a model variable, with mean μ and variance s<sup>2</sup>, and let μ<sub>1</sub> and s<sub>1</sub><sup>2</sup> be the mean and variance after a new value a<sub>n+1</sub> is added. By definition,</p><disp-formula><tex-math>\mu = \frac{1}{n}\left(a_1 + a_2 + \cdots + a_n\right)</tex-math></disp-formula><disp-formula><tex-math>s^2 = \frac{1}{n}\left[(a_1 - \mu)^2 + (a_2 - \mu)^2 + \cdots + (a_n - \mu)^2\right]</tex-math></disp-formula><disp-formula><tex-math>\mu_1 = \frac{1}{n+1}\left(a_1 + a_2 + \cdots + a_n + a_{n+1}\right)</tex-math></disp-formula><disp-formula><tex-math>s_1^2 = \frac{1}{n+1}\left[(a_1 - \mu_1)^2 + (a_2 - \mu_1)^2 + \cdots + (a_n - \mu_1)^2 + (a_{n+1} - \mu_1)^2\right]</tex-math></disp-formula><p>Because n is very large, adding one value changes the mean very little, so when calculating the variance of the n + 1 values we can take μ<sub>1</sub> ≈ μ. The variance can then be written as</p><disp-formula><tex-math>s_1^2 \approx \frac{1}{n+1}\left[(a_1 - \mu)^2 + (a_2 - \mu)^2 + \cdots + (a_n - \mu)^2 + (a_{n+1} - \mu)^2\right]</tex-math></disp-formula><p>which gives the update rules</p><disp-formula><tex-math>\mu_1 = \frac{n}{n+1}\,\mu + \frac{a_{n+1}}{n+1}</tex-math></disp-formula><disp-formula><tex-math>s_1^2 \approx \frac{n}{n+1}\,s^2 + \frac{(a_{n+1} - \mu)^2}{n+1} \approx \frac{n}{n+1}\,s^2 + \frac{(a_{n+1} - \mu_1)^2}{n+1}</tex-math></disp-formula></sec></sec><sec id="s4"><title>4. Conclusions</title><p>In this paper, we build three models to analyze the known data and obtain the final credibility. We use a K-means clustering and normal distribution model, and we use normal distributions to repair errors in the known data. In theory, a quantity affected by many independent random factors is approximately normally distributed. We therefore fit the data to a normal distribution (season) and a two-dimensional normal distribution (location) to repair these errors. 
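</p><p>The incremental update of Section 3.4, which keeps these fitted distributions current, can be sketched as follows for a single variable (for example, the latitude of the reports around one hive). This is a minimal sketch assuming population variances as in Section 3.4; the function name is hypothetical, and the exact flag is our addition rather than part of the paper: it switches from the approximation μ<sub>1</sub> ≈ μ to the exact one-step update s<sub>1</sub><sup>2</sup> = n/(n + 1)·s<sup>2</sup> + n/(n + 1)<sup>2</sup>·(a<sub>n+1</sub> − μ)<sup>2</sup>, so the size of the approximation error can be checked.</p><preformat preformat-type="code">
```python
def incremental_update(n, mu, s_sq, a_new, exact=False):
    """Update the mean and population variance of n values when a_new arrives.

    exact=False follows the approximation of Section 3.4 (mu_1 replaced by mu
    when computing the variance); exact=True uses the exact one-step update.
    Returns (n + 1, mu_1, s_1^2).
    """
    n1 = n + 1
    mu1 = n / n1 * mu + a_new / n1
    d = a_new - mu
    if exact:
        s1_sq = n / n1 * s_sq + n / n1 ** 2 * d ** 2
    else:
        s1_sq = n / n1 * s_sq + d ** 2 / n1
    return n1, mu1, s1_sq


# Values 1, 2, 3, 4 have mean 2.5 and population variance 1.25; add 5.
print(incremental_update(4, 2.5, 1.25, 5.0))              # approximate variance 2.25
print(incremental_update(4, 2.5, 1.25, 5.0, exact=True))  # exact variance 2.0
```
</preformat><p>The two variants differ only by a factor n/(n + 1) in the last term, so the approximation is negligible for the large n the text assumes, although it is visible in this four-value example.</p><p>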
Repairing the data in this way can improve the accuracy of the data and the rationality of the results. The distributed incremental adjustment model modifies the normal distribution parameters, updates the credibility calculation model, and keeps the model timely: it can be updated every time a new report arrives. The final credibility y is interpreted as the likelihood that a report is correctly classified, so the probability of misclassification is e = 1 − y.</p><p>We sort the reports by final credibility. The top-ranked reports are the most likely to be positive sightings and should be investigated first.</p></sec><sec id="s5"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest.</p></sec><sec id="s6"><title>Cite this paper</title><p>Shi, Y.X., Liu, S.Y. and Liu, T.Z. (2022) Distribution Estimation of Invasive Species Based on Crowdsourcing Reports. Open Access Library Journal, 9: e9474. https://doi.org/10.4236/oalib.1109474</p></sec></body><back><ref-list><title>References</title><ref id="scirp.121609-ref1"><label>1</label><mixed-citation publication-type="book" xlink:type="simple">Keating, M., Rhodes, B. and Richards, A. (2013) Crowdsourcing: A Flexible Method for Innovation, Data Collection, and Analysis in Social Science Research. In: Hill, C.A., Dean, E. and Murphy, J., Eds., Social Media, Sociality, and Survey Research, Wiley, New York, 179-201. https://doi.org/10.1002/9781118751534.ch8</mixed-citation></ref><ref id="scirp.121609-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Yuen, M.-C., King, I. and Leung, K.-S. (2011) A Survey of Crowdsourcing Systems. 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust and 2011 IEEE Third International Conference on Social Computing, Boston, MA, 9-11 October 2011, 766-773. 
https://doi.org/10.1109/PASSAT/SocialCom.2011.203</mixed-citation></ref><ref id="scirp.121609-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Willett, W., Ginosar, S., Steinitz, A., Hartmann, B. and Agrawala, M. (2013) Identifying Redundancy and Exposing Provenance in Crowdsourced Data Analysis. IEEE Transactions on Visualization and Computer Graphics, 19, 2198-2206.  
https://doi.org/10.1109/TVCG.2013.164</mixed-citation></ref><ref id="scirp.121609-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Koswatte, S., McDougall, K. and Liu, X.Y. (2017) VGI and Crowdsourced Data Credibility Analysis Using Spam Email Detection Techniques. International Journal of Digital Earth, 11, 520-532. https://doi.org/10.1080/17538947.2017.1341558</mixed-citation></ref><ref id="scirp.121609-ref5"><label>5</label><mixed-citation publication-type="book" xlink:type="simple">Loganathan, V., Subramani, G. and Bhaskar, N. (2020) Crowdsourcing Data Analysis for Crowd Systems. In: Ranganathan, G., Chen, J. and Rocha, Á., Eds., Inventive Communication and Computational Technologies, Springer, Singapore. 
https://doi.org/10.1007/978-981-15-0146-3_117</mixed-citation></ref><ref id="scirp.121609-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Shamir, L., Diamond, D. and Wallin, J. (2015) Leveraging Pattern Recognition Consistency Estimation for Crowdsourcing Data Analysis. IEEE Transactions on Human-Machine Systems, 46, 474-480. https://doi.org/10.1109/THMS.2015.2463082</mixed-citation></ref><ref id="scirp.121609-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Silverman, M.P. (2019) Extraction of Information from Crowdsourcing: Experimental Test Employing Bayesian, Maximum Likelihood, and Maximum Entropy Methods. Open Journal of Statistics, 9, 571-600. https://doi.org/10.4236/ojs.2019.95038</mixed-citation></ref><ref id="scirp.121609-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Shi, Y.X. (2022) Vespa Mandarinia Crowdsourcing Reports. Figshare. Dataset.  
https://doi.org/10.6084/m9.figshare.21333966.v1</mixed-citation></ref><ref id="scirp.121609-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Wang, Z., Liu, Q. and Chen, E. (2009) A K-Means Algorithm for Optimizing the Initial Center Point. Pattern Recognition and Artificial Intelligence, 22, 299-304.</mixed-citation></ref><ref id="scirp.121609-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Xia, X.-F., Liu, X. and Li, X.-M. (2010) User-Item Missing Ratings Complement Based on Two-Dimensional Normal Distribution. 2010 2nd International Workshop on Database Technology and Applications, Wuhan, 27-28 November 2010, 1-6.  
https://doi.org/10.1109/DBTA.2010.5658988</mixed-citation></ref><ref id="scirp.121609-ref11"><label>11</label><mixed-citation publication-type="journal" xlink:type="simple">Huang, S.K., et al. (2001) The Preliminary Report on Vespa mandarinia and Other Arthropods in Its Cave. Journal of Fujian Agricultural University (Natural Science), 30, 99-102.</mixed-citation></ref><ref id="scirp.121609-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Wu, G.J., Zhang, J.L. and Yuan, D. (2019) Automatically Obtaining K Value Based on K-Means Elbow Method. Computer Engineering &amp; Software, 40, 167-170.</mixed-citation></ref></ref-list></back></article>