<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">AJPS</journal-id><journal-title-group><journal-title>American Journal of Plant Sciences</journal-title></journal-title-group><issn pub-type="epub">2158-2742</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/ajps.2015.619311</article-id><article-id pub-id-type="publisher-id">AJPS-61884</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Biomedical&amp;Life Sciences</subject></subj-group></article-categories><title-group><article-title>
 
 
  Testing Leaf Multispectral Reflectance Data as Input into Random Forest to Differentiate Velvetleaf from Soybean
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Fletcher</surname><given-names>Reginald S.</given-names></name><xref ref-type="aff" rid="aff1"><sub>1</sub></xref></contrib></contrib-group><aff id="aff1"><label>1</label><addr-line>United States Department of Agriculture, Agricultural Research Service, Crop Production Systems Research
Unit, Stoneville, USA</addr-line></aff><author-notes><corresp id="cor1">* E-mail:</corresp></author-notes><pub-date pub-type="epub"><day>02</day><month>12</month><year>2015</year></pub-date><volume>06</volume><issue>19</issue><fpage>3193</fpage><lpage>3204</lpage><history><date date-type="received"><day>22</day>	<month>September</month>	<year>2015</year></date><date date-type="rev-recd"><day>accepted</day>	<month>11</month>	<year>December</year>	</date><date date-type="accepted"><day>14</day>	<month>December</month>	<year>2015</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Velvetleaf (
  Abutilon theophrasti Medic.) infestations negatively impact row crop production throughout the United States and Canada’s eastern provinces. To implement management strategies to control velvetleaf, managers need tools for differentiating it from crop plants. Four multispectral datasets (5 Band, 7 Band, 8 Band, and 16 Band), simulating LANDSAT 3 plus a blue band, LANDSAT 8, WorldView 2, and WorldView 3 spectral bands, respectively, were tested as input into the random forest algorithm for velvetleaf soybean [
  Glycine max L. (Merr.)] discrimination. During two separate greenhouse experiments in 2014, leaf reflectance measurements were obtained at the vegetative growth stage of velvetleaf plants and two soybean varieties. The reflectance measurements were collected with a plant contact probe attached to a hyperspectral spectroradiometer. Leaf hyperspectral reflectance measurements were convolved to the four multispectral datasets with computer software. Overall, user’s, and producer’s accuracies and kappa coefficient were employed to determine classification accuracies. Using the multispectral datasets as input, the random forest algorithm differentiated velvetleaf from the soybean varieties with accuracies ranging from 86.7% to 100%. 7 Band, 16 Band, 8 Band, and 5 Band datasets ranked or tied for the highest accuracies seventeen, sixteen, twelve, and one time, respectively. Kappa coefficients indicated an almost perfect agreement (i.e., kappa value, 0.81 - 1.0) to substantial agreement (i.e., kappa value, 0.61 - 0.80) between reference data and model predicted classes. This study was the first to demonstrate the application of the random forest machine learner and leaf multispectral reflectance data as tools to distinguish velvetleaf from soybean and to identify multispectral band combinations providing the best accuracies. Findings support further application of the random forest machine learner along with remotely-sensed multispectral data as tools for velvetleaf soybean discrimination with future implications for site-specific management of velvetleaf.
 
</p></abstract><kwd-group><kwd><italic>Glycine max</italic></kwd><kwd><italic>Abutilon theophrasti</italic></kwd><kwd>Machine Learning</kwd><kwd>Supervised Classification</kwd><kwd>Ensemble Technique</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Velvetleaf (Abutilon theophrasti Medic.), a broadleaf plant native to China, was introduced into the United States from India as a fiber crop. It escaped cultivation and now has become a problem weed in row crops, especially in corn (Zea mays L.) and soybean (Glycine max (L.) Merr.) fields throughout the United States and Canada’s eastern provinces. The summer annual weed grows to heights ranging from 0.3 to 2.0 m. The plant reproduces from seed and can develop up to 17,000 seeds that may remain viable for up to sixty years. Velvetleaf grows best in warm regions and invades vacant lots, gardens, and cultivated fields. Once established, it is a problem weed for many years to come.</p><p>Velvetleaf infestations negatively impact a crop and field in several ways. Velvetleaf plants emerging before or at the same time as crop plants are highly competitive for water and plant nutrients and thus can outgrow the crop. A 25% decrease in crop yield can occur when the velvetleaf plant population is equivalent to 1 plant per 30 cm [<xref ref-type="bibr" rid="scirp.61884-ref1">1</xref>]. Seeds, adult plants, and decaying plant parts contain or produce allelopathic (toxic) chemicals that inhibit water uptake and chlorophyll production of some crop plants, particularly soybean, thus preventing growth. The chemicals enter the soil during rain events.</p><p>Producers commonly use preemergence and postemergence measures to manage or control velvetleaf infestations. Detecting and eliminating the plant before seeding is vital because of the long-term dormancy of the seeds and the future problems they may cause. 
Therefore, field managers need additional techniques besides the common field survey for detecting velvetleaf infestation in crop fields.</p><p>Remote sensing technology has gained popularity as a tool for weed detection in agricultural systems [<xref ref-type="bibr" rid="scirp.61884-ref2">2</xref>] - [<xref ref-type="bibr" rid="scirp.61884-ref8">8</xref>]. The technology involves using ground, airborne, or satellite-borne sensors to obtain light reflectance measurements of plant leaves and canopies to differentiate between weed and crop plants. Detecting weeds with remote sensing technologies requires that differences in spectral reflectance exist between weeds and their environment and that the spatial and spectral resolution of remote sensing equipment is sufficient to detect these differences [<xref ref-type="bibr" rid="scirp.61884-ref9">9</xref>].</p><p>Soybean weed discrimination has been the focus of several remote sensing studies including velvetleaf as one of the weeds of interest. Reference [<xref ref-type="bibr" rid="scirp.61884-ref10">10</xref>] determined from statistical analysis of multispectral data spanning the visible to near infrared region of the light spectrum that weed-free soybean plots could be distinguished from soybean plus velvetleaf plots, soybean plus mixed weed plots, and soybean plus grass plots. The separation only occurred with red/infrared ratios. None of the soybean plus weed plots, however, could be distinguished from each other with single bands or red/infrared ratios.</p><p>Reference [<xref ref-type="bibr" rid="scirp.61884-ref11">11</xref>] obtained mixed results differentiating soybean from velvetleaf and foxtail (Setaria faberi Herrm.) in a controlled experiment. At one study site, they reported a classification error less than 17% for the weeds; at the other study site they achieved classification errors of 17% and 39% for foxtail and velvetleaf, respectively. 
Their study focused on using airborne multispectral imagery collected within the visible green (520 to 600 nm), visible red (630 to 690 nm), and near infrared (760 to 900 nm) wavebands. They concluded that if differentiating among weed species was not required by the weed management program, then remote sensing techniques had good potential to differentiate weeds from crops.</p><p>Reference [<xref ref-type="bibr" rid="scirp.61884-ref4">4</xref>] demonstrated that a single decision tree approach based on the classification and regression technique could use vegetation indices as input to discriminate between corn, corn and velvetleaf, corn and a mixture of various grass species, corn and a mixture of random predominant weed species, soybean, soybean and velvetleaf, soybean and a mixture of various grass species, velvetleaf, mixed grass, and mixtures of random predominant weed species. The classification success rate was 85% &#177; 6%. The study focused on using twenty-four narrowband multispectral bands within the visible and near infrared regions of the light spectrum. From those bands, sixty-five normalized difference vegetation index bands were created and were used as input for classification. Accurate results were achieved; however, the sample size was small at only three plots per treatment.</p><p>Based on the above studies, more information is needed on the potential of using remote sensing technology for soybean weed discrimination, especially in the case of velvetleaf. Currently, information is lacking on how the wavebands of different multispectral systems compare for soybean velvetleaf discrimination. Additionally, no information exists on including shortwave infrared spectral data to discriminate soybeans from velvetleaf. The shortwave infrared region of the light spectrum (1300 - 2500 nm) is sensitive to the water content in plants [<xref ref-type="bibr" rid="scirp.61884-ref12">12</xref>]. 
Finally, no information is available on the role that soybean variety may play in differentiating it from velvetleaf.</p><p>Another key aspect of using remote sensing technology is the computer algorithm employed to process the data. The success or failure of using the technology is affected by the algorithm selected to analyze the data. In this study, it is proposed to use the random forest machine learner for soybean velvetleaf discrimination. Random forest has gained popularity as a tool to use for classification problems because it is fully automated, and users have the ability to design powerful models with little experience in using the machine learner. Random forest has been ranked as one of the best learners to employ for classification and regression problems [<xref ref-type="bibr" rid="scirp.61884-ref13">13</xref>] . Researchers have successfully used it in genetics, clinical medicine, bioinformatics, agriculture, and remote sensing applications [<xref ref-type="bibr" rid="scirp.61884-ref14">14</xref>] - [<xref ref-type="bibr" rid="scirp.61884-ref17">17</xref>] .</p><p>Random forest is an ensemble learning method based on the principle that a group of “weak learners” can come together to develop a “strong learner” [<xref ref-type="bibr" rid="scirp.61884-ref18">18</xref>] . Thus, it uses multiple decision trees to make a consensus prediction, hence the name random forest. Each decision tree in the “so-called forest” is derived from a bootstrap sample (i.e., a percentage of the original data is selected for training, and the non-selected data are used for testing) of the original data (sampling with replacement). The splitting of each tree node is determined by the Gini criterion (i.e., a measurement of node purity). For the splitting process, the algorithm selects a subset of the predictor variables at each node and then the best-splitting variable is chosen from that subset. 
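As a brief illustration of the Gini criterion (the notation below is introduced only for this example and does not come from the cited sources), the impurity of a node whose samples fall into class <italic>k</italic> with proportion <italic>p<sub>k</sub></italic> is commonly written as <disp-formula><tex-math><![CDATA[G = 1 - \sum_{k} p_{k}^{2}, \qquad G_{\text{mixed}} = 1 - (0.5^{2} + 0.5^{2}) = 0.5, \qquad G_{\text{pure}} = 1 - 1^{2} = 0]]></tex-math></disp-formula> where the mixed case is a hypothetical two-class node holding equal numbers of velvetleaf and soybean samples and the pure case is a node holding a single class. A candidate split is typically scored by how much it lowers the sample-weighted impurity of the resulting child nodes. 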
Samples not selected in the bootstrap process for a tree (i.e., approximately 36.8% of the original samples), known as “out-of-bag” (OOB) samples, are used to test the accuracy of the classifier. Random forest assigns an OOB sample to a class by using the decision trees in which the sample was OOB. The votes of each tree are tallied, and the OOB sample is assigned to the class receiving the most votes. Unlike many other machine learners, the random forest algorithm does not need an independent test set because the OOB samples serve as the test set [<xref ref-type="bibr" rid="scirp.61884-ref18">18</xref>]. Random forest also provides a variable importance measure representing the contribution of each predictor variable to the model.</p><p>Currently, no information is available on using leaf multispectral reflectance data as input into random forest for soybean velvetleaf discrimination. The objective of this investigation was to evaluate leaf multispectral reflectance data as input into the random forest machine learner to differentiate velvetleaf from soybean. Specifically, the study focused on evaluating multispectral data mimicking the spectral bands of satellite sensors to discriminate velvetleaf from two soybean varieties. Spectral bands of satellite sensors were chosen because the bands are strategically placed in different regions of the light spectrum for land cover mapping, thus providing different spectral band combinations for the model to test for separating velvetleaf from soybean.</p></sec><sec id="s2"><title>2. Materials and Methods</title><sec id="s2_1"><title>2.1. Plant Descriptions</title><p>Two Progeny (P) brand LibertyLink (LL) soybean varieties (P4928LL and P5460LL, Progeny Ag Products, Wynne, Arkansas) and non-glyphosate resistant velvetleaf (United States Department of Agriculture, Agricultural Research Service, Stoneville, MS) were grown for the study. 
All three plants are pubescent, with gray, light tawny, and white hairs on soybean P4928LL, soybean P5460LL, and velvetleaf, respectively. Soybean P4928LL is characterized as having an indeterminate growth habit (i.e., a continuation of vegetative growth after flowering) and soybean P5460LL as having a determinate growth habit (i.e., vegetative growth completed prior to flowering). The maturity groups assigned to soybean P4928LL and soybean P5460LL are 4.9 and 5.4, respectively.</p></sec><sec id="s2_2"><title>2.2. Greenhouse Experiment</title><p>The study was conducted at the United States Department of Agriculture, Agricultural Research Service, Stoneville, MS facility. Data were collected from two separate greenhouse experiments initiated on June 13, 2014, and August 28, 2014. Soybean and velvetleaf seeds were sown in plugs containing commercial potting mix (Pro-Mix, Ultimate Potting Mix, Quakertown, Pennsylvania). Ten days after germination, thirty plants of each soybean variety and of the weed species were transplanted to individual 1 L pots filled with the commercial potting mix. Plants were watered at three- to four-day intervals. The potting mix contained a slow-release nitrogen, phosphorus, and potassium fertilizer. The plants were grown at a temperature and photoperiod of 26.6˚C and 14 h, respectively.</p></sec><sec id="s2_3"><title>2.3. Data Collection</title><p>Leaf reflectance measurements were obtained with a full range hyperspectral spectroradiometer (FieldSpec 3, PANalytical Boulder, Boulder, CO). The instrument’s fiber optic was attached to a plant probe (PANalytical Boulder, Boulder, CO) equipped with a light source. The plant probe has a 1 cm field of view. A leaf clip (PANalytical Boulder, Boulder, CO) was fastened to the contact probe. This device has a trigger lock/release gripping system designed to hold the leaf in place without removing it from the plant or causing damage to the plant. 
The leaf clip has a two-sided rotating head. One side of the head contains a black panel face, and the other side has a white panel face. The black and white panels are ideal for reflectance and transmittance measurements, respectively. The former was employed in this study.</p><p>The spectroradiometer obtained continuous spectra in the range of 350 - 2500 nm. Its sampling interval and spectral resolution were 1.4 nm and 3 nm, respectively, within the 350 nm to 1000 nm spectral range. The sampling interval and spectral resolution were 2 nm and 10 nm, respectively, within the 1000 nm to 2500 nm spectral range. The proprietary software operating the instrument resampled the reflectance data to 1 nm wavelengths.</p><p>Reflectance measurements were collected from the most recently matured leaf of each plant. Soybean has a trifoliate leaf; therefore, the center leaflet of the most recently matured leaf was chosen for data collection. At the selected sample spot of each plant leaf, each reflectance measurement was the average of fifteen readings. Leaf reflectance measurements were obtained on June 30, 2014, and September 17, 2014, for the first and second experiments, respectively. Because velvetleaf must be identified and treated prior to seeding, measurements were obtained for all plants during the vegetative growth stage. The instrument was calibrated with a white Spectralon panel (PANalytical Boulder, Boulder, CO) at 15-minute intervals.</p></sec><sec id="s2_4"><title>2.4. Development of Multispectral Bands</title><p>The hyperspectral reflectance measurements of the soybean and velvetleaf leaves were resampled to four multispectral datasets (<xref ref-type="table" rid="table1">Table 1</xref>), referred to as 5 Band, 7 Band, 8 Band, and 16 Band. 
The green, red, and near infrared bands of the 5 Band dataset were replicates of the green, red, and near infrared bands obtained by LANDSAT 1 - 4 multispectral scanners [<xref ref-type="bibr" rid="scirp.61884-ref19">19</xref>]. The blue band was added to the 5 Band dataset to represent a broad region of the blue spectrum. Also, it represented blue light reflectance data obtained by many commercial handheld cameras. The 7 Band, 8 Band, and 16 Band datasets simulated the spectral bands of the LANDSAT 8 Operational Land Imager, WorldView 2 sensors, and WorldView 3 sensors, respectively [<xref ref-type="bibr" rid="scirp.61884-ref19">19</xref>] - [<xref ref-type="bibr" rid="scirp.61884-ref21">21</xref>]. The datasets were unique because they represented different regions of the light spectrum and provided spectral resolutions ranging from 20 to 300 nm. The multispectral bands were created by resampling the original hyperspectral bands using a Gaussian distribution function and the lower and upper bounds of each satellite sensor’s spectral bands. The resampled spectral data were created with the hsdar package [<xref ref-type="bibr" rid="scirp.61884-ref22">22</xref>] of the R software [R version 3.2.0 (April 16, 2015) Full of Ingredients].</p></sec><sec id="s2_5"><title>2.5. Classification Model Development</title><p>The conditional inference version of random forest (cforest) was used to create the models evaluated in this study. Reference [<xref ref-type="bibr" rid="scirp.61884-ref15">15</xref>] recommended using cforest instead of the original version of random forest if the predictor variables were highly correlated. Some of the variables were highly correlated in each dataset, thus justifying the use of cforest for model creation. Strong correlation between variables biases the variable importance rankings provided by random forest for classification or regression problems. 
The cforest implementation of random forest was designed to better handle correlation among variables, thus providing more accurate and unbiased variable importance rankings [<xref ref-type="bibr" rid="scirp.61884-ref15">15</xref>]. The cforest technique utilizes conditional inference trees as base learners, in contrast to random forest, which employs classification and regression trees as base learners [<xref ref-type="bibr" rid="scirp.61884-ref15">15</xref>] [<xref ref-type="bibr" rid="scirp.61884-ref23">23</xref>] [<xref ref-type="bibr" rid="scirp.61884-ref24">24</xref>].</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Spectral band descriptions and wavelengths of the multispectral datasets used in this study</title></caption><table><tbody><thead><tr><th align="center" valign="middle"  rowspan="2"  >Spectral Band</th><th align="center" valign="middle"  colspan="4"  >Wavelengths of Each Dataset</th></tr></thead><tr><td align="center" valign="middle" >5 Band<sup>a</sup></td><td align="center" valign="middle" >7 Band</td><td align="center" valign="middle" >8 Band</td><td align="center" valign="middle" >16 Band</td></tr><tr><td align="center" valign="middle" >Coastal</td><td align="center" valign="middle" ></td><td align="center" valign="middle" >430 - 450 nm</td><td align="center" valign="middle" >400 - 450 nm</td><td align="center" valign="middle" >400 - 450 nm</td></tr><tr><td align="center" valign="middle" >Blue</td><td align="center" valign="middle" >400 - 500 nm</td><td align="center" valign="middle" >450 - 510 nm</td><td align="center" valign="middle" >450 - 510 nm</td><td align="center" valign="middle" >450 - 510 nm</td></tr><tr><td align="center" valign="middle" >Green</td><td align="center" valign="middle" >500 - 600 nm</td><td align="center" valign="middle" >530 - 590 nm</td><td align="center" valign="middle" >510 - 580 nm</td><td align="center" valign="middle" >510 - 580 nm</td></tr><tr><td align="center" valign="middle" >Yellow</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >585 - 625 nm</td><td align="center" valign="middle" >585 - 625 nm</td></tr><tr><td align="center" valign="middle" >Red</td><td align="center" valign="middle" >600 - 700 nm</td><td align="center" valign="middle" >640 - 670 nm</td><td align="center" valign="middle" >630 - 690 nm</td><td align="center" valign="middle" >630 - 690 nm</td></tr><tr><td align="center" valign="middle" >Red-edge</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >705 - 745 nm</td><td align="center" valign="middle" >705 - 745 nm</td></tr><tr><td align="center" valign="middle" >Near infrared 1</td><td align="center" valign="middle" >700 - 800 nm</td><td align="center" valign="middle" >850 - 880 nm</td><td align="center" valign="middle" >770 - 895 nm</td><td align="center" valign="middle" >770 - 895 nm</td></tr><tr><td align="center" valign="middle" >Near infrared 2</td><td align="center" valign="middle" >800 - 1100 nm</td><td align="center" valign="middle" ></td><td align="center" valign="middle" >860 - 1040 nm</td><td align="center" valign="middle" >860 - 1040 nm</td></tr><tr><td align="center" valign="middle" >Shortwave infrared 1</td><td align="center" valign="middle" ></td><td align="center" valign="middle" >1570 - 1650 nm</td><td align="center" valign="middle" ></td><td align="center" valign="middle" >1195 - 1225 nm</td></tr><tr><td align="center" valign="middle" >Shortwave infrared 2</td><td align="center" valign="middle" ></td><td align="center" valign="middle" >2110 - 2290 nm</td><td align="center" valign="middle" ></td><td align="center" valign="middle" >1550 - 1590 nm</td></tr><tr><td align="center" valign="middle" >Shortwave infrared 3</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >1640 - 1680 nm</td></tr><tr><td align="center" valign="middle" >Shortwave infrared 4</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >1710 - 1750 nm</td></tr><tr><td align="center" valign="middle" >Shortwave infrared 5</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >2145 - 2185 nm</td></tr><tr><td align="center" valign="middle" >Shortwave infrared 6</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >2185 - 2225 nm</td></tr><tr><td align="center" valign="middle" >Shortwave infrared 7</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >2235 - 2285 nm</td></tr><tr><td align="center" valign="middle" >Shortwave infrared 8</td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >2295 - 2365 nm</td></tr></tbody></table></table-wrap><p><sup>a</sup>5 Band simulates LANDSAT 3 spectral bands plus an additional blue band, 7 Band simulates LANDSAT 8 spectral bands, 8 Band simulates WorldView 2 spectral bands, and 16 Band simulates WorldView 3 spectral bands.</p><p>Furthermore, instead of using bootstrap samples to construct its decision trees, cforest utilizes subsampling without replacement for constructing unbiased decision trees for the forest. 
Finally, the cforest algorithm uses the conditional permutation scheme described in Reference [<xref ref-type="bibr" rid="scirp.61884-ref15">15</xref>] to determine the variable importance ranking.</p><p>The number of predictor variables randomly sampled as candidates at each split of a tree (mtry) and the number of trees used to create the model (ntree) were the two parameters that needed to be set before completing the classification. For this study, the default mtry value of 5 was used for each dataset. The default ntree value of 500 was employed as the starting point and was adjusted accordingly to obtain consistent variable importance rankings.</p><p>The following procedure was used to test the robustness of the models relative to variable importance [<xref ref-type="bibr" rid="scirp.61884-ref15">15</xref>]. A model was created using the default mtry and ntree values, the variable importance rankings were tabulated, and then the model was rerun using the same mtry and ntree values and a different starting seed (i.e., the value used to initialize the random number generator that drives the sampling). The model parameters were accepted if the variable importance ranking was similar between the first and the second runs. If the variable importance rankings were not consistent between runs, then the ntree value was increased by 1000, and the model was retested using the same mtry and seed values. This process was continued until a stable variable importance ranking was obtained.</p></sec><sec id="s2_6"><title>2.6. Accuracy Assessment</title><p>Classification accuracies of the selected models were determined by evaluating the user’s, producer’s, and overall accuracies and kappa coefficient [<xref ref-type="bibr" rid="scirp.61884-ref24">24</xref>]. User’s accuracy represents the percentage of samples predicted as a given class that actually belong to that class. Producer’s accuracy characterizes the percentage of reference samples of a given class that were correctly identified. 
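</p><p>The accuracy measures used in this study (user’s, producer’s, and overall accuracy, and the kappa coefficient) can all be computed from a confusion matrix. The following is a minimal sketch in Python, offered only as an illustration; the study itself used the party package of the R software. The function name accuracy_measures and the sample counts are hypothetical: the counts are not reported in the text and were chosen so that, for thirty plants per class, the output agrees with the 5 Band, June 30, 2014, values listed in Table 2.</p><preformat>
```python
def accuracy_measures(matrix, labels):
    """Compute accuracy measures from a square confusion matrix.

    matrix[i][j] is the number of samples whose reference class is
    labels[i] and whose predicted class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    reference_totals = [sum(matrix[i]) for i in range(n)]
    predicted_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    correct = sum(matrix[k][k] for k in range(n))

    overall = correct / total
    # Producer's accuracy: correct samples / reference samples of that class.
    producers = {labels[k]: matrix[k][k] / reference_totals[k] for k in range(n)}
    # User's accuracy: correct samples / samples predicted as that class.
    users = {labels[k]: matrix[k][k] / predicted_totals[k] for k in range(n)}
    # Agreement expected by chance, from the row and column totals.
    chance = sum(reference_totals[k] * predicted_totals[k] for k in range(n)) / total ** 2
    kappa = (overall - chance) / (1 - chance)
    return {"overall": overall, "users": users, "producers": producers, "kappa": kappa}


# Hypothetical counts (not taken from the article), 30 reference samples per class.
labels = ["velvetleaf", "soybean"]
matrix = [
    [29, 1],  # reference velvetleaf: 29 predicted velvetleaf, 1 predicted soybean
    [2, 28],  # reference soybean: 2 predicted velvetleaf, 28 predicted soybean
]
result = accuracy_measures(matrix, labels)

print("Overall accuracy: %.1f%%" % (100 * result["overall"]))
print("Kappa coefficient: %.3f" % result["kappa"])
for name in labels:
    print("%s: user's %.1f%%, producer's %.1f%%" % (
        name, 100 * result["users"][name], 100 * result["producers"][name]))
```
</preformat><p>With these assumed counts, the sketch gives an overall accuracy of 95.0%, a kappa coefficient of 0.900, user’s accuracies of 93.5% (velvetleaf) and 96.6% (soybean), and producer’s accuracies of 96.7% (velvetleaf) and 93.3% (soybean).</p><p>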
The overall accuracy is the total number of correctly classified samples divided by the total number of samples. The kappa coefficient measures how far the observed agreement between the reference data and the predicted classes exceeds the agreement expected by chance. The accuracy values were tabulated from the “out-of-bag” samples, those samples not used to train the model. Model development and evaluation were carried out with the party package of the R software [<xref ref-type="bibr" rid="scirp.61884-ref25">25</xref>] - [<xref ref-type="bibr" rid="scirp.61884-ref27">27</xref>].</p></sec></sec><sec id="s3"><title>3. Results</title><sec id="s3_1"><title>3.1. Accuracy Assessment</title><p>The accuracy assessment results of the random forest classification for the June 30, 2014, dataset are summarized in <xref ref-type="table" rid="table2">Table 2</xref> for the velvetleaf soybean P4928LL classification. Overall, user’s, and producer’s accuracies greater than 90% were achieved for all of the multispectral datasets. The highest overall classification accuracy of 96.7% was obtained with the 7 Band, 8 Band, and 16 Band datasets; the lowest overall classification accuracy of 95% occurred for the 5 Band dataset. The same ranking order of the datasets was observed for the kappa coefficients (<xref ref-type="table" rid="table2">Table 2</xref>). The user’s and producer’s accuracies ranged from 93.3% to 100%. For the velvetleaf class, a tie occurred between the 7 Band and the 16 Band multispectral datasets for the highest user’s accuracy, whereas the 8 Band dataset ranked best in the producer’s accuracy (<xref ref-type="table" rid="table2">Table 2</xref>). 
The 8 Band, and the 7 Band and 16 Band multispectral datasets achieved the greatest user’s and producer’s accuracies, respectively, for the soybean P4928LL class (<xref ref-type="table" rid="table2">Table 2</xref>).</p><p>The random forest classification results of the velvetleaf soybean P4928LL classes are tabulated in <xref ref-type="table" rid="table2">Table 2</xref> for the September 17, 2014, multispectral datasets. The 7 Band dataset obtained the highest measurement accuracies with 93.3%, 0.867, 90.6%, 96.4%, 96.7%, and 90.0% for overall accuracy, kappa coefficient, velvetleaf user’s accuracy, soybean P4928LL user’s accuracy, velvetleaf producer’s accuracy, and soybean P4928LL producer’s accuracy, respectively. The other multispectral datasets were tied for second in the measurement accuracies.</p><p>Overall, user’s, and producer’s accuracies and the kappa coefficients are presented in <xref ref-type="table" rid="table3">Table 3</xref> for the June 30, 2014, velvetleaf soybean P5460LL classification. The 7 Band, 8 Band, and 16 Band datasets ranked best in all accuracy categories. Their user’s, producer’s, and overall accuracies ranged from 96.7% to 100%, and the kappa coefficients were 0.967. The 5 Band dataset obtained the lowest accuracies, with user’s, producer’s, and overall accuracies ranging from 93.5% to 96.7%. The kappa value was 0.9.</p><p>The September 17, 2014, dataset for the velvetleaf soybean P5460LL classification indicated that the 16 Band dataset model was ranked or tied for first in all of the accuracy categories (<xref ref-type="table" rid="table3">Table 3</xref>). The 7 Band and 8 Band dataset models were tied for first for the soybean P5460LL producer’s accuracy. They obtained the second highest accuracies for the other categories. 
The 5 Band dataset ranked last in all of the accuracy assessment categories.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Accuracy assessment of the velvetleaf versus soybean P4928LL classification based on leaf multispectral data input into the random forest classifier</title></caption><table><tbody><thead><tr><th align="center" valign="middle"  rowspan="2"  >Classification</th><th align="center" valign="middle"  rowspan="2"  >Date</th><th align="center" valign="middle"  rowspan="2"  >Accuracy Measurement</th><th align="center" valign="middle"  colspan="4"  >Multispectral Dataset<sup>a</sup></th></tr></thead><tr><td align="center" valign="middle" >5 Band</td><td align="center" valign="middle" >7 Band</td><td align="center" valign="middle" >8 Band</td><td align="center" valign="middle" >16 Band</td></tr><tr><td align="center" valign="middle" >Velvetleaf-soybean P4928LL</td><td align="center" valign="middle" >June 30, 2014</td><td align="center" valign="middle" >User’s accuracy velvetleaf</td><td align="center" valign="middle" >93.5%</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" >93.8%</td><td align="center" valign="middle" >96.7%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >User’s accuracy soybean P4928LL</td><td align="center" valign="middle" >96.6%</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" >100%</td><td align="center" valign="middle" >96.7%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Producer’s accuracy velvetleaf</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" >100%</td><td align="center" valign="middle" >96.7%</td></tr><tr><td align="center" valign="middle" 
></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Producer’s accuracy soybean P4928LL</td><td align="center" valign="middle" >93.3%</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" >93.3%</td><td align="center" valign="middle" >96.7%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Overall accuracy</td><td align="center" valign="middle" >95.0%</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" >96.7%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Kappa coefficient</td><td align="center" valign="middle" >0.900</td><td align="center" valign="middle" >0.933</td><td align="center" valign="middle" >0.933</td><td align="center" valign="middle" >0.933</td></tr><tr><td align="center" valign="middle" >Velvetleaf-soybean P4928LL</td><td align="center" valign="middle" >September 17, 2014</td><td align="center" valign="middle" >User’s accuracy velvetleaf</td><td align="center" valign="middle" >90.3%</td><td align="center" valign="middle" >90.6%</td><td align="center" valign="middle" >90.3%</td><td align="center" valign="middle" >90.3%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >User’s accuracy soybean P4928LL</td><td align="center" valign="middle" >93.1%</td><td align="center" valign="middle" >96.4%</td><td align="center" valign="middle" >93.1%</td><td align="center" valign="middle" >93.1%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Producer’s accuracy velvetleaf</td><td align="center" valign="middle" >93.3%</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" 
>93.3%</td><td align="center" valign="middle" >93.3%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Producer’s accuracy soybean P4928LL</td><td align="center" valign="middle" >90.0%</td><td align="center" valign="middle" >90.0%</td><td align="center" valign="middle" >90.0%</td><td align="center" valign="middle" >90.0%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Overall accuracy</td><td align="center" valign="middle" >91.7%</td><td align="center" valign="middle" >93.3%</td><td align="center" valign="middle" >91.7%</td><td align="center" valign="middle" >91.7%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Kappa coefficient</td><td align="center" valign="middle" >0.833</td><td align="center" valign="middle" >0.867</td><td align="center" valign="middle" >0.833</td><td align="center" valign="middle" >0.833</td></tr></tbody></table></table-wrap><p><sup>a</sup>Refer to <xref ref-type="table" rid="table1">Table 1</xref> for the spectral band designations of the multispectral datasets.</p><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Accuracy assessment of the velvetleaf versus soybean P5460LL classification based on leaf multispectral data input into the random forest classifier</title></caption><table><tbody><thead><tr><th align="center" valign="middle"  rowspan="2"  >Classification</th><th align="center" valign="middle"  rowspan="2"  >Date</th><th align="center" valign="middle"  rowspan="2"  >Accuracy Measurement</th><th align="center" valign="middle"  colspan="4"  >Multispectral Dataset<sup>a</sup></th></tr></thead><tr><td align="center" valign="middle" >5 Band</td><td align="center" valign="middle" >7 Band</td><td align="center" valign="middle" >8 
Band</td><td align="center" valign="middle" >16 Band</td></tr><tr><td align="center" valign="middle" >Velvetleaf-soybean P5460LL</td><td align="center" valign="middle" >June 30, 2014</td><td align="center" valign="middle" >User’s accuracy velvetleaf</td><td align="center" valign="middle" >93.5%</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" >96.7%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >User’s accuracy soybean P5460LL</td><td align="center" valign="middle" >96.6%</td><td align="center" valign="middle" >100%</td><td align="center" valign="middle" >100%</td><td align="center" valign="middle" >100%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Producer’s accuracy velvetleaf</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" >100%</td><td align="center" valign="middle" >100%</td><td align="center" valign="middle" >100%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Producer’s accuracy soybean P5460LL</td><td align="center" valign="middle" >93.3%</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" >96.7%</td><td align="center" valign="middle" >96.7%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Overall accuracy</td><td align="center" valign="middle" >95.0%</td><td align="center" valign="middle" >98.3%</td><td align="center" valign="middle" >98.3%</td><td align="center" valign="middle" >98.3%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Kappa coefficient</td><td align="center" valign="middle" 
>0.900</td><td align="center" valign="middle" >0.967</td><td align="center" valign="middle" >0.967</td><td align="center" valign="middle" >0.967</td></tr><tr><td align="center" valign="middle" >Velvetleaf-soybean P5460LL</td><td align="center" valign="middle" >September 17, 2014</td><td align="center" valign="middle" >User’s accuracy velvetleaf</td><td align="center" valign="middle" >87.1%</td><td align="center" valign="middle" >93.1%</td><td align="center" valign="middle" >93.1%</td><td align="center" valign="middle" >93.3%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >User’s accuracy soybean P5460LL</td><td align="center" valign="middle" >89.7%</td><td align="center" valign="middle" >90.3%</td><td align="center" valign="middle" >90.3%</td><td align="center" valign="middle" >93.3%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Producer’s accuracy velvetleaf</td><td align="center" valign="middle" >90.0%</td><td align="center" valign="middle" >90.0%</td><td align="center" valign="middle" >90.0%</td><td align="center" valign="middle" >93.3%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Producer’s accuracy soybean P5460LL</td><td align="center" valign="middle" >86.7%</td><td align="center" valign="middle" >93.3%</td><td align="center" valign="middle" >93.3%</td><td align="center" valign="middle" >93.3%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" ></td><td align="center" valign="middle" >Overall accuracy</td><td align="center" valign="middle" >88.3%</td><td align="center" valign="middle" >91.7%</td><td align="center" valign="middle" >91.7%</td><td align="center" valign="middle" >93.3%</td></tr><tr><td align="center" valign="middle" ></td><td align="center" valign="middle" 
></td><td align="center" valign="middle" >Kappa coefficient</td><td align="center" valign="middle" >0.767</td><td align="center" valign="middle" >0.833</td><td align="center" valign="middle" >0.833</td><td align="center" valign="middle" >0.867</td></tr></tbody></table></table-wrap><p><sup>a</sup>Refer to <xref ref-type="table" rid="table1">Table 1</xref> for the spectral band designations of the multispectral datasets.</p></sec><sec id="s3_2"><title>3.2. Model Parameters</title><p>For fourteen of the sixteen classification models, the default mtry and ntree values were adequate for obtaining stable variable importance readings (<xref ref-type="table" rid="table4">Table 4</xref>). The two exceptions, both for September 17, 2014, were the random forest models used for the velvetleaf versus soybean P4928LL classification based on the 8 Band dataset and the velvetleaf versus soybean P5460LL classification based on the 16 Band dataset, which required 3500 and 4500 trees, respectively.</p></sec><sec id="s3_3"><title>3.3. Variable Importance</title><p>The variable importance rankings of the random forest models used for the June 30, 2014, velvetleaf versus soybean P4928LL classification were as follows (<xref ref-type="fig" rid="fig1">Figure 1</xref>). The green (G) and near infrared two (NIR2) spectral bands were relevant to the model and had similar variable importance scores for the 5 Band dataset. The NIR1, G, and shortwave infrared one (SWIR1) spectral bands were important to the 7 Band dataset model, although their variable importance scores differed noticeably. The NIR1, NIR2, G, and yellow (Y) spectral bands were needed by the model for the 8 Band dataset; the NIR2 and G spectral bands had similar variable importance scores and appeared in the top tier of variable importance scores. 
The NIR1 and Y spectral bands had variable importance scores similar to each other and appeared in the second tier of variable importance scores. The SWIR1 to SWIR4, NIR1, NIR2, G, and Y spectral bands were the most important variables in the 16 Band dataset model. These spectral bands were grouped into six tiers: SWIR1 (tier one), NIR2 (tier two), G (tier three), NIR1 and Y (tier four), SWIR3 (tier five), and SWIR2 and SWIR4 (tier six).</p><p>Variable importance rankings of the random forest models are shown in <xref ref-type="fig" rid="fig2">Figure 2</xref> for the September 17, 2014, velvetleaf versus soybean P4928LL classification. The G spectral band was the most important variable in the 5 Band dataset model. The G and blue (B) spectral bands were needed by the 7 Band dataset model, with the G band ranked best. The G, Y, and B spectral bands were ranked most useful by the random forest model for the classification with the 8 Band dataset, and their importance scores differed noticeably. Eight spectral bands, G, Y, B, NIR1, NIR2, red (R), SWIR1, and coastal (CA), were ranked important to the 16 Band dataset model. The G spectral band ranked first, followed by the Y, B, NIR2, R, NIR1, SWIR1, and CA spectral bands.</p><p><xref ref-type="fig" rid="fig3">Figure 3</xref> illustrates the variable importance rankings of random forest models used in the classification of the June 30, 2014, velvetleaf and soybean P5460LL classes. The G and NIR2 spectral bands were ranked most important to the model for the 5 Band multispectral dataset. Distinct differences were observed in the scores, with the G band ranked the most important. Essential spectral bands for the 7 Band dataset model in descending order were G, NIR1, SWIR1, and R. The 8 Band dataset random forest model selected the G, Y, NIR1, and NIR2 spectral bands as valuable variables for the classification; the rankings appeared in four distinct tiers: G (tier one), Y (tier two), NIR2 (tier three), and NIR1 (tier four). Five tiers were observed among the most important variables for the 16 Band dataset: the G spectral band in tier one, the Y spectral band in tier two, the NIR1 and NIR2 spectral bands in tier three, the SWIR1 and SWIR3 spectral bands in tier four, and the SWIR4 spectral band in tier five.</p><fig id="fig1"  position="float"><label><xref ref-type="fig" rid="fig1">Figure 1</xref></label><caption><title> Variable importance rankings per multispectral dataset derived by the random forest model used for the velvetleaf and soybean P4928LL classification, June 30, 2014. CA = coastal, B = blue, G = green, Y = yellow, R = red, RE = red-edge, NIR = near infrared, and SWIR = shortwave infrared</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/20-2602354x6.png"/></fig><fig id="fig2"  position="float"><label><xref ref-type="fig" rid="fig2">Figure 2</xref></label><caption><title> Variable importance rankings per multispectral dataset derived by the random forest model used for the velvetleaf and soybean P4928LL classification, September 17, 2014. CA = coastal, B = blue, G = green, Y = yellow, R = red, RE = red-edge, NIR = near infrared, and SWIR = shortwave infrared</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/20-2602354x7.png"/></fig><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title> Random forest model parameters used with the multispectral datasets to distinguish velvetleaf from two soybean varieties</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Classification</th><th align="center" valign="middle" >Dataset<sup>a</sup></th><th align="center" valign="middle" >mtry<sup>b</sup></th><th align="center" valign="middle" >Ntrees (June 30, 2014)</th><th align="center" valign="middle" >Ntrees (September 17, 2014)</th></tr></thead><tr><td align="center" valign="middle" >Velvetleaf-soybean P4928LL</td><td align="center" valign="middle" >5 Band</td><td align="center" valign="middle" >5</td><td align="center" valign="middle" >500</td><td align="center" valign="middle" >500</td></tr><tr><td align="center" valign="middle" >Velvetleaf-soybean P4928LL</td><td align="center" valign="middle" >7 Band</td><td align="center" valign="middle" >5</td><td align="center" valign="middle" >500</td><td align="center" valign="middle" >500</td></tr><tr><td align="center" valign="middle" >Velvetleaf-soybean P4928LL</td><td align="center" valign="middle" >8 Band</td><td align="center" valign="middle" >5</td><td align="center" valign="middle" >500</td><td align="center" valign="middle" >3500</td></tr><tr><td align="center" valign="middle" >Velvetleaf-soybean P4928LL</td><td align="center" valign="middle" >16 Band</td><td align="center" valign="middle" >5</td><td align="center" valign="middle" >500</td><td align="center" valign="middle" >500</td></tr><tr><td align="center" valign="middle" >Velvetleaf-soybean P5460LL</td><td align="center" valign="middle" >5 Band</td><td align="center" valign="middle" >5</td><td align="center" valign="middle" >500</td><td align="center" valign="middle" >500</td></tr><tr><td align="center" valign="middle" >Velvetleaf-soybean P5460LL</td><td align="center" valign="middle" >7 Band</td><td align="center" valign="middle" >5</td><td align="center" valign="middle" >500</td><td align="center" valign="middle" >500</td></tr><tr><td align="center" valign="middle" >Velvetleaf-soybean P5460LL</td><td align="center" valign="middle" >8 Band</td><td align="center" valign="middle" >5</td><td align="center" valign="middle" >500</td><td align="center" valign="middle" >500</td></tr><tr><td align="center" valign="middle" >Velvetleaf-soybean P5460LL</td><td align="center" valign="middle" >16 Band</td><td align="center" valign="middle" >5</td><td align="center" valign="middle" >500</td><td align="center" valign="middle" >4500</td></tr></tbody></table></table-wrap><p><sup>a</sup>Refer to <xref ref-type="table" rid="table1">Table 1</xref> for the spectral band designations of the multispectral datasets. <sup>b</sup>mtry = number of randomly preselected variables; ntrees = number of trees used in the classification.</p><fig id="fig3"  position="float"><label><xref ref-type="fig" rid="fig3">Figure 3</xref></label><caption><title> Variable importance rankings per multispectral dataset derived by the random forest model used for the velvetleaf and soybean P5460LL classification, June 30, 2014. CA = coastal, B = blue, G = green, Y = yellow, R = red, RE = red-edge, NIR = near infrared, and SWIR = shortwave infrared</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/20-2602354x8.png"/></fig><p>Variable importance scores of the random forest models are shown in <xref ref-type="fig" rid="fig4">Figure 4</xref> for the September 17, 2014, velvetleaf versus soybean P5460LL classification. The NIR2 and G spectral bands were the most useful to the model when using the 5 Band dataset, and their scores were nearly identical. The spectral bands critical to the classification model using the 7 Band dataset were, in descending order, NIR1, G, and B. There was an obvious difference in their variable importance scores. 
Four spectral bands were relevant to the model using the 8 Band dataset: NIR1, NIR2, G, and Y. NIR1, NIR2 and G, and Y spectral bands appeared in the first, second, and third tiers of the rankings, respectively. Eight spectral bands were relevant to the model using the 16 Band dataset, and their rankings in descending order were NIR2, NIR1, G, Y, SWIR1, B, SWIR8, and SWIR7.</p></sec></sec><sec id="s4"><title>4. Discussion</title><p>The objective of this study was to evaluate leaf multispectral reflectance data as input into the random forest classification algorithm to differentiate soybean from velvetleaf, an invasive weed affecting soybean production throughout the United States and eastern provinces of Canada. The study emphasized using different multispectral band combinations as input into the algorithm to differentiate velvetleaf from two different soybean varieties. The algorithm achieved overall, user’s, and producer’s accuracies that were greater than 85% for velvetleaf soybean discrimination (<xref ref-type="table" rid="table2">Table 2</xref> and <xref ref-type="table" rid="table3">Table 3</xref>), which was comparable to soybean weed discrimination studies using statistical methods [<xref ref-type="bibr" rid="scirp.61884-ref11">11</xref>] and single decision trees [<xref ref-type="bibr" rid="scirp.61884-ref4">4</xref>] to classify airborne imagery. Kappa values indicated that an almost perfect agreement (i.e., kappa value range 0.81 - 1.0) to substantial agreement (i.e., kappa value range 0.61 - 0.80) occurred between the reference data and predicted data (<xref ref-type="table" rid="table2">Table 2</xref> and <xref ref-type="table" rid="table3">Table 3</xref>). The latter was observed only for the 5 Band dataset for the velvetleaf soybean P5460LL classification occurring on September 17, 2014.</p><p>Generally, for all the datasets, the G and NIR spectral bands were ranked as important variables to the models for discriminating velvetleaf from soybean. 
Plant leaf reflectance and absorption of green light are influenced by leaf chlorophyll content [<xref ref-type="bibr" rid="scirp.61884-ref12">12</xref>], which may have been responsible for soybean velvetleaf differentiation. The intercellular spaces of plant leaves affect their ability to reflect and absorb near infrared light [<xref ref-type="bibr" rid="scirp.61884-ref12">12</xref>]. Therefore, leaf pigment and internal structure appear to be important components for distinguishing soybean from velvetleaf. Additionally, for the 7 Band and 16 Band datasets, the SWIR bands were important to the models for velvetleaf soybean discrimination; however, the SWIR bands’ importance to a model was date specific. The shortwave infrared reflectance of plant leaves is affected by the water content of the leaf tissues [<xref ref-type="bibr" rid="scirp.61884-ref12">12</xref>]. Furthermore, for the 8 Band and 16 Band datasets, the Y spectral band was consistently ranked as an important variable to the models. Plant leaf reflectance of yellow light is also affected by the chlorophyll content of the leaves.</p><fig id="fig4"  position="float"><label><xref ref-type="fig" rid="fig4">Figure 4</xref></label><caption><title> Variable importance rankings per multispectral dataset derived by the random forest model used for the velvetleaf and soybean P5460LL classification, September 17, 2014. CA = coastal, B = blue, G = green, Y = yellow, R = red, RE = red-edge, NIR = near infrared, and SWIR = shortwave infrared</title></caption><graphic mimetype="image"   position="float"  xlink:type="simple"  xlink:href="http://html.scirp.org/file/20-2602354x9.png"/></fig><p>With the increase in the number of spectral bands, more variables were ranked important to the random forest models (Figures 1-4); however, the increase in the number of bands per se did not always result in an increase in classification accuracy. 
For example, the number of accuracy test results completed for both dates and soybean varieties equals twenty-four. The 7 Band, 16 Band, 8 Band, and 5 Band datasets ranked or tied for the highest accuracies seventeen, sixteen, twelve, and one time, respectively. The differences in overall, user’s, and producer’s accuracies ranged from 0% to 6.6%, with the lowest accuracies occurring 95% of the time for the 5 Band dataset. For the kappa coefficients, the 5 Band model ranked last 100% of the time. The lower classification accuracies observed for the 5 Band dataset were most likely a result of its broader bandwidths (i.e., 100 nm or greater). Also, the findings indicated that reliable accuracies generally can be achieved using the default mtry and ntree values (<xref ref-type="table" rid="table4">Table 4</xref>).</p><p>To put this study into perspective, leaf multispectral reflectance data were used as input into the random forest model for differentiating velvetleaf from the soybean varieties. Leaf reflectance measurements represent pure reflectance measurements. Plant canopy response, in contrast, is affected by leaf angle, leaf positioning in the plant canopy, inter-canopy shadowing, soil background, and intermixing of plant canopies. Those aspects could lead to a different variable importance ranking of the spectral bands for plant canopy studies. Additionally, the study focused on binary classifications of soybean versus velvetleaf. Future studies need to focus on determining the potential of discriminating more than one weed at a time from soybean. Overall, this study provided valuable information on using the machine learning technique and on the influence of using different multispectral band combinations as input into the model for velvetleaf soybean discrimination.</p></sec><sec id="s5"><title>5. 
Conclusion</title><p>This study provided new information on using the random forest algorithm with leaf multispectral reflectance data for differentiating velvetleaf from soybean. It demonstrated that the random forest algorithm could be used with a complement of multispectral datasets to separate velvetleaf from soybean. The best accuracies were achieved with multispectral datasets sensitive to visible (green and yellow spectral bands), near infrared, and shortwave infrared light. Findings support further application of the random forest machine learner along with remotely sensed multispectral data as tools for velvetleaf soybean discrimination, with future implications for site-specific management of velvetleaf.</p></sec><sec id="s6"><title>Acknowledgements and Disclaimer</title><p>The author is grateful to Dr. Vijay Nandula for supplying the velvetleaf seed, Mr. Milton Gaston Jr., Mr. Arrington Smith, Ms. Keysha Hamilton, Mr. David Fisher, Ms. Raven Thompson, and Ms. Keyanna Nealon for their assistance in data collection, and Dr. Ken Fisher and Dr. Chenghai Yang for their critical review of the manuscript. Mention of trade names or commercial products in this report is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the US Department of Agriculture.</p></sec><sec id="s7"><title>Cite this paper</title><p>Reginald S. Fletcher (2015) Testing Leaf Multispectral Reflectance Data as Input into Random Forest to Differentiate Velvetleaf from Soybean. American Journal of Plant Sciences, 6, 3193-3204. doi: 10.4236/ajps.2015.619311</p></sec></body><back><ref-list><title>References</title><ref id="scirp.61884-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Lanini, W.T. and Wertz, B.A. (2015) Velvetleaf. Penn State Extension. 
http://extension.psu.edu/pests/weeds/weed-id/velvetleaf</mixed-citation></ref><ref id="scirp.61884-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Koger, C.H., Bruce, L.M., Shaw, D.R. and Reddy, K.N. (2003) Wavelet Analysis of Hyperspectral Reflectance Data for Detecting Pitted Morning Glory (Ipomoea lacunosa) in Soybean (Glycine max). Remote Sensing of Environment, 86, 108-119. http://dx.doi.org/10.1016/S0034-4257(03)00071-3</mixed-citation></ref><ref id="scirp.61884-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Smith, A.M. and Blackshaw, R.E. (2003) Weed-Crop Discrimination Using Remote Sensing: A Detached Leaf Experiment. Weed Technology, 17, 811-820. http://dx.doi.org/10.1614/WT02-179</mixed-citation></ref><ref id="scirp.61884-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Yang, C.C., Prasher, S.O. and Goel, P.K. (2004) Differentiation of Crop and Weeds by Decision-Tree Analysis of Multi-Spectral Data. Transactions of the ASAE, 47, 873-879. http://dx.doi.org/10.13031/2013.16084</mixed-citation></ref><ref id="scirp.61884-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Iqbal, J., Owens, P.R. and Ali, I. (2006) Application of Remote Sensing Data to Assess Weed Infestation in Cotton. Agricultural Journal, 1, 186-191.</mixed-citation></ref><ref id="scirp.61884-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Gómez-Casero, M.T., Castillejo-González, I.L. and García-Ferrer, A. (2010) Spectral Discrimination of Wild Oat and Canary Grass in Wheat Fields for Less Herbicide Application. Agronomy for Sustainable Development, 30, 689-699. http://dx.doi.org/10.1051/agro/2009052</mixed-citation></ref><ref id="scirp.61884-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Nieuwenhuizen, A.T., Hofstee, J.W., van de Zande, J.C., Meuleman, J. and van Henten, E.J. 
(2010) Classification of Sugar Beet and Volunteer Potato Reflection Spectra with a Neural Network and Statistical Discriminant Analysis to Select Discriminative Wavelengths. Computers and Electronics in Agriculture, 73, 146-153. http://dx.doi.org/10.1016/j.compag.2010.05.008</mixed-citation></ref><ref id="scirp.61884-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">de Castro, A.I., Jurado-Expósito, M., Gómez-Casero, M.T. and López-Granados, F. (2012) Applying Neural Networks to Hyperspectral and Multispectral Field Data for Discrimination of Cruciferous Weeds in Winter Crops. The Scientific World Journal, Article ID: 630390. http://dx.doi.org/10.1100/2012/630390</mixed-citation></ref><ref id="scirp.61884-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Lamb, D.W. and Brown, R.B. (2001) Remote-Sensing and Mapping of Weeds in Crops. Journal of Agricultural Engineering Research, 78, 117-125. http://dx.doi.org/10.1006/jaer.2000.0630</mixed-citation></ref><ref id="scirp.61884-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Goel, P.K., Prasher, S.O., Patel, R.M., Smith, D.L. and Di Tommaso, A. (2002) Use of Airborne Multi-Spectral Imagery for Weed Detection in Field Crops. Transactions of the ASAE, 45, 443-449.</mixed-citation></ref><ref id="scirp.61884-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Gibson, K.D., Dirks, R., Medlin, C.R. and Johnston, L. (2004) Detection of Weed Species in Soybean Using Multispectral Digital Images. Weed Technology, 18, 742-749. http://dx.doi.org/10.1614/WT-03-170R1</mixed-citation></ref><ref id="scirp.61884-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Gausman, H. (1985) Plant Leaf Optical Properties. 
Texas Tech Press, Lubbock.</mixed-citation></ref><ref id="scirp.61884-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Fernández-Delgado, M., Cernadas, E., Barro, S. and Amorim, D. (2014) Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? Journal of Machine Learning Research, 15, 3133-3181.</mixed-citation></ref><ref id="scirp.61884-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Gislason, P.O., Benediktsson, J.A. and Sveinsson, J.R. (2006) Random Forests for Land Cover Classification. Pattern Recognition Letters, 27, 294-300. http://dx.doi.org/10.1016/j.patrec.2005.08.011</mixed-citation></ref><ref id="scirp.61884-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Strobl, C., Malley, J. and Tutz, G. (2009) An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests. Psychological Methods, 14, 323-348. http://dx.doi.org/10.1037/a0016973</mixed-citation></ref><ref id="scirp.61884-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Goldstein, B.A., Polley, E.C. and Briggs, F.B.S. (2011) Random Forest for Genetic Association Studies. Statistical Applications in Genetics and Molecular Biology, 10, 1-34. http://dx.doi.org/10.2202/1544-6115.1691</mixed-citation></ref><ref id="scirp.61884-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Ok, A.O., Akar, O. and Gungor, O. (2012) Evaluation of Random Forest Method for Agricultural Crop Classification. European Journal of Remote Sensing, 45, 421-432.</mixed-citation></ref><ref id="scirp.61884-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Breiman, L. (2001) Random Forests. Machine Learning, 45, 5-32. 
http://dx.doi.org/10.1023/A:1010933404324</mixed-citation></ref><ref id="scirp.61884-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">US Geological Survey (2015) Frequently Asked Questions about the Landsat Missions. http://landsat.usgs.gov/best_spectral_bands_to_use.php</mixed-citation></ref><ref id="scirp.61884-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Digital Globe (2010) The Benefits of the Eight Spectral Bands of WorldView 2. http://global.digitalglobe.com/sites/default/files/DG-8SPECTRAL-WP_0.pdf</mixed-citation></ref><ref id="scirp.61884-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Digital Globe (2014) WorldView 3 Data Sheet. https://dg-cms-uploads-production.s3.amazonaws.com/uploads/document/file/95/DG_WorldView3_DS_forWeb_0.pdf</mixed-citation></ref><ref id="scirp.61884-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Lehnert, L.W., Meyer, H. and Bendix, J. (2015) Hsdar: Manage, Analyse and Simulate Hyperspectral Data in R. R Package Version 0.3.0. https://cran.r-project.org/web/packages/hsdar/index.html</mixed-citation></ref><ref id="scirp.61884-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Hothorn, T., Buehlmann, P., Dudoit, S., Molinaro, A. and van der Laan, M. (2006) Survival Ensembles. Biostatistics, 7, 355-373. http://dx.doi.org/10.1093/biostatistics/kxj011</mixed-citation></ref><ref id="scirp.61884-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">Congalton, R. and Green, K. (2009) Assessing the Accuracy of Remotely Sensed Data: Principles and Practices. 2nd Edition, CRC/Taylor &amp; Francis, Boca Raton, 183 p.</mixed-citation></ref><ref id="scirp.61884-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Hothorn, T., Hornik, K. and Zeileis, A. 
(2006) Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15, 651-674. http://dx.doi.org/10.1198/106186006X133933</mixed-citation></ref><ref id="scirp.61884-ref26"><label>26</label><mixed-citation publication-type="other" xlink:type="simple">Strobl, C., Boulesteix, A.L., Zeileis, A. and Hothorn, T. (2007) Bias in Random Forest Variable Importance Measures: Illustrations, Sources and a Solution. BMC Bioinformatics, 8, 25. http://dx.doi.org/10.1186/1471-2105-8-25</mixed-citation></ref><ref id="scirp.61884-ref27"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">Strobl, C., Boulesteix, A.L., Kneib, T., Augustin, T. and Zeileis, A. (2008) Conditional Variable Importance for Random Forests. BMC Bioinformatics, 9, 307. http://dx.doi.org/10.1186/1471-2105-9-307</mixed-citation></ref></ref-list></back></article>