<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JCC</journal-id><journal-title-group><journal-title>Journal of Computer and Communications</journal-title></journal-title-group><issn pub-type="epub">2327-5219</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jcc.2022.1012006</article-id><article-id pub-id-type="publisher-id">JCC-121980</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject></subj-group></article-categories><title-group><article-title>
 
 
  Supply Chain Demand Forecast Based on SSA-XGBoost Model
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Shifeng</surname><given-names>Ni</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Yan</surname><given-names>Peng</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Ke</surname><given-names>Peng</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Zijian</surname><given-names>Liu</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>School of Computer Science and Engineering, Sichuan University of Science and Engineering, Yibin, China</addr-line></aff><pub-date pub-type="epub"><day>15</day><month>12</month><year>2022</year></pub-date><volume>10</volume><issue>12</issue><fpage>71</fpage><lpage>83</lpage><history><date date-type="received"><day>21,</day>	<month>November</month>	<year>2022</year></date><date date-type="rev-recd"><day>24,</day>	<month>December</month>	<year>2022</year>	</date><date date-type="accepted"><day>27,</day>	<month>December</month>	<year>2022</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Supply chain management often suffers from high empty-load rates in transportation, poor inventory management, and heavy material consumption, all caused by inaccurate market demand forecasts. To address these problems, using artificial intelligence and big data technology for market demand forecasting and intelligent decision-making is becoming a strategic technology trend in supply chain management. First, this paper presents a visual analysis of the historical data of each Stock Keeping Unit (SKU). Then, the feature factors affecting future demand are constructed from the storage level, the product level, the historical usage of the SKU, and so on. Finally, a supply chain demand forecasting algorithm based on the SSA-XGBoost model is proposed around three aspects, namely feature engineering, parameter optimization and model integration, and is compared with other machine learning models. The experiment shows that the forecasts of the SSA-XGBoost model are highly consistent with the actual values, so adopting this model for supply chain demand forecasting is of practical significance.
 
</p></abstract><kwd-group><kwd>Data Visualization Analysis</kwd><kwd> SSA-XGBoost</kwd><kwd> Supply Chain</kwd><kwd> Demand Forecast</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Supply chain management has evolved from a relatively simple labor-intensive model into a relatively complex global functional network model. With the globalization of manufacturing, enterprise resource planning (ERP) systems have greatly improved the availability and accuracy of data, and the term “supply chain” has become widely recognized [<xref ref-type="bibr" rid="scirp.121980-ref1">1</xref>] [<xref ref-type="bibr" rid="scirp.121980-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.121980-ref3">3</xref>]. As the supply chain management model matures, applying computer technology to supply chain management activities helps to enhance trust and cooperation between upstream and downstream nodes and promotes the digital transformation of the supply chain.</p><p>Research on supply chain demand forecasting can be viewed mainly from the following perspectives. The first is the forecast horizon, which distinguishes long-term from short-term forecasts: the former usually forecast demand by year or quarter, while the latter usually forecast demand by month or day [<xref ref-type="bibr" rid="scirp.121980-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.121980-ref5">5</xref>]. The second is whether a single method or an integrated method is used: the former uses only one algorithm model [<xref ref-type="bibr" rid="scirp.121980-ref6">6</xref>] [<xref ref-type="bibr" rid="scirp.121980-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.121980-ref8">8</xref>], while the latter uses two or more algorithms to build an integrated model for demand forecasting [<xref ref-type="bibr" rid="scirp.121980-ref9">9</xref>]. The third is the type of modeling method used. 
These methods mainly include: mathematical statistics methods [<xref ref-type="bibr" rid="scirp.121980-ref10">10</xref>] [<xref ref-type="bibr" rid="scirp.121980-ref11">11</xref>] [<xref ref-type="bibr" rid="scirp.121980-ref12">12</xref>], such as the grey prediction model, the exponential smoothing method [<xref ref-type="bibr" rid="scirp.121980-ref13">13</xref>] and the ARIMA model [<xref ref-type="bibr" rid="scirp.121980-ref14">14</xref>]; prediction models based on machine learning [<xref ref-type="bibr" rid="scirp.121980-ref15">15</xref>], such as support vector machines, decision trees and random forest regression; and prediction models based on deep learning [<xref ref-type="bibr" rid="scirp.121980-ref16">16</xref>], such as the BP neural network [<xref ref-type="bibr" rid="scirp.121980-ref17">17</xref>] and the LSTM neural network.</p><p>In general, mathematical statistics methods are still the mainstream approach to supply chain demand forecasting, while machine learning and deep learning methods based on big data mining are used relatively rarely. Traditional mathematical statistics methods are mostly used to forecast data series with a clear time series pattern and a stable trend. In real scenarios, however, the trend and fluctuation of supply chain demand are unstable, and changes in different factors affect demand differently. Therefore, this paper proposes an SSA-XGBoost model for supply chain demand forecasting from three aspects: feature analysis, optimization and demand forecasting. The experiment takes the historical consumption data of each SKU as the demand, analyzes how strongly each influencing factor affects the demand, combines the demand and the influencing factors into a data set that serves as the input of the SSA-XGBoost forecasting model, obtains the demand forecasts, and compares them with those of other models.</p></sec><sec id="s2"><title>2. Theoretical Basis of Related Research</title><sec id="s2_1"><title>2.1. 
XGBoost Model</title><p>XGBoost is an ensemble learning algorithm that uses decision trees as base learners and builds on the principle of GBDT [<xref ref-type="bibr" rid="scirp.121980-ref18">18</xref>]. Ensemble learning is a technical framework that trains several weak models and combines them with a given strategy to complete a machine learning task. The optimizations that XGBoost adds to GBDT improve both accuracy and speed, so the model performs better in big data processing projects. These optimizations mainly include:</p><p>A regularization term is added to the cost function to control the complexity of the model and reduce the risk of overfitting.</p><p>Rather than relying on an impurity measure such as the Gini coefficient, XGBoost derives its node-splitting criterion by optimizing the regularized objective. As a result, XGBoost can automatically learn a splitting direction even when some samples have missing feature values.</p><p>To determine the best split point, a decision tree has to sort the feature values during training, which is a time-consuming step. Before training, XGBoost sorts the feature values and saves them in block structures. These blocks are reused in subsequent iterations to reduce the amount of computation. The block structure also makes it possible to compute the gain of each feature with multiple threads.</p><p><xref ref-type="table" rid="table1">Table 1</xref> lists several important parameters of the XGBoost algorithm. The values of these parameters determine the size and prediction accuracy of an XGBoost model. For example, the smaller the value of n_estimators, the more likely the model is to underfit; the larger the value, the more likely the model is to overfit and the longer training takes. 
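</p><p>To give a rough sense of how large the search space over these parameters can become, the following Python sketch counts the candidate values for three of them. It assumes the upper limits, lower limits and increments that the experiment reports in Table 7; the helper name count_values is hypothetical and is not part of XGBoost.</p><preformat>
```python
from fractions import Fraction


def count_values(lower, upper, step):
    """Count the grid points lower, lower + step, ..., up to upper (inclusive)."""
    # Exact fractions avoid floating-point rounding for steps such as 0.01.
    lo, hi, st = (Fraction(str(v)) for v in (lower, upper, step))
    return int((hi - lo) / st) + 1


# (lower limit, upper limit, increment), assumed from Table 7.
ranges = {
    "n_estimators": (100, 400, 1),
    "max_depth": (5, 30, 1),
    "learning_rate": (0.01, 0.3, 0.01),
}

counts = {name: count_values(*bounds) for name, bounds in ranges.items()}

# The size of the full grid is the product of the per-parameter counts.
total = 1
for c in counts.values():
    total *= c

print(counts)
print(total)
```
</preformat><p>Under these assumptions the grid holds 301 &#215; 26 &#215; 30 = 234,780 combinations, which illustrates why an exhaustive search over the full ranges is costly.</p><p>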
Each parameter has its own value range, so the number of possible parameter combinations is large. The traditional way of finding the optimal combination for XGBoost is to first set several candidate values for each parameter from experience and to select the combination with the highest accuracy by evaluating every combination. A grid search is then carried out within a certain range around that combination to select the final one. The traditional method is easy to understand and implement, but it requires a large amount of computation and easily falls into a local optimum.</p></sec><sec id="s2_2"><title>2.2. Sparrow Search Algorithm</title><p>The sparrow search algorithm (SSA) simulates the foraging process of a sparrow population. The traditional SSA focuses on updating a single parameter. This research improves the SSA so that the model parameters can be updated as a combination, each within its own range. In this way, the SSA can be combined with the XGBoost algorithm. In Formula (1), X represents the current positions of a population of n sparrows, and m is the number of parameters to be optimized.</p><p>\[ X = \begin{bmatrix} x_{1,1} &amp; x_{1,2} &amp; \cdots &amp; x_{1,m} \\ x_{2,1} &amp; x_{2,2} &amp; \cdots &amp; x_{2,m} \\ \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\ x_{n,1} &amp; x_{n,2} &amp; \cdots &amp; x_{n,m} \end{bmatrix} \] (1)</p><p>In SSA, the individual fitness value is used to evaluate the level of a sparrow’s energy reserves. 
The fitness value matrix of the whole sparrow population is shown in Formula (2):</p><p>\[ F_X = \begin{bmatrix} f([x_{1,1} \; x_{1,2} \; \cdots \; x_{1,m}]) \\ \vdots \\ f([x_{i,1} \; x_{i,2} \; \cdots \; x_{i,m}]) \\ \vdots \\ f([x_{n,1} \; x_{n,2} \; \cdots \; x_{n,m}]) \end{bmatrix} \] (2)</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Meaning of important parameters of XGBoost algorithm</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Parameter</th><th align="center" valign="middle" >Meaning</th></tr></thead><tr><td align="center" valign="middle" >n_estimators</td><td align="center" valign="middle" >Number of base models (trees)</td></tr><tr><td align="center" valign="middle" >max_depth</td><td align="center" valign="middle" >Maximum depth of each decision tree</td></tr><tr><td align="center" valign="middle" >learning_rate</td><td align="center" valign="middle" >Weight of the model generated by each iteration (learning rate)</td></tr><tr><td align="center" valign="middle" >objective</td><td align="center" valign="middle" >Loss function to be minimized</td></tr><tr><td align="center" valign="middle" >reg_alpha</td><td align="center" valign="middle" >Weight of the L1 regularization term</td></tr><tr><td align="center" valign="middle" >reg_lambda</td><td align="center" valign="middle" >Weight of the L2 regularization term</td></tr></tbody></table></table-wrap><p>In Equation (2), \( f([x_{i,1} \; x_{i,2} \; \cdots \; x_{i,m}]) \) calculates the fitness value of the current position of the ith sparrow, where \( i \in [1, n] \). Sorting the fitness values yields \( F'_X \). Let p be the discoverer ratio (\( p \in (0, 0.5] \)); then the first f sparrows in \( F'_X \) are the discoverers, where f = n &#215; p.</p><p>During foraging, sparrows take one of two roles: discoverer or joiner [<xref ref-type="bibr" rid="scirp.121980-ref19">19</xref>]. A discoverer is a sparrow with a high level of energy reserves in the population. 
The responsibility of the discoverer is to find areas rich in food and to provide the scope and direction of foraging for the joiners of the group. When a discoverer detects a predator, it immediately sends a warning to the group and flies to another safe area to feed. Equation (3) describes the position update rule of the discoverers in the improved SSA:</p><p>\[ X_i^{t+1} = \begin{cases} \mathrm{GridSearchCV}(X_i^t, R) &amp; \text{if } w &lt; s \\ X_i^t + Q \ast L &amp; \text{if } w &gt; s \end{cases} \] (3)</p><p>In Formula (3), t represents the current iteration number, \( X_i^t \) represents the position of the ith sparrow at time t, and R represents a 1 &#215; m matrix that sets the search radius for each parameter. Meanwhile, w represents the best individual fitness value among the discoverers at time t (\( w \in [0, 1] \)), and s represents the safety value (\( s \in [0.5, 1] \)). When w &lt; s, there are no predators around the foraging environment at time t, and the discoverer performs a grid search in \( [X_i^t - R, \; X_i^t + R] \) to find the position with the locally optimal fitness value. Q and L both represent 1 &#215; m matrices, where each element of L is −1 or 1, and the operator ∗ denotes the Hadamard product. When w &gt; s, there are predators around at time t: some discoverers have detected predators and sent alarm signals to the other individuals in the group, and all sparrows in the group need to fly to other safe areas to feed.</p><p>The remaining individuals of the group are the joiners. Some joiners with high energy reserves are mainly responsible for monitoring the discoverers. Once they learn that a discoverer has found a better foraging place, they immediately fly close to that discoverer to compete with it. If the energy reserve of a joiner is higher than that of the discoverer after the position update, the two exchange roles, which keeps the proportion of discoverers and joiners in the population unchanged. 
The remaining joiners are very hungry because of their low energy reserves, and they fly elsewhere to find food and gain more energy. Equation (4) describes the position update rule of the joiners in the improved SSA:</p><p>\[ X_i^{t+1} = \begin{cases} X_i^t + I \ast \dfrac{X_{worst}^t - X_i^t}{i^2} &amp; \text{if } i &gt; n/2 \\ X_{best}^t + Q \ast L &amp; \text{otherwise} \end{cases} \] (4)</p><p>In Formula (4), I represents a 1 &#215; m matrix whose elements follow a normal distribution with mean 0 and standard deviation 1; \( X_{worst}^t \) represents the worst position occupied by any individual of the population at time t; \( X_{best}^t \) represents the best position occupied by a discoverer at time t. When i &gt; n/2, the ith joiner has poor fitness, is hungry and must fly elsewhere to feed. The other joiners monitor the discoverers and compete for food around them.</p><p>After the positions of the discoverers and the joiners have been updated, 20% of the sparrows in the group realize that they are at the edge of the group and are vulnerable to predators. These sparrows immediately move towards the safe area to obtain a safer position. In the improved SSA, the position update of the sparrows at the edge of the group is given by Formula (5):</p><p>\[ X_i^{t+1} = X_i^t + \beta \cdot \frac{f_i - f_{best}}{f_{worst} - f_{best}} \cdot (X_{best}^t - X_i^t) \quad \text{if } i \ge \theta \cdot n \] (5)</p><p>In Equation (5), \( f_i \) represents the fitness value of the ith sparrow at time t; \( f_{best} \) and \( f_{worst} \) represent the best and worst fitness values of the current population, respectively; β is a constant (\( \beta \in (0, 1) \)) that keeps the scaling factor of the step strictly below 1; θ is a constant (\( \theta \in (0, 1) \)), set to 0.8 here.</p></sec></sec><sec id="s3"><title>3. 
Experimental Data Set</title><p>The experiment visualizes the historical demand data to show its temporal distribution, and uses one-way ANOVA to examine the influence of the product level and geographic level of a unit on demand, providing decision support for the subsequent supply chain demand forecasting step.</p><sec id="s3_1"><title>3.1. Dataset Introduction</title><p>The dataset used in the experiment comes from the 2021 Alibaba Cloud Infrastructure Supply Chain Competition and includes a training set, a test set, a geographic level information dataset and a product level information dataset. The dataset contains the historical demand information of 632 product units. The training set covers June 6, 2018 to March 1, 2021, with 284,832 records in total. The test set covers March 2, 2021 to June 7, 2021, with 61,936 records in total. None of the datasets has missing values. <xref ref-type="table" rid="table2">Table 2</xref> shows the data field details of the training set and test set. <xref ref-type="table" rid="table3">Table 3</xref> and <xref ref-type="table" rid="table4">Table 4</xref> show the geographic level information dataset and the product level information dataset, respectively. The unit names, geographic level information and product level information in the dataset have been anonymized (desensitized). During the experiment, the geographic level and product level information must therefore be encoded as numeric labels for subsequent model training and prediction.</p></sec><sec id="s3_2"><title>3.2. Data Visualization and Analysis</title><p><xref ref-type="fig" rid="fig1">Figure 1</xref> shows the historical demand curves of 6 randomly selected units. The historical demand of most units rises overall over time, while that of a few units first rises and then declines, with sharp increases or decreases in demand along the way. 
In general, the temporal pattern of demand change differs from unit to unit.</p><p><xref ref-type="fig" rid="fig2">Figure 2</xref> shows the histogram of the historical average demand of the different categories at the GL1 and PL1 levels on March 1, 2021. The historical demand values of different categories at the geographic level and the product level differ considerably. <xref ref-type="table" rid="table5">Table 5</xref> shows the ANOVA results for the GL1 and PL1 categories and the historical demand data on March 1, 2021. Both PR(&gt;F) values (0.071461 for GL1 and 0.316654 for PL1) are greater than 0.05, so at the 0.05 significance level the differences in average historical unit demand between categories are not statistically significant on this date. The differences in category means visible in <xref ref-type="fig" rid="fig2">Figure 2</xref> nevertheless suggest a relationship between the features (geographic and product levels) and the forecast variable (unit demand), so both levels are kept as candidate features.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Data field details of training set and test set</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Column name</th><th align="center" valign="middle" >Description</th><th align="center" valign="middle" >First row data</th></tr></thead><tr><td align="center" valign="middle" >Unit</td><td align="center" valign="middle" >Cargo unit</td><td align="center" valign="middle" >9b8f48bacb1a63612f3a210ccc6286cc</td></tr><tr><td align="center" valign="middle" >ts</td><td align="center" valign="middle" >Date</td><td align="center" valign="middle" >2018/6/4</td></tr><tr><td align="center" valign="middle" >qty</td><td align="center" valign="middle" >Resource usage</td><td align="center" valign="middle" >11926.8286</td></tr><tr><td align="center" valign="middle" >Geography</td><td align="center" valign="middle" >Geographic information of geography_level_3</td><td align="center" valign="middle" >36ab7b000da26b0547bfc3c3fdf143dc</td></tr><tr><td align="center" 
valign="middle" >Product</td><td align="center" valign="middle" >Product information of product_level_2</td><td align="center" valign="middle" >5cc8015f03554313900f069182bdaf9c</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Geographic level information dataset</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Column name</th><th align="center" valign="middle" >Abbreviation</th><th align="center" valign="middle" >First row data</th></tr></thead><tr><td align="center" valign="middle" >Geography_level_1</td><td align="center" valign="middle" >GL1</td><td align="center" valign="middle" >860b874ad79c7f2072b4dd24a952f027</td></tr><tr><td align="center" valign="middle" >Geography_level_2</td><td align="center" valign="middle" >GL2</td><td align="center" valign="middle" >50fcfda7a3aeb86c6dabc2bd467067b0</td></tr><tr><td align="center" valign="middle" >Geography_level_3</td><td align="center" valign="middle" >GL3</td><td align="center" valign="middle" >0b41e006d801550cba4c38157729bf87</td></tr></tbody></table></table-wrap><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title> Product level information dataset</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Column name</th><th align="center" valign="middle" >Abbreviation</th><th align="center" valign="middle" >First row data</th></tr></thead><tr><td align="center" valign="middle" >Product_level_1</td><td align="center" valign="middle" >PL1</td><td align="center" valign="middle" >2eb2930111864beeb409e946751215b1</td></tr><tr><td align="center" valign="middle" >Product_level_2</td><td align="center" valign="middle" >PL2</td><td align="center" valign="middle" >ba569a9ecbaa99645f9baf060d5061e7</td></tr></tbody></table></table-wrap><table-wrap id="table5" ><label><xref ref-type="table" rid="table5">Table 
5</xref></label><caption><title> ANOVA Results of GL1 and PL1 level categories and historical demand data</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Name</th><th align="center" valign="middle" >PR (&gt;F)</th></tr></thead><tr><td align="center" valign="middle" >GL1</td><td align="center" valign="middle" >0.071461</td></tr><tr><td align="center" valign="middle" >PL1</td><td align="center" valign="middle" >0.316654</td></tr></tbody></table></table-wrap></sec></sec><sec id="s4"><title>4. Supply Chain Demand Forecasting Experiment</title><sec id="s4_1"><title>4.1. Preliminary Work</title><p>The experiment was carried out on Windows 10, using PyCharm 2022.2.3 and Python 3.9. The data of <xref ref-type="table" rid="table2">Table 2</xref> are merged with <xref ref-type="table" rid="table3">Table 3</xref> and <xref ref-type="table" rid="table4">Table 4</xref> on the Geography and Product columns of <xref ref-type="table" rid="table2">Table 2</xref>. From the time information, the features “holiday”, “month”, “week” and “weekday” are then constructed to represent whether the current date is a holiday, the month, the week of the year and the day of the week. Finally, sliding window statistics are used to construct the features “LW”, “data_smooth”, “mean”, “max”, “min” and “std”, which represent the historical value, exponential smoothing value, mean, maximum, minimum and standard deviation of the previous seven days, respectively. In total, 15 influencing factors, namely “GL3”, “GL2”, “GL1”, “PL2”, “PL1”, “holiday”, “month”, “week”, “weekday”, “LW”, “data_smooth”, “mean”, “max”, “min” and “std”, are selected as the features of the data set, and “qty” is used as the label.</p></sec><sec id="s4_2"><title>4.2. 
Establishment of Evaluation Index</title><p>The coefficient of determination (R<sup>2</sup>), mean square error (MSE), root mean square error (RMSE) and mean absolute error (MAE) were selected as evaluation indicators of the prediction performance of the model. The coefficient of determination measures the goodness of fit between the predicted values and the actual values. The closer the result is to 1, the better the fit. The formula is as follows:</p><p>\[ R^2 = 1 - \frac{\sum_{i=1}^{k} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{k} (y_i - \bar{y})^2} \] (6)</p><p>In the formula, \( y_i \) is the true value, \( \bar{y} \) is the mean of the true values, \( \hat{y}_i \) is the predicted value, and k is the number of data items in the dataset. When \( R^2 = 1 \), the predicted values of the model equal the true values, and the prediction accuracy of the model is high. When \( R^2 = 0 \), the model does no better than predicting the mean value for every sample, and the prediction accuracy of the model is low. When \( R^2 &lt; 0 \), the model performs worse than predicting the mean and cannot predict accurately.</p><p>The mean square error is the mean of the squared errors between the predicted data and the original data at corresponding points. The closer the result is to 0, the smaller the prediction error of the model. The formula is as follows:</p><p>\[ \mathrm{MSE} = \frac{1}{k} \sum_{i=1}^{k} (y_i - \hat{y}_i)^2 \] (7)</p><p>The root mean square error is the arithmetic square root of the mean square error. It is sensitive to large errors caused by outliers and reflects the dispersion of the prediction errors.</p><p>\[ \mathrm{RMSE} = \sqrt{\mathrm{MSE}} \] (8)</p><p>The mean absolute error is the average absolute difference between the predicted values of the model and the true values of the samples, which reflects the actual size of the prediction error well.</p><p>\[ \mathrm{MAE} = \frac{1}{k} \sum_{i=1}^{k} | y_i - \hat{y}_i | \] (9)</p></sec><sec id="s4_3"><title>4.3. 
Construction of SSA-XGBoost Model</title><p>By integrating the SSA model and the XGBoost model, an SSA-XGBoost model is proposed. The SSA-XGBoost model automatically searches for a globally optimal parameter combination of the XGBoost model in order to improve its accuracy. <xref ref-type="table" rid="table6">Table 6</xref> shows the specific flow of the SSA-XGBoost algorithm, where the function F(X) calculates the fitness value of the position of each sparrow. The position of a sparrow at the current time is substituted into the XGBoost model as its parameter combination to obtain predictions for the validation set. The fitness value of that position is then computed from the predicted data and the real data; it equals 1 − R<sup>2</sup> on the validation set, so a smaller fitness value indicates a better fit. The formula is as follows:</p><p>\[ FV_j = \frac{\sum_{i=1}^{k} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{k} (y_i - \bar{y})^2}, \quad j \in [0, n] \] (10)</p><table-wrap id="table6" ><label><xref ref-type="table" rid="table6">Table 6</xref></label><caption><title> SSA-XGBoost algorithm</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Input: p (population size, denoted n in Section 2.2), M (iterations), dim (dimension), d (proportion of discoverers, denoted p in Section 2.2)<break/>1. Define the objective function F(X), where the variable X = (X<sub>1</sub>, X<sub>2</sub>, …, X<sub>n</sub>), the variable X<sub>i</sub> = (x<sub>1</sub>, x<sub>2</sub>, …, x<sub>m</sub>)<break/>2. Randomly initialize the positions of n sparrows X and define their relevant parameters<break/>3. Use function F(X) to obtain the population fitness value matrix<break/>4. for t ← 0 to M do<break/>5. Sort the fitness value matrix from small to large to obtain a matrix named fit<break/>6. w = min(fit)<break/>7. if w &lt; s (safety value) then<break/>8. for i ← 0 to p*d do<break/>9. Update sparrow position using \( \mathrm{GridSearchCV}(X_i^t, R) \) in formula (3)<break/>10. end for<break/>11. else<break/>12. for i ← 0 to p*d do<break/>13. Update sparrow position using \( X_i^t + Q \ast L \) in formula (3)<break/>14. end for; end if<break/>15. for i ← 0 to p*(1-d) do<break/>16. if (i + p*d) &gt; p/2 then<break/>17. 
Update sparrow position using \( X_i^t + I \ast (X_{worst}^t - X_i^t) / i^2 \) in formula (4)<break/>18. else<break/>19. Update sparrow position using \( X_{best}^t + Q \ast L \) in formula (4); end if<break/>20. end for<break/>21. θ ← 0.8<break/>22. for i ← 0 to p do<break/>23. if i &gt;= θ*p then Update sparrow position using \( X_i^t + \beta \cdot \frac{f_i - f_{best}}{f_{worst} - f_{best}} \cdot (X_{best}^t - X_i^t) \) in formula (5)<break/>24. end for<break/>25. Update the global optimal fitness value and the individual optimal position of the population<break/>26. end for<break/>Output: BestX: location of global optimal fitness value; Convergence_Curve: global optimal fitness value for each iteration</th></tr></thead></tbody></table></table-wrap></sec><sec id="s4_4"><title>4.4. Model Training</title><p>The parameters n_estimators, max_depth and learning_rate are selected as the parameters to be optimized in the experiment, and the upper and lower limits of each parameter are shown in <xref ref-type="table" rid="table7">Table 7</xref>. Each parameter is randomly initialized between its lower and upper limits.</p><p>The experiment takes the training set as the input of the SSA-XGBoost model, holds out 30% of it as the validation set, sets the number of training rounds to 15, and outputs the minimum fitness value of each round together with the corresponding parameter combination. As can be seen from <xref ref-type="fig" rid="fig3">Figure 3</xref>, the fitness value of the SSA-XGBoost model declines as the number of iterations increases, which suggests that the model can automatically find better parameter values and is less prone to being trapped in a local optimum. The best parameter combination finally output by the model is [387, 16, 0.03]; that is, n_estimators is 387, max_depth is 16, and learning_rate is 0.03.</p></sec><sec id="s4_5"><title>4.5. 
Demand Forecast and Evaluation Analysis</title><p>The experiment uses the trained SSA-XGBoost model (SSAX) to forecast the demand of the test set, and selects ARIMA, exponential smoothing (ES), decision tree (DT), GBDT and XGBoost (XGB) as comparison models. The prediction results of one randomly selected unit are compared with the real values of that unit in <xref ref-type="fig" rid="fig4">Figure 4</xref>. The SSA-XGBoost model predicts the test set well: it follows the trend of the data and has a small prediction error.</p><p>From the model prediction results, the coefficient of determination, mean square error, root mean square error and mean absolute error of the different models are calculated. According to the evaluation results in <xref ref-type="table" rid="table8">Table 8</xref>, the SSA-XGBoost model has the highest R<sup>2</sup> (0.988) and the lowest MSE (703.5428) and RMSE (26.5243) of the six models; only the decision tree attains a lower MAE (2.1164 versus 2.4518). Overall, this indicates that the SSA-XGBoost model gives the best predictions on the experimental data set among the models compared.</p><table-wrap id="table7" ><label><xref ref-type="table" rid="table7">Table 7</xref></label><caption><title> List of parameters to be optimized</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Parameter</th><th align="center" valign="middle" >Upper limit</th><th align="center" valign="middle" >Lower limit</th><th align="center" valign="middle" >Step size</th></tr></thead><tr><td align="center" valign="middle" >n_estimators</td><td align="center" valign="middle" >400</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >1</td></tr><tr><td align="center" valign="middle" >max_depth</td><td align="center" valign="middle" >30</td><td align="center" valign="middle" >5</td><td align="center" valign="middle" >1</td></tr><tr><td align="center" 
valign="middle" >learning_rate</td><td align="center" valign="middle" >0.3</td><td align="center" valign="middle" >0.01</td><td align="center" valign="middle" >0.01</td></tr></tbody></table></table-wrap><table-wrap id="table8" ><label><xref ref-type="table" rid="table8">Table 8</xref></label><caption><title> Comparison of evaluation indicators of different models</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Model</th><th align="center" valign="middle" >ARIMA</th><th align="center" valign="middle" >ES</th><th align="center" valign="middle" >SSAX</th><th align="center" valign="middle" >DT</th><th align="center" valign="middle" >GBDT</th><th align="center" valign="middle" >XGB</th></tr></thead><tr><td align="center" valign="middle" >R<sup>2</sup></td><td align="center" valign="middle" >0.9279</td><td align="center" valign="middle" >0.7647</td><td align="center" valign="middle" >0.988</td><td align="center" valign="middle" >0.7224</td><td align="center" valign="middle" >0.9254</td><td align="center" valign="middle" >0.9524</td></tr><tr><td align="center" valign="middle" >MSE</td><td align="center" valign="middle" >7319.2118</td><td align="center" valign="middle" >7362.5743</td><td align="center" valign="middle" >703.5428</td><td align="center" valign="middle" >1083.3392</td><td align="center" valign="middle" >772.3229</td><td align="center" valign="middle" >964.4053</td></tr><tr><td align="center" valign="middle" >MAE</td><td align="center" valign="middle" >20.162</td><td align="center" valign="middle" >20.718</td><td align="center" valign="middle" >2.4518</td><td align="center" valign="middle" >2.1164</td><td align="center" valign="middle" >3.4001</td><td align="center" valign="middle" >4.936</td></tr><tr><td align="center" valign="middle" >RMSE</td><td align="center" valign="middle" >85.5524</td><td align="center" valign="middle" >85.8054</td><td align="center" valign="middle" >26.5243</td><td align="center" valign="middle" 
>32.9141</td><td align="center" valign="middle" >27.7906</td><td align="center" valign="middle" >31.0548</td></tr></tbody></table></table-wrap></sec></sec><sec id="s5"><title>5. Conclusions</title><p>To address the difficulty of supply chain demand forecasting, this paper uses historical enterprise data as a data set to analyze how various factors influence demand, and proposes a supply chain demand forecasting model based on SSA-XGBoost.</p><p>The future demand of an SKU can be effectively predicted from its geographical-level information, product-level information, time information and historical consumption information. The experiment shows that, even when the demand of a single SKU follows no obvious temporal pattern and the overall demand trend differs from one SKU to another, the SSA-XGBoost model, with its improved SSA parameter-combination update method, can both search for parameters automatically and accurately predict the daily demand of each SKU.</p><p>Automatic parameter searching mitigates the tendency of manual parameter searching to fall into a local optimum. The supply chain demand forecasting model based on SSA-XGBoost takes into account both the overall demand trends of different SKUs and the impact of changes in various factors on demand, which makes it applicable to the supply chain demand forecasting problem. 
It can help enterprises feed demand information back to the production side quickly and accurately, narrow the information gap, and reduce transportation and inventory costs, thereby improving the efficiency of the entire supply chain.</p><p>In future research, the SKU demand forecast results and the inventory status can be used to build an inventory control and ordering model suitable for supply chain management, in order to achieve efficient operation and reduce inventory costs.</p></sec><sec id="s6"><title>Acknowledgements</title><p>This research was supported by the Science and Technology Plan of Zigong Science and Technology Bureau (Grant No. 2018GYCX33) and the Innovation Fund of Postgraduate, Sichuan University of Science &amp; Engineering (Grant No. y2021096).</p></sec><sec id="s7"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s8"><title>Cite this paper</title><p>Ni, S.F., Peng, Y., Peng, K. and Liu, Z.J. (2022) Supply Chain Demand Forecast Based on SSA-XGBoost Model. Journal of Computer and Communications, 10, 71-83. https://doi.org/10.4236/jcc.2022.1012006</p></sec></body><back><ref-list><title>References</title><ref id="scirp.121980-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Lummus, R.R., Krumwiede, D.W. and Vokurka, R.J. (2001) The Relationship of Logistics to Supply Chain Management: Developing a Common Industry Definition. Industrial Management &amp; Data Systems, 101, 426-431. https://doi.org/10.1108/02635570110406730</mixed-citation></ref><ref id="scirp.121980-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Liu, J., Zhang, S. and Cao, W.J. (2002) A Case Study of an Inter-Enterprise Workflow-Supported Supply Chain Management System. Operational Research, 2, 17-34. 
https://doi.org/10.1007/BF02940119</mixed-citation></ref><ref id="scirp.121980-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Shah, R., Goldstein, S.M. and Ward, P.T. (2002) Aligning Supply Chain Management Characteristics and Interorganizational Information System Types: An Exploratory Study. IEEE Transactions on Engineering Management, 49, 282-292. https://doi.org/10.1109/TEM.2002.803382</mixed-citation></ref><ref id="scirp.121980-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Li, G.X., Ma, W.B. and Xia, G.E. (2021) Research on Logistics Demand Forecasting Model Based on Deep Learning. Chinese Journal of Systems Science, 29, 85-89.</mixed-citation></ref><ref id="scirp.121980-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Giri, C. and Chen, Y. (2022) Deep Learning for Demand Forecasting in the Fashion and Apparel Retail Industry. Forecasting, 4, 565-581. https://doi.org/10.3390/forecast4020031</mixed-citation></ref><ref id="scirp.121980-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Hamie, H., Hoayek, A. and Auer, H. (2021) Modeling Post-Liberalized European Gas Market Concentration—A Game Theory Perspective. Forecasting, 3, 1-16. https://doi.org/10.3390/forecast3010001</mixed-citation></ref><ref id="scirp.121980-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Chen, S.M. (2021) Online Forecasting Model of Supply Chain Demand Based on Incomplete Sales Information. Ph.D. Thesis, South China University of Technology, Guangzhou.</mixed-citation></ref><ref id="scirp.121980-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Liu, J.Y. (2020) Research on Key Technologies of LASSO Time Series Prediction and Recommendation System and Its Application in Supply Chain Management. Ph.D. 
Thesis, Shanghai Jiao Tong University, Shanghai.</mixed-citation></ref><ref id="scirp.121980-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Wu, W.D. (2021) Research and Implementation of Household Appliance Demand Forecast Based on Multi Model Fusion. Ph.D. Thesis, Southwest University, Chongqing.</mixed-citation></ref><ref id="scirp.121980-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Deng, Q. (2021) Research on Supply Chain Demand Forecast of L Company. Ph.D. Thesis, University of International Business and Economics, Beijing.</mixed-citation></ref><ref id="scirp.121980-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Xie, F. (2020) Research on Demand Forecast and Comprehensive Production Plan of BD Shanghai Company. Ph.D. Thesis, Shanghai University of Finance and Economics, Shanghai.</mixed-citation></ref><ref id="scirp.121980-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Xu, Z. (2020) Research on Demand Forecast and Inventory Management of P Company’s Clothing Products. Ph.D. Thesis, Donghua University, Shanghai.</mixed-citation></ref><ref id="scirp.121980-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Sheng, Z.G. (2014) The Application of Exponential Smoothing Method in the Forecast of the Demand for Product Oil Distribution in Sinopec Kunming Area. Ph.D. Thesis, Yunnan University, Kunming.</mixed-citation></ref><ref id="scirp.121980-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Chen, L.P. (2020) Research on the Application of Data Mining Technology in the Supply Chain Management of Operators. Ph.D. Thesis, Jilin University, Changchun.</mixed-citation></ref><ref id="scirp.121980-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Fang, K. 
(2020) Research on Demand Forecasting of Food Supply Chain Based on Machine Learning. Ph.D. Thesis, North China Electric Power University, Beijing.</mixed-citation></ref><ref id="scirp.121980-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Zhu, D.Q. (2020) Research on Supply Chain Demand Forecasting Model Based on Data Mining. Ph.D. Thesis, Huazhong University of Science and Technology, Wuhan.</mixed-citation></ref><ref id="scirp.121980-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Xiao, S.T. (2021) Research on Demand Forecast and Ordering Strategy of M Garment Enterprise. Ph.D. Thesis, Beijing Jiaotong University, Beijing.</mixed-citation></ref><ref id="scirp.121980-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Chen, T.Q. and Guestrin, C. (2016) XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, 13-17 August 2016, 785-794. https://doi.org/10.1145/2939672.2939785</mixed-citation></ref><ref id="scirp.121980-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Xue, J.K. (2020) Research and Application of a New Type of Swarm Intelligence Optimization Technology. Ph.D. Thesis, Donghua University, Shanghai.</mixed-citation></ref></ref-list></back></article>