<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">OJS</journal-id><journal-title-group><journal-title>Open Journal of Statistics</journal-title></journal-title-group><issn pub-type="epub">2161-718X</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/ojs.2018.82025</article-id><article-id pub-id-type="publisher-id">OJS-84196</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
Comparison of Methods of Estimating Missing Values in Time Series
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Iwueze</surname><given-names>I. S.</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Nwogu</surname><given-names>E. C.</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Nlebedim</surname><given-names>V. U.</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Nwosu</surname><given-names>U. I.</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Chinyem</surname><given-names>U. E.</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff2"><addr-line>School of Mathematics and Statistics, University of Sheffield, Sheffield, UK</addr-line></aff><aff id="aff1"><addr-line>Department of Statistics, Federal University of Technology, Owerri, Nigeria</addr-line></aff><author-notes><corresp id="cor1">* E-mail: <email>uchenlebedim@yahoo.com</email> (VUN)</corresp></author-notes><pub-date pub-type="epub"><day>30</day><month>03</month><year>2018</year></pub-date><volume>08</volume><issue>02</issue><fpage>390</fpage><lpage>399</lpage><history><date date-type="received"><day>3</day><month>August</month><year>2017</year></date><date date-type="rev-recd"><day>27</day><month>April</month><year>2018</year></date><date date-type="accepted"><day>30</day><month>April</month><year>2018</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc.</copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
This paper proposes new methods of estimating missing values in time series data and compares them with existing methods. The new methods are based on the row, column and overall averages of time series data arranged in a Buys-Ballot table with m rows and s columns. The methods assume that 1) only one value is missing at a time, 2) the trending curve may be linear, quadratic or exponential and 3) the decomposition model is either additive or multiplicative. The performance of the methods is assessed by comparing accuracy measures (MAE, MAPE and RMSE) computed from the deviations of the estimates of the missing values from the actual values used in the simulation. Results show that, under the stated assumptions, estimates from the new method based on full decomposition of the series are the best (in terms of the accuracy measures) when compared with the other two new methods and the existing methods.
</p></abstract><kwd-group><kwd>Missing Values</kwd><kwd>Buys-Ballot Table</kwd><kwd>Row and Column Averages</kwd><kwd>Row and Column Variances</kwd><kwd>Trend Parameters and Seasonal Indices</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>The analysis of time series data is an important area of statistics, especially for identifying the nature of the phenomenon represented by the sequence of observations. However, missing observations in time series data are very common [<xref ref-type="bibr" rid="scirp.84196-ref1">1</xref>] [<xref ref-type="bibr" rid="scirp.84196-ref2">2</xref>]. They arise when an observation cannot be made at a particular time because of faulty equipment, lost records or a mistake that cannot be rectified until later. When this happens, estimates of the missing values are needed to better understand the nature of the data and to make more accurate forecasts possible [<xref ref-type="bibr" rid="scirp.84196-ref2">2</xref>].</p><p>In time series analysis, a problem frequently encountered in data collection is a missing observation. Missing observations may be virtually impossible to obtain because of time or cost constraints, and the researcher has several options for estimating them. One option is to replace them by the mean of the series. A missing observation may also be replaced with a naive forecast or with the average of the two known observations that bound it [<xref ref-type="bibr" rid="scirp.84196-ref3">3</xref>].</p><p>Using the Bode-Shannon representation of random processes and the “state-transition” method of analysis of dynamic systems, Kalman [<xref ref-type="bibr" rid="scirp.84196-ref4">4</xref>] worked on classical filtering and prediction in relation to missing values in time series. His work, which hinged on the state space modelling approach, was later extended by Jones [<xref ref-type="bibr" rid="scirp.84196-ref5">5</xref>] to observational error and missing observations. Harvey and Pierse [<xref ref-type="bibr" rid="scirp.84196-ref6">6</xref>] subsequently highlighted the relevance of state space modelling and the Kalman filter to the problem of missing data in time series. Their work discussed maximum likelihood estimation of the parameters of an ARIMA model with missing observations, as well as the estimation of the missing observations themselves. Using the univariate form of the modified Kalman filter, Kohn and Ansley [<xref ref-type="bibr" rid="scirp.84196-ref7">7</xref>] defined and efficiently computed the marginal likelihood of an ARIMA model with missing observations. Their work also shed light on how to predict by interpolating missing observations and how to obtain the mean squared error of the estimates.</p><p>As the literature shows, missing values in time series have attracted considerable research attention, and several approaches to estimating them, including ARIMA models and other techniques, have continued to evolve. Among them are the optimal linear combination of the forecast and back-forecast of Damsleth [<xref ref-type="bibr" rid="scirp.84196-ref8">8</xref>], the method of Robinson and Dunsmuir [<xref ref-type="bibr" rid="scirp.84196-ref9">9</xref>] for estimating models for discrete time series in the presence of missing data, and the use of forecasting techniques to estimate missing observations by Abraham [<xref ref-type="bibr" rid="scirp.84196-ref10">10</xref>]. Ferreiro [<xref ref-type="bibr" rid="scirp.84196-ref11">11</xref>] provided a number of alternative procedures for estimating missing observations in stationary time series for autoregressive moving average models. Following these procedures for stationary time series, Rosen and Porat [<xref ref-type="bibr" rid="scirp.84196-ref12">12</xref>] derived general formulae for the asymptotic second-order moments of the sample covariances in the presence of missing values.</p><p>Another easily applicable spectral estimator for missing data is the method of Scargle [<xref ref-type="bibr" rid="scirp.84196-ref13">13</xref>], which computes the Fourier coefficients as the least squares fit of sines and cosines to the remaining available observations. The resulting Lomb-Scargle spectrum is accurate in detecting strong spectral peaks, but, according to Bos et al. [<xref ref-type="bibr" rid="scirp.84196-ref14">14</xref>] and Broersen et al. [<xref ref-type="bibr" rid="scirp.84196-ref15">15</xref>], this least squares fit biases the description of slopes and background shapes in the spectrum.</p><p>Brockwell and Davis [<xref ref-type="bibr" rid="scirp.84196-ref16">16</xref>] suggested that missing values at the beginning or end of a time series may simply be ignored, whereas intermediate missing values are serious flaws in the input series that should be filled using interpolation algorithms such as linear, polynomial, smoothing, spline and filtering methods.</p><p>Yuan et al. [<xref ref-type="bibr" rid="scirp.84196-ref17">17</xref>] compared the normal-distribution-based maximum likelihood (ML) and multiple imputation procedures for analysing data with missing values, with respect to the bias and efficiency of the parameter estimates. Their results showed that ML is preferable to multiple imputation in practice, although parameter estimates from multiple imputation may still be consistent.</p><p>Cheema [<xref ref-type="bibr" rid="scirp.84196-ref18">18</xref>] compared different missing data handling methods (listwise deletion, mean imputation, regression imputation, maximum likelihood imputation and multiple imputation) under four methods of analysis (one-sample t-test, two-sample t-test, two-way ANOVA and multiple regression) which, according to him, are frequently employed in educational research. However, his study did not cover the handling of missing values in time series data.</p><p>Indeed, statisticians have developed procedures to mitigate the problems caused by missing data, and different researchers have reported using various estimation methods to replace missing values [<xref ref-type="bibr" rid="scirp.84196-ref19">19</xref>] [<xref ref-type="bibr" rid="scirp.84196-ref20">20</xref>]. Briefly, these include mean imputation, series mean, mean of nearby points, median of nearby points, linear interpolation and regression imputation.</p><p>In time series analysis, the data are assumed to consist of observations made sequentially in time, comprising a systematic pattern (usually a set of identifiable components) and random noise (error). When some observations are missing, this sequential structure, which time series models require, is broken. The systematic pattern includes the trend (denoted T<sub>t</sub>), seasonal (denoted S<sub>t</sub>) and cyclical (denoted C<sub>t</sub>) components. The random noise (also called the error or irregular component) is denoted I<sub>t</sub> or e<sub>t</sub>, where t is the particular point in time. These four components may or may not coexist in real-life data, and they can take different functional forms.
They can be combined in an additive (additive seasonality) or a multiplicative (multiplicative seasonality) fashion, and can also take other forms such as the pseudo-additive/mixed model, which combines elements of both the additive and multiplicative models. The additive, multiplicative and pseudo-additive/mixed models are given in Equations (1.1)-(1.3), respectively:</p><p><disp-formula><label>(1.1)</label><tex-math>X_t = T_t + S_t + C_t + I_t, \qquad t = 1, 2, \ldots, n</tex-math></disp-formula></p><p><disp-formula><label>(1.2)</label><tex-math>X_t = T_t \times S_t \times C_t \times I_t, \qquad t = 1, 2, \ldots, n</tex-math></disp-formula></p><p><disp-formula><label>(1.3)</label><tex-math>X_t = T_t \times S_t \times C_t + I_t, \qquad t = 1, 2, \ldots, n</tex-math></disp-formula></p><p>Cyclical variation, which refers to long-term oscillations or swings about the trend, appears with appreciable magnitude only in data sets covering long periods. When short periods are involved (as in all the examples of this study), the cyclical component is superimposed on the trend [<xref ref-type="bibr" rid="scirp.84196-ref21">21</xref>], and the combined trend-cycle component is denoted by M<sub>t</sub>. In this case, Equations (1.1)-(1.3) may be written, respectively, as:</p><p><disp-formula><label>(1.4)</label><tex-math>X_t = M_t + S_t + I_t, \qquad t = 1, 2, \ldots, n</tex-math></disp-formula></p><p><disp-formula><label>(1.5)</label><tex-math>X_t = M_t \times S_t \times I_t, \qquad t = 1, 2, \ldots, n</tex-math></disp-formula></p><p>and</p><p><disp-formula><label>(1.6)</label><tex-math>X_t = M_t \times S_t + I_t, \qquad t = 1, 2, \ldots, n</tex-math></disp-formula></p><p>The pseudo-additive model is used when the original time series contains very small or zero values. However, this work considers only the additive and multiplicative models.</p><p>Missing values can lead to erroneous conclusions about the data, and substituting values for them may itself introduce inaccuracies: it can produce false results and forecasts, and errors or data skews can propagate across subsequent analyses, causing a larger cumulative error. Most analytical methods cannot be applied when the data contain missing values. Furthermore, existing methods do not consider the model structure (i.e. whether the model is additive or multiplicative) or trending curves other than the linear one (quadratic, exponential, etc.). Moreover, as can be seen from the literature, the seasonal component of the series has not been taken into consideration in developing estimation methods. Therefore, the main objective of this study is to develop methods of estimating missing values that take into consideration the model structure and the trending curve. The specific objectives are:</p><p>1) To review existing methods of estimating missing values.</p><p>2) To develop new methods of estimating missing values in time series.</p><p>3) To assess the performance of the methods of estimating missing values.</p><p>4) To compare, using simulated data, the results of the existing methods with those of the new methods.</p><p>Based on the results, recommendations are made.</p><p>The rationale for this study is to fill the gap in the existing methods by providing analysts with a better method for estimating missing values, irrespective of the model structure and the functional form of the trend.</p></sec><sec id="s2"><title>2. Methodology</title><sec id="s2_1"><title>2.1. Existing Methods of Estimating Missing Values</title><p>The new methods proposed in this study assume that the series is arranged in a Buys-Ballot table with m rows (periods) and s columns (seasons), for m &gt; s. Under this arrangement, the observation X<sub>t</sub> made at time t is identified by its period i (i = 1, 2, …, m) and its season j (j = 1, 2, …, s), so that t = (i − 1)s + j.
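 As a minimal illustration (not part of the original method), the arrangement and the mapping between t and (i, j) can be written in Python; the helper names buys_ballot, time_index and period_season are hypothetical.<preformat preformat-type="code">
```python
def buys_ballot(series, s):
    """Arrange a series of length n = m*s into m rows (periods) of s columns (seasons)."""
    if len(series) % s != 0:
        raise ValueError("the series length must be a multiple of s")
    m = len(series) // s
    return [list(series[(i - 1) * s:i * s]) for i in range(1, m + 1)]


def time_index(i, j, s):
    """Time index t = (i - 1)s + j of period i and season j (all 1-based)."""
    return (i - 1) * s + j


def period_season(t, s):
    """Inverse mapping: the (period, season) pair (i, j) of the 1-based time index t."""
    i, j = divmod(t - 1, s)
    return i + 1, j + 1


table = buys_ballot(list(range(1, 25)), 12)  # two periods of monthly data
print(table[1][0])                            # first observation of period 2
print(time_index(3, 6, 12), period_season(30, 12))
```
</preformat> For example, with s = 12 the value at t = 30 lies in period i = 3 and season j = 6, since (3 − 1) &#215; 12 + 6 = 30, and the first entry of the second row of the table above is the 13th observation.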
Under this indexing, the observations in the i-th row (period) are X<sub>(i − 1)s + 1</sub>, X<sub>(i − 1)s + 2</sub>, …, X<sub>(i − 1)s + s</sub>, and the observations in the j-th column (season) are X<sub>j</sub>, X<sub>s + j</sub>, X<sub>2s + j</sub>, …, X<sub>(i − 1)s + j</sub>, …, X<sub>(m − 1)s + j</sub>. For details of the Buys-Ballot table, see Iwueze and Nwogu [<xref ref-type="bibr" rid="scirp.84196-ref22">22</xref>] [<xref ref-type="bibr" rid="scirp.84196-ref23">23</xref>] and Iwueze et al. [<xref ref-type="bibr" rid="scirp.84196-ref24">24</xref>]. For consistency, the existing methods are also presented in the Buys-Ballot format.</p><p>Some of the existing methods of estimating missing values in time series analysis are Mean Imputation (MI), Series Mean (SM), Linear Interpolation (LI) and Regression Imputation (RI). Suppose the observation X<sub>(i − 1)s + j</sub> at time t = (i − 1)s + j is missing from the Buys-Ballot table. It is then estimated by each of these methods as follows:</p><p>1) Mean Imputation (MI)</p><p>Mean imputation replaces the missing value with the mean of the values before the missing position, that is, the sum of the observations preceding the missing position divided by their number:</p><p><disp-formula><label>(2.1)</label><tex-math>\mathrm{MI} = \hat{X}_{(i-1)s+j} = \frac{1}{(i-1)s+j-1}\left[X_1 + X_2 + X_3 + \cdots + X_{(i-1)s+j-1}\right] = \frac{1}{n^{*}}\sum_{t=1}^{n^{*}} X_t</tex-math></disp-formula></p><p>where n<sup>*</sup> = (i − 1)s + j − 1 is the number of observations preceding the missing observation.</p><p>2) Series Mean (SM)</p><p>The series mean estimates the missing value by the mean of the remaining series. Symbolically, the series mean is given by</p><p><disp-formula><label>(2.2)</label><tex-math>\mathrm{SM} = \hat{X}_{(i-1)s+j} = \frac{T_{..}^{*}}{n-1}</tex-math></disp-formula></p><p>where n = ms and</p><p><disp-formula><label>(2.3)</label><tex-math>T_{..}^{*} = X_1 + X_2 + \cdots + X_{(i-1)s+j-1} + X_{(i-1)s+j+1} + \cdots + X_{ms}</tex-math></disp-formula></p><p>3) Linear Interpolation (LI)</p><p>Linear interpolation estimates the missing value as the average of the two observations adjacent to it:</p><p><disp-formula><label>(2.4)</label><tex-math>\mathrm{LI} = \hat{X}_{(i-1)s+j} = \frac{1}{2}\left(X_{(i-1)s+j-1} + X_{(i-1)s+j+1}\right)</tex-math></disp-formula></p><p>4) Regression Imputation (RI)</p><p>This method estimates the missing value by the estimated trend at the position of the missing value. The remaining values of the series are used to estimate the trend parameters, and the estimate of the missing value at t = (i − 1)s + j is then given by:</p><p>a) For a Linear Trend</p><p><disp-formula><label>(2.5)</label><tex-math>\mathrm{RI} = \hat{X}_{(i-1)s+j} = \hat{a} + \hat{b}\,[(i-1)s+j]</tex-math></disp-formula></p><p>b) For a Quadratic Curve</p><p><disp-formula><label>(2.6)</label><tex-math>\mathrm{RI} = \hat{X}_{(i-1)s+j} = \hat{a} + \hat{b}\,[(i-1)s+j] + \hat{c}\,[(i-1)s+j]^{2}</tex-math></disp-formula></p><p>c) For an Exponential Curve</p><p><disp-formula><label>(2.7)</label><tex-math>\mathrm{RI} = \hat{X}_{(i-1)s+j} = \hat{b}\, e^{\hat{c}\,[(i-1)s+j]}</tex-math></disp-formula></p></sec><sec id="s2_2"><title>2.2. New Methods of Estimating Missing Values</title><p>The new methods proposed in this work are Row Mean Imputation (RMI), Column Mean Imputation (CMI) and Decomposing Without the Missing Value (DWMV). They are given as follows:</p><p>1) Row Mean Imputation (RMI)</p><p>The row mean imputation method estimates the missing value as the mean of the remaining observations in the row (period) containing the missing value. Thus, the missing value is estimated by</p><p><disp-formula><label>(2.8)</label><tex-math>\mathrm{RMI} = \hat{X}_{(i-1)s+j} = \frac{1}{s-1}\left[\sum_{u=1}^{j-1} X_{(i-1)s+u} + \sum_{u=j+1}^{s} X_{(i-1)s+u}\right]</tex-math></disp-formula></p><p>2) Column Mean Imputation (CMI)</p><p>The column mean imputation method estimates the missing value as the mean of the remaining observations in the column (season) containing the missing value.
Thus, the missing value is estimated as:</p><p><disp-formula><label>(2.9)</label><tex-math>\mathrm{CMI} = \hat{X}_{(i-1)s+j} = \frac{1}{m-1}\left[\sum_{u=1}^{i-1} X_{(u-1)s+j} + \sum_{u=i+1}^{m} X_{(u-1)s+j}\right]</tex-math></disp-formula></p><p>3) Decomposing Without the Missing Value (DWMV)</p><p>In this method, the trend parameters and seasonal indices are estimated from the remaining observations using any of the methods of time series decomposition, and these estimates are substituted into the expression for the missing value. Hence, the estimate of the missing value by this method is given by:</p><p>a) For the Additive Model</p><p><disp-formula><label>(2.10)</label><tex-math>\hat{X}_{(i-1)s+j} = \hat{M}_{(i-1)s+j} + \hat{S}_j</tex-math></disp-formula></p><p>b) For the Multiplicative Model</p><p><disp-formula><label>(2.11)</label><tex-math>\hat{X}_{(i-1)s+j} = \hat{M}_{(i-1)s+j} \times \hat{S}_j</tex-math></disp-formula></p><p>The trend-cycle components of the DWMV method for the linear, quadratic and exponential curves are:</p><p>i) Linear Trend</p><p><disp-formula><label>(2.12)</label><tex-math>\hat{M}_{(i-1)s+j} = \hat{a} + \hat{b}\,[(i-1)s+j]</tex-math></disp-formula></p><p>ii) Quadratic Curve</p><p><disp-formula><label>(2.13)</label><tex-math>\hat{M}_{(i-1)s+j} = \hat{a} + \hat{b}\,[(i-1)s+j] + \hat{c}\,[(i-1)s+j]^{2}</tex-math></disp-formula></p><p>iii) Exponential Curve</p><p><disp-formula><label>(2.14)</label><tex-math>\hat{M}_{(i-1)s+j} = \hat{b}\, e^{\hat{c}\,[(i-1)s+j]}</tex-math></disp-formula></p></sec><sec id="s2_3"><title>2.3. Assessing Performance of the Methods</title><p>To assess the performance of the estimation methods, accuracy measures are computed from the deviations of the estimates of the missing values from the actual values. The deviation of the estimate <inline-formula><tex-math>\hat{X}_{(i-1)s+j}</tex-math></inline-formula> from the actual value X<sub>(i − 1)s + j</sub> is given by:</p><p><disp-formula><label>(2.15)</label><tex-math>\hat{e}_{(i-1)s+j} = X_{(i-1)s+j} - \hat{X}_{(i-1)s+j}</tex-math></disp-formula></p><p>The accuracy measures considered are the Mean Absolute Error (MAE), the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE) (Makridakis and Hibon, 1995). Given a data set of size n = ms (n &gt; 1), we considered one missing value at a time, at m<sub>0</sub> &lt; n different positions.
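 The three measures can be computed from the deviations (2.15) collected over the m<sub>0</sub> positions, following the definitions (2.16)-(2.18) given next. The Python function below is a minimal sketch; the name accuracy_measures is hypothetical, and MAPE is undefined whenever an actual value X<sub>k</sub> is zero.<preformat preformat-type="code">
```python
import math


def accuracy_measures(actual, estimated):
    """Return (MAE, MAPE in percent, RMSE) over m0 one-at-a-time deletions."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    # Deviations e_k = X_k - X_hat_k, as in (2.15).
    errors = [x - x_hat for x, x_hat in zip(actual, estimated)]
    m0 = len(errors)
    mae = sum(abs(e) for e in errors) / m0
    # MAPE divides by the actual value, so every X_k must be non-zero.
    mape = 100 * sum(abs(e / x) for e, x in zip(errors, actual)) / m0
    rmse = math.sqrt(sum(e * e for e in errors) / m0)
    return mae, mape, rmse


print(accuracy_measures([10, 20, 40], [12, 18, 40]))
```
</preformat> For instance, actual values 10, 20 and 40 with estimates 12, 18 and 40 give MAE ≈ 1.33, MAPE ≈ 10.0 and RMSE ≈ 1.63.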
These accuracy measures are defined as follows:</p><p>1) Mean Absolute Error (MAE)</p><p>The MAE is defined as</p><p><disp-formula><label>(2.16)</label><tex-math>\mathrm{MAE} = \frac{1}{m_0}\sum_{k=1}^{m_0} \left| e_k \right|</tex-math></disp-formula></p><p>2) Mean Absolute Percentage Error (MAPE)</p><p>The MAPE is defined as:</p><p><disp-formula><label>(2.17)</label><tex-math>\mathrm{MAPE} = \left[\frac{1}{m_0}\sum_{k=1}^{m_0} \left| \frac{e_k}{X_k} \right|\right] \times 100</tex-math></disp-formula></p><p>3) Root Mean Square Error (RMSE)</p><p>This is calculated as:</p><p><disp-formula><label>(2.18)</label><tex-math>\mathrm{RMSE} = \sqrt{\frac{1}{m_0}\sum_{k=1}^{m_0} e_k^{2}}</tex-math></disp-formula></p><p>where e<sub>k</sub> is the deviation (2.15) at the k-th of the m<sub>0</sub> positions and X<sub>k</sub> is the corresponding actual value.</p></sec></sec><sec id="s3"><title>3. Empirical Examples</title><p>This section presents empirical examples illustrating the application of the methods of estimating missing values discussed in Section 2. The examples comprise both simulated and real-life data. The simulated data consist of 106 data sets of 120 observations each, generated with MINITAB 16.0 from the additive model X<sub>t</sub> = M<sub>t</sub> + S<sub>t</sub> + e<sub>t</sub> and the multiplicative model X<sub>t</sub> = M<sub>t</sub> &#215; S<sub>t</sub> &#215; e<sub>t</sub>. The trend-cycle components M<sub>t</sub> used are 1) linear: M<sub>t</sub> = a + bt with a = 1 and b = 2.0; 2) quadratic: M<sub>t</sub> = a + bt + ct<sup>2</sup> with a = 1, b = 2.0 and c = 3; and 3) exponential: M<sub>t</sub> = be<sup>ct</sup> with b = 10 and c = 0.02. In the additive model, it is assumed that e<sub>t</sub> ~ N(0, 1), while in the multiplicative model it is assumed that e<sub>t</sub> ~ N(1, σ<sup>2</sup>). The seasonal indices S<sub>j</sub>, j = 1, 2, …, 12, are shown in <xref ref-type="table" rid="table1">Table 1</xref>. The real-life example is the monthly time series of airline passengers over a period of twenty (20) years.
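 As an illustration only, the procedure for the additive model with a linear trend can be sketched in Python. The sketch is a simplified stand-in for the MINITAB-based study, under stated assumptions: it simulates a single series (m = 10, s = 12, seed 0) with Python's random module instead of 106 data sets, deletes each interior observation in turn and reports the MAE (2.16) of the seven estimators. For DWMV it assumes one particular decomposition (an ordinary least squares linear trend plus seasonal means of the detrended values), which is not necessarily the one used by the authors; all function names are hypothetical.<preformat preformat-type="code">
```python
import random

# Additive seasonal indices S_j from Table 1 (s = 12 seasons).
S_ADD = [-0.89, -1.22, 0.10, -0.15, -0.09, 1.16, 2.34, 1.95, 0.64, -0.73, -2.14, -0.97]


def simulate_additive_linear(m=10, s=12, a=1.0, b=2.0, seed=0):
    """One series X_t = (a + b t) + S_j + e_t with e_t ~ N(0, 1), t = 1, ..., m s."""
    rng = random.Random(seed)
    return [a + b * t + S_ADD[(t - 1) % s] + rng.gauss(0.0, 1.0)
            for t in range(1, m * s + 1)]


def fit_line(points):
    """Ordinary least squares fit of x = a + b t to (t, x) pairs; returns (a, b)."""
    n = len(points)
    t_bar = sum(t for t, _ in points) / n
    x_bar = sum(x for _, x in points) / n
    s_tx = sum((t - t_bar) * (x - x_bar) for t, x in points)
    s_tt = sum((t - t_bar) ** 2 for t, _ in points)
    b = s_tx / s_tt
    return x_bar - b * t_bar, b


def estimate_missing(x, k, s):
    """Treat x[k] (time t = k + 1) as missing and estimate it by the seven methods."""
    t = k + 1
    i, j = divmod(k, s)                      # 0-based period (row) and season (column)
    m = len(x) // s
    rest = [(u + 1, v) for u, v in enumerate(x) if u != k]
    row = [x[i * s + c] for c in range(s) if c != j]
    col = [x[r * s + j] for r in range(m) if r != i]
    a, b = fit_line(rest)                    # linear trend from the remaining values
    # Assumed DWMV decomposition: seasonal index = mean detrended value of the
    # remaining observations in each season, centred so the indices sum to zero.
    detrended = [[] for _ in range(s)]
    for u, v in rest:
        detrended[(u - 1) % s].append(v - (a + b * u))
    idx = [sum(d) / len(d) for d in detrended]
    centre = sum(idx) / s
    return {
        "MI": sum(x[:k]) / k,                # mean of the k values before position k
        "SM": sum(v for _, v in rest) / len(rest),
        "LI": (x[k - 1] + x[k + 1]) / 2,     # average of the two neighbours
        "RI": a + b * t,
        "RMI": sum(row) / len(row),
        "CMI": sum(col) / len(col),
        "DWMV": a + b * t + idx[j] - centre,
    }


def mae_by_method(x, s):
    """MAE (2.16) per method, deleting each interior observation once (m0 = n - 2)."""
    positions = range(1, len(x) - 1)         # MI needs a value before, LI one on each side
    totals = {}
    for k in positions:
        for name, est in estimate_missing(x, k, s).items():
            totals[name] = totals.get(name, 0.0) + abs(x[k] - est)
    return {name: total / len(positions) for name, total in totals.items()}


series = simulate_additive_linear()
for name, value in sorted(mae_by_method(series, 12).items(), key=lambda kv: kv[1]):
    print(f"{name:5s} MAE = {value:.3f}")
```
</preformat> In this sketch the DWMV error is driven mainly by the noise term, whereas RMI and CMI inherit the trend difference within a row or between rows; the printed ranking is therefore indicative only and is not a reproduction of the tables below.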
The summary of the accuracy measures for the seven methods of estimating missing values considered are shown in Tables 2-4 for the selected trending curves.</p><p>The summary of accuracy measures for the simulated Additive and Multiplicative models shown in <xref ref-type="table" rid="table">Table </xref>2 and <xref ref-type="table" rid="table">Table </xref>3 respectively indicates that DWMV has the lowest values of the accuracy measures (MAE, MAPE and RMSE) for all</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table">Table </xref>1</label><caption><title> Seasonal indices used for simulation</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >j</th><th align="center" valign="middle" >1</th><th align="center" valign="middle" >2</th><th align="center" valign="middle" >3</th><th align="center" valign="middle" >4</th><th align="center" valign="middle" >5</th><th align="center" valign="middle" >6</th><th align="center" valign="middle" >7</th><th align="center" valign="middle" >8</th><th align="center" valign="middle" >9</th><th align="center" valign="middle" >10</th><th align="center" valign="middle" >11</th><th align="center" valign="middle" >12</th></tr></thead><tr><td align="center" valign="middle" >S<sub>j</sub> (Add.)</td><td align="center" valign="middle" >−0.89</td><td align="center" valign="middle" >−1.22</td><td align="center" valign="middle" >0.1</td><td align="center" valign="middle" >−0.15</td><td align="center" valign="middle" >−0.09</td><td align="center" valign="middle" >1.16</td><td align="center" valign="middle" >2.34</td><td align="center" valign="middle" >1.95</td><td align="center" valign="middle" >0.64</td><td align="center" valign="middle" >−0.73</td><td align="center" valign="middle" >−2.14</td><td align="center" valign="middle" >−0.97</td></tr><tr><td align="center" valign="middle" >S<sub>j</sub> (Mult.)</td><td align="center" valign="middle" >0.91</td><td align="center" valign="middle" >0.88</td><td 
align="center" valign="middle" >1</td><td align="center" valign="middle" >0.98</td><td align="center" valign="middle" >0.98</td><td align="center" valign="middle" >1.12</td><td align="center" valign="middle" >1.26</td><td align="center" valign="middle" >1.2</td><td align="center" valign="middle" >1.05</td><td align="center" valign="middle" >0.92</td><td align="center" valign="middle" >0.8</td><td align="center" valign="middle" >0.9</td></tr></tbody></table></table-wrap><p>Note: S<sub>j</sub><sub>(Add)</sub> is Seasonal indices for Additive model and S<sub>j</sub><sub>(Mult)</sub> are Seasonal indices for Multiplicative model.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table">Table </xref>2</label><caption><title> Summary result of estimation of missing value for additive model</title></caption><table><tbody><thead><tr><th align="center" valign="middle"  rowspan="2"  >Trend Component</th><th align="center" valign="middle"  rowspan="2"  >Accuracy Measures</th><th align="center" valign="middle"  colspan="7"  >Estimation Method</th></tr></thead><tr><td align="center" valign="middle" >MI</td><td align="center" valign="middle" >SM</td><td align="center" valign="middle" >LI</td><td align="center" valign="middle" >RI</td><td align="center" valign="middle" >CMI</td><td align="center" valign="middle" >RMI</td><td align="center" valign="middle" >DWMV</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Linear</td><td align="center" valign="middle" >MAE</td><td align="center" valign="middle" >67.03</td><td align="center" valign="middle" >62.29</td><td align="center" valign="middle" >7.07</td><td align="center" valign="middle" >11.26</td><td align="center" valign="middle" >63.96</td><td align="center" valign="middle" >11.89</td><td align="center" valign="middle" >2.59</td></tr><tr><td align="center" valign="middle" >MAPE</td><td align="center" valign="middle" >48.84</td><td align="center" valign="middle" >99.28</td><td align="center" 
valign="middle" >5.79</td><td align="center" valign="middle" >11.39</td><td align="center" valign="middle" >101.56</td><td align="center" valign="middle" >11.71</td><td align="center" valign="middle" >2.24</td></tr><tr><td align="center" valign="middle" >RMSE</td><td align="center" valign="middle" >76.77</td><td align="center" valign="middle" >68.53</td><td align="center" valign="middle" >9.70</td><td align="center" valign="middle" >14.59</td><td align="center" valign="middle" >73.46</td><td align="center" valign="middle" >16.58</td><td align="center" valign="middle" >3.29</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Quadratic</td><td align="center" valign="middle" >MAE</td><td align="center" valign="middle" >10,335.21</td><td align="center" valign="middle" >10,591.89</td><td align="center" valign="middle" >819.45</td><td align="center" valign="middle" >982.11</td><td align="center" valign="middle" >8561.86</td><td align="center" valign="middle" >851.07</td><td align="center" valign="middle" >386.83</td></tr><tr><td align="center" valign="middle" >MAPE</td><td align="center" valign="middle" >66.44</td><td align="center" valign="middle" >524.74</td><td align="center" valign="middle" >5.66</td><td align="center" valign="middle" >10.75</td><td align="center" valign="middle" >437.68</td><td align="center" valign="middle" >17.55</td><td align="center" valign="middle" >2.74</td></tr><tr><td align="center" valign="middle" >RMSE</td><td align="center" valign="middle" >13,185.28</td><td align="center" valign="middle" >12,226.39</td><td align="center" valign="middle" >1221.91</td><td align="center" valign="middle" >1387.31</td><td align="center" valign="middle" >10171.44</td><td align="center" valign="middle" >1114.69</td><td align="center" valign="middle" >585.70</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Exponential</td><td align="center" valign="middle" >MAE</td><td align="center" valign="middle" >21.61</td><td align="center" 
valign="middle" >22.07</td><td align="center" valign="middle" >2.37</td><td align="center" valign="middle" >3.43</td><td align="center" valign="middle" >20.74</td><td align="center" valign="middle" >4.24</td><td align="center" valign="middle" >1.03</td></tr><tr><td align="center" valign="middle" >MAPE</td><td align="center" valign="middle" >39.03</td><td align="center" valign="middle" >88.17</td><td align="center" valign="middle" >5.87</td><td align="center" valign="middle" >11.20</td><td align="center" valign="middle" >75.78</td><td align="center" valign="middle" >10.62</td><td align="center" valign="middle" >2.18</td></tr><tr><td align="center" valign="middle" >RMSE</td><td align="center" valign="middle" >28.03</td><td align="center" valign="middle" >26.09</td><td align="center" valign="middle" >3.17</td><td align="center" valign="middle" >4.37</td><td align="center" valign="middle" >24.03</td><td align="center" valign="middle" >5.25</td><td align="center" valign="middle" >1.54</td></tr></tbody></table></table-wrap><table-wrap id="table3" ><label><xref ref-type="table" rid="table">Table </xref>3</label><caption><title> Summary result of estimation of missing value for multiplicative model</title></caption><table><tbody><thead><tr><th align="center" valign="middle"  rowspan="2"  >Trend Component</th><th align="center" valign="middle"  rowspan="2"  >Accuracy Measures</th><th align="center" valign="middle"  colspan="7"  >Estimation Method</th></tr></thead><tr><td align="center" valign="middle" >MI</td><td align="center" valign="middle" >SM</td><td align="center" valign="middle" >LI</td><td align="center" valign="middle" >RI</td><td align="center" valign="middle" >CMI</td><td align="center" valign="middle" >RMI</td><td align="center" valign="middle" >DWMV</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Linear</td><td align="center" valign="middle" >MAE</td><td align="center" valign="middle" >61.43</td><td align="center" valign="middle" >55.92</td><td 
align="center" valign="middle" >1.25</td><td align="center" valign="middle" >36.17</td><td align="center" valign="middle" >63.95784</td><td align="center" valign="middle" >5.18</td><td align="center" valign="middle" >1.04</td></tr><tr><td align="center" valign="middle" >MAPE</td><td align="center" valign="middle" >48.65</td><td align="center" valign="middle" >87.54</td><td align="center" valign="middle" >1.37</td><td align="center" valign="middle" >29.48</td><td align="center" valign="middle" >105.4748</td><td align="center" valign="middle" >8.79</td><td align="center" valign="middle" >1.35</td></tr><tr><td align="center" valign="middle" >RMSE</td><td align="center" valign="middle" >69.55</td><td align="center" valign="middle" >64.69</td><td align="center" valign="middle" >1.54</td><td align="center" valign="middle" >39.91</td><td align="center" valign="middle" >74.2791</td><td align="center" valign="middle" >5.87</td><td align="center" valign="middle" >1.14</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Quadratic</td><td align="center" valign="middle" >MAE</td><td align="center" valign="middle" >9653.18</td><td align="center" valign="middle" >10,511.27</td><td align="center" valign="middle" >3.12</td><td align="center" valign="middle" >1.47</td><td align="center" valign="middle" >18,576.20</td><td align="center" valign="middle" >851.07</td><td align="center" valign="middle" >1.07</td></tr><tr><td align="center" valign="middle" >MAPE</td><td align="center" valign="middle" >66.76</td><td align="center" valign="middle" >465.10</td><td align="center" valign="middle" >0.13</td><td align="center" valign="middle" >0.07</td><td align="center" valign="middle" >414.62</td><td align="center" valign="middle" >15.81</td><td align="center" valign="middle" >0.04</td></tr><tr><td align="center" valign="middle" >RMSE</td><td align="center" valign="middle" >12,547.42</td><td align="center" valign="middle" >12,120.50</td><td align="center" valign="middle" 
>3.48</td><td align="center" valign="middle" >1.87</td><td align="center" valign="middle" >388.72</td><td align="center" valign="middle" >1,114.69</td><td align="center" valign="middle" >1.18</td></tr><tr><td align="center" valign="middle"  rowspan="3"  >Exponential</td><td align="center" valign="middle" >MAE</td><td align="center" valign="middle" >20.03</td><td align="center" valign="middle" >22.05</td><td align="center" valign="middle" >1.25</td><td align="center" valign="middle" >1.42</td><td align="center" valign="middle" >18.47</td><td align="center" valign="middle" >2.25</td><td align="center" valign="middle" >0.95</td></tr><tr><td align="center" valign="middle" >MAPE</td><td align="center" valign="middle" >36.87</td><td align="center" valign="middle" >87.12</td><td align="center" valign="middle" >4.57</td><td align="center" valign="middle" >6.67</td><td align="center" valign="middle" >71.05</td><td align="center" valign="middle" >6.18</td><td align="center" valign="middle" >3.49</td></tr><tr><td align="center" valign="middle" >RMSE</td><td align="center" valign="middle" >26.81</td><td align="center" valign="middle" >26.14</td><td align="center" valign="middle" >1.54</td><td align="center" valign="middle" >1.82</td><td align="center" valign="middle" >21.62</td><td align="center" valign="middle" >3.10</td><td align="center" valign="middle" >1.08</td></tr></tbody></table></table-wrap><table-wrap id="table4" ><label>Table 4</label><caption><title>Summary results of the estimation of missing values using the airline passenger data</title></caption><table><thead><tr><th align="center" valign="middle"  rowspan="2"  >Trend Component</th><th align="center" valign="middle"  rowspan="2"  >Accuracy Measures</th><th align="center" valign="middle"  colspan="7"  >Estimation Method</th></tr><tr><th align="center" valign="middle" >MI</th><th align="center" valign="middle" >SM</th><th align="center" valign="middle" >LI</th><th align="center" valign="middle" >RI</th><th align="center" valign="middle" >CMI</th><th align="center" valign="middle" >RMI</th><th align="center" valign="middle" >DWMV</th></tr></thead><tbody><tr><td align="center" valign="middle"  rowspan="3"  >Linear</td><td align="center" valign="middle" >MAE</td><td align="center" valign="middle" >83.43</td><td align="center" valign="middle" >77.63</td><td align="center" valign="middle" >15.05</td><td align="center" valign="middle" >17.92</td><td align="center" valign="middle" >72.03</td><td align="center" valign="middle" >26.75</td><td align="center" valign="middle" >7.56</td></tr><tr><td align="center" valign="middle" >MAPE</td><td align="center" valign="middle" >29.22</td><td align="center" valign="middle" >43.30</td><td align="center" valign="middle" >6.69</td><td align="center" valign="middle" >9.18</td><td align="center" valign="middle" >31.51</td><td align="center" valign="middle" >12.95</td><td align="center" valign="middle" >4.32</td></tr><tr><td align="center" valign="middle" >RMSE</td><td align="center" valign="middle" >96.91</td><td align="center" valign="middle" >94.21</td><td align="center" valign="middle" >17.92</td><td align="center" valign="middle" >22.34</td><td align="center" valign="middle" >67.89</td><td align="center" valign="middle" >32.69</td><td align="center" valign="middle" >10.49</td></tr></tbody></table></table-wrap><p>the selected trending curves, followed by the LI. The performance of each estimation method relative to the others (as measured by MAE, MAPE and RMSE) was consistent, varying only minimally across the 106 data sets simulated for this study. This implies that, among the methods investigated in this work, the DWMV method yielded the best estimates of the missing values in terms of the accuracy measures. This may be attributable to the fact that DWMV combines the effects of both the trending curve and the seasonal effect in estimating the missing values. 
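</p><p>The accuracy measures MAE, MAPE and RMSE used to compare the methods can be made concrete with a short sketch. The Python below is a minimal illustration assuming NumPy is available. It computes the three measures between true and estimated values and applies them to a hypothetical experiment: a simulated additive series with a linear trend and a period-12 sinusoidal seasonal pattern. One interior observation is deleted at a time and re-estimated in two ways. The function names and the simulation settings (level 100, slope 2, seasonal amplitude 10, 96 observations) are assumptions introduced here, not the design of this study. <monospace>linear_interpolation</monospace> averages the two neighbours, which is what linear interpolation gives for a single missing point. <monospace>trend_plus_season</monospace> fits a linear trend plus seasonal dummies by least squares without the deleted point. It stands in only for the general idea of combining trend and seasonal effects and is not the DWMV procedure itself.</p><preformat>
```python
import numpy as np


def mae(actual, estimate):
    """Mean absolute error."""
    a, e = np.asarray(actual, float), np.asarray(estimate, float)
    return float(np.mean(np.abs(a - e)))


def mape(actual, estimate):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    a, e = np.asarray(actual, float), np.asarray(estimate, float)
    return float(100.0 * np.mean(np.abs((a - e) / a)))


def rmse(actual, estimate):
    """Root mean squared error."""
    a, e = np.asarray(actual, float), np.asarray(estimate, float)
    return float(np.sqrt(np.mean((a - e) ** 2)))


def linear_interpolation(x, t):
    """Hypothetical helper: for one missing interior point, linear
    interpolation reduces to the average of its two neighbours."""
    return 0.5 * (x[t - 1] + x[t + 1])


def trend_plus_season(x, t, period=12):
    """Hypothetical helper (not the authors' DWMV): fit a linear trend plus
    additive seasonal dummies by least squares with observation t left out,
    then predict the value at t from the fitted trend and seasonal effect."""
    n = len(x)
    time = np.arange(n, dtype=float)
    dummies = np.eye(period)[np.arange(n) % period]  # one column per season
    design = np.column_stack([time, dummies])
    keep = np.arange(n) != t  # drop the "missing" observation from the fit
    coef, *_ = np.linalg.lstsq(design[keep], np.asarray(x, float)[keep], rcond=None)
    return float(design[t] @ coef)


def simulate(n=96, period=12, noise_sd=1.0, seed=0):
    """Assumed test series: level 100, slope 2, sinusoidal seasonality, Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    seasonal = 10.0 * np.sin(2.0 * np.pi * (t % period) / period)
    return 100.0 + 2.0 * t + seasonal + rng.normal(0.0, noise_sd, n)


def compare(series, period=12):
    """Delete each interior observation in turn, re-estimate it both ways,
    and return (MAE, MAPE, RMSE) for each estimator."""
    targets = range(1, len(series) - 1)
    actual = [series[t] for t in targets]
    estimates = {
        "LI": [linear_interpolation(series, t) for t in targets],
        "trend+season": [trend_plus_season(series, t, period) for t in targets],
    }
    return {name: (mae(actual, est), mape(actual, est), rmse(actual, est))
            for name, est in estimates.items()}


if __name__ == "__main__":
    for name, (m1, m2, m3) in compare(simulate()).items():
        print(f"{name}: MAE={m1:.2f} MAPE={m2:.2f} RMSE={m3:.2f}")
```
</preformat><p>With <monospace>noise_sd=0.0</monospace>, the least-squares trend-plus-season estimate recovers each deleted value up to rounding error. The neighbour average does not, because it ignores the curvature of the seasonal pattern.</p><p>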
The importance of taking the seasonality of the missing value into account, as DWMV does, is supported by the literature. For the real-life data, the DWMV method also outperformed the other methods of estimating missing values, even though the assumption of normally distributed error terms is not necessarily met by real-life data.</p></sec><sec id="s4"><title>4. Concluding Remark</title><p>The results of the analysis indicate that, for all trending curves and both model structures, DWMV yielded the best estimates of the missing values (in terms of the accuracy measures) when compared with both the existing methods and the two other newly proposed methods (RMI and CMI). This is perhaps because, unlike the other methods, DWMV combines the effects of both the trending curve and the seasonal indices. Similarly, Cheema [<xref ref-type="bibr" rid="scirp.84196-ref18">18</xref>] observed that the multiple regression imputation method of handling missing data performed well when the analytical method was also multiple regression, because using regression-imputed data in a regression equation is like fitting a regression equation twice to predict the same dependent variable.</p><p>In view of this, it is recommended that the DWMV method be used for estimating missing values in time series analysis when one observation is missing at a time, until further studies prove otherwise. It is also recommended that this study be extended to cases where more than one data point is missing at a time, and that the effects of different sample sizes and distributions on the estimation of missing values be examined.</p></sec><sec id="s5"><title>Cite this paper</title><p>Iwueze, I.S., Nwogu, E.C., Nlebedim, V.U., Nwosu, U.I. and Chinyem, U.E. (2018) Comparison of Methods of Estimating Missing Values in Time Series. Open Journal of Statistics, 8, 390-399. 
https://doi.org/10.4236/ojs.2018.82025</p></sec></body><back><ref-list><title>References</title><ref id="scirp.84196-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">David, S.C.F. (2006) Methods for the Estimation of Missing Values in Time Series. Cowan University Press, Western Australia.</mixed-citation></ref><ref id="scirp.84196-ref2"><label>2</label><mixed-citation publication-type="book" xlink:type="simple">Howell, D.C. (2007) The Analysis of Missing Data. In: Outhwaite, W. and Turner, S., Eds., Handbook of Social Science Methodology, Sage, London.  
https://doi.org/10.4135/9781848607958.n11</mixed-citation></ref><ref id="scirp.84196-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Almed, M.R. and Al-Khazaleh, A.M.H. (2008) Estimation of Missing Data by Using the Filtering Process in a Time Series Modeling.</mixed-citation></ref><ref id="scirp.84196-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Kalman, R.E. (1960) A New Approach to Linear Filtering and Prediction Problems. Journal of Basic Engineering, 82, 35-45. https://doi.org/10.1115/1.3662552</mixed-citation></ref><ref id="scirp.84196-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Jones, R.H. (1980) Maximum Likelihood Fitting of ARMA Models to Time Series with Missing Observations. Technometrics, 22, 389-395. 
https://doi.org/10.1080/00401706.1980.10486171</mixed-citation></ref><ref id="scirp.84196-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Harvey, A.C. and Pierse, R.G. (1984) Estimating Missing Observations in Economic Time Series. Journal of the American Statistical Association, 79, 125-131.  
https://doi.org/10.1080/01621459.1984.10477074</mixed-citation></ref><ref id="scirp.84196-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Kohn, R. and Ansley, C.F. (1986) Estimation, Prediction, and Interpolation for ARIMA Models with Missing Data. Journal of the American Statistical Association, 81, 751-761. https://doi.org/10.1080/01621459.1986.10478332</mixed-citation></ref><ref id="scirp.84196-ref8"><label>8</label><mixed-citation publication-type="journal" xlink:type="simple">Damsleth, E., et al. (1979) Interpolating Missing Values in a Time Series. Scandinavian Journal of Statistics, 7, 33-39.</mixed-citation></ref><ref id="scirp.84196-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Robinson, P.M. and Dunsmuir, W. (1981) Estimation of Time Series Models in the Presence of Missing Data. Journal of the American Statistical Association, 76, 560-568. https://doi.org/10.1080/01621459.1981.10477687</mixed-citation></ref><ref id="scirp.84196-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Abraham, B. (1981) Missing Observations in Time Series. Communications in Statistics - Theory and Methods, 10, 1643-1653. https://doi.org/10.1080/03610928108828138</mixed-citation></ref><ref id="scirp.84196-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Ferreiro, O. (1987) Methodologies for the Estimation of Missing Observations in Time Series. Statistics and Probability Letters, 5, 65-69. 
https://doi.org/10.1016/0167-7152(87)90028-9</mixed-citation></ref><ref id="scirp.84196-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Rosen, Y. and Porat, B. (1989) Optimal ARMA Parameter Estimation Based on the Sample Covariances for Data with Missing Observations. IEEE Transactions on Information Theory, 35, 342-349. https://doi.org/10.1109/18.32128</mixed-citation></ref><ref id="scirp.84196-ref13"><label>13</label><mixed-citation publication-type="journal" xlink:type="simple">Scargle, J.D., et al. (1982) Studies in Astronomical Time Series Analysis II. Statistical Aspects of Spectral Analysis of Unevenly Spaced Data. Astrophysical Journal, 263, 836-853.</mixed-citation></ref><ref id="scirp.84196-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Bos, R., De Waele, S. and Broersen, P.M.T. (2002) Autoregressive Spectral Estimates by Application of the Burg Algorithm to Irregularly Sampled Data. IEEE Transactions on Instrumentation and Measurement, 51, 1289-1294. 
https://doi.org/10.1109/TIM.2002.808031</mixed-citation></ref><ref id="scirp.84196-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Broersen, P.M.T., De Waele, S. and Bos, R. (2004) Autoregressive Spectral Analysis When Observations Are Missing. Automatica, 40, 1495-1504. 
https://doi.org/10.1016/j.automatica.2004.04.011</mixed-citation></ref><ref id="scirp.84196-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Brockwell, P.J. and Davis, R.A. (1991) Time Series: Theory and Methods. Springer-Verlag, New York. https://doi.org/10.1007/978-1-4419-0320-4</mixed-citation></ref><ref id="scirp.84196-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Yuan, K.H., Yang-Wallentin, F. and Bentler, P.M. (2012) Maximum Likelihood versus Multiple Imputation for Missing Data with Violation of Distribution Condition. Sociological Methods &amp; Research, 41, 598-629.</mixed-citation></ref><ref id="scirp.84196-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Cheema, J.R. (2014) Some General Guidelines for Choosing Missing Data Handling Methods in Educational Research. Journal of Modern Applied Statistical Methods, 13, Article 3. https://doi.org/10.22237/jmasm/1414814520</mixed-citation></ref><ref id="scirp.84196-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Schafer, J.L. (1997) Analysis of Incomplete Multivariate Data. Chapman and Hall, New York. https://doi.org/10.1201/9781439821862</mixed-citation></ref><ref id="scirp.84196-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Witta, E.L. (2000) Effectiveness of Four Methods of Handling Missing Data Using Samples from a National Database. Paper Presented at the Annual Meeting of the American Educational Research Association, ERIC Document Reproduction Service No. ED 442 810, New Orleans, LA.</mixed-citation></ref><ref id="scirp.84196-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Chatfield, C. (2004) The Analysis of Time Series: An Introduction. 
6th Edition, Chapman and Hall, London.</mixed-citation></ref><ref id="scirp.84196-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Iwueze, I.S. and Nwogu, E.C. (2004) Buys-Ballot Estimates for Time Series Decomposition. Global Journal of Mathematics, 3, 83-98.</mixed-citation></ref><ref id="scirp.84196-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Iwueze, I.S. and Nwogu, E.C. (2005) Buys-Ballot Estimates for Exponential and S-Shaped Curves, for Time Series. Journal of the Nigerian Association of Mathematical Physics, 9, 357-366.</mixed-citation></ref><ref id="scirp.84196-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">Iwueze, I.S., Nwogu, E.C., Ohakwe, J. and Ajaraogu, J.C. (2011) Uses of the Buys-Ballot Table in Time Series Analysis. Applied Mathematics, 2, 633-645. 
https://doi.org/10.4236/am.2011.25084</mixed-citation></ref></ref-list></back></article>