<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">OJMS</journal-id><journal-title-group><journal-title>Open Journal of Marine Science</journal-title></journal-title-group><issn pub-type="epub">2161-7384</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/ojms.2014.41004</article-id><article-id pub-id-type="publisher-id">OJMS-42005</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Earth&amp;Environmental Sciences</subject></subj-group></article-categories><title-group><article-title>
 
 
  Modeling Ocean Chlorophyll Distributions by Penalizing the Blending Technique
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Onabid</surname><given-names>Mathias A.</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Wood</surname><given-names>Simon</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Department of Maths-Computer Science, Faculty of Sciences, University of Dschang, Dschang, Cameroon</addr-line></aff><aff id="aff2"><addr-line>Department of Mathematical Sciences, University of Bath, Bath, UK</addr-line></aff><author-notes><corresp id="cor1">* E-mail: <email>mathakong@yahoo.fr</email> (MAO); <email>s.wood@bath.ac.uk</email> (SW)</corresp></author-notes><pub-date pub-type="epub"><day>12</day><month>12</month><year>2013</year></pub-date><volume>04</volume><issue>01</issue><fpage>25</fpage><lpage>30</lpage><history><date date-type="received"><day>6</day><month>9</month><year>2013</year></date><date date-type="rev-recd"><day>19</day><month>10</month><year>2013</year></date><date date-type="accepted"><day>12</day><month>11</month><year>2013</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
   <b>Disparities between the in situ and satellite values at the positions where in situ values are obtained have been the main handicap to the smooth modeling of the distribution of ocean chlorophyll. The blending technique and the thin plate regression spline have so far been the main methods used to calibrate ocean chlorophyll at positions where the in situ field could not provide values. In this paper, the two techniques are combined in order to provide improved and reliable estimates from the satellite field. The thin plate regression spline is applied to the blending technique by imposing a penalty on the differences between the satellite and in situ fields at positions where both have observations. The objective of maximizing the use of the satellite field for prediction was met in a validation study, where the penalized blending method showed a remarkable improvement in its estimation potential. It is hoped that most analyses of primary productivity and management in the ocean environment will be greatly affected by this result, since chlorophyll is one of the most important components in the formation of the ocean life cycle.</b>
    
 
</p></abstract><kwd-group><kwd><italic>In Situ</italic></kwd><kwd>Satellite</kwd><kwd>Ship and Buoy</kwd><kwd>Penalized Regression Spline</kwd><kwd>Penalty</kwd><kwd>Penalized Blending</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>A detailed study of the ocean environment and its constituent elements is of utmost importance in guiding decision-makers on policies regarding marine activities such as fishing and their consequences for human life and society as a whole. In the ocean food chain, phytoplankton, which are found in the upper layer of the ocean, are of extreme importance. Indeed, aquatic life and production revolve around the distribution and biomass of these unicellular algae. Thus, to better understand the ocean food chain, it is necessary to track their existence and monitor their population distribution in the ocean environment. Measuring their population by cell counts is very difficult because of their resemblance to other non-algal, carbon-rich particles. An alternative is to measure their photosynthetic pigment content, chlorophyll, which is common to all taxonomic groups of algae [<xref ref-type="bibr" rid="scirp.42005-ref1">1</xref>]. In fact, the concentration of ocean chlorophyll provides an appealing means of estimating primary productivity in the ocean [<xref ref-type="bibr" rid="scirp.42005-ref2">2</xref>], a point also emphasized by [<xref ref-type="bibr" rid="scirp.42005-ref3">3</xref>]. Therefore, to better monitor and predict the abundance of phytoplankton, it is important that the distribution of chlorophyll concentration in this environment be determined as accurately as possible. The blending technique described by [<xref ref-type="bibr" rid="scirp.42005-ref4">4</xref>] was successfully used to analyze sea surface temperature [<xref ref-type="bibr" rid="scirp.42005-ref5">5</xref>]. 
The pioneers of the use of this technique in the calibration of ocean chlorophyll expressed the need for further work to improve ocean chlorophyll predictions in areas where observations could not be obtained by ship and buoy [<xref ref-type="bibr" rid="scirp.42005-ref6">6</xref>]. One problem faced when using the technique in ocean chlorophyll calibration is distortion of the blended field as one approaches the coast. This distortion is due to the sparseness of the data obtained by ship and buoy (in situ) and the noisiness of the satellite field [<xref ref-type="bibr" rid="scirp.42005-ref7">7</xref>]. These factors have been the main handicap to the smooth calibration of ocean chlorophyll estimates from the satellite observations. Penalized regression splines are a technique that can be used to model noisy data [<xref ref-type="bibr" rid="scirp.42005-ref8">8</xref>]. The use of this statistical technique in the calibration of ocean chlorophyll was also suggested by [<xref ref-type="bibr" rid="scirp.42005-ref1">1</xref>].</p><p>The objective of this article, therefore, is to demonstrate how the principles of penalized regression can be applied to the blending process in order to obtain better estimates of ocean chlorophyll from the satellite data field. The approach mainly addresses the noisiness of the data fields by introducing a penalty on the differences between their observations at positions where both fields have values. The belief is that, by penalizing the differences between the satellite and the in situ fields, the satellite field will become closer to the in situ field and can thus be used to estimate ocean chlorophyll values at positions where the in situ field could not provide values. 
Since the process of penalization involves smoothing, the efficiency of the technique will depend on the choice of the smoothing parameters.</p><p>Inspiration for this approach was drawn from the interpolation equation</p><disp-formula id="scirp.42005-formula91499"><label>(1)</label><graphic position="anchor" xlink:href="4-1470104x\a079f7f5-0502-4663-bb0c-383ba72d4f78.jpg"  xlink:type="simple"/></disp-formula><p>found in Onabid (2011), in the section proving why results from the corrector factor and the smooth in-fill methods should coincide. From this equation, the term of interest is</p><p><img src="4-1470104x\0b7b491d-b0f5-4d8c-96f2-cd9c89b4f7df.jpg" /></p><p>which is the sum of the solutions to the partial differential equation obtained at each boundary point k where there is a difference between the satellite and in situ values. In order to penalize these differences, the interpolation equation (1) had to be represented using basis functions. Consider the equation</p><p><img src="4-1470104x\afa00f4e-78dc-4762-86fc-1dea07272fa5.jpg" /></p><p>where <img src="4-1470104x\014a677b-9944-44a3-9131-e0f6b1bb6078.jpg" /> is the solution to</p><disp-formula id="scirp.42005-formula91500"><label>(3)</label><graphic position="anchor" xlink:href="4-1470104x\01dc879c-6c79-4daf-9acd-d41785869f7e.jpg"  xlink:type="simple"/></disp-formula><p>subject to the boundary conditions</p><p><img src="4-1470104x\1cd31c1b-6cc2-455b-b79c-63015777a148.jpg" />.</p><p>Equation (2) can be rewritten with each of the <img src="4-1470104x\69673c85-123c-4a8a-b345-58ecf63833ce.jpg" /> treated separately as</p><disp-formula id="scirp.42005-formula91501"><label>(4)</label><graphic position="anchor" xlink:href="4-1470104x\8acdd2d4-db7c-46df-996a-0587c5e8ee73.jpg"  xlink:type="simple"/></disp-formula><p>where <img src="4-1470104x\81818e09-2327-471b-8928-ebec7029928e.jpg" /> is set to the difference between the in situ and satellite values at boundary point k, and Δ<sub>k</sub>(x; y), 
representing the basis function, is the solution to</p><disp-formula id="scirp.42005-formula91502"><label>(5)</label><graphic position="anchor" xlink:href="4-1470104x\abbb48d6-b7d7-4774-a8ef-7d25f82be502.jpg"  xlink:type="simple"/></disp-formula><p>with the external boundary points set to zero and the internal boundary points set to zero everywhere except at the k<sup>th</sup> position, where the value is set to 1.0; this position is the knot of the basis.</p><p>This means that, for each internal boundary point (knot), the blending process is performed to estimate the entire blended field with that particular boundary point acting as the only boundary point for the process. During the process, the value of this boundary point equals 1.0 and the resulting field is the basis for this knot. The blended field corresponding to this particular knot is obtained by multiplying the original knot value by its basis. The blended fields obtained from all the knots are summed. This sum is then added to the satellite field to obtain the final blended field, which we call the basis blend.</p></sec><sec id="s2"><title>2. 
Penalizing the Blending Process</title><p>For the penalized regression spline to be applied, it was necessary to represent the term of interest in the blending process as a regression equation.</p><p><bold>Representing Blending as a Regression Equation</bold></p><p>Given equation (4), which is the interpolation form of the blending process represented using basis functions, and given that the objective is to control the differences between the satellite and the in situ fields, the focus here should be on the term</p><p><img src="4-1470104x\5477eb4f-e57e-4349-9459-86dedb3d6dc8.jpg" /></p><p>from which the <img src="4-1470104x\6cc8f3de-0f1c-40ed-ad8b-2f79cd6e9183.jpg" />′s can be estimated by penalized least squares in order to minimize the effect of these differences and consequently maximize the use of the satellite field as an estimate of ocean chlorophyll at points where the in situ field could not provide observations. Let</p><p><img src="4-1470104x\97419af0-3bc1-4790-8ab5-a3e12c6345f4.jpg" /></p><p>be calculated for each point in K, where both the satellite and in situ fields have observations. This can be written as a regression equation of the form</p><disp-formula id="scirp.42005-formula91503"><label>(6)</label><graphic position="anchor" xlink:href="4-1470104x\9b00ed9b-896f-4528-8d94-3f7e0cb90996.jpg"  xlink:type="simple"/></disp-formula><p>where the <img src="4-1470104x\8c61d8fd-a211-402a-8523-e05721846241.jpg" />′s are unknown parameters to be estimated and <img src="4-1470104x\33c61a61-d609-4b81-85c2-d7ee5689d97c.jpg" /> is the error term. 
This expression is equal to</p><p><img src="4-1470104x\3b209276-0b60-4d4f-bc29-a50b40b04165.jpg" /></p><p>Thus, if Z<sub>k</sub> is expressed using the basis space, one obtains the model</p><p><img src="4-1470104x\52b45d61-b663-4610-b7f4-4ca1a75154fc.jpg" /></p><p>Fitting this model by least squares simply reproduces the interpolation scheme, since there is exactly one parameter per datum; nonparametric techniques were therefore explored. The thin plate regression spline was used to introduce a penalty to this blending regression equation.</p></sec><sec id="s3"><title>3. Penalizing the Blending Regression Equation</title><p>From Equation (6), the smoothness of the differences can be controlled either by altering the basis dimension, that is, changing the number of selected knots, or by keeping the basis dimension fixed and adding a penalty term to the least squares objective. The latter was used. The penalized least squares objective is therefore to minimize</p><disp-formula id="scirp.42005-formula91504"><label>(7)</label><graphic position="anchor" xlink:href="4-1470104x\0433115a-598b-4102-9739-e22a6f6d2baa.jpg"  xlink:type="simple"/></disp-formula><p>where <img src="4-1470104x\a1b6a0f4-c7cd-4142-ac0e-7d3c6c36acb3.jpg" /> is a penalty function which penalizes model wiggliness, while model smoothness is controlled by the smoothing parameter <img src="4-1470104x\cfeda4a8-4a60-464f-b23e-3a193dcb37c9.jpg" />, as described by [<xref ref-type="bibr" rid="scirp.42005-ref9">9</xref>]. As a first step in minimizing the penalized least squares objective, the simple penalized least squares technique of ridge regression was used. In this process, the intention is to penalize each of the estimated parameters separately. 
Following this method, the penalized least squares objective will be to minimize</p><disp-formula id="scirp.42005-formula91505"><label>(8)</label><graphic position="anchor" xlink:href="4-1470104x\e62b79b6-7b15-4b73-a7c5-1f1f24f9e617.jpg"  xlink:type="simple"/></disp-formula><p>with respect to the <img src="4-1470104x\395bd6d5-97ca-48fc-a0be-e767d4ef5db0.jpg" />′s. The penalty is represented by the term <img src="4-1470104x\447364b8-30db-4fcf-8db1-0959d49476dc.jpg" />, with <img src="4-1470104x\072719cf-6c75-4c28-91da-07c907be499e.jpg" /> being the smoothing parameter that controls the trade-off between model fit and model smoothness. Thus the problem of estimating the degree of smoothness of the model becomes the problem of estimating the smoothing parameter <img src="4-1470104x\6c9e46c8-a345-4b9d-9520-4c9267325d6e.jpg" />.</p><p>Assuming that the smoothing parameter is given, how then can the <img src="4-1470104x\b814c99e-15f8-4d3b-989c-0f703f618d71.jpg" />′s be estimated in this penalized least squares objective?</p><p>In equation (8), the term Δ<sub>k</sub>(x<sub>k</sub>; y<sub>k</sub>) reduces to an n&#215;n identity matrix. Now, define an augmented Z as <img src="4-1470104x\4e6f2ee4-1ee3-4d98-9523-040a7e06a195.jpg" /> (with n zeroes), so that the penalty can be incorporated directly into the objective.</p><p>When this is done, equation (8) can be written as</p><p><img src="4-1470104x\df3591bf-cc7f-47e6-b1f2-463cf824b71f.jpg" /></p><p>From here, <img src="4-1470104x\a9819aa9-0b31-45d6-a245-9ae1870a067f.jpg" /> can be calculated as follows:</p><p><img src="4-1470104x\d0e13ee0-5a3b-4c49-b0a0-15e2e42fed29.jpg" /></p><p>with <img src="4-1470104x\5f0b928f-97fe-411e-a62d-22c4a3137da5.jpg" />; this implies that</p><p><img src="4-1470104x\43f71d12-0dfe-4055-8b05-0e44be9f0c68.jpg" /></p><sec id="s3_1"><title>3.1. Choosing How Much to Smooth</title><p>This refers to the selection of the smoothing parameter <img src="4-1470104x\c30ac2ca-b087-4d27-9afa-eefeb3e0861a.jpg" />. 
This must be done with care, so that if the true smooth function is f, the estimated smooth function <img src="4-1470104x\801109eb-e4b5-46bc-8884-c6d58255ea8b.jpg" /> is as close as possible to it. If <img src="4-1470104x\71584695-3cfd-4406-b16b-cf021825c68f.jpg" /> is too high, the data will be over-smoothed, and if it is too low, the data will be under-smoothed; in both cases the resulting estimate will not be close to the true function. The aim, as described by [<xref ref-type="bibr" rid="scirp.42005-ref9">9</xref>], is to select a <img src="4-1470104x\fd0a3e68-62a9-45cd-8c52-60734c32fbea.jpg" /> which minimizes the difference between <img src="4-1470104x\e7b3dd15-2969-4993-adf5-15c2d052f2c2.jpg" /> and f; that is to say, if M is the difference, then <img src="4-1470104x\349c0587-c3b9-4634-8566-7aaa202bccfc.jpg" /> should minimize</p><p><img src="4-1470104x\f0f9b7d6-8483-4f77-a34c-fbd2318a1d45.jpg" /></p><p>This would be straightforward if the true values of f were known. Because this is not the case, the problem was approached by deriving estimates of M plus some variation. This was achieved by making use of the ordinary cross validation (OCV) technique. In this technique, one datum is left out and a model is fitted to the rest of the data. The squared difference between the omitted datum and its predicted value from the fitted model is calculated. This is done for every datum in turn and the mean is taken over all the data. Thus the ordinary cross validation criterion is written as</p><p><img src="4-1470104x\d674f89e-2140-4919-a1a1-fccb05c7bbf1.jpg" /></p><p>where <img src="4-1470104x\22add3f2-3ce3-4285-9abe-1144bd75c7fc.jpg" /> is the estimate from the model fitted to all data except Z<sub>i</sub>. Calculating V<sub>0</sub> by refitting the model each time a datum is left out is inefficient, as described by [<xref ref-type="bibr" rid="scirp.42005-ref9">9</xref>]. 
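</p><p>The leave-one-out procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the design matrix X, the data z, the value of the smoothing parameter and the plain ridge penalty are hypothetical stand-ins, not the blending basis or the chlorophyll fields used in this study, and the function names are invented for the example. The brute-force score refits the model n times, while the second function applies the standard single-fit identity for linear smoothers, which uses the diagonal of the influence matrix A.</p><preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import numpy as np


def ridge_fit(X, z, lam):
    """Coefficients minimising ||z - X b||^2 + lam * ||b||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ z)


def ocv_brute_force(X, z, lam):
    """Ordinary cross validation: refit n times, leaving one datum out each time."""
    n = len(z)
    sq_err = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                 # every datum except the i-th
        beta = ridge_fit(X[keep], z[keep], lam)
        sq_err[i] = (z[i] - X[i] @ beta) ** 2    # error in predicting the omitted datum
    return sq_err.mean()


def ocv_single_fit(X, z, lam):
    """Same score from one fit, using the diagonal of the influence matrix A."""
    p = X.shape[1]
    A = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = z - A @ z
    return np.mean((resid / (1.0 - np.diag(A))) ** 2)


# Hypothetical simulated data (assumption: not the blending basis of this paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
z = X @ rng.normal(size=5) + 0.1 * rng.normal(size=30)

v_slow = ocv_brute_force(X, z, lam=0.5)
v_fast = ocv_single_fit(X, z, lam=0.5)
```
]]></preformat><p>On such simulated data the two scores agree to within rounding error, so the n refits of the brute-force version can be avoided.</p><p>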
It can be shown that</p><p><img src="4-1470104x\66988aee-06d4-4d53-aa75-5938803a984b.jpg" /></p><p>where <img src="4-1470104x\a93eab5c-0717-4ac7-a37a-02288bbb4d99.jpg" /> is the estimate from fitting to all the data and A is the corresponding influence matrix.</p><p>Reference [<xref ref-type="bibr" rid="scirp.42005-ref9">9</xref>] emphasizes that, although OCV is a reasonable way of estimating smoothing parameters, it has two drawbacks: it is computationally expensive to minimize in the case of additive models, where there could be many smoothing parameters, and it has a slightly disturbing lack of invariance. Thus, in practice, the weights 1−A<sub>ii</sub> are often replaced by the mean weight tr(I − A)/n in order to arrive at the generalized cross validation (gcv) score, given as</p><p><img src="4-1470104x\cc81b809-b00a-4887-b601-37f87c94e6fb.jpg" /></p><p>This has a computational advantage over OCV and can also be argued to have some advantages in terms of invariance. Therefore, an easy way to look for the best smoothing parameter is to search through a sequence of <img src="4-1470104x\4c161546-36d6-4256-85bf-81b10cc26223.jpg" />′s, each time fitting a penalized regression model with the new <img src="4-1470104x\a5c8e0fa-d6eb-4eda-a2ea-8cde17a8d005.jpg" /> value and calculating the gcv score. At the end, the <img src="4-1470104x\1cb86772-da3e-46f7-9740-17289f985a82.jpg" /> value corresponding to the lowest gcv score is taken as the optimal smoothing parameter.</p></sec><sec id="s3_2"><title>3.2. 
Calculating the gcv Score</title><p>Amongst the techniques used in computing the gcv score (ridge regression, integrated least squares, integrated squared derivatives and the efficient method), only the efficient method described here provided a better estimate of the gcv score.</p><sec id="s3_2_1"><title>Efficient Calculation of the gcv Score</title><p>The idea here is to provide a means of obtaining optimum values for the gcv score, the degrees of freedom tr(A) and the smoothing parameter <img src="4-1470104x\fdea9b4f-447f-47d1-bc79-4caa8c621ea1.jpg" /> which minimizes the gcv score. These are very important, since the objective is to build a model that produces estimates in the blended field which are as close as possible to the true field. The QR decomposition described in [<xref ref-type="bibr" rid="scirp.42005-ref10">10</xref>] was used because QR is believed to be more stable than the Cholesky decomposition. This was achieved as follows.</p><p>The objective is to minimize</p><p><img src="4-1470104x\11d0a790-b25b-4f60-baca-a1086e3ddf53.jpg" /></p><p>with respect to <img src="4-1470104x\5eafff6d-7dac-454b-9463-b33fb7494f1b.jpg" />.</p><p><img src="4-1470104x\29f29eef-9578-4c05-931a-96133ff6509e.jpg" /></p><p>where</p><p><img src="4-1470104x\5d1b0005-bc45-4708-af59-526837b43fa7.jpg" /></p><p>The corresponding gcv score for the given <img src="4-1470104x\b722b5cc-7470-47d3-89c4-36e751deb8a1.jpg" /> is then given as</p><p><img src="4-1470104x\374ab351-417d-4854-958f-add9bb87ad35.jpg" /></p><p>In order to calculate the gcv score efficiently, let X = QR, where R is upper triangular and Q consists of columns of an orthogonal matrix such that Q<sup>T</sup>Q = I but <img src="4-1470104x\1ced0159-797f-4a27-8411-80038914f798.jpg" /></p><p><img src="4-1470104x\9c5605b9-fd63-40d4-a4dd-72451d350837.jpg" /></p><p>From an eigen-decomposition</p><p><img src="4-1470104x\956b97f5-f877-499c-b0d3-2497e0ac964d.jpg" /></p><p>where D is a diagonal matrix of 
eigenvalues, the columns of U are eigenvectors and U is orthogonal.</p><disp-formula id="scirp.42005-formula91506"><label>(9)</label><graphic position="anchor" xlink:href="4-1470104x\bf526a4c-357a-466a-8d5e-fe606ee8383b.jpg"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.42005-formula91507"><label>(10)</label><graphic position="anchor" xlink:href="4-1470104x\347e32ab-3fe7-4e5f-8e71-3bb2c3b3ed07.jpg"  xlink:type="simple"/></disp-formula><p>where <img src="4-1470104x\14b5aa99-85c8-4b21-a904-031c1192b03b.jpg" />.</p><p>From equations (9) and (10) it follows that <img src="4-1470104x\31831c39-953a-4af9-8567-3a59ae8a8806.jpg" /> can be evaluated very cheaply for each new <img src="4-1470104x\a4adcf4e-f8a9-4715-8e1d-2c5046f5d397.jpg" />, since the QR and eigen-decompositions are only needed once.</p><p>The smoothing parameter <img src="4-1470104x\a8463876-26e8-4768-88b4-f0a22b8f7eb2.jpg" /> corresponding to the lowest gcv score was then used in fitting the penalized regression models. The results obtained were then compared to those from the other techniques.</p></sec></sec></sec><sec id="s4"><title>4. Validating the Blended Fields Obtained from the Various Blending Methods</title><p>The strength of this method in predicting existing in situ observations was compared to that of the normal blending method. Because penalizing the blending method had to make use of the basis function, the blended field obtained from the basis function method was also compared. Since the basis function method works with a basis set (knots), the in situ observations remaining after the selection of the validation data set were used as knots for the basis function blending method. The penalized blended field was obtained by using parameters obtained from the efficient method of calculating gcv, as described in Section 3.2.1. 
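</p><p>The efficient gcv calculation of Section 3.2.1 can be sketched as follows. The sketch assumes a penalized least squares objective of the form ‖z − Xβ‖² + λβᵀSβ with a symmetric positive semi-definite penalty matrix S and a full-rank design matrix X. The matrices X and S, the data z, the grid of λ values and the function name gcv_path are all hypothetical choices made for illustration; they are not the blending basis or penalty used in this study. One QR decomposition and one eigen-decomposition are computed up front, after which each new λ costs only a few vector operations.</p><preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import numpy as np


def gcv_path(X, z, S, lambdas):
    """GCV score and tr(A) for each lambda, using one QR and one eigen-decomposition.

    Assumes the fit minimises ||z - X b||^2 + lam * b' S b with X of full column rank.
    """
    n = len(z)
    Q, R = np.linalg.qr(X)                        # X = QR, Q has orthonormal columns
    R_inv = np.linalg.inv(R)
    d, U = np.linalg.eigh(R_inv.T @ S @ R_inv)    # R^{-T} S R^{-1} = U diag(d) U^T
    y = U.T @ (Q.T @ z)                           # data rotated into the eigenbasis
    zz = z @ z
    scores, edf = [], []
    for lam in lambdas:
        w = 1.0 / (1.0 + lam * d)                 # diagonal of (I + lam * D)^{-1}
        # ||z - A z||^2 with A = Q U diag(w) U^T Q^T, expanded term by term
        rss = zz - 2.0 * np.sum(w * y**2) + np.sum((w * y) ** 2)
        tr_A = w.sum()                            # effective degrees of freedom
        scores.append(n * rss / (n - tr_A) ** 2)
        edf.append(tr_A)
    return np.array(scores), np.array(edf)


# Hypothetical simulated inputs (assumption: for illustration only).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
z = rng.normal(size=40)
S = np.eye(6)                                     # ridge-type penalty
lambdas = np.logspace(-3, 3, 25)

scores, edf = gcv_path(X, z, S, lambdas)
best_lambda = lambdas[np.argmin(scores)]          # lambda with the lowest gcv score
```
]]></preformat><p>The value best_lambda is the grid point with the lowest gcv score. A finer grid, or a one-dimensional optimizer started from that point, could then be used to refine the choice.</p><p>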
Randomly selected validation data sets, each containing 175 observations from the observed in situ data for the month of May, were used in a validation study. May was selected because it had the highest number of observations in the in situ field. The mean squared differences between the predicted and the observed in situ values were computed and plotted (<xref ref-type="fig" rid="fig1">Figure 1</xref>).</p><p>There was little difference between the predictions from the basis function method and the penalized method. Although most of the time the differences are visible only after the third or fourth decimal place, there are a few cases where the two differ distinctly, in favour of the penalized blending method. Penalized models are generally expected to perform better than non-penalized ones. The poor performance here could have been caused by the choice of the smoothing parameter, which in this case is obtained by cross validation.</p></sec><sec id="s5"><title>5. Discussions</title><p>We have established a procedure for implementing smoothing on the blending process by making use of the corrector factor blending technique of [<xref ref-type="bibr" rid="scirp.42005-ref7">7</xref>]. This was achieved by expressing the interpolation formula used by the corrector factor blending technique in terms of basis functions. The aim of expressing the blending process using basis functions was to pave the way for penalization, which was implemented by adding a penalty term to the least squares objective. This term contained the penalty function, which penalizes the model, and a smoothing parameter, which controls the smoothness of the model. The main issue was to choose the smoothing parameter such that the estimated smooth function is as close as possible to the true function. Cross validation was used to obtain the smoothing parameter. 
To obtain the cross validation score, three techniques were used, namely ridge regression, integrated least squares and the integrated squared derivative.</p><p>Calculating the cross validation score using ridge regression failed because the final expression for calculating the score did not depend on the smoothing parameter.</p><p>As described by [<xref ref-type="bibr" rid="scirp.42005-ref9">9</xref>], this is not surprising, since if a Z<sub>k</sub> is dropped from the model sum of squares term in equation (8), the only thing influencing the estimate of <img src="4-1470104x\3795d57d-da2e-4f1b-b3c7-60aa9ada2c68.jpg" /> is the penalty term, which is minimized by setting <img src="4-1470104x\68d4f74c-62af-44a4-97d1-96984ec0c7ba.jpg" />, whatever positive value the smoothing parameter takes. This complete decoupling causes cross validation to fail. Thus, if a datum is left out, its corresponding estimate will always be zero, since no other datum has any influence on it. This behavior occurs for any possible value of the smoothing parameter.</p><p>Making use of the cross validation score calculated from the integrated least squares did not improve on the results in this research. This again, according to [<xref ref-type="bibr" rid="scirp.42005-ref9">9</xref>], is not surprising. Consider any three equally spaced points x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub> with corresponding f(x<sub>i</sub>) values <img src="4-1470104x\3b9074fd-3847-4bca-b62a-439a12a62751.jpg" />, <img src="4-1470104x\32ca864b-fc66-47cd-ba55-6e577f3fc0b9.jpg" /> and <img src="4-1470104x\9d54bd42-864a-4add-ad35-2b30b635278e.jpg" />. If <img src="4-1470104x\f3cd48d4-ea2d-43fe-a6c7-688861efa486.jpg" />, then in order to minimize</p><p><img src="4-1470104x\0623fde0-ff14-4fcf-b25b-b72ce3a4a827.jpg" /></p><p>one should set <img src="4-1470104x\baaf1f9f-84d6-408b-bb8d-4f1c2c85de93.jpg" />. 
This condition does not hold for the data fields used in this research: the data fields were sparse and the missing values were replaced by pseudo zeroes, so it was not uncommon to find a set of three adjacent points with similar values. In a situation like this, [<xref ref-type="bibr" rid="scirp.42005-ref9">9</xref>] states that, if the middle point is omitted from the fitting, the action of the penalty will send its estimate to the other side of zero from its neighbors. This means that a better prediction of the omitted datum is only possible with a high smoothing parameter, and that this prediction will be close to zero, since a high smoothing parameter tends to shrink the values of the included points, and hence the omitted point, towards zero. Cross validation will therefore tend to select estimates for the omitted points that are close to zero. This could have been the cause of the poor results obtained. The integrated squared derivative penalty is not expected to suffer from the problems faced by the previous methods, because the action of the penalty is simply to flatten the smooth function in the vicinity of the omitted datum. A large smoothing parameter increases the flattening and consequently pulls the estimate far away from the omitted datum. The penalty obtained by this technique had very little or no effect on the smooth function, hence the near equality of the results from the penalized and the basis function models.</p></sec><sec id="s6"><title>6. Conclusion</title><p>A penalized model is expected to perform better than a non-penalized model in a situation where penalization is necessary. Three techniques were used to obtain penalty matrices in this research, with the intention of improving on the results from the normal blending method. 
The penalized model was obtained by first representing the blending method in terms of basis functions, a representation which was also considered as a model in its own right. Even though the results from the basis function and penalized models were nearly identical, with most differences occurring at the third or fourth decimal place, the difference between these methods and the normal blending method is quite marked (<xref ref-type="fig" rid="fig1">Figure 1</xref>). Their use should therefore be encouraged, especially if more data can be obtained from ship and buoy. With the emergence of this result, it is hoped that most analyses of primary productivity and management in the ocean environment will be greatly affected, since chlorophyll is one of the most important components in the formation of the ocean life cycle.</p><p><bold>Future Work</bold></p><p>The failure of the penalized blending regression models to perform better than the basis function model could have been because the right penalty was not obtained. Therefore, more work could be done towards obtaining other penalties. An integrated squared second derivative penalty could be tried, or a combination of the first and second derivatives (double penalization). To bring the blending process closer to reality, the possibility of extending it to three dimensions could be looked into.</p></sec></body><back><ref-list><title>References</title><ref id="scirp.42005-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">E. Clarke, D. Speirs, M. Heath, S. Wood, W. Gurney and S. Holmes, “Calibrating Remotely Sensed Chlorophyll-a Data by Using Penalized Regression Splines,” Journal of the Royal Statistical Society, Series C, Vol. 55, No. 3, 2006, pp. 331-353.</mixed-citation></ref><ref id="scirp.42005-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">R. W. Eppley, E. Stewart, M. R. Abbott and U. 
Heyman, “Estimating Ocean Primary Production from Satellite Chlorophyll. Introduction to Regional Differences and Statistics for the Southern California Bight,” Journal of Plankton Research, Vol. 7, No. 1, 1985, pp. 57-70. http://dx.doi.org/10.1093/plankt/7.1.57</mixed-citation></ref><ref id="scirp.42005-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">D. A. Flemer, “Chlorophyll Analysis as a Method of Evaluating the Standing Crop Phytoplankton and Primary Productivity,” Chesapeake Science, Vol. 10, No. 3-4, 1969, pp. 301-306. http://dx.doi.org/10.2307/1350474</mixed-citation></ref><ref id="scirp.42005-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">A. H. Oort, “Global Atmospheric Circulation Statistics,” NOAA Professional Paper 14, National Oceanic and Atmospheric Administration, Silver Spring, Maryland, 1983, 180 pp.</mixed-citation></ref><ref id="scirp.42005-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">R. W. Reynolds, “A Real-Time Global Sea Surface Temperature Analysis,” Journal of Climate, Vol. 1, No. 1, 1988, pp. 75-87. http://dx.doi.org/10.1175/1520-0442(1988)001&lt;0075:ARTGSS&gt;2.0.CO;2</mixed-citation></ref><ref id="scirp.42005-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">W. W. Gregg and M. E. Conkright, “Global Seasonal Climatologies of Ocean Chlorophyll: Blending in Situ and Satellite Data for the Coastal Zone Colour Scanner Era,” Journal of Geophysical Research, Vol. 106, No. C2, 2001, pp. 2499-2515. http://dx.doi.org/10.1029/1999JC000028</mixed-citation></ref><ref id="scirp.42005-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">M. A. Onabid, “Improved Ocean Chlorophyll Estimate from Remote Sensed Data: The Modified Blending Technique,” African Journal of Environmental Science and Technology, Vol. 5, No. 9, 2011, pp. 
732-747.</mixed-citation></ref><ref id="scirp.42005-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">S. N. Wood, “Thin Plate Regression Splines,” Journal of the Royal Statistical Society, Series B, Vol. 65, No. 1, 2003, pp. 95-114. http://dx.doi.org/10.1111/1467-9868.00374</mixed-citation></ref><ref id="scirp.42005-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">S. N. Wood, “Generalized Additive Models: An Introduction with R,” Chapman and Hall/CRC, London, 2006.</mixed-citation></ref><ref id="scirp.42005-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">R. Scraton, “Further Numerical Methods in Basic,” Edward Arnold Ltd, Kent, 1987.</mixed-citation></ref></ref-list></back></article>