<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JAMP</journal-id><journal-title-group><journal-title>Journal of Applied Mathematics and Physics</journal-title></journal-title-group><issn pub-type="epub">2327-4352</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jamp.2016.41019</article-id><article-id pub-id-type="publisher-id">JAMP-63080</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Error Analysis of ERM Algorithm with Unbounded and Non-Identical Sampling
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Nie</surname><given-names>Weilin</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Wang</surname><given-names>Cheng</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Department of Mathematics, Huizhou University, Huizhou, China</addr-line></aff><author-notes><corresp id="cor1">* E-mail: <email>niewl@hzu.edu.cn</email> (WN); <email>wangch@hzu.edu.cn</email> (CW)</corresp></author-notes><pub-date pub-type="epub"><day>11</day><month>01</month><year>2016</year></pub-date><volume>04</volume><issue>01</issue><fpage>156</fpage><lpage>168</lpage><history><date date-type="received"><day>9</day>	<month>November</month>	<year>2015</year></date><date date-type="rev-recd"><day>accepted</day>	<month>24</month>	<year>January</year>	</date><date date-type="accepted"><day>27</day>	<month>January</month>	<year>2016</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  A standard assumption in the learning theory literature is that the samples are drawn independently from an identical distribution with a uniformly bounded output. This excludes the common case of the Gaussian distribution. In this paper we extend these assumptions to a more general case: the samples are drawn from a sequence of unbounded and non-identical probability distributions. By drift error analysis and a Bennett inequality for unbounded random variables, we derive a satisfactory learning rate for the ERM algorithm.
 
</p></abstract><kwd-group><kwd>Learning Theory</kwd><kwd>ERM</kwd><kwd>Non-Identical</kwd><kwd>Unbounded Sampling</kwd><kwd>Covering Number</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>In learning theory we study the problem of finding a function, or an approximation of it, which reflects the relationship between the input and the output via samples. It can be considered as a mathematical analysis of artificial intelligence or machine learning. Since the exact distributions of the samples are usually unknown, we can only construct algorithms based on an empirical sample set. A typical mathematical setting of learning theory is as follows: the input space X is a compact metric space, and the output space <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x7.png" xlink:type="simple"/></inline-formula> for regression. (When <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x8.png" xlink:type="simple"/></inline-formula>, it can be regarded as a binary classification problem.) Then <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x9.png" xlink:type="simple"/></inline-formula> is the whole sample space. We assume a distribution <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x10.png" xlink:type="simple"/></inline-formula> on Z, which can be decomposed into two parts: the marginal distribution <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x11.png" xlink:type="simple"/></inline-formula> on X and the conditional distribution <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x12.png" xlink:type="simple"/></inline-formula> given some <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x13.png" xlink:type="simple"/></inline-formula>. 
This implies</p><disp-formula id="scirp.63080-formula1"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x14.png"  xlink:type="simple"/></disp-formula><p>for any integrable function <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x15.png" xlink:type="simple"/></inline-formula> [<xref ref-type="bibr" rid="scirp.63080-ref1">1</xref>].</p><p>To evaluate the quality of a function <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x16.png" xlink:type="simple"/></inline-formula> we can use the generalization error:</p><disp-formula id="scirp.63080-formula2"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x17.png"  xlink:type="simple"/></disp-formula><p>Here <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x18.png" xlink:type="simple"/></inline-formula> is a loss function which measures the difference between the prediction <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x19.png" xlink:type="simple"/></inline-formula> made by f and the actual output y. It can be, for example, the hinge loss in SVM (support vector machine) or the pinball loss in quantile learning. In this paper we focus on the classical least squares loss <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x20.png" xlink:type="simple"/></inline-formula> for simplicity. It is shown in [<xref ref-type="bibr" rid="scirp.63080-ref2">2</xref>] that</p><disp-formula id="scirp.63080-formula3"><label>(1)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/13-1720370x21.png"  xlink:type="simple"/></disp-formula><p>From this we can see that the regression function</p><disp-formula id="scirp.63080-formula4"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x22.png"  xlink:type="simple"/></disp-formula><p>is our target, since it minimizes the generalization error. 
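To see why, the following short derivation may help. It is only a sketch in assumed notation, since the displayed formulas of this section are images: <inline-formula><tex-math>\mathcal{E}(f)=\int_Z (f(x)-y)^2\,d\rho</tex-math></inline-formula> is taken to be the least squares generalization error and <inline-formula><tex-math>f_\rho(x)=\int_Y y\,d\rho(y|x)</tex-math></inline-formula> the regression function. Expanding the square around <inline-formula><tex-math>f_\rho(x)</tex-math></inline-formula> gives, for any square-integrable f,<disp-formula><tex-math>\mathcal{E}(f)=\int_X (f(x)-f_\rho(x))^2\,d\rho_X+2\int_X (f(x)-f_\rho(x))\left(\int_Y (f_\rho(x)-y)\,d\rho(y|x)\right)d\rho_X+\mathcal{E}(f_\rho).</tex-math></disp-formula>The inner integral over Y vanishes by the definition of the regression function, so <inline-formula><tex-math>\mathcal{E}(f)-\mathcal{E}(f_\rho)=\int_X (f(x)-f_\rho(x))^2\,d\rho_X\geq 0</tex-math></inline-formula>, with equality exactly when f agrees with the regression function almost everywhere with respect to the marginal distribution. 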
The empirical risk minimization (ERM) algorithm aims to find a function which approximates the target function <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x23.png" xlink:type="simple"/></inline-formula> well. Although <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x24.png" xlink:type="simple"/></inline-formula> is unknown beforehand, a sample set <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x25.png" xlink:type="simple"/></inline-formula> is available. The ERM algorithm can then be described as</p><disp-formula id="scirp.63080-formula5"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x26.png"  xlink:type="simple"/></disp-formula><p>where the function space <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x27.png" xlink:type="simple"/></inline-formula> is the hypothesis space, which will be chosen to be a compact subset of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x28.png" xlink:type="simple"/></inline-formula>.</p><p>Then the error produced by the ERM algorithm is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x29.png" xlink:type="simple"/></inline-formula>. 
We expect it to be close to the optimal one <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x30.png" xlink:type="simple"/></inline-formula>, which means that the excess generalization error <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x31.png" xlink:type="simple"/></inline-formula> should be small as the sample size m tends to infinity.</p><p>Dependent sampling has been considered in the literature, for example in [<xref ref-type="bibr" rid="scirp.63080-ref3">3</xref>] for concentration inequalities and in [<xref ref-type="bibr" rid="scirp.63080-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.63080-ref5">5</xref>] for learning. More recently, in [<xref ref-type="bibr" rid="scirp.63080-ref6">6</xref>] and [<xref ref-type="bibr" rid="scirp.63080-ref7">7</xref>], the authors studied learning with non-identical sampling and dependent sampling, and obtained satisfactory learning rates.</p><p>In this paper we concentrate on the non-identical setting in which each sample <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x32.png" xlink:type="simple"/></inline-formula> is drawn according to a different distribution <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x33.png" xlink:type="simple"/></inline-formula> on Z. 
Each <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x34.png" xlink:type="simple"/></inline-formula> can in turn be decomposed into a marginal distribution <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x35.png" xlink:type="simple"/></inline-formula> and a conditional distribution <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x36.png" xlink:type="simple"/></inline-formula>. 
Assume they are elements of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x37.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x38.png" xlink:type="simple"/></inline-formula> respectively, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x39.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x40.png" xlink:type="simple"/></inline-formula> are H&#246;lder spaces with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x41.png" xlink:type="simple"/></inline-formula>. 
The H&#246;lder space <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x42.png" xlink:type="simple"/></inline-formula> is the set of continuous functions with finite norm</p><disp-formula id="scirp.63080-formula6"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x43.png"  xlink:type="simple"/></disp-formula><p>where</p><disp-formula id="scirp.63080-formula7"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x44.png"  xlink:type="simple"/></disp-formula><p>We assume a polynomial convergence condition for both sequences <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x45.png" xlink:type="simple"/></inline-formula> 
and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x46.png" xlink:type="simple"/></inline-formula>, i.e., there exist <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x47.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x48.png" xlink:type="simple"/></inline-formula> such that</p><disp-formula id="scirp.63080-formula8"><label>(2)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/13-1720370x49.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.63080-formula9"><label>(3)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/13-1720370x50.png"  xlink:type="simple"/></disp-formula><p>The power index b quantitatively measures the difference between the non-identical setting and the i.i.d. case. The larger b is, the more similar the distributions are, and when <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x51.png" xlink:type="simple"/></inline-formula> the sampling is indeed i.i.d., i.e., 
<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x52.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x53.png" xlink:type="simple"/></inline-formula> for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x54.png" xlink:type="simple"/></inline-formula>. The following example is taken from [<xref ref-type="bibr" rid="scirp.63080-ref8">8</xref>].</p><p>Example 1. Let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x55.png" xlink:type="simple"/></inline-formula> be a sequence of bounded functions on X such that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x56.png" xlink:type="simple"/></inline-formula>. 
Then the sequence <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x57.png" xlink:type="simple"/></inline-formula> defined by <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x58.png" xlink:type="simple"/></inline-formula> satisfies (2) for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x59.png" xlink:type="simple"/></inline-formula>.</p><p>On the other hand, most of the literature assumes that the output space is uniformly bounded, that is, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x60.png" xlink:type="simple"/></inline-formula> for some positive constant M, almost surely with respect to <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x61.png" xlink:type="simple"/></inline-formula>. A typical kernel-dependent result for the least-squares regularization algorithm under this assumption is [<xref ref-type="bibr" rid="scirp.63080-ref9">9</xref>]. There the authors obtain a learning rate close to 1 under some capacity condition on the hypothesis space. However, the most common distribution, the Gaussian distribution, is not bounded. The boundedness requirement comes from the bounded condition in the Bernstein inequality and limits the applicability of the algorithms. In [<xref ref-type="bibr" rid="scirp.63080-ref10">10</xref>] - [<xref ref-type="bibr" rid="scirp.63080-ref13">13</xref>], some unbounded conditions for the output space are discussed in different forms, which extend the classical bounded condition. Here we follow the latter one, which is more general and simpler in expression, and this is the second novelty of this paper. We assume the moment incremental condition for the output space, an extension of the one we proposed in [<xref ref-type="bibr" rid="scirp.63080-ref11">11</xref>]:</p><disp-formula id="scirp.63080-formula10"><label>(4)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/13-1720370x62.png"  xlink:type="simple"/></disp-formula><p>and</p><disp-formula id="scirp.63080-formula11"><label>(5)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/13-1720370x63.png"  xlink:type="simple"/></disp-formula><p>We can see that the Gaussian distribution satisfies this setting.</p><p>Example 2. 
Let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x64.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x65.png" xlink:type="simple"/></inline-formula>. If for each <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x66.png" xlink:type="simple"/></inline-formula> and the conditional distribution <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x67.png" xlink:type="simple"/></inline-formula> is a normal distribution with variance <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x68.png" xlink:type="simple"/></inline-formula> bounded by <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x69.png" xlink:type="simple"/></inline-formula>, then (4) is satisfied with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x70.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x71.png" xlink:type="simple"/></inline-formula>.</p><p>Next we need to introduce the covering number and the interpolation space.</p><p>Definition 1. The covering number <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x72.png" xlink:type="simple"/></inline-formula> for a subset <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x73.png" xlink:type="simple"/></inline-formula> of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x74.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x75.png" xlink:type="simple"/></inline-formula> is defined to be the minimal integer N such that there exist N balls with radius <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x76.png" xlink:type="simple"/></inline-formula> covering <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x77.png" xlink:type="simple"/></inline-formula>.</p><p>Let the hypothesis space <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x78.png" xlink:type="simple"/></inline-formula> be a compact Banach space with inclusion 
<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x79.png" xlink:type="simple"/></inline-formula> bounded and compact. We follow the assumption of [<xref ref-type="bibr" rid="scirp.63080-ref14">14</xref>] [<xref ref-type="bibr" rid="scirp.63080-ref15">15</xref>] that there exist constants <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x80.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x81.png" xlink:type="simple"/></inline-formula> such that the hypothesis space satisfies the capacity condition</p><disp-formula id="scirp.63080-formula12"><label>(6)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/13-1720370x82.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x83.png" xlink:type="simple"/></inline-formula>. The capacity condition measures how rich the hypothesis space is.</p><p>The sample error increases while the approximation error decreases as the covering number of H grows (that is, as H becomes larger). 
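</p><p>As an informal illustration of Definition 1 (not part of the original analysis), the sketch below counts the centers of a greedy cover of a finite point set in the sup norm. Any cover gives an upper bound on the covering number, so the returned count bounds it from above for that finite set only. The name greedy_cover_size, the finite point set and the sample grid are assumptions made for this example.</p><preformat>
```python
import numpy as np


def greedy_cover_size(points, eps):
    """Return the number of centers in a greedy eps-cover of `points`.

    `points` is an (n, d) array and `eps` is a non-negative radius.
    Distances use the sup norm. Centers are taken from the set itself, so the
    result is an upper bound on the covering number, not its exact value.
    """
    points = np.asarray(points, dtype=float)
    uncovered = np.ones(len(points), dtype=bool)
    centers = 0
    while uncovered.any():
        # The first point that is still uncovered becomes a new center.
        idx = int(np.argmax(uncovered))
        dist = np.max(np.abs(points - points[idx]), axis=1)
        # Points farther than eps from the new center stay uncovered.
        uncovered = np.logical_and(uncovered, dist > eps)
        centers += 1
    return centers


# Hypothetical data: 11 equally spaced points on [0, 1].
grid = np.linspace(0.0, 1.0, 11).reshape(-1, 1)
sizes = [greedy_cover_size(grid, eps) for eps in (2.0, 0.25, 0.01)]
```
</preformat><p>In this toy example, smaller radii require more balls to cover the same set.</p><p>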
Choosing an appropriate hypothesis space is therefore the key problem of the ERM algorithm. We will demonstrate this in our main theorem.</p><p>Definition 2. The interpolation space <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x84.png" xlink:type="simple"/></inline-formula> is a function space consisting of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x85.png" xlink:type="simple"/></inline-formula> with norm</p><disp-formula id="scirp.63080-formula13"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x86.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x87.png" xlink:type="simple"/></inline-formula> is the K-functional defined as</p><disp-formula id="scirp.63080-formula14"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x88.png"  xlink:type="simple"/></disp-formula><p>The interpolation space is used to characterize the position of the regression function, and it is related to the approximation error. Now we can state our main result as follows.</p><p>Theorem 1. 
If <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x89.png" xlink:type="simple"/></inline-formula> with bounded inclusion <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x90.png" xlink:type="simple"/></inline-formula>, and satisfies (6) with r, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x91.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x92.png" xlink:type="simple"/></inline-formula> for some <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x93.png" xlink:type="simple"/></inline-formula>, the sample distribution satisfies (2), (3) for some <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x94.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x95.png" xlink:type="simple"/></inline-formula>, (4) and (5). For any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x96.png" xlink:type="simple"/></inline-formula>, choose the hypothesis space <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x97.png" xlink:type="simple"/></inline-formula> to be the ball of H centered at 0 with radius <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x98.png" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x99.png" xlink:type="simple"/></inline-formula> and</p><disp-formula id="scirp.63080-formula15"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x100.png"  xlink:type="simple"/></disp-formula><p>Moreover, we assume all functions in H and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x101.png" xlink:type="simple"/></inline-formula> are H&#246;lder continuous of order s, i.e., there is a constant <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x102.png" xlink:type="simple"/></inline-formula> such that</p><disp-formula id="scirp.63080-formula16"><label>(7)</label><graphic position="anchor" xlink:href="http://html.scirp.org/file/13-1720370x103.png"  xlink:type="simple"/></disp-formula><p>Then for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x104.png" xlink:type="simple"/></inline-formula>, with 
confidence at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x105.png" xlink:type="simple"/></inline-formula>, we have</p><disp-formula id="scirp.63080-formula17"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x106.png"  xlink:type="simple"/></disp-formula><p>Here <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x107.png" xlink:type="simple"/></inline-formula> is a constant independent of m and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x108.png" xlink:type="simple"/></inline-formula>.</p><p>Remark 1. In [<xref ref-type="bibr" rid="scirp.63080-ref6">6</xref>], the authors pointed out that if we choose the hypothesis space to be the reproducing kernel Hilbert space (RKHS) <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x109.png" xlink:type="simple"/></inline-formula> on <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x110.png" xlink:type="simple"/></inline-formula>, and the kernel <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x111.png" xlink:type="simple"/></inline-formula>, then our assumption (7) holds. 
In particular, if the kernel is chosen to be the Gaussian kernel <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x112.png" xlink:type="simple"/></inline-formula>, then (7) holds for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x113.png" xlink:type="simple"/></inline-formula>. This is discussed in detail in [<xref ref-type="bibr" rid="scirp.63080-ref16">16</xref>].</p><p>In all, we extend the polynomial convergence condition to the conditional distribution sequence and, accordingly, impose the moment incremental condition on the sequence in the least squares ERM algorithm. 
By error decomposition, a truncation technique and an unbounded concentration inequality, we finally obtain the total error bound of Theorem 1.</p><p>Compared with the non-identical settings in [<xref ref-type="bibr" rid="scirp.63080-ref6">6</xref>] and [<xref ref-type="bibr" rid="scirp.63080-ref17">17</xref>], our setting is more general since the conditional distribution sequence <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x114.png" xlink:type="simple"/></inline-formula> is also a polynomially convergent sequence, rather than identical as in their settings. This, together with the unbounded y, is the main difficulty for the error analysis in this paper.</p><p>For the classical i.i.d. and bounded conditions, [<xref ref-type="bibr" rid="scirp.63080-ref9">9</xref>] indicates that for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x115.png" xlink:type="simple"/></inline-formula> and kernel <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x116.png" xlink:type="simple"/></inline-formula> while <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x117.png" xlink:type="simple"/></inline-formula>, the rate of the least squares regularization algorithm is <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x118.png" xlink:type="simple"/></inline-formula> for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x119.png" xlink:type="simple"/></inline-formula>. [<xref ref-type="bibr" rid="scirp.63080-ref17">17</xref>] shows that, under some conditions on the kernel and the object function <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x120.png" xlink:type="simple"/></inline-formula>, an exponential convergence condition for the distribution sequence and a special choice of parameters, the optimal rate of the online learning algorithm is close to <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x121.png" xlink:type="simple"/></inline-formula>. 
In [<xref ref-type="bibr" rid="scirp.63080-ref6">6</xref>], the best case occurs when <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x122.png" xlink:type="simple"/></inline-formula> and kernel <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x123.png" xlink:type="simple"/></inline-formula>. The rate of the least squares regularization algorithm can be close to <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x124.png" xlink:type="simple"/></inline-formula>. 
However, our result implies that when <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x125.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x126.png" xlink:type="simple"/></inline-formula> tends to 1 and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x127.png" xlink:type="simple"/></inline-formula> tends to 0, since p can be any integer, the learning rate can be arbitrarily close to <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x128.png" xlink:type="simple"/></inline-formula>, which is the same as in the i.i.d. 
case [<xref ref-type="bibr" rid="scirp.63080-ref9">9</xref>], and better than the former results with non-identical settings. With this result, we can extend the application of the learning algorithm to more situations and still keep the best learning rate. The explicit expression of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x129.png" xlink:type="simple"/></inline-formula> in the theorem can be found in the proof of the theorem below.</p></sec><sec id="s2"><title>2. Error Decomposition</title><p>Our target, the error <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x130.png" xlink:type="simple"/></inline-formula>, is hard to bound directly, so we need a transitional function for the analysis. 
By the compactness of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x131.png" xlink:type="simple"/></inline-formula> and the continuity of the functional <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x132.png" xlink:type="simple"/></inline-formula>, we can denote</p><disp-formula id="scirp.63080-formula18"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x133.png"  xlink:type="simple"/></disp-formula><p>Then the generalization error can be written as</p><disp-formula id="scirp.63080-formula19"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x134.png"  xlink:type="simple"/></disp-formula><p>The first term on the right-hand side is the sample error, and the second term <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x135.png" xlink:type="simple"/></inline-formula> is called the approximation error, which is independent of the samples. [<xref ref-type="bibr" rid="scirp.63080-ref18">18</xref>] analyzed the approximation error by approximation theory. In the following we mainly study the sample error bound.</p><p>Now we break the sample error into several parts which can be bounded using a truncation technique and an unbounded concentration inequality. For the error decomposition <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x136.png" xlink:type="simple"/></inline-formula> we refer to [<xref ref-type="bibr" rid="scirp.63080-ref6">6</xref>]. 
Denote</p><disp-formula id="scirp.63080-formula20"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x137.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.63080-formula21"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x138.png"  xlink:type="simple"/></disp-formula><p>then <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x139.png" xlink:type="simple"/></inline-formula> and we have</p><disp-formula id="scirp.63080-formula22"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x140.png"  xlink:type="simple"/></disp-formula><p>In the following, we call the first and fourth brackets drift errors, and the remaining ones sample errors. We bound the two types of errors separately in the following sections, and finally obtain the total error bound.</p></sec><sec id="s3"><title>3. Drift Errors</title><p>First we consider the drift error involving <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x141.png" xlink:type="simple"/></inline-formula> in this section. To avoid handling two polynomial convergence sequences simultaneously, we break the drift errors into two parts. Meanwhile, a truncation technique is used to deal with the unbounded y. 
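</p><p>The exact truncation used in the proofs is contained in the displayed formulas. The sketch below only illustrates the generic mechanism, under assumptions that are not taken from the paper: for a truncation level K and an exponent p of at least 1, the part of |y| lying above K is bounded pointwise by |y|^p / K^(p-1), so a moment bound controls what truncation discards. The function names, the Gaussian sample and the choice p = 4 are hypothetical.</p><preformat>
```python
import numpy as np


def tail_part(y, level):
    """Empirical mean of |y| restricted to the event |y| > level."""
    a = np.abs(np.asarray(y, dtype=float))
    return float(np.mean(np.where(a > level, a, 0.0)))


def moment_bound(y, level, p):
    """Empirical p-th absolute moment divided by level**(p - 1)."""
    a = np.abs(np.asarray(y, dtype=float))
    return float(np.mean(a ** p) / level ** (p - 1))


# Hypothetical unbounded outputs: Gaussian noise with a fixed seed.
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)

# Pairs (discarded tail, moment bound) for increasing truncation levels K.
checks = [(tail_part(y, K), moment_bound(y, K, p=4)) for K in (1.0, 2.0, 4.0)]
```
</preformat><p>Each pair in checks compares the discarded tail with its moment bound; raising K shrinks both, which is why the level K is tuned against the sample size in arguments of this kind.</p><p>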
Since <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x142.png" xlink:type="simple"/></inline-formula> is a subset of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x143.png" xlink:type="simple"/></inline-formula>, functions in <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x144.png" xlink:type="simple"/></inline-formula> are uniformly bounded. Then we have</p><p>Proposition 1. 
Assume <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x145.png" xlink:type="simple"/></inline-formula> for some<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x145.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x146.png" xlink:type="simple"/></inline-formula>, for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x145.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x146.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x147.png" xlink:type="simple"/></inline-formula></p><disp-formula id="scirp.63080-formula23"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x148.png"  xlink:type="simple"/></disp-formula><p>Proof. From the definition of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x149.png" xlink:type="simple"/></inline-formula> and<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x149.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x150.png" xlink:type="simple"/></inline-formula>, we know that</p><disp-formula id="scirp.63080-formula24"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x151.png"  xlink:type="simple"/></disp-formula><p>Since<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x152.png" xlink:type="simple"/></inline-formula>, we can bound the first term inside the bracket as follow.</p><disp-formula id="scirp.63080-formula25"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x153.png"  xlink:type="simple"/></disp-formula><p>But for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x154.png" 
xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x155.png" xlink:type="simple"/></inline-formula>, there holds</p><disp-formula id="scirp.63080-formula26"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x156.png"  xlink:type="simple"/></disp-formula><p>From (3.12) in [<xref ref-type="bibr" rid="scirp.63080-ref6">6</xref>], we have</p><disp-formula id="scirp.63080-formula27"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x157.png"  xlink:type="simple"/></disp-formula><p>Then we can bound the sum of the first term as</p><disp-formula id="scirp.63080-formula28"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x158.png"  xlink:type="simple"/></disp-formula><p>Choosing K to be <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x159.png" xlink:type="simple"/></inline-formula>, we have</p><disp-formula id="scirp.63080-formula29"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x160.png"  xlink:type="simple"/></disp-formula><p>For the second term, notice that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x161.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x162.png" xlink:type="simple"/></inline-formula>, so</p><disp-formula id="scirp.63080-formula30"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x163.png"  xlink:type="simple"/></disp-formula><p>Therefore</p><disp-formula id="scirp.63080-formula31"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x164.png"  xlink:type="simple"/></disp-formula><p>Combining the two bounds, we have</p><disp-formula 
id="scirp.63080-formula32"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x165.png"  xlink:type="simple"/></disp-formula><p>This is exactly the statement of the proposition.</p><p>For the drift error involving <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x166.png" xlink:type="simple"/></inline-formula>, we have the same result since <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x167.png" xlink:type="simple"/></inline-formula> as well, i.e.,</p><p>Proposition 2. Assume <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x168.png" xlink:type="simple"/></inline-formula> for some <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x169.png" xlink:type="simple"/></inline-formula>. Then for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x170.png" xlink:type="simple"/></inline-formula>, we have</p><disp-formula id="scirp.63080-formula33"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x171.png"  xlink:type="simple"/></disp-formula></sec><sec id="s4"><title>4. Sample Error Estimate</title><p>We devote this section to the analysis of the sample errors. 
For the sample error term involving <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x172.png" xlink:type="simple"/></inline-formula>, we will use the Bennett inequality as in [<xref ref-type="bibr" rid="scirp.63080-ref11">11</xref>] and [<xref ref-type="bibr" rid="scirp.63080-ref19">19</xref>], which was initially introduced in [<xref ref-type="bibr" rid="scirp.63080-ref20">20</xref>]. Since two polynomial convergence conditions are imposed on the marginal and conditional distribution sequences, we have to modify the Bennett inequality to fit our setting. Denote <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x173.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x174.png" xlink:type="simple"/></inline-formula> for an integrable function g; the lemma can then be stated as follows.</p><p>Lemma 1. 
Assume <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x175.png" xlink:type="simple"/></inline-formula> holds for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x176.png" xlink:type="simple"/></inline-formula> and some constants <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x177.png" xlink:type="simple"/></inline-formula>. Then we have</p><disp-formula id="scirp.63080-formula34"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x178.png"  xlink:type="simple"/></disp-formula><p>For our non-identical setting, a similar result follows from the same idea of proof. Denoting <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x179.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x180.png" xlink:type="simple"/></inline-formula>, the following lemma holds.</p><p>Lemma 2. 
Assume <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x181.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x182.png" xlink:type="simple"/></inline-formula> for some constants <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x183.png" xlink:type="simple"/></inline-formula> and any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x184.png" xlink:type="simple"/></inline-formula>. Then we have</p><disp-formula id="scirp.63080-formula35"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x185.png"  xlink:type="simple"/></disp-formula><p>Now we can bound the sample error term <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x186.png" xlink:type="simple"/></inline-formula> by applying this lemma.</p><p>Proposition 3. 
Under the moment incremental conditions (4) and (5) and the notation above, with probability at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x187.png" xlink:type="simple"/></inline-formula>, we have</p><disp-formula id="scirp.63080-formula36"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x188.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x189.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x190.png" xlink:type="simple"/></inline-formula> is the approximation error.</p><p>Proof. Let <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x191.png" xlink:type="simple"/></inline-formula>, then</p><disp-formula id="scirp.63080-formula37"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x192.png"  xlink:type="simple"/></disp-formula><p>Since <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x193.png" xlink:type="simple"/></inline-formula>, we have</p><disp-formula id="scirp.63080-formula38"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x194.png"  xlink:type="simple"/></disp-formula><p>for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x195.png" xlink:type="simple"/></inline-formula>, where 1 and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x196.png" xlink:type="simple"/></inline-formula>. 
In the same way, we have the following bounds</p><disp-formula id="scirp.63080-formula39"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x197.png"  xlink:type="simple"/></disp-formula><p>as well. Then from Lemma 2 above, we have</p><disp-formula id="scirp.63080-formula40"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x198.png"  xlink:type="simple"/></disp-formula><p>Setting the right-hand side to be <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x199.png" xlink:type="simple"/></inline-formula>, we can solve to obtain</p><disp-formula id="scirp.63080-formula41"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x200.png"  xlink:type="simple"/></disp-formula><p>Therefore, with confidence at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x201.png" xlink:type="simple"/></inline-formula>, there holds</p><disp-formula id="scirp.63080-formula42"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x202.png"  xlink:type="simple"/></disp-formula><p>This proves the proposition.</p><p>For the sample error term involving <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x203.png" xlink:type="simple"/></inline-formula>, the analysis is more involved since we need a concentration inequality for a set of functions. First we introduce the ratio inequality [<xref ref-type="bibr" rid="scirp.63080-ref9">9</xref>].</p><p>Lemma 3. 
Denote <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x204.png" xlink:type="simple"/></inline-formula> for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x205.png" xlink:type="simple"/></inline-formula>, which satisfies <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x206.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x207.png" xlink:type="simple"/></inline-formula> for some constants <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x208.png" xlink:type="simple"/></inline-formula> 
and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x209.png" xlink:type="simple"/></inline-formula>. Then we have</p><disp-formula id="scirp.63080-formula43"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x210.png"  xlink:type="simple"/></disp-formula><p>Proof. Taking <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x211.png" xlink:type="simple"/></inline-formula> to be <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x212.png" xlink:type="simple"/></inline-formula> in Lemma 2, from the proof of the last proposition we can conclude that</p><disp-formula id="scirp.63080-formula44"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x213.png"  xlink:type="simple"/></disp-formula><p>Noting that <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x214.png" xlink:type="simple"/></inline-formula>, the lemma is proved.</p><p>Then we have the following result.</p><p>Lemma 4. 
For a set of functions <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x215.png" xlink:type="simple"/></inline-formula> with <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x216.png" xlink:type="simple"/></inline-formula>, construct functions <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x217.png" xlink:type="simple"/></inline-formula> for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x218.png" xlink:type="simple"/></inline-formula>. Then, with confidence at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x219.png" 
xlink:type="simple"/></inline-formula>, we have</p><disp-formula id="scirp.63080-formula45"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x220.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x221.png" xlink:type="simple"/></inline-formula> for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x222.png" xlink:type="simple"/></inline-formula>.</p><p>Proof. Since <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x223.png" xlink:type="simple"/></inline-formula> is an element of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x224.png" xlink:type="simple"/></inline-formula>, from Lemma 3 we have</p><disp-formula id="scirp.63080-formula46"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x225.png"  xlink:type="simple"/></disp-formula><p>then there holds</p><disp-formula id="scirp.63080-formula47"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x226.png"  xlink:type="simple"/></disp-formula><p>Setting the right-hand side to be <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x227.png" xlink:type="simple"/></inline-formula>, we have with probability at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x228.png" xlink:type="simple"/></inline-formula>,</p><disp-formula id="scirp.63080-formula48"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x229.png"  
xlink:type="simple"/></disp-formula><p>Here <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x230.png" xlink:type="simple"/></inline-formula>. This proves the lemma.</p><p>Now, by a covering number argument, we can bound the sample error term involving <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x231.png" xlink:type="simple"/></inline-formula>.</p><p>Proposition 4. If <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x232.png" xlink:type="simple"/></inline-formula> for some <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x233.png" xlink:type="simple"/></inline-formula>, where H satisfies the capacity condition, then for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x234.png" xlink:type="simple"/></inline-formula>, with confidence at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x235.png" xlink:type="simple"/></inline-formula>, there holds</p><disp-formula id="scirp.63080-formula49"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x236.png"  xlink:type="simple"/></disp-formula><p>Proof. 
Denote <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x237.png" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x238.png" xlink:type="simple"/></inline-formula> is to be determined. Then we can find an <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x239.png" xlink:type="simple"/></inline-formula>-net <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x240.png" xlink:type="simple"/></inline-formula> of <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x241.png" 
xlink:type="simple"/></inline-formula>, and there exists a function <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x242.png" xlink:type="simple"/></inline-formula>. Then we have</p><disp-formula id="scirp.63080-formula50"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x243.png"  xlink:type="simple"/></disp-formula><p>For the first term, since <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x244.png" xlink:type="simple"/></inline-formula> for all <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x245.png" xlink:type="simple"/></inline-formula>, we have</p><disp-formula id="scirp.63080-formula51"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x246.png"  xlink:type="simple"/></disp-formula><p>For the third term,</p><disp-formula id="scirp.63080-formula52"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x247.png"  xlink:type="simple"/></disp-formula><p>we need to bound</p><disp-formula id="scirp.63080-formula53"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x248.png"  xlink:type="simple"/></disp-formula><p>Let <inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/13-1720370x249.png" xlink:type="simple"/></inline-formula>; then</p><disp-formula id="scirp.63080-formula54"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x250.png"  xlink:type="simple"/></disp-formula><p>From Lemma 1 we have</p><disp-formula id="scirp.63080-formula55"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x251.png"  xlink:type="simple"/></disp-formula><p>Setting the right-hand side to be <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x252.png" xlink:type="simple"/></inline-formula>, with confidence at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x253.png" xlink:type="simple"/></inline-formula> we have</p><disp-formula id="scirp.63080-formula56"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x254.png"  xlink:type="simple"/></disp-formula><p>This means that</p><disp-formula id="scirp.63080-formula57"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x255.png"  xlink:type="simple"/></disp-formula><p>with probability at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x256.png" xlink:type="simple"/></inline-formula>.</p><p>The second term can be bounded by Lemma 4 above. 
That is, with confidence at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x257.png" xlink:type="simple"/></inline-formula>, we have</p><disp-formula id="scirp.63080-formula58"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x258.png"  xlink:type="simple"/></disp-formula><p>Since <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x259.png" xlink:type="simple"/></inline-formula> by assumption, and</p><disp-formula id="scirp.63080-formula59"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x260.png"  xlink:type="simple"/></disp-formula><p>combining the three parts above, we have the following bound with confidence at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x261.png" xlink:type="simple"/></inline-formula>,</p><disp-formula id="scirp.63080-formula60"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x262.png"  xlink:type="simple"/></disp-formula><p>By choosing <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x263.png" xlink:type="simple"/></inline-formula> to balance the terms, we have</p><disp-formula id="scirp.63080-formula61"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x264.png"  xlink:type="simple"/></disp-formula><p>with confidence at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x265.png" xlink:type="simple"/></inline-formula>. This proves the proposition.</p></sec><sec id="s5"><title>5. Approximation Error and Total Error</title><p>Combining the results above, we can derive the error bound for the generalization error <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x266.png" xlink:type="simple"/></inline-formula>.</p><p>Proposition 5. 
Under the moment condition for the distribution of the sample and the capacity condition for the hypothesis space <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x267.png" xlink:type="simple"/></inline-formula>, for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x268.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x269.png" xlink:type="simple"/></inline-formula>, with confidence at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x270.png" xlink:type="simple"/></inline-formula>, we have</p><disp-formula id="scirp.63080-formula62"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x271.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x272.png" xlink:type="simple"/></inline-formula>.</p><p>What is left to be determined in the proposition is the approximation error <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x273.png" xlink:type="simple"/></inline-formula>. 
By the choice of the hypothesis space we can get our main result.</p><p>Proof of Theorem 1. Let</p><disp-formula id="scirp.63080-formula63"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x274.png"  xlink:type="simple"/></disp-formula><p>and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x275.png" xlink:type="simple"/></inline-formula>. Assume <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x276.png" xlink:type="simple"/></inline-formula> without loss of generality, and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x277.png" xlink:type="simple"/></inline-formula>; then Proposition 5 indicates that</p><disp-formula id="scirp.63080-formula64"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x278.png"  xlink:type="simple"/></disp-formula><p>holds with confidence at least <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x279.png" xlink:type="simple"/></inline-formula> for any <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x280.png" xlink:type="simple"/></inline-formula>, where <inline-formula><inline-graphic 
xlink:href="http://html.scirp.org/file/13-1720370x281.png" xlink:type="simple"/></inline-formula> is a constant independent on m or<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x279.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x280.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x281.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x282.png" xlink:type="simple"/></inline-formula>.</p><p>For the approximation error<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x283.png" xlink:type="simple"/></inline-formula>, we can bound it by Theorem 3.1 of [<xref ref-type="bibr" rid="scirp.63080-ref18">18</xref>] . Since the hypothesis space<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x283.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x284.png" xlink:type="simple"/></inline-formula>, and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x283.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x284.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x285.png" xlink:type="simple"/></inline-formula> with<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x283.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x284.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x285.png" 
xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x286.png" xlink:type="simple"/></inline-formula>, we have</p><disp-formula id="scirp.63080-formula65"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x287.png"  xlink:type="simple"/></disp-formula><p>The upper bound B is now chosen to be <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x288.png" xlink:type="simple"/></inline-formula> since<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x288.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x289.png" xlink:type="simple"/></inline-formula>, then with confidence at least<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x288.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x289.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x290.png" xlink:type="simple"/></inline-formula>,</p><disp-formula id="scirp.63080-formula66"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x291.png"  xlink:type="simple"/></disp-formula><p>By choosing</p><disp-formula id="scirp.63080-formula67"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x292.png"  xlink:type="simple"/></disp-formula><p>we have</p><disp-formula id="scirp.63080-formula68"><graphic  xlink:href="http://html.scirp.org/file/13-1720370x293.png"  xlink:type="simple"/></disp-formula><p>holds with confidence at least<inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x294.png" xlink:type="simple"/></inline-formula>. 
Denote <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x295.png" xlink:type="simple"/></inline-formula>, then the theorem is obtained.</p></sec><sec id="s6"><title>6. Summary and Future Work</title><p>We investigate the least squares ERM algorithm with non-identical and unbounded sampling, i.e., polynomial convergence for <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x296.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="http://html.scirp.org/file/13-1720370x297.png" xlink:type="simple"/></inline-formula> and a moment incremental condition for the latter ones. An error decomposition analogous to the classical analysis of least squares regularization [<xref ref-type="bibr" rid="scirp.63080-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.63080-ref11">11</xref>] is conducted. A truncation technique is introduced to handle the unbounded setting, and the Bennett concentration inequality is used for the sample error. From this analysis we obtain the error bound and the learning rate.</p><p>However, our work only considers the ERM algorithm. It is necessary to extend this to regularization algorithms, which are more widely used in practice. A more recent related reference can be found in [<xref ref-type="bibr" rid="scirp.63080-ref21">21</xref>]. Another interesting topic for future study is dependent sampling [<xref ref-type="bibr" rid="scirp.63080-ref7">7</xref>].</p></sec><sec id="s7"><title>Cite this paper</title><p>Weilin Nie, Cheng Wang (2016) Error Analysis of ERM Algorithm with Unbounded and Non-Identical Sampling. Journal of Applied Mathematics and Physics, 04, 156-168. 
doi: 10.4236/jamp.2016.41019</p></sec></body><back><ref-list><title>References</title><ref id="scirp.63080-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Cucker, F. and Zhou, D.X. (2007) Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, Cambridge. http://dx.doi.org/10.1017/CBO9780511618796</mixed-citation></ref><ref id="scirp.63080-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Cucker, F. and Smale, S. (2002) On the Mathematical Foundations of Learning. Bulletin of the American Mathematical Society, 39, 1-49.</mixed-citation></ref><ref id="scirp.63080-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Dehling, H., Mikosch, T. and Sorensen, M. (2002) Empirical Process Techniques for Dependent Data. Birkhauser Boston, Inc., Boston. http://dx.doi.org/10.1007/978-1-4612-0099-4</mixed-citation></ref><ref id="scirp.63080-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Steinwart, I., Hush, D. and Scovel, C. (2009) Learning from Dependent Observations. Journal of Multivariate Analysis, 100, 175-194.</mixed-citation></ref><ref id="scirp.63080-ref5"><label>5</label><mixed-citation publication-type="book" xlink:type="simple">Steinwart, I. and Christmann, A. (2009) Fast Learning from Non-i.i.d. Observations. In: Bengio, Y., Schuurmans, D., Lafferty, J.D., Williams, C.K.I. and Culotta, A., Eds., Advances in Neural Information Processing Systems 22, Curran and Associates, Inc., Yellowknife, 1768-1776.</mixed-citation></ref><ref id="scirp.63080-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Xiao, Q.W. and Pan, Z.W. (2010) Learning from Non-Identical Sampling for Classification. 
Advances in Computational Mathematics, 33, 97-112.</mixed-citation></ref><ref id="scirp.63080-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Pan, Z.W. and Xiao, Q.W. (2009) Least-Square Regularized Regression with Non-i.i.d. Sampling. Journal of Statistical Planning and Inference, 139, 3579-3587.</mixed-citation></ref><ref id="scirp.63080-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Hu, T. and Zhou, D.X. (2009) Online Learning with Samples Drawn from Non-identical Distributions. Journal of Machine Learning Research, 10, 2873-2898.</mixed-citation></ref><ref id="scirp.63080-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Wu, Q., Ying, Y. and Zhou, D.X. (2006) Learning Rates of Least-Square Regularized Regression. Foundations of Computational Mathematics, 6, 171-192.</mixed-citation></ref><ref id="scirp.63080-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Caponnetto, A. and De Vito, E. (2007) Optimal Rates for the Regularized Least Squares Algorithm. Foundations of Computational Mathematics, 7, 331-368.</mixed-citation></ref><ref id="scirp.63080-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Wang, C. and Zhou, D.X. (2011) Optimal Learning Rates for Least Squares Regularized Regression with Unbounded Sampling. Journal of Complexity, 27, 55-67.</mixed-citation></ref><ref id="scirp.63080-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Guo, Z.C. and Zhou, D.X. (2013) Concentration Estimates for Learning with Unbounded Sampling. Advances in Computational Mathematics, 38, 207-223.</mixed-citation></ref><ref id="scirp.63080-ref13"><label>13</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>He</surname>, <given-names>F.</given-names></name>, <etal>et al</etal>. 
(<year>2014</year>) <article-title>Optimal Convergence Rates of High Order Parzen Windows with Unbounded Sampling</article-title>. <source>Statistics &amp; Probability Letters</source>, <volume>92</volume>, <fpage>26</fpage>-<lpage>32</lpage>.</mixed-citation></ref><ref id="scirp.63080-ref14"><label>14</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Zhou</surname>, <given-names>D.X.</given-names></name> (<year>2002</year>) <article-title>The Covering Number in Learning Theory</article-title>. <source>Journal of Complexity</source>, <volume>18</volume>, <fpage>739</fpage>-<lpage>767</lpage>.</mixed-citation></ref><ref id="scirp.63080-ref15"><label>15</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Zhou</surname>, <given-names>D.X.</given-names></name> (<year>2003</year>) <article-title>Capacity of Reproducing Kernel Spaces in Learning Theory</article-title>. <source>IEEE Transactions on Information Theory</source>, <volume>49</volume>, <fpage>1743</fpage>-<lpage>1752</lpage>.</mixed-citation></ref><ref id="scirp.63080-ref16"><label>16</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Zhou</surname>, <given-names>D.X.</given-names></name> (<year>2008</year>) <article-title>Derivative Reproducing Properties for Kernel Methods in Learning Theory</article-title>. <source>Journal of Computational and Applied Mathematics</source>, <volume>220</volume>, <fpage>456</fpage>-<lpage>463</lpage>.</mixed-citation></ref><ref id="scirp.63080-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Smale, S. and Zhou, D.X. (2009) Online Learning with Markov Sampling. 
Analysis and Applications, 7, 87-113.</mixed-citation></ref><ref id="scirp.63080-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Smale, S. and Zhou, D.X. (2003) Estimating the Approximation Error in Learning Theory. Analysis and Applications, 1, 17-41.</mixed-citation></ref><ref id="scirp.63080-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Wang, C. and Guo, Z.C. (2012) ERM Learning with Unbounded Sampling. Acta Mathematica Sinica, English Series, 28, 97-104.</mixed-citation></ref><ref id="scirp.63080-ref20"><label>20</label><mixed-citation publication-type="journal" xlink:type="simple"><name name-style="western"><surname>Bennett</surname>, <given-names>G.</given-names></name> (<year>1962</year>) <article-title>Probability Inequalities for the Sum of Independent Random Variables</article-title>. <source>Journal of the American Statistical Association</source>, <volume>57</volume>, <fpage>33</fpage>-<lpage>45</lpage>.</mixed-citation></ref><ref id="scirp.63080-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Cai, J. (2013) Coefficient-Based Regression with Non-Identical Unbounded Sampling. Abstract and Applied Analysis, 2013, Article ID: 134727. http://dx.doi.org/10.1155/2013/134727</mixed-citation></ref></ref-list></back></article>