<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JAMP</journal-id><journal-title-group><journal-title>Journal of Applied Mathematics and Physics</journal-title></journal-title-group><issn pub-type="epub">2327-4352</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jamp.2017.54079</article-id><article-id pub-id-type="publisher-id">JAMP-76021</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Error Analysis and Variable Selection for Differential Private Learning Algorithm
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Nie</surname><given-names>Weilin</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Wang</surname><given-names>Cheng</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Huizhou University, Huizhou, China</addr-line></aff><pub-date pub-type="epub"><day>12</day><month>04</month><year>2017</year></pub-date><volume>05</volume><issue>04</issue><fpage>900</fpage><lpage>911</lpage><history><date date-type="received"><day>14</day><month>February</month><year>2017</year></date><date date-type="rev-recd"><day>27</day><month>April</month><year>2017</year></date><date date-type="accepted"><day>30</day><month>April</month><year>2017</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  In this paper, we construct a modified least squares regression algorithm that provides privacy protection. A new concentration inequality is applied, and an expected error bound is derived by error decomposition. Furthermore, the error analysis yields a method for choosing the privacy parameter so as to balance error against privacy.
 
</p></abstract><kwd-group><kwd>Differential Privacy</kwd><kwd> Least Squares Regularization</kwd><kwd> Concentration Inequality</kwd><kwd> Error Decomposition</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Privacy protection attracts much attention in many branches of computer science. To address it, Dwork et al. proposed differential privacy in [<xref ref-type="bibr" rid="scirp.76021-ref1">1</xref>]. Soon afterwards, [<xref ref-type="bibr" rid="scirp.76021-ref2">2</xref>] introduced the exponential mechanism, a useful approach to constructing differentially private algorithms. The concept was brought into learning theory in [<xref ref-type="bibr" rid="scirp.76021-ref3">3</xref>], where the authors consider output perturbation and objective perturbation for ERM algorithms and analyze both the privacy and the generalization of those algorithms. P. Jain and his collaborators have since done a lot of work on differentially private learning [<xref ref-type="bibr" rid="scirp.76021-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.76021-ref5">5</xref>]. Recently, the authors of [<xref ref-type="bibr" rid="scirp.76021-ref6">6</xref>] found that the empirical average of the output of a differentially private algorithm converges to its expectation. Another analysis of this convergence is given in [<xref ref-type="bibr" rid="scirp.76021-ref7">7</xref>], which motivates our work.</p><p>In this paper, we consider the following statistical learning model (see [<xref ref-type="bibr" rid="scirp.76021-ref8">8</xref>] [<xref ref-type="bibr" rid="scirp.76021-ref9">9</xref>] for more details). The input space X is a compact metric space, and the output space is Y ⊂ ℝ, as in a regression problem. Throughout the paper, we assume the output is uniformly bounded, i.e., |y| ≤ M for some M &gt; 0 almost surely. 
On the sample space Z := X &#215; Y, we seek a function f : X → Y, produced by some algorithm A, that reflects the relationship between the input and the output. Algorithm A depends on the randomly chosen sample <inline-formula><tex-math>z = \{z_i\}_{i=1}^m = \{(x_i, y_i)\}_{i=1}^m</tex-math></inline-formula>, which is drawn according to a distribution ρ on Z. Furthermore, we assume there is a marginal distribution <inline-formula><tex-math>\rho_X</tex-math></inline-formula> on X and a conditional distribution ρ(y|x) on Y given x.</p><p>Now we expect the algorithm to provide some privacy protection. We assume A satisfies the (ϵ, γ)-differential privacy condition [<xref ref-type="bibr" rid="scirp.76021-ref1">1</xref>]. Denote the Hamming distance between two sample sets <inline-formula><tex-math>z_1, z_2</tex-math></inline-formula> by</p><p><disp-formula><tex-math>d(z_1, z_2) = \#\{ i = 1, \cdots, m : z_{1,i} \neq z_{2,i} \},</tex-math></disp-formula></p><p>i.e., the number of elements in which the two sample sets differ; <inline-formula><tex-math>d(z_1, z_2) = 1</tex-math></inline-formula> means that exactly one element is different. Then (ϵ, γ)-differential privacy is defined as follows:</p><p>Definition 1 A random algorithm <inline-formula><tex-math>A : Z^m \to H</tex-math></inline-formula> is (ϵ, γ)-differentially private if for every two data sets <inline-formula><tex-math>z_1, z_2</tex-math></inline-formula> satisfying <inline-formula><tex-math>d(z_1, z_2) = 1</tex-math></inline-formula> and every set O ⊆ H we have</p><p><disp-formula><tex-math>\Pr\{A(z_1) \in O\} \le e^{\epsilon} \cdot \Pr\{A(z_2) \in O\} + \gamma.</tex-math></disp-formula></p><p>Here H is a space of functions from X to Y, called the hypothesis space. In the sequel, we focus on (ϵ, 0)-differential privacy with some 0 &lt; ϵ &lt; 1, which we call ϵ-differential privacy for simplicity. How to choose an appropriate ϵ is a fundamental problem for differentially private algorithms [<xref ref-type="bibr" rid="scirp.76021-ref10">10</xref>], and we provide a method through the error estimates in the following sections.</p></sec><sec id="s2"><title>2. Concentration Inequality</title><p>In this section, we study the error between the empirical average and the expectation for an algorithm A providing ϵ-differential privacy. 
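Our first result can be stated as follows:</p><p>Theorem 1 Suppose an algorithm A provides ϵ-differential privacy and outputs a positive function <inline-formula><tex-math>g_{z,A} : X \times Y \to \mathbb{R}</tex-math></inline-formula> with bounded expectation <inline-formula><tex-math>\mathbb{E}_{z,A}\, g_{z,A} \le G</tex-math></inline-formula> for some G &gt; 0, where the expectation is taken over the sample and the algorithm output. Then</p><p><disp-formula><tex-math>\mathbb{E}_{z,A}\left(\frac{1}{m}\sum_{i=1}^m g_{z,A}(z_i) - \int_Z g_{z,A}(z)\,d\rho\right) \le 2G\epsilon,</tex-math></disp-formula></p><p>and</p><p><disp-formula><tex-math>\mathbb{E}_{z,A}\left(\int_Z g_{z,A}(z)\,d\rho - \frac{1}{m}\sum_{i=1}^m g_{z,A}(z_i)\right) \le 2G\epsilon.</tex-math></disp-formula></p><p>Denote the sample sets <inline-formula><tex-math>w_j = \{z_1, z_2, \cdots, z_{j-1}, z'_j, z_{j+1}, \cdots, z_m\}</tex-math></inline-formula> for <inline-formula><tex-math>j \in \{1, 2, \cdots, m\}</tex-math></inline-formula>. We observe that</p><p><disp-formula><tex-math>\begin{aligned} \mathbb{E}_{z,A}\left(\frac{1}{m}\sum_{i=1}^m g_{z,A}(z_i)\right) &amp;= \frac{1}{m}\sum_{i=1}^m \mathbb{E}_z \mathbb{E}_A\left(g_{z,A}(z_i)\right) = \frac{1}{m}\sum_{i=1}^m \mathbb{E}_z \mathbb{E}_{z'_i} \int_0^{+\infty} \Pr_A\{g_{z,A}(z_i) \ge t\}\,dt \\ &amp;\le \frac{1}{m}\sum_{i=1}^m \mathbb{E}_z \mathbb{E}_{z'_i} \int_0^{+\infty} e^{\epsilon} \Pr_A\{g_{w_i,A}(z_i) \ge t\}\,dt \\ &amp;= e^{\epsilon}\frac{1}{m}\sum_{i=1}^m \mathbb{E}_{w_i}\mathbb{E}_{z_i}\mathbb{E}_A\left(g_{w_i,A}(z_i)\right) = e^{\epsilon}\frac{1}{m}\sum_{i=1}^m \mathbb{E}_{w_i,A}\mathbb{E}_{z_i}\left(g_{w_i,A}(z_i)\right) \\ &amp;= e^{\epsilon}\frac{1}{m}\sum_{i=1}^m \mathbb{E}_{w_i,A}\int_Z g_{w_i,A}(z)\,d\rho = e^{\epsilon}\frac{1}{m}\sum_{i=1}^m \mathbb{E}_{z,A}\int_Z g_{z,A}(z)\,d\rho = e^{\epsilon}\,\mathbb{E}_{z,A}\int_Z g_{z,A}(z)\,d\rho. \end{aligned}</tex-math></disp-formula></p><p>Then, since <inline-formula><tex-math>e^{\epsilon} - 1 \le 2\epsilon</tex-math></inline-formula> for 0 &lt; ϵ &lt; 1,</p><p><disp-formula><tex-math>\mathbb{E}_{z,A}\left(\frac{1}{m}\sum_{i=1}^m g_{z,A}(z_i) - \int_Z g_{z,A}(z)\,d\rho\right) \le (e^{\epsilon} - 1)\,\mathbb{E}_{z,A}\left(\int_Z g_{z,A}(z)\,d\rho\right) \le 2G\epsilon.</tex-math></disp-formula></p><p>On the other hand,</p><p><disp-formula><tex-math>\begin{aligned} \mathbb{E}_{z,A}\int_Z g_{z,A}(z)\,d\rho &amp;= \frac{1}{m}\sum_{i=1}^m \mathbb{E}_z\mathbb{E}_A\int_Z g_{z,A}(z)\,d\rho = \frac{1}{m}\sum_{i=1}^m \mathbb{E}_{w_i}\mathbb{E}_A\int_Z g_{w_i,A}(z)\,d\rho \\ &amp;= \frac{1}{m}\sum_{i=1}^m \mathbb{E}_{w_i}\mathbb{E}_A\int_Z g_{w_i,A}(z_i)\,d\rho(z_i) = \frac{1}{m}\sum_{i=1}^m \mathbb{E}_{w_i}\mathbb{E}_{z_i}\mathbb{E}_A\left(g_{w_i,A}(z_i)\right) \\ &amp;= \frac{1}{m}\sum_{i=1}^m \mathbb{E}_z\mathbb{E}_{z'_i}\int_0^{+\infty}\Pr_A\{g_{w_i,A}(z_i) \ge t\}\,dt \le \frac{1}{m}\sum_{i=1}^m \mathbb{E}_z\mathbb{E}_{z'_i}\, e^{\epsilon}\int_0^{+\infty}\Pr_A\{g_{z,A}(z_i) \ge t\}\,dt \\ &amp;= e^{\epsilon}\frac{1}{m}\sum_{i=1}^m \mathbb{E}_z\mathbb{E}_A\left(g_{z,A}(z_i)\right) = e^{\epsilon}\,\mathbb{E}_{z,A}\frac{1}{m}\sum_{i=1}^m g_{z,A}(z_i). \end{aligned}</tex-math></disp-formula></p><p>This leads to</p><p><disp-formula><tex-math>\mathbb{E}_{z,A}\left(\int_Z g_{z,A}(z)\,d\rho - \frac{1}{m}\sum_{i=1}^m g_{z,A}(z_i)\right) \le (e^{\epsilon} - 1)\,\mathbb{E}_{z,A}\frac{1}{m}\sum_{i=1}^m g_{z,A}(z_i) \le 2G\epsilon.</tex-math></disp-formula></p><p>These verify our results.</p><p>Remark 1 Similar results are proposed in [<xref 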
ref-type="bibr" rid="scirp.76021-ref6">6</xref>] and [<xref ref-type="bibr" rid="scirp.76021-ref7">7</xref>]. However, the authors there restrict the function to take values in [0, 1] or {0, 1}; our result extends theirs to functions taking values in <inline-formula><tex-math>\mathbb{R}_+</tex-math></inline-formula>. This makes our subsequent error analysis possible.</p></sec><sec id="s3"><title>3. Differential Private Learning Algorithm</title><p>In this section we consider the differentially private least squares regularization algorithm. For a Mercer kernel K defined on X &#215; X, the function space <inline-formula><tex-math>\mathcal{H}_K := \overline{\operatorname{span}\{K(x, \cdot),\ x \in X\}}</tex-math></inline-formula> is the induced reproducing kernel Hilbert space (RKHS). Denote <inline-formula><tex-math>K_x(y) = K(x, y)</tex-math></inline-formula> for any x, y ∈ X, and <inline-formula><tex-math>\kappa = \sup_{x, y \in X}\sqrt{K(x, y)}</tex-math></inline-formula>. The reproducing property <inline-formula><tex-math>f(x) = \langle f, K_x\rangle_K</tex-math></inline-formula> is well known. In the sequel, we always assume |y| ≤ M for some constant M &gt; 0. The least squares regularization algorithm, which has been extensively studied in, e.g., [<xref ref-type="bibr" rid="scirp.76021-ref8">8</xref>] [<xref ref-type="bibr" rid="scirp.76021-ref11">11</xref>] [<xref ref-type="bibr" rid="scirp.76021-ref12">12</xref>], is:</p><p><disp-formula><label>(1)</label><tex-math>f_{z,\lambda} = \arg\min_{f \in \mathcal{H}_K} \frac{1}{m}\sum_{i=1}^m (f(x_i) - y_i)^2 + \lambda\|f\|_K^2.</tex-math></disp-formula></p><p>Denote by π the projection operator, as in [<xref ref-type="bibr" rid="scirp.76021-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.76021-ref14">14</xref>]:</p><p><disp-formula><tex-math>\pi(f(x)) = \begin{cases} M, &amp; f(x) &gt; M \\ f(x), &amp; -M \le f(x) \le M \\ -M, &amp; f(x) &lt; -M. \end{cases}</tex-math></disp-formula></p><p>Then we add a noise term b to the output of algorithm (1), like the output perturbation algorithm in [<xref ref-type="bibr" rid="scirp.76021-ref3">3</xref>]:</p><p><disp-formula><label>(2)</label><tex-math>f_{z,A}(x) = \pi(f_{z,\lambda}(x)) + b</tex-math></disp-formula></p><p>where the density of b is independent of z and will be specified in the following analysis. 
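</p><p>As an illustration only, the output perturbation step (2) can be sketched in Python with NumPy. The Gaussian kernel, its <monospace>width</monospace>, the function names and the free parameter <monospace>noise_scale</monospace> are assumptions made for this sketch and are not taken from the paper; the scale that actually yields ϵ-differential privacy is the one fixed later in Proposition 1. The sketch relies on the standard closed form <inline-formula><tex-math>c = (K + \lambda m I)^{-1} y</tex-math></inline-formula>, with K the Gram matrix, for the coefficients of the minimizer of (1).</p><preformat preformat-type="code">
```python
import numpy as np


def gaussian_kernel(a, b, width=1.0):
    """Gram matrix K(a_i, b_j) of a Gaussian (Mercer) kernel on 1-D inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff ** 2 / (2.0 * width ** 2))


def fit_regularized_ls(x, y, lam, width=1.0):
    """Coefficients c of f = sum_i c_i K(x_i, .) minimizing algorithm (1)."""
    m = len(x)
    gram = gaussian_kernel(x, x, width)
    # First-order condition of (1): (K + lam * m * I) c = y.
    return np.linalg.solve(gram + lam * m * np.eye(m), y)


def private_predict(x_train, coef, x_new, bound_m, noise_scale, rng, width=1.0):
    """Evaluate f_{z,A} = pi(f_{z,lambda}) + b at the points x_new."""
    f_vals = gaussian_kernel(x_new, x_train, width) @ coef
    projected = np.clip(f_vals, -bound_m, bound_m)  # projection pi onto [-M, M]
    # Hypothetical noise_scale: one scalar Laplace draw, density ~ exp(-|b| / scale).
    b = rng.laplace(loc=0.0, scale=noise_scale)
    return projected + b, b


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2.0 * np.pi * x)  # outputs bounded by M = 1
coef = fit_regularized_ls(x, y, lam=0.1)
noisy, b = private_predict(x, coef, x, bound_m=1.0, noise_scale=0.05, rng=rng)
print(noisy[:3], b)
```
</preformat><p>A single scalar draw of b is added to every projected prediction, which is consistent with the later computation of the random errors, where b is factored out of the sum over the sample points.</p><p>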
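Moreover, we use the following notation for simplicity:</p><p><disp-formula><tex-math>\mathcal{E}(f) = \int_Z (f(x) - y)^2\,d\rho, \qquad \mathcal{E}_z(f) = \frac{1}{m}\sum_{i=1}^m (f(x_i) - y_i)^2.</tex-math></disp-formula></p><p>Definition 2 We denote by <inline-formula><tex-math>\Delta f_z</tex-math></inline-formula> the maximum sup-norm difference of the output when one sample point in z is changed, i.e.,</p><p><disp-formula><tex-math>\Delta f_z = \sup_{z, z' :\, d(z, z') = 1} \|f_z - f_{z'}\|_\infty.</tex-math></disp-formula></p><p>Then we have the following result:</p><p>Lemma 1 Assume <inline-formula><tex-math>\Delta\pi(f_{z,\lambda})</tex-math></inline-formula> is bounded and b has a density function proportional to <inline-formula><tex-math>\exp\left\{-\frac{\epsilon|b|}{\Delta\pi(f_{z,\lambda})}\right\}</tex-math></inline-formula>. Then algorithm (2) provides ϵ-differential privacy.</p><p>The proof follows that of Theorem 4 in [<xref ref-type="bibr" rid="scirp.76021-ref15">15</xref>]. For every possible output function r, and z, z′ differing in one element,</p><p><disp-formula><tex-math>\Pr\{f_{z,A} = r\} = \Pr_b\{b = r - \pi(f_{z,\lambda})\} \propto \exp\left(-\frac{\epsilon\|r - \pi(f_{z,\lambda})\|_\infty}{\Delta\pi(f_{z,\lambda})}\right),</tex-math></disp-formula></p><p>and</p><p><disp-formula><tex-math>\Pr\{f_{z',A} = r\} = \Pr_b\{b = r - \pi(f_{z',\lambda})\} \propto \exp\left(-\frac{\epsilon\|r - \pi(f_{z',\lambda})\|_\infty}{\Delta\pi(f_{z',\lambda})}\right).</tex-math></disp-formula></p><p>So, by the triangle inequality,</p><p><disp-formula><tex-math>\Pr\{f_{z,A} = r\} \le \Pr\{f_{z',A} = r\} \times \exp\left(\frac{\epsilon\|\pi(f_{z,\lambda}) - \pi(f_{z',\lambda})\|_\infty}{\Delta\pi(f_{z,\lambda})}\right) \le e^{\epsilon}\Pr\{f_{z',A} = r\}.</tex-math></disp-formula></p><p>Then the lemma is proved by a union bound.</p><p>Now we bound the term <inline-formula><tex-math>\Delta f_{z,\lambda}</tex-math></inline-formula>.</p><p>Lemma 2 For the function <inline-formula><tex-math>f_{z,\lambda}</tex-math></inline-formula> obtained from algorithm (1), assume <inline-formula><tex-math>\|f_{z,\lambda}\|_K \le R</tex-math></inline-formula> for any <inline-formula><tex-math>z \in Z^m</tex-math></inline-formula>, for some R ≥ M, and 0 &lt; λ ≤ 1. Then</p><p><disp-formula><tex-math>\Delta f_{z,\lambda} \le \frac{2R\kappa^2(\kappa + 1)}{\lambda m}.</tex-math></disp-formula></p><p>Assume <inline-formula><tex-math>f_{z,\lambda}</tex-math></inline-formula> and <inline-formula><tex-math>f_{z',\lambda}</tex-math></inline-formula> are two outputs of algorithm (1) for sample sets z, z′ satisfying d(z, z′) = 1. Without loss of generality, we set <inline-formula><tex-math>z' = (z_1, z_2, \cdots, z_{m-1}, z'_m)</tex-math></inline-formula>. 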
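Since both functions are minimizers of (1), taking the derivative with respect to f gives</p><p><disp-formula><tex-math>\frac{2}{m}\sum_{i=1}^m (f_{z,\lambda}(x_i) - y_i) K_{x_i} + 2\lambda f_{z,\lambda} = 0</tex-math></disp-formula></p><p>and</p><p><disp-formula><tex-math>\frac{2}{m}\sum_{i=1}^{m-1} (f_{z',\lambda}(x_i) - y_i) K_{x_i} + \frac{2}{m}(f_{z',\lambda}(x'_m) - y'_m) K_{x'_m} + 2\lambda f_{z',\lambda} = 0.</tex-math></disp-formula></p><p>Subtracting the second identity from the first leads to</p><p><disp-formula><tex-math>\frac{1}{m}\sum_{i=1}^{m-1} (f_{z,\lambda}(x_i) - f_{z',\lambda}(x_i)) K_{x_i} + \lambda(f_{z,\lambda} - f_{z',\lambda}) = \frac{1}{m}\left[(f_{z',\lambda}(x'_m) - y'_m) K_{x'_m} - (f_{z,\lambda}(x_m) - y_m) K_{x_m}\right].</tex-math></disp-formula></p><p>Taking the inner product of both sides with <inline-formula><tex-math>f_{z,\lambda} - f_{z',\lambda}</tex-math></inline-formula> and using the reproducing property, we have</p><p><disp-formula><tex-math>\frac{1}{m}\sum_{i=1}^{m-1} (f_{z,\lambda}(x_i) - f_{z',\lambda}(x_i))^2 + \lambda\|f_{z,\lambda} - f_{z',\lambda}\|_K^2 = \frac{1}{m}\left[(f_{z',\lambda}(x'_m) - y'_m)(f_{z,\lambda}(x'_m) - f_{z',\lambda}(x'_m)) - (f_{z,\lambda}(x_m) - y_m)(f_{z,\lambda}(x_m) - f_{z',\lambda}(x_m))\right].</tex-math></disp-formula></p><p>This means</p><p><disp-formula><tex-math>\lambda\|f_{z,\lambda} - f_{z',\lambda}\|_K^2 \le \frac{1}{m}\left[|f_{z',\lambda}(x'_m) - y'_m| + |f_{z,\lambda}(x_m) - y_m|\right] \cdot \|f_{z,\lambda} - f_{z',\lambda}\|_\infty \le \frac{1}{m}\left(\|f_{z',\lambda}\|_\infty + \|f_{z,\lambda}\|_\infty + 2M\right)\kappa\,\|f_{z,\lambda} - f_{z',\lambda}\|_K.</tex-math></disp-formula></p><p>The last inequality follows from the fact that</p><p><disp-formula><tex-math>\|f\|_\infty = \sup_{x \in X}|f(x)| = \sup_{x \in X}|\langle f, K_x\rangle_K| \le \sup_{x \in X}\|K_x\|_K \cdot \|f\|_K \le \kappa\|f\|_K.</tex-math></disp-formula></p><p>Since <inline-formula><tex-math>\|f_{z,\lambda}\|_K \le R</tex-math></inline-formula> for every sample set, <inline-formula><tex-math>\|f_{z',\lambda}\|_K \le R</tex-math></inline-formula> as well. Therefore, using R ≥ M,</p><p><disp-formula><tex-math>\|f_{z,\lambda} - f_{z',\lambda}\|_K \le \frac{1}{\lambda m}(2R\kappa + 2M)\kappa \le \frac{2R\kappa(\kappa + 1)}{\lambda m}</tex-math></disp-formula></p><p>for any 0 &lt; λ ≤ 1. 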
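So</p><p><disp-formula><tex-math>\|f_{z,\lambda} - f_{z',\lambda}\|_\infty \le \frac{2R\kappa^2(\kappa + 1)}{\lambda m}</tex-math></disp-formula></p><p>for any z, z′, and the lemma holds.</p><p>It can be easily verified case by case that</p><p><disp-formula><tex-math>\|\pi(f_{z,\lambda}) - \pi(f_{z',\lambda})\|_\infty \le \|f_{z,\lambda} - f_{z',\lambda}\|_\infty</tex-math></disp-formula></p><p>for any z, z′, which gives the choice of the noise b and the privacy guarantee for algorithm (2).</p><p>Proposition 1 Assume <inline-formula><tex-math>\|f_{z,\lambda}\|_K \le R</tex-math></inline-formula> for any <inline-formula><tex-math>z \in Z^m</tex-math></inline-formula>, for some R ≥ M, and let b take values in (−∞, +∞). If we choose the density of b to be <inline-formula><tex-math>\frac{1}{\alpha}\exp\left(-\frac{\lambda m \epsilon |b|}{2R\kappa^2(\kappa + 1)}\right)</tex-math></inline-formula>, where <inline-formula><tex-math>\alpha = \frac{4R\kappa^2(\kappa + 1)}{\lambda m \epsilon}</tex-math></inline-formula>, then algorithm (2) provides ϵ-differential privacy.</p><p>The proof combines the two lemmas with the inequality above. A simple calculation gives the expression for α.</p></sec><sec id="s4"><title>4. Error Analysis for Differential Private Learning Algorithm</title><p>In this section, we study the expectation of the error <inline-formula><tex-math>\mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho)</tex-math></inline-formula>, where <inline-formula><tex-math>f_\rho(x) = \int_Y y\,d\rho(y|x)</tex-math></inline-formula> is the regression function, which minimizes <inline-formula><tex-math>\mathcal{E}(f)</tex-math></inline-formula>. First we introduce the error decomposition:</p><p><disp-formula><label>(3)</label><tex-math>\begin{aligned} \mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho) &amp;\le \mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho) + \lambda\|f_{z,\lambda}\|_K^2 \\ &amp;\le \mathcal{E}(f_{z,A}) - \mathcal{E}_z(f_{z,A}) + \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(\pi(f_{z,\lambda})) + \mathcal{E}_z(\pi(f_{z,\lambda})) + \lambda\|f_{z,\lambda}\|_K^2 - \mathcal{E}(f_\rho) \\ &amp;\le \mathcal{E}(f_{z,A}) - \mathcal{E}_z(f_{z,A}) + \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(\pi(f_{z,\lambda})) + \mathcal{E}_z(f_{z,\lambda}) + \lambda\|f_{z,\lambda}\|_K^2 - \mathcal{E}(f_\rho) \\ &amp;\le \mathcal{E}(f_{z,A}) - \mathcal{E}_z(f_{z,A}) + \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(\pi(f_{z,\lambda})) + \mathcal{E}_z(f_\lambda) + \lambda\|f_\lambda\|_K^2 - \mathcal{E}(f_\rho) \\ &amp;\le R_1 + R_2 + S + D(\lambda), \end{aligned}</tex-math></disp-formula></p><p>where <inline-formula><tex-math>f_\lambda</tex-math></inline-formula> is a function in <inline-formula><tex-math>\mathcal{H}_K</tex-math></inline-formula> to be determined and</p><p><disp-formula><tex-math>R_1 = \mathcal{E}(f_{z,A}) - \mathcal{E}_z(f_{z,A}),</tex-math></disp-formula></p><p><disp-formula><tex-math>R_2 = \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(\pi(f_{z,\lambda})),</tex-math></disp-formula></p><p><disp-formula><tex-math>S = \mathcal{E}_z(f_\lambda) - \mathcal{E}(f_\lambda),</tex-math></disp-formula></p><p><disp-formula><tex-math>D(\lambda) = \mathcal{E}(f_\lambda) - \mathcal{E}(f_\rho) + \lambda\|f_\lambda\|_K^2.</tex-math></disp-formula></p><p>Here <inline-formula><tex-math>R_1</tex-math></inline-formula> and <inline-formula><tex-math>R_2</tex-math></inline-formula> involve the function <inline-formula><tex-math>f_{z,A}</tex-math></inline-formula> from the random algorithm (2), so we call them random errors. 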
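S and D(λ) are similar to the classical quantities in the learning theory literature, and we still call them the sample error and the approximation error. In the following, we study these errors in turn.</p><sec id="s4_1"><title>4.1. Error Bounds for Random Errors</title><p>Proposition 2 For the function <inline-formula><tex-math>f_{z,A}</tex-math></inline-formula> obtained from algorithm (2) with the density of b as described in Proposition 1, we have</p><p><disp-formula><tex-math>\mathbb{E}_{z,A} R_1 \le 8\epsilon\left(\frac{2R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + M^2\right).</tex-math></disp-formula></p><p>Note that</p><p><disp-formula><tex-math>R_1 = \int_Z (f_{z,A}(x) - y)^2\,d\rho - \frac{1}{m}\sum_{i=1}^m (f_{z,A}(x_i) - y_i)^2,</tex-math></disp-formula></p><p>so an analysis analogous to the proof of Theorem 1 tells us that</p><p><disp-formula><tex-math>\mathbb{E}_{z,A}\left(\int_Z (f_{z,A}(x) - y)^2\,d\rho - \frac{1}{m}\sum_{i=1}^m (f_{z,A}(x_i) - y_i)^2\right) \le (e^{\epsilon} - 1)\,\mathbb{E}_z\mathbb{E}_A \frac{1}{m}\sum_{i=1}^m (\pi(f_{z,\lambda}(x_i)) + b - y_i)^2 \le 2\epsilon\,\mathbb{E}_z\mathbb{E}_b \frac{1}{m}\sum_{i=1}^m \left(b^2 + 2b(\pi(f_{z,\lambda}(x_i)) - y_i) + (\pi(f_{z,\lambda}(x_i)) - y_i)^2\right) \le 2\epsilon\left(\frac{8R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + 4M^2\right),</tex-math></disp-formula></p><p>where we used <inline-formula><tex-math>e^{\epsilon} - 1 \le 2\epsilon</tex-math></inline-formula>, <inline-formula><tex-math>\mathbb{E}_b b = 0</tex-math></inline-formula>, <inline-formula><tex-math>\mathbb{E}_b b^2 = \alpha^2/2</tex-math></inline-formula> and <inline-formula><tex-math>|\pi(f_{z,\lambda}(x_i)) - y_i| \le 2M</tex-math></inline-formula>. This verifies the proposition.</p><p>For the term <inline-formula><tex-math>R_2</tex-math></inline-formula>, we have a similar analysis.</p><p>Proposition 3 For the function <inline-formula><tex-math>f_{z,A}</tex-math></inline-formula> obtained from algorithm (2) with the density of b as described in Proposition 1, we have</p><p><disp-formula><tex-math>\mathbb{E}_{z,A} R_2 \le \frac{8R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2}.</tex-math></disp-formula></p><p>Since</p><p><disp-formula><tex-math>R_2 = \mathcal{E}_z(f_{z,A}) - \mathcal{E}_z(\pi(f_{z,\lambda})) = \frac{1}{m}\sum_{i=1}^m \left[(f_{z,A}(x_i) - y_i)^2 - (\pi(f_{z,\lambda}(x_i)) - y_i)^2\right] = \frac{1}{m}\sum_{i=1}^m b\left(b + 2\pi(f_{z,\lambda}(x_i)) - 2y_i\right) = b^2 + 2b \cdot \frac{1}{m}\sum_{i=1}^m (\pi(f_{z,\lambda}(x_i)) - y_i),</tex-math></disp-formula></p><p>and b has mean zero, we have</p><p><disp-formula><tex-math>\mathbb{E}_{z,A} R_2 = \mathbb{E}_z\mathbb{E}_b b^2 \le \frac{8R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2}.</tex-math></disp-formula></p><p>And the proposition is proved.</p></sec><sec id="s4_2"><title>4.2. Error Estimates for Sample Error and Approximation Error</title><p>Error estimates for the sample error and the approximation error have been extensively studied since [<xref ref-type="bibr" rid="scirp.76021-ref8">8</xref>]. Here we provide the proof for completeness. 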
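It is known from [<xref ref-type="bibr" rid="scirp.76021-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.76021-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.76021-ref14">14</xref>] that <inline-formula><tex-math>f_\lambda</tex-math></inline-formula> in the error decomposition (3) can be chosen arbitrarily in <inline-formula><tex-math>\mathcal{H}_K</tex-math></inline-formula>. Here we simply choose the classical one</p><p><disp-formula><tex-math>f_\lambda = \arg\min_{f \in \mathcal{H}_K} \mathcal{E}(f) + \lambda\|f\|_K^2.</tex-math></disp-formula></p><p>From [<xref ref-type="bibr" rid="scirp.76021-ref16">16</xref>] [<xref ref-type="bibr" rid="scirp.76021-ref17">17</xref>], <inline-formula><tex-math>f_\lambda</tex-math></inline-formula> has the expression</p><p><disp-formula><tex-math>f_\lambda = (L_K + \lambda I)^{-1} L_K f_\rho,</tex-math></disp-formula></p><p>where <inline-formula><tex-math>L_K</tex-math></inline-formula> is the operator defined on <inline-formula><tex-math>L^2_{\rho_X}</tex-math></inline-formula> as</p><p><disp-formula><tex-math>L_K f(t) = \int_X f(x) K(x, t)\,d\rho_X.</tex-math></disp-formula></p><p>It is shown in [<xref ref-type="bibr" rid="scirp.76021-ref8">8</xref>] that <inline-formula><tex-math>L_K</tex-math></inline-formula> has a sequence of eigenvalues <inline-formula><tex-math>\{\mu_i\}_{i \ge 1}</tex-math></inline-formula> satisfying <inline-formula><tex-math>\mu_i &gt; 0</tex-math></inline-formula> and <inline-formula><tex-math>\mu_i \to 0</tex-math></inline-formula> as i → ∞, and that <inline-formula><tex-math>\|L_K\| \le \kappa^2</tex-math></inline-formula>. Now we recall the Hoeffding inequality [<xref ref-type="bibr" rid="scirp.76021-ref18">18</xref>].</p><p>Lemma 3 Let ξ be a random variable on a probability space Z satisfying <inline-formula><tex-math>|\xi(z) - \mathbb{E}\xi| \le B</tex-math></inline-formula> for some B &gt; 0 and almost all z ∈ Z. Then</p><p><disp-formula><tex-math>\Pr\left\{\left|\frac{1}{m}\sum_{i=1}^m \xi(z_i) - \mathbb{E}\xi\right| \ge \varepsilon\right\} \le 2\exp\left\{-\frac{m\varepsilon^2}{2B^2}\right\}.</tex-math></disp-formula></p><p>Then we have the following analysis.</p><p>Proposition 4 For <inline-formula><tex-math>f_\lambda</tex-math></inline-formula> and <inline-formula><tex-math>f_\rho</tex-math></inline-formula> defined as above, assume <inline-formula><tex-math>f_\rho \in L_K^r(L^2_{\rho_X})</tex-math></inline-formula>. Then</p><p><disp-formula><tex-math>\mathbb{E}_{z,A}\left(S + D(\lambda)\right) \le \frac{8\sqrt{2\pi}M^2}{\sqrt{m}} + \lambda^{\min\{2r, 1\}}\left(\kappa^{4r-2} + \kappa^{4r-4} + 2\right)\|L_K^{-r} f_\rho\|_\rho^2.</tex-math></disp-formula></p><p>First we bound the sample error</p><p><disp-formula><tex-math>S = \mathcal{E}_z(f_\lambda) - \mathcal{E}(f_\lambda) = \frac{1}{m}\sum_{i=1}^m (f_\lambda(x_i) - y_i)^2 - \int_Z (f_\lambda(x) - y)^2\,d\rho.</tex-math></disp-formula></p><p>Let <inline-formula><tex-math>\xi(z) = (f_\lambda(x) - y)^2</tex-math></inline-formula>. Since <inline-formula><tex-math>|f_\rho(x)| = \left|\int_Y y\,d\rho(y|x)\right| \le M</tex-math></inline-formula>, and</p><p><disp-formula><tex-math>\|f_\lambda\|_\infty = \|(L_K + \lambda I)^{-1} L_K f_\rho\|_\infty \le \|(L_K + \lambda I)^{-1} L_K\| \cdot \|f_\rho\|_\infty \le M,</tex-math></disp-formula></p><p>we have <inline-formula><tex-math>|\xi - \mathbb{E}\xi| \le 8M^2</tex-math></inline-formula>. 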
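So by the Hoeffding inequality with <inline-formula><tex-math>B = 8M^2</tex-math></inline-formula> there holds</p><p><disp-formula><tex-math>\Pr_z\left\{\left|\int_Z (f_\lambda(x) - y)^2\,d\rho - \frac{1}{m}\sum_{i=1}^m (f_\lambda(x_i) - y_i)^2\right| \ge \varepsilon\right\} \le 2\exp\left\{-\frac{m\varepsilon^2}{128M^4}\right\}.</tex-math></disp-formula></p><p>Then we have</p><p><disp-formula><tex-math>\mathbb{E}_{z,A} S \le \mathbb{E}_z|S| = \int_0^{+\infty}\Pr_z\{|S| \ge t\}\,dt \le \int_0^{+\infty} 2\exp\left\{-\frac{m t^2}{128M^4}\right\}dt = \frac{8\sqrt{2\pi}M^2}{\sqrt{m}}.</tex-math></disp-formula></p><p>For the approximation error, note that <inline-formula><tex-math>\mathcal{E}(f_\lambda) - \mathcal{E}(f_\rho) = \|f_\lambda - f_\rho\|_\rho^2</tex-math></inline-formula> [<xref ref-type="bibr" rid="scirp.76021-ref9">9</xref>], which is independent of z and b. We have</p><p><disp-formula><tex-math>\begin{aligned} \mathbb{E}_{z,A}\left(\mathcal{E}(f_\lambda) - \mathcal{E}(f_\rho)\right) &amp;= \|f_\lambda - f_\rho\|_\rho^2 = \|(L_K + \lambda I)^{-1}(L_K - (L_K + \lambda I)) f_\rho\|_\rho^2 = \lambda^2\|(L_K + \lambda I)^{-1} L_K^r L_K^{-r} f_\rho\|_\rho^2 \\ &amp;\le \lambda^2\|(L_K + \lambda I)^{-1} L_K^r\|^2\,\|L_K^{-r} f_\rho\|_\rho^2 \le \begin{cases} \lambda^{2r}\|L_K^{-r} f_\rho\|_\rho^2, &amp; r \le 1 \\ \lambda^2\kappa^{4(r-1)}\|L_K^{-r} f_\rho\|_\rho^2, &amp; r &gt; 1 \end{cases} \\ &amp;\le \lambda^{\min\{2r, 2\}}\left(\kappa^{4(r-1)} + 1\right)\|L_K^{-r} f_\rho\|_\rho^2. \end{aligned}</tex-math></disp-formula></p><p>On the other hand, the authors of [<xref ref-type="bibr" rid="scirp.76021-ref8">8</xref>] pointed out that <inline-formula><tex-math>\|f\|_K = \|L_K^{-1/2} f\|_\rho</tex-math></inline-formula> for any <inline-formula><tex-math>f \in \mathcal{H}_K</tex-math></inline-formula>. So</p><p><disp-formula><tex-math>\begin{aligned} \mathbb{E}_{z,A}\,\lambda\|f_\lambda\|_K^2 &amp;= \lambda\|(L_K + \lambda I)^{-1} L_K f_\rho\|_K^2 = \lambda\|(L_K + \lambda I)^{-1} L_K^{1/2} f_\rho\|_\rho^2 \le \lambda\|(L_K + \lambda I)^{-1} L_K^{1/2 + r}\|^2 \cdot \|L_K^{-r} f_\rho\|_\rho^2 \\ &amp;\le \begin{cases} \lambda^{2r}\|L_K^{-r} f_\rho\|_\rho^2, &amp; r \le \frac{1}{2} \\ \lambda \cdot \kappa^{4r-2}\|L_K^{-r} f_\rho\|_\rho^2, &amp; r &gt; \frac{1}{2} \end{cases} \le \lambda^{\min\{2r, 1\}}\left(\kappa^{4r-2} + 1\right)\|L_K^{-r} f_\rho\|_\rho^2. \end{aligned}</tex-math></disp-formula></p><p>Combining the three bounds above, and noting that <inline-formula><tex-math>\lambda^{\min\{2r, 2\}} \le \lambda^{\min\{2r, 1\}}</tex-math></inline-formula> for 0 &lt; λ ≤ 1, verifies the proposition.</p></sec><sec id="s4_3"><title>4.3. 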
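Convergence Result with Fixed ϵ</title><p>Our analysis of <inline-formula><tex-math>\mathbb{E}_{z,A} R_1</tex-math></inline-formula> above in fact gives</p><p><disp-formula><tex-math>\mathbb{E}_{z,A} R_1 \le \frac{16R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon} + 2\epsilon\,\mathbb{E}_z\mathcal{E}_z(\pi(f_{z,\lambda})).</tex-math></disp-formula></p><p>Therefore, the error decomposition can be written as</p><p><disp-formula><tex-math>\begin{aligned} \mathbb{E}_{z,A}\left(\mathcal{E}(f_{z,A}) - (1 + 2\epsilon)\mathcal{E}(f_\rho)\right) &amp;\le \mathbb{E}_{z,A}\left(R_1 + R_2 + S + D(\lambda) - 2\epsilon\,\mathcal{E}(f_\rho)\right) \\ &amp;\le \frac{16R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon} + \frac{8R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + \mathbb{E}_z\,2\epsilon\left(\mathcal{E}_z(\pi(f_{z,\lambda})) - \mathcal{E}(f_\rho)\right) + \mathbb{E}_z\left(S + D(\lambda)\right) \\ &amp;\le \frac{24R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + 2\epsilon\,\mathbb{E}_z\left(\mathcal{E}_z(f_{z,\lambda}) + \lambda\|f_{z,\lambda}\|_K^2 - \mathcal{E}(f_\rho)\right) + \mathbb{E}_z\left(S + D(\lambda)\right) \\ &amp;\le \frac{24R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + 2\epsilon\,\mathbb{E}_z\left(\mathcal{E}_z(f_\lambda) + \lambda\|f_\lambda\|_K^2 - \mathcal{E}(f_\rho)\right) + \mathbb{E}_z\left(S + D(\lambda)\right) \\ &amp;\le \frac{24R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + (1 + 2\epsilon)\,\mathbb{E}_z\left(S + D(\lambda)\right) \\ &amp;\le \frac{24M^2\kappa^4(\kappa + 1)^2}{\lambda^3 m^2 \epsilon^2} + \frac{8\sqrt{2\pi}M^2(1 + 2\epsilon)}{\sqrt{m}} + \lambda^{\min\{1, 2r\}}\left(\kappa^{4r-2} + \kappa^{4r-4} + 2\right)\|L_K^{-r} f_\rho\|_\rho^2, \end{aligned}</tex-math></disp-formula></p><p>where the last step takes <inline-formula><tex-math>R = M/\sqrt{\lambda}</tex-math></inline-formula>, which is admissible because <inline-formula><tex-math>\lambda\|f_{z,\lambda}\|_K^2 \le \mathcal{E}_z(0) \le M^2</tex-math></inline-formula>. Then, choosing <inline-formula><tex-math>\lambda = \left(\frac{1}{m}\right)^{2/\min\{4, 3 + 2r\}}</tex-math></inline-formula> for balance, we have the following result.</p><p>Theorem 2 Let <inline-formula><tex-math>f_{z,A}</tex-math></inline-formula> be derived from algorithm (2), with <inline-formula><tex-math>f_{z,\lambda}</tex-math></inline-formula> and <inline-formula><tex-math>f_\lambda</tex-math></inline-formula> defined in the above subsections, and assume <inline-formula><tex-math>f_\rho \in L_K^r(L^2_{\rho_X})</tex-math></inline-formula>. Taking <inline-formula><tex-math>\lambda = \left(\frac{1}{m}\right)^{2/\min\{4, 3 + 2r\}}</tex-math></inline-formula>, there holds</p><p><disp-formula><tex-math>\mathbb{E}_{z,A}\left(\mathcal{E}(f_{z,A}) - (1 + 2\epsilon)\mathcal{E}(f_\rho)\right) \le C_\epsilon\left(\frac{1}{m}\right)^{\min\left\{\frac{1}{2}, \frac{4r}{3 + 2r}\right\}},</tex-math></disp-formula></p><p>where the constant</p><p><disp-formula><tex-math>C_\epsilon = \frac{24M^2\kappa^4(\kappa + 1)^2}{\epsilon^2} + 8\sqrt{2\pi}M^2(1 + 2\epsilon) + \left(\kappa^{4r-2} + \kappa^{4r-4} + 2\right)\|L_K^{-r} f_\rho\|_\rho^2.</tex-math></disp-formula></p></sec><sec id="s4_4"><title>4.4. 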
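Selection of ϵ and Total Error Bound</title><p>From the analysis of the random error, sample error and approximation error above, we obtain the total error bound as follows.</p><p>Theorem 3 Let <inline-formula><tex-math>f_{z,A}</tex-math></inline-formula> be derived from algorithm (2), with <inline-formula><tex-math>f_{z,\lambda}</tex-math></inline-formula> and <inline-formula><tex-math>f_\lambda</tex-math></inline-formula> defined in the above subsections, and assume <inline-formula><tex-math>f_\rho \in L_K^r(L^2_{\rho_X})</tex-math></inline-formula>. Taking</p><p><disp-formula><tex-math>\lambda = \left(\frac{1}{m\epsilon}\right)^{2/\min\{4, 3 + 2r\}},</tex-math></disp-formula></p><p>and</p><p><disp-formula><tex-math>\epsilon = \left(\frac{1}{m}\right)^{\min\{1/3,\ 4r/(3 + 6r)\}},</tex-math></disp-formula></p><p>we have</p><p><disp-formula><tex-math>\mathbb{E}_{z,A}\left(\mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho)\right) \le \tilde{C}\left(\frac{1}{m}\right)^{\min\left\{\frac{1}{3}, \frac{4r}{3 + 6r}\right\}},</tex-math></disp-formula></p><p>where the constant</p><p><disp-formula><tex-math>\tilde{C} = 8(1 + \sqrt{2\pi})M^2 + 24M^2\kappa^4(\kappa + 1)^2 + \left(\kappa^{4r-2} + \kappa^{4r-4} + 2\right)\|L_K^{-r} f_\rho\|_\rho^2.</tex-math></disp-formula></p><p>It can be seen from the error decomposition (3) that</p><p><disp-formula><tex-math>\begin{aligned} \mathbb{E}_{z,A}\left(\mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho)\right) &amp;\le \mathbb{E}_{z,A}\left(\mathcal{E}(f_{z,A}) - \mathcal{E}(f_\rho) + \lambda\|f_{z,\lambda}\|_K^2\right) \le \mathbb{E}_{z,A}\left(R_1 + R_2 + S + D(\lambda)\right) \\ &amp;\le 8\epsilon\left(\frac{2R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + M^2\right) + \frac{8R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + \frac{8\sqrt{2\pi}M^2}{\sqrt{m}} + \lambda^{\min\{2r, 1\}}\left(\kappa^{4r-2} + \kappa^{4r-4} + 2\right)\|L_K^{-r} f_\rho\|_\rho^2 \\ &amp;\le 8M^2\epsilon + \frac{24R^2\kappa^4(\kappa + 1)^2}{\lambda^2 m^2 \epsilon^2} + \frac{8\sqrt{2\pi}M^2}{\sqrt{m}} + \lambda^{\min\{2r, 1\}}\left(\kappa^{4r-2} + \kappa^{4r-4} + 2\right)\|L_K^{-r} f_\rho\|_\rho^2. \end{aligned}</tex-math></disp-formula></p><p>Since <inline-formula><tex-math>\lambda\|f_{z,\lambda}\|_K^2 \le \mathcal{E}_z(f_{z,\lambda}) + \lambda\|f_{z,\lambda}\|_K^2 \le \mathcal{E}_z(0) \le M^2</tex-math></inline-formula>, we have <inline-formula><tex-math>\|f_{z,\lambda}\|_K \le M/\sqrt{\lambda}</tex-math></inline-formula>, i.e., we can choose <inline-formula><tex-math>R = M/\sqrt{\lambda}</tex-math></inline-formula>. Now take <inline-formula><tex-math>\lambda = \left(\frac{1}{m\epsilon}\right)^{2/\min\{4, 3 + 2r\}}</tex-math></inline-formula> and <inline-formula><tex-math>\epsilon = \left(\frac{1}{m}\right)^{\min\{1/3,\ 4r/(3 + 6r)\}}</tex-math></inline-formula> for balance, and the result is proved.</p></sec></sec><sec id="s5"><title>5. Conclusions</title><p>Theorem 2, where ϵ is taken as a constant, reveals that the generalization error <inline-formula><tex-math>\mathcal{E}(f_{z,A})</tex-math></inline-formula> converges in expectation not to that of the regression function, <inline-formula><tex-math>\mathcal{E}(f_\rho)</tex-math></inline-formula>, but to the slightly different value <inline-formula><tex-math>(1 + 2\epsilon)\mathcal{E}(f_\rho)</tex-math></inline-formula>.</p><p>It can be seen from the definition of differential privacy that algorithms provide more privacy as ϵ tends to 0. However, Theorem 3 shows that ϵ cannot be too small, since the expected error bound then becomes very large. 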
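</p><p>To make this trade-off concrete, the parameter choices of Theorem 3 and the corresponding Laplace scale of Proposition 1 (with R = M/√λ) can be evaluated numerically. The following Python sketch is illustrative only; the function names and the example values of m, r, M and κ are assumptions made here, not values from the paper.</p><preformat preformat-type="code">
```python
import math


def theorem3_parameters(m, r):
    """Privacy level eps and regularization lam as chosen in Theorem 3."""
    eps = (1.0 / m) ** min(1.0 / 3.0, 4.0 * r / (3.0 + 6.0 * r))
    lam = (1.0 / (m * eps)) ** (2.0 / min(4.0, 3.0 + 2.0 * r))
    return eps, lam


def laplace_scale(m, eps, lam, bound_m, kappa):
    """Scale 2 R kappa^2 (kappa + 1) / (lam m eps), taking R = M / sqrt(lam)."""
    radius = bound_m / math.sqrt(lam)
    return 2.0 * radius * kappa ** 2 * (kappa + 1.0) / (lam * m * eps)


# Hypothetical example values: r = 1, M = 1, kappa = 1.
for m in (10 ** 3, 10 ** 6):
    eps, lam = theorem3_parameters(m, r=1.0)
    scale = laplace_scale(m, eps, lam, bound_m=1.0, kappa=1.0)
    print(m, eps, lam, scale)
```
</preformat><p>With these example values both ϵ and λ decrease as the sample size m grows, so stronger privacy is requested only when more data are available to absorb the added noise.</p><p>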
Hence our choice can be regarded as a balance between privacy protection and the expected error. In [<xref ref-type="bibr" rid="scirp.76021-ref19">19</xref>], the authors state that ϵ also needs to tend to 0 at some rate to maintain generalization, which matches our result.</p><p>Compared with previous learning theory results [<xref ref-type="bibr" rid="scirp.76021-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.76021-ref20">20</xref>] [<xref ref-type="bibr" rid="scirp.76021-ref21">21</xref>] [<xref ref-type="bibr" rid="scirp.76021-ref22">22</xref>], our learning rate is slower, since a perturbation term is introduced. However, Theorem 1 does not need a capacity condition of the kind used in classical error analysis, i.e., conditions on covering numbers, VC or V<sub>γ</sub> dimensions. Instead, the ϵ-differential privacy condition is adopted. So it may be feasible and interesting to apply this condition to other learning algorithms.</p></sec><sec id="s6"><title>Acknowledgements</title><p>This work is supported by NSFC (Nos. 11326096, 11401247), NSF of Guangdong Province in China (No. 2015A030313674), National Social Science Fund in China (No. 15BTJ024), Planning Fund Project of Humanities and Social Science Research in Chinese Ministry of Education (No. 14YJAZH040), Foundation for Distinguished Young Talents in Higher Education of Guangdong, China (No. 2016KQNCX162) and the Major Incubation Research Project of Huizhou University (No. hzux1201619).</p></sec><sec id="s7"><title>Cite this paper</title><p>Nie, W.L. and Wang, C. (2017) Error Analysis and Variable Selection for Differential Private Learning Algorithm. Journal of Applied Mathematics and Physics, 5, 900-911. https://doi.org/10.4236/jamp.2017.54079</p></sec></body><back><ref-list><title>References</title><ref id="scirp.76021-ref1"><label>1</label><mixed-citation publication-type="book" xlink:type="simple">Dwork, C., McSherry, F., Nissim, K. and Smith, A. 
(2006) Calibrating Noise to Sensitivity in Private Data Analysis. In: Halevi, S. and Rabin, T., Eds., Theory of Cryptography, Springer, Berlin, 265-284.</mixed-citation></ref><ref id="scirp.76021-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">McSherry, F. and Talwar, K. (2007) Mechanism Design via Differential Privacy. Proceedings of the 48th Annual Symposium on Foundations of Computer Science, Providence, 21-23 October 2007, 94-103. https://doi.org/10.1109/focs.2007.66</mixed-citation></ref><ref id="scirp.76021-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Chaudhuri, K., Monteleoni, C. and Sarwate, A.D. (2011) Differentially Private Empirical Risk Minimization. Journal of Machine Learning Research, 12, 1069-1109.</mixed-citation></ref><ref id="scirp.76021-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Jain, P. and Thakurta, A.G. (2013) Differentially Private Learning with Kernels. JMLR: Workshop and Conference Proceedings, 28, 118-126.</mixed-citation></ref><ref id="scirp.76021-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Jain, P. and Thakurta, A.G. (2014) Dimension Independent Risk Bounds for Differentially Private Learning. Proceedings of the 31st International Conference on Machine Learning, Beijing, 21-26 June 2014, 476-484.</mixed-citation></ref><ref id="scirp.76021-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O. and Roth, A. (2015) Preserving Statistical Validity in Adaptive Data Analysis. ACM Symposium on the Theory of Computing, Portland, 14-17 June 2015, 117-126. https://doi.org/10.1145/2746539.2746580</mixed-citation></ref><ref id="scirp.76021-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Bassily, R., Nissim, K., Smith, A., Steinke, T., Stemmer, U. and Ullman, J. 
(2015) Algorithmic Stability for Adaptive Data Analysis.</mixed-citation></ref><ref id="scirp.76021-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Cucker, F. and Smale, S. (2002) On the Mathematical Foundations of Learning. Bulletin of the AMS, 39, 1-49. https://doi.org/10.1090/S0273-0979-01-00923-5</mixed-citation></ref><ref id="scirp.76021-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Cucker, F. and Zhou, D.X. (2007) Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511618796</mixed-citation></ref><ref id="scirp.76021-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Dwork, C. (2008) Differential Privacy: A Survey of Results. International Conference on Theory and Applications of Models of Computation, Xi’an, 25-29 April 2008, 1-19.</mixed-citation></ref><ref id="scirp.76021-ref11"><label>11</label><mixed-citation publication-type="book" xlink:type="simple">Steinwart, I., Hush, D. and Scovel, C. (2009) Optimal Rates for Regularized Least Squares Regression. In: Dasgupta, S. and Klivans, A., Eds., Proceedings of the 22nd Annual Conference on Learning Theory, Montreal, 18-21 June 2009, 79-93.</mixed-citation></ref><ref id="scirp.76021-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Wu, Q., Ying, Y. and Zhou, D.X. (2006) Learning Rates of Least-Square Regularized Regression. Foundations of Computational Mathematics, 6, 171-192. https://doi.org/10.1007/s10208-004-0155-9</mixed-citation></ref><ref id="scirp.76021-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Nie, W.L. and Wang, C. (2015) Constructive Analysis for Coefficient Regularization Regression Algorithms. 
Journal of Mathematical Analysis and Applications, 431, 1153-1171.</mixed-citation></ref><ref id="scirp.76021-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Wang, C. and Nie, W.L. (2014) Constructive Analysis for Least Squares Regression with Generalized K-Norm Regularization. Abstract and Applied Analysis, 2014, Article ID: 458459. https://doi.org/10.1155/2014/458459</mixed-citation></ref><ref id="scirp.76021-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Dwork, C. (2006) Differential Privacy. Springer, Berlin, 1-12.</mixed-citation></ref><ref id="scirp.76021-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Smale, S. and Zhou, D.X. (2003) Estimating the Approximation Error in Learning Theory. Analysis and Applications, 1, 17-41. https://doi.org/10.1142/S0219530503000089</mixed-citation></ref><ref id="scirp.76021-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Smale, S. and Zhou, D.X. (2007) Learning Theory Estimates via Integral Operators and Their Applications. Constructive Approximation, 26, 153-172. https://doi.org/10.1007/s00365-006-0659-y</mixed-citation></ref><ref id="scirp.76021-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Hoeffding, W. (1963) Probability Inequalities for Sums of Bounded Random Variables. Journal of the American Statistical Association, 58, 13-30. https://doi.org/10.1080/01621459.1963.10500830</mixed-citation></ref><ref id="scirp.76021-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Wang, Y.-X., Lei, J. and Fienberg, S.E. (2015) Learning with Differential Privacy: Stability, Learn Ability and the Sufficiency and Necessity of ERM Principle.</mixed-citation></ref><ref id="scirp.76021-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Wang, C. and Zhou, D.X. 
(2011) Optimal Learning Rates for Least Squares Regularized Regression with Unbounded Sampling. Journal of Complexity, 27, 55-67.</mixed-citation></ref><ref id="scirp.76021-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Hu, T., Fan, J., Wu, Q. and Zhou, D.X. (2015) Regularization Schemes for Minimum Error Entropy Principle. Analysis and Applications, 13, 437-455. https://doi.org/10.1142/S0219530514500110</mixed-citation></ref><ref id="scirp.76021-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Christmann, A. and Zhou, D.X. (2016) Learning Rates for the Risk of Kernel-Based Quantile Regression Estimators in Additive Models. Analysis and Applications, 14, 449-477. https://doi.org/10.1142/S0219530515500050</mixed-citation></ref></ref-list></back></article>