
JAMP

Journal of Applied Mathematics and Physics

2327-4352

Scientific Research Publishing

10.4236/jamp.2024.124069

JAMP-132553

Articles

Physics&Mathematics

Solving Neumann Boundary Problem with Kernel-Regularized Learning Approach

Xuexue Ran, Baohuai Sheng

Department of Mathematics, Shaoxing University, Shaoxing, China

11042024

Vol. 12, No. 4, pp. 1101-1125; dates: 7 March 2024, 16 April 2024, 19 April 2024

2014

This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/

We provide a kernel-regularized method that gives theoretical solutions to the Neumann boundary value problem on the unit ball. We define a reproducing kernel Hilbert space by means of spherical harmonics associated with an inner product defined on both the unit ball and the unit sphere, construct the kernel-regularized learning algorithm from the viewpoint of semi-supervised learning, and derive upper bounds for the learning rates. The theoretical analysis shows that the learning algorithm has good uniform convergence with respect to the number of samples. The research can be regarded as an application of kernel-regularized semi-supervised learning.

Keywords: Neumann Boundary Value, Kernel-Regularized Approach, Reproducing Kernel Hilbert Space, The Unit Ball, The Unit Sphere

1. Introduction

It is known that approximation theory and its techniques have been used to give analytic solutions for PDEs with boundary value problems, leading to the method of fundamental solutions (see e.g. [1] - [6]; Appendices 1-3). Recently, the kernel-based collocation method for solving several PDEs with boundary conditions has been developed (see e.g. [7] [8] [9] [10] [11]) from the viewpoint of minimal-norm interpolation in reproducing kernel Hilbert spaces (RKHSs); the existence theorem and the representation theorem for the numerical solutions are shown qualitatively. It is suggested in [12] that kernel-regularized gradient learning may be used to give numerical solutions for PDEs. Indeed, some kernel-regularized learning algorithms have been used to study PDEs with Dirichlet boundary conditions quantitatively (see e.g. [13] [14] [15] [16]). For a given domain $D \subset \mathbb{R}^d$, the pairwise distinct collocation points chosen in the kernel-based collocation method are:

$$X_D = \{x_1, \dots, x_N\} \subset D,$$

$$X_{\partial D} = \{x_{N+1}, \dots, x_{N+M}\} \subset \partial D,$$

and the sample values from the PDE (where $P$ and $B$ are given differential operators):

$$\begin{cases} Pu = f^*, & \text{in } D \subset \mathbb{R}^d, \\ Bu = g, & \text{on } \partial D \end{cases} \tag{1}$$

at the given collocation points are (see [7] [11]):

$$y_j = f^*(x_j) + \eta_{x_j}, \quad j = 1, 2, \dots, N,$$

$$y_k = g(x_{N+k}), \quad k = 1, 2, \dots, M,$$

where the random vector $\vec{\eta} := (\eta_{x_1}, \dots, \eta_{x_N})^{\mathsf{T}} \sim N(\vec{0}, \vec{\Psi}_{\partial D})$.
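The sampling model above can be sketched numerically. The following is only an illustration under stated assumptions: $D$ is taken to be the unit ball in $\mathbb{R}^3$, the noise covariance is a scaled identity, and `f_star` and `g` are hypothetical test functions; none of these choices come from the paper.

```python
import numpy as np

def sample_interior(n, d, rng):
    """Draw n points uniformly from the open unit ball of R^d."""
    v = rng.standard_normal((n, d))
    directions = v / np.linalg.norm(v, axis=1, keepdims=True)
    radii = rng.random(n) ** (1.0 / d)      # inverse-CDF sampling of the radius
    return directions * radii[:, None]

def sample_boundary(n, d, rng):
    """Draw n points uniformly from the unit sphere of R^d."""
    v = rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

# Hypothetical test data (assumptions, not taken from the paper).
def f_star(x):
    return np.sin(x[:, 0]) + np.sum(x**2, axis=1)

def g(x):
    # Outward normal derivative of f_star on the unit sphere: x . grad f_star(x).
    grad = 2.0 * x
    grad[:, 0] += np.cos(x[:, 0])
    return np.sum(x * grad, axis=1)

rng = np.random.default_rng(0)
N, M, d, noise_std = 50, 20, 3, 0.05
X_D = sample_interior(N, d, rng)                                 # collocation points in D
X_bd = sample_boundary(M, d, rng)                                # collocation points on the boundary
y_interior = f_star(X_D) + noise_std * rng.standard_normal(N)   # y_j = f*(x_j) + eta_j
y_boundary = g(X_bd)                                             # y_k = g(x_{N+k})
```

The interior values carry noise while the boundary values are exact, which mirrors the two kinds of observations in the text.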

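There are two typical PDE problems (see Chapter 1 of [17]). When $P = \Delta = \sum_{j=1}^{d} \frac{\partial^2}{\partial x_j^2}$ is the Laplace operator and $B = I$ is the identity operator, we have the Dirichlet problem:

$$\begin{cases} \Delta u = f^*, & \text{in } D \subset \mathbb{R}^d, \\ u = g, & \text{on } \partial D. \end{cases} \tag{2}$$

When $P = I$ is the identity operator and $B = \frac{\partial}{\partial \vec{n}}$ is the directional derivative along the outward normal vector $\vec{n}$, i.e. $Bu = \frac{\partial u}{\partial \vec{n}} = \nabla u \cdot \vec{n}$, problem (1) becomes the Neumann boundary problem:

$$\begin{cases} u = f^*, & \text{in } D \subset \mathbb{R}^d, \\ \frac{\partial u}{\partial \vec{n}} = g, & \text{on } \partial D. \end{cases} \tag{3}$$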

The observations $\{(x_j, y_j)\}_{j=1}^{N}$ can be regarded as the supervised labeled observations in the setting of semi-supervised learning, and $y_k$ ($k = 1, 2, \dots, M$) may be regarded as the unlabeled ones. This similarity encourages us to construct kernel-regularized learning algorithms that give numerical solutions for problem (3), following the kernel semi-supervised learning frameworks (see e.g. [18] [19] [20] [21]). Along this line, we constructed in [15] a kernel-regularized learning algorithm for solving problem (2) and showed its convergence rate. In the present paper, we construct a kernel-regularized regression algorithm to solve problem (3) in the case where $D$ is the unit ball and $\partial D$ is the unit sphere. To this end, we restate problem (3) in the setting of Sobolev spaces.

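Let $\Omega \subset \mathbb{R}^d$ be a given bounded closed domain with a smooth surface (i.e. the outward normal derivative is continuous) and let $\rho_\Omega$ be a Borel measure on $\Omega$. Let $C^{(s)}(\Omega)$ denote the set of functions $f$ such that $\partial_x^\alpha f \in C(\Omega)$ for all $|\alpha| \le s$, where for $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{Z}_+^d$ we define $|\alpha| = \sum_{i=1}^{d} \alpha_i$ and

$$\partial^\alpha f(x) := \partial_x^\alpha f(x) = \partial_1^{\alpha_1} \cdots \partial_d^{\alpha_d} f(x) = \frac{\partial^{|\alpha|} f(x)}{\partial (x^1)^{\alpha_1} \cdots \partial (x^d)^{\alpha_d}}.$$

For a $K \in C^{(s)}(\Omega \times \Omega)$, we define:

$$\partial_x^\alpha \partial_y^\beta K(x, y) = \frac{\partial^{|\alpha| + |\beta|} K(x, y)}{\partial (x^1)^{\alpha_1} \cdots \partial (x^d)^{\alpha_d}\, \partial (y^1)^{\beta_1} \cdots \partial (y^d)^{\beta_d}}, \quad x = (x^1, \dots, x^d),\ y = (y^1, \dots, y^d).$$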

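Denote by $W^1(\rho_\Omega)$ the set of all functions whose partial derivatives of order at most one are all in $L^2(\rho_\Omega)$, i.e.

$$W^1(\rho_\Omega) = \Big\{ f : \|f\|_{W^1(\rho_\Omega)} = \Big( \sum_{|\alpha| \le 1} \|D^\alpha f\|_{2, \rho_\Omega}^2 \Big)^{1/2} < +\infty \Big\},$$

where $L^2(\rho_\Omega) = \big\{ f : \|f\|_{2, \rho_\Omega} = \big( \int_\Omega |f(x)|^2 \, d\rho_\Omega \big)^{1/2} < +\infty \big\}$. Denote by $\frac{\partial}{\partial \vec{n}}$ the outward normal derivative operator, i.e. $\frac{\partial f(x)}{\partial \vec{n}} := \frac{\partial_x f(x)}{\partial \vec{n}} = \nabla f(x) \cdot \vec{n}$, where $\nabla f(x) = \big( \frac{\partial f}{\partial x^1}, \dots, \frac{\partial f}{\partial x^d} \big)$ and $\vec{n}$ is the outward normal vector at $x = (x^1, \dots, x^d) \in \partial\Omega$.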

To borrow the setting of learning theory (see [22]), we rewrite problem (3). Let $f^*$ be an unknown function and $g$ be a given function. The collocation points $z = \{x_1, \dots, x_m\} \subset \operatorname{int} \Omega$ and $\nu = \{x_{m+1}, \dots, x_{m+u}\} \subset \partial\Omega$ for the Neumann boundary problem:

$$\begin{cases} f(x) = f^*(x), & x \in \Omega, \\ \frac{\partial_x f(x)}{\partial \vec{n}} = g(x), & x \in \partial\Omega \end{cases} \tag{4}$$

are scattered points with values $y_{x_i} = f^*(x_i) + \eta_{x_i}$, $i = 1, 2, \dots, m$, and $y_{x_i} = g(x_i)$, $i = m+1, m+2, \dots, m+u$. Here, for a given $x_i = (x_i^1, \dots, x_i^d) \in \Omega$, $\eta_{x_i}$ is a random variable subject to a conditional distribution $\rho(y | x_i)$ satisfying $|\rho(y | x_i)| \le B$, where $B$ is a given constant, $E_{\rho(\cdot | x)}(\eta_x) = \int_{[-B, B]} \eta_x(y) \, d\rho(y | x) = 0$, $\sigma = \big( \sum_{i=1}^{m} \sigma_{x_i}^2 \big)^{1/2} < +\infty$ and $\sigma_x^2 = E_{\rho(\cdot | x)}(\eta_x^2)$. The sample counterpart of (4) is:

$$\begin{cases} f(x_i) = y_{x_i}, & i = 1, 2, \dots, m, \\ \frac{\partial_x f(x)}{\partial \vec{n}} \Big|_{x = x_i} = g(x_i), & i = m+1, \dots, m+u. \end{cases} \tag{5}$$

We shall investigate the convergence analysis of problem (5) when $\Omega$ is the unit ball $B^d = \{x \in \mathbb{R}^d : \|x\| \le 1\}$ and $\partial\Omega$ is the unit sphere $S^{d-1} = \{x \in \mathbb{R}^d : \|x\| = 1\}$. The paper is organized as follows. In Section 2.1, we provide some notions on kernel-regularized regression learning, including the concept of reproducing kernel Hilbert spaces, the kernel-regularized regression learning model and the kernel-regularized semi-supervised regression learning model. In Section 2.2, we provide some notions and results on spherical analysis. In particular, we define an RKHS with spherical harmonics which has the reproducing property for the outward normal derivative operator, and we define the Neumann boundary value problem with this RKHS as the hypothesis space. Based on these notions, we define in Section 2.3 a kernel-regularized learning algorithm for solving the Neumann boundary value problem, show the representation theorem and give the error decomposition, with which we give the learning rates (i.e. the main Theorem 2.1) in Section 2.4. In Section 3, we give some lemmas, which are used to prove Theorem 2.1 in Section 4. Section 5 contains the appendices, which collect some facts about convex functions, an RKHS $H_K^{\vec{n}}(\rho_\Omega)$ defined in a Sobolev space $H^{\vec{n}}(\rho_\Omega)$ associated with a general domain $\Omega$ and its boundary $\partial\Omega$, and a probability inequality on a general RKHS.

Throughout the paper, we write $A = O(B)$ if there is a constant $C$ independent of $A$ and $B$ such that $A \le CB$. We write $A \sim B$ if both $A = O(B)$ and $B = O(A)$.

2. Kernel-Regularized Regression and Error Analysis

We first provide some notions and results about the kernel-regularized regression learning problem.

2.1. Notions and Results on Kernel-Regularized Regression

We follow all the definitions and notation of Section 1. Let $K_x(y) = K(x, y) : \Omega \times \Omega \to \mathbb{R}$ be a Mercer kernel, i.e. it is continuous, symmetric and positive semi-definite (for any integer $l \ge 1$ and any set $\{x_1, x_2, \dots, x_l\} \subset \Omega$, the matrix $(K(x_i, x_j))_{i,j=1}^{l}$ is positive semi-definite), satisfying $\big( \int_{\Omega \times \Omega} |K(x, y)|^2 \, d\rho_\Omega(x) \, d\rho_\Omega(y) \big)^{1/2} < +\infty$. The reproducing kernel Hilbert space $(H_K, \|\cdot\|_K)$ associated with $K(x, y)$ is a Hilbert space consisting of real functions defined on $\Omega$ such that:

$$f(x) = \langle f, K_x \rangle_K, \quad \forall x \in \Omega,\ \forall f \in H_K.$$

Define an integral operator $L_K$ by:

$$L_K(f, x) = \int_\Omega f(y) K_x(y) \, d\rho_\Omega(y), \quad x \in \Omega.$$

Then $L_K : L^2(\rho_\Omega) \to L^2(\rho_\Omega)$. Denote by $\lambda_k$ the $k$-th eigenvalue of $L_K$, with associated eigenfunction $\varphi_k$. Then, by the Mercer theorem:

$$K(x, y) = \sum_{l=0}^{+\infty} \lambda_l \varphi_l(x) \varphi_l(y), \quad x, y \in \Omega,$$

where we assume that the convergence on the right side is absolute (for every $x, y \in \Omega$) and uniform on $\Omega \times \Omega$. If $\{\varphi_k\}_{k=0}^{+\infty}$ forms an orthonormal system in $L^2(\rho_\Omega)$, then:

$$H_K = L_K^{1/2}(L^2(\rho_\Omega)) = \Big\{ f(x) = \sum_{l=0}^{+\infty} a_l(f) \varphi_l(x) : \|f\|_K = \Big( \sum_{l=0}^{+\infty} \frac{|a_l(f)|^2}{\lambda_l} \Big)^{1/2} < +\infty \Big\},$$

where $L_K = L_K^{1/2} \circ L_K^{1/2}$.

Let $z = \{(x_k, y_k)\}_{k=1}^{m}$ be a set of observations of an unknown function $f^*$. To obtain a good approximation of $f^*$, one usually uses the kernel-regularized regression learning model:

$$f_z = \arg\min_{f \in H_K} \big( E_z(f) + \lambda \|f\|_K^2 \big), \tag{6}$$

where $E_z(f) = \frac{1}{m} \sum_{k=1}^{m} (y_k - f(x_k))^2$ is the empirical error. In learning theory, the convergence analysis reduces to bounding the convergence rate of the error ([23] or [24]):

$$\|f_z - f^*\|_{L^2(\rho_\Omega)}. \tag{7}$$
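For readers who want to see model (6) in computational form, here is a minimal sketch. It assumes a Gaussian kernel (the paper does not fix a kernel at this point) and uses the standard fact that the minimizer has the form $f_z = \sum_k c_k K_{x_k}$ with $(G + \lambda m I)c = y$, where $G$ is the Gram matrix; the names `gaussian_kernel`, `fit_krr` and `predict` are illustrative.

```python
import numpy as np

def gaussian_kernel(X, Y, s=0.5):
    """Gram matrix K(x_i, y_j) = exp(-|x_i - y_j|^2 / (2 s^2)) (assumed kernel)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * s**2))

def fit_krr(X, y, lam, s=0.5):
    """Coefficients c of f_z = sum_k c_k K_{x_k} for model (6)."""
    m = len(X)
    G = gaussian_kernel(X, X, s)
    # Setting the gradient of (1/m)|Gc - y|^2 + lam * c^T G c to zero
    # is satisfied by the solution of (G + lam * m * I) c = y.
    return np.linalg.solve(G + lam * m * np.eye(m), y)

def predict(X_train, c, X_new, s=0.5):
    return gaussian_kernel(X_new, X_train, s) @ c

# Hypothetical noisy samples of an unknown function.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 2))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(40)
lam = 1e-2
c = fit_krr(X, y, lam)
y_hat = predict(X, c, X)
```

Larger values of `lam` give smoother fits with larger training residuals, which is the trade-off that the error analysis quantifies.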

When the observation set has the form $z = \{(x_k, y_k)\}_{k=1}^{m} \cup \{x_{m+l}\}_{l=1}^{u}$, i.e. the nodes $\{x_{m+l}\}_{l=1}^{u}$ have no labeled observation values $\{y_{m+l}\}_{l=1}^{u}$, we call this the semi-supervised learning setting. In practical applications, most observation sets are of this semi-supervised type since labeled data are precious. Many researchers have paid attention to this field (see e.g. [18] [19] [21] [25] [26] [27] [28]). The main idea for dealing with this problem is to add a term that makes use of the unlabeled data, i.e. to modify (6) into the form:

$$f_{z, \lambda, \gamma} = \arg\min_{f \in H_K} \big( E_z(f) + \gamma \Omega_{u,m}(f) + \lambda \|f\|_K^2 \big), \quad \gamma > 0. \tag{8}$$

For example, in [19], one chooses $\gamma \Omega_{u,m}(f) = \frac{\gamma}{(u+m)^2} \sum_{i,j=1}^{m+u} (f(x_i) - f(x_j))^2 W_{i,j}$, where $W_{i,j}$ are edge weights in the data adjacency graph. Also, in [21], one chooses:

$$\gamma \Omega_{u,m}(f) = \frac{\gamma}{(m+u)(m+u-1)} \sum_{i,j=1,\, i \ne j}^{m+u} w_{ij}(\sigma) \, (f(x_i) - f(x_j))^2, \quad w_{ij}(\sigma) = \exp\Big\{ -\frac{d_K(x_i, x_j)}{\sigma} \Big\},$$

where $\sigma > 0$ and

$$d_K(x, y) = \|K_x - K_y\|_K = \sqrt{K(x, x) + K(y, y) - 2K(x, y)}.$$

These choices encourage us to choose a suitable $\gamma \Omega_{u,m}(f)$ to obtain a good approximate solution of problem (5).
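The second penalty above is easy to compute from a Gram matrix. The sketch below is illustrative only: it assumes a Gaussian kernel and reads the kernel distance as $d_K(x, y) = \sqrt{K(x,x) + K(y,y) - 2K(x,y)}$ and the weight as $\exp\{-d_K/\sigma\}$; the function names are hypothetical.

```python
import numpy as np

def gaussian_gram(X, s=0.5):
    """Gram matrix of an assumed Gaussian kernel on the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * s**2))

def kernel_distance(G):
    """Pairwise d_K(x_i, x_j) computed from the Gram matrix G."""
    diag = np.diag(G)
    sq = diag[:, None] + diag[None, :] - 2.0 * G
    return np.sqrt(np.maximum(sq, 0.0))     # clip tiny negative round-off

def unlabeled_penalty(f_vals, G, gamma, sigma):
    """gamma / (n (n - 1)) * sum_{i != j} w_ij (f(x_i) - f(x_j))^2 with n = m + u."""
    n = len(f_vals)
    w = np.exp(-kernel_distance(G) / sigma)
    np.fill_diagonal(w, 0.0)                # the sum runs over i != j only
    diffs2 = (f_vals[:, None] - f_vals[None, :]) ** 2
    return gamma / (n * (n - 1)) * np.sum(w * diffs2)

rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(12, 2))  # labeled and unlabeled nodes together
G = gaussian_gram(pts)
D = kernel_distance(G)
penalty = unlabeled_penalty(np.sin(pts[:, 0]), G, gamma=0.1, sigma=1.0)
```

The penalty vanishes for constant functions and grows when nearby nodes (in the kernel distance) receive very different values.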

2.2. Some Results on Spherical Analysis

In this subsection, we define an RKHS by means of spherical harmonics, with which we define a kernel-regularized regression learning algorithm for solving problem (5) when $\Omega$ is the unit ball $B^d$, and show the learning rates.

Let $B^d = \{x \in \mathbb{R}^d : \|x\| \le 1\}$ denote the unit ball in the $d$-dimensional Euclidean space $\mathbb{R}^d$ with the usual inner product $\langle x, y \rangle$, where $\|x\| = \sqrt{\langle x, x \rangle}$ is the usual Euclidean norm. For the weight $W_\mu(x) = (1 - \|x\|^2)^\mu$, $\mu > -1$, we denote by $L_{p,\mu}(B^d) \equiv L_p(B^d, W_\mu)$, $1 \le p < +\infty$, the space of measurable functions defined on $B^d$ with:

$$\|f\|_{L_{p,\mu}(B^d)} = \Big( \int_{B^d} |f(x)|^p \, W_\mu(x) \, dx \Big)^{1/p} < +\infty, \quad 1 \le p < +\infty,$$

and for $p = +\infty$, we let $L_{\infty,\mu}$ denote the space $C(B^d)$ of continuous functions on $B^d$ with the uniform norm.

We denote by $\Pi_n^d$ the space of all polynomials in $d$ variables of degree at most $n$, and by $\mathcal{V}_n^d(W_\mu)$ the space of all polynomials of degree $n$ which are orthogonal to polynomials of lower degree in $L_{2,\mu}(B^d)$. The spaces $\mathcal{V}_n^d(W_\mu)$ are mutually orthogonal in $L_{2,\mu}(B^d)$ and (see [29]):

$$L_{2,\mu}(B^d) = \bigoplus_{n=0}^{\infty} \mathcal{V}_n^d(W_\mu), \quad \Pi_n^d = \bigoplus_{k=0}^{n} \mathcal{V}_k^d(W_\mu).$$

Let $d\sigma$ denote the Lebesgue (surface) measure on $S^{d-1} = \{x : \|x\| = 1\}$ and denote the area of $S^{d-1}$ by $\sigma_d$, $\sigma_d = \int_{S^{d-1}} d\sigma = 2\pi^{d/2} / \Gamma(d/2)$. Let $\mathcal{H}_n^d$ denote the space of homogeneous harmonic polynomials of degree $n$, i.e. homogeneous polynomials $p$ of degree $n$ satisfying the equation $\Delta p = \sum_{k=1}^{d} \frac{\partial^2 p}{\partial (x^k)^2} = 0$. Also, we denote by $\mathcal{P}_n^d$ the set of homogeneous polynomials of degree $n$. It is well known that:

$$a_n^d := \dim \mathcal{H}_n^d = \binom{n+d-1}{n} - \binom{n+d-3}{n-2}.$$
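As a quick sanity check of the dimension count, the sketch below evaluates it in the standard form $\binom{n+d-1}{n} - \binom{n+d-3}{n-2}$, with the second binomial read as zero for $n < 2$ (a convention assumed here). The function name `harmonic_dim` is illustrative.

```python
from math import comb

def harmonic_dim(n, d):
    """Dimension of the homogeneous harmonic polynomials of degree n in d >= 2 variables."""
    if n < 0 or d < 2:
        raise ValueError("need n >= 0 and d >= 2")
    first = comb(n + d - 1, n)
    second = comb(n + d - 3, n - 2) if n >= 2 else 0   # empty binomial for n < 2
    return first - second

dims_d3 = [harmonic_dim(n, 3) for n in range(6)]   # spherical harmonics on S^2
dims_d2 = [harmonic_dim(n, 2) for n in range(4)]   # trigonometric case on S^1
```

For $d = 3$ this reproduces the familiar count $2n + 1$, and for $d = 2$ it gives $1$ for $n = 0$ and $2$ for every $n \ge 1$.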

Let $W^1(B^d)_\mu$ denote the set of functions whose partial derivatives of order at most one are all in $L_{2,\mu}(B^d)$, i.e.

$$W^1(B^d)_\mu = \Big\{ f : \|f\|_{W^1(B^d)_\mu} = \Big( \sum_{|\alpha| \le 1} \|\partial^\alpha f\|_{L_{2,\mu}(B^d)}^2 \Big)^{1/2} < +\infty \Big\}.$$

In this case, $S^{d-1} = \{x : \|x\|^2 = \sum_{i=1}^{d} (x^i)^2 = 1\}$ and

$$\frac{\partial_x f}{\partial \vec{n}}(x) = \sum_{i=1}^{d} x^i \frac{\partial f}{\partial x^i}(x), \quad x = (x^1, \dots, x^d) \in S^{d-1}. \tag{9}$$
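Formula (9) says that on the unit sphere the outward normal derivative is the radial derivative $x \cdot \nabla f(x)$. The following sketch checks this numerically for a hypothetical test function, $f(x) = (x^1)^2 - (x^2)^2$, chosen here only because it is a harmonic polynomial of degree 2, so that Euler's identity gives $x \cdot \nabla f = 2f$.

```python
import numpy as np

def normal_derivative(grad_f, x):
    """Outward normal derivative on the unit sphere: sum_i x^i * df/dx^i, as in (9)."""
    return np.sum(x * grad_f(x), axis=-1)

# Hypothetical test function (an assumption for illustration).
def f(x):
    return x[..., 0] ** 2 - x[..., 1] ** 2

def grad_f(x):
    g = np.zeros_like(x)
    g[..., 0] = 2.0 * x[..., 0]
    g[..., 1] = -2.0 * x[..., 1]
    return g

rng = np.random.default_rng(1)
pts = rng.standard_normal((5, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)             # project onto S^2

dn = normal_derivative(grad_f, pts)
h = 1e-5
radial_fd = (f((1 + h) * pts) - f((1 - h) * pts)) / (2 * h)   # d/dt f(t x) at t = 1
```

Both `dn` and the finite-difference quotient `radial_fd` agree with `2 * f(pts)` up to round-off.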

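Define a subclass of $W^1(B^d)_\mu$ as:

$$H_\mu^{\vec{n}} = \Big\{ f \in W^1(B^d)_\mu : \|f\|_{H_\mu^{\vec{n}}} = \Big( \int_{B^d} |f(x)|^2 W_\mu(x) \, dx + \int_{S^{d-1}} \Big| \frac{\partial_\xi f}{\partial \vec{n}}(\xi) \Big|^2 d\sigma(\xi) \Big)^{1/2} < +\infty \Big\}.$$

An inner product on $H_\mu^{\vec{n}}$ is:

$$\langle f, g \rangle_{H_\mu^{\vec{n}}} = \int_{B^d} f(x) g(x) W_\mu(x) \, dx + \int_{S^{d-1}} \frac{\partial f}{\partial \vec{n}}(\xi) \, \frac{\partial g}{\partial \vec{n}}(\xi) \, d\sigma(\xi).$$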

Denote by $\mathcal{V}_n^d(W_\mu, S)$ the space of orthogonal polynomials with respect to $\langle \cdot, \cdot \rangle_{H_\mu^{\vec{n}}}$. By Theorem 3 of [30], $\mathcal{V}_n^d(W_\mu, S)$ contains a mutually orthonormal basis $\{Q_{n,k} \equiv Q_{n,k}^d : k = 1, 2, \dots, a_n^d\}$ with respect to $\langle \cdot, \cdot \rangle_{H_\mu^{\vec{n}}}$. Then there holds the expansion:

$$f(x) \sim \sum_{k=0}^{\infty} \sum_{l=1}^{a_k^d} a_{k,l}(f) \, Q_{k,l}(x), \quad x \in B^d,\ f \in H_\mu^{\vec{n}},$$

where $a_{k,l}(f) = \langle f, Q_{k,l} \rangle_{H_\mu^{\vec{n}}}$ and, by the Bessel inequality, we have:

$$\Big( \sum_{k=0}^{\infty} \sum_{l=1}^{a_k^d} |a_{k,l}(f)|^2 \Big)^{1/2} \le \|f\|_{H_\mu^{\vec{n}}} < +\infty.$$

Let $K^\mu(x, y) : B^d \times B^d \to \mathbb{R}$ be a Mercer kernel of the form:

$$K_x^\mu(y) = K^\mu(x, y) := \sum_{k=0}^{\infty} \lambda_k P_k(x, y), \quad x \in B^d,\ y \in B^d, \tag{10}$$

where $P_k(x, y) = \sum_{l=1}^{a_k^d} Q_{k,l}(x) Q_{k,l}(y)$, $c_k := \sup_{x \in B^d} P_k(x, x)$ ($k = 0, 1, 2, \dots$), and the numbers $\lambda_k > 0$ satisfy $\sum_{k=0}^{\infty} \lambda_k c_k < +\infty$.

Define

$$L_{K^\mu}(f)(x) = L_{K^\mu}(f, x) := \langle f, K_x^\mu(\cdot) \rangle_{H_\mu^{\vec{n}}} = \sum_{k=0}^{\infty} \lambda_k \sum_{l=1}^{a_k^d} a_{k,l}(f) \, Q_{k,l}(x), \quad x \in B^d,$$

and $H_{K^\mu}^{\vec{n}} = L_{K^\mu}^{1/2}(H_\mu^{\vec{n}})$. Then,

$$H_{K^\mu}^{\vec{n}} = \Big\{ f(x) = \sum_{k=0}^{\infty} \sum_{l=1}^{a_k^d} a_{k,l}(f) \, Q_{k,l}(x) : \|f\|_{K^\mu} = \Big( \sum_{k=0}^{\infty} \sum_{l=1}^{a_k^d} \frac{|a_{k,l}(f)|^2}{\lambda_k} \Big)^{1/2} < +\infty \Big\}.$$

For $f(x) = \sum_{k=0}^{\infty} \sum_{l=1}^{a_k^d} a_{k,l}(f) Q_{k,l}(x) \in H_{K^\mu}^{\vec{n}}$ and $g(x) = \sum_{k=0}^{\infty} \sum_{l=1}^{a_k^d} a_{k,l}(g) Q_{k,l}(x) \in H_{K^\mu}^{\vec{n}}$, we define:

$$\langle f, g \rangle_{K^\mu} = \sum_{k=0}^{\infty} \sum_{l=1}^{a_k^d} \frac{a_{k,l}(f) \, a_{k,l}(g)}{\lambda_k}.$$

We shall show in Proposition 2.1 that $H_{K^\mu}^{\vec{n}}$ is an RKHS associated with the kernel (10).

To describe the kernel $K^\mu$ quantitatively, we make two assumptions.

Assumption A. Assume $K^\mu \in C^{(1)}(B^d \times B^d)$.

Assumption B. Assume $\sup_{x \in S^{d-1}} \Big| \frac{\partial_x P_k(x, x)}{\partial \vec{n}} \Big| = c'_k$ ($k = 0, 1, 2, \dots$) and

$$\sum_{k=0}^{\infty} \lambda_k c'_k < +\infty. \tag{11}$$

Since $\{Q_{n,k} \equiv Q_{n,k}^d : k = 1, 2, \dots, a_n^d\}$ are algebraic polynomials, $c_k$ and $c'_k$ are finite. Real numbers $\lambda_k$ satisfying (11) also exist; for example, we can take $\lambda_k = e^{-(c_k + c'_k)}$ ($k = 0, 1, 2, \dots$).

If (11) holds, then by Theorem 2.4 in [31], Theorem 4.2 in [32] or Proposition 6.2 in [33], we have:

$$\partial_x^\alpha \partial_y^\beta K_x^\mu(y) = \sum_{k=0}^{\infty} \lambda_k \, \partial_x^\alpha \partial_y^\beta P_k(x, y), \quad x \in S^{d-1},\ y \in S^{d-1}, \tag{12}$$

and the convergence on the right side of (12) is absolute and uniform on $S^{d-1} \times S^{d-1}$.

Proposition 2.1. Assume that Assumptions A and B hold. Then:

1) The reproducing property holds:

$$f(x) = \langle f, K_x^\mu(\cdot) \rangle_{K^\mu}, \quad f \in H_{K^\mu}^{\vec{n}},\ x \in B^d. \tag{13}$$

2) The reproducing property also holds for the outward normal derivative operator, i.e.

$$\frac{\partial_x f(x)}{\partial \vec{n}} = \Big\langle f, \frac{\partial_x K_x^\mu(\cdot)}{\partial \vec{n}} \Big\rangle_{K^\mu}, \quad f \in H_{K^\mu}^{\vec{n}},\ x \in S^{d-1}. \tag{14}$$

3) Define

$$k = \sup_{x \in B^d} \|K_x^\mu(\cdot)\|_{K^\mu} + \sup_{x \in S^{d-1}} \Big\| \frac{\partial_x}{\partial \vec{n}} K_x^\mu(\cdot) \Big\|_{K^\mu}.$$

Then, for all $x \in B^d$ and $y \in S^{d-1}$, we have:

$$|f(x)| \le k \|f\|_{K^\mu}, \quad \Big| \frac{\partial_x f(y)}{\partial \vec{n}} \Big| \le k \|f\|_{K^\mu}. \tag{15}$$

Proof. See the proofs in Section 4.

Let $\{x_i\}_{i=1}^{m}$ be observations drawn i.i.d. according to $\rho_{B^d}$ and $y_i = f^*(x_i) + \eta_{x_i}$, $i = 1, 2, \dots, m$, where for a given $x_i \in B^d$, $\eta_{x_i}$ is a random variable subject to a conditional distribution $\rho_{B^d}(y | x_i)$ satisfying $|\rho(y | x_i)| \le B$ ($B$ is a given constant), $E_{\rho(\cdot | x)}(\eta_x) = \int_{[-B, B]} \eta_x(y) \, d\rho_{B^d}(y | x) = 0$, $\sigma = \big( \int_{B^d} \sigma_x^2 \, d\rho_{B^d} \big)^{1/2} < +\infty$ and $\sigma_x^2 = E_{\rho_{B^d}(\cdot | x)}(\eta_x^2)$. Then $z = \{(x_i, y_i)\}_{i=1}^{m}$ can be regarded as observations drawn i.i.d. according to $\rho(x, y) = \rho_{B^d}(x) \rho_{B^d}(y | x)$. Let $\{x_i\}_{i=m+1}^{m+l}$ be samples drawn i.i.d. according to $\rho_{S^{d-1}}$. The sample counterpart of problem (5) is then:

$$\begin{cases} f(x_i) = y_{x_i}, & i = 1, 2, \dots, m, \\ \frac{\partial f(x_i)}{\partial \vec{n}} = \sum_{k=1}^{d} x_i^k \frac{\partial f(x_i)}{\partial x^k} = g(x_i), & i = m+1, \dots, m+l. \end{cases} \tag{16}$$

We shall investigate the numerical solutions of problem (16) with kernel-regularized approaches. A kernel learning algorithm with $H_{K^\mu}^{\vec{n}}$ as the hypothesis space is defined in Section 2.3. A representation theorem for it is provided and an error decomposition for its error analysis is given, from which a learning rate for Algorithm (17) is shown.

2.3. Kernel-Regularized Regression

With the above notions in hand, we now give the following kernel-regularized learning algorithm for solving problem (16):

$$f_{z,\lambda} = \arg\min_{f \in H_{K^\mu}^{\vec{n}}} \Big( E_z(f) + \frac{1}{l} \sum_{i=m+1}^{m+l} \Big( \frac{\partial_x f(x_i)}{\partial \vec{n}} - g(x_i) \Big)^2 + \lambda \|f\|_{K^\mu}^2 \Big), \tag{17}$$

where $l \sim m$, i.e. there exist $c_1 > 0$, $c_2 > 0$ such that $c_1 \le \frac{l}{m} \le c_2$, and

$$E_z(f) = \frac{1}{m} \sum_{i=1}^{m} (f(x_i) - y_{x_i})^2.$$

Corresponding to (17), we define a model for the observations without the disturbances $\eta_{x_i}$ by:

$$f_{\bar{X},\lambda} = \arg\min_{f \in H_{K^\mu}^{\vec{n}}} \Big( E_{\bar{X}}(f) + \frac{1}{l} \sum_{i=m+1}^{m+l} \Big( \frac{\partial_x f(x_i)}{\partial \vec{n}} - g(x_i) \Big)^2 + \lambda \|f\|_{K^\mu}^2 \Big), \tag{18}$$

where

$$E_{\bar{X}}(f) = \frac{1}{m} \sum_{i=1}^{m} (f(x_i) - f^*(x_i))^2.$$

The integral model for (18) is defined as:

$$f_\lambda = \arg\min_{f \in H_{K^\mu}^{\vec{n}}} \Big( E(f) + \int_{S^{d-1}} \Big( \frac{\partial_x f(x)}{\partial \vec{n}} - g(x) \Big)^2 d\rho_{S^{d-1}} + \lambda \|f\|_{K^\mu}^2 \Big), \tag{19}$$

where

$$E(f) = \int_{B^d} (f(x) - f^*(x))^2 \, d\rho_{B^d}.$$

Proposition 2.2. (Representation theorem). Assume that Assumptions A and B hold. Then:

1) Algorithm (17) has a unique solution $f_{z,\lambda}$ and there are coefficients $\{a_i\}_{i=1}^{m+l}$ depending upon $\lambda$, $m$, $l$, $f_{z,\lambda}$, $z$, $f^*$ and $g$ such that:

$$f_{z,\lambda}(\cdot) = \sum_{i=1}^{m} a_i K_{x_i}^\mu(\cdot) + \sum_{k=1}^{l} a_{m+k} \frac{\partial_x K_{x_{m+k}}^\mu(\cdot)}{\partial \vec{n}}. \tag{20}$$

2) Algorithm (18) has a unique solution $f_{\bar{X},\lambda}$ and there are coefficients $\{b_i\}_{i=1}^{m+l}$ depending upon $\lambda$, $m$, $l$, $f_{\bar{X},\lambda}$, $\bar{X}$, $f^*$ and $g$ such that:

$$f_{\bar{X},\lambda}(\cdot) = \sum_{i=1}^{m} b_i K_{x_i}^\mu(\cdot) + \sum_{k=1}^{l} b_{m+k} \frac{\partial_x K_{x_{m+k}}^\mu(\cdot)}{\partial \vec{n}}. \tag{21}$$

3) Algorithm (19) has a unique solution $f_\lambda$ and there are a function $G_{\lambda, f^*}(x)$ depending upon $\lambda$ and $f^*$ and a function $p_{\lambda, g}(x)$ depending upon $\lambda$ and $g$ such that:

$$f_\lambda(\cdot) = \int_{B^d} G_{\lambda, f^*}(x) \, K_x^\mu(\cdot) \, d\rho_{B^d} + \int_{S^{d-1}} p_{\lambda, g}(x) \, \frac{\partial_x K_x^\mu(\cdot)}{\partial \vec{n}} \, d\rho_{S^{d-1}}. \tag{22}$$

Proof. See the proof in Section 4.

Equation (20) shows that Algorithm (17) can be replaced by a coefficient-regularized model, which is a new topic; research of this kind can be found in [34] [35].
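To make the representation (20) concrete, the sketch below solves Algorithm (17) in coefficient form. It is a sketch under loud assumptions: a Gaussian kernel stands in for $K^\mu$ (the paper's kernel is built from the polynomials $Q_{k,l}$ and is not reproduced here), the test function and all function names are hypothetical, and the linear system $(G + \lambda W^{-1}) a = t$ is obtained by inserting (20) into the first-order condition, where $G$ is the Gram matrix of the $m + l$ representers, $W = \operatorname{diag}(1/m, \dots, 1/m, 1/l, \dots, 1/l)$ and $t$ stacks the values $y_{x_i}$ and $g(x_{m+k})$.

```python
import numpy as np

def gauss(A, C, s):
    """Assumed stand-in kernel K(a_i, c_j) = exp(-|a_i - c_j|^2 / (2 s^2))."""
    d2 = ((A[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * s**2))

def dn_second(A, Bp, s):
    """Normal derivative in the second argument, n_b . grad_b K(a, b), with n_b = b on the sphere."""
    diff = A[:, None, :] - Bp[None, :, :]                    # a_i - b_k
    return gauss(A, Bp, s) * np.einsum('kd,ikd->ik', Bp, diff) / s**2

def dn_both(A, Bp, s):
    """Mixed normal derivative in both arguments, with n_a = a and n_b = b on the sphere."""
    diff = A[:, None, :] - Bp[None, :, :]
    p = np.einsum('id,ikd->ik', A, diff)                     # a_i . (a_i - b_k)
    q = np.einsum('kd,ikd->ik', Bp, diff)                    # b_k . (a_i - b_k)
    return gauss(A, Bp, s) * ((A @ Bp.T) / s**2 - p * q / s**4)

def fit(X, y, Bp, g, lam, s):
    """Coefficients a of (20) and the Gram matrix G of the m + l representers."""
    m, l = len(X), len(Bp)
    G_ib = dn_second(X, Bp, s)
    G = np.block([[gauss(X, X, s), G_ib], [G_ib.T, dn_both(Bp, Bp, s)]])
    w_inv = np.concatenate([np.full(m, float(m)), np.full(l, float(l))])
    a = np.linalg.solve(G + lam * np.diag(w_inv), np.concatenate([y, g]))
    return a, G

def evaluate(X_new, X, Bp, a, s):
    """Evaluate the function in (20) at the rows of X_new."""
    m = len(X)
    return gauss(X_new, X, s) @ a[:m] + dn_second(X_new, Bp, s) @ a[m:]

# Hypothetical data: f*(x) = (x^1)^2 + x^2, so x . grad f* = 2 (x^1)^2 + x^2 on the sphere.
rng = np.random.default_rng(0)
m, l, d, s, lam = 40, 20, 3, 0.8, 1e-2
V = rng.standard_normal((m, d))
X = V / np.linalg.norm(V, axis=1, keepdims=True) * rng.random((m, 1)) ** (1.0 / d)
V = rng.standard_normal((l, d))
Bp = V / np.linalg.norm(V, axis=1, keepdims=True)
y = X[:, 0] ** 2 + X[:, 1] + 0.01 * rng.standard_normal(m)
g = 2.0 * Bp[:, 0] ** 2 + Bp[:, 1]
a, G = fit(X, y, Bp, g, lam, s)
```

The first $m$ entries of `G @ a` are the fitted values at the interior points and the last $l$ entries are the fitted normal derivatives at the boundary points, so the two residual blocks of (17) can be read off directly.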

We give the following error decomposition:

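$$\begin{aligned}
&\|f_{z,\lambda} - f^*\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{z,\lambda}}{\partial \vec{n}} - g \Big\|_{L^2(\rho_{S^{d-1}})} \\
&\quad \le \|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{L^2(\rho_{B^d})} + \|f_{\bar{X},\lambda} - f^*\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{z,\lambda}}{\partial \vec{n}} - \frac{\partial f_{\bar{X},\lambda}}{\partial \vec{n}} \Big\|_{L^2(\rho_{S^{d-1}})} + \Big\| \frac{\partial f_{\bar{X},\lambda}}{\partial \vec{n}} - g \Big\|_{L^2(\rho_{S^{d-1}})} \\
&\quad \le \|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{L^2(\rho_{B^d})} + \|f_{\bar{X},\lambda} - f_\lambda\|_{L^2(\rho_{B^d})} + \|f_\lambda - f^*\|_{L^2(\rho_{B^d})} \\
&\qquad + \Big\| \frac{\partial f_{z,\lambda}}{\partial \vec{n}} - \frac{\partial f_{\bar{X},\lambda}}{\partial \vec{n}} \Big\|_{L^2(\rho_{S^{d-1}})} + \Big\| \frac{\partial f_{\bar{X},\lambda}}{\partial \vec{n}} - \frac{\partial f_\lambda}{\partial \vec{n}} \Big\|_{L^2(\rho_{S^{d-1}})} + \Big\| \frac{\partial f_\lambda}{\partial \vec{n}} - g \Big\|_{L^2(\rho_{S^{d-1}})} \\
&\quad \le \Big( \|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{z,\lambda}}{\partial \vec{n}} - \frac{\partial f_{\bar{X},\lambda}}{\partial \vec{n}} \Big\|_{L^2(\rho_{S^{d-1}})} \Big) \\
&\qquad + \Big( \|f_{\bar{X},\lambda} - f_\lambda\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{\bar{X},\lambda}}{\partial \vec{n}} - \frac{\partial f_\lambda}{\partial \vec{n}} \Big\|_{L^2(\rho_{S^{d-1}})} \Big) + 2K(f^*, g, \lambda) \\
&\quad \le k \big( \|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu} + \|f_{\bar{X},\lambda} - f_\lambda\|_{K^\mu} \big) + 2K(f^*, g, \lambda),
\end{aligned} \tag{23}$$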

where in the last step we have used (15), and

$$K(f^*, g, \lambda) = \inf_{f \in H_{K^\mu}^{\vec{n}}} \Big( \|f - f^*\|_{L^2(\rho_{B^d})}^2 + \int_{S^{d-1}} \Big( \frac{\partial f(x)}{\partial \vec{n}} - g(x) \Big)^2 d\rho_{S^{d-1}} + \lambda \|f\|_{K^\mu}^2 \Big),$$

which controls the approximation error. Then, to bound the error:

$$E(f_{z,\lambda}, f^*, g) = \|f_{z,\lambda} - f^*\|_{L^2(\rho_{B^d})} + \Big\| \frac{\partial f_{z,\lambda}}{\partial \vec{n}} - g \Big\|_{L^2(\rho_{S^{d-1}})},$$

we need only derive upper bounds for the sample errors $\|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu}$ and $\|f_{\bar{X},\lambda} - f_\lambda\|_{K^\mu}$ respectively.

2.4. Learning Rates

Theorem 2.1. Let Assumptions A and B hold, let $f_{z,\lambda}$ be the solution of (17), and let $f^* \in C(B^d)$ and $g \in L^2(\rho_{S^{d-1}})$. Then, for any $\delta \in (0, 1)$, with confidence $1 - \delta$, there holds:

‖ f z , λ − f * ‖ L 2 ( ρ B d ) + ‖ ∂ f z , λ ∂ n → − g ‖ L 2 ( ρ S d − 1 ) = O ( σ λ m δ + ( K ( f * , g , λ ) λ 3 2 m + ‖ f * ‖ C ( B d ) λ m ) log 4 δ + K ( f * , g , λ ) λ l δ + K ( f * , g , λ ) ) . (24)

If $g(x) = \frac{\partial f^*(x)}{\partial \vec{n}}$ for $x \in S^{d-1}$, then

$$E(f_{z,\lambda}, f^*, g) = \|f_{z,\lambda} - f^*\|_{H^{\vec{n}}(\rho_{B^d})}$$

and in this case,

$$K(f^*, g, \lambda) = K(f^*, \lambda) = \inf_{f \in H_{K^\mu}^{\vec{n}}} \Big( \|f - f^*\|_{H^{\vec{n}}(\rho_{B^d})}^2 + \lambda \|f\|_{K^\mu}^2 \Big), \quad \lambda > 0,$$

where the norm $\|\cdot\|_{H^{\vec{n}}(\rho_{B^d})}$ is defined in Section 2.2. The decay of $K(f^*, \lambda)$ has recently been discussed in [36].

We then have the following Corollary 2.1.

Corollary 2.1. Let Assumptions A and B hold, let $f_{z,\lambda}$ be the solution of (17), let $f^* \in C(B^d)$, and let $g(x) = \frac{\partial f^*(x)}{\partial \vec{n}}$ for $x \in S^{d-1}$. Then, for any $\delta \in (0, 1)$, with confidence $1 - \delta$, there holds:

‖ f z , λ − f * ‖ H n → ( ρ B d ) = O ( σ λ m δ + ( K ( f * , λ ) λ 3 2 m + ‖ f * ‖ C ( B d ) λ m ) log 4 δ + K ( f * , g , λ ) λ l δ + K ( f * , λ ) ) . (25)

By (25), we know if λ = λ ( m ) ↓ 0 + ( m → + ∞ ) is chosen in such a way that l i m λ → 0 + K ( f * , λ ) = 0 , λ 3 2 m → + ∞ when m → + ∞ , then, with confidence 1 − δ , holds (since m ~ l ):

$$\lim_{m \to +\infty} \|f_{z,\lambda} - f^*\|_{H^{\vec{n}}(\rho_{B^d})} = 0. \tag{26}$$

2.5. Further Discussions

We now give some explanations of the results and the assumptions.

1) There are three reasons for choosing $D = B^d$ and $\partial D = S^{d-1}$ to present the kernel-regularized regression model for solving the Neumann boundary problem (5).

i) When $D = B^d$ and $\partial D = S^{d-1}$, we can easily give an explicit representation of the outward normal derivative (see (9)). Therefore, the method used in this paper can be extended to domains whose outward normal derivatives can be computed easily.

ii) By the statements in Section 2.1, a tool for constructing a reproducing kernel Hilbert space is an orthonormal basis. We notice that the theory of orthogonal bases of functions has been developed not only on the unit ball $B^d$ (see [37]) and the unit sphere $S^{d-1}$ (see [29]), but also on some Sobolev spaces associated with both $B^d$ and $S^{d-1}$ (see e.g. [30]). These facts also encourage us to choose $D = B^d$ and $\partial D = S^{d-1}$ for problem (5).

iii) In learning theory, the learning rate estimates for kernel-regularized learning algorithms reduce to bounding approximation errors, which belongs to the scope of approximation theory. One can make use of the rich techniques of spherical approximation theory to bound the learning rates (see [24]).

2) In the present paper, we make two assumptions, A and B. They are reasonable.

Since p k ( x , y ) is a bivariate polynomial on B d × B d for a given k, ∂ x p k ( x , x ) ∂ n → is still a polynomial whose sup can be attained on S d − 1 . The Assumption B is reasonable. By the same way, we know d k = sup x , y ∈ B d ∂ α p k ( x , y ) can be attained. If we choose λ l = e − k β ( β > 0 ) , then, we can have K μ ∈ C ( 1 ) ( B d × B d ) . So, the Assumption A is reasonable as well.

3) The convergence established in the present paper is uniform convergence (see (26)), which supports the reliability of the proposed method.

3. Lemmas

To characterize the optimal solutions of Algorithms (17)-(19), we need the concept of the Gâteaux derivative.

Let $(H, \|\cdot\|_H)$ be a Hilbert space and $F(f) : H \to \mathbb{R} \cup \{\mp\infty\}$ be a real function. We say $F$ is Gâteaux differentiable at $f \in H$ if there is a $\xi \in H$ such that for any $g \in H$ there holds:

$$\lim_{t \to 0} \frac{F(f + t g) - F(f)}{t} = \langle g, \xi \rangle_H,$$

and we write $\nabla_f F(f) = \xi$ for the Gâteaux derivative of $F(f)$ at $f$.
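A finite-dimensional analogue may help fix ideas. The sketch below assumes $H = \mathbb{R}^n$ with the Euclidean inner product and a regularized least-squares functional of the same shape as the objectives in (17)-(19); the matrix `A`, the data and the function names are hypothetical. It compares the difference quotient with $\langle g, \xi \rangle$ and checks that the derivative vanishes at the minimizer.

```python
import numpy as np

def objective(f, A, y, lam):
    """F(f) = (1/m) |A f - y|^2 + lam |f|^2 on H = R^n (an assumed toy setting)."""
    return np.mean((A @ f - y) ** 2) + lam * (f @ f)

def gateaux_gradient(f, A, y, lam):
    """The element xi with lim_{t->0} (F(f + t g) - F(f)) / t = <g, xi> for all g."""
    m = len(y)
    return (2.0 / m) * A.T @ (A @ f - y) + 2.0 * lam * f

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))
y = rng.standard_normal(8)
f = rng.standard_normal(5)
g = rng.standard_normal(5)
lam, t = 0.1, 1e-6

quotient = (objective(f + t * g, A, y, lam) - objective(f, A, y, lam)) / t
inner = g @ gateaux_gradient(f, A, y, lam)

# At the minimizer the Gateaux derivative is zero.
f_min = np.linalg.solve(A.T @ A / len(y) + lam * np.eye(5), A.T @ y / len(y))
grad_at_min = gateaux_gradient(f_min, A, y, lam)
```

The two numbers `quotient` and `inner` agree up to a term of order `t`, and `grad_at_min` is zero up to round-off.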

To prove Theorem 2.1, we need some lemmas.

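Lemma 3.1. The following equations hold:

1) $f_{z,\lambda}$ satisfies the equation:

$$\frac{1}{m} \sum_{i=1}^{m} (f_{z,\lambda}(x_i) - y_{x_i}) K_{x_i}^\mu(\cdot) + \frac{1}{l} \sum_{k=1}^{l} \Big( \frac{\partial f_{z,\lambda}(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big) \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \vec{n}} + \lambda f_{z,\lambda}(\cdot) = 0. \tag{27}$$

2) $f_{\bar{X},\lambda}$ satisfies the equation:

$$\frac{1}{m} \sum_{i=1}^{m} (f_{\bar{X},\lambda}(x_i) - f^*(x_i)) K_{x_i}^\mu(\cdot) + \frac{1}{l} \sum_{k=1}^{l} \Big( \frac{\partial f_{\bar{X},\lambda}(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big) \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \vec{n}} + \lambda f_{\bar{X},\lambda}(\cdot) = 0. \tag{28}$$

3) $f_\lambda$ satisfies the equation:

$$\int_{B^d} (f_\lambda(x) - f^*(x)) K_x^\mu(\cdot) \, d\rho_{B^d} + \int_{S^{d-1}} \Big( \frac{\partial f_\lambda(x)}{\partial \vec{n}} - g(x) \Big) \frac{\partial K_x^\mu(\cdot)}{\partial \vec{n}} \, d\rho_{S^{d-1}} + \lambda f_\lambda(\cdot) = 0. \tag{29}$$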

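Proof. Proof of 1). Define $\Omega_z(f)$ as:

$$\Omega_z(f) = E_z(f) + \frac{1}{l} \sum_{k=1}^{l} \Big( \frac{\partial f(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big)^2 + \lambda \|f\|_{K^\mu}^2, \quad f \in H_{K^\mu}^{\vec{n}}.$$

Then, for any $h \in H_{K^\mu}^{\vec{n}}$,

$$\lim_{t \to 0^+} \frac{\Omega_z(f + th) - \Omega_z(f)}{t} = \lim_{t \to 0^+} \frac{E_z(f + th) - E_z(f)}{t} + 2\lambda \langle h, f \rangle_{K^\mu} + \frac{1}{l} \sum_{k=1}^{l} \lim_{t \to 0^+} \frac{\Big( \frac{\partial f(x_{m+k})}{\partial \vec{n}} + t \frac{\partial h(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big)^2 - \Big( \frac{\partial f(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big)^2}{t},$$

where

$$\begin{aligned}
\lim_{t \to 0^+} \frac{E_z(f + th) - E_z(f)}{t} &= \frac{1}{m} \sum_{i=1}^{m} \lim_{t \to 0^+} \frac{(f(x_i) + t h(x_i) - y_{x_i})^2 - (f(x_i) - y_{x_i})^2}{t} \\
&= \frac{2}{m} \sum_{i=1}^{m} (f(x_i) - y_{x_i}) h(x_i) = \frac{2}{m} \sum_{i=1}^{m} (f(x_i) - y_{x_i}) \langle h, K_{x_i}^\mu(\cdot) \rangle_{K^\mu} \\
&= \Big\langle h, \frac{2}{m} \sum_{i=1}^{m} (f(x_i) - y_{x_i}) K_{x_i}^\mu(\cdot) \Big\rangle_{K^\mu}
\end{aligned}$$

and, by the reproducing property (14),

$$\frac{1}{l} \sum_{k=1}^{l} \lim_{t \to 0^+} \frac{\Big( \frac{\partial f(x_{m+k})}{\partial \vec{n}} + t \frac{\partial h(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big)^2 - \Big( \frac{\partial f(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big)^2}{t} = \Big\langle h, \frac{2}{l} \sum_{k=1}^{l} \Big( \frac{\partial f(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big) \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \vec{n}} \Big\rangle_{K^\mu}. \tag{30}$$

It follows that

$$\lim_{t \to 0^+} \frac{\Omega_z(f + th) - \Omega_z(f)}{t} = \Big\langle h, \frac{2}{m} \sum_{i=1}^{m} (f(x_i) - y_{x_i}) K_{x_i}^\mu(\cdot) + \frac{2}{l} \sum_{k=1}^{l} \Big( \frac{\partial f(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big) \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \vec{n}} + 2\lambda f \Big\rangle_{K^\mu}.$$

By the definition of the Gâteaux derivative, we have:

$$\nabla_f \Omega_z(f) = \frac{2}{m} \sum_{i=1}^{m} (f(x_i) - y_{x_i}) K_{x_i}^\mu(\cdot) + \frac{2}{l} \sum_{k=1}^{l} \Big( \frac{\partial f(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big) \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \vec{n}} + 2\lambda f(\cdot).$$

By Fermat's rule (see 1) in Proposition A1) and the definition of $f_{z,\lambda}$, we have $\nabla_f \Omega_z(f) |_{f = f_{z,\lambda}} = 0$, i.e. (27) holds.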

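Lemma 3.2. There holds the inequality:

$$\|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu} \le \frac{2}{\lambda m} \Big\| \sum_{i=1}^{m} \eta_{x_i}(y) K_{x_i}^\mu(\cdot) \Big\|_{K^\mu} \tag{31}$$

and the inequality:

$$\begin{aligned}
\|f_{\bar{X},\lambda} - f_\lambda\|_{K^\mu} \le \frac{2}{\lambda} \Big( &\Big\| \int_{B^d} (f_\lambda(x) - f^*(x)) K_x^\mu(\cdot) \, d\rho_{B^d} - \frac{1}{m} \sum_{i=1}^{m} (f_\lambda(x_i) - f^*(x_i)) K_{x_i}^\mu(\cdot) \Big\|_{K^\mu} \\
&+ \Big\| \int_{S^{d-1}} \Big( \frac{\partial f_\lambda(x)}{\partial \vec{n}} - g(x) \Big) \frac{\partial K_x^\mu(\cdot)}{\partial \vec{n}} \, d\rho_{S^{d-1}} - \frac{1}{l} \sum_{k=1}^{l} \Big( \frac{\partial f_\lambda(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big) \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \vec{n}} \Big\|_{K^\mu} \Big).
\end{aligned} \tag{32}$$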

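Proof. The definition of $f_{z,\lambda}$ and the inequality (43) give:

$$\begin{aligned}
0 &\ge \Omega_z(f_{z,\lambda}) - \Omega_z(f_{\bar{X},\lambda}) \\
&\ge \frac{2}{m} \sum_{i=1}^{m} (f_{\bar{X},\lambda}(x_i) - y_{x_i}) (f_{z,\lambda}(x_i) - f_{\bar{X},\lambda}(x_i)) + \frac{2}{l} \sum_{k=1}^{l} \Big( \frac{\partial f_{\bar{X},\lambda}(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big) \Big( \frac{\partial f_{z,\lambda}(x_{m+k})}{\partial \vec{n}} - \frac{\partial f_{\bar{X},\lambda}(x_{m+k})}{\partial \vec{n}} \Big) + \lambda \big( \|f_{z,\lambda}\|_{K^\mu}^2 - \|f_{\bar{X},\lambda}\|_{K^\mu}^2 \big) \\
&\ge \Big\langle f_{z,\lambda} - f_{\bar{X},\lambda}, \frac{2}{m} \sum_{i=1}^{m} (f_{\bar{X},\lambda}(x_i) - y_{x_i}) K_{x_i}^\mu(\cdot) + \frac{2}{l} \sum_{k=1}^{l} \Big( \frac{\partial f_{\bar{X},\lambda}(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big) \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \vec{n}} \Big\rangle_{K^\mu} \\
&\quad + \langle f_{z,\lambda} - f_{\bar{X},\lambda}, 2\lambda f_{\bar{X},\lambda} \rangle_{K^\mu} + \lambda \|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu}^2,
\end{aligned}$$

where we have used (44). By (28) and $y_{x_i} = f^*(x_i) + \eta_{x_i}$, we have:

$$0 \ge \Omega_z(f_{z,\lambda}) - \Omega_z(f_{\bar{X},\lambda}) \ge \Big\langle f_{\bar{X},\lambda} - f_{z,\lambda}, \frac{2}{m} \sum_{i=1}^{m} \eta_{x_i}(y) K_{x_i}^\mu(\cdot) \Big\rangle_{K^\mu} + \lambda \|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu}^2.$$

It follows that:

$$\lambda \|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu}^2 \le \Big\langle f_{z,\lambda} - f_{\bar{X},\lambda}, \frac{2}{m} \sum_{i=1}^{m} \eta_{x_i}(y) K_{x_i}^\mu(\cdot) \Big\rangle_{K^\mu} \le \Big\| \frac{2}{m} \sum_{i=1}^{m} \eta_{x_i}(y) K_{x_i}^\mu(\cdot) \Big\|_{K^\mu} \times \|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu}.$$

Thus (31) holds.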

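Proof of (32). Define $\Omega_{\bar{X}}(f)$ as:

$$\Omega_{\bar{X}}(f) = E_{\bar{X}}(f) + \frac{1}{l} \sum_{k=1}^{l} \Big( \frac{\partial f(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big)^2 + \lambda \|f\|_{K^\mu}^2, \quad f \in H_{K^\mu}^{\vec{n}}.$$

Then, by the definition of $f_{\bar{X},\lambda}$ and inequalities (43) and (44), we have:

$$\begin{aligned}
0 \ge \Omega_{\bar{X}}(f_{\bar{X},\lambda}) - \Omega_{\bar{X}}(f_\lambda) \ge{} &\Big\langle f_{\bar{X},\lambda} - f_\lambda, \frac{2}{m} \sum_{i=1}^{m} (f_\lambda(x_i) - f^*(x_i)) K_{x_i}^\mu(\cdot) \Big\rangle_{K^\mu} \\
&+ \Big\langle f_{\bar{X},\lambda} - f_\lambda, \frac{2}{l} \sum_{k=1}^{l} \Big( \frac{\partial f_\lambda(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big) \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \vec{n}} \Big\rangle_{K^\mu} \\
&+ \langle f_{\bar{X},\lambda} - f_\lambda, 2\lambda f_\lambda \rangle_{K^\mu} + \lambda \|f_{\bar{X},\lambda} - f_\lambda\|_{K^\mu}^2.
\end{aligned}$$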

By (29), $2\lambda f_\lambda$ equals $-2$ times the sum of the two integrals in (29), so we have:

$$\begin{aligned}
0 \ge 2 \Big\langle f_{\bar{X},\lambda} - f_\lambda,\ &\frac{1}{m} \sum_{i=1}^{m} (f_\lambda(x_i) - f^*(x_i)) K_{x_i}^\mu(\cdot) - \int_{B^d} (f_\lambda(x) - f^*(x)) K_x^\mu(\cdot) \, d\rho_{B^d} \\
&+ \frac{1}{l} \sum_{k=1}^{l} \Big( \frac{\partial f_\lambda(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big) \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \vec{n}} - \int_{S^{d-1}} \Big( \frac{\partial f_\lambda(x)}{\partial \vec{n}} - g(x) \Big) \frac{\partial K_x^\mu(\cdot)}{\partial \vec{n}} \, d\rho_{S^{d-1}} \Big\rangle_{K^\mu} + \lambda \|f_{\bar{X},\lambda} - f_\lambda\|_{K^\mu}^2.
\end{aligned}$$

It follows from the Cauchy-Schwarz inequality that:

$$\begin{aligned}
\lambda \|f_{\bar{X},\lambda} - f_\lambda\|_{K^\mu}^2 \le 2 \Big( &\Big\| \int_{B^d} (f_\lambda(x) - f^*(x)) K_x^\mu(\cdot) \, d\rho_{B^d} - \frac{1}{m} \sum_{i=1}^{m} (f_\lambda(x_i) - f^*(x_i)) K_{x_i}^\mu(\cdot) \Big\|_{K^\mu} \\
&+ \Big\| \int_{S^{d-1}} \Big( \frac{\partial f_\lambda(x)}{\partial \vec{n}} - g(x) \Big) \frac{\partial K_x^\mu(\cdot)}{\partial \vec{n}} \, d\rho_{S^{d-1}} - \frac{1}{l} \sum_{k=1}^{l} \Big( \frac{\partial f_\lambda(x_{m+k})}{\partial \vec{n}} - g(x_{m+k}) \Big) \frac{\partial K_{x_{m+k}}^\mu(\cdot)}{\partial \vec{n}} \Big\|_{K^\mu} \Big) \times \|f_{\bar{X},\lambda} - f_\lambda\|_{K^\mu}.
\end{aligned}$$

Dividing both sides by $\lambda \|f_{\bar{X},\lambda} - f_\lambda\|_{K^\mu}$ gives (32).

Lemma 3.3. The following inequalities hold.

1) For any $\delta \in (0, 1)$, with confidence $1 - \delta$, there holds:

$$\|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu} \le \frac{2 k \sigma}{\lambda \sqrt{m \delta}}. \tag{33}$$

2) For any $\delta \in (0, 1)$, with confidence $1 - \delta$, there holds:

‖ f X ¯ , λ − f λ ‖ K μ = ( 2 k 2 K ( f * , g , λ ) λ m λ + 2 k ‖ f * ‖ C ( B d ) λ m ) log 2 δ + k K ( f * , g , λ ) l δ . (34)

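Proof. Proof of (33). The definition of $\|\cdot\|_{K^\mu}$ and the reproducing property (13) give:

$$\Big\| \frac{1}{m} \sum_{i=1}^{m} \eta_{x_i}(y) K_{x_i}^\mu(\cdot) \Big\|_{K^\mu}^2 = \Big\langle \frac{1}{m} \sum_{i=1}^{m} \eta_{x_i}(y) K_{x_i}^\mu(\cdot), \frac{1}{m} \sum_{j=1}^{m} \eta_{x_j}(y) K_{x_j}^\mu(\cdot) \Big\rangle_{K^\mu} = \frac{1}{m^2} \sum_{i,j=1}^{m} \eta_{x_i}(y) \eta_{x_j}(y) K^\mu(x_i, x_j).$$

By the Markov inequality, we have:

$$P\big( \|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu} > \varepsilon \big) \le \frac{E\big( \|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu}^2 \big)}{\varepsilon^2}. \tag{35}$$

Then, by (31), we have:

$$\|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu}^2 \le \frac{4}{\lambda^2 m^2} \Big\| \sum_{i=1}^{m} \eta_{x_i}(y) K_{x_i}^\mu(\cdot) \Big\|_{K^\mu}^2 = \frac{4}{\lambda^2 m^2} \sum_{i,j=1}^{m} \eta_{x_i}(y) \eta_{x_j}(y) K^\mu(x_i, x_j).$$

Since $E_{\rho(\cdot | x)}(\eta_x) = 0$, the cross terms with $i \ne j$ vanish in expectation, and since $K^\mu(x, x) \le k^2$, we have:

$$E\big( \|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu}^2 \big) \le \frac{4 k^2}{\lambda^2 m} \int_{B^d} \sigma_x^2 \, d\rho_{B^d} = \frac{4 k^2 \sigma^2}{\lambda^2 m}.$$

By (35), we have:

$$P\big( \|f_{z,\lambda} - f_{\bar{X},\lambda}\|_{K^\mu} \le \varepsilon \big) \ge 1 - \frac{4 k^2 \sigma^2}{\lambda^2 m \varepsilon^2}.$$

Taking $\delta = \frac{4 k^2 \sigma^2}{\lambda^2 m \varepsilon^2}$, we have $\varepsilon = \frac{2 k \sigma}{\lambda \sqrt{m \delta}}$, and (33) thus holds.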
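The last step is just the inversion of the Markov bound: setting the failure probability $4k^2\sigma^2 / (\lambda^2 m \varepsilon^2)$ equal to $\delta$ and solving for $\varepsilon$. A small sketch, with a hypothetical helper name and hypothetical parameter values, makes the dependence on $m$, $\lambda$ and $\delta$ explicit.

```python
import math

def sample_error_radius(k, sigma, lam, m, delta):
    """Return eps with 4 k^2 sigma^2 / (lam^2 m eps^2) = delta, i.e. eps = 2 k sigma / (lam sqrt(m delta))."""
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    if lam <= 0.0 or m <= 0:
        raise ValueError("lam and m must be positive")
    return 2.0 * k * sigma / (lam * math.sqrt(m * delta))

# Hypothetical values chosen only for illustration.
k, sigma, lam, delta = 1.0, 0.1, 0.5, 0.05
eps_400 = sample_error_radius(k, sigma, lam, 400, delta)
eps_1600 = sample_error_radius(k, sigma, lam, 1600, delta)
```

Quadrupling the sample size halves the radius, which is the $m^{-1/2}$ rate of the first term in the learning-rate bound, while a smaller $\lambda$ or a smaller $\delta$ enlarges it.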

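We now show (34). Take

$$A = \Big\| \int_{B^d} \big( f_\lambda(x) - f^*(x) \big) K^\mu_x(\cdot)\, d\rho_{B^d} - \frac{1}{m}\sum_{i=1}^{m} \big( f_\lambda(x_i) - f^*(x_i) \big) K^\mu_{x_i}(\cdot) \Big\|_{K^\mu}$$

and

$$B = \Big\| \int_{S^{d-1}} \Big( \frac{\partial f_\lambda(x)}{\partial \vec n} - g(x) \Big) \frac{\partial K^\mu_x(\cdot)}{\partial \vec n}\, d\rho_{S^{d-1}} - \frac{1}{l}\sum_{k=1}^{l} \Big( \frac{\partial f_\lambda(x_{m+k})}{\partial \vec n} - g(x_{m+k}) \Big) \frac{\partial K^\mu_{x_{m+k}}(\cdot)}{\partial \vec n} \Big\|_{K^\mu}.$$

Then,

$$\| f_{\bar X,\lambda} - f_\lambda \|_{K^\mu} \le \frac{2}{\lambda}(A + B),$$ (36)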

where

$$A \le \Big\| \int_{B^d} f_\lambda(x) K^\mu_x(\cdot)\, d\rho_{B^d} - \frac{1}{m}\sum_{i=1}^{m} f_\lambda(x_i) K^\mu_{x_i}(\cdot) \Big\|_{K^\mu} + \Big\| \int_{B^d} f^*(x) K^\mu_x(\cdot)\, d\rho_{B^d} - \frac{1}{m}\sum_{i=1}^{m} f^*(x_i) K^\mu_{x_i}(\cdot) \Big\|_{K^\mu}.$$

Since

$$\| f_\lambda(x) K^\mu_x(\cdot) \|_{K^\mu} = |f_\lambda(x)| \sqrt{K^\mu_x(x)} \le k\,|f_\lambda(x)| \le k^2 \| f_\lambda \|_{K^\mu} \le k^2 \sqrt{\frac{K(f^*,g,\lambda)}{\lambda}}$$

and $\| f^*(x) K^\mu_x(\cdot) \|_{K^\mu} \le k\,|f^*(x)| \le k\,\| f^* \|_{C(B^d)}$, we have by (47) that, with confidence $1-\delta$, there holds:

$$A \le \left( \frac{2k^2\sqrt{K(f^*,g,\lambda)}}{\lambda^{3/2}\sqrt{m}} + \frac{2k\|f^*\|_{C(B^d)}}{\lambda\sqrt{m}} \right)\log\frac{2}{\delta}.$$ (37)

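On the other hand, take $\xi(x) = \frac{\partial f_\lambda(x)}{\partial \vec n} - g(x)$. Then,

$$B = \Big\| \int_{S^{d-1}} \xi(x) \frac{\partial K^\mu_x(\cdot)}{\partial \vec n}\, d\rho_{S^{d-1}} - \frac{1}{l}\sum_{k=1}^{l} \xi(x_{m+k}) \frac{\partial K^\mu_{x_{m+k}}(\cdot)}{\partial \vec n} \Big\|_{K^\mu}.$$

By the definition of $\|\cdot\|_{K^\mu}$, we have:

$$\begin{aligned}
B^2 &= \Big\langle \int_{S^{d-1}} \xi(x) \frac{\partial K^\mu_x(\cdot)}{\partial \vec n}\, d\rho_{S^{d-1}} - \frac{1}{l}\sum_{k=1}^{l} \xi(x_{m+k}) \frac{\partial K^\mu_{x_{m+k}}(\cdot)}{\partial \vec n},\ \int_{S^{d-1}} \xi(u) \frac{\partial K^\mu_u(\cdot)}{\partial \vec n}\, d\rho_{S^{d-1}} - \frac{1}{l}\sum_{i=1}^{l} \xi(x_{m+i}) \frac{\partial K^\mu_{x_{m+i}}(\cdot)}{\partial \vec n} \Big\rangle_{K^\mu} \\
&= \int_{S^{d-1}}\int_{S^{d-1}} \xi(x)\,\xi(u) \Big\langle \frac{\partial K^\mu_x(\cdot)}{\partial \vec n}, \frac{\partial K^\mu_u(\cdot)}{\partial \vec n} \Big\rangle_{K^\mu} d\rho_{S^{d-1}}(x)\, d\rho_{S^{d-1}}(u) \\
&\quad - 2\int_{S^{d-1}} \xi(x) \Big( \frac{1}{l}\sum_{i=1}^{l} \xi(x_{m+i}) \Big\langle \frac{\partial K^\mu_x(\cdot)}{\partial \vec n}, \frac{\partial K^\mu_{x_{m+i}}(\cdot)}{\partial \vec n} \Big\rangle_{K^\mu} \Big) d\rho_{S^{d-1}}(x) \\
&\quad + \frac{1}{l^2}\sum_{k=1}^{l} \xi^2(x_{m+k}) \Big\langle \frac{\partial K^\mu_{x_{m+k}}(\cdot)}{\partial \vec n}, \frac{\partial K^\mu_{x_{m+k}}(\cdot)}{\partial \vec n} \Big\rangle_{K^\mu} \\
&\quad + \frac{1}{l^2}\sum_{k,i=1,\ k\neq i}^{l} \xi(x_{m+i})\,\xi(x_{m+k}) \Big\langle \frac{\partial K^\mu_{x_{m+k}}(\cdot)}{\partial \vec n}, \frac{\partial K^\mu_{x_{m+i}}(\cdot)}{\partial \vec n} \Big\rangle_{K^\mu}.
\end{aligned}$$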

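By (14), we have:

$$\Big\langle \frac{\partial K^\mu_x(\cdot)}{\partial \vec n}, \frac{\partial K^\mu_u(\cdot)}{\partial \vec n} \Big\rangle_{K^\mu} = \frac{\partial_u}{\partial \vec n}\frac{\partial_x}{\partial \vec n} K^\mu_x(u),$$

$$\Big\langle \frac{\partial K^\mu_x(\cdot)}{\partial \vec n}, \frac{\partial K^\mu_{x_{m+i}}(\cdot)}{\partial \vec n} \Big\rangle_{K^\mu} = \frac{\partial_u}{\partial \vec n}\frac{\partial_x}{\partial \vec n} K^\mu_x(u)\Big|_{u = x_{m+i}}$$

and

$$\Big\langle \frac{\partial K^\mu_{x_{m+k}}(\cdot)}{\partial \vec n}, \frac{\partial K^\mu_{x_{m+k}}(\cdot)}{\partial \vec n} \Big\rangle_{K^\mu} = \frac{\partial_u}{\partial \vec n}\frac{\partial_u}{\partial \vec n} K^\mu_u(u)\Big|_{u = x_{m+k}},$$

$$\Big\langle \frac{\partial K^\mu_{x_{m+k}}(\cdot)}{\partial \vec n}, \frac{\partial K^\mu_{x_{m+i}}(\cdot)}{\partial \vec n} \Big\rangle_{K^\mu} = \frac{\partial_u}{\partial \vec n}\frac{\partial_x}{\partial \vec n} K^\mu_x(u)\Big|_{u = x_{m+i},\ x = x_{m+k}}.$$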

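It follows that:

$$\begin{aligned}
B^2 &= \int_{S^{d-1}}\int_{S^{d-1}} \xi(x)\,\xi(u)\, \frac{\partial_u}{\partial \vec n}\frac{\partial_x}{\partial \vec n} K^\mu_x(u)\, d\rho_{S^{d-1}}(x)\, d\rho_{S^{d-1}}(u) \\
&\quad - 2\int_{S^{d-1}} \xi(x) \Big( \frac{1}{l}\sum_{i=1}^{l} \xi(x_{m+i})\, \frac{\partial_u}{\partial \vec n}\frac{\partial_x}{\partial \vec n} K^\mu_x(u)\Big|_{u = x_{m+i}} \Big) d\rho_{S^{d-1}}(x) \\
&\quad + \frac{1}{l^2}\sum_{k=1}^{l} \xi^2(x_{m+k})\, \frac{\partial_u}{\partial \vec n}\frac{\partial_u}{\partial \vec n} K^\mu_u(u)\Big|_{u = x_{m+k}} \\
&\quad + \frac{1}{l^2}\sum_{k,i=1,\ k\neq i}^{l} \xi(x_{m+i})\,\xi(x_{m+k})\, \frac{\partial_u}{\partial \vec n}\frac{\partial_x}{\partial \vec n} K^\mu_x(u)\Big|_{u = x_{m+i},\ x = x_{m+k}}.
\end{aligned}$$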

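Since $x_{m+1}, x_{m+2}, \cdots, x_{m+l}$ are i.i.d. according to $\rho_{S^{d-1}}$, we have:

$$\begin{aligned}
E(B^2) &= \frac{1}{l}\Big( \int_{S^{d-1}} \xi^2(x)\, \frac{\partial_x}{\partial \vec n}\frac{\partial_x}{\partial \vec n} K^\mu_x(x)\, d\rho_{S^{d-1}}(x) - \int_{S^{d-1}}\int_{S^{d-1}} \xi(x)\,\xi(u)\, \frac{\partial_u}{\partial \vec n}\frac{\partial_x}{\partial \vec n} K^\mu_x(u)\, d\rho_{S^{d-1}}(x)\, d\rho_{S^{d-1}}(u) \Big) \\
&\le \frac{1}{l}\Big( \int_{S^{d-1}} \xi^2(x)\, \frac{\partial_x}{\partial \vec n}\frac{\partial_x}{\partial \vec n} K^\mu_x(x)\, d\rho_{S^{d-1}}(x) \Big),
\end{aligned}$$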

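where we have used the fact that:

$$\int_{S^{d-1}}\int_{S^{d-1}} \xi(x)\,\xi(u)\, \frac{\partial_u}{\partial \vec n}\frac{\partial_x}{\partial \vec n} K^\mu_x(u)\, d\rho_{S^{d-1}}(x)\, d\rho_{S^{d-1}}(u) \ge 0$$

since $\frac{\partial_u}{\partial \vec n}\frac{\partial_x}{\partial \vec n} K^\mu_x(u)$ is a positive definite function of $u$ and $x$. According to Assumption B, we have:

$$\Big| \frac{\partial_u}{\partial \vec n}\frac{\partial_x}{\partial \vec n} K^\mu_x(u) \Big| \le k^2.$$

It follows that:

$$E(B^2) \le \frac{k^2}{l}\Big( \int_{S^{d-1}} \xi^2(x)\, d\rho_{S^{d-1}}(x) \Big).$$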

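By Markov's inequality, we have:

$$P(B > \varepsilon) \le \frac{E(B^2)}{\varepsilon^2} \le \frac{k^2}{l\varepsilon^2}\Big( \int_{S^{d-1}} \xi^2(x)\, d\rho_{S^{d-1}}(x) \Big) \le \frac{k^2}{l\varepsilon^2} K(f^*,g,\lambda).$$

Take $\delta = \frac{k^2}{l\varepsilon^2} K(f^*,g,\lambda)$. Then, $\varepsilon = \frac{k\sqrt{K(f^*,g,\lambda)}}{\sqrt{l\delta}}$. It follows that, with confidence $1-\delta$:

$$B \le \frac{k\sqrt{K(f^*,g,\lambda)}}{\sqrt{l\delta}}.$$ (38)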

Collecting (38), (37) and (36), we arrive at (34).

4. Proofs

Proof of Proposition 2.1. Proof of 1). For any $f(x) = \sum_{k=0}^{\infty}\sum_{l=1}^{a_k^d} a_{k,l}(f)\, Q_{k,l}(x) \in H^{\vec n}_{K^\mu}$, we rewrite $K^\mu(x,y)$ as:

$$K^\mu(x,y) = K^\mu_y(x) = \sum_{k=0}^{\infty}\sum_{l=1}^{a_k^d} \big( \lambda_{k,l}\, Q_{k,l}(y) \big) Q_{k,l}(x).$$

Then,

$$\langle f, K^\mu_y \rangle_{K^\mu} = \sum_{k=0}^{\infty}\sum_{l=1}^{a_k^d} \frac{\big( \lambda_{k,l}\, Q_{k,l}(y) \big)\, a_{k,l}(f)}{\lambda_{k,l}} = f(y),$$

which yields (13).
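The coefficient computation behind (13) can be illustrated on a truncated expansion. The sketch below is a toy example under stated assumptions: the basis functions `phi`, the weights `eigs` and the coefficient vector `a` are hypothetical stand-ins for $Q_{k,l}$, $\lambda_{k,l}$ and $a_{k,l}(f)$, not the spherical-harmonic system used in the paper.

```python
import math

J = 6                                            # truncation level (assumed)
eigs = [1.0 / (1 + j) ** 2 for j in range(J)]    # positive weights, stand-ins for lambda_{k,l}

def phi(j, x):
    """Hypothetical basis function standing in for Q_{k,l}."""
    return math.cos(j * math.pi * x)

def kernel_coeffs(y):
    """Coefficients of K_y = sum_j (lambda_j phi_j(y)) phi_j."""
    return [eigs[j] * phi(j, y) for j in range(J)]

def inner(a, b):
    """Inner product in coefficient form: <f, g> = sum_j a_j b_j / lambda_j."""
    return sum(a[j] * b[j] / eigs[j] for j in range(J))

def evaluate(a, x):
    """Point value f(x) = sum_j a_j phi_j(x)."""
    return sum(a[j] * phi(j, x) for j in range(J))

a = [0.7, -1.2, 0.4, 0.0, 2.1, -0.3]   # coefficients of an arbitrary f
y = 0.37
lhs = inner(a, kernel_coeffs(y))       # <f, K_y>
rhs = evaluate(a, y)                   # f(y)
print(lhs, rhs)
```

The weights cancel term by term, which is the same cancellation of $\lambda_{k,l}$ used in the proof.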

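Proof of 2). By (9) and (46), we have:

$$\frac{\partial f(x)}{\partial \vec n} = \Big\langle f, \sum_{i=1}^{d} x_i \frac{\partial K(x,\cdot)}{\partial x_i}(x) \Big\rangle_{K^\mu} = \Big\langle f, \frac{\partial}{\partial \vec n} K(x,\cdot) \Big\rangle_{K^\mu}, \quad x = (x_1, \cdots, x_d) \in S^{d-1}.$$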

(14) thus holds.

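Proof of 3). By (14), we have:

$$\Big| \frac{\partial f(x)}{\partial \vec n} \Big| = \Big| \Big\langle f, \frac{\partial K^\mu_x(\cdot)}{\partial \vec n} \Big\rangle_{K^\mu} \Big| \le \| f \|_{K^\mu} \Big\| \frac{\partial K^\mu_x(\cdot)}{\partial \vec n} \Big\|_{K^\mu} \le k\, \| f \|_{K^\mu}.$$

By the same method, we have by (13) that:

$$| f(x) | = \big| \langle f, K^\mu_x(\cdot) \rangle_{K^\mu} \big| \le \| f \|_{K^\mu} \| K^\mu_x(\cdot) \|_{K^\mu} \le k\, \| f \|_{K^\mu}.$$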

(15) thus holds.

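Proof of Proposition 2.2. By (27), we have:

$$f_{z,\lambda}(\cdot) = \frac{1}{\lambda m}\sum_{i=1}^{m} \big( y_{x_i} - f_{z,\lambda}(x_i) \big) K^\mu_{x_i}(\cdot) + \frac{1}{\lambda l}\sum_{k=1}^{l} \Big( g(x_{m+k}) - \frac{\partial f_{z,\lambda}(x_{m+k})}{\partial \vec n} \Big) \frac{\partial K^\mu_{x_{m+k}}(\cdot)}{\partial \vec n}.$$ (39)

Substituting $a_i = \frac{1}{\lambda m}\big( y_{x_i} - f_{z,\lambda}(x_i) \big)$ and $a_{m+k} = \frac{1}{\lambda l}\Big( g(x_{m+k}) - \frac{\partial f_{z,\lambda}(x_{m+k})}{\partial \vec n} \Big)$ into (39), we have (20).

By (29), we have:

$$f_{\bar X,\lambda}(\cdot) = \frac{1}{\lambda m}\sum_{i=1}^{m} \big( f^*(x_i) - f_{\bar X,\lambda}(x_i) \big) K^\mu_{x_i}(\cdot) + \frac{1}{\lambda l}\sum_{k=1}^{l} \Big( g(x_{m+k}) - \frac{\partial f_{\bar X,\lambda}(x_{m+k})}{\partial \vec n} \Big) \frac{\partial K^\mu_{x_{m+k}}(\cdot)}{\partial \vec n}.$$ (40)

Substituting $b_i = \frac{1}{\lambda m}\big( f^*(x_i) - f_{\bar X,\lambda}(x_i) \big)$ and $b_{m+k} = \frac{1}{\lambda l}\Big( g(x_{m+k}) - \frac{\partial f_{\bar X,\lambda}(x_{m+k})}{\partial \vec n} \Big)$ into (40), we have (21).

By (29), we have:

$$f_\lambda(\cdot) = \frac{1}{\lambda}\int_{B^d} \big( f^*(x) - f_\lambda(x) \big) K^\mu_x(\cdot)\, d\rho_{B^d} + \frac{1}{\lambda}\int_{S^{d-1}} \Big( g(x) - \frac{\partial f_\lambda(x)}{\partial \vec n} \Big) \frac{\partial K^\mu_x(\cdot)}{\partial \vec n}\, d\rho_{S^{d-1}}.$$ (41)

Substituting $G_{\lambda,f^*}(x) = \frac{1}{\lambda}\big( f^*(x) - f_\lambda(x) \big)$ and $p_{\lambda,g}(x) = \frac{1}{\lambda}\Big( g(x) - \frac{\partial f_\lambda(x)}{\partial \vec n} \Big)$ into (41), we have (22).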

Proof of Theorem 2.1. Collecting (33), (34) and (23), we have:

‖ f z , λ − f * ‖ L 2 ( ρ B d ) + ‖ ∂ f z , λ ∂ n → − g ‖ L 2 ( ρ S d − 1 ) ≤ 2 k 2 σ λ m δ + k ( K ( f * , g , λ ) λ 3 2 m + ‖ f * ‖ C ( B d ) λ m ) log 4 δ + k K ( f * , g , λ ) l δ + 2 K ( f * , g , λ ) . (42)

(42) yields (24).

Funding

Supported partially by the NSF (Project No. 61877039), the NSFC/RGC Joint Research Scheme (Project No. 12061160462 and N_CityU102/20) of China.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

Cite this paper

Ran, X.X. and Sheng, B.H. (2024) Solving Neumann Boundary Problem with Kernel-Regularized Learning Approach. Journal of Applied Mathematics and Physics, 12, 1101-1125. https://doi.org/10.4236/jamp.2024.124069

Appendices

Appendix 1. Gâteaux Derivative and the Convex Function

The following Proposition A1 can be found in Proposition 17.4, Proposition 17.10 and Proposition 17.12 of [38].

Proposition A1. Let $(H, \|\cdot\|_H)$ be a Hilbert space and $F(f): H \to \mathbb{R} \cup \{\pm\infty\}$ be a function defined on $H$. Then,

1) If $F(f)$ is a convex function, then $F(f)$ attains its minimal value at $f_0$ if and only if $\nabla_f F(f_0) = 0$.

2) If $F(f): H \to \mathbb{R} \cup \{\pm\infty\}$ is a Gâteaux differentiable function, then $F(f)$ is convex on $H$ if and only if for any $f, g \in H$, there holds:

$$F(g) - F(f) \ge \langle g - f, \nabla_f F(f) \rangle_H.$$

In particular, we have:

$$x^2 - y^2 \ge 2y(x - y), \quad \forall x, y \in \mathbb{R}.$$ (43)

3) For the function $F(f) = \|f\|_H^2$, there holds $\nabla_f F(f) = 2f$ and the equality:

$$\|f\|_H^2 - \|g\|_H^2 = \langle f - g, 2g \rangle_H + \|f - g\|_H^2,$$ (44)

i.e.

$$\|f\|_H^2 - \|g\|_H^2 = \langle f - g, \nabla_g F(g) \rangle_H + \|f - g\|_H^2.$$
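Identity (44) and inequality (43) are easy to confirm numerically in a finite-dimensional Hilbert space. The sketch below uses $\mathbb{R}^5$ with the Euclidean inner product as an assumed stand-in for $H$; the vectors and test points are arbitrary.

```python
import random

def dot(u, v):
    """Euclidean inner product, standing in for <.,.>_H."""
    return sum(a * b for a, b in zip(u, v))

random.seed(0)
f = [random.uniform(-1, 1) for _ in range(5)]
g = [random.uniform(-1, 1) for _ in range(5)]
diff = [a - b for a, b in zip(f, g)]

lhs = dot(f, f) - dot(g, g)                             # ||f||^2 - ||g||^2
rhs = dot(diff, [2 * b for b in g]) + dot(diff, diff)   # <f - g, 2g> + ||f - g||^2

# Smallest value of x^2 - y^2 - 2y(x - y) over a few sample points; (43) says it is >= 0.
gap_43 = min(x * x - y * y - 2 * y * (x - y)
             for x in (-2.0, 0.5, 3.0) for y in (-1.0, 0.0, 4.0))
print(abs(lhs - rhs), gap_43)
```

The gap in (43) is exactly $(x-y)^2$, which is the one-dimensional case of the remainder term $\|f-g\|_H^2$ in (44).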

Appendix 2. Derivatives Reproducing Property

Let $\Omega \subset \mathbb{R}^d$ be a compact subset which is the closure of its nonempty interior $\Omega^0$. Let $K(x,y)$ be a Mercer kernel on $\Omega \times \Omega$ having the expansion (see e.g. [39]):

$$K(x,y) = \sum_{k=0}^{+\infty} \lambda_k\, \varphi_k(x)\, \varphi_k(y), \quad x, y \in \Omega,$$ (45)

where the convergence is absolute (for each $x, y \in \Omega$) and uniform on $\Omega \times \Omega$. Then, we have the following proposition.

Proposition A2. Let $K(x,y)$ be a Mercer kernel of form (45) and $K \in C^{(1)}(\Omega \times \Omega)$. If $H_K$ is a reproducing kernel Hilbert space such that:

$$f(x) = \langle f, K(\cdot,x) \rangle_{H_K}, \quad f \in H_K,\ x \in \Omega,$$

then,

∂ x α f ( x ) = 〈 f , ∂ x α K ( ⋅ , x ) 〉 H K n → ( ρ Ω ) , | α | ≤ 1 , f ∈ H K , x ∈ Ω , (46)

where | α | = ∑ i = 1 d α i ≤ 1 .

Proof. See Theorem 1 in [12] or item (v) of Theorem 4.7 in [40].
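The derivative reproducing property (46) can be illustrated in one variable on a truncated kernel expansion of the form (45). The following sketch is a toy check under stated assumptions: the basis `phi`, the weights `eigs` and the coefficients `a` are hypothetical, and the derivative obtained from the inner product is compared with a central finite difference.

```python
import math

J = 5                                   # truncation level (assumed)
eigs = [2.0 ** (-j) for j in range(J)]  # positive weights, stand-ins for lambda_k

def phi(j, x):
    """Hypothetical basis function."""
    return math.cos(j * math.pi * x)

def dphi(j, x):
    """Derivative of phi_j."""
    return -j * math.pi * math.sin(j * math.pi * x)

def inner(a, b):
    """Inner product in coefficient form: sum_j a_j b_j / lambda_j."""
    return sum(a[j] * b[j] / eigs[j] for j in range(J))

def f(a, x):
    """Point value f(x) = sum_j a_j phi_j(x)."""
    return sum(a[j] * phi(j, x) for j in range(J))

a = [0.5, 1.0, -0.8, 0.3, 0.2]
x = 0.42
dK = [eigs[j] * dphi(j, x) for j in range(J)]   # coefficients of d/dx K(., x)
lhs = inner(a, dK)                              # <f, d/dx K(., x)>
h = 1e-6
fd = (f(a, x + h) - f(a, x - h)) / (2 * h)      # finite-difference estimate of f'(x)
print(lhs, fd)
```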

Appendix 3. A Probability Inequality

Proposition A4. [41] Let $\xi$ be a random variable taking values in a real separable Hilbert space $H$ on a probability space $(\Omega, \mathcal{F}, P)$. Assume that there is a positive constant $L$ such that $\|\xi\|_H \le L$. Then, for all $n \ge 1$ and $0 < \eta < 1$, it holds, with confidence $1-\eta$, that:

$$\Big\| \frac{1}{n}\sum_{i=1}^{n} \xi(\omega_i) - E(\xi) \Big\|_H \le \frac{4L}{\sqrt{n}}\log\frac{2}{\eta}.$$ (47)
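A small simulation can give a feel for (47). The sketch below reads the right-hand side of (47) as $\frac{4L}{\sqrt{n}}\log\frac{2}{\eta}$, which is an assumption about the flattened formula, and uses a hypothetical random variable: a point drawn uniformly from $[-1,1]^2$, so that $H = \mathbb{R}^2$, $E(\xi) = 0$ and $L = \sqrt{2}$. It illustrates the statement and proves nothing.

```python
import math
import random

def mean_deviation(n, rng):
    """Norm of the empirical mean of n i.i.d. uniform points in [-1, 1]^2 (true mean is 0)."""
    sx = sy = 0.0
    for _ in range(n):
        sx += rng.uniform(-1.0, 1.0)
        sy += rng.uniform(-1.0, 1.0)
    return math.hypot(sx / n, sy / n)

rng = random.Random(0)                 # fixed seed for reproducibility
n, eta, trials = 200, 0.1, 300         # illustrative values
L = math.sqrt(2.0)                     # bound on the norm of xi over the square
bound = 4.0 * L / math.sqrt(n) * math.log(2.0 / eta)

hits = sum(mean_deviation(n, rng) <= bound for _ in range(trials))
coverage = hits / trials               # empirical frequency of the event in (47)
print(bound, coverage)
```

For this toy distribution the bound is conservative, so the observed coverage should be well above $1-\eta$.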

References

[1] Atkinson, K., Hansen, O. and Chien, D. (2011) A Spectral Method for Elliptic Equations: The Neumann Problem. Advances in Computational Mathematics, 34, 295-317. https://doi.org/10.1007/s10444-010-9154-3

[2] Atkinson, K., Chien, D. and Hansen, O. (2010) A Spectral Method for Elliptic Equation: The Dirichlet Problem. Advances in Computational Mathematics, 33, 169-189. https://doi.org/10.1007/s10444-009-9125-8

[3] Atkinson, K. and Hansen, O. (2010) A Spectral Method for the Eigenvalue Problem for Elliptic Equations. Electronic Transactions on Numerical Analysis, 37, 386-412.

[4] Li, X. (2009) Approximation of Potential Integral by Radial Bases for Solutions of Helmholtz Equation. Advances in Computational Mathematics, 30, 201-230. https://doi.org/10.1007/s10444-008-9065-8

[5] Li, X. (2008) Rate of Convergence of the Method of Fundamental Solutions and Hyperinterpolation for Modified Helmholtz Equations on the Unit Ball. Advances in Computational Mathematics, 29, 393-413. https://doi.org/10.1007/s10444-007-9056-1

[6] Li, X. (2008) Convergence of the Method of Fundamental Solutions for Poisson’s Equation on the Unit Sphere. Advances in Computational Mathematics, 28, 269-282. https://doi.org/10.1007/s10444-006-9022-3

[7] Cialenco, I., Fasshauer, G.E. and Ye, Q. (2012) Approximation of Stochastic Partial Differential Equations by a Kernel-Based Collocation Method. International Journal of Computer Mathematics, 89, 2543-2561. https://doi.org/10.1080/00207160.2012.688111

[8] Ding, L.L., Liu, Z.Y. and Xu, Q.Y. (2021) Multilevel RBF Collocation Method for the Fourth-Order Thin Plate Problem. International Journal of Wavelets, Multiresolution and Information Processing, 19, Article ID: 2050079. https://doi.org/10.1142/S0219691320500794

[9] Fasshauer, G.E. and Ye, Q. (2013) Kernel-Based Collocation Methods versus Galerkin Finite Element Methods for Approximating Elliptic Stochastic Partial Differential Equation. In: Griebel, M. and Schweitzer, M., Eds., Meshfree Methods for Partial Differential Equations VI, Springer, Berlin, 155-170. https://doi.org/10.1007/978-3-642-32979-1_10

[10] Fasshauer, G.E. and Ye, Q. (2012) A Kernel-Based Collocation Method for Elliptic Partial Differential Equations with Random Coefficients. In: Dick, J., Kuo, F., Peters, G. and Sloan, I., Eds., Monte Carlo and Quasi-Monte Carlo Methods 2012, Springer, Berlin, 331-347. https://doi.org/10.1007/978-3-642-41095-6_14

[11] Ye, Q. (2014) Approximation of Nonlinear Stochastic Partial Differential Equations by a Kernel-Based Collocation Method. International Journal of Applied Nonlinear Science, 1, 156-172. https://doi.org/10.1504/IJANS.2014.061018

[12] Zhou, D.X. (2008) Derivative Reproducing Properties for Kernel Methods in Learning Theory. Journal of Computational and Applied Mathematics, 220, 456-463. https://doi.org/10.1016/j.cam.2007.08.023

[13] Bao, K.J., Qian, X., Liu, Z.Y. and Song, S.B. (2022) An Operator Learning Approach via Function-Valued Reproducing Kernel Hilbert Space for Differential Equations. arXiv: 2202.09488.

[14] Mo, Y. and Qian, T. (2014) Support Vector Machine Adapted Tikhonov Regularization Method to Solve Dirichlet Problem. Applied Mathematics and Computation, 245, 509-519. https://doi.org/10.1016/j.amc.2014.07.089

[15] Sheng, B.H., Zhou, D.P. and Wang, S.H. (2022) The Kernel Regularized Learning Algorithm for Solving Laplace Equation with Dirichlet Boundary. International Journal of Wavelets, Multiresolution and Information Processing, 20, Article ID: 2250031. https://doi.org/10.1142/S021969132250031X

[16] Stepaniants, G. (2023) Learning Partial Differential Equations in Reproducing Kernel Hilbert Spaces. Journal of Machine Learning Research, 24, 1-72.

[17] Haroske, D.D. and Triebel, H. (2008) Distributions, Sobolev Spaces, Elliptic Equations. European Mathematical Society, Helsinki. https://doi.org/10.4171/042

[18] Belkin, M. and Niyogi, P. (2004) Semi-Supervised Learning on Riemannian Manifolds. Machine Learning, 56, 209-239. https://doi.org/10.1023/B:MACH.0000033120.25363.1e

[19] Belkin, M., Niyogi, P. and Sindhwani, V. (2006) Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples. Journal of Machine Learning Research, 7, 2399-2434.

[20] Niyogi, P. (2013) Manifold Regularization and Semi-Supervised Learning: Some Theoretical Analysis. Journal of Machine Learning Research, 14, 1229-1250.

[21] Sheng, B.H. and Zhu, H.C. (2018) The Convergence Rate of Semi-Supervised Regression with Quadratic Loss. Applied Mathematics and Computation, 321, 11-24. https://doi.org/10.1016/j.amc.2017.10.033

[22] Smale, S. and Zhou, D.X. (2004) Shannon Sampling and Function Reconstruction from Point Values. Bulletin of the American Mathematical Society, 41, 279-305. https://doi.org/10.1090/S0273-0979-04-01025-0

[23] Cucker, F. and Smale, S. (2002) On the Mathematical Foundations of Learning Theory. Bulletin of the American Mathematical Society, 39, 1-49. https://doi.org/10.1090/S0273-0979-01-00923-5

[24] Cucker, F. and Zhou, D.X. (2007) Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511618796

[25] Chen, H., Zhou, Y.C., Tang, Y.Y., Li, L.Q. and Pan, Z.B. (2013) Convergence Rate of the Semi-Supervised Greedy Algorithm. Neural Networks, 44, 44-50. https://doi.org/10.1016/j.neunet.2013.03.001

[26] Sheng, B.H. and Xiang, D.H. (2017) The Performance of Semi-Supervised Laplacian Regularized Regression with Least Square Loss. International Journal of Wavelets, Multiresolution and Information Processing, 15, Article ID: 1750016. https://doi.org/10.1142/S0219691317500163

[27] Sheng, B.H., Xiang, D.H. and Ye, P.X. (2015) Convergence Rate of Semi-Supervised Gradient Learning. International Journal of Wavelets, Multiresolution and Information Processing, 13, Article ID: 1550021. https://doi.org/10.1142/S0219691315500216

[28] Sheng, B.H. and Zhang, H.Z. (2020) Performance Analysis of the LapRSSLG Algorithm in Learning Theory. Analysis and Applications, 18, 79-108. https://doi.org/10.1142/S0219530519410033

[29] Wang, K.Y. and Li, L.Q. (2000) Harmonic Analysis and Approximation on the Unit Sphere. Science Press, Beijing.

[30] Delgado, A.M., Fernández, L., Lubinsky, D., Pérez, T.E. and Piñar, M.A. (2016) Sobolev Orthogonal Polynomials on the Unit Ball via Outward Normal Derivatives. Journal of Mathematical Analysis and Applications, 440, 716-740. https://doi.org/10.1016/j.jmaa.2016.03.041

[31] Jordao, T. and Menegatto, V.A. (2012) Reproducing Properties of Differentiable Mercer-Like Kernels on the Sphere. Numerical Functional Analysis and Optimization, 33, 1221-1243. https://doi.org/10.1080/01630563.2012.660590

[32] Castro, M.H., Menegatto, V.A. and Oliveira, C.P. (2013) Laplace-Beltrami Differentiability of Positive Definite Kernels on the Sphere. Acta Mathematica Sinica, English Series, 29, 93-104. https://doi.org/10.1007/s10114-012-1067-2

[33] Ferreira, J.C. and Menegatto, V.A. (2013) Positive Definiteness, Reproducing Kernel Hilbert Spaces and Beyond. Annals of Functional Analysis, 4, 64-88. https://doi.org/10.15352/afa/1399899838

[34] Sun, H.W. and Wu, Q. (2011) Least Square Regression with Independent Kernels and Coefficient Regularization. Applied and Computational Harmonic Analysis, 30, 96-109. https://doi.org/10.1016/j.acha.2010.04.001

[35] Zhang, J., Wang, J.L. and Sheng, B.H. (2011) Learning from Regularized Regression Algorithms with P-Order Markov Chain Sampling. Applied Mathematics: A Journal of Chinese Universities, 226, 295-306. https://doi.org/10.1007/s11766-011-2701-y

[36] Sheng, B.H. and Wang, J.L. (2024) Moduli of Smoothness, K-Functionals and Jackson-Type Inequalities Associated with Kernel Function Approximation in Learning Theory. Analysis and Applications. https://doi.org/10.1142/S021953052450009X

[37] Dai, F. and Xu, Y. (2013) Approximation Theory and Harmonic Analysis on Spheres and Balls. Springer-Verlag, New York. https://doi.org/10.1007/978-1-4614-6660-4

[38] Bauschke, H.H. and Combettes, P.L. (2010) Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer-Verlag, New York. https://doi.org/10.1007/978-1-4419-9467-7

[39] Aronszajn, N. (1950) Theory of Reproducing Kernels. Transactions of the American Mathematical Society, 68, 337-404. https://doi.org/10.1090/S0002-9947-1950-0051437-7

[40] Ferreira, J.C. and Menegatto, V.A. (2012) Reproducing Properties of Differentiable Mercer-Like Kernels. Mathematische Nachrichten, 285, 959-973. https://doi.org/10.1002/mana.201100072

[41] Smale, S. and Zhou, D.X. (2007) Learning Theory Estimates via Integral Operators and Their Applications. Constructive Approximation, 26, 153-172. https://doi.org/10.1007/s00365-006-0659-y