<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JDAIP</journal-id><journal-title-group><journal-title>Journal of Data Analysis and Information Processing</journal-title></journal-title-group><issn pub-type="epub">2327-7211</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jdaip.2020.83012</article-id><article-id pub-id-type="publisher-id">JDAIP-102423</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Computer Science&amp;Communications</subject><subject> Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Hierarchical Representations Feature Deep Learning for Face Recognition
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Haijun</surname><given-names>Zhang</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Yinghui</surname><given-names>Chen</given-names></name><xref ref-type="aff" rid="aff2"><sup>2</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>Guangdong Provincial Key Laboratory of Conservation and Precision Utilization of Characteristic Agricultural Resources in Mountainous Areas, Meizhou, China</addr-line></aff><aff id="aff2"><addr-line>School of Mathematics, Jiaying University, Meizhou, China</addr-line></aff><pub-date pub-type="epub"><day>02</day><month>07</month><year>2020</year></pub-date><volume>08</volume><issue>03</issue><fpage>195</fpage><lpage>227</lpage><history><date date-type="received"><day>3</day><month>August</month><year>2020</year></date><date date-type="rev-recd"><day>22</day><month>August</month><year>2020</year></date><date date-type="accepted"><day>25</day><month>August</month><year>2020</year></date></history><permissions><copyright-statement>&#169; Copyright 2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  Most modern face recognition and classification systems rely mainly on hand-crafted image feature descriptors. In this paper, we propose a novel deep learning algorithm that combines unsupervised and supervised learning, named deep belief network embedded with Softmax regression (DBNESR), as a natural source of additional, complementary hierarchical representations, which relieves us of the complicated hand-crafted feature-design step. DBNESR first learns hierarchical feature representations by greedy layer-wise unsupervised learning in a feed-forward (bottom-up) and back-forward (top-down) manner, and then performs recognition more efficiently with Softmax regression by supervised learning. For comparison with algorithms based only on supervised learning, we also design several classifiers: BP, HBPNNs, RBF, HRBFNNs, SVM, and a multiple classification decision fusion classifier (MCDFC), which is a hybrid HBPNNs-HRBFNNs-SVM classifier. The experiments show the following. First, the proposed DBNESR performs best for face recognition, with the highest and most stable recognition rates. Second, the algorithm combining unsupervised and supervised learning outperforms all the purely supervised learning algorithms. Third, hybrid neural networks outperform single-model neural networks. Fourth, ordered from largest to smallest, the average recognition rates rank as DBNESR, MCDFC, SVM, HRBFNNs, RBF, HBPNNs, BP, and the variances rank as BP, RBF, HBPNNs, HRBFNNs, SVM, MCDFC, DBNESR. Finally, the results reflect the capability of the hierarchical feature representations learned by DBNESR to model hard artificial intelligence tasks.
 
</p></abstract><kwd-group><kwd>Face Recognition</kwd><kwd> Unsupervised</kwd><kwd> Hierarchical Representations</kwd><kwd> Hybrid Neural Networks</kwd><kwd> RBM</kwd><kwd> Deep Belief Network</kwd><kwd> Deep Learning</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Face recognition (FR) is one of the main areas of investigation in biometrics and computer vision. It has a wide range of applications, including access control, information security, law enforcement and surveillance systems. FR has attracted great attention from a large number of research groups and has developed considerably over the past few decades [<xref ref-type="bibr" rid="scirp.102423-ref1">1</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref3">3</xref>]. However, FR remains difficult because of varying illumination conditions, different poses, disguise, facial expressions and so on [<xref ref-type="bibr" rid="scirp.102423-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref6">6</xref>]. Plenty of FR algorithms have been designed to alleviate these difficulties [<xref ref-type="bibr" rid="scirp.102423-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref8">8</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref9">9</xref>]. FR includes three key steps: image preprocessing, feature extraction and classification. Image preprocessing is an essential step before feature extraction. Feature extraction gives an effective representation of each image, which reduces the computational complexity of the classification algorithm and enhances the separability of the images, leading to a higher recognition rate. Classification then distinguishes the extracted features with a good classifier. 
Therefore, an effective face recognition system depends greatly on an appropriate representation of human face features and a well-designed classifier [<xref ref-type="bibr" rid="scirp.102423-ref10">10</xref>].</p><p>To select features that are discriminative for classification, many feature selection methods have been presented, such as spectral feature selection (SPEC) [<xref ref-type="bibr" rid="scirp.102423-ref11">11</xref>], multi-cluster feature selection (MCFS) [<xref ref-type="bibr" rid="scirp.102423-ref12">12</xref>], minimum redundancy spectral feature selection (MRSF) [<xref ref-type="bibr" rid="scirp.102423-ref13">13</xref>], and joint embedding learning and sparse regression (JELSR) [<xref ref-type="bibr" rid="scirp.102423-ref14">14</xref>]. In addition, the wavelet transform is popular and widely applied in face recognition systems for its multi-resolution character, for example the 2-dimensional discrete wavelet transform [<xref ref-type="bibr" rid="scirp.102423-ref15">15</xref>], the discrete wavelet transform [<xref ref-type="bibr" rid="scirp.102423-ref16">16</xref>], fast beta wavelet networks [<xref ref-type="bibr" rid="scirp.102423-ref17">17</xref>], and wavelet-based feature selection [<xref ref-type="bibr" rid="scirp.102423-ref18">18</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref19">19</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref20">20</xref>].</p><p>After the features are extracted, the next task is to design an effective classifier. Classification aims to obtain the face type for the input signal. 
Typically used classification approaches include polynomial functions, HMM [<xref ref-type="bibr" rid="scirp.102423-ref21">21</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref22">22</xref>], GMM [<xref ref-type="bibr" rid="scirp.102423-ref23">23</xref>], K-NN [<xref ref-type="bibr" rid="scirp.102423-ref23">23</xref>], SVM [<xref ref-type="bibr" rid="scirp.102423-ref24">24</xref>], and the Bayesian classifier [<xref ref-type="bibr" rid="scirp.102423-ref25">25</xref>]. In addition, the random weight network (RWN) has been proposed in some articles [<xref ref-type="bibr" rid="scirp.102423-ref26">26</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref27">27</xref>], and other kinds of neural networks have also been used as classifiers for FR [<xref ref-type="bibr" rid="scirp.102423-ref28">28</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref29">29</xref>].</p><p>In this paper, we first preprocess the images to eliminate the interference of noise and redundant information, reduce the effects of environmental factors on the images and highlight their important information. At the same time, to compensate for the deficiency of geometric features, the original face images need to be well represented rather than input into the classifier directly, because of the huge computational cost. PCA and 2D-PCA are therefore used to extract geometric features from the preprocessed images, reducing their dimensionality for computation and attaining a higher level of separability. 
Finally, we propose a novel deep learning algorithm combining unsupervised and supervised learning, named deep belief network embedded with Softmax regression (DBNESR), to learn hierarchical representations for FR; for comparison with algorithms based only on supervised learning, we also design several other classifiers and run experiments to validate the effectiveness of the algorithm.</p><p>The proposed DBNESR has several important properties, which are summarized as follows: 1) Through special learning, DBNESR can provide effective hierarchical representations [<xref ref-type="bibr" rid="scirp.102423-ref30">30</xref>]. For example, it can capture the intuition that if a certain image feature (or pattern) is useful in some locations of the image, then the same feature can also be useful in other locations; it can also capture higher-order statistics such as corners and contours, and can be tuned to the statistics of the specific object classes being considered (e.g., faces). 2) DBNESR acts like a composition of multiple nonlinear function mappings, which can extract complex statistical dependencies from high-dimensional sensory inputs (e.g., faces) and efficiently learn deep hierarchical representations by re-using and combining intermediate concepts, allowing it to generalize well across a wide variety of computer vision (CV) tasks, including face recognition, image classification, and many others. 3) Further, an end system that uses deep hierarchical feature representations can be more readily adapted to new domains.</p><p>The analysis and experiments are evaluated in terms of the face recognition rate. 
The experiments show the following: first, the proposed DBNESR performs best for face recognition, with the highest and most stable recognition rates; second, the deep learning algorithm combining unsupervised and supervised learning outperforms all the purely supervised learning algorithms; third, hybrid neural networks outperform single-model neural networks; fourth, ordered from largest to smallest, the average recognition rates rank as DBNESR, MCDFC, SVM, HRBFNNs, RBF, HBPNNs, BP, and the variances rank as BP, RBF, HBPNNs, HRBFNNs, SVM, MCDFC, DBNESR; finally, the results reflect the capability of the hierarchical feature representations learned by DBNESR to model hard artificial intelligence tasks.</p><p>The remainder of this paper is organized as follows. Section 2 describes image preprocessing. Section 3 introduces the feature extraction methods. Section 4 designs the supervised learning classifiers. Section 5 designs the proposed classifier combining unsupervised and supervised learning. Experimental results are presented and discussed in Section 6. Section 7 gives the concluding remarks.</p></sec><sec id="s2"><title>2. Images Preprocessing</title><p>During generation, acquisition and input, images often suffer from problems such as low contrast and blur, caused by environmental factors such as the imaging system, noise and lighting conditions. Image preprocessing is therefore needed. The purpose of preprocessing is to eliminate the interference of noise and redundant information, reduce the effects of environmental factors on the images and highlight their important information [<xref ref-type="bibr" rid="scirp.102423-ref31">31</xref>]. 
Image preprocessing usually includes graying, filtering, gray equalization, standardization, compression (or dimensionality reduction) and so on [<xref ref-type="bibr" rid="scirp.102423-ref32">32</xref>]. The preprocessing procedure is as follows.</p><p>1) Face images filtering</p><p>We use median filtering to smooth and denoise the images. This method not only restrains noise effectively but also preserves boundaries well. The median filter is a nonlinear operation: it sorts a pixel and all the other pixels within its neighborhood by gray value and sets the median of the sequence as the gray value of the pixel, as shown in Equation (1).</p><p>$$f'(i,j) = \underset{s}{\operatorname{Med}}\,\{f(i,j)\}$$ (1)</p><p>where $s$ is the filter window. A $3 \times 3$ template is used for median filtering in the experiments later in the paper.</p><p>The purpose of histogram equalization is to enhance the images, improve their visual effect, reduce the redundant information remaining after preprocessing and highlight important information.</p><p>Let the gray range of image $A(x,y)$ be $[0, L]$ and its histogram be $H_A(r)$. The total number of pixels is then:</p><p>$$A_0 = \int_0^L H_A(r)\,dr$$ (2)</p><p>Normalizing the histogram gives the probability density function of each gray value:</p><p>$$p(r) = \frac{H_A(r)}{A_0}$$ (3)</p><p>The probability distribution function is:</p><p>$$P(r) = \int_0^r p(u)\,du = \frac{1}{A_0}\int_0^r H_A(u)\,du$$ (4)</p><p>Let the gray transformation function of histogram equalization be a non-decreasing, continuously differentiable function with finite slope, $s = T(r)$. Applying it to $A(x,y)$ gives the output image $B(x,y)$. With $H_B(s)$ denoting the histogram of the output image, we have</p><p>$$H_B(s)\,ds = H_A(r)\,dr$$ (5)</p><p>$$H_B(s) = \frac{H_A(r)}{ds/dr} = \frac{H_A(r)}{T'(r)}$$ (6)</p><p>where $T'(r) = ds/dr$. 
Therefore, when the numerator and denominator of $H_B(s)$ differ only by a proportionality constant, $H_B(s)$ is constant. Namely</p><p>$$T'(r) = \frac{C}{A_0} H_A(r)$$ (7)</p><p>$$s = T(r) = \frac{C}{A_0}\int_0^r H_A(u)\,du = C\,P(r)$$ (8)</p><p>For the range of $s$ to be $[0, L]$, we take $C = L$. In the discrete case the gray transformation function is:</p><p>$$s_k = T(r_k) = C\,P(r_k) = C\sum_{i=0}^{k} p(r_i) = C\sum_{i=0}^{k}\frac{n_i}{n}$$ (9)</p><p>where $r_k$ is the $k$th gray level, $n_k$ is the number of pixels with gray level $r_k$, $n$ is the total number of pixels in the image, and $k$ ranges over $[0, L-1]$.</p><p>Histogram equalization is applied to the images in the experiments later in the paper.</p><p>It is well known that, because of the huge computational cost, the original face images usually need to be well represented rather than input into the classifier directly. As one of the popular representations, geometric features are often extracted to attain a higher level of separability. Here we employ the multi-scale two-dimensional wavelet transform to generate the initial geometric features for representing face images.</p><p>The multi-scale two-dimensional wavelet transform is applied to the images in the experiments later in the paper.</p></sec><sec id="s3"><title>3. Feature Extraction</title><p>Feature extraction has two main purposes. The first is to extract characteristic information from the face images that can be used to classify all the samples. The second is to reduce the redundant information in the images, so that the dimensionality of the data representing human faces is reduced as far as possible and subsequent operations run faster. It is well known that image features are usually classified into four classes: statistical-pixel features, visual features, algebraic features, and geometric features (e.g. transform-coefficient features).</p><p>Suppose that there are $N$ facial images $\{X_i\}_{i=1}^{N}$, where each $X_i$ is a column vector of dimension $M$. 
All samples can be expressed as follows:</p><p>$$X = (X_1, X_2, \cdots, X_N)^{T}$$ (10)</p><p>The average face of all sample images is calculated as follows:</p><p>$$\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$$ (11)</p><p>The difference faces, namely the difference between each face and the average face, are calculated as follows:</p><p>$$d_i = X_i - \bar{X}, \quad i = 1, 2, \cdots, N$$ (12)</p><p>Therefore, the image covariance matrix $C$ can be represented as follows:</p><p>$$C = \frac{1}{N}\sum_{i=1}^{N} d_i d_i^{T} = \frac{1}{N} A A^{T}, \quad A = (d_1, d_2, \cdots, d_N)$$ (13)</p><p>By the singular value decomposition (SVD) theorem, we compute the eigenvalues $\lambda_i$ and orthonormal eigenvectors $v_i$ of $A^{T}A$; the eigenvectors $u_i$ of the covariance matrix $C$ then follow from Equation (14).</p><p>$$u_i = \frac{1}{\sqrt{\lambda_i}} A v_i, \quad i = 1, 2, \cdots, N$$ (14)</p><p>Sorting the eigenvalues $[\lambda_1, \lambda_2, \cdots, \lambda_N]$ in descending order, the number of retained components $t$ is chosen by the following formula:</p><p>$$t = \min\left\{ k : \frac{\sum_{j=1}^{k}\lambda_j}{\sum_{j=1}^{N}\lambda_j} &gt; \alpha,\ k \le N \right\}$$ (15)</p><p>where $\alpha$ is usually set to 90%. This gives the eigenface subspace $U = (u_1, u_2, \cdots, u_t)$. 
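</p><p>As a minimal sketch of the selection rule in Equation (15), the following Python function returns the smallest number of leading eigenvalues whose share of the total exceeds a threshold. The function name choose_num_components and the sample eigenvalues are hypothetical and serve only as an illustration; they are not taken from the experiments.</p>
<preformat>
```python
def choose_num_components(eigenvalues, alpha=0.90):
    """Return the smallest t whose top-t eigenvalues exceed a fraction alpha of the total."""
    ordered = sorted(eigenvalues, reverse=True)  # descending order, as in the text
    total = sum(ordered)
    if not total > 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for t, value in enumerate(ordered, start=1):
        running += value
        if running / total > alpha:
            return t
    return len(ordered)  # reached only when alpha is 1 or larger


# Hypothetical eigenvalues, for illustration only.
example = [6.0, 2.5, 1.0, 0.5]
t = choose_num_components(example, alpha=0.90)  # cumulative shares 0.60, 0.85, 0.95, so t == 3
```
</preformat>
<p>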
All the samples are projected onto the subspace $U$ as follows:</p><p>$$Z = U^{T} X$$ (16)</p><p>Using the first $t$ principal components instead of the original vector $X$ therefore reduces the dimension of the facial feature parameters without losing too much feature information of the original images.</p><p>Suppose the sample set is $\{S_j^i \in \mathbb{R}^{m \times n},\ i = 1, 2, \cdots, N;\ j = 1, 2, \cdots, M\}$, where $i$ is the category, $j$ indexes the samples of the $i$th category, $N$ is the total number of categories, $M$ is the number of samples in each category, and $K = N \cdot M$ is the number of all samples.</p><p>Let $\bar{S}$ be the average of all samples:</p><p>$$\bar{S} = \frac{1}{K}\sum_{i=1}^{N}\sum_{j=1}^{M} S_j^i$$ (17)</p><p>Therefore, the image covariance matrix $G$ can be represented as follows:</p><p>$$G = \frac{1}{K}\sum_{i=1}^{N}\sum_{j=1}^{M} (S_j^i - \bar{S})^{T}(S_j^i - \bar{S})$$ (18)</p><p>and the generalized total scatter criterion $J(X)$ can be expressed by:</p><p>$$J(X) = X^{T} G X$$ (19)</p><p>Let $X_{opt}$ be the unitary vector that maximizes the generalized total scatter criterion $J(X)$, that is:</p><p>$$X_{opt} = \arg\max_{X} J(X)$$ (20)</p><p>In general, there is more than one optimal solution. We usually select a set of optimal solutions $\{X_1, \cdots, X_t\}$ subject to the orthonormality constraints and the maximizing criterion $J(X)$, where $t$ is smaller than the dimension of the coefficient matrix. In fact, they are the orthonormal eigenvectors of the matrix $G$ corresponding to the $t$ largest eigenvalues.</p><p>Now, for each sub-band coefficient matrix $S_i$, the principal components of $S_i$ are computed as follows:</p><p>$$y_{ij} = S_i X_j, \quad j = 1, 2, \cdots, t$$ (21)</p><p>Then we obtain its reduced feature matrix $Y_i = [y_{i1}, \cdots, y_{it}]$, $i = 1, 2, \cdots, m$.</p><p>In the experiments later in the paper, we extract features with PCA and 2D-PCA respectively and compare their effects.</p></sec><sec id="s4"><title>4. 
Designing the Classifiers of Supervised Learning</title><p>Classifiers based on supervised learning are commonly used for FR. In this paper we design two types of classifiers: supervised learning classifiers, and classifiers combining unsupervised and supervised learning [<xref ref-type="bibr" rid="scirp.102423-ref33">33</xref>].</p><p>1) BP neural network</p><p>The BP neural network is a multilayer feed-forward network trained by the error back-propagation algorithm, and is currently one of the most widely used neural network models [<xref ref-type="bibr" rid="scirp.102423-ref34">34</xref>]. Recognition and classification of face images is an important application of BP neural networks in the field of pattern recognition and classification.</p><p>The network consists of L layers, as shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>. Its training algorithm consists of three steps, illustrated as follows [<xref ref-type="bibr" rid="scirp.102423-ref35">35</xref>].</p><p>2) Hybrid BP neural networks (HBPNNs)</p><p>When the number of face images is not large, the generalization ability and running time of a single-model BP neural network are satisfactory. As the number of classes to identify increases, however, the structure of the BP network becomes more complicated, which leads to longer training time, slower convergence, a tendency to fall into local minima, poorer generalization ability and so on.</p><p>To eliminate these problems, we design hybrid BP neural networks (HBPNNs), composed of multiple single-model BP networks, to replace the complex BP network for FR. 
Hybrid networks have better fault tolerance and generalization than a single-model network, and can implement distributed computing to greatly shorten the training time [<xref ref-type="bibr" rid="scirp.102423-ref36">36</xref>].</p><p>The core idea of designing a hybrid network classifier is to divide a K-class pattern classification problem into K independent 2-class pattern classification problems. That is, a complex classification problem is decomposed into several simple classification problems. In this paper, multiple single-model BP networks are combined into a hybrid network classifier: K BP networks, each with multiple inputs and a single output, are integrated, each BP network is a subnet responsible for identifying only one of the K pattern classes, and the subnets run in parallel with one another. With reference to <xref ref-type="fig" rid="fig1">Figure 1</xref>, the model of HBPNNs is shown in <xref ref-type="fig" rid="fig2">Figure 2</xref>.</p><p>A BP neural network with a single hidden layer and sufficiently many hidden neurons is sufficient for approximating the input-output relationship [<xref ref-type="bibr" rid="scirp.102423-ref37">37</xref>]. We therefore select the standard three-layer BP neural network as the subnet of the hybrid networks. For each subnet, the number of neurons in the input layer equals the dimension of the extracted face features, and the number of neurons in the output layer is 1. The number of neurons in the hidden layer is calculated by the following empirical formula:</p><p>$$h = \sqrt{n + m} + a$$ (22)</p><p>where $m$ is the number of neurons in the output layer, $n$ is the number of neurons in the input layer, and $a$ is a constant between 1 and 10 [<xref ref-type="bibr" rid="scirp.102423-ref38">38</xref>]. 
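</p><p>As a small worked example of Equation (22), the sketch below computes hidden-layer sizes for a one-output subnet and for a single K-output network. It assumes that the empirical formula reads h = sqrt(n + m) + a and that the result is rounded to the nearest integer; the function name hidden_neurons and the values used (50 input features, 40 classes, a = 5) are hypothetical and are not the settings of the experiments.</p>
<preformat>
```python
import math


def hidden_neurons(n_inputs, n_outputs, a):
    """Empirical hidden-layer size sqrt(n + m) + a, rounded to the nearest integer (assumed form)."""
    if a > 10 or 1 > a:
        raise ValueError("a is expected to lie between 1 and 10")
    return round(math.sqrt(n_inputs + n_outputs) + a)


# Hypothetical sizes, for illustration only.
features, classes, a = 50, 40, 5
subnet_hidden = hidden_neurons(features, 1, a)        # subnet with a single output neuron
single_hidden = hidden_neurons(features, classes, a)  # one network with an output per class
```
</preformat>
<p>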
If the dimension of the extracted face features is $X$, the structure of each subnet of the hybrid networks is as follows:</p><p>$$X \rightarrow \left(\sqrt{X + 1} + a\right) \rightarrow 1$$ (23)</p><p>The structure of the single BP neural network is as follows:</p><p>$$X \rightarrow \left(\sqrt{X + K} + a\right) \rightarrow K$$ (24)</p><p>The structure of the subnets is simpler than that of the single-model BP neural network. When the network structure is complex, each additional neuron greatly increases the training time. In addition, as the network grows larger, its increasingly complex structure tends to converge slowly, fall into local minima, generalize poorly and so on. By contrast, hybrid networks built from several subnets can yield more stable and efficient classifiers in a shorter training time.</p><p>The radial basis function (RBF) neural network simulates the structure of the human brain in which receptive fields are locally tuned and overlap one another, and it can approximate any continuous function with arbitrary precision. It learns quickly and does not get trapped in local minima.</p><p>The expression of RBF is as follows [<xref ref-type="bibr" rid="scirp.102423-ref39">39</xref>]:</p><p>$$\phi(x) = \phi(\|x - c\|)$$ (25)</p><p>where $x, c \in \mathbb{R}^{n}$ and $\|x - c\|$ is the Euclidean distance from $x$ to $c$. The radial basis function most commonly used in RBF neural networks is the Gaussian function:</p><p>$$\phi(x) = \exp\left(-\frac{\|x - c\|^{2}}{\sigma^{2}}\right)$$ (26)</p><p>where $\sigma$ is the width of the function. Radial basis functions are often used to construct a function of the following form:</p><p>$$y(x) = \sum_{i=1}^{M} w_i\,\phi(\|x - c_i\|)$$ (27)</p><p>The center $c_i$ and the weight $w_i$ differ for each radial basis function. 
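</p><p>To make Equations (26) and (27) concrete, the following is a minimal sketch of the output of a Gaussian RBF network in plain Python. The function names, the shared width sigma and the two-node example are assumptions made for illustration; the sketch evaluates a network with given centers and weights and does not cover training.</p>
<preformat>
```python
import math


def gaussian_rbf(x, center, sigma):
    """Gaussian basis of Equation (26): exp(-||x - c||^2 / sigma^2)."""
    squared_distance = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-squared_distance / sigma ** 2)


def rbf_output(x, centers, weights, sigma):
    """Weighted sum of Equation (27) over all hidden nodes."""
    return sum(w * gaussian_rbf(x, c, sigma) for c, w in zip(centers, weights))


# Hypothetical two-node network, for illustration only.
centers = [[0.0, 0.0], [1.0, 1.0]]
weights = [1.0, -1.0]
y_at_first_center = rbf_output([0.0, 0.0], centers, weights, sigma=1.0)
y_at_midpoint = rbf_output([0.5, 0.5], centers, weights, sigma=1.0)
```
</preformat>
<p>At the first center the first node responds with 1 and the second with exp(-2), so the output is 1 - exp(-2); at the midpoint both nodes respond equally and the opposite weights cancel.</p>
<p>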
The concrete process of training RBF is as follows.</p><p>For the set of sample data $\{(x_i, d_i)\}_{i=1}^{N}$, we use Equation (27) with $M$ hidden nodes to classify the sample data.</p><p>In applications, the number of hidden nodes is initially chosen to be a small integer. If the training error is unsatisfactory, we can add hidden nodes to reduce it. When the testing error is considered at the same time, there is a proper number of hidden nodes for a given application. The model of RBF is shown in <xref ref-type="fig" rid="fig3">Figure 3</xref>.</p><p>The hybrid RBF neural networks (HRBFNNs) are composed of multiple RBF networks and replace the single RBF network for FR. Hybrid networks have better fault tolerance, a higher convergence rate and stronger generalization than a single-model network, and can implement distributed computing to greatly shorten the training time [<xref ref-type="bibr" rid="scirp.102423-ref40">40</xref>].</p><p>If the dimension of the extracted face features is $n$, the structure of each subnet of the hybrid networks is as follows:</p><p>$$n \rightarrow m \rightarrow 1$$ (29)</p><p>The structure of the single RBF neural network is as follows:</p><p>$$n \rightarrow m \rightarrow k$$ (30)</p><p>The structure of the subnets is simpler than that of the single RBF neural network. In addition, when the network structure is complex, each additional neuron greatly increases the training time and the amount of calculation. 
The model of the HRBFNNs is shown in <xref ref-type="fig" rid="fig4">Figure 4</xref>.</p><p>SVM is a machine learning technique based on statistical learning theory. It aims at finding the optimal hyper-plane between different classes (usually to solve a binary classification problem) of input or training data in a high-dimensional feature space, and new test data can then be classified by the separating hyper-plane [<xref ref-type="bibr" rid="scirp.102423-ref41">41</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref42">42</xref>].</p><p>Suppose there are two classes of examples (positive and negative), where the label of a positive example is +1 and that of a negative example is −1. The numbers of positive and negative examples are $n$ and $m$ respectively. The set $\{x_i\}_{i=1}^{n+m}$ contains the positive and negative examples given for training. The set $\{y_i\}_{i=1}^{n+m}$ contains the labels of the $x_i$, with $\{y_i = +1\}_{i=1}^{n}$ and $\{y_i = -1\}_{i=n+1}^{n+m}$. SVM learns a decision function to predict the label of an example. The optimization formulation of SVM is:</p><p>$$\min\ \frac{\|w\|^{2}}{2} + G\sum_{i=1}^{n+m}\xi_i, \quad \text{s.t.}\quad w x_i + b \ge 1 - \xi_i,\ i = 1, \cdots, n; \quad w x_i + b \le -1 + \xi_i,\ i = n+1, \cdots, n+m$$ (31)</p><p>where the $\xi_i$ are slack variables and $G$ controls the fraction of misclassified training examples. This is a quadratic programming problem. Using the Lagrange multiplier method and the KKT conditions, we obtain the optimal classification function for the above problem:</p><p>$$f(x) = \operatorname{sgn}\{w \cdot x + b^{*}\} = \operatorname{sgn}\left\{\sum_{i=1}^{n} a_i^{*} y_i (x_i \cdot x) + b^{*}\right\}$$ (32)</p><p>where $a_i^{*}$ and $b^{*}$ are the parameters that determine the optimal classification surface, and $(x_i \cdot x)$ is the dot product of two vectors.</p><p>For a nonlinear problem, SVM maps the data into a high-dimensional space by a nonlinear function and solves for the optimal classification surface there. The original problem thereby becomes linearly separable. 
As can be seen from Equation (32), if we know the dot product operation of the feature space, the optimal classification surface can be obtained by simple calculation. According to Mercer's theorem, if for any $\varphi(x) \ne 0$:</p><p>$$\int \varphi^{2}(x)\,dx &lt; \infty \quad \text{and} \quad \iint K(x_i, x_j)\,\varphi(x_i)\,\varphi(x_j)\,dx_i\,dx_j &gt; 0$$ (33)</p><p>then the symmetric function $K(x_i, x_j)$ is the dot product in some transformed space. Equation (32) then corresponds to:</p><p>$$f(x) = \operatorname{sgn}\left\{\sum_{i=1}^{n} a_i^{*} y_i K(x_i, x) + b^{*}\right\}$$ (34)</p><p>This is SVM. There are several categories of kernel function $K(x, x_i)$:</p><p>• The linear kernel function $K(x, x_i) = (x \cdot x_i)$;</p><p>• The polynomial kernel function $K(x, x_i) = (s(x \cdot x_i) + c)^{d}$, where $s$, $c$ and $d$ are parameters;</p><p>• The radial basis kernel function $K(x, x_i) = \exp(-\gamma \|x - x_i\|^{2})$, where $\gamma$ is the parameter;</p><p>• The Sigmoid kernel function $K(x, x_i) = \tanh(s(x \cdot x_i) + c)$, where $s$ and $c$ are parameters.</p><p>The model of SVM [<xref ref-type="bibr" rid="scirp.102423-ref43">43</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref44">44</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref45">45</xref>] is shown in <xref ref-type="fig" rid="fig5">Figure 5</xref>.</p><p>SVM is essentially a two-class classifier. Solving multi-class problems requires constructing a suitable multi-class classifier. There are two main methods for constructing a multi-class SVM classifier. One is the direct method, which modifies the objective function so that a single optimization problem solves for the parameters of all classes. This method has high computational complexity. The other is the indirect method, which combines multiple two-class classifiers into a multi-class classifier. 
The method has two variants:</p><p>• One-Against-One: a hyper-plane is built between every pair of classes, so a problem with $k$ classes requires $k \times (k - 1)/2$ classification planes.</p><p>• One-Against-the-Rest: a classification plane is built between one category and all the other categories, so a problem with $k$ classes requires only $k$ classification planes.</p><p>We use both “One-Against-One” and “One-Against-the-Rest” in the experiments and choose the one with the better effect to construct the multi-class SVM classifier.</p><p>Different classifiers have different performance. Fusing multiple classifiers so as to integrate their respective characteristics can further improve classification performance and robustness.</p><p>Feature fusion and decision fusion are the two main methods of classifier fusion. Feature fusion requires a large amount of computation and is not easy to achieve, so we adopt decision fusion. The model of MCDFC is shown in <xref ref-type="fig" rid="fig6">Figure 6</xref>.</p><p>We use weighted voting for the decision fusion of the classifiers:</p><p>$$w_i = \begin{cases} \log\left(\dfrac{1 - \varepsilon_i}{\varepsilon_i}\right), &amp; \varepsilon_i \le 0.5 \\ 0, &amp; \varepsilon_i &gt; 0.5 \end{cases}$$ (35)</p><p>where $w_i$ is the weight of each classifier's vote on the classification result and $\varepsilon_i$ is a variable. The final classification result is obtained from the individual classifiers according to the following weighted voting formula:</p><p>$$f_t(x) = \arg\max_{y \in Y} \sum_{i=1}^{n} w_i\,[f_i(x) = y]$$ (36)</p><p>where $f_t(x)$ is the final classification result, corresponding to the category $y$ with the maximum weighted vote, $f_i(x)$ is the classification result of the $i$th classifier, $x$ is the input, $y \in Y$ and $Y$ is the category set. $[f_i(x) = y]$ indicates that the $i$th classifier assigns the input to category $y$, in which case its vote is counted with the voting weight $w_i$.</p></sec><sec id="s5"><title>5. 
Designing the Classifier Combining Unsupervised and Supervised Learning</title><p>Supervised learning systems are domain-specific, and annotating a large-scale corpus for each domain is very expensive [<xref ref-type="bibr" rid="scirp.102423-ref46">46</xref>]. Recently, semi-supervised learning, which uses a large amount of unlabeled data together with labeled data to build better learners, has attracted increasing attention in pattern recognition and classification [<xref ref-type="bibr" rid="scirp.102423-ref47">47</xref>]. In this paper we design a novel semi-supervised classifier that combines unsupervised and supervised learning: the deep belief network embedded with Softmax regress (DBNESR) for FR. DBNESR first learns hierarchical feature representations by greedy layer-wise unsupervised learning in a feed-forward (bottom-up) and back-forward (top-down) manner [<xref ref-type="bibr" rid="scirp.102423-ref48">48</xref>], and then performs more efficient classification with Softmax regression by supervised learning. The deep belief network (DBN) is a representative deep learning algorithm with a deep architecture composed of multiple levels of non-linear operations [<xref ref-type="bibr" rid="scirp.102423-ref49">49</xref>]; it is expected to perform well in semi-supervised learning because of its capability of modeling hard artificial intelligence tasks [<xref ref-type="bibr" rid="scirp.102423-ref50">50</xref>]. Softmax regression generalizes logistic regression to multi-class classification problems.</p><p>1) Problem formulation</p><p>The dataset is represented as a matrix:</p><disp-formula><label>(37)</label><tex-math><![CDATA[X = \left[ X^{1}, X^{2}, \cdots, X^{N+M} \right] = \begin{bmatrix} x_{1}^{1} & x_{1}^{2} & \cdots & x_{1}^{N+M} \\ x_{2}^{1} & x_{2}^{2} & \cdots & x_{2}^{N+M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{D}^{1} & x_{D}^{2} & \cdots & x_{D}^{N+M} \end{bmatrix}]]></tex-math></disp-formula><p>where N is the number of training samples, M is the number of test samples, and D is the number of feature values in the dataset. Each column of X corresponds to one sample. 
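A sample with all of its features is viewed as a vector in <inline-formula><tex-math>\mathbb{R}^{D}</tex-math></inline-formula>, where the jth coordinate corresponds to the jth feature.</p><p>Let <inline-formula><tex-math>Y^{L}</tex-math></inline-formula> be the set of labels corresponding to the L labeled training samples, denoted as:</p><disp-formula><label>(38)</label><tex-math><![CDATA[Y^{L} = \left[ Y^{1}, Y^{2}, \cdots, Y^{L} \right] = \begin{bmatrix} y_{1}^{1} & y_{1}^{2} & \cdots & y_{1}^{L} \\ y_{2}^{1} & y_{2}^{2} & \cdots & y_{2}^{L} \\ \vdots & \vdots & \ddots & \vdots \\ y_{C}^{1} & y_{C}^{2} & \cdots & y_{C}^{L} \end{bmatrix}]]></tex-math></disp-formula><p>where C is the number of classes. Each column of <inline-formula><tex-math>Y^{L}</tex-math></inline-formula> is a vector in <inline-formula><tex-math>\mathbb{R}^{C}</tex-math></inline-formula>, where the jth coordinate corresponds to the jth class:</p><disp-formula><label>(39)</label><tex-math><![CDATA[y_j = \begin{cases} 1 & \text{if } X \in j\text{th class} \\ 0 & \text{if } X \notin j\text{th class} \end{cases}]]></tex-math></disp-formula><p>We intend to seek the mapping function <inline-formula><tex-math>X \to Y^{L}</tex-math></inline-formula> using all the samples in order to determine Y when a new X arrives.</p><p>2) Softmax regression</p><p>Softmax regression generalizes logistic regression to multi-class classification problems [<xref ref-type="bibr" rid="scirp.102423-ref51">51</xref>]. Logistic regression addresses binary classification problems, with class tag <inline-formula><tex-math>Y^{(i)} \in \{0, 1\}</tex-math></inline-formula>. The hypothesis function is as follows:</p><disp-formula><label>(40)</label><tex-math><![CDATA[h_{\phi}(X) = \frac{1}{1 + \exp(-\phi^{T} X)}]]></tex-math></disp-formula><p>The model parameter vector <inline-formula><tex-math>\phi \in \mathbb{R}^{D+1}</tex-math></inline-formula> is trained to minimize the cost function:</p><disp-formula><label>(41)</label><tex-math><![CDATA[J(\phi) = -\frac{1}{L} \left[ \sum_{i=1}^{L} Y^{(i)} \log h_{\phi}(X^{(i)}) + (1 - Y^{(i)}) \log\left(1 - h_{\phi}(X^{(i)})\right) \right]]]></tex-math></disp-formula><p>Softmax regression addresses multi-class classification problems, with class tag <inline-formula><tex-math>Y^{(i)} \in \{1, 2, \cdots, k\}</tex-math></inline-formula>. For each given sample X, the hypothesis function is used to estimate the probability value <inline-formula><inline-graphic xlink:href="/html.scirp.org/file/7-2870351x121.png" xlink:type="simple"/></inline-formula> for each category j. 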
The hypothesis function is as follows:</p><disp-formula id="scirp.102423-formula56"><label>(42)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x122.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="/html.scirp.org/file/7-2870351x123.png" xlink:type="simple"/></inline-formula> denotes the model parameter vector. The cost function is as follows:</p><disp-formula id="scirp.102423-formula57"><label>(43)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x124.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="/html.scirp.org/file/7-2870351x125.png" xlink:type="simple"/></inline-formula> denotes:</p><disp-formula id="scirp.102423-formula58"><label>(44)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x126.png"  xlink:type="simple"/></disp-formula><p>There is currently no closed-form solution for minimizing the cost function in Equation (43). Therefore, an iterative optimization algorithm (for example, gradient descent or L-BFGS) is used. 
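As an illustrative sketch only, the softmax hypothesis, cost, and gradient-descent step can be written with NumPy. Equations (42) to (44) appear as images in the source, so the standard softmax form is assumed here; the function names, the (k, D) parameter layout, and the 0-based class indices are assumptions made for this example and are not taken from the paper.
<preformat>
```python
import numpy as np

def softmax_probs(theta, X):
    """Class probabilities for each sample.

    theta: (k, D) parameter matrix, one row per class (assumed layout).
    X:     (L, D) samples, one row per sample.
    Returns an (L, k) matrix whose rows sum to 1.
    """
    scores = X @ theta.T
    # Subtract the row maximum for numerical stability.
    scores = scores - scores.max(axis=1, keepdims=True)
    expd = np.exp(scores)
    return expd / expd.sum(axis=1, keepdims=True)

def softmax_cost(theta, X, y):
    """Average negative log-likelihood; y holds class indices 0..k-1."""
    probs = softmax_probs(theta, X)
    n = X.shape[0]
    return -np.mean(np.log(probs[np.arange(n), y]))

def softmax_grad(theta, X, y):
    """Gradient of softmax_cost with respect to theta, shape (k, D)."""
    probs = softmax_probs(theta, X)
    n = X.shape[0]
    onehot = np.zeros_like(probs)
    onehot[np.arange(n), y] = 1.0
    return -(onehot - probs).T @ X / n

def gradient_descent(X, y, k, lr=0.5, steps=200):
    """Plain batch gradient descent starting from an all-zero theta."""
    theta = np.zeros((k, X.shape[1]))
    for _ in range(steps):
        theta = theta - lr * softmax_grad(theta, X, y)
    return theta
```
</preformat>
Because the cost is convex in the parameters, a sufficiently small learning rate lowers it at every step; L-BFGS could be substituted for the loop without changing the cost or gradient functions. 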
After derivation, the gradient formula is as follows:</p><disp-formula id="scirp.102423-formula59"><label>(45)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x127.png"  xlink:type="simple"/></disp-formula><p>The following update is then applied:</p><disp-formula id="scirp.102423-formula60"><label>(46)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x128.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="/html.scirp.org/file/7-2870351x129.png" xlink:type="simple"/></inline-formula> denotes the learning rate.</p><p>3) Deep belief network embedded with Softmax regress (DBNESR)</p><p>DBN uses the Restricted Boltzmann Machine (RBM) [<xref ref-type="bibr" rid="scirp.102423-ref52">52</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref53">53</xref>], a Markov random field trained by unsupervised learning, as the building block of its multi-layer learning system, and uses a supervised learning algorithm, back propagation (BP), for fine-tuning after pre-training. Its architecture is shown in <xref ref-type="fig" rid="fig7">Figure 7</xref>. The deep architecture is a fully interconnected directed belief net with one input layer <inline-formula><inline-graphic xlink:href="/html.scirp.org/file/7-2870351x130.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="/html.scirp.org/file/7-2870351x131.png" xlink:type="simple"/></inline-formula> hidden layers <inline-formula><inline-graphic xlink:href="/html.scirp.org/file/7-2870351x132.png" xlink:type="simple"/></inline-formula>, and one label layer at the top. The input layer <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x133.png" xlink:type="simple"/></inline-formula> has D units, equal to the number of features of the samples. 
The label layer has C units, equal to the number of classes of the label vector Y. The numbers of units in the hidden layers are currently predefined according to experience or intuition. Seeking the mapping function is thus transformed into the problem of finding the parameter space <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x134.png" xlink:type="simple"/></inline-formula> for the deep architecture [<xref ref-type="bibr" rid="scirp.102423-ref54">54</xref>].</p><p>The semi-supervised learning method based on the DBN architecture can be divided into two stages. First, the DBN architecture is constructed by greedy layer-wise unsupervised learning using RBMs as building blocks; all samples are used to find the parameter space W with N layers. Second, the DBN architecture is trained according to the log-likelihood using gradient descent. Because it is difficult to optimize a deep architecture using supervised learning directly, the unsupervised learning stage extracts the hierarchical feature representations effectively and helps prevent over-fitting in the supervised training. 
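A minimal sketch of the first (unsupervised) stage is given here under stated assumptions: binary units, one-step contrastive divergence (CD-1) as the RBM training rule, full-batch updates, and hypothetical helper names train_rbm and pretrain_stack. None of these choices is confirmed by the paper; the sketch only illustrates how each RBM is trained on the output of the layer below it.
<preformat>
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, epochs=20, lr=0.1, seed=0):
    """Train one RBM with CD-1 (an assumed training rule).

    data: (n_samples, n_visible) array with values in [0, 1].
    Returns weights W, lower-layer bias b, upper-layer bias c.
    """
    rng = np.random.default_rng(seed)
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b = np.zeros(n_visible)
    c = np.zeros(n_hidden)
    n = data.shape[0]
    for _ in range(epochs):
        # Positive phase: hidden probabilities driven by the data.
        h_prob = sigmoid(data @ W + c)
        h_state = rng.binomial(1, h_prob).astype(float)
        # Negative phase: reconstruct the lower layer, then re-infer hidden.
        v_recon = sigmoid(h_state @ W.T + b)
        h_recon = sigmoid(v_recon @ W + c)
        # Data statistics minus reconstruction statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n
        b += lr * (data - v_recon).mean(axis=0)
        c += lr * (h_prob - h_recon).mean(axis=0)
    return W, b, c

def pretrain_stack(data, layer_sizes, **kwargs):
    """Greedy layer-wise pretraining: each RBM is trained on the hidden
    activations produced by the RBM below it."""
    params = []
    layer_input = data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(layer_input, n_hidden, **kwargs)
        params.append((W, b, c))
        layer_input = sigmoid(layer_input @ W + c)  # feed upward
    return params, layer_input
```
</preformat>
The returned top-layer activations would serve as the input features of the supervised stage, and the stacked weights as its initialization. 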
The BP algorithm is then used to pass the error top-down for fine-tuning after pre-training.</p><p>For unsupervised learning, we define the energy of the joint configuration <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x136.png" xlink:type="simple"/></inline-formula> as [<xref ref-type="bibr" rid="scirp.102423-ref50">50</xref>]:</p><disp-formula id="scirp.102423-formula61"><label>(47)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x137.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x138.png" xlink:type="simple"/></inline-formula> are the model parameters: <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x139.png" xlink:type="simple"/></inline-formula> is the symmetric interaction term between unit i in the layer <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x140.png" xlink:type="simple"/></inline-formula> and unit j in the layer <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x141.png" xlink:type="simple"/></inline-formula>, <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x142.png" xlink:type="simple"/></inline-formula>. <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x143.png" xlink:type="simple"/></inline-formula> is the ith bias of layer <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x144.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x145.png" xlink:type="simple"/></inline-formula> is the jth bias of layer <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x146.png" xlink:type="simple"/></inline-formula>. <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x147.png" xlink:type="simple"/></inline-formula> is the number of units in the kth layer. The network assigns a probability to every possible data vector via this energy function. 
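The link between energy and probability can be illustrated with a small sketch. Equation (47) appears as an image in the source, so the standard energy of two adjacent binary layers is assumed here; the function names are hypothetical, and the normalizing constant is computed by brute-force enumeration, which is only feasible for very small layers.
<preformat>
```python
import itertools
import numpy as np

def energy(v, h, W, b, c):
    """Energy of a joint configuration of two adjacent binary layers.

    Assumed standard form: E = -v.W.h - b.v - c.h, where W[i, j] couples
    lower-layer unit i with upper-layer unit j, and b, c are the biases.
    """
    return -(v @ W @ h) - b @ v - c @ h

def all_binary_states(n):
    """All 2**n binary vectors of length n."""
    return [np.array(s, dtype=float)
            for s in itertools.product([0, 1], repeat=n)]

def joint_probability(v, h, W, b, c):
    """exp(-E) divided by the normalizing constant.

    The constant sums exp(-E) over all 2**(n_v + n_h) configurations.
    """
    n_v, n_h = W.shape
    z = sum(np.exp(-energy(vv, hh, W, b, c))
            for vv in all_binary_states(n_v)
            for hh in all_binary_states(n_h))
    return np.exp(-energy(v, h, W, b, c)) / z
```
</preformat>
Under this form, a configuration with lower energy always receives a higher probability, which is why training adjusts the weights and biases to lower the energy of observed data. 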
The probability of a training sample can be raised by adjusting the weights and biases to lower the energy of that sample and to raise the energy of similar, confabulated samples that <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x148.png" xlink:type="simple"/></inline-formula> would prefer to the real data. 
When we input the value of <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x149.png" xlink:type="simple"/></inline-formula>, the network can learn the content of <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x150.png" xlink:type="simple"/></inline-formula> by minimizing this energy function.</p><p>The probability that the model assigns to a <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x151.png" xlink:type="simple"/></inline-formula> is:</p><disp-formula id="scirp.102423-formula62"><label>(48)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x152.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.102423-formula63"><label>(49)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x153.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x154.png" xlink:type="simple"/></inline-formula> denotes the normalizing constant. 
The conditional distributions over <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x155.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x156.png" xlink:type="simple"/></inline-formula> are given as:</p><disp-formula id="scirp.102423-formula64"><label>(50)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x157.png"  xlink:type="simple"/></disp-formula><disp-formula id="scirp.102423-formula65"><label>(51)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x158.png"  xlink:type="simple"/></disp-formula><p>The probability of turning on unit j is a logistic function of the states of <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x159.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x160.png" xlink:type="simple"/></inline-formula>:</p><disp-formula id="scirp.102423-formula66"><label>(52)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x161.png"  xlink:type="simple"/></disp-formula><p>The probability of turning on unit i is a logistic function of the states of <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x162.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x163.png" xlink:type="simple"/></inline-formula>:</p><disp-formula id="scirp.102423-formula67"><label>(53)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x164.png"  xlink:type="simple"/></disp-formula><p>where the logistic function chosen is the sigmoid function:</p><disp-formula id="scirp.102423-formula68"><label>(54)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x165.png"  xlink:type="simple"/></disp-formula><p>The derivative of the log-likelihood with respect to the model parameter <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x166.png" xlink:type="simple"/></inline-formula> can be obtained from Equation (48):</p><disp-formula id="scirp.102423-formula69"><label>(55)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x167.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x168.png" xlink:type="simple"/></inline-formula> denotes an expectation with respect to the data distribution and <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x169.png" xlink:type="simple"/></inline-formula> denotes an expectation with respect to the distribution defined by the model [<xref ref-type="bibr" rid="scirp.102423-ref55">55</xref>]. 
The expectation <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x170.png" xlink:type="simple"/></inline-formula> cannot be computed analytically. In practice, <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x171.png" xlink:type="simple"/></inline-formula> is replaced by <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x172.png" xlink:type="simple"/></inline-formula>, which denotes a distribution of samples when the feature detectors are being driven by reconstructed <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x173.png" xlink:type="simple"/></inline-formula>. This is an approximation to the gradient of a different objective function, called the contrastive divergence (CD) [<xref ref-type="bibr" rid="scirp.102423-ref56">56</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref57">57</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref58">58</xref>] [<xref ref-type="bibr" rid="scirp.102423-ref59">59</xref>]. The Kullback-Leibler distance, used to measure the “diversity” between two probability distributions and represented by <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x174.png" xlink:type="simple"/></inline-formula>, is shown in Equation (56):</p><disp-formula id="scirp.102423-formula70"><label>(56)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x175.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x176.png" xlink:type="simple"/></inline-formula> denotes the joint probability distribution of the initial state of the RBM network, <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x177.png" xlink:type="simple"/></inline-formula> denotes the joint probability distribution of the RBM network after n transformations of Markov chain Monte Carlo (MCMC), and <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x178.png" xlink:type="simple"/></inline-formula> denotes the joint probability distribution of the RBM network at the end of the MCMC chain. Therefore, <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x179.png" xlink:type="simple"/></inline-formula> can be regarded as a measure of the location of <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x180.png" xlink:type="simple"/></inline-formula> between <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x181.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x182.png" xlink:type="simple"/></inline-formula>. 
It constantly assigns <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x176.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x177.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x178.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x179.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x180.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x181.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x182.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x183.png" xlink:type="simple"/></inline-formula> to <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x176.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x177.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x178.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x179.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x180.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x181.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x182.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x183.png" 
xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x184.png" xlink:type="simple"/></inline-formula> and gets new <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x176.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x177.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x178.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x179.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x180.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x181.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x182.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x183.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x184.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x185.png" xlink:type="simple"/></inline-formula> and<inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x176.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x177.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x178.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x179.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x180.png" 
xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x181.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x182.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x183.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x184.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x185.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x186.png" xlink:type="simple"/></inline-formula>. The experiments show that <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x176.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x177.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x178.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x179.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x180.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x181.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x182.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x183.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x184.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x185.png" 
xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x186.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x187.png" xlink:type="simple"/></inline-formula> will tend to zero and the accuracy is approximate of MCMC after making slope for r times for correction parameter<inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x176.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x177.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x178.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x179.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x180.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x181.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x182.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x183.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x184.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x185.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x186.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x187.png" xlink:type="simple"/></inline-formula><inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x188.png" xlink:type="simple"/></inline-formula>. 
The training process of the RBM is shown in <xref ref-type="fig" rid="fig8">Figure 8</xref>.</p><p>Training the RBM with contrastive divergence yields Equation (57):</p><disp-formula id="scirp.102423-formula71"><label>(57)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x189.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x190.png" xlink:type="simple"/></inline-formula> is the learning rate. The parameters can then be adjusted through:</p><disp-formula id="scirp.102423-formula72"><label>(58)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x191.png"  xlink:type="simple"/></disp-formula><p>where <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x192.png" xlink:type="simple"/></inline-formula> is the momentum.</p><p>The above discussion concerns training the parameters between hidden layers with a single sample x. For unsupervised learning, we construct the deep architecture using all samples, feeding them one by one into layer <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x193.png" xlink:type="simple"/></inline-formula> and training the parameters between <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x194.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x195.png" xlink:type="simple"/></inline-formula>. Then <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x196.png" xlink:type="simple"/></inline-formula> is constructed: the value of <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x197.png" xlink:type="simple"/></inline-formula> is calculated from <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x198.png" xlink:type="simple"/></inline-formula> and the trained parameters between <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x199.png" xlink:type="simple"/></inline-formula> and <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x200.png" xlink:type="simple"/></inline-formula>. The same procedure is used to construct the next layer <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x201.png" xlink:type="simple"/></inline-formula>, and so on. The deep architecture is thus constructed layer by layer from bottom to top. 
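</p><p>The layer-by-layer procedure above can be sketched in code. The Python sketch below is illustrative only: it assumes binary units, a single Gibbs step per update (CD-1), and the common momentum form of the weight update, which may differ in detail from Equations (57) and (58). The function names, learning rate, momentum value and layer sizes are hypothetical and are not taken from the experiments in this paper.</p><preformat>
```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    # P(h_j = 1 | v) for each hidden unit j
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    # P(v_i = 1 | h) for each visible unit i
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    # Draw binary states from independent Bernoulli probabilities
    return [1.0 if p > rng.random() else 0.0 for p in probs]


def cd1_step(v0, W, b, c, dW, lr, momentum, rng):
    # One CD-1 update for a single sample v0; W, b, c and dW change in place
    h0 = hidden_probs(v0, W, c)
    v1 = visible_probs(sample(h0, rng), W, b)  # one Gibbs step (reconstruction)
    h1 = hidden_probs(v1, W, c)
    for i in range(len(v0)):
        for j in range(len(h0)):
            grad = v0[i] * h0[j] - v1[i] * h1[j]  # data term minus reconstruction term
            dW[i][j] = momentum * dW[i][j] + lr * grad  # assumed momentum form
            W[i][j] += dW[i][j]
    for i in range(len(b)):
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (h0[j] - h1[j])


def pretrain(samples, layer_sizes, epochs=5, lr=0.1, momentum=0.5, seed=0):
    # Greedy layer-wise pretraining, bottom to top
    rng = random.Random(seed)
    data, weights = samples, []
    for n_hidden in layer_sizes:
        n_visible = len(data[0])
        W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
             for _ in range(n_visible)]
        b, c = [0.0] * n_visible, [0.0] * n_hidden
        dW = [[0.0] * n_hidden for _ in range(n_visible)]
        for _ in range(epochs):
            for v in data:  # samples are fed one by one
                cd1_step(v, W, b, c, dW, lr, momentum, rng)
        weights.append(W)
        # Activations of the trained layer become the input of the next one
        data = [hidden_probs(v, W, c) for v in data]
    return weights, data
```
</preformat><p>Calling <monospace>pretrain(samples, [3, 2])</monospace> on a list of binary vectors returns one weight matrix per layer together with the top-layer activations, which would then serve as the starting point for supervised fine-tuning.</p><p>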
At each step, the parameter space <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x202.png" xlink:type="simple"/></inline-formula> is trained on the data calculated in the <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x203.png" xlink:type="simple"/></inline-formula> layer. According to the <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x204.png" xlink:type="simple"/></inline-formula> calculated above, the layer <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x205.png" xlink:type="simple"/></inline-formula> is obtained as follows for a sample x fed from layer <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x206.png" xlink:type="simple"/></inline-formula>:</p><disp-formula id="scirp.102423-formula73"><label>(59)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x207.png"  xlink:type="simple"/></disp-formula><p>For supervised learning, the DBM architecture is trained using C labeled data. 
The optimization problem is formulated as:</p><disp-formula id="scirp.102423-formula74"><label>(60)</label><graphic position="anchor" xlink:href="//html.scirp.org/file/7-2870351x209.png"  xlink:type="simple"/></disp-formula><p>that is, minimizing the cross-entropy, where <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x210.png" xlink:type="simple"/></inline-formula> denotes the true label probability and <inline-formula><inline-graphic xlink:href="//html.scirp.org/file/7-2870351x211.png" xlink:type="simple"/></inline-formula> denotes the model label probability.</p><p>The greedy layer-wise unsupervised learning is used only to initialize the parameters of the deep architecture; these parameters are updated according to Equation (58). After initialization, real values are used in all nodes of the deep architecture. We then apply gradient descent through the whole deep architecture to retrain the weights for optimal classification.</p></sec><sec id="s6"><title>6. Experiments</title><p>1) Face Recognition Databases</p><p>We selected several typical image databases, for example the ORL Face Database, which contains 10 different images of each of 40 distinct individuals. Each person is imaged with different facial expressions and facial details, under varying lighting conditions and at different times. All pictures are captured against a dark background with the individual in an upright, frontal position. The facial poses are not identical: expression, position, angle and scale differ somewhat. Rotation in depth and in the image plane can be up to 20°, and the scale of the faces varies by as much as 10%. For each face database, we randomly choose a part of the images as training data and use the remainder as testing data. 
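</p><p>As an illustration, a random per-individual split of this kind might be implemented as follows. This Python sketch is an assumption made for illustration, not the code used in the experiments; the function name <monospace>split_per_individual</monospace>, the <monospace>train_fraction</monospace> parameter and the fixed seed are hypothetical.</p><preformat>
```python
import random


def split_per_individual(image_ids_by_person, train_fraction=0.5, seed=0):
    # Randomly split the images of every individual into training and testing parts
    rng = random.Random(seed)
    train, test = [], []
    for person, image_ids in sorted(image_ids_by_person.items()):
        shuffled = list(image_ids)
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train_fraction)
        train += [(person, i) for i in shuffled[:n_train]]
        test += [(person, i) for i in shuffled[n_train:]]
    return train, test


# ORL layout: 40 individuals with 10 images each
orl = {person: list(range(10)) for person in range(40)}
train, test = split_per_individual(orl)
```
</preformat><p>Splitting within each individual, rather than over the pooled images, guarantees that every person appears in both the training and the testing set.</p><p>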
In this paper, to reflect the generality and efficiency of all classification algorithms, we randomly choose about 50% of the images of each individual as training data and the rest as testing data. All images first undergo preprocessing and feature extraction.</p><p>All experiments are carried out in MATLAB R2010b on a desktop with an Intel<sup>&#174;</sup> Core<sup>TM</sup>2 Duo CPU T6670 @ 2.20 GHz and 4.00 GB of RAM.</p><p>2) Relevant experiments</p><p>Experiment 1. In this experiment, we use median filtering to smooth and denoise the images during preprocessing; sample results are shown in <xref ref-type="fig" rid="fig9">Figure 9</xref>.</p><p>Comparing the face images shows that filtering eliminates most of the noise interference.</p><p>Experiment 2. In this experiment, we apply histogram equalization to the images during preprocessing and obtain the following sample figures:</p><p>From <xref ref-type="fig" rid="fig10">Figure 10</xref> and <xref ref-type="fig" rid="fig11">Figure 11</xref> we can see that, after histogram equalization, the image histogram is more uniformly distributed, the gray-level range is somewhat wider, and the contrast is stronger. In addition, histogram equalization largely eliminates the influence of illumination, expands the representation range of pixel gray levels, improves image contrast, makes the facial features more evident, and is conducive to subsequent feature extraction and FR.</p><p>Experiment 3. In this experiment, we employ a multi-scale two-dimensional wavelet transform to generate the initial geometric features for representing face images. 
The experiment yields the following sample figures:</p><p>From <xref ref-type="fig" rid="fig12">Figure 12</xref> we can see that, although image compression (dimensionality reduction) somewhat reduces the information capacity of the LL sub-image, it still has very high resolution, and the energy in the wavelet domain does not decrease much. The LL sub-image is therefore well suited to the subsequent feature extraction.</p><p>Experiment 4. In this experiment, we extract features with PCA and with 2D-PCA and compare their effects as follows:</p><p>From <xref ref-type="fig" rid="fig13">Figure 13</xref> we can see that the contribution rates of the first several principal components extracted with 2D-PCA are higher than those extracted with PCA. From <xref ref-type="fig" rid="fig14">Figure 14</xref> we can see that, when 20 principal components are extracted, the principal component contribution rate of 2D-PCA is greater than 90%, while that of PCA is less than 80%. Accordingly, 2D-PCA can describe the image better than PCA using fewer principal components.</p><p><xref ref-type="fig" rid="fig15">Figure 15</xref> compares images reconstructed from the features extracted with PCA and with 2D-PCA. For the same number of principal components, the images reconstructed by 2D-PCA are clearer than those reconstructed by PCA. The face reconstructed from 50 principal components extracted by 2D-PCA is almost as clear as the original image. 2D-PCA therefore performs better than PCA.</p><p>Experiment 5. In this experiment, we compare the recognition rates of the methods based on PCA + BP, WT + PCA + BP, PCA + HBPNNs and WT + PCA + HBPNNs. The experiment is repeated many times and the average recognition rate is taken. 
The experimental results are shown in <xref ref-type="table" rid="table1">Table 1</xref>.</p><p>As shown in <xref ref-type="table" rid="table1">Table 1</xref>, the recognition rates of HBPNNs are far higher than those of BP (91.7% versus 66.2% with PCA, and 93.3% versus 67.29% with WT + PCA). For the same classifier (BP or HBPNNs), the methods based on WT + PCA achieve higher recognition rates than those based on PCA alone.</p><p>Experiment 6. This experiment compares the recognition rates of the methods based on WT + 2D-PCA + RBF and WT + 2D-PCA + HRBFNNs. The experiment is repeated many times and the average recognition rate is taken. The experimental results are shown in <xref ref-type="table" rid="table2">Table 2</xref>.</p><p>As shown in <xref ref-type="table" rid="table2">Table 2</xref>, the recognition rate of HRBFNNs is clearly higher than that of RBF (95.5% versus 90.5%). Therefore, HRBFNNs are the more suitable choice for FR.</p><table-wrap id="table1"><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title>Average recognition rates of different recognition methods</title></caption><table><thead><tr><th align="center" valign="middle">Serial number</th><th align="center" valign="middle">Recognition method</th><th align="center" valign="middle">Recognition rate (%)</th></tr></thead><tbody><tr><td align="center" valign="middle">1</td><td align="center" valign="middle">PCA + BP</td><td align="center" valign="middle">66.2</td></tr><tr><td align="center" valign="middle">2</td><td align="center" valign="middle">WT + PCA + BP</td><td align="center" valign="middle">67.29</td></tr><tr><td align="center" valign="middle">3</td><td align="center" valign="middle">PCA + HBPNNs</td><td align="center" valign="middle">91.7</td></tr><tr><td align="center" valign="middle">4</td><td align="center" valign="middle">WT + PCA + HBPNNs</td><td align="center" valign="middle">93.3</td></tr></tbody></table></table-wrap><table-wrap id="table2"><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title>Average recognition rates of different recognition methods</title></caption><table><thead><tr><th align="center" valign="middle">Serial number</th><th align="center" valign="middle">Recognition method</th><th align="center" valign="middle">Recognition rate (%)</th></tr></thead><tbody><tr><td align="center" valign="middle">1</td><td align="center" valign="middle">WT + 2D-PCA + RBF</td><td align="center" valign="middle">90.5</td></tr><tr><td align="center" valign="middle">2</td><td align="center" valign="middle">WT + 2D-PCA + HRBFNNs</td><td align="center" valign="middle">95.5</td></tr></tbody></table></table-wrap><p>Experiment 7. Because SVM is essentially a two-class classifier, solving multi-class problems requires constructing a more suitable classifier. We use two methods, “One-Against-One” and “One-Against-the-Rest”, in the experiment and choose the better one to construct the multi-class SVM classifier. The experiment is repeated 20 times and the average recognition rate is taken. The experimental results are shown in <xref ref-type="table" rid="table3">Table 3</xref>.</p><p>As shown in <xref ref-type="table" rid="table3">Table 3</xref>, the “One-Against-One” SVM has a higher recognition rate than the “One-Against-the-Rest” SVM (95.05% versus 90.45%) and also makes fewer misclassifications. Therefore, we use the “One-Against-One” approach to construct the SVM classifier for FR.</p><p>Experiment 8. In this paper we construct the multiple classification decision fusion classifier (MCDFC), a hybrid HBPNNs-HRBFNNs-SVM classifier. In this experiment, to show the effectiveness of MCDFC, we first run recognition experiments with HBPNNs, HRBFNNs and SVM separately, then use the decision function to fuse the classification results of the three classifiers and obtain the classification results of MCDFC. 
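</p><p>The fusion step can be illustrated with a short Python sketch. It assumes plain majority voting over the labels predicted by the three classifiers, with a fall-back to one designated classifier when there is no clear winner; this is a stand-in for the decision function and may differ from it. The function name <monospace>fuse_decisions</monospace> and the <monospace>priority</monospace> parameter are hypothetical.</p><preformat>
```python
from collections import Counter


def fuse_decisions(predictions, priority=0):
    # predictions: one predicted label per classifier, e.g. [HBPNNs, HRBFNNs, SVM]
    ranked = Counter(predictions).most_common()
    # A label wins if it is the only one or has strictly more votes than the runner-up
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return ranked[0][0]
    # No clear winner: trust the classifier at index `priority`
    return predictions[priority]


fused = fuse_decisions(["s03", "s03", "s17"])
```
</preformat><p>With three classifiers, the fused label equals the label on which at least two classifiers agree; when all three disagree, the label of the designated classifier is returned.</p><p>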
The experiment is repeated 20 times, and the experimental results are shown in <xref ref-type="table" rid="table4">Table 4</xref> and in <xref ref-type="fig" rid="fig16">Figure 16</xref>.</p><p>As shown in <xref ref-type="fig" rid="fig16">Figure 16</xref>, the recognition rate of MCDFC is never lower than the average of the other three classifiers, and in almost all runs MCDFC performs best.</p><p>To reduce the effect of any single run and of random variation, <xref ref-type="table" rid="table5">Table 5</xref> lists the average recognition rate of each classifier over the 20 runs together with its variance. The experimental results show that the multiple classification decision fusion classifier (MCDFC), the hybrid HBPNNs-HRBFNNs-SVM classifier, has the highest average recognition rate for FR and the smallest variance, which indicates that it effectively improves generalization ability and is highly stable.</p><p>Experiment 9. In this experiment, to validate that our proposed algorithm, DBNESR, is optimal for FR, we compare it with the other methods: BP, HBPNNs, RBF, HRBFNNs, SVM and MCDFC.</p><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Average recognition rates of different recognition methods</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Serial number</th><th align="center" valign="middle" >Recognition method</th><th align="center" valign="middle" >Recognition rate/%</th><th align="center" valign="middle" >Number of misclassified samples</th></tr></thead><tr><td align="center" valign="middle" >1</td><td align="center" valign="middle" >One-Against-One SVM</td><td align="center" valign="middle" >95.05</td><td align="center" valign="middle" >9.9</td></tr><tr><td align="center" valign="middle" >2</td><td align="center" valign="middle" >One-Against-the-Rest SVM</td><td align="center" valign="middle" >90.45</td><td 
align="center" valign="middle" >19.1</td></tr></tbody></table></table-wrap><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title> Recognition rates of different recognition methods for 20 times</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Algorithm</th><th align="center" valign="middle" >1</th><th align="center" valign="middle" >2</th><th align="center" valign="middle" >3</th><th align="center" valign="middle" >4</th><th align="center" valign="middle" >5</th><th align="center" valign="middle" >6</th><th align="center" valign="middle" >7</th><th align="center" valign="middle" >8</th><th align="center" valign="middle" >9</th><th align="center" valign="middle" >10</th><th align="center" valign="middle" >11</th><th align="center" valign="middle" >12</th><th align="center" valign="middle" >13</th><th align="center" valign="middle" >14</th><th align="center" valign="middle" >15</th><th align="center" valign="middle" >16</th><th align="center" valign="middle" >17</th><th align="center" valign="middle" >18</th><th align="center" valign="middle" >19</th><th align="center" valign="middle" >20</th></tr></thead><tr><td align="center" valign="middle" >HBPNNs</td><td align="center" valign="middle" >0.92</td><td align="center" valign="middle" >0.915</td><td align="center" valign="middle" >0.905</td><td align="center" valign="middle" >0.875</td><td align="center" valign="middle" >0.905</td><td align="center" valign="middle" >0.905</td><td align="center" valign="middle" >0.93</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.91</td><td align="center" valign="middle" >0.9</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.9</td><td align="center" valign="middle" >0.9</td><td align="center" valign="middle" >0.91</td><td align="center" valign="middle" >0.91</td><td align="center" valign="middle" >0.91</td><td align="center" 
valign="middle" >0.9</td><td align="center" valign="middle" >0.925</td><td align="center" valign="middle" >0.925</td><td align="center" valign="middle" >0.91</td></tr><tr><td align="center" valign="middle" >HRBFNNs</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.905</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.9</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.97</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.92</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.945</td></tr><tr><td align="center" valign="middle" >SVM</td><td align="center" valign="middle" >0.93</td><td align="center" valign="middle" >0.92</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.91</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.915</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.915</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.92</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" 
>0.945</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.92</td></tr><tr><td align="center" valign="middle" >MCDFC</td><td align="center" valign="middle" >0.93</td><td align="center" valign="middle" >0.93</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.915</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.97</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.925</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.94</td></tr></tbody></table></table-wrap><table-wrap id="table5" ><label><xref ref-type="table" rid="table5">Table 5</xref></label><caption><title> Average recognition rates and variances of different recognition methods</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Serial number</th><th align="center" valign="middle" >Recognition method</th><th align="center" valign="middle" >Average recognition rate/%</th><th align="center" valign="middle" >Variance</th></tr></thead><tr><td align="center" valign="middle" >1</td><td align="center" valign="middle" >HBPNNs</td><td align="center" valign="middle" >91.2</td><td align="center" valign="middle" >0.0002537</td></tr><tr><td align="center" valign="middle" >2</td><td align="center" valign="middle" >HRBFNNs</td><td align="center" valign="middle" >93.85</td><td align="center" valign="middle" >0.0002476</td></tr><tr><td align="center" valign="middle" >3</td><td 
align="center" valign="middle" >SVM</td><td align="center" valign="middle" >93.6</td><td align="center" valign="middle" >0.0002147</td></tr><tr><td align="center" valign="middle" >4</td><td align="center" valign="middle" >MCDFC</td><td align="center" valign="middle" >94.15</td><td align="center" valign="middle" >0.0001424</td></tr></tbody></table></table-wrap><p>In the experiment we set up DBNESR models with different numbers of hidden layers and different numbers of neurons in each hidden layer. The architecture of DBNESR is similar to that of DBN, but a different loss function is introduced for the supervised learning stage. For greedy layer-wise unsupervised learning, we train the weights of each layer independently for different numbers of epochs; the supervised fine-tuning is also run for different numbers of epochs. All DBNESR structures and learning epochs used in this experiment are shown in <xref ref-type="table" rid="table6">Table 6</xref>. The number of units in the input layer equals the feature dimension of the dataset.</p><p>Almost all of these DBNESR structures achieve recognition rates above 90%; in particular, the 500-1000-40 and 1000-500-40 models are</p><table-wrap id="table6" ><label><xref ref-type="table" rid="table6">Table 6</xref></label><caption><title> Different hidden layers of DBNESR and learning epochs used in this experiment</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Serial number</th><th align="center" valign="middle" >DBNESR structures</th><th align="center" valign="middle" >Unsupervised learning epochs</th><th align="center" valign="middle" >Supervised learning epochs</th></tr></thead><tr><td align="center" valign="middle" >1</td><td align="center" valign="middle" >400-200-100-50-20-40</td><td align="center" valign="middle" >10</td><td align="center" valign="middle" >1000</td></tr><tr><td align="center" valign="middle" >2</td><td align="center" valign="middle" >400-200-100-100-50-40</td><td align="center" 
valign="middle" >50</td><td align="center" valign="middle" >100</td></tr><tr><td align="center" valign="middle" >3</td><td align="center" valign="middle" >400-200-300-100-50-40</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >20</td></tr><tr><td align="center" valign="middle" >4</td><td align="center" valign="middle" >400-200-300-100-40</td><td align="center" valign="middle" >50</td><td align="center" valign="middle" >50</td></tr><tr><td align="center" valign="middle" >5</td><td align="center" valign="middle" >400-200-300-200-40</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >20</td></tr><tr><td align="center" valign="middle" >6</td><td align="center" valign="middle" >200-200-300-400-40</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td></tr><tr><td align="center" valign="middle" >7</td><td align="center" valign="middle" >200-300-400-40</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >100</td></tr><tr><td align="center" valign="middle" >8</td><td align="center" valign="middle" >400-300-200-40</td><td align="center" valign="middle" >100</td><td align="center" valign="middle" >200</td></tr><tr><td align="center" valign="middle" >9</td><td align="center" valign="middle" >400-200-300-40</td><td align="center" valign="middle" >200</td><td align="center" valign="middle" >100</td></tr><tr><td align="center" valign="middle" >10</td><td align="center" valign="middle" >500-400-40</td><td align="center" valign="middle" >200</td><td align="center" valign="middle" >200</td></tr><tr><td align="center" valign="middle" >11</td><td align="center" valign="middle" >500-1000-40</td><td align="center" valign="middle" >200</td><td align="center" valign="middle" >200</td></tr><tr><td align="center" valign="middle" >12</td><td align="center" valign="middle" >1000-500-40</td><td align="center" valign="middle" >200</td><td align="center" 
valign="middle" >200</td></tr></tbody></table></table-wrap><table-wrap id="table7" ><label><xref ref-type="table" rid="table7">Table 7</xref></label><caption><title> Recognition rates of different recognition methods for 20 times</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Algorithm</th><th align="center" valign="middle" >1</th><th align="center" valign="middle" >2</th><th align="center" valign="middle" >3</th><th align="center" valign="middle" >4</th><th align="center" valign="middle" >5</th><th align="center" valign="middle" >6</th><th align="center" valign="middle" >7</th><th align="center" valign="middle" >8</th><th align="center" valign="middle" >9</th><th align="center" valign="middle" >10</th><th align="center" valign="middle" >11</th><th align="center" valign="middle" >12</th><th align="center" valign="middle" >13</th><th align="center" valign="middle" >14</th><th align="center" valign="middle" >15</th><th align="center" valign="middle" >16</th><th align="center" valign="middle" >17</th><th align="center" valign="middle" >18</th><th align="center" valign="middle" >19</th><th align="center" valign="middle" >20</th></tr></thead><tr><td align="center" valign="middle" >BP</td><td align="center" valign="middle" >0.65</td><td align="center" valign="middle" >0.655</td><td align="center" valign="middle" >0.68</td><td align="center" valign="middle" >0.675</td><td align="center" valign="middle" >0.645</td><td align="center" valign="middle" >0.645</td><td align="center" valign="middle" >0.805</td><td align="center" valign="middle" >0.64</td><td align="center" valign="middle" >0.665</td><td align="center" valign="middle" >0.635</td><td align="center" valign="middle" >0.635</td><td align="center" valign="middle" >0.68</td><td align="center" valign="middle" >0.625</td><td align="center" valign="middle" >0.625</td><td align="center" valign="middle" >0.7</td><td align="center" valign="middle" >0.8</td><td align="center" valign="middle" 
>0.635</td><td align="center" valign="middle" >0.65</td><td align="center" valign="middle" >0.628</td><td align="center" valign="middle" >0.74</td></tr><tr><td align="center" valign="middle" >HBPNNs</td><td align="center" valign="middle" >0.92</td><td align="center" valign="middle" >0.915</td><td align="center" valign="middle" >0.905</td><td align="center" valign="middle" >0.875</td><td align="center" valign="middle" >0.905</td><td align="center" valign="middle" >0.905</td><td align="center" valign="middle" >0.93</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.91</td><td align="center" valign="middle" >0.9</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.9</td><td align="center" valign="middle" >0.9</td><td align="center" valign="middle" >0.91</td><td align="center" valign="middle" >0.91</td><td align="center" valign="middle" >0.91</td><td align="center" valign="middle" >0.9</td><td align="center" valign="middle" >0.925</td><td align="center" valign="middle" >0.925</td><td align="center" valign="middle" >0.91</td></tr><tr><td align="center" valign="middle" >RBF</td><td align="center" valign="middle" >0.905</td><td align="center" valign="middle" >0.9</td><td align="center" valign="middle" >0.9</td><td align="center" valign="middle" >0.875</td><td align="center" valign="middle" >0.88</td><td align="center" valign="middle" >0.88</td><td align="center" valign="middle" >0.915</td><td align="center" valign="middle" >0.92</td><td align="center" valign="middle" >0.92</td><td align="center" valign="middle" >0.915</td><td align="center" valign="middle" >0.93</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.9</td><td align="center" valign="middle" >0.905</td><td align="center" valign="middle" >0.895</td><td align="center" valign="middle" >0.895</td><td align="center" valign="middle" >0.93</td><td align="center" valign="middle" >0.85</td><td align="center" 
valign="middle" >0.91</td><td align="center" valign="middle" >0.94</td></tr><tr><td align="center" valign="middle" >HRBFNNs</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.905</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.9</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.97</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.92</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.945</td></tr><tr><td align="center" valign="middle" >SVM</td><td align="center" valign="middle" >0.93</td><td align="center" valign="middle" >0.92</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.91</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.915</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.915</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.92</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" 
>0.92</td></tr><tr><td align="center" valign="middle" >MCDFC</td><td align="center" valign="middle" >0.93</td><td align="center" valign="middle" >0.93</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.915</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.97</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.935</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.955</td><td align="center" valign="middle" >0.925</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.94</td><td align="center" valign="middle" >0.94</td></tr><tr><td align="center" valign="middle" >DBNESR</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >0.965</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >0.965</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.965</td><td align="center" valign="middle" >0.945</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >0.965</td><td align="center" valign="middle" >0.95</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >0.96</td><td align="center" valign="middle" >0.965</td></tr></tbody></table></table-wrap><table-wrap id="table8" 
><label><xref ref-type="table" rid="table8">Table 8</xref></label><caption><title> Average recognition rates and variances of different recognition methods</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Serial number</th><th align="center" valign="middle" >Recognition method</th><th align="center" valign="middle" >Average recognition rate/%</th><th align="center" valign="middle" >Variance</th></tr></thead><tr><td align="center" valign="middle" >1</td><td align="center" valign="middle" >BP</td><td align="center" valign="middle" >67.06</td><td align="center" valign="middle" >0.0028</td></tr><tr><td align="center" valign="middle" >2</td><td align="center" valign="middle" >HBPNNs</td><td align="center" valign="middle" >91.2</td><td align="center" valign="middle" >0.0002537</td></tr><tr><td align="center" valign="middle" >3</td><td align="center" valign="middle" >RBF</td><td align="center" valign="middle" >90.5</td><td align="center" valign="middle" >0.0005</td></tr><tr><td align="center" valign="middle" >4</td><td align="center" valign="middle" >HRBFNNs</td><td align="center" valign="middle" >93.85</td><td align="center" valign="middle" >0.0002476</td></tr><tr><td align="center" valign="middle" >5</td><td align="center" valign="middle" >SVM</td><td align="center" valign="middle" >93.6</td><td align="center" valign="middle" >0.0002147</td></tr><tr><td align="center" valign="middle" >6</td><td align="center" valign="middle" >MCDFC</td><td align="center" valign="middle" >94.15</td><td align="center" valign="middle" >0.0001424</td></tr><tr><td align="center" valign="middle" >7</td><td align="center" valign="middle" >DBNESR</td><td align="center" valign="middle" >95.63</td><td align="center" valign="middle" >0.0000523</td></tr></tbody></table></table-wrap><p>best and most stable. 
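</p><p>The averages and variances reported in <xref ref-type="table" rid="table8">Table 8</xref> can be recomputed from the per-run rates in <xref ref-type="table" rid="table7">Table 7</xref>. The following sketch does this for the MCDFC and DBNESR rows using Python's standard <monospace>statistics</monospace> module. The helper name <monospace>summarize</monospace> is hypothetical, and the use of the sample variance (denominator n - 1) is an assumption about how the variance column was computed.</p><preformat preformat-type="code">
```python
from statistics import mean, variance

# Per-run recognition rates copied from Table 7 (20 runs each).
runs = {
    "MCDFC": [0.93, 0.93, 0.945, 0.915, 0.95, 0.94, 0.97, 0.94, 0.95, 0.935,
              0.955, 0.94, 0.94, 0.955, 0.925, 0.94, 0.94, 0.95, 0.94, 0.94],
    "DBNESR": [0.95, 0.95, 0.96, 0.965, 0.945, 0.95, 0.95, 0.96, 0.965, 0.96,
               0.95, 0.965, 0.945, 0.95, 0.96, 0.965, 0.95, 0.96, 0.96, 0.965],
}


def summarize(rates):
    """Return (average rate in percent, sample variance) for one classifier."""
    # statistics.variance uses the n - 1 denominator (assumed convention).
    return 100 * mean(rates), variance(rates)


summary = {name: summarize(rates) for name, rates in runs.items()}
for name, (avg_pct, var) in summary.items():
    print(f"{name}: average = {avg_pct:.3f}%, variance = {var:.7f}")
```
</preformat><p>Under that assumption, the recomputed values agree with the MCDFC and DBNESR rows of <xref ref-type="table" rid="table8">Table 8</xref> to within rounding (94.15% and 0.0001424; 95.63% and 0.0000523).</p><p>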
Therefore, the DBNESR structure used in this experiment is 1000-500-40: the output layer has 40 units and the two hidden layers have 1000 and 500 units, respectively. The learning rate is dynamic: it is initially set to 0.1 and decreases as the training error decreases. The experimental results are shown in <xref ref-type="table" rid="table7">Table 7</xref>, <xref ref-type="table" rid="table8">Table 8</xref> and in Figures 17-19.</p><p>As shown in <xref ref-type="table" rid="table7">Table 7</xref>, <xref ref-type="table" rid="table8">Table 8</xref> and in Figures 17-19, our proposed algorithm, DBNESR, is optimal for FR: in almost all runs its recognition rate is the highest, and overall it is the most stable, with the largest average recognition rate and the smallest variance.</p></sec><sec id="s7"><title>7. Conclusion</title><p>The experiments validate that the proposed DBNESR algorithm is optimal for face recognition among the compared methods, with the highest and most stable recognition rates; that is, it successfully implements deep learning of hierarchical feature representations for face recognition. 
The hierarchical feature representations learned by DBNESR should also be applicable to modeling other artificial intelligence tasks, which is what we plan to investigate in future work.</p></sec><sec id="s8"><title>Acknowledgements</title><p>This research was funded by the National Natural Science Foundation (grant numbers 61171141 and 61573145), the Public Research and Capacity Building of Guangdong Province (grant number 2014B010104001), the Basic and Applied Basic Research of Guangdong Province (grant number 2015A030308018), the Main Project of the Natural Science Fund of Jiaying University (grant number 2017KJZ02), the key research bases jointly built by provinces and cities for humanities and social science of regular institutions of higher learning of Guangdong Province (grant number 18KYKT11), the cooperative education program of the Ministry of Education (grant number 201802153047), and the college characteristic innovation project of the Education Department of Guangdong Province in 2019 (grant number 2019KTSCX169). The authors are grateful for this support.</p></sec><sec id="s9"><title>Compliance with Ethical Standards</title><p>1) Funding</p><p>This study was funded by the National Natural Science Foundation (grant numbers 61171141 and 61573145), the Public Research and Capacity Building of Guangdong Province (grant number 2014B010104001), the Basic and Applied Basic Research of Guangdong Province (grant number 2015A030308018), the Main Project of the Natural Science Fund of Jiaying University (grant number 2017KJZ02), the key research bases jointly built by provinces and cities for humanities and social science of regular institutions of higher learning of Guangdong Province (grant number 18KYKT11), the cooperative education program of the Ministry of Education (grant number 201802153047), and the college characteristic innovation project of the Education Department of Guangdong Province in 2019 (grant number 2019KTSCX169).</p><p>2) 
Ethical Approval:</p><p>This article does not contain any studies with human participants or animals performed by any of the authors.</p></sec><sec id="s10"><title>Conflicts of Interest</title><p>Hai-Jun Zhang declares that he has no conflict of interest. Ying-Hui Chen declares that she has no conflict of interest.</p></sec><sec id="s11"><title>Cite this paper</title><p>Zhang, H.J. and Chen, Y.H. (2020) Hierarchical Representations Feature Deep Learning for Face Recognition. Journal of Data Analysis and Information Processing, 8, 195-227. https://doi.org/10.4236/jdaip.2020.83012</p></sec></body><back><ref-list><title>References</title><ref id="scirp.102423-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Wright, J., Ma, Y., Mairal, J., et al. (2010) Sparse Representation for Computer Vision and Pattern Recognition. Proceedings of the IEEE, 98, 1031-1044. https://doi.org/10.1109/JPROC.2010.2044470</mixed-citation></ref><ref id="scirp.102423-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Wang, S.J., Yang, J., Sun, M.F., et al. (2012) Sparse Tensor Discriminant Color Space for Face Verification. IEEE Transactions on Neural Networks and Learning Systems, 23, 876-888. https://doi.org/10.1109/TNNLS.2012.2191620</mixed-citation></ref><ref id="scirp.102423-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Xu, Y., Zhong, A., Yang, J. and Zhang, D. (2010) LPP Solution Schemes for Use with Face Recognition. Pattern Recognition, 43, 4165-4176. https://doi.org/10.1016/j.patcog.2010.06.016</mixed-citation></ref><ref id="scirp.102423-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Fan, Z.Z., Xu, Y., Zuo, W.M., Yang, J., et al. 
(2014) Modified Principal Component Analysis: An Integration of Multiple Similarity Subspace Models. IEEE Transactions on Neural Networks and Learning Systems, 25, 1538-1552. https://doi.org/10.1109/TNNLS.2013.2294492</mixed-citation></ref><ref id="scirp.102423-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Yang, W.K., Sun, C.Y. and Zhang, L. (2011) A Multi-Manifold Discriminant Analysis Method for Image Feature Extraction. Pattern Recognition, 44, 1649-1657. https://doi.org/10.1016/j.patcog.2011.01.019</mixed-citation></ref><ref id="scirp.102423-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Xu, Y., Li, X., Yang, J., et al. (2013) Integrating Conventional and Inverse Representation for Face Recognition. IEEE Transactions on Cybernetics, 44, 1738-1746.</mixed-citation></ref><ref id="scirp.102423-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Wang, S.J., Zhou, C.G., Chen, Y.H., et al. (2011) A Novel Face Recognition Method Based on Sub-Pattern and Tensor. Neurocomputing, 74, 3553-3564. https://doi.org/10.1016/j.neucom.2011.06.017</mixed-citation></ref><ref id="scirp.102423-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, H.Z., Zhang, Z., Li, Z.M., Chen, Y. and Shi, J. (2014) Improving Representation Based Classification for Robust Face Recognition. Journal of Modern Optics, 61, 961-968. https://doi.org/10.1080/09500340.2014.915064</mixed-citation></ref><ref id="scirp.102423-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Wang, S.J., Chen, H.L., et al. (2014) Face Recognition and Micro-Expression Recognition Based on Discriminant Tensor Subspace Analysis Plus Extreme Learning Machine. 
Neural Processing Letters, 39, 25-43. https://doi.org/10.1007/s11063-013-9288-7</mixed-citation></ref><ref id="scirp.102423-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Wan, W.G., Zhou, Z.H., Zhao, J.W. and Cao, F.L. (2015) A Novel Face Recognition Method: Using Random Weight Networks and Quasi-Singular Value Decomposition. Neurocomputing, 151, 1180-1186. https://doi.org/10.1016/j.neucom.2014.06.081</mixed-citation></ref><ref id="scirp.102423-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">Zhao, Z. and Liu, H. (2007) Spectral Feature Selection for Supervised and Unsupervised Learning. Proceedings of the 24th International Conference on Machine Learning, Corvallis, June 2007, 1151-1157. https://doi.org/10.1145/1273496.1273641</mixed-citation></ref><ref id="scirp.102423-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Cai, D., Zhang, C.Y. and He, X.F. (2010) Unsupervised Feature Selection for Multi-Cluster Data. Proceedings of the 16th SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington DC, July 2010, 333-342. https://doi.org/10.1145/1835804.1835848</mixed-citation></ref><ref id="scirp.102423-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Zhao, Z., Wang, L. and Liu, H. (2010) Efficient Spectral Feature Selection with Minimum Redundancy. Proceedings of the 24th AAAI Conference on Artificial Intelligence, Atlanta, July 2010, 673-678.</mixed-citation></ref><ref id="scirp.102423-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Hou, C.P., Nie, F.P. and Li, X.L. (2011) Joint Embedding Learning and Sparse Regression: A Framework for Unsupervised Feature Selection. IEEE Transactions on Cybernetics, 44, 793-804.</mixed-citation></ref><ref id="scirp.102423-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Ghazali, K.H., Mansor, M.F. 
and Mustafa, M.M. (2007) A Feature Extraction Technique Using Discrete Wavelet Transform for Image Classification. Proceedings of the 5th Student Conference on Research and Development, Selangor, Malaysia, 11-12 December 2007, 1-4. https://doi.org/10.1109/SCORED.2007.4451366</mixed-citation></ref><ref id="scirp.102423-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Hu, H.F. (2011) Variable Lighting Face Recognition Using Discrete Wavelet Transform. Pattern Recognition Letters, 32, 1526-1534. https://doi.org/10.1016/j.patrec.2011.06.009</mixed-citation></ref><ref id="scirp.102423-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Jemai, O., Zaied, M., Amar, C.B. and Alimi, A.M. (2010) FBWN: An Architecture of Fast Beta Wavelet Networks for Image Classification. Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, 18-23 July 2010, 1-8. https://doi.org/10.1109/IJCNN.2010.5596876</mixed-citation></ref><ref id="scirp.102423-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Huang, K. and Aviyente, S. (2008) Wavelet Feature Selection for Image Classification. IEEE Transactions on Image Processing, 17, 1709-1719. https://doi.org/10.1109/TIP.2008.2001050</mixed-citation></ref><ref id="scirp.102423-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Zhao, M., Li, P. and Liu, Z. (2008) Face Recognition Based on Wavelet Transform Weighted Modular PCA. 2008 Congress on Image and Signal Processing, Sanya, 27-30 May 2008, 589-593. https://doi.org/10.1109/CISP.2008.138</mixed-citation></ref><ref id="scirp.102423-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, B.L., Zhang, H.H. and Ge, S.S. (2004) Face Recognition by Applying Wavelet Subband Representation and Kernel Associative Memory. IEEE Transactions on Neural Networks, 15, 166-177. 
https://doi.org/10.1109/TNN.2003.820673</mixed-citation></ref><ref id="scirp.102423-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Nefian, A.V. and Hayes, M.H. (1998) Face Detection and Recognition Using Hidden Markov Models. Proceedings 1998 International Conference on Image Processing, ICIP98 (Cat. No. 98CB36269), Chicago, 7 October 1998, 141-145. https://doi.org/10.1109/ICIP.1998.723445</mixed-citation></ref><ref id="scirp.102423-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Vlasenko, B., Prylipko, D., Böck, R. and Wendemuth, A. (2013) Modeling Phonetic Pattern Variability in Favor of the Creation of Robust Emotion Classifiers for Real-Life Applications. Computer Speech &amp; Language, 28, 483-500. https://doi.org/10.1016/j.csl.2012.11.003</mixed-citation></ref><ref id="scirp.102423-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">He, L., Lech, M., Maddage, N.C. and Allen, N.B. (2011) Study of Empirical Mode Decomposition and Spectral Analysis for Stress and Emotion Classification in Natural Speech. Biomedical Signal Processing and Control, 6, 139-146. https://doi.org/10.1016/j.bspc.2010.11.001</mixed-citation></ref><ref id="scirp.102423-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">Suykens, J.A.K. and Vandewalle, J. (1999) Least Squares Support Vector Machine Classifiers. Neural Processing Letters, 9, 293-300. https://doi.org/10.1023/A:1018628609742</mixed-citation></ref><ref id="scirp.102423-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Lee, C.-C., Mower, E., Busso, C., Lee, S. and Narayanan, S. (2011) Emotion Recognition Using a Hierarchical Binary Decision Tree Approach. Speech Communication, 53, 1162-1171. 
https://doi.org/10.1016/j.specom.2011.06.004</mixed-citation></ref><ref id="scirp.102423-ref26"><label>26</label><mixed-citation publication-type="other" xlink:type="simple">Igelnik, B. and Pao, Y.H. (1995) Stochastic Choice of Basis Functions in Adaptive Function Approximation and the Functional-Link Net. IEEE Transactions on Neural Networks, 6, 1320-1329. https://doi.org/10.1109/72.471375</mixed-citation></ref><ref id="scirp.102423-ref27"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">Pao, Y.H., Park, G.H. and Sobajic, D.J. (1994) Learning and Generalization Characteristics of the Random Vector Functional-Link Net. Neurocomputing, 6, 163-180. https://doi.org/10.1016/0925-2312(94)90053-1</mixed-citation></ref><ref id="scirp.102423-ref28"><label>28</label><mixed-citation publication-type="other" xlink:type="simple">Xu, Y., Zhang, X.F. and Gai, H.C. (2011) Quantum Neural Networks for Face Recognition Classifier. Procedia Engineering, 15, 1319-1323. https://doi.org/10.1016/j.proeng.2011.08.244</mixed-citation></ref><ref id="scirp.102423-ref29"><label>29</label><mixed-citation publication-type="other" xlink:type="simple">Reddy, K.R.L., Babu, G.R. and Kishore, L. (2010) Face Recognition Based on Eigen Features of Multi Scaled Face Components and an Artificial Neural Network. Procedia Computer Science, 2, 62-74. https://doi.org/10.1016/j.procs.2010.11.009</mixed-citation></ref><ref id="scirp.102423-ref30"><label>30</label><mixed-citation publication-type="other" xlink:type="simple">Suk, H.-I., Lee, S.-W., Shen, D.G. and the Alzheimer’s Disease Neuroimaging Initiative (2014) Hierarchical Feature Representation and Multimodal Fusion with Deep Learning for AD/MCI Diagnosis. NeuroImage, 101, 569-582. https://doi.org/10.1016/j.neuroimage.2014.06.077</mixed-citation></ref><ref id="scirp.102423-ref31"><label>31</label><mixed-citation publication-type="other" xlink:type="simple">Han, H., Shan, S.G., Chen, X.L. and Gao, W. 
(2013) A Comparative Study on Illumination Preprocessing in Face Recognition. Pattern Recognition, 46, 1691-1699. https://doi.org/10.1016/j.patcog.2012.11.022</mixed-citation></ref><ref id="scirp.102423-ref32"><label>32</label><mixed-citation publication-type="other" xlink:type="simple">Chao, S. (2013) Research and Implement of Face Recognition Based on Neural Network. South China University of Technology, Guangzhou.</mixed-citation></ref><ref id="scirp.102423-ref33"><label>33</label><mixed-citation publication-type="other" xlink:type="simple">Längkvist, M., Karlsson, L. and Loutfi, A. (2014) A Review of Unsupervised Feature Learning and Deep Learning for Time-Series Modeling. Pattern Recognition Letters, 42, 11-24. https://doi.org/10.1016/j.patrec.2014.01.008</mixed-citation></ref><ref id="scirp.102423-ref34"><label>34</label><mixed-citation publication-type="other" xlink:type="simple">Xu, Y.J., You, T. and Du, C.L. (2015) An Integrated Micromechanical Model and BP Neural Network for Predicting Elastic Modulus of 3-D Multi-Phase and Multi-Layer Braided Composite. Composite Structures, 122, 308-315. https://doi.org/10.1016/j.compstruct.2014.11.052</mixed-citation></ref><ref id="scirp.102423-ref35"><label>35</label><mixed-citation publication-type="other" xlink:type="simple">Ng, A., Ngiam, J., et al. (2014) http://ufldl.stanford.edu/wiki/index.php/Backpropagation_Algorithm</mixed-citation></ref><ref id="scirp.102423-ref36"><label>36</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, Y.X., Gao, X.D. and Katayama, S. (2015) Weld Appearance Prediction with BP Neural Network Improved by Genetic Algorithm during Disk Laser Welding. Journal of Manufacturing Systems, 34, 53-59. https://doi.org/10.1016/j.jmsy.2014.10.005</mixed-citation></ref><ref id="scirp.102423-ref37"><label>37</label><mixed-citation publication-type="other" xlink:type="simple">Sundararajan, N. and Saratchandran, P. 
(1998) Parallel Architecture for Artificial Neural Networks: Paradigms and Implementations. IEEE Computer Society Press, 412.</mixed-citation></ref><ref id="scirp.102423-ref38"><label>38</label><mixed-citation publication-type="other" xlink:type="simple">Han, L.Q. (2007) Artificial Neural Networks Tutorial. Beijing University of Posts and Telecommunications Press of China, Beijing, 47-83.</mixed-citation></ref><ref id="scirp.102423-ref39"><label>39</label><mixed-citation publication-type="other" xlink:type="simple">Karami, A. and Guerrero-Zapata, M. (2015) A Hybrid Multiobjective RBF-PSO Method for Mitigating DoS Attacks in Named Data Networking. Neurocomputing, 151, 1262-1282. https://doi.org/10.1016/j.neucom.2014.11.003</mixed-citation></ref><ref id="scirp.102423-ref40"><label>40</label><mixed-citation publication-type="other" xlink:type="simple">Reiner, P. and Wilamowski, B.M. (2015) Efficient Incremental Construction of RBF Networks Using Quasi-Gradient Method. Neurocomputing, 150, 349-356. https://doi.org/10.1016/j.neucom.2014.05.082</mixed-citation></ref><ref id="scirp.102423-ref41"><label>41</label><mixed-citation publication-type="other" xlink:type="simple">Liu, X.F., Bo, L. and Luo, H.L. (2015) Bearing Faults Diagnostics Based on Hybrid LS-SVM and EMD Method. Measurement, 59, 145-166. https://doi.org/10.1016/j.measurement.2014.09.037</mixed-citation></ref><ref id="scirp.102423-ref42"><label>42</label><mixed-citation publication-type="other" xlink:type="simple">Wang, Z.G., Zhao, Z.S., Weng, S.F. and Zhang, C.S. (2015) Solving One-Class Problem with Outlier Examples by SVM. Neurocomputing, 149, 100-105. https://doi.org/10.1016/j.neucom.2014.03.072</mixed-citation></ref><ref id="scirp.102423-ref43"><label>43</label><mixed-citation publication-type="other" xlink:type="simple">Al-Hadeethi, H., Abdulla, S., Diykh, M., Deo, R.C. and Green, J.H. 
(2020) Adaptive Boost LS-SVM Classification Approach for Time-Series Signal Classification in Epileptic Seizure Diagnosis Applications. Expert Systems with Applications, 161, Article ID 113676. https://doi.org/10.1016/j.eswa.2020.113676</mixed-citation></ref><ref id="scirp.102423-ref44"><label>44</label><mixed-citation publication-type="other" xlink:type="simple">Yin, H.P., Jiao, X.G., Chai, Y. and Fang, B. (2015) Scene Classification Based on Single-Layer SAE and SVM. Expert Systems with Applications, 42, 3368-3380. https://doi.org/10.1016/j.eswa.2014.11.069</mixed-citation></ref><ref id="scirp.102423-ref45"><label>45</label><mixed-citation publication-type="other" xlink:type="simple">Liu, X.F. and Bo, L. (2015) Identification of Resonance States of Rotor-Bearing System Using RQA and Optimal Binary Tree SVM. Neurocomputing, 152, 36-44. https://doi.org/10.1016/j.neucom.2014.11.021</mixed-citation></ref><ref id="scirp.102423-ref46"><label>46</label><mixed-citation publication-type="other" xlink:type="simple">Dasgupta, S. and Ng, V. (2009) Mine the Easy, Classify the Hard: A Semi-Supervised Approach to Automatic Sentiment Classification. Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the AFNLP, Suntec, Singapore, August 2009, 701-709. https://doi.org/10.3115/1690219.1690244</mixed-citation></ref><ref id="scirp.102423-ref47"><label>47</label><mixed-citation publication-type="other" xlink:type="simple">Zhu, X. (2007) Semi-Supervised Learning Literature Survey. Technical Report, University of Wisconsin Madison, Madison.</mixed-citation></ref><ref id="scirp.102423-ref48"><label>48</label><mixed-citation publication-type="other" xlink:type="simple">Schmidhuber, J. (2015) Deep Learning in Neural Networks: An Overview. Neural Networks, 61, 85-117. 
https://doi.org/10.1016/j.neunet.2014.09.003</mixed-citation></ref><ref id="scirp.102423-ref49"><label>49</label><mixed-citation publication-type="other" xlink:type="simple">Bengio, Y. (2007) Learning Deep Architectures for AI. Technical Report, IRO, Universite de Montreal, Montreal.</mixed-citation></ref><ref id="scirp.102423-ref50"><label>50</label><mixed-citation publication-type="other" xlink:type="simple">Hinton, G.E. and Salakhutdinov, R. (2006) Reducing the Dimensionality of Data with Neural Networks. Science, 313, 504-507. https://doi.org/10.1126/science.1127647</mixed-citation></ref><ref id="scirp.102423-ref51"><label>51</label><mixed-citation publication-type="other" xlink:type="simple">Hu, W.P., Qian, Y., Soong, F.K. and Wang, Y. (2015) Improved Mispronunciation Detection with Deep Neural Network Trained Acoustic Models and Transfer Learning Based Logistic Regression Classifiers. Speech Communication, 67, 154-166. https://doi.org/10.1016/j.specom.2014.12.008</mixed-citation></ref><ref id="scirp.102423-ref52"><label>52</label><mixed-citation publication-type="other" xlink:type="simple">Fischer, A. and Igel, C. (2014) Training Restricted Boltzmann Machines: An Introduction. Pattern Recognition, 47, 25-39. https://doi.org/10.1016/j.patcog.2013.05.025</mixed-citation></ref><ref id="scirp.102423-ref53"><label>53</label><mixed-citation publication-type="other" xlink:type="simple">Lopes, N. and Ribeiro, B. (2014) Towards Adaptive Learning with Improved Convergence of Deep Belief Networks on Graphics Processing Units. Pattern Recognition, 47, 114-127. https://doi.org/10.1016/j.patcog.2013.06.029</mixed-citation></ref><ref id="scirp.102423-ref54"><label>54</label><mixed-citation publication-type="other" xlink:type="simple">Zhou, S.S., Chen, Q.C. and Wang, X.L. (2014) Fuzzy Deep Belief Networks for Semi-Supervised Sentiment Classification. 
Neurocomputing, 131, 312-322. https://doi.org/10.1016/j.neucom.2013.10.011</mixed-citation></ref><ref id="scirp.102423-ref55"><label>55</label><mixed-citation publication-type="other" xlink:type="simple">Salakhutdinov, R. and Murray, I. (2008) On the Quantitative Analysis of Deep Belief Networks. Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, August 2008, 872-879. https://doi.org/10.1145/1390156.1390266</mixed-citation></ref><ref id="scirp.102423-ref56"><label>56</label><mixed-citation publication-type="other" xlink:type="simple">Hinton, G.E. (2002) Training Products of Experts by Minimizing Contrastive Divergence. Neural Computation, 14, 1771-1800. https://doi.org/10.1162/089976602760128018</mixed-citation></ref><ref id="scirp.102423-ref57"><label>57</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, H.-J., Zhang, N. and Xiao, N.-F. (2015) Fire Detection and Identification Method Based on Visual Attention Mechanism. Optik, 126, 5011-5018. https://doi.org/10.1016/j.ijleo.2015.09.167</mixed-citation></ref><ref id="scirp.102423-ref58"><label>58</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, H.-J. and Xiao, N.-F. (2016) Parallel Implementation of Multilayered Neural Networks Based on Map-Reduce on Cloud Computing Clusters. Soft Computing, 20, 1471-1483. https://doi.org/10.1007/s00500-015-1599-3</mixed-citation></ref><ref id="scirp.102423-ref59"><label>59</label><mixed-citation publication-type="other" xlink:type="simple">Zhang, H.-J. and Xiao, N.-F. (2015) Learning Hierarchical Representations for Face Recognition Using Deep Belief Network Embedded with Softmax Regress and Multiple Neural Networks. Proceedings of the 2015 2nd International Workshop on Materials Engineering and Computer Sciences (IWMECS), 1-7. https://doi.org/10.2991/iwmecs-15.2015.1</mixed-citation></ref></ref-list></back></article>