<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JAMP</journal-id><journal-title-group><journal-title>Journal of Applied Mathematics and Physics</journal-title></journal-title-group><issn pub-type="epub">2327-4352</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jamp.2022.107155</article-id><article-id pub-id-type="publisher-id">JAMP-118820</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Proximal Support Matrix Machine
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Wan</surname><given-names>Zhang</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Yulan</surname><given-names>Liu</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>School of Mathematics and Statistics, Guangdong University of Technology, Guangzhou, China</addr-line></aff><pub-date pub-type="epub"><day>11</day><month>07</month><year>2022</year></pub-date><volume>10</volume><issue>07</issue><fpage>2268</fpage><lpage>2291</lpage><history><date date-type="received"><day>21,</day>	<month>June</month>	<year>2022</year></date><date date-type="rev-recd"><day>25,</day>	<month>July</month>	<year>2022</year>	</date><date date-type="accepted"><day>28,</day>	<month>July</month>	<year>2022</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  In this paper, we propose a novel model called the proximal support matrix machine (PSMM), which is mainly based on the proximal support vector machine (PSVM) and the low rank support matrix machine (LRSMM). By design, the PSMM model considers both the relationship between samples of the same class and the row and column structure of matrix data. To a certain extent, the new model can be regarded as a synthesis of the PSVM model and the LRSMM model. Since the PSMM model is in essence an unconstrained convex problem, we establish an alternating direction method of multipliers (ADMM) algorithm to solve it. Experiments on the MNIST digit database show that the PSMM classifier is able to distinguish pairs of digits that differ only slightly, which encourages us to conduct more complex experiments on the MIT face database, the INRIA person database, the students face database and the Japanese female facial expression database. The experimental results show that PSMM performs better than PSVM, the twin support vector machine, LRSMM and the linear twin multiple rank support matrix machine in these demanding image classification tasks.
 
</p></abstract><kwd-group><kwd>PSMM</kwd><kwd> PSVM</kwd><kwd> LRSMM</kwd><kwd> The Alternating Direction Method of Multipliers Algorithm</kwd><kwd> Image Classification Tasks</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>Classification is an important research field of machine learning, in which the support vector classifier is a common and important tool. There are many excellent designs of support vector classifiers. For example, the proximal support vector machine (PSVM) [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>] considers the relationship between samples within a class, so that samples of the same class tend to cluster together. Subsequently, a great many vector classifiers were developed on the basis of the PSVM classifier, see [<xref ref-type="bibr" rid="scirp.118820-ref2">2</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref3">3</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref4">4</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref5">5</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref6">6</xref>]. Moreover, different from PSVM with its single model, the twin support vector machine (TSVM) [<xref ref-type="bibr" rid="scirp.118820-ref7">7</xref>] solves two relatively small SVM-type models, so that samples can be quickly classified by a pair of nonparallel separating planes. Inspired by TSVM, many twin-type vector classifiers have also been developed, see [<xref ref-type="bibr" rid="scirp.118820-ref8">8</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref9">9</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref10">10</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref11">11</xref>]. For more innovative designs of support vector classifiers, see [<xref ref-type="bibr" rid="scirp.118820-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref13">13</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref14">14</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref15">15</xref>]. 
However, with the progress of modern science and technology, data to be processed are often expressed as matrices rather than vectors in applications. When using vector classifiers on matrix data, the input matrices have to be reshaped into vectors, which destroys the structure of rows or columns and thus loses important classification information unique to matrix data. Additionally, reshaping a matrix stacks its rows or columns into a high-dimensional vector, which is not conducive to large-scale testing in matrix classification tasks.</p><p>To address these problems, many works have extended the classification strategy of support vector machines to matrix space and obtained excellent results in matrix classification problems. For example, L. Luo et al. proposed the low rank support matrix machine (LRSMM) [<xref ref-type="bibr" rid="scirp.118820-ref16">16</xref>] on the basis of the soft-margin support vector machine (Soft-SVM) [<xref ref-type="bibr" rid="scirp.118820-ref17">17</xref>], which considers the correlation of rows and columns of matrix samples by using the nuclear norm as the convex approximation of matrix rank. X. Gao et al. proposed the linear twin multiple rank support matrix machine (LTMRSMM) [<xref ref-type="bibr" rid="scirp.118820-ref18">18</xref>] on the basis of TSVM [<xref ref-type="bibr" rid="scirp.118820-ref7">7</xref>], which considers the practical situation that matrix data are multiple rank. Q. Zheng et al. proposed the robust support matrix machine [<xref ref-type="bibr" rid="scirp.118820-ref19">19</xref>] on the basis of the bilinear support vector machine [<xref ref-type="bibr" rid="scirp.118820-ref20">20</xref>], which considers the spatial-temporal structural information of the input matrices and thus can eliminate non-standard noises of samples. 
For more cutting-edge works, see [<xref ref-type="bibr" rid="scirp.118820-ref21">21</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref22">22</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref23">23</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref24">24</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref25">25</xref>]. The designs of these SMM-type models provide us with a new idea for solving matrix classification problems.</p><p>In this paper, to generalize the classification method of PSVM [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>] to matrix space, we propose the proximal support matrix machine (PSMM) on the basis of PSVM [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>] and LRSMM [<xref ref-type="bibr" rid="scirp.118820-ref16">16</xref>], and we give the novel model as follows:</p><p><disp-formula><tex-math>\min_{(W,b) \in \mathbb{R}^{l \times m} \times \mathbb{R}} \ \frac{1}{2}\left( \|W\|_F^2 + b^2 \right) + \tau \|W\|_* + \frac{C}{2} \sum_{i=1}^{n} \left[ 1 - y_i \left( \langle W, X_i \rangle + b \right) \right]^2</tex-math></disp-formula></p><p>where <inline-formula><tex-math>X_i \in \mathbb{R}^{l \times m}</tex-math></inline-formula> and <inline-formula><tex-math>y_i \in \mathbb{R}</tex-math></inline-formula> are the given matrix sample and category label for each <inline-formula><tex-math>i</tex-math></inline-formula>, and <inline-formula><tex-math>C, \tau</tex-math></inline-formula> are two given penalty parameters. By design, our model absorbs the advantages of PSVM and LRSMM: it considers both the relationship between samples within a class and the row and column structure of matrix samples. Meanwhile, since the novel model is essentially a convex problem, we establish an alternating direction method of multipliers (ADMM) algorithm [<xref ref-type="bibr" rid="scirp.118820-ref26">26</xref>] to solve the PSMM model. Finally, we mainly focus on image classification, which is a very significant problem of matrix classification. 
We conduct a series of comparative experiments on the MNIST digit database [<xref ref-type="bibr" rid="scirp.118820-ref27">27</xref>], the MIT face database, the INRIA person database [<xref ref-type="bibr" rid="scirp.118820-ref28">28</xref>], the students face database [<xref ref-type="bibr" rid="scirp.118820-ref29">29</xref>] and the Japanese female facial expression database [<xref ref-type="bibr" rid="scirp.118820-ref30">30</xref>]. The final experimental results show that our method performs better than PSVM [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>], TSVM [<xref ref-type="bibr" rid="scirp.118820-ref7">7</xref>], LRSMM [<xref ref-type="bibr" rid="scirp.118820-ref16">16</xref>] and LTMRSMM [<xref ref-type="bibr" rid="scirp.118820-ref18">18</xref>] in demanding image classification tasks.</p><p>The remainder of this paper is arranged as follows. In Section 2, we give the notations and lemma used in our paper. In Section 3, we recall the models of PSVM [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>] and LRSMM [<xref ref-type="bibr" rid="scirp.118820-ref16">16</xref>] related to our novel model. In Section 4, the PSMM model is proposed and then an ADMM algorithm is established to solve it. In Section 5, some experimental analyses are presented to verify the performance of the PSMM classifier. In Section 6, our work is summarized.</p></sec><sec id="s2"><title>2. Preliminaries</title><p>For convenience, we give in this section the notations and the lemma that will be used in this paper.</p><p>For a given matrix <inline-formula><tex-math>A \in \mathbb{R}^{l \times m}</tex-math></inline-formula> with <inline-formula><tex-math>\operatorname{rank}(A) = r_A</tex-math></inline-formula> and singular values <inline-formula><tex-math>\sigma_1(A) \ge \cdots \ge \sigma_{r_A}(A) &gt; 0</tex-math></inline-formula>, we define the singular value decomposition of the matrix <inline-formula><tex-math>A</tex-math></inline-formula> by</p><p><disp-formula><label>(1)</label><tex-math>A = U_A \Sigma_A V_A^{T} \in \mathbb{R}^{l \times m},</tex-math></disp-formula></p><p>where <inline-formula><tex-math>U_A \in \mathbb{R}^{l \times r_A}</tex-math></inline-formula>, <inline-formula><tex-math>V_A \in \mathbb{R}^{m \times r_A}</tex-math></inline-formula> and <inline-formula><tex-math>\Sigma_A</tex-math></inline-formula> is a diagonal matrix with diagonal entries <inline-formula><tex-math>\sigma_1(A), \cdots, \sigma_{r_A}(A)</tex-math></inline-formula>. 
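The Frobenius norm of the matrix <inline-formula><tex-math>A</tex-math></inline-formula> is denoted as</p><p><disp-formula><tex-math>\|A\|_F = \Big( \sum_{i,j} [A]_{ij}^2 \Big)^{1/2} = \Big( \sum_{i=1}^{r_A} \sigma_i(A)^2 \Big)^{1/2}</tex-math></disp-formula></p><p>and the nuclear norm of <inline-formula><tex-math>A</tex-math></inline-formula> is denoted as</p><p><disp-formula><tex-math>\|A\|_* = \sum_{i=1}^{r_A} \sigma_i(A)</tex-math></disp-formula></p><p>whose subdifferential is denoted by <inline-formula><tex-math>\partial \|A\|_*</tex-math></inline-formula>. We reshape <inline-formula><tex-math>A</tex-math></inline-formula> into a vector <inline-formula><tex-math>a \in \mathbb{R}^{lm}</tex-math></inline-formula> by the usual row-by-row stacking:</p><p><disp-formula><tex-math>a := \operatorname{vec}(A) := \big( [A]_{11}, \cdots, [A]_{1m}, [A]_{21}, \cdots, [A]_{lm} \big)^{T}</tex-math></disp-formula></p><p>and define the Euclidean norm <inline-formula><tex-math>\|a\| = \big( \sum_{i,j} [A]_{ij}^2 \big)^{1/2} = \|A\|_F</tex-math></inline-formula>.</p><p>Lemma 1. (see [<xref ref-type="bibr" rid="scirp.118820-ref31">31</xref>], Theorem 2.1) Let <inline-formula><tex-math>A \in \mathbb{R}^{l \times m}</tex-math></inline-formula> be a matrix with <inline-formula><tex-math>\operatorname{rank}(A) = r_A</tex-math></inline-formula> and the singular value decomposition defined in (1). For <inline-formula><tex-math>\gamma &gt; 0</tex-math></inline-formula>, the following problem</p><p><disp-formula><tex-math>\min_{Q \in \mathbb{R}^{l \times m}} \ \gamma \|Q\|_* + \frac{1}{2} \|Q - A\|_F^2</tex-math></disp-formula></p><p>has a unique closed-form solution, denoted by <inline-formula><tex-math>Q^*</tex-math></inline-formula>, as follows:</p><p><disp-formula><tex-math>Q^* = \mathcal{D}_\gamma(A) := U_A D_\gamma V_A^{T}</tex-math></disp-formula></p><p>where <inline-formula><tex-math>D_\gamma</tex-math></inline-formula> is a diagonal matrix with <inline-formula><tex-math>[\sigma_1(A) - \gamma]_+, \cdots, [\sigma_{r_A}(A) - \gamma]_+</tex-math></inline-formula> as diagonal entries and <inline-formula><tex-math>[t]_+ = \max\{t, 0\}</tex-math></inline-formula> for any <inline-formula><tex-math>t \in \mathbb{R}</tex-math></inline-formula>.</p><p>In this paper, we are given a matrix data set <inline-formula><tex-math>\mathcal{T}_m = \{ (X_i, y_i) \}_{i=1}^{n}</tex-math></inline-formula> with <inline-formula><tex-math>X_i \in \mathbb{R}^{l \times m}</tex-math></inline-formula> and <inline-formula><tex-math>y_i \in \{-1, 1\}</tex-math></inline-formula>, and denote the label vector as <inline-formula><tex-math>y = (y_1, \cdots, y_n)^{T} \in \mathbb{R}^n</tex-math></inline-formula>. For the convenience of calculation, we define a linear mapping <inline-formula><tex-math>\mathcal{A} : \mathbb{R}^{l \times m} \to \mathbb{R}^n</tex-math></inline-formula> by</p><p><disp-formula><tex-math>\mathcal{A}(A) = \big( \langle A, X_1 \rangle, \cdots, \langle A, X_n \rangle \big)^{T} \in \mathbb{R}^n</tex-math></disp-formula></p><p>and denote by <inline-formula><tex-math>\circ</tex-math></inline-formula> the Hadamard product between two matrices. Additionally, we denote the identity matrix of order <inline-formula><tex-math>n</tex-math></inline-formula> as <inline-formula><tex-math>I</tex-math></inline-formula> and define the all-one vector <inline-formula><tex-math>e = (1, \cdots, 1)^{T} \in \mathbb{R}^n</tex-math></inline-formula>.</p></sec><sec id="s3"><title>3. Related Work</title><p>The design inspiration of PSMM proposed in this paper mainly comes from PSVM [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>] and LRSMM [<xref ref-type="bibr" rid="scirp.118820-ref16">16</xref>]. 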
In order to show the rationality and advantages of the PSMM model design, we review and analyze the models of PSVM [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>] and LRSMM [<xref ref-type="bibr" rid="scirp.118820-ref16">16</xref>] in this section.</p><sec id="s3_1"><title>3.1. Proximal Support Vector Machine (PSVM)</title><p>Different from the Soft-SVM model [<xref ref-type="bibr" rid="scirp.118820-ref17">17</xref>], the PSVM model is described in the <inline-formula><tex-math>(w, b)</tex-math></inline-formula> space <inline-formula><tex-math>\mathbb{R}^{lm} \times \mathbb{R}</tex-math></inline-formula>. Additionally, PSVM defines a pair of proximal hyperplanes <inline-formula><tex-math>\langle w, x \rangle + b = \pm 1</tex-math></inline-formula> and constrains samples of the same class to be as close to the same proximal plane as possible. This is because the better the proximal planes fit the samples, the clearer the boundary between the two classes of samples, and hence the better the classification performance of the separating hyperplane. The strategy of PSVM can be presented as the following model; for details, see [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>]:</p><p><disp-formula><label>(2)</label><tex-math>\min_{(w,b)} \ \frac{1}{2}\left( \|w\|^2 + b^2 \right) + \frac{c}{2} \sum_{i=1}^{n} \left[ 1 - y_i \left( \langle w, x_i \rangle + b \right) \right]^2</tex-math></disp-formula></p><p>where <inline-formula><tex-math>x_i := \operatorname{vec}(X_i) \in \mathbb{R}^{lm}</tex-math></inline-formula> for each <inline-formula><tex-math>i</tex-math></inline-formula> is the reshaped vector sample, <inline-formula><tex-math>(w, b) \in \mathbb{R}^{lm} \times \mathbb{R}</tex-math></inline-formula> is the decision variable and <inline-formula><tex-math>c \in \mathbb{R}</tex-math></inline-formula> is a given penalty parameter. A new input <inline-formula><tex-math>X \in \mathbb{R}^{l \times m}</tex-math></inline-formula> has to be reshaped into a vector <inline-formula><tex-math>x \in \mathbb{R}^{lm}</tex-math></inline-formula>, and then it can be classified by the decision function <inline-formula><tex-math>y_v(x) = \operatorname{sign}(\langle w, x \rangle + b)</tex-math></inline-formula>, where <inline-formula><tex-math>\operatorname{sign}(\cdot)</tex-math></inline-formula> is the sign function.</p><p>Compared with Soft-SVM [<xref ref-type="bibr" rid="scirp.118820-ref17">17</xref>], PSVM [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>] considers the relationship between samples within a class: samples of the same class are required to be close together, so that they tend to cluster. 
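As a minimal illustration of why model (2) is cheap to solve, the following NumPy sketch computes its closed-form solution. It is not taken from the PSVM reference or from the experiments reported later. Stacking the decision variable as z = (w, b) and letting row i of H be y_i (x_i, 1), the objective of (2) becomes (1/2)||z||^2 + (c/2)||e - Hz||^2, whose minimizer solves the linear system (I/c + H^T H) z = H^T e. The function names psvm_fit and psvm_predict are hypothetical, and c is assumed to be positive.
<preformat preformat-type="code">
```python
import numpy as np


def psvm_fit(X, y, c):
    """Solve the PSVM problem (2) for vectorized samples.

    X: array of shape (n, d), one reshaped sample vec(X_i) per row.
    y: array of shape (n,) with labels in {-1, +1}.
    c: penalty parameter, assumed positive.
    """
    n, d = X.shape
    # Row i of H is y_i * (x_i, 1), so that H @ z = y * (X @ w + b).
    H = y[:, None] * np.hstack([X, np.ones((n, 1))])
    # Setting the gradient z - c * H^T (e - H z) to zero gives
    # (I / c + H^T H) z = H^T e.
    z = np.linalg.solve(np.eye(d + 1) / c + H.T @ H, H.T @ np.ones(n))
    return z[:-1], z[-1]


def psvm_predict(X, w, b):
    """Decision function sign(w^T x + b), applied to each row of X."""
    return np.sign(X @ w + b)
```
</preformat>
Only one linear system of order lm + 1 has to be solved, which is the source of the low cost discussed in this subsection; for image data, lm can still be large, which is one motivation for the matrix models that follow.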
The advantage of considering this within-class relationship is that the distribution trend of same-class samples can be captured, which gives PSVM good predictive performance. Meanwhile, the PSVM model (2) is in essence an unconstrained quadratic programming problem, so that the computational cost of the PSVM classifier is lower than that of the Soft-SVM classifier. Thus, PSVM has an advantage over Soft-SVM in large database classification; for details, see [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>].</p></sec><sec id="s3_2"><title>3.2. Low Rank Support Matrix Machine (LRSMM)</title><p>Different from vector data, each matrix sample has an identifying intrinsic structure, which reveals significant category information in matrix classification. As important structural information, the correlation of rows or columns of a matrix is closely related to the rank of the regression matrix <inline-formula><tex-math>W</tex-math></inline-formula>. Meanwhile, imposing a low rank constraint on <inline-formula><tex-math>W</tex-math></inline-formula> is an effective way to filter the redundant information of samples in applications. However, the matrix rank minimization problem is a typical NP-hard problem. To overcome this difficulty, and inspired by the idea that the nuclear norm is the best convex approximation of matrix rank, LRSMM was developed on the basis of Soft-SVM [<xref ref-type="bibr" rid="scirp.118820-ref17">17</xref>]; it achieves the low-rank target by minimizing the nuclear norm of <inline-formula><tex-math>W</tex-math></inline-formula>. The LRSMM model is presented as follows; for details, see [<xref ref-type="bibr" rid="scirp.118820-ref16">16</xref>]:</p><p><disp-formula><label>(3)</label><tex-math>\min_{W,b} \ \frac{1}{2} \|W\|_F^2 + \tau \|W\|_* + C \sum_{i=1}^{n} \left[ 1 - y_i \left( \langle W, X_i \rangle + b \right) \right]_+</tex-math></disp-formula></p><p>where <inline-formula><tex-math>W \in \mathbb{R}^{l \times m}</tex-math></inline-formula> and <inline-formula><tex-math>b \in \mathbb{R}</tex-math></inline-formula> are decision variables, and <inline-formula><tex-math>C \in \mathbb{R}</tex-math></inline-formula> and <inline-formula><tex-math>\tau \in \mathbb{R}_{++}</tex-math></inline-formula> are two given penalty parameters. A new input <inline-formula><tex-math>X \in \mathbb{R}^{l \times m}</tex-math></inline-formula> can be classified by the decision function <inline-formula><tex-math>y_m(X) = \operatorname{sign}(\langle W, X \rangle + b)</tex-math></inline-formula>.</p></sec></sec><sec id="s4"><title>4. 
Proximal Support Matrix Machine (PSMM)</title><p>In this section, we propose a novel matrix classification method, PSMM, on the basis of PSVM [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>] and LRSMM [<xref ref-type="bibr" rid="scirp.118820-ref16">16</xref>], which generalizes the classification method of PSVM to matrix space and also inherits the low-rank design of LRSMM. The formulation of our novel model is given as follows:</p><p><disp-formula><label>(4)</label><tex-math>\min_{(W,b)} \ \frac{1}{2}\left( \|W\|_F^2 + b^2 \right) + \tau \|W\|_* + \frac{C}{2} \sum_{i=1}^{n} \left[ 1 - y_i \left( \langle W, X_i \rangle + b \right) \right]^2</tex-math></disp-formula></p><p>where <inline-formula><tex-math>(W, b) \in \mathbb{R}^{l \times m} \times \mathbb{R}</tex-math></inline-formula> is the decision variable, and <inline-formula><tex-math>C \in \mathbb{R}</tex-math></inline-formula> and <inline-formula><tex-math>\tau \in \mathbb{R}_{++}</tex-math></inline-formula> are two given penalty parameters. In order to take into consideration the relationship between samples of the same class, PSMM also defines a pair of proximal hyperplanes <inline-formula><tex-math>\langle W, X \rangle + b = \pm 1</tex-math></inline-formula>. The term <inline-formula><tex-math>\left[ 1 - y_i \left( \langle W, X_i \rangle + b \right) \right]^2</tex-math></inline-formula> measures the squared (unnormalized) distance between the <inline-formula><tex-math>i</tex-math></inline-formula>th sample and its proximal hyperplane. In (4), minimizing the sum of these terms drives samples of the same class to gather near the same proximal plane. Additionally, to meet the need for a low-rank structure in applications, the nuclear norm of the regression matrix <inline-formula><tex-math>W</tex-math></inline-formula> is introduced into our objective function, following LRSMM [<xref ref-type="bibr" rid="scirp.118820-ref16">16</xref>].</p><p>The design of the PSMM model (4) has several advantages. On the one hand, since the nuclear norm of <inline-formula><tex-math>W</tex-math></inline-formula> is the only nonsmooth part of the PSMM model, the PSMM algorithm presented below is simpler than the LRSMM algorithm [<xref ref-type="bibr" rid="scirp.118820-ref16">16</xref>] in framework design. Moreover, the following experiments show that the cost of the PSMM classifier is much lower than that of the LRSMM classifier. 
On the other hand, compared with PSVM [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>], PSMM takes into account the intrinsic structure of matrix data, so that the PSMM classifier performs better than the PSVM classifier in the following matrix classification tasks. Thus, to a certain degree, the novel model (4) is a synthesis of the PSVM model (2) and the LRSMM model (3).</p><sec id="s4_1"><title>4.1. Model Solution</title><p>Since the PSMM model (4) is a convex problem, we can devise an ADMM-type algorithm [<xref ref-type="bibr" rid="scirp.118820-ref26">26</xref>] to solve it. Firstly, the PSMM model (4) can be equivalently written in the following form:</p><p><disp-formula><label>(5)</label><tex-math>\min_{(W,b),\,S} \ \frac{1}{2}\left( \|W\|_F^2 + b^2 \right) + \tau \|S\|_* + \frac{C}{2} \sum_{i=1}^{n} \left[ 1 - y_i \left( \langle W, X_i \rangle + b \right) \right]^2 \quad \text{s.t.} \quad W = S.</tex-math></disp-formula></p><p>Its corresponding augmented Lagrangian function is given as follows:</p><p><disp-formula><label>(6)</label><tex-math>L_\rho((W,b), S; \Lambda) = \frac{1}{2}\left( \|W\|_F^2 + b^2 \right) + \tau \|S\|_* + \frac{C}{2} \sum_{i=1}^{n} \left[ 1 - y_i \left( \langle W, X_i \rangle + b \right) \right]^2 + \langle \Lambda, S - W \rangle + \frac{\rho}{2} \|S - W\|_F^2</tex-math></disp-formula></p><p>where <inline-formula><tex-math>\Lambda \in \mathbb{R}^{l \times m}</tex-math></inline-formula> is the Lagrangian multiplier of the problem (5) and <inline-formula><tex-math>\rho &gt; 0</tex-math></inline-formula> is a given penalty parameter. Consequently, we can use the augmented Lagrangian function (6) to establish the iterative framework of the PSMM algorithm, which is based on the framework of ADMM. Given the state variables <inline-formula><tex-math>((W^k, b^k), S^k; \Lambda^k)</tex-math></inline-formula> at iteration <inline-formula><tex-math>k</tex-math></inline-formula>, the proposed update framework is given as follows:</p><p><disp-formula><label>(7)</label><tex-math>(W^{k+1}, b^{k+1}) = \arg\min_{W,b} \Big\{ \frac{1}{2}\left( \|W\|_F^2 + b^2 \right) + \frac{C}{2} \sum_{i=1}^{n} \left[ 1 - y_i \left( \langle W, X_i \rangle + b \right) \right]^2 + \langle \Lambda^k, S^k - W \rangle + \frac{\rho}{2} \|S^k - W\|_F^2 \Big\}</tex-math></disp-formula></p><p><disp-formula><label>(8)</label><tex-math>S^{k+1} = \arg\min_{S} \Big\{ \tau \|S\|_* + \langle \Lambda^k, S - W^{k+1} \rangle + \frac{\rho}{2} \|S - W^{k+1}\|_F^2 \Big\}</tex-math></disp-formula></p><p><disp-formula><label>(9)</label><tex-math>\Lambda^{k+1} = \Lambda^k + \rho \left( S^{k+1} - W^{k+1} \right)</tex-math></disp-formula></p><p>Note that we have to solve the two optimization problems (7) and (8) on each update. 
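The updates (7)-(9) can be sketched in NumPy as follows. This is a hedged illustration and not the authors' Matlab implementation. Subproblem (7) is solved here directly through its normal equations in the stacked variable (vec(W), b), an assumption made for brevity that is only practical when lm is small. Subproblem (8) is solved by the singular value thresholding of Lemma 1 with threshold tau/rho, and a fixed iteration count replaces a stopping test. The names svt, psmm_objective and psmm_admm are hypothetical.
<preformat preformat-type="code">
```python
import numpy as np


def svt(A, gamma):
    """Singular value thresholding D_gamma(A) from Lemma 1."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - gamma, 0.0)) @ Vt


def psmm_objective(W, b, Xs, y, C, tau):
    """Objective value of model (4); Xs has shape (n, l, m)."""
    margins = y * (np.tensordot(Xs, W, axes=([1, 2], [0, 1])) + b)
    nuclear = np.linalg.svd(W, compute_uv=False).sum()
    smooth = 0.5 * (np.sum(W**2) + b**2)
    return smooth + tau * nuclear + 0.5 * C * np.sum((1 - margins) ** 2)


def psmm_admm(Xs, y, C, tau, rho=1.0, iters=300):
    """Run a fixed number of ADMM iterations (7)-(9) for model (5)."""
    n, l, m = Xs.shape
    d = l * m
    # Row i of H is y_i * (vec(X_i), 1), so H @ z = y * (inner(W, X_i) + b).
    H = y[:, None] * np.hstack([Xs.reshape(n, d), np.ones((n, 1))])
    # Hessian of (7) in z = (vec(W), b): I + C H^T H + rho * diag(1, ..., 1, 0).
    P = np.diag(np.r_[np.ones(d), 0.0])
    M = np.eye(d + 1) + C * H.T @ H + rho * P
    rhs0 = C * H.T @ np.ones(n)
    S = np.zeros((l, m))
    Lam = np.zeros((l, m))
    for _ in range(iters):
        # (7): set the gradient of the smooth quadratic to zero.
        rhs = rhs0 + np.r_[(Lam + rho * S).ravel(), 0.0]
        z = np.linalg.solve(M, rhs)
        W, b = z[:d].reshape(l, m), z[d]
        # (8): proximal step of the nuclear norm.
        S = svt(W - Lam / rho, tau / rho)
        # (9): multiplier update.
        Lam = Lam + rho * (S - W)
    return W, b, S, Lam
```
</preformat>
With samples stored in an array Xs of shape (n, l, m) and labels y in {-1, 1}, a new sample X could then be classified by np.sign(np.sum(W * X) + b), in analogy with the decision function used for LRSMM in Section 3.2.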
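Because both subproblems are solved at every iteration, it is necessary to determine their explicit solutions, which greatly reduces the cost of our algorithm.</p><p>Solving (7) can be transformed into solving the following model:</p><p><disp-formula><label>(10)</label><tex-math>\min_{W,b,\xi} \ \frac{1}{2}\left( \|W\|_F^2 + b^2 \right) + \frac{C}{2} \|\xi\|^2 - \langle \Lambda^k, W \rangle + \frac{\rho}{2} \|S^k - W\|_F^2 \quad \text{s.t.} \quad 1 - y_i \left( \langle W, X_i \rangle + b \right) = \xi_i, \ i = 1, \cdots, n.</tex-math></disp-formula></p><p>That is because the <inline-formula><tex-math>(W, b)</tex-math></inline-formula> part of the optimal solution of (10), denoted by <inline-formula><tex-math>(W^{k+1}, b^{k+1})</tex-math></inline-formula>, is exactly the optimal solution of the problem (7). Since the model (10) is essentially a smooth convex optimization problem with only equality constraints, we can solve this problem via its dual problem. The Lagrange function of (10) is given as follows:</p><p><disp-formula><tex-math>L(W, b, \xi; \lambda) = \frac{1}{2}\left( \|W\|_F^2 + b^2 \right) + \frac{C}{2} \|\xi\|^2 - \langle \Lambda^k, W \rangle + \frac{\rho}{2} \|S^k - W\|_F^2 + \sum_{i=1}^{n} \lambda_i \left[ 1 - y_i \left( \langle W, X_i \rangle + b \right) - \xi_i \right]</tex-math></disp-formula></p><p>where <inline-formula><tex-math>\lambda = (\lambda_1, \cdots, \lambda_n)^{T} \in \mathbb{R}^n</tex-math></inline-formula> is the Lagrange multiplier of (10). For each <inline-formula><tex-math>k</tex-math></inline-formula>, we define the following matrices and vectors:</p><p><disp-formula><tex-math>\tilde{Y} = y y^{T}, \qquad \tilde{X} = \tilde{Y} \circ \left[ \mathcal{A}(X_1), \cdots, \mathcal{A}(X_n) \right]</tex-math></disp-formula></p><p><disp-formula><tex-math>\tilde{S}^k = y \circ \mathcal{A}(S^k), \qquad \tilde{\Lambda}^k = y \circ \mathcal{A}(\Lambda^k)</tex-math></disp-formula></p><p>Consequently, the Karush-Kuhn-Tucker (KKT) conditions for the model (10) are given as follows:</p><p><disp-formula><label>(11)</label><tex-math>\begin{cases} W = \frac{1}{1+\rho} \Big( \rho S^k + \Lambda^k + \sum_{i=1}^{n} y_i \lambda_i X_i \Big), \\ b = \sum_{i=1}^{n} y_i \lambda_i, \quad \xi_i = \frac{1}{C} \lambda_i, \quad i = 1, \cdots, n, \\ 1 - y_i \left( \langle W, X_i \rangle + b \right) - \xi_i = 0, \quad i = 1, \cdots, n. \end{cases}</tex-math></disp-formula></p><p>According to duality theory and (11), after simple calculation, we can get the dual problem of (10) as follows:</p><p><disp-formula><label>(12)</label><tex-math>\max_{\lambda \in \mathbb{R}^n} \ d(\lambda)</tex-math></disp-formula></p><p>where, up to an additive constant independent of <inline-formula><tex-math>\lambda</tex-math></inline-formula>, <inline-formula><tex-math>d(\lambda)</tex-math></inline-formula> has the following expression:</p><p><disp-formula><tex-math>d(\lambda) = \langle \lambda, e \rangle - \frac{\rho}{1+\rho} \langle \lambda, \tilde{S}^k \rangle - \frac{1}{1+\rho} \langle \lambda, \tilde{\Lambda}^k \rangle - \frac{1}{2C} \|\lambda\|^2 - \frac{1}{2} \langle \lambda, \tilde{Y} \lambda \rangle - \frac{1}{2(1+\rho)} \langle \lambda, \tilde{X} \lambda \rangle</tex-math></disp-formula></p><p>and the optimal solution of (12) is denoted by <inline-formula><tex-math>\lambda^k</tex-math></inline-formula>. 
Since the dual problem (12) is essentially an unconstrained quadratic programming problem, by the first-order optimality condition, solving (12) is equivalent to solving the following system:</p><p><disp-formula><tex-math>\nabla_\lambda d = e - \frac{\rho}{1+\rho} \tilde{S}^k - \frac{1}{1+\rho} \tilde{\Lambda}^k - \frac{1}{C} \lambda - \tilde{Y} \lambda - \frac{1}{1+\rho} \tilde{X} \lambda = 0,</tex-math></disp-formula></p><p>i.e.,</p><p><disp-formula><label>(13)</label><tex-math>\Big( \frac{1}{C} I + \tilde{Y} + \frac{1}{1+\rho} \tilde{X} \Big) \lambda = e - \frac{\rho}{1+\rho} \tilde{S}^k - \frac{1}{1+\rho} \tilde{\Lambda}^k.</tex-math></disp-formula></p><p>This system is the third line of (11) with the first two lines substituted into it. The coefficient matrix of (13) is invertible whenever <inline-formula><tex-math>C &gt; 0</tex-math></inline-formula>, because <inline-formula><tex-math>\tilde{Y}</tex-math></inline-formula> and <inline-formula><tex-math>\tilde{X}</tex-math></inline-formula> are positive semidefinite; the optimal solution <inline-formula><tex-math>\lambda^k</tex-math></inline-formula> of (12) then has the explicit expression:</p><p><disp-formula><tex-math>\lambda^k = \Big( \frac{1}{C} I + \tilde{Y} + \frac{1}{1+\rho} \tilde{X} \Big)^{-1} \Big( e - \frac{\rho}{1+\rho} \tilde{S}^k - \frac{1}{1+\rho} \tilde{\Lambda}^k \Big).</tex-math></disp-formula></p><p>Therefore, by the KKT system (11), the <inline-formula><tex-math>(W, b)</tex-math></inline-formula> part <inline-formula><tex-math>(W^{k+1}, b^{k+1})</tex-math></inline-formula> of the optimal solution of (10) has the following explicit expression:</p><p><disp-formula><tex-math>\begin{cases} W^{k+1} = \frac{1}{1+\rho} \Big( \rho S^k + \Lambda^k + \sum_{i=1}^{n} y_i \lambda_i^k X_i \Big), \\ b^{k+1} = \sum_{i=1}^{n} y_i \lambda_i^k, \end{cases}</tex-math></disp-formula></p><p>which means that the iterative format (7) can be written in a more concise form:</p><p><disp-formula><tex-math>(W^{k+1}, b^{k+1}) = \Big( \frac{1}{1+\rho} \Big( \rho S^k + \Lambda^k + \sum_{i=1}^{n} y_i \lambda_i^k X_i \Big), \ \sum_{i=1}^{n} y_i \lambda_i^k \Big).</tex-math></disp-formula></p><p>Additionally, the problem (8) can be equivalently written as follows:</p><p><disp-formula><label>(14)</label><tex-math>\arg\min_{S} \Big\{ \frac{\tau}{\rho} \|S\|_* + \frac{1}{2} \Big\| S - \Big( W^{k+1} - \frac{1}{\rho} \Lambda^k \Big) \Big\|_F^2 \Big\}.</tex-math></disp-formula></p><p>By Lemma 1, we can deduce the closed-form solution of (14), so that the iterative format (8) has the more straightforward expression:</p><p><disp-formula><tex-math>S^{k+1} = \mathcal{D}_{\tau/\rho} \Big( W^{k+1} - \frac{1}{\rho} \Lambda^k \Big).</tex-math></disp-formula></p></sec><sec id="s4_2"><title>4.2. Convergence</title><p>Since the model (5) is a convex problem with only equality constraints, the convergence of the PSMM iterative framework is guaranteed by [<xref ref-type="bibr" rid="scirp.118820-ref26">26</xref>] [<xref ref-type="bibr" rid="scirp.118820-ref32">32</xref>], so that we can get the following results:</p><p>Definition 1. 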
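For given parameters <inline-formula><tex-math>C \in \mathbb{R}</tex-math></inline-formula> and <inline-formula><tex-math>\tau \in \mathbb{R}_{++}</tex-math></inline-formula>, we say <inline-formula><tex-math>((W^*, b^*), S^*)</tex-math></inline-formula> is a proximal stationary (P-stationary) point of (5) if there exists a Lagrangian multiplier <inline-formula><tex-math>\Lambda^* \in \mathbb{R}^{l \times m}</tex-math></inline-formula> such that</p><p><disp-formula><label>(15a)</label><tex-math>0 = W^* - C \sum_{i=1}^{n} y_i X_i \left[ 1 - y_i \left( \langle W^*, X_i \rangle + b^* \right) \right] - \Lambda^*</tex-math></disp-formula></p><p><disp-formula><label>(15b)</label><tex-math>0 = b^* - C \sum_{i=1}^{n} y_i \left[ 1 - y_i \left( \langle W^*, X_i \rangle + b^* \right) \right]</tex-math></disp-formula></p><p><disp-formula><label>(15c)</label><tex-math>0 \in \tau \, \partial \|S^*\|_* + \Lambda^*</tex-math></disp-formula></p><p><disp-formula><label>(15d)</label><tex-math>0 = S^* - W^*</tex-math></disp-formula></p><p>Theorem 2. Suppose <inline-formula><tex-math>((W^*, b^*), S^*, \Lambda^*)</tex-math></inline-formula> is a limit point of the sequence <inline-formula><tex-math>\{ ((W^k, b^k), S^k, \Lambda^k) \}</tex-math></inline-formula> generated by the PSMM iterative framework. Then <inline-formula><tex-math>((W^*, b^*), S^*)</tex-math></inline-formula> is a P-stationary point and thus a locally optimal solution to the problem (5).</p><p>Remark. Since <inline-formula><tex-math>(W^{k+1}, b^{k+1})</tex-math></inline-formula> minimizes <inline-formula><tex-math>L_\rho((W, b), S^k, \Lambda^k)</tex-math></inline-formula>, we have the following relations:</p><p><disp-formula><label>(16)</label><tex-math>0 = W^{k+1} - C \sum_{i=1}^{n} y_i X_i \left[ 1 - y_i \left( \langle W^{k+1}, X_i \rangle + b^{k+1} \right) \right] - \Lambda^k - \rho \left( S^k - W^{k+1} \right)</tex-math></disp-formula></p><p><disp-formula><label>(17)</label><tex-math>0 = b^{k+1} - C \sum_{i=1}^{n} y_i \left[ 1 - y_i \left( \langle W^{k+1}, X_i \rangle + b^{k+1} \right) \right]</tex-math></disp-formula></p><p>Similarly, since <inline-formula><tex-math>S^{k+1}</tex-math></inline-formula> minimizes <inline-formula><tex-math>L_\rho((W^{k+1}, b^{k+1}), S, \Lambda^k)</tex-math></inline-formula>, we also have the following relation:</p><p><disp-formula><label>(18)</label><tex-math>0 \in \tau \, \partial \|S^{k+1}\|_* + \Lambda^k + \rho \left( S^{k+1} - W^{k+1} \right)</tex-math></disp-formula></p><p>1) The iteration formula (9) always produces a gap <inline-formula><tex-math>\rho ( S^{k+1} - W^{k+1} )</tex-math></inline-formula> between the state variables <inline-formula><tex-math>\Lambda^k</tex-math></inline-formula> and <inline-formula><tex-math>\Lambda^{k+1}</tex-math></inline-formula> at iteration <inline-formula><tex-math>k+1</tex-math></inline-formula>, so that there is always a residual between <inline-formula><tex-math>\Lambda^k</tex-math></inline-formula> and the optimal multiplier <inline-formula><tex-math>\Lambda^*</tex-math></inline-formula> on each update. Meanwhile, from the perspective of feasibility, the locally optimal point of (5) must meet the feasibility condition (15d). Nevertheless, <inline-formula><tex-math>W^{k+1}</tex-math></inline-formula> and <inline-formula><tex-math>S^{k+1}</tex-math></inline-formula> are generally unequal on each update. To this end, we denote <inline-formula><tex-math>E_p^{k+1} := S^{k+1} - W^{k+1}</tex-math></inline-formula> as the primal residual at iteration <inline-formula><tex-math>k+1</tex-math></inline-formula>. 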
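Additionally, combining (9) and (16), we can obtain the following equality:</p><p><disp-formula><tex-math>0 = W^{k+1} - C \sum_{i=1}^{n} y_i X_i \left[ 1 - y_i \left( \langle W^{k+1}, X_i \rangle + b^{k+1} \right) \right] - \Lambda^{k+1} + \rho \left( S^{k+1} - S^k \right)</tex-math></disp-formula></p><p>which means that there always exists a residual <inline-formula><tex-math>\rho ( S^{k+1} - S^k )</tex-math></inline-formula> for the condition (15a) on each iteration. We denote <inline-formula><tex-math>E_d^{k+1} := \rho ( S^{k+1} - S^k )</tex-math></inline-formula> as the dual residual at iteration <inline-formula><tex-math>k+1</tex-math></inline-formula>.</p><p>2) Conversely, by (17), <inline-formula><tex-math>b^{k+1}</tex-math></inline-formula> always satisfies the condition (15b) at iteration <inline-formula><tex-math>k+1</tex-math></inline-formula>. Similarly, combining (9) and (18), we can obtain the following relation:</p><p><disp-formula><tex-math>0 \in \tau \, \partial \|S^{k+1}\|_* + \Lambda^{k+1}</tex-math></disp-formula></p><p>which means that <inline-formula><tex-math>S^{k+1}</tex-math></inline-formula> and <inline-formula><tex-math>\Lambda^{k+1}</tex-math></inline-formula> also always satisfy the condition (15c) at iteration <inline-formula><tex-math>k+1</tex-math></inline-formula>.</p></sec><sec id="s4_3"><title>4.3. Stopping Criterion</title><p>Based on the above analysis, the PSMM algorithm obtains the locally optimal solution when the following conditions are satisfied:</p><p><disp-formula><tex-math>E_p^{k+1} \to 0 \quad \text{as} \ k \to \infty,</tex-math></disp-formula></p><p><disp-formula><tex-math>E_d^{k+1} \to 0 \quad \text{as} \ k \to \infty.</tex-math></disp-formula></p><p>Therefore, inspired by [<xref ref-type="bibr" rid="scirp.118820-ref26">26</xref>], we set the following stopping criterion: the decision variables stop updating and are output when</p><p><disp-formula><tex-math>\| E_p^{k+1} \|_F \le \varepsilon_p, \quad \| E_d^{k+1} \|_F \le \varepsilon_d.</tex-math></disp-formula></p><p>The tolerances <inline-formula><tex-math>\varepsilon_p, \varepsilon_d &gt; 0</tex-math></inline-formula> can be determined by the following criterion:</p><p><disp-formula><tex-math>\varepsilon_p = n \, \varepsilon_{abs} + \varepsilon_{rel} \max \{ \|W^{k+1}\|_F, \|S^{k+1}\|_F \}</tex-math></disp-formula></p><p><disp-formula><tex-math>\varepsilon_d = n \, \varepsilon_{abs} + \varepsilon_{rel} \, \rho \, \|\Lambda^{k+1}\|_F</tex-math></disp-formula></p><p>where <inline-formula><tex-math>\varepsilon_{abs}</tex-math></inline-formula> and <inline-formula><tex-math>\varepsilon_{rel}</tex-math></inline-formula> are absolute and relative tolerances, respectively, and their settings depend on the application; for details, see [<xref ref-type="bibr" rid="scirp.118820-ref26">26</xref>]. Furthermore, the specific procedure of the PSMM algorithm is shown as follows:</p><disp-formula id="scirp.118820-formula1"><graphic  xlink:href="//html.scirp.org/file/11-1722858x143.png?20220727181848990"  xlink:type="simple"/></disp-formula></sec></sec><sec id="s5"><title>5. 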
Experiments</title><p>To verify the binary classification performance of our method, we conduct an experimental analysis of the PSMM classifier in this section. Five image data sets are selected: the MNIST digit database [<xref ref-type="bibr" rid="scirp.118820-ref27">27</xref>], the MIT face database, the INRIA person database [<xref ref-type="bibr" rid="scirp.118820-ref28">28</xref>], the students face database [<xref ref-type="bibr" rid="scirp.118820-ref29">29</xref>] and the Japanese female facial expression database (JAFFE) [<xref ref-type="bibr" rid="scirp.118820-ref30">30</xref>]. We conduct comparative experiments on these data sets with PSVM [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>], TSVM [<xref ref-type="bibr" rid="scirp.118820-ref7">7</xref>], LRSMM [<xref ref-type="bibr" rid="scirp.118820-ref16">16</xref>] and LTMRSMM [<xref ref-type="bibr" rid="scirp.118820-ref18">18</xref>]. We summarize the main information of the selected data sets in <xref ref-type="table" rid="table1">Table 1</xref>. Some pictures of these databases are shown in Figures 1-5. 
All experiments are implemented in Matlab R2019a on a workstation with AMD A10-7300 Radeon R6 1.90 GHz, 10 Compute Cores 4C+6G, 4 GB RAM, and 64 bit Windows Server 2009 system.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Summary of five image data sets</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Database</th><th align="center" valign="middle" >Size (pos/neg)</th><th align="center" valign="middle" >Dimension</th><th align="center" valign="middle" >Class number</th></tr></thead><tr><td align="center" valign="middle" >MNIST</td><td align="center" valign="middle" >5000/5000</td><td align="center" valign="middle" >28 &#215; 28</td><td align="center" valign="middle" >10</td></tr><tr><td align="center" valign="middle" >MIT</td><td align="center" valign="middle" >2706/4381</td><td align="center" valign="middle" >20 &#215; 20</td><td align="center" valign="middle" >2</td></tr><tr><td align="center" valign="middle" >Students face</td><td align="center" valign="middle" >200/200</td><td align="center" valign="middle" >200 &#215; 200</td><td align="center" valign="middle" >2</td></tr><tr><td align="center" valign="middle" >INRIA person</td><td align="center" valign="middle" >2416/12180</td><td align="center" valign="middle" >64 &#215; 128</td><td align="center" valign="middle" >2</td></tr><tr><td align="center" valign="middle" >JAFFE</td><td align="center" valign="middle" >175/175</td><td align="center" valign="middle" >256 &#215; 256</td><td align="center" valign="middle" >7</td></tr></tbody></table></table-wrap><sec id="s5_1"><title>5.1. Classification on MNIST Digit Database</title><p>The MNIST digit database [<xref ref-type="bibr" rid="scirp.118820-ref27">27</xref>] comes from the National Institute of Standards and Technology and includes handwritten digits from zero to nine. 
Due to the differences in writing habits of volunteers, different digits may have similar contours, resulting in misclassification. We chose this database in order to distinguish handwritten digits with little difference in contour.</p><p>Through the observation of a large number of handwritten character samples, we found five digit pairs in which the two digits are morphologically similar: (0, 6), (1, 7), (2, 7), (3, 8) and (5, 6). Their comparisons are shown in <xref ref-type="fig" rid="fig6">Figure 6</xref>. We conducted binary classification experiments on the two digits of each pair. The experimental results show that PSMM performs better than PSVM [<xref ref-type="bibr" rid="scirp.118820-ref1">1</xref>], TSVM [<xref ref-type="bibr" rid="scirp.118820-ref7">7</xref>], LRSMM [<xref ref-type="bibr" rid="scirp.118820-ref16">16</xref>] and LTMRSMM [<xref ref-type="bibr" rid="scirp.118820-ref18">18</xref>] in distinguishing handwritten digits with similar contours; for details, see <xref ref-type="table" rid="table2">Table 2</xref>.</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> Experimental results on MNIST digit database</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >MNIST</th><th align="center" valign="middle" >PSVM</th><th align="center" valign="middle" >TSVM</th><th align="center" valign="middle" >LRSMM</th><th align="center" valign="middle" >LTMRSMM</th><th align="center" valign="middle" >PSMM</th></tr></thead><tr><td align="center" valign="middle"  rowspan="2"  >Subject</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td></tr><tr><td align="center" valign="middle" >Time 
(s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >(0, 6)</td><td align="center" valign="middle" >97.23</td><td align="center" valign="middle" >98.05</td><td align="center" valign="middle" >97.59</td><td align="center" valign="middle" >97.40</td><td align="center" valign="middle" >98.60</td></tr><tr><td align="center" valign="middle" >0.09434</td><td align="center" valign="middle" >0.21151</td><td align="center" valign="middle" >25.29174</td><td align="center" valign="middle" >4.79779</td><td align="center" valign="middle" >0.79307</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >(1, 7)</td><td align="center" valign="middle" >98.35</td><td align="center" valign="middle" >98.87</td><td align="center" valign="middle" >97.24</td><td align="center" valign="middle" >98.07</td><td align="center" valign="middle" >99.07</td></tr><tr><td align="center" valign="middle" >0.10088</td><td align="center" valign="middle" >0.22491</td><td align="center" valign="middle" >31.38959</td><td align="center" valign="middle" >5.42440</td><td align="center" valign="middle" >0.81284</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >(2, 7)</td><td align="center" valign="middle" >96.02</td><td align="center" valign="middle" >97.54</td><td align="center" valign="middle" >95.80</td><td align="center" valign="middle" >96.77</td><td align="center" valign="middle" >97.87</td></tr><tr><td align="center" valign="middle" >0.10020</td><td align="center" valign="middle" >0.20906</td><td align="center" valign="middle" >23.40696</td><td align="center" valign="middle" >5.40644</td><td align="center" valign="middle" >0.69939</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >(3, 8)</td><td align="center" valign="middle" >91.71</td><td align="center" 
valign="middle" >92.53</td><td align="center" valign="middle" >91.09</td><td align="center" valign="middle" >93.12</td><td align="center" valign="middle" >94.80</td></tr><tr><td align="center" valign="middle" >0.09304</td><td align="center" valign="middle" >0.21207</td><td align="center" valign="middle" >26.43666</td><td align="center" valign="middle" >5.07218</td><td align="center" valign="middle" >0.55883</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >(5, 6)</td><td align="center" valign="middle" >94.51</td><td align="center" valign="middle" >95.79</td><td align="center" valign="middle" >94.49</td><td align="center" valign="middle" >95.64</td><td align="center" valign="middle" >96.66</td></tr><tr><td align="center" valign="middle" >0.09083</td><td align="center" valign="middle" >0.21531</td><td align="center" valign="middle" >31.84351</td><td align="center" valign="middle" >5.23974</td><td align="center" valign="middle" >0.60023</td></tr></tbody></table></table-wrap></sec><sec id="s5_2"><title>5.2. Classification on Portrait Database</title><p>The results on the MNIST digit database indicate that the novel classifier distinguishes well between two categories that differ only slightly. This prompted us to apply it to more complex portrait classification tasks, such as face recognition, gender judgment, pedestrian detection and emotion recognition, which require a classifier to distinguish fine details. To test the performance of the PSMM classifier on these tasks, we selected the following four portrait databases.</p><p>1) The MIT face database comes from the Massachusetts Institute of Technology and has 7087 samples in total, including 2706 face samples and 4381 non-face samples. All images are grayscale images stored in BMP format, with a width and height of 20 pixels. 
We chose it for face recognition experiments, that is, to judge whether a picture contains a face.</p><p>2) The INRIA person database [<xref ref-type="bibr" rid="scirp.118820-ref28">28</xref>] was collected to detect whether people are present in an image, and includes 2416 pedestrian pictures and 12,180 landscape pictures. All images are color images stored in JPG or PNG format, and we normalized the samples into 64 &#215; 128 grayscale matrices to unify the sample dimensions.</p><p>3) The students face database [<xref ref-type="bibr" rid="scirp.118820-ref29">29</xref>] contains 400 photos of medical students at Stanford University, 200 male and 200 female. All images are grayscale images stored in JPG format, with a width and height of 200 pixels. In gender judgment, hair style is a disturbance, because girls may have short hair and boys may have long hair. Therefore, this database further tests the ability of classifiers to distinguish facial details.</p><p>4) The Japanese Female Facial Expression (JAFFE) database [<xref ref-type="bibr" rid="scirp.118820-ref30">30</xref>] has 213 facial expression images, covering seven facial expressions posed by 10 women. The seven expressions are afraid, surprised, happy, sad, angry, disgusted and neutral. All images are grayscale images stored in TIFF format, with a width and height of 256 pixels. We conducted binary classification experiments between each emotion and the rest to observe the sensitivity of the PSMM classifier to differences in facial expression.</p><p>Through a large number of preliminary experiments, we determined an appropriate training set size for each database. 
The stability and effectiveness of the novel algorithm were then verified by appropriately increasing the number of test samples and repeating the experiments many times; for details, see Tables 3-6.</p><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Experimental results on MIT face database</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >MIT</th><th align="center" valign="middle" >PSVM</th><th align="center" valign="middle" >TSVM</th><th align="center" valign="middle" >LRSMM</th><th align="center" valign="middle" >LTMRSMM</th><th align="center" valign="middle" >PSMM</th></tr></thead><tr><td align="center" valign="middle" >Test set</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td></tr><tr><td align="center" valign="middle" >pos/neg</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >100/100</td><td align="center" valign="middle" >87.70</td><td align="center" valign="middle" >85.95</td><td align="center" valign="middle" >89.10</td><td align="center" valign="middle" >87.00</td><td align="center" valign="middle" >92.60</td></tr><tr><td align="center" valign="middle" >0.00678</td><td align="center" valign="middle" >0.01823</td><td align="center" valign="middle" >9.61162</td><td align="center" valign="middle" >1.53774</td><td align="center" valign="middle" >0.09710</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >200/200</td><td align="center" valign="middle" >89.10</td><td align="center" valign="middle" 
>86.15</td><td align="center" valign="middle" >88.80</td><td align="center" valign="middle" >85.28</td><td align="center" valign="middle" >92.15</td></tr><tr><td align="center" valign="middle" >0.00753</td><td align="center" valign="middle" >0.01756</td><td align="center" valign="middle" >10.32216</td><td align="center" valign="middle" >1.69821</td><td align="center" valign="middle" >0.09709</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >300/300</td><td align="center" valign="middle" >90.10</td><td align="center" valign="middle" >87.45</td><td align="center" valign="middle" >90.07</td><td align="center" valign="middle" >88.38</td><td align="center" valign="middle" >92.93</td></tr><tr><td align="center" valign="middle" >0.00455</td><td align="center" valign="middle" >0.01756</td><td align="center" valign="middle" >10.73969</td><td align="center" valign="middle" >1.70013</td><td align="center" valign="middle" >0.09879</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >400/400</td><td align="center" valign="middle" >88.09</td><td align="center" valign="middle" >84.64</td><td align="center" valign="middle" >88.45</td><td align="center" valign="middle" >85.65</td><td align="center" valign="middle" >92.72</td></tr><tr><td align="center" valign="middle" >0.00712</td><td align="center" valign="middle" >0.01797</td><td align="center" valign="middle" >10.17160</td><td align="center" valign="middle" >1.65579</td><td align="center" valign="middle" >0.09469</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >500/500</td><td align="center" valign="middle" >90.28</td><td align="center" valign="middle" >88.73</td><td align="center" valign="middle" >91.37</td><td align="center" valign="middle" >87.36</td><td align="center" valign="middle" >93.61</td></tr><tr><td align="center" valign="middle" >0.00705</td><td align="center" valign="middle" >0.01726</td><td align="center" valign="middle" >10.37729</td><td align="center" valign="middle" 
>1.61703</td><td align="center" valign="middle" >0.10190</td></tr></tbody></table></table-wrap><table-wrap id="table4" ><label><xref ref-type="table" rid="table4">Table 4</xref></label><caption><title> Experimental results on INRIA person database</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >INRIA</th><th align="center" valign="middle" >PSVM</th><th align="center" valign="middle" >TSVM</th><th align="center" valign="middle" >LRSMM</th><th align="center" valign="middle" >LTMRSMM</th><th align="center" valign="middle" >PSMM</th></tr></thead><tr><td align="center" valign="middle" >Test set</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td></tr><tr><td align="center" valign="middle" >pos/neg</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >100/100</td><td align="center" valign="middle" >79.60</td><td align="center" valign="middle" >83.00</td><td align="center" valign="middle" >82.00</td><td align="center" valign="middle" >72.55</td><td align="center" valign="middle" >86.85</td></tr><tr><td align="center" valign="middle" >0.53785</td><td align="center" valign="middle" >1.36313</td><td align="center" valign="middle" >19.32664</td><td align="center" valign="middle" >4.96127</td><td align="center" valign="middle" >3.46110</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >200/200</td><td align="center" valign="middle" >79.73</td><td align="center" valign="middle" >82.45</td><td align="center" valign="middle" >80.53</td><td align="center" valign="middle" 
>72.40</td><td align="center" valign="middle" >87.20</td></tr><tr><td align="center" valign="middle" >0.54470</td><td align="center" valign="middle" >1.36510</td><td align="center" valign="middle" >20.32056</td><td align="center" valign="middle" >5.21338</td><td align="center" valign="middle" >3.82192</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >300/300</td><td align="center" valign="middle" >80.02</td><td align="center" valign="middle" >84.25</td><td align="center" valign="middle" >82.32</td><td align="center" valign="middle" >75.28</td><td align="center" valign="middle" >87.23</td></tr><tr><td align="center" valign="middle" >0.55377</td><td align="center" valign="middle" >1.37249</td><td align="center" valign="middle" >20.67401</td><td align="center" valign="middle" >5.15255</td><td align="center" valign="middle" >3.72906</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >400/400</td><td align="center" valign="middle" >79.39</td><td align="center" valign="middle" >82.76</td><td align="center" valign="middle" >79.24</td><td align="center" valign="middle" >73.60</td><td align="center" valign="middle" >86.57</td></tr><tr><td align="center" valign="middle" >0.68570</td><td align="center" valign="middle" >1.73248</td><td align="center" valign="middle" >26.24048</td><td align="center" valign="middle" >6.93018</td><td align="center" valign="middle" >4.62039</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >500/500</td><td align="center" valign="middle" >79.44</td><td align="center" valign="middle" >81.91</td><td align="center" valign="middle" >79.18</td><td align="center" valign="middle" >74.27</td><td align="center" valign="middle" >85.67</td></tr><tr><td align="center" valign="middle" >0.70342</td><td align="center" valign="middle" >1.75017</td><td align="center" valign="middle" >31.68549</td><td align="center" valign="middle" >7.36364</td><td align="center" valign="middle" 
>5.26839</td></tr></tbody></table></table-wrap><table-wrap id="table5" ><label><xref ref-type="table" rid="table5">Table 5</xref></label><caption><title> Experimental results on the students face database</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Students</th><th align="center" valign="middle" >PSVM</th><th align="center" valign="middle" >TSVM</th><th align="center" valign="middle" >LRSMM</th><th align="center" valign="middle" >LTMRSMM</th><th align="center" valign="middle" >PSMM</th></tr></thead><tr><td align="center" valign="middle" >Test set</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td></tr><tr><td align="center" valign="middle" >pos/neg</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >40/40</td><td align="center" valign="middle" >90.13</td><td align="center" valign="middle" >90.63</td><td align="center" valign="middle" >88.63</td><td align="center" valign="middle" >80.25</td><td align="center" valign="middle" >94.25</td></tr><tr><td align="center" valign="middle" >0.67461</td><td align="center" valign="middle" >1.69127</td><td align="center" valign="middle" >5.29848</td><td align="center" valign="middle" >1.17995</td><td align="center" valign="middle" >4.77109</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >50/50</td><td align="center" valign="middle" >88.20</td><td align="center" valign="middle" >88.20</td><td align="center" valign="middle" >88.00</td><td align="center" valign="middle" >80.00</td><td align="center" valign="middle" 
>92.60</td></tr><tr><td align="center" valign="middle" >0.69538</td><td align="center" valign="middle" >1.65039</td><td align="center" valign="middle" >5.30147</td><td align="center" valign="middle" >1.16947</td><td align="center" valign="middle" >5.33867</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >60/60</td><td align="center" valign="middle" >88.42</td><td align="center" valign="middle" >88.83</td><td align="center" valign="middle" >87.67</td><td align="center" valign="middle" >79.92</td><td align="center" valign="middle" >93.00</td></tr><tr><td align="center" valign="middle" >0.67033</td><td align="center" valign="middle" >1.66927</td><td align="center" valign="middle" >5.50830</td><td align="center" valign="middle" >1.16938</td><td align="center" valign="middle" >4.78024</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >70/70</td><td align="center" valign="middle" >88.86</td><td align="center" valign="middle" >88.79</td><td align="center" valign="middle" >87.86</td><td align="center" valign="middle" >78.57</td><td align="center" valign="middle" >93.07</td></tr><tr><td align="center" valign="middle" >0.66228</td><td align="center" valign="middle" >1.61085</td><td align="center" valign="middle" >5.45025</td><td align="center" valign="middle" >1.15839</td><td align="center" valign="middle" >4.82768</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >80/80</td><td align="center" valign="middle" >90.69</td><td align="center" valign="middle" >91.44</td><td align="center" valign="middle" >90.32</td><td align="center" valign="middle" >78.00</td><td align="center" valign="middle" >93.88</td></tr><tr><td align="center" valign="middle" >0.65905</td><td align="center" valign="middle" >1.61492</td><td align="center" valign="middle" >5.69662</td><td align="center" valign="middle" >1.10800</td><td align="center" valign="middle" >5.03019</td></tr></tbody></table></table-wrap></sec><sec id="s5_3"><title>5.3. 
Parameters Selection</title><p>The penalty parameter C controls how closely the proximal planes fit the samples. If C is too small, the proximal planes lose their predictive ability, which results in underfitting. Conversely, if C is too large, overfitting occurs, which weakens the generalization ability of our model. Thus, choosing the value of C in advance is of vital importance for the PSMM classifier. To observe the influence of C on the classification accuracy of the novel model, we select C from {1 &#215; 10<sup>−3</sup>, 2.5 &#215; 10<sup>−3</sup>, 5 &#215; 10<sup>−3</sup>, 1 &#215; 10<sup>−2</sup>, 2.5 &#215; 10<sup>−2</sup>, 5 &#215; 10<sup>−2</sup>, …, 1 &#215; 10<sup>1</sup>, 2.5 &#215; 10<sup>1</sup>, 5 &#215; 10<sup>1</sup>} and fix the other parameters at τ = 2 and ρ = 1. The results for the different data sets are shown in Figures 7-16.</p><table-wrap id="table6" ><label><xref ref-type="table" rid="table6">Table 6</xref></label><caption><title> Experimental results on the JAFFE database</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >JAFFE</th><th align="center" valign="middle" >PSVM</th><th align="center" valign="middle" >TSVM</th><th align="center" valign="middle" >LRSMM</th><th align="center" valign="middle" >LTMRSMM</th><th align="center" valign="middle" >PSMM</th></tr></thead><tr><td align="center" valign="middle"  rowspan="2"  >Subject</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td><td align="center" valign="middle" >Accuracy (%)</td></tr><tr><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td><td align="center" valign="middle" >Time (s)</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >afraid</td><td align="center" valign="middle" >66.73</td><td align="center" valign="middle" >82.50</td><td align="center" valign="middle" >78.44</td><td align="center" valign="middle" >79.06</td><td align="center" valign="middle" >86.41</td></tr><tr><td align="center" valign="middle" >2.59157</td><td align="center" valign="middle" >5.67755</td><td align="center" valign="middle" >4.84327</td><td align="center" valign="middle" >0.77936</td><td align="center" valign="middle" >11.52640</td></tr><tr><td 
align="center" valign="middle"  rowspan="2"  >surprised</td><td align="center" valign="middle" >71.51</td><td align="center" valign="middle" >84.51</td><td align="center" valign="middle" >83.67</td><td align="center" valign="middle" >84.17</td><td align="center" valign="middle" >90.17</td></tr><tr><td align="center" valign="middle" >3.31090</td><td align="center" valign="middle" >7.07017</td><td align="center" valign="middle" >6.49447</td><td align="center" valign="middle" >1.37633</td><td align="center" valign="middle" >18.93282</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >happy</td><td align="center" valign="middle" >61.62</td><td align="center" valign="middle" >86.30</td><td align="center" valign="middle" >84.20</td><td align="center" valign="middle" >82.10</td><td align="center" valign="middle" >90.17</td></tr><tr><td align="center" valign="middle" >3.63068</td><td align="center" valign="middle" >7.86516</td><td align="center" valign="middle" >6.38803</td><td align="center" valign="middle" >1.55475</td><td align="center" valign="middle" >21.77760</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >sad</td><td align="center" valign="middle" >62.43</td><td align="center" valign="middle" >78.88</td><td align="center" valign="middle" >72.59</td><td align="center" valign="middle" >74.68</td><td align="center" valign="middle" >82.10</td></tr><tr><td align="center" valign="middle" >3.63619</td><td align="center" valign="middle" >7.83720</td><td align="center" valign="middle" >8.26077</td><td align="center" valign="middle" >1.52470</td><td align="center" valign="middle" >16.67137</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >angry</td><td align="center" valign="middle" >67.51</td><td align="center" valign="middle" >87.50</td><td align="center" valign="middle" >86.34</td><td align="center" valign="middle" >85.33</td><td align="center" valign="middle" >90.00</td></tr><tr><td align="center" valign="middle" >3.62794</td><td 
align="center" valign="middle" >7.90733</td><td align="center" valign="middle" >7.06966</td><td align="center" valign="middle" >1.38328</td><td align="center" valign="middle" >22.54246</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >disgusted</td><td align="center" valign="middle" >66.22</td><td align="center" valign="middle" >86.21</td><td align="center" valign="middle" >82.07</td><td align="center" valign="middle" >82.07</td><td align="center" valign="middle" >88.80</td></tr><tr><td align="center" valign="middle" >3.62028</td><td align="center" valign="middle" >7.88570</td><td align="center" valign="middle" >8.71498</td><td align="center" valign="middle" >1.59652</td><td align="center" valign="middle" >19.36869</td></tr><tr><td align="center" valign="middle"  rowspan="2"  >neutral</td><td align="center" valign="middle" >58.01</td><td align="center" valign="middle" >80.01</td><td align="center" valign="middle" >76.84</td><td align="center" valign="middle" >77.83</td><td align="center" valign="middle" >84.17</td></tr><tr><td align="center" valign="middle" >3.63031</td><td align="center" valign="middle" >7.83308</td><td align="center" valign="middle" >8.32018</td><td align="center" valign="middle" >1.51735</td><td align="center" valign="middle" >17.49704</td></tr></tbody></table></table-wrap></sec></sec><sec id="s6"><title>6. 
Conclusion</title><p>In this paper, a novel SMM-type method called PSMM is proposed for matrix classification problems, which combines the advantages of PSVM and LRSMM. By design, the method takes into account both the relationship between samples within a class and the structure of the rows or columns of matrix data, which equips PSMM to meet the challenges of complex image classification problems. To verify the performance of our design, we conducted a large number of comparative experiments. The experimental results show that PSMM achieves higher accuracy than PSVM, TSVM, LRSMM and LTMRSMM in demanding image classification tasks. Since we only considered linear binary classification and the selection of the penalty parameter C in this paper, several related topics remain worthy of in-depth study. Examples are how to extend the PSMM algorithm to multi-class classification, how to introduce an appropriate kernel function to create a nonlinear version of PSMM, and how to select a stable parameter combination. We will take these topics as our future directions.</p></sec><sec id="s7"><title>Acknowledgements</title><p>We sincerely thank Professor Yulan Liu for the support and guidance given to this research, and gratefully acknowledge Guangdong University of Technology for providing a good learning platform.</p></sec><sec id="s8"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s9"><title>Cite this paper</title><p>Zhang, W. and Liu, Y.L. (2022) Proximal Support Matrix Machine. Journal of Applied Mathematics and Physics, 10, 2268-2291. https://doi.org/10.4236/jamp.2022.107155</p></sec></body><back><ref-list><title>References</title><ref id="scirp.118820-ref1"><label>1</label><mixed-citation publication-type="other" xlink:type="simple">Fung, G. and Mangasarian, O.L. 
(2001) Proximal Support Vector Machine Classifiers. Proceedings of Seventh International Conference on Knowledge and Data Discovery, San Francisco, 26-29 August 2001, 77-86. https://doi.org/10.1145/502512.502527</mixed-citation></ref><ref id="scirp.118820-ref2"><label>2</label><mixed-citation publication-type="other" xlink:type="simple">Mangasarian, O.L. and Wild, E.W. (2006) Multisurface Proximal Support Vector Machine Classification via Generalized Eigenvalues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 69-74. https://doi.org/10.1109/TPAMI.2006.17</mixed-citation></ref><ref id="scirp.118820-ref3"><label>3</label><mixed-citation publication-type="other" xlink:type="simple">Chen, W.J. and Tian, Y.J. (2010) Lp-Norm Proximal Support Vector Machine and Its Applications. Procedia Computer Science, 1, 2417-2423. https://doi.org/10.1016/j.procs.2010.04.272</mixed-citation></ref><ref id="scirp.118820-ref4"><label>4</label><mixed-citation publication-type="other" xlink:type="simple">Zhu, Z.F., Zhu, X.Q., Guo, Y.F., Ye, Y.D. and Xue, X.Y. (2012) Inverse Matrix-free Incremental Proximal Support Vector Machine. Decision Support Systems, 53, 395-405. https://doi.org/10.1016/j.dss.2012.02.007</mixed-citation></ref><ref id="scirp.118820-ref5"><label>5</label><mixed-citation publication-type="other" xlink:type="simple">Shao, Y.H., Deng, N.Y., Chen, W.J. and Wang, Z. (2013) Improved Generalized Eigenvalue Proximal Support Vector Machine. IEEE Signal Processing Letters, 20, 213-216. https://doi.org/10.1109/LSP.2012.2216874</mixed-citation></ref><ref id="scirp.118820-ref6"><label>6</label><mixed-citation publication-type="other" xlink:type="simple">Li, G.Q., Yang, L.X., Wu, Z.Y. and Wu, C.Z. (2021) D.C. Programming for Sparse Proximal Support Vector Machines. Information Sciences, 547, 187-201. 
https://doi.org/10.1016/j.ins.2020.08.038</mixed-citation></ref><ref id="scirp.118820-ref7"><label>7</label><mixed-citation publication-type="other" xlink:type="simple">Jayadeva, Khemchandani, R. and Chandra, S. (2007) Twin Support Vector Machines for Pattern Classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 905-910. https://doi.org/10.1109/TPAMI.2007.1068</mixed-citation></ref><ref id="scirp.118820-ref8"><label>8</label><mixed-citation publication-type="other" xlink:type="simple">Qi, Z.Q., Tian, Y.J. and Shi, Y. (2012) Laplacian Twin Support Vector Machine for Semi-Supervised Classification. Neural Networks, 35, 46-53. https://doi.org/10.1016/j.neunet.2012.07.011</mixed-citation></ref><ref id="scirp.118820-ref9"><label>9</label><mixed-citation publication-type="other" xlink:type="simple">Guo, J.H., Yi, P., Wang, R.L., Ye, Q.L. and Zhao, C.X. (2014) Feature Selection for Least Squares Projection Twin Support Vector Machine. Neurocomputing, 144, 174-183. https://doi.org/10.1016/j.neucom.2014.05.040</mixed-citation></ref><ref id="scirp.118820-ref10"><label>10</label><mixed-citation publication-type="other" xlink:type="simple">Chen, S.G., Wu, X.J. and Yin, H.F. (2019) A Novel Projection Twin Support Vector Machine for Binary Classification. Soft Computing, 23, 655-668. https://doi.org/10.1007/s00500-017-2974-z</mixed-citation></ref><ref id="scirp.118820-ref11"><label>11</label><mixed-citation publication-type="other" xlink:type="simple">An, Y.X. and Xue, H. (2022) Indefinite Twin Support Vector Machine with DC Functions Programming. Pattern Recognition, 121, Article ID: 108195. https://doi.org/10.1016/j.patcog.2021.108195</mixed-citation></ref><ref id="scirp.118820-ref12"><label>12</label><mixed-citation publication-type="other" xlink:type="simple">Mehrkanoon, S., Huang, X.L. and Suykens, J.A.K. (2014) Non-Parallel Support Vector Classifiers with Different Loss Functions. Neurocomputing, 143, 294-301. 
https://doi.org/10.1016/j.neucom.2014.05.063</mixed-citation></ref><ref id="scirp.118820-ref13"><label>13</label><mixed-citation publication-type="other" xlink:type="simple">Huang, X.L., Shi, L. and Suykens, J.A.K. (2014) Asymmetric Least Squares Support Vector Machine Classifiers. Computational Statistics and Data Analysis, 70, 395-405. https://doi.org/10.1016/j.csda.2013.09.015</mixed-citation></ref><ref id="scirp.118820-ref14"><label>14</label><mixed-citation publication-type="other" xlink:type="simple">Sun, J., Fujita, H., Chen, P. and Li, H. (2016) Dynamic Financial Distress Prediction with Concept Drift Based on Time Weighting Combined with Adaboost Support Vector Machine Ensemble. Knowledge-Based Systems, 120, 4-14. https://doi.org/10.1016/j.knosys.2016.12.019</mixed-citation></ref><ref id="scirp.118820-ref15"><label>15</label><mixed-citation publication-type="other" xlink:type="simple">Wang, H.J., Shao, Y.H., Zhou, S.L., Zhang, C. and Xiu, N.H. (2021) Support Vector Machine Classifier via L<sub>0/1</sub> Soft-Margin Loss. IEEE Transactions on Pattern Analysis and Machine Intelligence (Early Access). https://doi.org/10.1109/TPAMI.2021.3092177</mixed-citation></ref><ref id="scirp.118820-ref16"><label>16</label><mixed-citation publication-type="other" xlink:type="simple">Luo, L., Xie, Y.B., Zhang, Z.H. and Li, W.J. (2015) Support Matrix Machines. Proceedings of the 32nd International Conference on Machine Learning, Vol. 37, Lille, 6-11 July 2015, 938-947.</mixed-citation></ref><ref id="scirp.118820-ref17"><label>17</label><mixed-citation publication-type="other" xlink:type="simple">Cortes, C. and Vapnik, V. (1995) Support-Vector Networks. Machine Learning, 20, 273-297. https://doi.org/10.1007/BF00994018</mixed-citation></ref><ref id="scirp.118820-ref18"><label>18</label><mixed-citation publication-type="other" xlink:type="simple">Gao, X.Z., Fan, L.Y. and Xu, H.T. (2016) A Novel Method for Classification of Matrix Data Using Twin Multiple Rank SMMs. 
Applied Soft Computing, 48, 546-562. https://doi.org/10.1016/j.asoc.2016.07.003</mixed-citation></ref><ref id="scirp.118820-ref19"><label>19</label><mixed-citation publication-type="other" xlink:type="simple">Zheng, Q.Q., Zhu, F.Y. and Heng, P.A. (2018) Robust Support Matrix Machine for Single Trial EEG Classification. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26, 551-562. https://doi.org/10.1109/TNSRE.2018.2794534</mixed-citation></ref><ref id="scirp.118820-ref20"><label>20</label><mixed-citation publication-type="other" xlink:type="simple">Kobayashi, T. and Otsu, N. (2012) Efficient Optimization for Low-Rank Integrated Bilinear Classifiers. Proceedings of the 12th European Conference on Computer Vision, Vol. 7573, Florence, 7-13 October 2012, 474-487. https://doi.org/10.1007/978-3-642-33709-3_34</mixed-citation></ref><ref id="scirp.118820-ref21"><label>21</label><mixed-citation publication-type="other" xlink:type="simple">Xu, H.T., Fan, L.Y. and Gao, X.Z. (2015) Projection Twin SMMs for 2D Image Data Classification. Neural Computing and Applications, 26, 91-100. https://doi.org/10.1007/s00521-014-1700-3</mixed-citation></ref><ref id="scirp.118820-ref22"><label>22</label><mixed-citation publication-type="other" xlink:type="simple">Zheng, Q.Q., Zhu, F.Y., Qin, J., Chen, B.D. and Heng, P.A. (2017) Sparse Support Matrix Machine. Pattern Recognition, 1-12.</mixed-citation></ref><ref id="scirp.118820-ref23"><label>23</label><mixed-citation publication-type="other" xlink:type="simple">Jiang, R. and Yang, Z.X. (2018) Multiple Rank Multi-Linear Twin Support Matrix Classification Machine. Journal of Intelligent and Fuzzy Systems, 35, 5741-5754. https://doi.org/10.3233/JIFS-17414</mixed-citation></ref><ref id="scirp.118820-ref24"><label>24</label><mixed-citation publication-type="other" xlink:type="simple">Pan, H.Y., Yang, Y., Zheng, J.D., Li, X. and Cheng, J.S. 
(2020) Symplectic Interactive Support Matrix Machine and Its Application in Roller Bearing Condition Monitoring. Neurocomputing, 398, 1-10. https://doi.org/10.1016/j.neucom.2020.01.074</mixed-citation></ref><ref id="scirp.118820-ref25"><label>25</label><mixed-citation publication-type="other" xlink:type="simple">Li, X., Yang, Y., Pan, H.Y., Cheng, J. and Cheng, J.S. (2020) Non-Parallel Least Squares Support Matrix Machine for Rolling Bearing Fault Diagnosis. Mechanism and Machine Theory, 145, Article ID: 103676. https://doi.org/10.1016/j.mechmachtheory.2019.103676</mixed-citation></ref><ref id="scirp.118820-ref26"><label>26</label><mixed-citation publication-type="other" xlink:type="simple">Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2011) Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 3, 1-122. https://doi.org/10.1561/2200000016</mixed-citation></ref><ref id="scirp.118820-ref27"><label>27</label><mixed-citation publication-type="other" xlink:type="simple">LeCun, Y., Bottou, L., Bengio, Y. and Haffner, P. (1998) Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86, 2278-2324. https://doi.org/10.1109/5.726791</mixed-citation></ref><ref id="scirp.118820-ref28"><label>28</label><mixed-citation publication-type="other" xlink:type="simple">Dalal, N. and Triggs, B. (2005) Histograms of Oriented Gradients for Human Detection. IEEE Conference on Computer Vision and Pattern Recognition, Vol. 1, San Diego, 20-25 June 2005, 886-893. https://doi.org/10.1109/CVPR.2005.177</mixed-citation></ref><ref id="scirp.118820-ref29"><label>29</label><mixed-citation publication-type="other" xlink:type="simple">Nazir, M., Ishtiaq, M., Batool, A., Jaffar, M.A. and Mirza, A.M. (2010) Feature Selection for Efficient Gender Classification. 
Proceedings of the 11th WSEAS International Conference on Neural Networks, Evolutionary Computing and Fuzzy Systems, Iasi, 13-15 June 2010, 70-75.</mixed-citation></ref><ref id="scirp.118820-ref30"><label>30</label><mixed-citation publication-type="other" xlink:type="simple">Lyons, M., Akamatsu, S., Kamachi, M. and Gyoba, J. (1998) Coding Facial Expressions with Gabor Wavelets. 3rd IEEE International Conference on Automatic Face and Gesture Recognition, Nara, 14-16 April 1998, 200-205. https://doi.org/10.1109/AFGR.1998.670949</mixed-citation></ref><ref id="scirp.118820-ref31"><label>31</label><mixed-citation publication-type="other" xlink:type="simple">Cai, J.F., Candes, E.J. and Shen, Z.W. (2010) A Singular Value Thresholding Algorithm for Matrix Completion. SIAM Journal on Optimization, 20, 1956-1982. https://doi.org/10.1137/080738970</mixed-citation></ref><ref id="scirp.118820-ref32"><label>32</label><mixed-citation publication-type="other" xlink:type="simple">Chen, L., Sun, D.F. and Toh, K.C. (2015) A Note on the Convergence of ADMM for Linearly Constrained Convex Optimization Problems. Computational Optimization and Applications, 66, 327-343. https://doi.org/10.1007/s10589-016-9864-7</mixed-citation></ref></ref-list></back></article>