<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article  PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd"><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article"><front><journal-meta><journal-id journal-id-type="publisher-id">JAMP</journal-id><journal-title-group><journal-title>Journal of Applied Mathematics and Physics</journal-title></journal-title-group><issn pub-type="epub">2327-4352</issn><publisher><publisher-name>Scientific Research Publishing</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="doi">10.4236/jamp.2024.124077</article-id><article-id pub-id-type="publisher-id">JAMP-132793</article-id><article-categories><subj-group subj-group-type="heading"><subject>Articles</subject></subj-group><subj-group subj-group-type="Discipline-v2"><subject>Physics&amp;Mathematics</subject></subj-group></article-categories><title-group><article-title>
 
 
  Fully Distributed Learning for Deep Random Vector Functional-Link Networks
 
</article-title></title-group><contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Huada</surname><given-names>Zhu</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref></contrib><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Wu</surname><given-names>Ai</given-names></name><xref ref-type="aff" rid="aff1"><sup>1</sup></xref><xref ref-type="corresp" rid="cor1"><sup>*</sup></xref></contrib></contrib-group><aff id="aff1"><addr-line>School of Mathematics and Statistics, Guilin University of Technology, Guilin, China</addr-line></aff><pub-date pub-type="epub"><day>11</day><month>04</month><year>2024</year></pub-date><volume>12</volume><issue>04</issue><fpage>1247</fpage><lpage>1262</lpage><history><date date-type="received"><day>18,</day>	<month>March</month>	<year>2024</year></date><date date-type="rev-recd"><day>25,</day>	<month>April</month>	<year>2024</year>	</date><date date-type="accepted"><day>28,</day>	<month>April</month>	<year>2024</year></date></history><permissions><copyright-statement>&#169; Copyright  2014 by authors and Scientific Research Publishing Inc. </copyright-statement><copyright-year>2014</copyright-year><license><license-p>This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/</license-p></license></permissions><abstract><p>
 
 
  In the contemporary era, the proliferation of information technology has led to an unprecedented surge in data generation, with the data dispersed across a multitude of mobile devices. Given this situation, and given that training deep learning models requires substantial computing power, distributed algorithms that support multi-party joint modeling have attracted wide attention. Distributed training relieves the heavy computing and communication burden that a centralized model places on a single machine. However, most distributed algorithms currently work in a master-slave mode, often with a central server for coordination, which can cause communication pressure, data leakage, privacy violations and other issues. To solve these problems, a decentralized, fully distributed algorithm based on deep random weight neural networks is proposed. The algorithm decomposes the original objective function into several sub-problems under consistency constraints, combines decentralized average consensus (DAC) with the alternating direction method of multipliers (ADMM), and achieves joint modeling and training through local computation and communication at each node. Finally, we compare the proposed decentralized algorithm with several centralized deep neural networks with random weights, and the experimental results demonstrate the effectiveness of the proposed algorithm.
 
</p></abstract><kwd-group><kwd>Distributed Optimization</kwd><kwd> Deep Neural Network</kwd><kwd> Random Vector Functional-Link (RVFL) Network</kwd><kwd> Alternating Direction Method of Multipliers (ADMM)</kwd></kwd-group></article-meta></front><body><sec id="s1"><title>1. Introduction</title><p>In recent years, with the rapid development of digital and network technology, the scale of data we can collect has become unprecedented. Such data has three characteristics: large scale, high dimensionality, and distributed storage. These characteristics make the data challenging to process [<xref ref-type="bibr" rid="scirp.132793-ref1">1</xref>]. Traditional centralized machine learning is limited to a single machine, which has revealed many drawbacks. Limited training data size and long training times mean that centralized learning cannot meet the requirements of processing today’s big data, so the data must be deployed to multiple machines for joint modeling in a distributed manner, which also matches the distributed storage of the data [<xref ref-type="bibr" rid="scirp.132793-ref2">2</xref>]. Therefore, it is of great significance to apply fast and efficient distributed learning algorithms to neural networks.</p><p>On the other hand, deep neural networks have become a very popular research direction in machine learning and have achieved major breakthroughs in many fields. Although deep neural networks are favored for their excellent performance, given the rapid development of digitalization and the three data characteristics above, a single machine can no longer meet their training requirements. Therefore, applying distributed optimization algorithms to deep neural networks has become a new research trend. 
As early as 2012, Dean et al. [<xref ref-type="bibr" rid="scirp.132793-ref3">3</xref>] at Google developed two distributed training algorithms, Downpour SGD and Sandblaster L-BFGS, for training large-scale deep neural networks, which was of great significance. Since then, research on distributed deep neural networks has gradually increased, and many frameworks that support distributed training have emerged, such as the TensorFlow framework proposed by Abadi et al. [<xref ref-type="bibr" rid="scirp.132793-ref4">4</xref>] and the Horovod framework proposed by Sergeev et al. [<xref ref-type="bibr" rid="scirp.132793-ref5">5</xref>].</p><p>To realize distributed training of models, two distributed frameworks are generally adopted [<xref ref-type="bibr" rid="scirp.132793-ref6">6</xref>]: master-slave mode and point-to-point mode. In master-slave mode, a central node collects and aggregates the data or model parameters sent by the other child nodes, performs the computation, and then sends the results back to them [<xref ref-type="bibr" rid="scirp.132793-ref7">7</xref>] [<xref ref-type="bibr" rid="scirp.132793-ref8">8</xref>]. Such a communication architecture may cause communication stress on the one hand [<xref ref-type="bibr" rid="scirp.132793-ref9">9</xref>], and risks of data leakage and misuse on the other [<xref ref-type="bibr" rid="scirp.132793-ref10">10</xref>]. In a point-to-point distributed architecture, there is no central node in the network, and all nodes have equal status. Depending on the network topology, a node communicates with one or more other nodes, and after several rounds of communication, the entire network eventually reaches consensus. 
This decentralized, fully distributed architecture not only saves some communication overhead, but also exchanges data or model parameters only between adjacent nodes, which helps preserve data privacy [<xref ref-type="bibr" rid="scirp.132793-ref11">11</xref>]. Owing to these advantages, there has been much research on and application of this distributed framework in recent years, including applications in deep learning [<xref ref-type="bibr" rid="scirp.132793-ref12">12</xref>] [<xref ref-type="bibr" rid="scirp.132793-ref13">13</xref>].</p><p>In addition, the choice of training algorithm for a deep neural network also has an important impact on the efficiency of the model. Gradient-based algorithms are the most widely used learning algorithms for deep neural networks. However, traditional gradient algorithms have some disadvantages, such as a tendency to fall into local minima, slow convergence, and strong dependence on initial parameters [<xref ref-type="bibr" rid="scirp.132793-ref14">14</xref>]. For deep networks, gradient algorithms also suffer from vanishing or exploding gradients, which reduce training efficiency and make it difficult to exploit the strong learning ability of the deep neural network [<xref ref-type="bibr" rid="scirp.132793-ref15">15</xref>]. To solve these problems, this paper proposes a distributed learning method based on deep random weight neural networks. Compared with traditional neural networks, random weight neural networks train very fast, reduce the probability of falling into local minima, and retain good approximation and generalization ability. 
Representative deep random neural networks include multi-hidden layer feedforward neural networks (MLFN) [<xref ref-type="bibr" rid="scirp.132793-ref16">16</xref>], hierarchical extreme learning machines (H-ELM) [<xref ref-type="bibr" rid="scirp.132793-ref17">17</xref>], and deep random vector functional-link neural networks based on stacked autoencoders (sdRVFL) [<xref ref-type="bibr" rid="scirp.132793-ref18">18</xref>] [<xref ref-type="bibr" rid="scirp.132793-ref19">19</xref>]; among these, sdRVFL trains faster and generalizes better than the other deep random networks.</p><p>Building on the above models and theory, and combining the advantages of current deep neural networks and distributed learning frameworks, this paper develops a point-to-point, fully distributed algorithm called D-sdRVFL, based on the deep random vector functional-link neural network (sdRVFL). The proposed algorithm relies on decentralized average consensus (DAC) [<xref ref-type="bibr" rid="scirp.132793-ref20">20</xref>] and the alternating direction method of multipliers (ADMM) [<xref ref-type="bibr" rid="scirp.132793-ref21">21</xref>]. During distributed training, we first use ADMM to transform the global consensus optimization problem of the model into equivalent sub-problems. Solving these sub-problems involves quantities that require global information. We compute them with the DAC algorithm, which reaches global consensus only through communication between nodes, thereby avoiding a central node and realizing decentralized, fully distributed training of deep learning models. 
The main contributions of this paper are as follows:</p><p>&#183; A peer-to-peer distributed learning algorithm based on deep RVFL is proposed, in which multiple nodes can jointly train a model without a central server, while also protecting data privacy.</p><p>&#183; For the two connection variants of the deep RVFL network, we propose corresponding distributed deep neural network algorithms.</p><p>&#183; The proposed D-sdRVFL algorithm is comparable in performance to the centralized deep RVFL algorithm. Experimental results on multiple classification datasets show that the proposed algorithm differs little from centralized deep RVFL in model accuracy, while the training speed of the model is improved. Compared with the centralized algorithm, the point-to-point distributed algorithm has clear advantages in dealing with large-scale, high-dimensional data, and it also protects data privacy to a certain extent.</p><p>The rest of this paper is structured as follows. Section 2 briefly introduces the basic concepts and training optimization process of the two kinds of deep RVFL networks. In Section 3, decentralized, fully distributed optimization algorithms are proposed for the two kinds of deep RVFL networks. In Section 4, we compare the performance of the proposed distributed algorithm with other centralized deep random weight algorithms. Section 5 summarizes the paper.</p></sec><sec id="s2"><title>2. Preliminary</title><p>In this section, we introduce the basic structure of deep RVFL and its optimization problems, and then introduce decentralized average consensus (DAC), which is the theoretical basis for extending the network to a decentralized distributed deep network.</p><sec id="s2_1"><title>2.1. 
Deep RVFL with Direct Links</title><p>In the deep RVFL network with direct links, the original data first passes through $L$ hidden layers for feature extraction to obtain complex high-level features, and then enters the RVFL classifier. Learning and optimization of the whole network are accordingly divided into two parts: optimization of the reconstruction matrices of the hidden-layer encoders, and optimization of the weight matrix of the RVFL classifier.</p><p>The hidden layers in deep RVFL consist of $L$ stacked autoencoder layers, and the output of the $l$th hidden layer is denoted $H_l$. The output of each layer is used as the input of the next layer. The optimization problem for each coding layer is as follows:</p><p>$$\hat{U}_l = \arg\min_{U_l} \frac{1}{2}\|Z_l U_l - H_{l-1}\|^2 + \lambda_l \|U_l\|_1 \tag{1}$$</p><p>where $H_{l-1}$ is the output of the $(l-1)$th coding layer and also the input of the encoder of the $l$th coding layer, $Z_l$ is the output of the encoder of the $l$th coding layer obtained through the activation function, and $H_0 = X$. Our goal is to optimize the decoder weight matrix $U_l$ of the coding layer, and $\lambda_l$ is the regularization parameter of the $l$th layer.</p><p>After $L$ autoencoder layers, the final feature representation $H_L$ is obtained. $H_L$ is concatenated with the original data $X$ and then fed into the RVFL classifier. We use $X_c$ to denote the classifier input, defined as:</p><p>$$X_c = [H_L, X]. \tag{2}$$</p><p>In the RVFL classifier, the learning objective is to optimize the weight matrix $\beta$, and the objective function is as follows:</p><p>$$\hat{\beta} = \arg\min_{\beta} \frac{1}{2}\|X_c \beta - T\|^2 + \frac{\lambda}{2}\|\beta\|^2 \tag{3}$$</p><p>where $T$ is the target matrix and $\lambda$ is the regularization parameter.</p></sec><sec id="s2_2"><title>2.2. 
Deep RVFL with Dense Direct Links</title><p>In the deep RVFL network with dense direct links, the original data is likewise first passed through the hidden layers for feature extraction. Among the $L$ autoencoder layers, each layer is connected to all subsequent layers, so that the input of each hidden layer includes the original data and the outputs of all previous hidden layers. The input $X_l$ of the $l$th hidden layer can be represented as follows:</p><p>$$X_l = [X, H_1, \cdots, H_{l-1}] \tag{4}$$</p><p>where $X$ is the original data and $H_l$ is the output of the $l$th hidden layer, whose form is given by:</p><p>$$H_l = Z_l \hat{U}_l, \tag{5}$$</p><p>except that the input of each layer is changed.</p><p>After passing through the $L$ hidden layers, we obtain the output $H_L$ of the last hidden layer. We concatenate the output $H_l$ of every hidden layer with the original data and feed the result into the RVFL classifier. We use $X_c$ to denote the classifier input, which can be expressed as:</p><p>$$X_c = [X, H_1, \cdots, H_L]. \tag{6}$$</p><p>The optimization problem in the RVFL classifier is the same as in Equation (3); we need to solve for the optimal weight matrix $\beta$.</p></sec><sec id="s2_3"><title>2.3. Decentralized Average Consensus (DAC)</title><p>DAC [<xref ref-type="bibr" rid="scirp.132793-ref15">15</xref>] is an algorithm that iteratively updates the parameters of each node until they reach the global average, requiring only communication between nodes. Here, we assume that there are $N$ nodes in the network. 
In the $k$th iteration, the parameter of node $i$ is $\psi_i(k)$, and the DAC update at each local node is as follows:</p><p>$$\psi_i(k) = \sum_{j=1}^{N} b_{ij}\, \psi_j(k-1) \tag{7}$$</p><p>where $B = [b_{ij}]$ is the $N \times N$ adjacency matrix, whose entry $b_{ij}$ weights the information that node $i$ receives from node $j$. Through repeated iteration, the parameters gradually converge to the global average:</p><p>$$\lim_{k \to +\infty} \psi_i(k) = \frac{1}{N}\sum_{j=1}^{N} \psi_j(0), \quad \forall i \in \{1, \cdots, N\}. \tag{8}$$</p></sec></sec><sec id="s3"><title>3. Fully Distributed Deep RVFL Network</title><p>In this section, we extend the two forms of deep RVFL described above to the peer-to-peer distributed learning framework, using DAC and ADMM to optimize the hidden-layer weights and the RVFL classifier weights at each node. The following describes the two distributed deep RVFL networks and their solution procedures.</p><sec id="s3_1"><title>3.1. Problem Description</title><p>In a distributed learning network based on a point-to-point architecture, we assume that the network has $N$ nodes, each connected to its neighbors and able to communicate with them. The whole dataset is randomly distributed among the nodes. We denote the local dataset of the $i$th node by $X^i$ and $Y^i$, and each node trains the deep RVFL network locally. In the distributed setting, the global optimization problem becomes minimizing the sum of the loss functions at all nodes. Assuming that the loss function at the $i$th node is $f_i(z)$, the global objective is:</p><p>$$z^{*} = \arg\min_{z} \left\{ F(z) := \sum_{i=1}^{N} f_i(z) \right\}. \tag{9}$$</p></sec><sec id="s3_2"><title>3.2. 
Fully Distributed Deep RVFL Network with Direct Links</title><p>From Section 2, we know that in the deep RVFL network with direct links, optimization of the model is divided into two parts: optimization of the decoder reconstruction weight matrix of the autoencoder in each hidden layer, and optimization of the RVFL classifier weight matrix. We extend these optimization problems to the distributed setting. Suppose the network topology has $N$ nodes, and each node communicates only with its neighbors. Following the formulation in the previous subsection, optimization problem (1) is decomposed into $N$ subproblems whose losses are summed:</p><p>$$\hat{U}_l = \arg\min_{U_l^i} \frac{1}{2}\sum_{i=1}^{N} \|Z_l^i U_l^i - H_{l-1}^i\|^2 + \lambda_l \|U_l^i\|_1 \tag{10}$$</p><p>where $\lambda_l$ is the regularization parameter of the $l$th hidden layer. By solving the above problem, we obtain the optimal reconstruction matrix $U_l$ of each hidden layer, and each node can use $U_l$ to extract the features of that hidden layer. For the RVFL classifier weight matrix, we assume the same distributed topology, and problem (3) naturally takes the following form:</p><p>$$\hat{\beta} = \arg\min_{\beta} \frac{1}{2}\sum_{i=1}^{N} \|X_c^i \beta^i - T^i\|^2 + \frac{\lambda}{2}\|\beta^i\|^2 \tag{11}$$</p><p>where $X_c^i = [H_L^i, X^i]$, $H_L^i$ is the hidden-layer output at each node, $X^i$ is the original data at each node and $T^i$ is the target matrix at each node.</p></sec><sec id="s3_3"><title>3.3. Fully Distributed Deep RVFL Network with Dense Direct Links</title><p>Here, the deep RVFL with dense direct links is also extended to the distributed setting. The difference from the fully distributed deep RVFL network with direct links lies in the connections between the hidden layers. 
Each hidden layer is connected to all subsequent hidden layers, so that lower-complexity features can be reused multiple times and the features extracted by the hidden layers become more representative and meaningful. On node $i$, the input $X_l^i$ of the $l$th hidden layer can be represented as follows:</p><p>$$X_l^i = [X^i, H_1^i, \cdots, H_{l-1}^i]. \tag{12}$$</p><p>As in the direct-link case, the distributed optimization problems for the whole network then take the form of problems (10) and (11).</p></sec><sec id="s3_4"><title>3.4. Fully Distributed Solutions</title><p>Solving the above objective functions for the globally optimal weight matrices involves two requirements: minimizing the sum of the loss functions, and achieving global consensus. This is therefore a constrained optimization problem, which can be solved with ADMM. Below we outline the principles of ADMM.</p><p>ADMM combines the Lagrangian multiplier method with dual decomposition, and solves the original problem by alternately optimizing the primal and dual problems. ADMM is typically applied to constrained optimization problems of the form:</p><p>$$\min\ f_1(\theta_1) + g_2(\theta_2) \quad \text{s.t.} \quad P_1\theta_1 + P_2\theta_2 - R = 0. \tag{13}$$</p><p>The core idea of ADMM is to transform the constrained optimization problem into an equivalent unconstrained one, with the constraints absorbed into the objective by introducing Lagrangian multiplier terms. 
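As a sketch for the generic problem (13), writing $\mu$ for the multiplier and $\rho &gt; 0$ for the penalty parameter (both symbols are assumed here, chosen to match the notation used for the layer-wise problem later in this section), the augmented Lagrangian takes the standard form $$L_{\rho}(\theta_1, \theta_2, \mu) = f_1(\theta_1) + g_2(\theta_2) + \mu^{\top}(P_1\theta_1 + P_2\theta_2 - R) + \frac{\rho}{2}\|P_1\theta_1 + P_2\theta_2 - R\|^2 .$$ 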
In this way, we obtain the augmented Lagrangian function of the above problem, and then take its partial derivatives to obtain the specific iterative formulas for the variables.</p><p>In addition, there is a substantial literature on the convergence and convergence rate of distributed ADMM, and it has been proved in [<xref ref-type="bibr" rid="scirp.132793-ref22">22</xref>] that the algorithm converges at the rate $O(1/k)$.</p><p>Following the principle of ADMM above, we introduce the auxiliary variable $V_l$ so that the parameters of all nodes converge to the same value. Problem (10) is then rewritten as follows:</p><p>$$\min\ \frac{1}{2}\sum_{i=1}^{N}\|Z_l^i U_l^i - H_{l-1}^i\|^2 + \lambda_l\|V_l\|_1 \quad \text{s.t.} \quad U_l^i - V_l = 0,\ i = 1, 2, \cdots, N \tag{14}$$</p><p>where $\lambda_l$ denotes the regularization parameter of each hidden layer. The augmented Lagrangian of the above problem is:</p><p>$$L_{\rho_l}(\{U_l^i\}, V_l, \{\mu_l^i\}) = \frac{1}{2}\sum_{i=1}^{N}\|Z_l^i U_l^i - H_{l-1}^i\|^2 + \lambda_l\|V_l\|_1 + \sum_{i=1}^{N}(\mu_l^i)^{\top}(U_l^i - V_l) + \frac{\rho_l}{2}\sum_{i=1}^{N}\|U_l^i - V_l\|^2. \tag{15}$$</p><p>Here, for each hidden layer, $\mu_l^i$ is the dual variable of the $i$th node and $\rho_l$ is the penalty parameter. In each iteration, $U_l^i$ and $V_l$ are first optimized alternately, and then the dual variable $\mu_l^i$ is updated. The iteration formulas are as follows:</p><p>$$U_l^i(t+1) = \arg\min_{U_l^i} L_{\rho_l}(U_l^i, V_l(t), \mu_l^i(t)) \tag{16}$$</p><p>$$V_l(t+1) = \arg\min_{V_l} L_{\rho_l}(U_l^i(t+1), V_l, \mu_l^i(t)) \tag{17}$$</p><p>$$\mu_l^i(t+1) = \mu_l^i(t) + \rho_l (U_l^i(t+1) - V_l(t+1)) \tag{18}$$</p><p>where $t$ denotes the $t$th iteration. Problems (16) and (17) admit closed-form solutions. 
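As a brief sketch of where the closed form of the $U_l^i$-update comes from (assuming $\rho_l &gt; 0$, so that $(Z_l^i)^{\top} Z_l^i + \rho_l I$ is invertible): the terms of (15) that depend on $U_l^i$ are smooth, so setting their gradient to zero gives $$(Z_l^i)^{\top}(Z_l^i U_l^i - H_{l-1}^i) + \mu_l^i(t) + \rho_l (U_l^i - V_l(t)) = 0,$$ that is, $$((Z_l^i)^{\top} Z_l^i + \rho_l I)\, U_l^i = (Z_l^i)^{\top} H_{l-1}^i + \rho_l V_l(t) - \mu_l^i(t).$$ The $V_l$-subproblem is the sum of an $\ell_1$ term and a quadratic in $V_l$, and its minimizer is obtained by element-wise soft thresholding. 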
Then, we can obtain the iterative steps as follows:</p><p>$$U_l^i(t+1) = ((Z_l^i)^{\top} Z_l^i + \rho_l I)^{-1}((Z_l^i)^{\top} H_{l-1}^i + \rho_l V_l(t) - \mu_l^i(t)) \tag{19}$$</p><p>$$V_l(t+1) = S_{\lambda_l / (N\rho_l)}(\hat{U}_l + \hat{\mu}_l / \rho_l) \tag{20}$$</p><p>$$\mu_l^i(t+1) = \mu_l^i(t) + \rho_l (U_l^i(t+1) - V_l(t+1)) \tag{21}$$</p><p>where $\hat{U}_l = \frac{1}{N}\sum_{i=1}^{N} U_l^i(t+1)$ and $\hat{\mu}_l = \frac{1}{N}\sum_{i=1}^{N} \mu_l^i(t)$ are averages over all nodes. In master-slave mode, computing them requires a central node to aggregate information from all nodes. Here, we instead use the decentralized average consensus (DAC) algorithm, which reaches the global average only through communication between nodes, thus removing the need for a central node and realizing decentralized distributed optimization. Estimates of the averages are obtained by (7) and (8).</p><p>In addition, $S_{\kappa}(\cdot)$ denotes the element-wise soft threshold operator [<xref ref-type="bibr" rid="scirp.132793-ref23">23</xref>], which is defined as follows:</p><p>$$S_{\kappa}(a) = \begin{cases} a - \kappa, &amp; a &gt; \kappa \\ 0, &amp; |a| \le \kappa \\ a + \kappa, &amp; a &lt; -\kappa \end{cases} \tag{22}$$</p><p>Through the above calculation, we find the optimal reconstruction matrix $\hat{U}_l^i$ of each hidden layer. The data enters each hidden layer, the optimal reconstruction matrix is found, and the output is passed to the next layer; the optimization of the hidden layers is completed before the optimization of the RVFL classifier.</p><p>For problem (11), we also use ADMM combined with DAC. Introducing the auxiliary variable $V$, (11) is rewritten as follows:</p><p>$$\min\ \frac{1}{2}\sum_{i=1}^{N}\|X_c^i \beta^i - T^i\|^2 + \frac{\lambda}{2}\|V\|^2 \quad \text{s.t.} \quad \beta^i - V = 0,\ i = 1, 2, \cdots, N. \tag{23}$$</p><p>The augmented Lagrangian is:</p><p>$$L_{\rho}(\{\beta^i\}, V, \{\mu^i\}) = \frac{1}{2}\sum_{i=1}^{N}\|X_c^i \beta^i - T^i\|^2 + \frac{\lambda}{2}\|V\|^2 + \sum_{i=1}^{N}(\mu^i)^{\top}(\beta^i - V) + \frac{\rho}{2}\sum_{i=1}^{N}\|\beta^i - V\|^2. \tag{24}$$</p><p>Then, the ADMM iterations are as follows:</p><p>$$\beta^i(t+1) = ((X_c^i)^{\top} X_c^i + \rho I)^{-1}((X_c^i)^{\top} T^i + \rho V(t) - \mu^i(t)) \tag{25}$$</p><p>$$V(t+1) = \frac{\rho \hat{\beta} + \hat{\mu}}{\rho + \lambda / N} \tag{26}$$</p><p>$$\mu^i(t+1) = \mu^i(t) + \rho (\beta^i(t+1) - V(t+1)) \tag{27}$$</p><p>where $\hat{\beta} = \frac{1}{N}\sum_{i=1}^{N} \beta^i(t+1)$ and $\hat{\mu} = \frac{1}{N}\sum_{i=1}^{N} \mu^i(t)$ in (26) are averages over all nodes, which are again obtained with the DAC algorithm according to Formulas (7) and (8). Through the above iterations, the globally optimal RVFL classifier weight matrix is finally obtained.</p><p>To show the training process of the distributed algorithm more clearly, the pseudocode of Algorithm 1 gives the iterative steps of the decentralized distributed algorithm for the deep RVFL network with direct links. The algorithm for dense connections is similar to Algorithm 1 and is not repeated here.</p><disp-formula id="scirp.132793-formula5"><graphic  xlink:href="//html.scirp.org/file/16-1723655x78.png?20240426170059845"  xlink:type="simple"/></disp-formula></sec></sec><sec id="s4"><title>4. Experiments and Analysis</title><p>To verify the effectiveness and feasibility of the proposed algorithm, as well as its robustness to changes in the number of network layers, we designed two experiments. The first experiment compares the proposed algorithm with other algorithms in terms of performance, by comparing the model accuracy and training time of each algorithm on the same datasets. 
In the second experiment, we vary the number of hidden layers of the deep model to observe the accuracy and training time of the proposed distributed algorithm and verify its robustness.</p><p>We introduce the experimental setup below, including a brief description of the datasets, the metric used to measure model accuracy, how training time is measured, and the selection and parameter settings of the comparison models, to make the evaluation of the proposed algorithm more convincing.</p><sec id="s4_1"><title>4.1. Experimental Setup</title><sec id="s4_1_1"><title>4.1.1. Training Datasets</title><p>We use classification datasets from the classical UCI repository, selected according to size: they range from large datasets with a total data volume of more than one million to small datasets with a total data volume of less than ten thousand. Min-max normalization is applied to the data, which makes it easier to observe model performance on datasets of different orders of magnitude. Details of the datasets are presented in <xref ref-type="table" rid="table1">Table 1</xref>, and further descriptions can be found on the UCI dataset website.</p></sec><sec id="s4_1_2"><title>4.1.2. Evaluation Index</title><p>For accuracy, we use the classification accuracy (CAR) as the evaluation index. The closer the classification predictions of the model are to the actual labels, the higher the accuracy of the model. The classification accuracy is calculated as follows:</p><p>$$\mathrm{CAR} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}} \times 100\%. \tag{28}$$</p><p>In terms of training time, we measure the training time of each node. 
For example, in a centralized model there is only a single node, so the training time of that node is the training time of the model. In a distributed model, because multiple nodes participate in each optimization, the training time of each node needs to be divided by the corresponding number of nodes before it is compared with the centralized model.</p><table-wrap id="table1" ><label><xref ref-type="table" rid="table1">Table 1</xref></label><caption><title> Overview of the UCI datasets</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Dataset</th><th align="center" valign="middle" >#Patterns</th><th align="center" valign="middle" >#Features</th><th align="center" valign="middle" >#Classes</th></tr></thead><tr><td align="center" valign="middle" >bank</td><td align="center" valign="middle" >4521</td><td align="center" valign="middle" >17</td><td align="center" valign="middle" >2</td></tr><tr><td align="center" valign="middle" >credit-approval</td><td align="center" valign="middle" >690</td><td align="center" valign="middle" >15</td><td align="center" valign="middle" >2</td></tr><tr><td align="center" valign="middle" >glass</td><td align="center" valign="middle" >214</td><td align="center" valign="middle" >9</td><td align="center" valign="middle" >6</td></tr><tr><td align="center" valign="middle" >musk-2</td><td align="center" valign="middle" >6598</td><td align="center" valign="middle" >166</td><td align="center" valign="middle" >2</td></tr><tr><td align="center" valign="middle" >statlog-image</td><td align="center" valign="middle" >2310</td><td align="center" valign="middle" >18</td><td align="center" valign="middle" >7</td></tr><tr><td align="center" valign="middle" >waveform</td><td align="center" valign="middle" >5000</td><td align="center" valign="middle" >21</td><td align="center" valign="middle" >3</td></tr></tbody></table></table-wrap></sec><sec id="s4_1_3"><title>4.1.3. 
Testing Models and Parameter Setting</title><p>For comparison, we not only compare the proposed distributed models with the corresponding centralized deep RVFL models, sdRVFL(d) and sdRVFL(dense), but also select two representative deep random weight neural networks, H-ELM and ML-KELM, so that both vertical and horizontal comparisons are covered.</p><p>All compared models use the same number of hidden layers and neurons to keep the comparison fair. In this paper, the number of hidden layers is set to 3, the number of neurons is fixed to 32, and the other parameters follow the optimal values reported in the paper that proposed each model. For centralized deep RVFL and distributed deep RVFL, we tune the regularization parameter $\lambda$ and the Lagrangian penalty parameter $\rho$ over the same grid: $\lambda \in \{0.01, 0.1, 1, 10, 100\}$ and $\rho \in \{0.01, 0.1, 1, 10, 100\}$. The maximum number of DAC iterations is 500, and the DAC termination threshold is 0.001.</p></sec></sec><sec id="s4_2"><title>4.2. Performance</title><sec id="s4_2_1"><title>4.2.1. Classification Accuracy</title><p>Experiments on the 6 classification datasets, reported in <xref ref-type="table" rid="table2">Table 2</xref>, show that the proposed distributed deep models D-sdRVFL(d) and D-sdRVFL(dense) perform well on classification tasks. In mean accuracy, they trail the centralized deep models sdRVFL(d − l<sub>1</sub>/l<sub>2</sub>) and sdRVFL(dense − l<sub>1</sub>/l<sub>2</sub>) and the H-ELM model by roughly 2.5 to 4.7 percentage points, and they exceed the ML-KELM model by about 0.8 to 1.4 percentage points, indicating that the proposed distributed deep models are broadly comparable in performance to the centralized models. 
In addition, the average classification accuracy of the D-sdRVFL(dense) model is higher than that of the D-sdRVFL(d) model (80.84% versus 80.19%).</p><table-wrap id="table2" ><label><xref ref-type="table" rid="table2">Table 2</xref></label><caption><title> CAR (%) for different algorithms on the test datasets</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Dataset</th><th align="center" valign="middle" >H-ELM</th><th align="center" valign="middle" >ML-KELM</th><th align="center" valign="middle" >sdRVFL(d − l<sub>1</sub>/l<sub>2</sub>)</th><th align="center" valign="middle" >sdRVFL(dense − l<sub>1</sub>/l<sub>2</sub>)</th><th align="center" valign="middle" >D-sdRVFL(d)</th><th align="center" valign="middle" >D-sdRVFL(dense)</th></tr></thead><tr><td align="center" valign="middle" >musk-2</td><td align="center" valign="middle" >94.65</td><td align="center" valign="middle" >84.60</td><td align="center" valign="middle" >94.49</td><td align="center" valign="middle" >95.20</td><td align="center" valign="middle" >88.38</td><td align="center" valign="middle" >88.59</td></tr><tr><td align="center" valign="middle" >waveform</td><td align="center" valign="middle" >86.27</td><td align="center" valign="middle" >85.60</td><td align="center" valign="middle" >84.67</td><td align="center" valign="middle" >85.73</td><td align="center" valign="middle" >81.47</td><td align="center" valign="middle" >82.40</td></tr><tr><td align="center" valign="middle" >bank</td><td align="center" valign="middle" >89.08</td><td align="center" valign="middle" >88.63</td><td align="center" valign="middle" >89.00</td><td align="center" valign="middle" >89.15</td><td align="center" valign="middle" >88.63</td><td align="center" valign="middle" >88.56</td></tr><tr><td align="center" valign="middle" >glass</td><td align="center" valign="middle" >58.46</td><td align="center" valign="middle" >49.23</td><td align="center" valign="middle" >58.46</td><td align="center" valign="middle" >60.00</td><td align="center" 
valign="middle" >56.92</td><td align="center" valign="middle" >56.92</td></tr><tr><td align="center" valign="middle" >statlog-image</td><td align="center" valign="middle" >93.91</td><td align="center" valign="middle" >82.61</td><td align="center" valign="middle" >91.74</td><td align="center" valign="middle" >92.17</td><td align="center" valign="middle" >87.97</td><td align="center" valign="middle" >89.28</td></tr><tr><td align="center" valign="middle" >credit-approval</td><td align="center" valign="middle" >81.46</td><td align="center" valign="middle" >85.85</td><td align="center" valign="middle" >81.48</td><td align="center" valign="middle" >87.04</td><td align="center" valign="middle" >77.78</td><td align="center" valign="middle" >79.26</td></tr><tr><td align="center" valign="middle" >Mean Acc.</td><td align="center" valign="middle" >83.97</td><td align="center" valign="middle" >79.42</td><td align="center" valign="middle" >83.30</td><td align="center" valign="middle" >84.88</td><td align="center" valign="middle" >80.19</td><td align="center" valign="middle" >80.84</td></tr></tbody></table></table-wrap></sec><sec id="s4_2_2"><title>4.2.2. Training Time</title><p>As shown in <xref ref-type="table" rid="table3">Table 3</xref>, for the D-sdRVFL(d) and D-sdRVFL(dense) models with 5 agents and 3 hidden layers, the actual training time per agent is higher than that of the centralized models, by less than one second in every case, but on the four largest datasets it is far lower than that of the ML-KELM model. In the following experiments, we examine how the training time of each agent in the distributed model changes with the number of hidden layers. We find that, as the number of layers and the number of agents increase, the training time of a single agent decreases steadily, whereas the training time of the centralized model increases steadily.</p></sec></sec><sec id="s4_3"><title>4.3. 
Correlation Analysis of Model Robustness</title><p>In this experiment, we vary the number of hidden layers in the network and observe the changes in classification accuracy and training time. Three representative datasets were selected for this experiment, namely musk-2, waveform and credit-approval, which represent large, medium and small datasets, respectively.</p><p>As shown in <xref ref-type="fig" rid="fig1">Figure 1</xref>, we compared the classification accuracy and training time of the two centralized deep models and the two proposed distributed models while the number of hidden layers was varied from 3 to 7. On the waveform dataset, the classification accuracy of the centralized and distributed deep RVFL models does not change significantly as the number of layers increases: the difference between the highest and lowest accuracy is less than 2%, there is no clear upward or downward trend, and the highest accuracy is not obtained with the largest number of layers. On musk-2, the classification accuracy of the centralized deep RVFL models and of the D-sdRVFL(d) model does not change significantly as the number of layers increases, whereas the accuracy of the D-sdRVFL(dense) model increases with the number of layers, peaks at 6 hidden layers, and decreases at 7 layers. 
On the credit-approval dataset, the classification accuracy of the centralized deep RVFL models and of the D-sdRVFL(d) model first increases and then decreases as the number of layers increases, whereas the accuracy of the D-sdRVFL(dense) model decreases gradually as the number of layers increases.</p><table-wrap id="table3" ><label><xref ref-type="table" rid="table3">Table 3</xref></label><caption><title> Average training time (s) per node for different algorithms on training datasets</title></caption><table><tbody><thead><tr><th align="center" valign="middle" >Dataset</th><th align="center" valign="middle" >H-ELM</th><th align="center" valign="middle" >ML-KELM</th><th align="center" valign="middle" >sdRVFL(d − l<sub>1</sub>/l<sub>2</sub>)</th><th align="center" valign="middle" >sdRVFL(dense − l<sub>1</sub>/l<sub>2</sub>)</th><th align="center" valign="middle" >D-sdRVFL(d)</th><th align="center" valign="middle" >D-sdRVFL(dense)</th></tr></thead><tr><td align="center" valign="middle" >musk-2</td><td align="center" valign="middle" >0.138</td><td align="center" valign="middle" >19.275</td><td align="center" valign="middle" >0.097</td><td align="center" valign="middle" >0.231</td><td align="center" valign="middle" >0.430</td><td align="center" valign="middle" >0.840</td></tr><tr><td align="center" valign="middle" >waveform</td><td align="center" valign="middle" >0.040</td><td align="center" valign="middle" >7.583</td><td align="center" valign="middle" >0.031</td><td align="center" valign="middle" >0.046</td><td align="center" valign="middle" >0.114</td><td align="center" valign="middle" >0.240</td></tr><tr><td align="center" valign="middle" >bank</td><td align="center" valign="middle" >0.041</td><td align="center" valign="middle" >5.424</td><td align="center" valign="middle" >0.019</td><td align="center" valign="middle" >0.037</td><td align="center" valign="middle" >0.060</td><td align="center" valign="middle" >0.092</td></tr><tr><td align="center" valign="middle" >glass</td><td align="center" valign="middle" >0.062</td><td align="center" valign="middle" >0.012</td><td align="center" valign="middle" >0.014</td><td align="center" valign="middle" >0.023</td><td align="center" valign="middle" >0.083</td><td align="center" valign="middle" >0.160</td></tr><tr><td align="center" valign="middle" >statlog-image</td><td align="center" valign="middle" >0.033</td><td align="center" valign="middle" >2.135</td><td align="center" valign="middle" >0.032</td><td align="center" valign="middle" >0.052</td><td align="center" valign="middle" >0.116</td><td align="center" valign="middle" >0.234</td></tr><tr><td align="center" valign="middle" >credit-approval</td><td align="center" valign="middle" >0.057</td><td align="center" valign="middle" >0.116</td><td align="center" valign="middle" >0.014</td><td align="center" valign="middle" >0.024</td><td align="center" valign="middle" >0.102</td><td align="center" valign="middle" >0.212</td></tr></tbody></table></table-wrap><p>
Each model behaves differently on different datasets, but when the other parameters are fixed and only the number of layers is changed, the classification accuracy does not change greatly: the maximum change is no more than 7%, most changes are around 2%, and the change for the distributed models is slightly larger than that for the centralized models. This supports the robustness of the model.</p><p>As for training time, the results on the three datasets show that the training time of the single node in the centralized model increases with the number of hidden layers, whereas the training time of each node in the distributed model gradually decreases as the number of layers increases; once the network has more than 4 layers, the average training time per node in the distributed network is lower than that of the centralized network. As the number of layers grows, the distributed networks therefore have an increasingly clear advantage in training time while remaining robust in training performance.</p></sec></sec><sec id="s5"><title>5. Conclusions</title><p>Based on the deep RVFL model, this paper proposes a fully distributed deep RVFL algorithm. In the fully distributed framework, agents in the network topology communicate only with each other and do not need to exchange the original data. DAC and ADMM algorithms are used to achieve collaborative optimization between agents in the hidden layers and the output layer, which avoids a central server and effectively protects data privacy. Experiments on several representative classification datasets show that the proposed algorithm achieves good classification accuracy and can greatly reduce the training time of each agent. 
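</p><p>To illustrate the kind of server-free cooperation referred to above, the following is a minimal sketch of decentralized average consensus (DAC) on scalar values. The iteration cap of 500 and the threshold of 0.001 are the settings of Section 4.1.3; the max-degree weighting, the stopping test on the largest per-agent change, the function name dac and the ring topology in the demo are assumptions made for illustration and are not taken from the paper's implementation.</p><preformat>
```python
def dac(values, neighbors, max_iter=500, tol=1e-3):
    """Decentralized average consensus on scalar values.

    values[i] is the local value of agent i and neighbors[i] lists the agents
    it can talk to (assumed undirected and connected). Each agent repeatedly
    moves toward its neighbors using max-degree weights (an assumed choice).
    Returns the final local values and the number of iterations performed.
    """
    n = len(values)
    d_max = max(len(nb) for nb in neighbors)
    step = 1.0 / (d_max + 1)  # keeps the implied weight matrix doubly stochastic
    x = list(values)
    for it in range(1, max_iter + 1):
        # Each agent uses only its own value and those of its neighbors.
        new_x = [
            x[i] + step * sum(x[j] - x[i] for j in neighbors[i])
            for i in range(n)
        ]
        change = max(abs(a - b) for a, b in zip(new_x, x))
        x = new_x
        if tol >= change:  # assumed stopping rule: largest local update is small
            return x, it
    return x, max_iter


if __name__ == "__main__":
    # Hypothetical example: 5 agents on a ring, each holding one local number.
    ring = [[1, 4], [0, 2], [1, 3], [2, 4], [3, 0]]
    local_values = [1.0, 2.0, 3.0, 4.0, 5.0]
    estimates, iterations = dac(local_values, ring)
    print(iterations, [round(v, 3) for v in estimates])
```
</preformat><p>Because the weights are symmetric, the sum of the local values is preserved at every iteration, so all agents approach the global average without any node ever collecting the values of the whole network.</p><p>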
At the same time, the robustness of the model is verified by changing the number of hidden layers.</p><p>The outlook for future work is mainly divided into two aspects. Firstly, in the aspect of algorithm, DAC and ADMM algorithms are used for collaborative optimization, which needs two iterations and consumes more training time. In the later research, other collaborative optimization methods will be selected to reduce the number of iterations in the process, thus further reducing the training time. Second, in terms of model application, relevant experiments have been carried out only on classification tasks to verify the effectiveness of the model, while experiments on other tasks of machine learning need to be expanded and verified.</p></sec><sec id="s6"><title>Acknowledgements</title><p>This work was supported in part by the National Natural Science Foundation of China (No. 62166013), the Natural Science Foundation of Guangxi (No. 2022GXNSFAA035499) and the Foundation of Guilin University of Technology (No. GLUTQD2007029).</p></sec><sec id="s7"><title>Conflicts of Interest</title><p>The authors declare no conflicts of interest regarding the publication of this paper.</p></sec><sec id="s8"><title>Cite this paper</title><p>Zhu, H.D. and Ai, W. (2024) Fully Distributed Learning for Deep Random Vector Functional-Link Networks. Journal of Applied Mathematics and Physics, 12, 1247-1262. https://doi.org/10.4236/jamp.2024.124077</p></sec></body></article>