<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "http://dtd.nlm.nih.gov/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="3.0" xml:lang="en" article-type="research article">
 <front>
  <journal-meta>
   <journal-id journal-id-type="publisher-id">
    aad
   </journal-id>
   <journal-title-group>
    <journal-title>
     Advances in Alzheimer's Disease
    </journal-title>
   </journal-title-group>
   <issn pub-type="epub">
    2169-2459
   </issn>
   <issn publication-format="print">
    2169-2467
   </issn>
   <publisher>
    <publisher-name>
     Scientific Research Publishing
    </publisher-name>
   </publisher>
  </journal-meta>
  <article-meta>
   <article-id pub-id-type="doi">
    10.4236/aad.2025.143007
   </article-id>
   <article-id pub-id-type="publisher-id">
    aad-146013
   </article-id>
   <article-categories>
    <subj-group subj-group-type="heading">
     <subject>
      Articles
     </subject>
    </subj-group>
    <subj-group subj-group-type="Discipline-v2">
     <subject>
      Biomedical 
     </subject>
     <subject>
       Life Sciences, Medicine 
     </subject>
     <subject>
       Healthcare
     </subject>
    </subj-group>
   </article-categories>
   <title-group>
    <article-title>FOCUS-Net: A Hybrid Denoising and Confidence-Weighted Attention Fusion Framework for Robust Alzheimer’s Disease Classification from MRI Data</article-title>
   </title-group>
   <contrib-group>
    <contrib contrib-type="author" xlink:type="simple">
     <name name-style="western">
      <surname>
       Samuel
      </surname>
      <given-names>
       Ocen
      </given-names>
     </name> 
     <xref ref-type="aff" rid="aff1"> 
      <sup>1</sup>
     </xref> 
     <xref ref-type="aff" rid="aff2"> 
      <sup>2</sup>
     </xref>
    </contrib>
    <contrib contrib-type="author" xlink:type="simple">
     <name name-style="western">
      <surname>
       Lawrence
      </surname>
      <given-names>
       Muchemi
      </given-names>
     </name> 
     <xref ref-type="aff" rid="aff1"> 
      <sup>1</sup>
     </xref>
    </contrib>
    <contrib contrib-type="author" xlink:type="simple">
     <name name-style="western">
      <surname>
       Michaelina Almaz
      </surname>
      <given-names>
       Yohannis
      </given-names>
     </name> 
     <xref ref-type="aff" rid="aff1"> 
      <sup>1</sup>
     </xref>
    </contrib>
   </contrib-group> 
   <aff id="aff1">
    <addr-line>
     Department of Computing and Informatics, University of Nairobi, Nairobi, Kenya
    </addr-line> 
   </aff> 
   <aff id="aff2">
    <addr-line>
     Department of Computer Science, Mountains of the Moon University, Fort Portal, Uganda
    </addr-line> 
   </aff> 
   <pub-date pub-type="epub">
    <day>
     03
    </day> 
    <month>
     09
    </month>
    <year>
     2025
    </year>
   </pub-date> 
   <volume>
    14
   </volume> 
   <issue>
    03
   </issue>
   <fpage>
    99
   </fpage>
   <lpage>
    115
   </lpage>
   <history>
    <date date-type="received">
     <day>
      3
     </day>
     <month>
      September
     </month>
     <year>
      2025
     </year>
    </date>
    <date date-type="published">
     <day>
      22
     </day>
     <month>
      September
     </month>
     <year>
      2025
     </year> 
    </date> 
    <date date-type="accepted">
     <day>
      22
     </day>
     <month>
      September
     </month>
     <year>
      2025
     </year> 
    </date>
   </history>
   <permissions>
    <copyright-statement>
     © Copyright 2014 by authors and Scientific Research Publishing Inc. 
    </copyright-statement>
    <copyright-year>
     2014
    </copyright-year>
    <license>
     <license-p>
      This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0/
     </license-p>
    </license>
   </permissions>
   <abstract>
    Alzheimer’s Disease (AD) is a progressive neurodegenerative disorder for which early and accurate diagnosis from Magnetic Resonance Imaging (MRI) is critical for intervention. However, low signal-to-noise ratio, artifacts, and high inter-rater variability in clinical MRI scans pose significant challenges for automated diagnostic systems. This paper proposes FOCUS-Net, a novel end-to-end framework designed to enhance the robustness and interpretability of AD stage classification. Our approach integrates a hybrid denoising module that combines traditional filters (Wavelet, Gaussian, Anisotropic Diffusion, Non-Local Means (NLM)) with a 3D U-Net CNN to suppress noise while preserving anatomical integrity. The cleaned images are processed by a diverse ensemble of 3D CNNs (ResNet-18, DenseNet-121, and a custom lightweight model). The core innovation is a confidence- and consistency-weighted fusion algorithm that dynamically aggregates ensemble predictions. Each model is weighted by its predictive confidence (measured by the Shannon entropy of its output) and its spatial consistency (measured by the Dice similarity of its Grad-CAM attention mask with the ensemble consensus), balanced by a learnable parameter 
    <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
      λ
     </mi> 
    </math>. Preliminary experiments on the ADNI dataset indicate that FOCUS-Net achieves a classification accuracy of 88.9% and an AUC-ROC of 0.975, outperforming established baselines including a single model (80.0%), a simple averaging ensemble (84.4%), and fusion strategies using only confidence (86.7%) or only consistency (84.4%). The framework not only improves diagnostic accuracy but also provides interpretable visual explanations through consensus attention maps, offering a step towards reliable and trustworthy computer-aided diagnosis of Alzheimer’s disease. 
   </abstract>
   <kwd-group> 
    <kwd>
     Alzheimer’s Disease
    </kwd> 
    <kwd>
      MRI Classification
    </kwd> 
    <kwd>
      Hybrid Denoising
    </kwd> 
    <kwd>
      Ensemble Learning
    </kwd> 
    <kwd>
      Explainable AI (XAI)
    </kwd> 
    <kwd>
      Confidence-Weighted Fusion
    </kwd> 
    <kwd>
      Attention Mechanisms
    </kwd>
   </kwd-group>
  </article-meta>
 </front>
 <body>
  <sec id="s1">
   <title>1. Introduction</title>
   <p>Alzheimer’s Disease (AD) is the most common cause of dementia, affecting millions worldwide <xref ref-type="bibr" rid="scirp.146013-1">
     [1]
    </xref>-<xref ref-type="bibr" rid="scirp.146013-3">
     [3]
    </xref>. MRI-based assessment of structural brain atrophy, particularly in the hippocampus and medial temporal lobe, is a cornerstone of clinical diagnosis <xref ref-type="bibr" rid="scirp.146013-4">
     [4]
    </xref>. The rise of deep learning has enabled the development of automated tools for classifying AD stages (e.g., Non-Demented, Mildly Demented, Moderately Demented) from MRI data <xref ref-type="bibr" rid="scirp.146013-5">
     [5]
    </xref>. Despite promising results, these models are often sensitive to low-quality, noisy data commonly encountered in clinical settings. Furthermore, model ensembles, while powerful, typically rely on simple averaging, failing to account for the varying reliability and focus of individual models within the ensemble.</p>
   <p>This work addresses these limitations by proposing a comprehensive pipeline that combines advanced image preprocessing with an intelligent model fusion strategy. Our contributions are threefold:</p>
   <p>1) Hybrid Denoising: We propose a sequential denoising algorithm that leverages the complementary strengths of traditional filters and CNNs to produce high-fidelity, denoised MRI inputs.</p>
   <p>2) Confidence- and Consistency-Weighted Fusion: We introduce a novel fusion algorithm that assigns a dynamic weight to each model in an ensemble based on its predictive uncertainty and the spatial consistency of its saliency maps with other models.</p>
   <p>3) End-to-End Design: We integrate these components into a cohesive pipeline (FOCUS-Net), designed to enhance the robustness, accuracy, and interpretability of automated AD diagnosis.</p>
  </sec><sec id="s2">
   <title>
    2. Literature Synthesis</title>
   <p>The automation of Alzheimer’s Disease (AD) diagnosis using deep learning on MRI data has been a prolific field of research. Existing work can be broadly categorized into studies focusing on image preprocessing, novel model architectures, and ensemble learning techniques. This review situates our proposed FOCUS-Net framework within these established areas and highlights its novel integrations.</p>
   <p>1) Deep Learning for AD Classification</p>
   <p>The application of Convolutional Neural Networks (CNNs) to AD diagnosis is well-documented. Early work by Islam and Zhang <xref ref-type="bibr" rid="scirp.146013-6">
     [6]
    </xref> demonstrated the efficacy of using 3D CNNs on structural MRI for classification. A common trend has been the use of pre-trained architectures (e.g., VGG, ResNet, Inception) adapted for medical images, leveraging transfer learning to overcome limited dataset sizes <xref ref-type="bibr" rid="scirp.146013-7">
     [7]
    </xref> <xref ref-type="bibr" rid="scirp.146013-8">
     [8]
    </xref>. These studies established that deep learning could automatically learn discriminative features, such as hippocampal and ventricular atrophy, that are hallmarks of AD progression, often matching or exceeding human expert performance in controlled settings. A review by <xref ref-type="bibr" rid="scirp.146013-9">
     [9]
    </xref> showed that deep learning models have been predominantly used in the detection of brain disorders in neurological patients.</p>
   <p>2) The Denoising Preprocessing Step</p>
   <p>The critical impact of image quality on diagnostic accuracy is widely acknowledged but often addressed as a separate, offline step. Traditional filters like Non-Local Means (NLM) <xref ref-type="bibr" rid="scirp.146013-10">
     [10]
    </xref> and Anisotropic Diffusion <xref ref-type="bibr" rid="scirp.146013-11">
     [11]
    </xref> are prized for their effectiveness in reducing noise while preserving edges. More recently, deep learning-based denoising methods, particularly those using autoencoder architectures like U-Net <xref ref-type="bibr" rid="scirp.146013-12">
     [12]
    </xref>, have shown superior performance in tasks like MRI artifact suppression. However, most diagnostic pipelines either use raw images or apply a single denoising method. Our hybrid approach is motivated by the hypothesis that a sequential application of traditional and deep learning-based methods can combine the strengths of both: the structural preservation of traditional filters and the high-fidelity denoising of CNNs.</p>
   <p>3) Explainability and Attention in Medical AI</p>
   <p>As deep learning models are often seen as “black boxes,” there has been a significant push towards making their decisions interpretable, especially in medicine. Techniques like Grad-CAM <xref ref-type="bibr" rid="scirp.146013-13">
     [13]
    </xref> and its variants have become a standard tool for visualizing the regions of an image that most influenced a model’s prediction. In neuroimaging, this allows researchers to verify that a model is focusing on biologically plausible regions (e.g., the hippocampus). While used extensively for post-hoc analysis, few works have integrated these attention mechanisms directly into the decision-making process of an ensemble. Our method innovates by using these attention masks not just for visualization, but as a quantitative signal for model fusion.</p>
   <p>4) Advanced Ensemble Learning</p>
   <p>Model ensembles are a proven strategy to boost performance and robustness. The standard approach is to average predictions (soft voting) or votes (hard voting) <xref ref-type="bibr" rid="scirp.146013-14">
     [14]
    </xref>. Some advanced methods weight models based on their historical accuracy or confidence scores <xref ref-type="bibr" rid="scirp.146013-15">
     [15]
    </xref>. However, these methods operate solely on the final output probabilities and ignore the spatial information captured inside the models. A key gap in the literature is the lack of methods that consider the semantic agreement between models, i.e., whether different models reach their decisions for the same, correct reasons. Our proposed consistency weight (
    <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
      <msup> 
       <mstyle mathvariant="bold" mathsize="normal"> 
        <mi>
          w 
        </mi> 
       </mstyle> 
       <mrow> 
        <mtext>
          consist 
        </mtext> 
       </mrow> 
      </msup> 
     </mrow> 
    </math>), based on the Dice score of attention masks, directly addresses this gap. It provides a mechanism to down-weight a model that is confident but focuses on image artifacts or irrelevant regions, thereby enhancing the ensemble’s robustness and reliability.</p>
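   <p>As a minimal sketch of this idea, the Python below scores each model by its average Dice overlap with the attention masks of the other models, mirroring the pairwise accumulation in Algorithm 2. The function names, the use of flattened binary masks, the small smoothing constant, and the division by the number of peers are assumptions made for illustration; they are not taken from the authors’ implementation.</p>
   <preformat>
```python
from typing import List, Sequence


def dice_score(a: Sequence[int], b: Sequence[int], eps: float = 1e-8) -> float:
    """Dice similarity 2|A and B| / (|A| + |B|) of two flattened binary masks.

    eps is an assumed smoothing term that avoids division by zero when
    both masks are empty.
    """
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(x * y for x, y in zip(a, b))
    return (2.0 * intersection + eps) / (sum(a) + sum(b) + eps)


def consistency_weights(masks: Sequence[Sequence[int]]) -> List[float]:
    """One weight per model: mean Dice score against every other model."""
    k = len(masks)
    if k in (0, 1):
        raise ValueError("at least two attention masks are required")
    weights = []
    for i in range(k):
        total = sum(dice_score(masks[i], masks[j]) for j in range(k) if j != i)
        weights.append(total / (k - 1))  # assumption: average over the peers
    return weights


# Hypothetical binarized Grad-CAM masks (four voxels each) from three models.
masks = [
    [1, 1, 0, 0],  # model 1
    [1, 1, 0, 0],  # model 2 attends to the same region as model 1
    [0, 0, 1, 1],  # model 3 attends elsewhere, e.g. to an artifact
]
w_consist = consistency_weights(masks)
```
   </preformat>
   <p>In this toy example the two agreeing models receive a weight of about 0.5 each, while the model attending to a disjoint region receives a weight close to zero, which is the down-weighting behavior described above.</p>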
   <p>5) Uncertainty Quantification</p>
   <p>Predictive uncertainty is a critical metric for trustworthy AI in healthcare. Shannon entropy, which measures the dispersion of a probability distribution, is a common measure of predictive uncertainty <xref ref-type="bibr" rid="scirp.146013-16">
     [16]
    </xref>. Models with high predictive entropy are less certain, and their decisions should be treated with caution. Our confidence weight ( 
    <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
      <msup> 
       <mstyle mathvariant="bold" mathsize="normal"> 
        <mi>
          w 
        </mi> 
       </mstyle> 
       <mrow> 
        <mtext>
          conf 
        </mtext> 
       </mrow> 
      </msup> 
     </mrow> 
    </math>) formally incorporates this principle into the fusion process, ensuring that the predictions of more certain models contribute more heavily to the final output. The integration of this with spatial consistency creates a dual-feedback mechanism for intelligent fusion.</p>
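   <p>The confidence weight can be sketched in a few lines of Python. The example computes the Shannon entropy of each model’s class probabilities and maps it to a weight with the inverse form 1/(1 + E) given in Algorithm 2. The function names, the natural logarithm, the skipping of zero-probability classes, and the example probabilities are assumptions made for illustration.</p>
   <preformat>
```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy E = -sum_c p_c * log(p_c), natural log (assumed).

    Zero-probability classes are skipped, using the convention 0 * log(0) = 0.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def confidence_weights(prob_vectors: Sequence[Sequence[float]]) -> List[float]:
    """One weight per model: w_k = 1 / (1 + E_k)."""
    return [1.0 / (1.0 + shannon_entropy(p)) for p in prob_vectors]


# Hypothetical softmax outputs of three models for one scan (three classes).
outputs = [
    [1.0, 0.0, 0.0],        # fully certain: entropy 0
    [0.7, 0.2, 0.1],        # moderately certain
    [1 / 3, 1 / 3, 1 / 3],  # maximally uncertain: entropy log(3)
]
w_conf = confidence_weights(outputs)
```
   </preformat>
   <p>With this mapping the weight is bounded between 1/(1 + log C) for a uniform prediction over C classes and 1 for a fully certain prediction, so an uncertain model is attenuated rather than removed from the ensemble.</p>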
   <p>In summary, FOCUS-Net integrates these distinct strands of research—hybrid denoising, explainable AI, and advanced ensemble learning—into a unified, end-to-end pipeline. It moves beyond naive averaging by fusing models based on both what they decide (confidence) and why they decide it (spatial consistency), offering a potential step forward in robust and interpretable automated diagnosis (<xref ref-type="table" rid="table1">
     Table 1
    </xref>).</p>
   <table-wrap id="table1">
    <label>
     <xref ref-type="table" rid="table1">
      Table 1
     </xref></label>
    <caption>
     <title>
      Table 1. A summary of related work and the contribution of the proposed FOCUS-Net framework.</title>
    </caption>
    <table class="MsoTableGrid custom-table" border="0" cellspacing="0" cellpadding="0"> 
     <tr> 
      <td class="custom-bottom-td custom-top-td acenter" width="11.71%"><p style="text-align:center">Attributes</p></td> 
      <td class="custom-bottom-td custom-top-td acenter" width="17.65%"><p style="text-align:center">AD Deep Learning</p></td> 
      <td class="custom-bottom-td custom-top-td acenter" width="14.71%"><p style="text-align:center">Image Denoising</p></td> 
      <td class="custom-bottom-td custom-top-td acenter" width="13.24%"><p style="text-align:center">Explainability (XAI)</p></td> 
      <td class="custom-bottom-td custom-top-td acenter" width="11.76%"><p style="text-align:center">Ensemble Learning</p></td> 
      <td class="custom-bottom-td custom-top-td acenter" width="14.70%"><p style="text-align:center">Uncertainty Quantification</p></td> 
      <td class="custom-bottom-td custom-top-td acenter" width="16.23%"><p style="text-align:center">Overall Pipeline</p></td> 
     </tr> 
     <tr> 
      <td class="custom-top-td acenter" width="11.71%"><p style="text-align:center">Key Papers &amp; Concepts</p></td> 
      <td class="custom-top-td aleft" width="17.65%"><p style="text-align:left">
        <xref ref-type="bibr" rid="scirp.146013-6">
         [6]
        </xref> 3D CNNs; <xref ref-type="bibr" rid="scirp.146013-7">
         [7]
        </xref> Transfer Learning</p></td> 
      <td class="custom-top-td aleft" width="14.71%"><p style="text-align:left">
        <xref ref-type="bibr" rid="scirp.146013-10">
         [10]
        </xref> Non-Local Means; <xref ref-type="bibr" rid="scirp.146013-11">
         [11]
        </xref> Anisotropic Diffusion; <xref ref-type="bibr" rid="scirp.146013-12">
         [12]
        </xref> U-Net</p></td> 
      <td class="custom-top-td aleft" width="13.24%"><p style="text-align:left">
        <xref ref-type="bibr" rid="scirp.146013-13">
         [13]
        </xref> Grad-CAM</p></td> 
      <td class="custom-top-td aleft" width="11.76%"><p style="text-align:left">
        <xref ref-type="bibr" rid="scirp.146013-14">
         [14]
        </xref> Standard voting; <xref ref-type="bibr" rid="scirp.146013-15">
         [15]
        </xref> Weighting by confidence</p></td> 
      <td class="custom-top-td aleft" width="14.70%"><p style="text-align:left">
        <xref ref-type="bibr" rid="scirp.146013-16">
         [16]
        </xref> Predictive Entropy</p></td> 
      <td class="custom-top-td aleft" width="16.23%"><p style="text-align:left">(Many papers focus on one aspect)</p></td> 
     </tr> 
     <tr> 
      <td class="acenter" width="11.71%"><p style="text-align:center">Limitation/Gap</p></td> 
      <td class="aleft" width="17.65%"><p style="text-align:left">Models are often applied to pre-processed data without an integrated, optimized denoising step.</p></td> 
      <td class="aleft" width="14.71%"><p style="text-align:left">Traditional and deep learning methods are often used in isolation.</p></td> 
      <td class="aleft" width="13.24%"><p style="text-align:left">Used for post-hoc visualization, not as an active signal.</p></td> 
      <td class="aleft" width="11.76%"><p style="text-align:left">Ignore why models make decisions; fuse only on output.</p></td> 
      <td class="aleft" width="14.70%"><p style="text-align:left">Not always integrated into fusion logic.</p></td> 
      <td class="aleft" width="16.23%"><p style="text-align:left">Lack of end-to-end frameworks that combine preprocessing with explainable fusion.</p></td> 
     </tr> 
     <tr> 
      <td class="custom-bottom-td acenter" width="11.71%"><p style="text-align:center">How FOCUS-Net Addresses It</p></td> 
      <td class="custom-bottom-td aleft" width="17.65%"><p style="text-align:left">Proposes an integrated, hybrid denoising module as a crucial first step within the pipeline.</p></td> 
      <td class="custom-bottom-td aleft" width="14.71%"><p style="text-align:left">Sequentially combines a traditional filter for structural preservation with a CNN for residual noise removal.</p></td> 
      <td class="custom-bottom-td aleft" width="13.24%"><p style="text-align:left">Quantifies attention masks and uses them as a core signal for model fusion.</p></td> 
      <td class="custom-bottom-td aleft" width="11.76%"><p style="text-align:left">Introduces a novel consistency weight based on semantic agreement of attention.</p></td> 
      <td class="custom-bottom-td aleft" width="14.70%"><p style="text-align:left">Directly integrates entropy into a confidence weight for dynamic model weighting.</p></td> 
      <td class="custom-bottom-td aleft" width="16.23%"><p style="text-align:left">Provides a complete pipeline from raw input to final decision.</p></td> 
     </tr> 
    </table>
   </table-wrap>
  </sec><sec id="s3">
   <title>
    3. Methodology</title>
   <p>The FOCUS-Net pipeline consists of two primary algorithmic components: a hybrid denoising preprocessor and the main classification and fusion pipeline.</p>
   <sec id="s3_1">
    <title>
     3.1. Hybrid MRI Denoising Algorithm</title>
    <p>The goal of this stage is to suppress noise while preserving the anatomical details needed for diagnosis.</p>
    <p>Magnetic Resonance Imaging (MRI) is highly sensitive to noise introduced by acquisition hardware, patient motion, and long scanning times. While conventional filtering methods are effective at removing structured noise, they often blur fine anatomical structures, whereas deep learning models can capture complex noise distributions but sometimes fail to preserve subtle edges. To address this trade-off, we propose a hybrid denoising approach that sequentially combines a traditional filtering step for structural preservation with a convolutional neural network (CNN) for residual noise suppression as applied in <xref ref-type="bibr" rid="scirp.146013-17">
      [17]
     </xref>. The objective of this stage is to produce cleaner images while retaining diagnostically relevant details, thereby ensuring reliable downstream analysis.</p>
    <p>Justification of Sequence: The order of operations, base filters followed by CNN refinement, is critical. The base filters (Wavelet, Gaussian, Anisotropic Diffusion, NLM) each target specific noise types and collectively produce an image in which gross noise is suppressed but structured artifacts (e.g., over-smoothing from Gaussian filtering or patch discrepancies from NLM) may persist. The CNN is well suited to learning to remove these residual artifacts. Reversing the order would force the CNN to handle severe noise directly (a harder task), and subsequent base filtering would likely degrade the CNN’s output by blurring recovered features or introducing new artifacts. Thus, the chosen sequence lets each component operate on the input it handles best.</p>
    <table class="MsoTableGrid custom-table" border="0" cellspacing="0" cellpadding="0"> 
     <tr> 
      <td class="custom-bottom-td custom-top-td aleft" width="100.00%"><p style="text-align:left">Algorithm 1 Hybrid MRI Denoising</p></td> 
     </tr> 
     <tr> 
      <td class="custom-top-td aleft" width="100.00%"><p style="text-align:left">Require:</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">1: Input: Raw, noisy MRI scan I<sub>raw</sub></p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">2: Parameters: Pre-optimized parameters θ<sub>filter</sub>, θ<sub>CNN</sub></p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">Ensure:</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">3: Output: Denoised image I<sub>denoised</sub> with high PSNR/SSIM.</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">4: procedure HYBRIDDENOISE (I<sub>raw</sub>)</p></td> 
     </tr> 
     <tr> 
       <td class="aleft" width="100.00%"><p style="text-align:left">5: I<sub>filtered</sub> ← NLM (I<sub>raw</sub>; θ<sub>filter</sub>) ▷ Stage 1: Anatomical Preservation</p><p style="text-align:left">// e.g., Non-Local Means with parameters optimized for anatomical preservation; repeat this step for each base filter.</p></td> 
     </tr> 
     <tr> 
       <td class="aleft" width="100.00%"><p style="text-align:left">6: I<sub>denoised</sub> ← CNN<sub>denoiser</sub> (I<sub>filtered</sub>; θ<sub>CNN</sub>) ▷ Stage 2: CNN-based Residual Noise Removal</p><p style="text-align:left">// A custom U-Net trained to remove residual noise and artifacts</p></td> 
     </tr> 
     <tr> 
      <td class="custom-bottom-td aleft" width="100.00%"><p style="text-align:left">7: return I<sub>denoised</sub></p></td> 
     </tr> 
    </table>
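    <p>The two-stage structure of Algorithm 1 can be sketched as a simple function composition. The Python below is an illustration only: it works on a 1D list of intensities rather than a 3D volume, the moving-average function is a toy stand-in for a real base filter such as NLM, the identity function stands in for the trained U-Net, and applying the base filters one after another is an assumed reading of the instruction to repeat Stage 1 for each base filter. All names are hypothetical.</p>
    <preformat>
```python
from typing import Callable, List, Sequence

Signal = List[float]
Filter = Callable[[Signal], Signal]


def moving_average(signal: Signal, radius: int = 1) -> Signal:
    """Toy base filter (NOT Non-Local Means): mean over a sliding window."""
    n = len(signal)
    out = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def identity_denoiser(signal: Signal) -> Signal:
    """Placeholder for the trained CNN denoiser; returns its input unchanged."""
    return list(signal)


def hybrid_denoise(raw: Signal, base_filters: Sequence[Filter],
                   cnn_denoiser: Filter) -> Signal:
    """Stage 1: apply the base filters in order. Stage 2: CNN refinement."""
    filtered = list(raw)
    for base_filter in base_filters:  # assumption: filters are chained
        filtered = base_filter(filtered)
    return cnn_denoiser(filtered)


# A single noise spike in an otherwise flat signal.
noisy = [0.0, 3.0, 0.0]
denoised = hybrid_denoise(noisy, [moving_average], identity_denoiser)
```
    </preformat>
    <p>Passing the stages as callables keeps the ordering explicit: swapping in a real filter or a trained network changes only the arguments, not the control flow that Algorithm 1 prescribes.</p>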
   </sec>
   <sec id="s3_2">
    <title>
     3.2. Helper Functions and Mathematical Details</title>
    <p>To enable robust model fusion, we introduce a set of helper functions that compute adaptive weights reflecting the reliability of each individual model. These functions quantify two complementary aspects: 1) confidence, which measures the certainty of a model’s probabilistic predictions, and 2) consistency, which evaluates the degree of agreement between models in terms of their attention maps. By combining these measures through a learnable balancing parameter, the fusion process favors models that are both confident and semantically aligned with their peers. Algorithm 2 gives the algorithmic definitions and mathematical details of these helper functions.</p>
    <table class="MsoTableGrid custom-table" border="0" cellspacing="0" cellpadding="0"> 
     <tr> 
      <td class="custom-bottom-td custom-top-td aleft" width="100.00%"><p style="text-align:left">Algorithm 2 Helper Functions for Fusion Weights</p></td> 
     </tr> 
     <tr> 
      <td class="custom-top-td aleft" width="100.00%"><p style="text-align:left">1: function CALCULATECONFIDENCE (list of probability vectors p)</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">2: Initialize vector 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msup> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              w 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              conf 
            </mtext> 
           </mrow> 
          </msup> 
          <mo>
            ← 
          </mo> 
          <mrow> 
           <mo>
             [ 
           </mo> 
           <mn>
             0 
           </mn> 
           <mo>
             ] 
           </mo> 
          </mrow> 
          <mo>
            × 
          </mo> 
          <mi>
            K 
          </mi> 
         </mrow> 
        </math></p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">3: for 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mi>
            k 
          </mi> 
          <mo>
            ← 
          </mo> 
          <mn>
            1 
          </mn> 
         </mrow> 
        </math> to 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
           K 
         </mi> 
        </math> do</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">4: Let 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
           C 
         </mi> 
        </math> be the number of classes ▷ Explicitly define C</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">5: 
        <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mi>
             E 
           </mi> 
           <mi>
             k 
           </mi> 
          </msub> 
          <mo>
            ← 
          </mo> 
          <mo>
            − 
          </mo> 
          <mstyle displaystyle="true"> 
           <msubsup> 
            <mo>
              ∑ 
            </mo> 
            <mrow> 
             <mi>
               c 
             </mi> 
             <mo>
               = 
             </mo> 
             <mn>
               1 
             </mn> 
            </mrow> 
            <mi>
              C 
            </mi> 
           </msubsup> 
           <mrow> 
            <msubsup> 
             <mstyle mathvariant="bold" mathsize="normal"> 
              <mi>
                p 
              </mi> 
             </mstyle> 
             <mi>
               k 
             </mi> 
             <mrow> 
              <mrow> 
               <mo>
                 ( 
               </mo> 
               <mi>
                 c 
               </mi> 
               <mo>
                 ) 
               </mo> 
              </mrow> 
             </mrow> 
            </msubsup> 
           </mrow> 
          </mstyle> 
          <mo>
            ⋅ 
          </mo> 
          <mtext>
            log 
          </mtext> 
          <mrow> 
           <mo>
             ( 
           </mo> 
           <mrow> 
            <msubsup> 
             <mstyle mathvariant="bold" mathsize="normal"> 
              <mi>
                p 
              </mi> 
             </mstyle> 
             <mi>
               k 
             </mi> 
             <mrow> 
              <mrow> 
               <mo>
                 ( 
               </mo> 
               <mi>
                 c 
               </mi> 
               <mo>
                 ) 
               </mo> 
              </mrow> 
             </mrow> 
            </msubsup> 
           </mrow> 
           <mo>
             ) 
           </mo> 
          </mrow> 
         </mrow> 
        </math> ▷ Calculate Shannon Entropy</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">6: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msubsup> 
           <mi>
             w 
           </mi> 
           <mi>
             k 
           </mi> 
           <mrow> 
            <mtext>
              conf 
            </mtext> 
           </mrow> 
          </msubsup> 
          <mo>
            ← 
          </mo> 
          <mrow> 
           <mn>
             1 
           </mn> 
           <mo>
             / 
           </mo> 
           <mrow> 
            <mrow> 
             <mo>
               ( 
             </mo> 
             <mrow> 
              <mn>
                1 
              </mn> 
              <mo>
                + 
              </mo> 
              <msub> 
               <mi>
                 E 
               </mi> 
               <mi>
                 k 
               </mi> 
              </msub> 
             </mrow> 
             <mo>
               ) 
             </mo> 
            </mrow> 
           </mrow> 
          </mrow> 
         </mrow> 
        </math> ▷ Confidence ∝ inverse of uncertainty</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">7: return w<sup>conf</sup></p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">8: function CALCULATECONSISTENCY (list of attention masks A)</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">9: Initialize vector 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msup> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              w 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              consist 
            </mtext> 
           </mrow> 
          </msup> 
          <mo>
            ← 
          </mo> 
          <mrow> 
           <mo>
             [ 
           </mo> 
           <mn>
             0 
           </mn> 
           <mo>
             ] 
           </mo> 
          </mrow> 
          <mo>
            × 
          </mo> 
          <mi>
            K 
          </mi> 
         </mrow> 
        </math></p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">10: for 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mi>
            k 
          </mi> 
          <mo>
            ← 
          </mo> 
          <mn>
            1 
          </mn> 
         </mrow> 
        </math> to 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
           K 
         </mi> 
        </math> do</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">11: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mrow> 
            <mtext>
              consist 
            </mtext> 
           </mrow> 
           <mi>
             k 
           </mi> 
          </msub> 
          <mo>
            ← 
          </mo> 
          <mn>
            0 
          </mn> 
         </mrow> 
        </math></p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">12: for 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mi>
            j 
          </mi> 
          <mo>
            ← 
          </mo> 
          <mn>
            1 
          </mn> 
         </mrow> 
        </math> to 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
           K 
         </mi> 
        </math> do</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">13: if 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mi>
            j 
          </mi> 
          <mo>
            ≠ 
          </mo> 
          <mi>
            k 
          </mi> 
         </mrow> 
        </math> then</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">14: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mrow> 
            <mtext>
              consist 
            </mtext> 
           </mrow> 
           <mi>
             k 
           </mi> 
          </msub> 
          <mo>
            ← 
          </mo> 
          <msub> 
           <mrow> 
            <mtext>
              consist 
            </mtext> 
           </mrow> 
           <mi>
             k 
           </mi> 
          </msub> 
          <mo>
            + 
          </mo> 
          <mtext>
            dice_score 
          </mtext> 
          <mrow> 
           <mo>
             ( 
           </mo> 
           <mrow> 
            <msub> 
             <mi>
               A 
             </mi> 
             <mi>
               k 
             </mi> 
            </msub> 
            <mo>
              , 
            </mo> 
            <msub> 
             <mi>
               A 
             </mi> 
             <mi>
               j 
             </mi> 
            </msub> 
           </mrow> 
           <mo>
             ) 
           </mo> 
          </mrow> 
         </mrow> 
        </math> ▷ Sum pairwise Dice scores</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">15: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msubsup> 
           <mi>
             w 
           </mi> 
           <mi>
             k 
           </mi> 
           <mrow> 
            <mtext>
              consist 
            </mtext> 
           </mrow> 
          </msubsup> 
          <mo>
            ← 
          </mo> 
          <mrow> 
           <mrow> 
            <msub> 
             <mrow> 
              <mtext>
                consist 
              </mtext> 
             </mrow> 
             <mi>
               k 
             </mi> 
            </msub> 
           </mrow> 
           <mo>
             / 
           </mo> 
           <mrow> 
            <mrow> 
             <mo>
               ( 
             </mo> 
             <mrow> 
              <mi>
                K 
              </mi> 
              <mo>
                − 
              </mo> 
              <mn>
                1 
              </mn> 
             </mrow> 
             <mo>
               ) 
             </mo> 
            </mrow> 
           </mrow> 
          </mrow> 
         </mrow> 
        </math> ▷ Average consistency for model k</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">16: Normalize 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msup> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              w 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              consist 
            </mtext> 
           </mrow> 
          </msup> 
         </mrow> 
        </math> so that it sums to 1.</p></td> 
     </tr> 
     <tr> 
      <td class="custom-bottom-td aleft" width="100.00%"><p style="text-align:left">17: return 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msup> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              w 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              consist 
            </mtext> 
           </mrow> 
          </msup> 
         </mrow> 
        </math></p></td> 
     </tr> 
    </table>
    <p>The Dice coefficient for two attention masks 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        A 
      </mi> 
     </math> and 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        B 
      </mi> 
     </math> is defined as:</p>
    <p>
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mtable> 
       <mtr> 
        <mtd> 
         <mtext>
           dice_score 
         </mtext> 
         <mrow> 
          <mo>
            ( 
          </mo> 
          <mrow> 
           <mi>
             A 
           </mi> 
           <mo>
             , 
           </mo> 
           <mi>
             B 
           </mi> 
          </mrow> 
          <mo>
            ) 
          </mo> 
         </mrow> 
         <mo>
           = 
         </mo> 
         <mfrac> 
          <mrow> 
           <mn>
             2 
           </mn> 
           <mo>
             ⋅ 
           </mo> 
           <mrow> 
            <mo>
              | 
            </mo> 
            <mrow> 
             <mi>
               A 
             </mi> 
             <mo>
               ∩ 
             </mo> 
             <mi>
               B 
             </mi> 
            </mrow> 
            <mo>
              | 
            </mo> 
           </mrow> 
          </mrow> 
          <mrow> 
           <mrow> 
            <mo>
              | 
            </mo> 
            <mi>
              A 
            </mi> 
            <mo>
              | 
            </mo> 
           </mrow> 
           <mo>
             + 
           </mo> 
           <mrow> 
            <mo>
              | 
            </mo> 
            <mi>
              B 
            </mi> 
            <mo>
              | 
            </mo> 
           </mrow> 
          </mrow> 
         </mfrac> 
        </mtd> 
       </mtr> 
       <mtr> 
        <mtd> 
         <mo>
           = 
         </mo> 
         <mfrac> 
          <mrow> 
           <mn>
             2 
           </mn> 
           <mo>
             ⋅ 
           </mo> 
           <msubsup> 
            <mstyle mathsize="140%" displaystyle="true"> 
             <mo>
               ∑ 
             </mo> 
            </mstyle> 
            <mrow> 
             <mi>
               i 
             </mi> 
             <mo>
               = 
             </mo> 
             <mn>
               1 
             </mn> 
            </mrow> 
            <mi>
              H 
            </mi> 
           </msubsup> 
           <mtext>
               
           </mtext> 
           <msubsup> 
            <mstyle mathsize="140%" displaystyle="true"> 
             <mo>
               ∑ 
             </mo> 
            </mstyle> 
            <mrow> 
             <mi>
               j 
             </mi> 
             <mo>
               = 
             </mo> 
             <mn>
               1 
             </mn> 
            </mrow> 
            <mi>
              W 
            </mi> 
           </msubsup> 
           <mtext>
               
           </mtext> 
           <msup> 
            <mi>
              A 
            </mi> 
            <mrow> 
             <mrow> 
              <mo>
                ( 
              </mo> 
              <mrow> 
               <mi>
                 i 
               </mi> 
               <mo>
                 , 
               </mo> 
               <mi>
                 j 
               </mi> 
              </mrow> 
              <mo>
                ) 
              </mo> 
             </mrow> 
            </mrow> 
           </msup> 
           <mo>
             ⋅ 
           </mo> 
           <msup> 
            <mi>
              B 
            </mi> 
            <mrow> 
             <mrow> 
              <mo>
                ( 
              </mo> 
              <mrow> 
               <mi>
                 i 
               </mi> 
               <mo>
                 , 
               </mo> 
               <mi>
                 j 
               </mi> 
              </mrow> 
              <mo>
                ) 
              </mo> 
             </mrow> 
            </mrow> 
           </msup> 
          </mrow> 
          <mrow> 
           <msubsup> 
            <mstyle mathsize="140%" displaystyle="true"> 
             <mo>
               ∑ 
             </mo> 
            </mstyle> 
            <mrow> 
             <mi>
               i 
             </mi> 
             <mo>
               = 
             </mo> 
             <mn>
               1 
             </mn> 
            </mrow> 
            <mi>
              H 
            </mi> 
           </msubsup> 
           <mtext>
               
           </mtext> 
           <msubsup> 
            <mstyle mathsize="140%" displaystyle="true"> 
             <mo>
               ∑ 
             </mo> 
            </mstyle> 
            <mrow> 
             <mi>
               j 
             </mi> 
             <mo>
               = 
             </mo> 
             <mn>
               1 
             </mn> 
            </mrow> 
            <mi>
              W 
            </mi> 
           </msubsup> 
           <mtext>
               
           </mtext> 
           <msup> 
            <mi>
              A 
            </mi> 
            <mrow> 
             <mrow> 
              <mo>
                ( 
              </mo> 
              <mrow> 
               <mi>
                 i 
               </mi> 
               <mo>
                 , 
               </mo> 
               <mi>
                 j 
               </mi> 
              </mrow> 
              <mo>
                ) 
              </mo> 
             </mrow> 
            </mrow> 
           </msup> 
           <mo>
             + 
           </mo> 
           <msubsup> 
            <mstyle mathsize="140%" displaystyle="true"> 
             <mo>
               ∑ 
             </mo> 
            </mstyle> 
            <mrow> 
             <mi>
               i 
             </mi> 
             <mo>
               = 
             </mo> 
             <mn>
               1 
             </mn> 
            </mrow> 
            <mi>
              H 
            </mi> 
           </msubsup> 
           <mtext>
               
           </mtext> 
           <msubsup> 
            <mstyle mathsize="140%" displaystyle="true"> 
             <mo>
               ∑ 
             </mo> 
            </mstyle> 
            <mrow> 
             <mi>
               j 
             </mi> 
             <mo>
               = 
             </mo> 
             <mn>
               1 
             </mn> 
            </mrow> 
            <mi>
              W 
            </mi> 
           </msubsup> 
           <mtext>
               
           </mtext> 
           <msup> 
            <mi>
              B 
            </mi> 
            <mrow> 
             <mrow> 
              <mo>
                ( 
              </mo> 
              <mrow> 
               <mi>
                 i 
               </mi> 
               <mo>
                 , 
               </mo> 
               <mi>
                 j 
               </mi> 
              </mrow> 
              <mo>
                ) 
              </mo> 
             </mrow> 
            </mrow> 
           </msup> 
          </mrow> 
         </mfrac> 
        </mtd> 
       </mtr> 
      </mtable> 
     </math></p>
    <p>where 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        H 
      </mi> 
     </math> and 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        W 
      </mi> 
     </math> are the spatial dimensions of the masks.</p>
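    <p>As an illustration, the soft Dice score and the consistency weighting (steps 8 to 17 of the algorithm above) can be sketched in plain Python. This is a minimal sketch, not the authors' implementation: the masks are toy 2D nested lists, the small constant eps and the uniform fallback for non-overlapping masks are added assumptions, and the function calculate_consistency is a hypothetical name mirroring CALCULATECONSISTENCY.</p>
    <preformat>
```python
def dice_score(a, b, eps=1e-8):
    """Soft Dice between two equally sized 2D masks given as nested lists."""
    inter = sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    total = sum(map(sum, a)) + sum(map(sum, b))
    # eps is an added safeguard (an assumption) against two all-zero masks.
    return 2.0 * inter / (total + eps)


def calculate_consistency(masks):
    """Mean pairwise Dice of each mask against the others, normalized to sum to 1.

    Requires at least two masks, since the average divides by K - 1.
    """
    k_models = len(masks)
    raw = []
    for k in range(k_models):
        score = sum(dice_score(masks[k], masks[j]) for j in range(k_models) if j != k)
        raw.append(score / (k_models - 1))
    norm = sum(raw)
    # Assumed fallback: uniform weights if no pair of masks overlaps at all.
    if norm == 0:
        return [1.0 / k_models] * k_models
    return [r / norm for r in raw]


masks = [
    [[1.0, 1.0], [0.0, 0.0]],  # model 1
    [[1.0, 1.0], [0.0, 0.0]],  # model 2 agrees with model 1
    [[0.0, 0.0], [1.0, 1.0]],  # model 3 attends elsewhere
]
weights = calculate_consistency(masks)
print(weights)
```
    </preformat>
    <p>In this toy case the two agreeing models share the weight equally and the dissenting model receives zero weight, which is the behavior the consistency term is intended to produce.</p>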
    <p>The learnable parameter 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        λ 
      </mi> 
     </math> balances the influence of confidence versus consistency and is constrained using the sigmoid function:</p>
    <p>
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <mi>
         λ 
       </mi> 
       <mo>
         = 
       </mo> 
       <mi>
         σ 
       </mi> 
       <mrow> 
        <mo>
          ( 
        </mo> 
        <mrow> 
         <msub> 
          <mi>
            λ 
          </mi> 
          <mrow> 
           <mtext>
             raw 
           </mtext> 
          </mrow> 
         </msub> 
        </mrow> 
        <mo>
          ) 
        </mo> 
       </mrow> 
       <mo>
         = 
       </mo> 
       <mfrac> 
        <mn>
          1 
        </mn> 
        <mrow> 
         <mn>
           1 
         </mn> 
         <mo>
           + 
         </mo> 
         <mtext>
           exp 
         </mtext> 
         <mrow> 
          <mo>
            ( 
          </mo> 
          <mrow> 
           <mo>
             − 
           </mo> 
           <msub> 
            <mi>
              λ 
            </mi> 
            <mrow> 
             <mtext>
               raw 
             </mtext> 
            </mrow> 
           </msub> 
          </mrow> 
          <mo>
            ) 
          </mo> 
         </mrow> 
        </mrow> 
       </mfrac> 
      </mrow> 
     </math></p>
    <p>where 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <msub> 
        <mi>
          λ 
        </mi> 
        <mrow> 
         <mtext>
           raw 
         </mtext> 
        </mrow> 
       </msub> 
      </mrow> 
     </math> is an unbounded parameter optimized during training.</p>
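    <p>The effect of the sigmoid reparameterization can be checked with a short sketch. The function below is a standard numerically stable logistic function; the sample values of the raw parameter are arbitrary and chosen only for illustration.</p>
    <preformat>
```python
import math


def sigmoid(x):
    """Numerically stable logistic function mapping any real x into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x is negative here, so exp(x) cannot overflow
    return z / (1.0 + z)


# Arbitrary raw values: the constrained lambda always stays within [0, 1].
for lam_raw in (-4.0, 0.0, 4.0):
    lam = sigmoid(lam_raw)
    print(f"lambda_raw={lam_raw:+.1f} -> lambda={lam:.4f}")
```
    </preformat>
    <p>Because the raw parameter is unbounded, a gradient-based optimizer can update it freely while the effective weight remains a valid mixing coefficient.</p>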
    <p>Validation-Based 
     <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        λ 
      </mi> 
     </math> Optimization:</p>
    <p>
     The fusion parameter 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        λ 
      </mi> 
     </math> is then optimized on a separate validation set 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <msub> 
        <mi mathvariant="script">
          D 
        </mi> 
        <mrow> 
         <mtext>
           val 
         </mtext> 
        </mrow> 
       </msub> 
      </mrow> 
     </math>, with the ensemble parameters frozen. For a candidate value of 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        λ 
      </mi> 
     </math>, a forward pass is performed on 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <msub> 
        <mi mathvariant="script">
          D 
        </mi> 
        <mrow> 
         <mtext>
           val 
         </mtext> 
        </mrow> 
       </msub> 
      </mrow> 
     </math> to compute the combined prediction 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mover accent="true"> 
       <mi>
         y 
       </mi> 
       <mo>
         ^ 
       </mo> 
      </mover> 
     </math> using the fusion algorithm (Algorithm 2). The value of 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        λ 
      </mi> 
     </math> that maximizes the chosen performance metric (e.g., accuracy) on 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <msub> 
        <mi mathvariant="script">
          D 
        </mi> 
        <mrow> 
         <mtext>
           val 
         </mtext> 
        </mrow> 
       </msub> 
      </mrow> 
     </math> is selected. This can be formulated as:</p>
    <p>
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <msup> 
        <mi>
          λ 
        </mi> 
        <mtext>
          * 
        </mtext> 
       </msup> 
       <mo>
         = 
       </mo> 
       <munder> 
        <mrow> 
         <mi>
           arg 
         </mi> 
         <mi>
           max 
         </mi> 
        </mrow> 
        <mrow> 
         <mi>
           λ 
         </mi> 
         <mo>
           ∈ 
         </mo> 
         <mrow> 
          <mo>
            [ 
          </mo> 
          <mrow> 
           <mn>
             0 
           </mn> 
           <mo>
             , 
           </mo> 
           <mn>
             1 
           </mn> 
          </mrow> 
          <mo>
            ] 
          </mo> 
         </mrow> 
        </mrow> 
       </munder> 
       <mi>
         ℳ 
       </mi> 
       <mrow> 
        <mo>
          ( 
        </mo> 
        <mrow> 
         <mi>
           ℱ 
         </mi> 
         <mrow> 
          <mo>
            ( 
          </mo> 
          <mrow> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              X 
            </mi> 
           </mstyle> 
           <mo>
             ; 
           </mo> 
           <mrow> 
            <mo>
              { 
            </mo> 
            <mrow> 
             <msub> 
              <mi>
                θ 
              </mi> 
              <mrow> 
               <msub> 
                <mi>
                  M 
                </mi> 
                <mi>
                  i 
                </mi> 
               </msub> 
              </mrow> 
             </msub> 
            </mrow> 
            <mo>
              } 
            </mo> 
           </mrow> 
           <mo>
             , 
           </mo> 
           <mi>
             λ 
           </mi> 
          </mrow> 
          <mo>
            ) 
          </mo> 
         </mrow> 
         <mo>
           , 
         </mo> 
         <mstyle mathvariant="bold" mathsize="normal"> 
          <mi>
            y 
          </mi> 
         </mstyle> 
        </mrow> 
        <mo>
          ) 
        </mo> 
       </mrow> 
      </mrow> 
     </math> (1)</p>
    <p>where 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        ℳ 
      </mi> 
     </math> is the performance metric and 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        ℱ 
      </mi> 
     </math> represents the FOCUS-Net fusion function. In our experiments, 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <msup> 
        <mi>
          λ 
        </mi> 
        <mtext>
          * 
        </mtext> 
       </msup> 
      </mrow> 
     </math> is found via a grid search over the interval 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <mrow> 
        <mo>
          [ 
        </mo> 
        <mrow> 
         <mn>
           0 
         </mn> 
         <mo>
           , 
         </mo> 
         <mn>
           1 
         </mn> 
        </mrow> 
        <mo>
          ] 
        </mo> 
       </mrow> 
      </mrow> 
     </math>.</p>
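    <p>The selection rule in Equation (1) can be sketched as a simple grid search. The sketch assumes, purely for illustration, that the fused per-model weight is a convex combination of the confidence and consistency weights; the actual rule is the one defined by the fusion algorithm (Algorithm 2). The names fuse and grid_search_lambda, the 11-point grid, and the two-sample validation set are all hypothetical, and accuracy is used as the metric.</p>
    <preformat>
```python
def fuse(probs, w_conf, w_consist, lam):
    """Assumed fusion rule: per-model weight = lam * conf + (1 - lam) * consist."""
    n_classes = len(probs[0])
    weights = [lam * c + (1.0 - lam) * s for c, s in zip(w_conf, w_consist)]
    fused = [sum(w * p[c] for w, p in zip(weights, probs)) for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)  # predicted class index


def grid_search_lambda(samples, labels, steps=11):
    """Return (lambda, accuracy) for the best lambda on an even grid over [0, 1]."""
    best_lam, best_acc = 0.0, -1.0
    for i in range(steps):
        lam = i / (steps - 1)
        preds = [fuse(*s, lam) for s in samples]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:  # strict '>' keeps the smallest lambda on ties
            best_lam, best_acc = lam, acc
    return best_lam, best_acc


# Toy validation set: (per-model class probabilities, w_conf, w_consist).
val_samples = [
    ([[0.7, 0.2, 0.1], [0.2, 0.7, 0.1]], [0.9, 0.1], [0.2, 0.8]),
    ([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]], [0.3, 0.7], [0.8, 0.2]),
]
val_labels = [0, 1]
best_lam, best_acc = grid_search_lambda(val_samples, val_labels)
print(best_lam, best_acc)
```
    </preformat>
    <p>Since the ensemble parameters are frozen, the per-model outputs can be computed once and reused for every candidate value, so each grid point only costs a re-weighting of cached predictions.</p>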
   </sec>
   <sec id="s3_3">
    <title>
      3.3. Model Architectures</title>
    <p>To ensure reproducibility and clarity, this section details the specific architectural choices for both the CNN denoiser and the classification ensemble models.</p>
    <p>The denoising CNN follows a modified 3D U-Net architecture <xref ref-type="bibr" rid="scirp.146013-12">
      [12]
     </xref>, chosen for its effectiveness in image-to-image tasks and its ability to capture multi-scale contextual information. The network is designed to process 3D MRI patches of size 112 × 112 × 80 voxels. The specific configuration is as follows:</p>
    <p>• Encoder Path: Consists of four downsampling blocks. Each block comprises two 3 × 3 × 3 convolutional layers with ReLU activation, followed by instance normalization and a 2 × 2 × 2 max-pooling layer (stride = 2) for downsampling. The number of filters doubles at each step, starting from 64 and increasing to 512 in the bottleneck.</p>
    <p>• Bottleneck: Features are processed by two 3 × 3 × 3 convolutional layers with 512 filters.</p>
    <p>• Decoder Path: Consists of four upsampling blocks. Each block begins with a transposed convolution (kernel = 2 × 2 × 2, stride = 2) for upsampling, followed by concatenation with the corresponding encoder feature map (skip connections), and two 3 × 3 × 3 convolutional layers with ReLU and instance normalization. The number of filters halves at each step, decreasing from 512 to 64.</p>
    <p>• Final Layer: A 1 × 1 × 1 convolution with a linear activation function produces the final residual output. The network is trained to predict a residual correction that is added to the filtered image, i.e., 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <msub> 
        <mstyle mathvariant="bold" mathsize="normal"> 
         <mi>
           I 
         </mi> 
        </mstyle> 
        <mrow> 
         <mtext>
           denoised 
         </mtext> 
        </mrow> 
       </msub> 
       <mo>
         = 
       </mo> 
       <msub> 
        <mstyle mathvariant="bold" mathsize="normal"> 
         <mi>
           I 
         </mi> 
        </mstyle> 
        <mrow> 
         <mtext>
           filtered 
         </mtext> 
        </mrow> 
       </msub> 
       <mo>
         + 
       </mo> 
       <mi>
         f 
       </mi> 
       <mrow> 
        <mo>
          ( 
        </mo> 
        <mrow> 
         <msub> 
          <mstyle mathvariant="bold" mathsize="normal"> 
           <mi>
             I 
           </mi> 
          </mstyle> 
          <mrow> 
           <mtext>
             filtered 
           </mtext> 
          </mrow> 
         </msub> 
         <mo>
           ; 
         </mo> 
         <msub> 
          <mi>
            θ 
          </mi> 
          <mrow> 
           <mtext>
             CNN 
           </mtext> 
          </mrow> 
         </msub> 
        </mrow> 
        <mo>
          ) 
        </mo> 
       </mrow> 
      </mrow> 
     </math>.</p>
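    <p>The configuration above implies a specific feature-map size at each encoder level, which can be traced with a short sketch. It assumes that the 3 × 3 × 3 convolutions preserve spatial size (same padding), which the text does not state; encoder_trace is a hypothetical helper name.</p>
    <preformat>
```python
def encoder_trace(shape=(112, 112, 80), base_filters=64, depth=4):
    """Return (filters, spatial shape after pooling) for each encoder block."""
    trace = []
    filters = base_filters
    for level in range(depth):
        # Assumption: the convolutions are size-preserving, so only the
        # 2x2x2 max-pooling (stride 2) changes the spatial dimensions.
        if any(s % 2 for s in shape):
            raise ValueError(f"odd dimension {shape} at level {level}")
        shape = tuple(s // 2 for s in shape)
        trace.append((filters, shape))
        filters *= 2  # the number of filters doubles at each step
    return trace


for filters, shape in encoder_trace():
    print(filters, shape)
```
    </preformat>
    <p>Under this assumption a 112 × 112 × 80 patch reaches the bottleneck at 7 × 7 × 5 voxels. Both input sizes are divisible by 16, so every transposed convolution in the decoder restores exactly the shape of the matching encoder feature map and the skip connections can be concatenated without cropping.</p>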
    <p>The ensemble 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        ℳ 
      </mi> 
     </math> comprises three distinct 3D CNN architectures, chosen to provide diverse feature representations and decision boundaries. All models are configured for multi-class classification into 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <mi>
         C 
       </mi> 
       <mo>
         = 
       </mo> 
       <mn>
         3 
       </mn> 
      </mrow> 
     </math> classes (CN, MCI, AD).</p>
    <p>• 3D ResNet-18: We adopt the standard 3D ResNet-18 architecture <xref ref-type="bibr" rid="scirp.146013-18">
      [18]
     </xref>, which utilizes residual blocks with skip connections to facilitate the training of deeper networks. The model uses 3 × 3 × 3 convolutions throughout. The final fully connected layer is modified to output 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        C 
      </mi> 
     </math> logits. This model provides a strong baseline with proven performance on volumetric medical data.</p>
    <p>• 3D DenseNet-121: We utilize a 3D version of the DenseNet-121 architecture <xref ref-type="bibr" rid="scirp.146013-19">
      [19]
      </xref>. Its dense connectivity pattern, in which each layer receives the feature maps of all preceding layers within its dense block, encourages feature reuse and mitigates the vanishing gradient problem. The growth rate is set to 32. This model offers high parameter efficiency and strong gradient flow.</p>
    <p>• Custom Lightweight 3D CNN: To provide a simpler, lower-capacity perspective on the data, we include a custom-designed lightweight network. It consists of four convolutional blocks, each with a 3 × 3 × 3 convolution, ReLU, instance normalization, and a 2 × 2 × 2 max-pooling layer (filters: 64, 128, 256, 512), followed by two fully connected layers (512 and 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        C 
      </mi> 
     </math> units). This model helps prevent the ensemble from being over-reliant on highly complex models and offers computational benefits.</p>
    <p>For all classification models, self-extracted attention masks ( 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <msub> 
        <mi>
          A 
        </mi> 
        <mi>
          k 
        </mi> 
       </msub> 
      </mrow> 
     </math>) are generated using the Grad-CAM++ <xref ref-type="bibr" rid="scirp.146013-20">
      [20]
      </xref> technique, which yields more precise localization than Grad-CAM by weighting the feature maps of the last convolutional layer with the positive partial derivatives of the class score.</p>
   </sec>
   <sec id="s3_4">
    <title>
      3.4. End-to-End AD Diagnosis Pipeline</title>
    <table class="MsoTableGrid custom-table" border="0" cellspacing="0" cellpadding="0"> 
     <tr> 
      <td class="custom-bottom-td custom-top-td aleft" width="100.00%"><p style="text-align:left">Algorithm 3 FOCUS-Net: End-to-End AD Diagnosis Pipeline</p></td> 
     </tr> 
     <tr> 
      <td class="custom-top-td aleft" width="100.00%"><p style="text-align:left">Require:</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">1: Input: Raw MRI scan 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mi>
             I 
           </mi> 
           <mrow> 
            <mtext>
              raw 
            </mtext> 
           </mrow> 
          </msub> 
         </mrow> 
        </math>, Trained ensemble 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mi>
            ℳ 
          </mi> 
          <mo>
            = 
          </mo> 
          <mrow> 
           <mo>
             { 
           </mo> 
           <mrow> 
            <msub> 
             <mi>
               M 
             </mi> 
             <mn>
               1 
             </mn> 
            </msub> 
            <mo>
              , 
            </mo> 
            <msub> 
             <mi>
               M 
             </mi> 
             <mn>
               2 
             </mn> 
            </msub> 
            <mo>
              , 
            </mo> 
            <mo>
              … 
            </mo> 
            <mo>
              , 
            </mo> 
            <msub> 
             <mi>
               M 
             </mi> 
             <mi>
               K 
             </mi> 
            </msub> 
           </mrow> 
           <mo>
             } 
           </mo> 
          </mrow> 
         </mrow> 
        </math></p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">Ensure:</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">2: Output: Final predicted class 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mover accent="true"> 
          <mi>
            y 
          </mi> 
          <mo>
            ^ 
          </mo> 
         </mover> 
        </math>, final probability vector 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              P 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              final 
            </mtext> 
           </mrow> 
          </msub> 
         </mrow> 
        </math>.</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">3: procedure FULLPIPELINE ( 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mi>
             I 
           </mi> 
           <mrow> 
            <mtext>
              raw 
            </mtext> 
           </mrow> 
          </msub> 
          <mo>
            , 
          </mo> 
          <mi>
            ℳ 
          </mi> 
         </mrow> 
        </math>)</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">4: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mi>
             I 
           </mi> 
           <mrow> 
            <mtext>
              clean 
            </mtext> 
           </mrow> 
          </msub> 
          <mo>
            ← 
          </mo> 
          <mtext>
            HybridDenoise 
          </mtext> 
          <mrow> 
           <mo>
             ( 
           </mo> 
           <mrow> 
            <msub> 
             <mi>
               I 
             </mi> 
             <mrow> 
              <mtext>
                raw 
              </mtext> 
             </mrow> 
            </msub> 
           </mrow> 
           <mo>
             ) 
           </mo> 
          </mrow> 
         </mrow> 
        </math> ▷ Apply Algorithm 1</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">5: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mstyle mathvariant="bold" mathsize="normal"> 
           <mi>
             p 
           </mi> 
          </mstyle> 
          <mo>
            ← 
          </mo> 
          <mrow> 
           <mo>
             [ 
           </mo> 
           <mrow> 
            <mo> 
            </mo> 
            <mo> 
            </mo> 
           </mrow> 
           <mo>
             ] 
           </mo> 
          </mrow> 
         </mrow> 
        </math> ▷ Initialize list for probability vectors</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">6: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mstyle mathvariant="bold" mathsize="normal"> 
           <mi>
             a 
           </mi> 
          </mstyle> 
          <mo>
            ← 
          </mo> 
          <mrow> 
           <mo>
             [ 
           </mo> 
           <mrow> 
            <mo> 
            </mo> 
            <mo> 
            </mo> 
           </mrow> 
           <mo>
             ] 
           </mo> 
          </mrow> 
         </mrow> 
        </math> ▷ Initialize list for feature activations</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">7: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mstyle mathvariant="bold" mathsize="normal"> 
           <mi>
             A 
           </mi> 
          </mstyle> 
          <mo>
            ← 
          </mo> 
          <mrow> 
           <mo>
             [ 
           </mo> 
           <mrow> 
            <mo> 
            </mo> 
            <mo> 
            </mo> 
           </mrow> 
           <mo>
             ] 
           </mo> 
          </mrow> 
         </mrow> 
        </math> ▷ Initialize list for attention masks</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">8: for 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mi>
            k 
          </mi> 
          <mo>
            ← 
          </mo> 
          <mn>
            1 
          </mn> 
         </mrow> 
        </math> to 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
           K 
         </mi> 
        </math> do ▷ Loop over each model in the ensemble</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">9: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              a 
            </mi> 
           </mstyle> 
           <mi>
             k 
           </mi> 
          </msub> 
          <mo>
            ← 
          </mo> 
          <msub> 
           <mi>
             M 
           </mi> 
           <mi>
             k 
           </mi> 
          </msub> 
          <mrow> 
           <mo>
             ( 
           </mo> 
           <mrow> 
            <msub> 
             <mi>
               I 
             </mi> 
             <mrow> 
              <mtext>
                clean 
              </mtext> 
             </mrow> 
            </msub> 
           </mrow> 
           <mo>
             ) 
           </mo> 
          </mrow> 
         </mrow> 
        </math> ▷ Forward pass to get feature activations</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">10: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              p 
            </mi> 
           </mstyle> 
           <mi>
             k 
           </mi> 
          </msub> 
          <mo>
            ← 
          </mo> 
          <mtext>
            softmax 
          </mtext> 
          <mrow> 
           <mo>
             ( 
           </mo> 
           <mrow> 
            <msub> 
             <mstyle mathvariant="bold" mathsize="normal"> 
              <mi>
                a 
              </mi> 
             </mstyle> 
             <mi>
               k 
             </mi> 
            </msub> 
           </mrow> 
           <mo>
             ) 
           </mo> 
          </mrow> 
         </mrow> 
        </math> ▷ Compute class probabilities</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">11: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mi>
             A 
           </mi> 
           <mi>
             k 
           </mi> 
          </msub> 
          <mo>
            ← 
          </mo> 
          <mtext>
            get_attention_mask 
          </mtext> 
          <mrow> 
           <mo>
             ( 
           </mo> 
           <mrow> 
            <msub> 
             <mi>
               M 
             </mi> 
             <mi>
               k 
             </mi> 
            </msub> 
            <mo>
              , 
            </mo> 
            <msub> 
             <mi>
               I 
             </mi> 
             <mrow> 
              <mtext>
                clean 
              </mtext> 
             </mrow> 
            </msub> 
           </mrow> 
           <mo>
             ) 
           </mo> 
          </mrow> 
         </mrow> 
        </math> ▷ Extract spatial attention mask (e.g., Grad-CAM)</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">12: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mstyle mathvariant="bold" mathsize="normal"> 
           <mi>
             p 
           </mi> 
          </mstyle> 
          <mo>
            . 
          </mo> 
          <mtext>
            append 
          </mtext> 
          <mrow> 
           <mo>
             ( 
           </mo> 
           <mrow> 
            <msub> 
             <mstyle mathvariant="bold" mathsize="normal"> 
              <mi>
                p 
              </mi> 
             </mstyle> 
             <mi>
               k 
             </mi> 
            </msub> 
           </mrow> 
           <mo>
             ) 
           </mo> 
          </mrow> 
         </mrow> 
        </math></p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">13: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mstyle mathvariant="bold" mathsize="normal"> 
           <mi>
             a 
           </mi> 
          </mstyle> 
          <mo>
            . 
          </mo> 
          <mtext>
            append 
          </mtext> 
          <mrow> 
           <mo>
             ( 
           </mo> 
           <mrow> 
            <msub> 
             <mstyle mathvariant="bold" mathsize="normal"> 
              <mi>
                a 
              </mi> 
             </mstyle> 
             <mi>
               k 
             </mi> 
            </msub> 
           </mrow> 
           <mo>
             ) 
           </mo> 
          </mrow> 
         </mrow> 
        </math></p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">14: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mstyle mathvariant="bold" mathsize="normal"> 
           <mi>
             A 
           </mi> 
          </mstyle> 
          <mo>
            . 
          </mo> 
          <mtext>
            append 
          </mtext> 
          <mrow> 
           <mo>
             ( 
           </mo> 
           <mrow> 
            <msub> 
             <mi>
               A 
             </mi> 
             <mi>
               k 
             </mi> 
            </msub> 
           </mrow> 
           <mo>
             ) 
           </mo> 
          </mrow> 
         </mrow> 
        </math></p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">15: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msup> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              w 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              conf 
            </mtext> 
           </mrow> 
          </msup> 
          <mo>
            ← 
          </mo> 
          <mtext>
            CalculateConfidence 
          </mtext> 
          <mrow> 
           <mo>
             ( 
           </mo> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              p 
            </mi> 
           </mstyle> 
           <mo>
             ) 
           </mo> 
          </mrow> 
         </mrow> 
        </math> ▷ Alg. 2, Line 1</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">16: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msup> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              w 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              consist 
            </mtext> 
           </mrow> 
          </msup> 
          <mo>
            ← 
          </mo> 
          <mtext>
            CalculateConsistency 
          </mtext> 
          <mrow> 
           <mo>
             ( 
           </mo> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              A 
            </mi> 
           </mstyle> 
           <mo>
             ) 
           </mo> 
          </mrow> 
         </mrow> 
        </math> ▷ Alg. 2, Line 14</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">17: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mi>
            α 
          </mi> 
          <mo>
            ← 
          </mo> 
          <mi>
            λ 
          </mi> 
          <mo>
            ⋅ 
          </mo> 
          <msup> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              w 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              conf 
            </mtext> 
           </mrow> 
          </msup> 
         </mrow> 
        </math> ▷ Fuse weights using learnable parameter λ</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">18: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mi>
            α 
          </mi> 
          <mo>
            ← 
          </mo> 
          <mi>
            α 
          </mi> 
          <mo>
            + 
          </mo> 
          <mrow> 
           <mo>
             ( 
           </mo> 
           <mrow> 
            <mn>
              1 
            </mn> 
            <mo>
              − 
            </mo> 
            <mi>
              λ 
            </mi> 
           </mrow> 
           <mo>
             ) 
           </mo> 
          </mrow> 
          <mo>
            ⋅ 
          </mo> 
          <msup> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              w 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              consist 
            </mtext> 
           </mrow> 
          </msup> 
         </mrow> 
        </math></p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">19: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mi>
            α 
          </mi> 
          <mo>
            ← 
          </mo> 
          <mi>
            α 
          </mi> 
          <mo>
            / 
          </mo> 
          <mstyle displaystyle="true"> 
           <msubsup> 
            <mo>
              ∑ 
            </mo> 
            <mrow> 
             <mi>
               k 
             </mi> 
             <mo>
               = 
             </mo> 
             <mn>
               1 
             </mn> 
            </mrow> 
            <mi>
              K 
            </mi> 
           </msubsup> 
           <mrow> 
            <msub> 
             <mi>
               α 
             </mi> 
             <mi>
               k 
             </mi> 
            </msub> 
           </mrow> 
          </mstyle> 
         </mrow> 
        </math> ▷ Normalize fusion weights to sum to 1</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">20: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              P 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              final 
            </mtext> 
           </mrow> 
          </msub> 
          <mo>
            ← 
          </mo> 
          <mn>
            0 
          </mn> 
         </mrow> 
        </math> ▷ Initialize final probability vector</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">21: for 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mi>
            k 
          </mi> 
          <mo>
            ← 
          </mo> 
          <mn>
            1 
          </mn> 
         </mrow> 
        </math> to K do</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">22: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              P 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              final 
            </mtext> 
           </mrow> 
          </msub> 
          <mo>
            ← 
          </mo> 
          <msub> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              P 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              final 
            </mtext> 
           </mrow> 
          </msub> 
          <mo>
            + 
          </mo> 
          <msub> 
           <mi>
             α 
           </mi> 
           <mi>
             k 
           </mi> 
          </msub> 
          <mo>
            ⋅ 
          </mo> 
          <msub> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              p 
            </mi> 
           </mstyle> 
           <mi>
             k 
           </mi> 
          </msub> 
         </mrow> 
        </math> ▷ Accumulate weighted predictions</p></td> 
     </tr> 
     <tr> 
      <td class="aleft" width="100.00%"><p style="text-align:left">23: 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <mover accent="true"> 
           <mi>
             y 
           </mi> 
           <mo>
             ^ 
           </mo> 
          </mover> 
          <mo>
            ← 
          </mo> 
          <mi>
            arg 
          </mi> 
          <msub> 
           <mrow> 
            <mi>
              max 
            </mi> 
           </mrow> 
           <mi>
             c 
           </mi> 
          </msub> 
          <msubsup> 
           <mstyle mathsize="normal" mathvariant="bold"> 
            <mi>
              P 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              final 
            </mtext> 
           </mrow> 
           <mrow> 
            <mrow> 
             <mo>
               ( 
             </mo> 
             <mi>
               c 
             </mi> 
             <mo>
               ) 
             </mo> 
            </mrow> 
           </mrow> 
          </msubsup> 
         </mrow> 
        </math> ▷ Select class with highest probability</p></td> 
     </tr> 
     <tr> 
      <td class="custom-bottom-td aleft" width="100.00%"><p style="text-align:left">24: return 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mover accent="true"> 
          <mi>
            y 
          </mi> 
          <mo>
            ^ 
          </mo> 
         </mover> 
        </math>, 
        <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
          <msub> 
           <mstyle mathvariant="bold" mathsize="normal"> 
            <mi>
              P 
            </mi> 
           </mstyle> 
           <mrow> 
            <mtext>
              final 
            </mtext> 
           </mrow> 
          </msub> 
         </mrow> 
        </math></p></td> 
     </tr> 
    </table>
    <p>This algorithm takes a denoised image and an ensemble of trained classification models to produce a final, robust prediction.</p>
    <p>The complete FOCUS-Net framework integrates the proposed components into a unified end-to-end pipeline for Alzheimer’s Disease (AD) diagnosis. Starting with raw MRI scans, the pipeline first applies the hybrid denoising module to suppress noise while preserving anatomical integrity. The cleaned image is then passed through an ensemble of trained classification models, each producing both class probabilities and attention maps. These outputs are combined using the helper functions for confidence and consistency weighting, ensuring that models with both reliable predictions and strong semantic agreement contribute more heavily to the decision. Finally, a weighted fusion mechanism aggregates the results into a single, robust probability vector, from which the final diagnostic prediction is derived. This structured flow ensures that the pipeline leverages complementary strengths of denoising, ensemble learning, and explainable fusion for accurate and trustworthy AD detection.</p>
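     <p>The fusion steps of the algorithm above (steps 15 to 23) can be sketched in plain Python as follows. This is a minimal illustration rather than the implementation used in our experiments: the helper names <monospace>confidence_weights</monospace>, <monospace>consistency_weights</monospace>, and <monospace>fuse</monospace> are hypothetical, and the specific forms assumed here (confidence as one minus normalized entropy, consistency as mean pairwise Dice over binarized attention masks) are one plausible reading of the helper functions, assuming at least two classes.</p>
     <preformat>
```python
import math


def confidence_weights(probs):
    """Per-model confidence as 1 minus normalized Shannon entropy (assumed form)."""
    weights = []
    for p in probs:
        entropy = -sum(q * math.log(q) for q in p if q > 0)
        # Clamp guards against tiny negative values from floating-point error.
        weights.append(max(0.0, 1.0 - entropy / math.log(len(p))))
    return weights


def dice(mask_a, mask_b):
    """Dice overlap of two binary masks given as flat lists of 0/1."""
    overlap = sum(a * b for a, b in zip(mask_a, mask_b))
    size = sum(mask_a) + sum(mask_b)
    return 2.0 * overlap / size if size > 0 else 1.0


def consistency_weights(masks):
    """Per-model consistency as mean Dice against every other model (assumed form)."""
    k = len(masks)
    if k == 1:
        return [1.0]
    return [
        sum(dice(masks[i], masks[j]) for j in range(k) if j != i) / (k - 1)
        for i in range(k)
    ]


def fuse(probs, masks, lam):
    """Blend the two weight vectors with lam, normalize, and take the weighted vote."""
    w_conf = confidence_weights(probs)
    w_consist = consistency_weights(masks)
    alpha = [lam * c + (1.0 - lam) * s for c, s in zip(w_conf, w_consist)]
    total = sum(alpha)
    # Fall back to uniform weights if every raw weight is zero.
    alpha = [a / total for a in alpha] if total > 0 else [1.0 / len(alpha)] * len(alpha)
    n_classes = len(probs[0])
    p_final = [sum(a * p[c] for a, p in zip(alpha, probs)) for c in range(n_classes)]
    y_hat = max(range(n_classes), key=lambda c: p_final[c])
    return y_hat, p_final, alpha
```
     </preformat>
     <p>Here <monospace>probs</monospace> holds one softmax vector per model and <monospace>masks</monospace> one flattened binary attention mask per model. Because the fusion weights are normalized to sum to 1, the fused vector remains a valid probability distribution.</p>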
   </sec>
  </sec><sec id="s4">
   <title>
     4. Preliminary Experiments &amp; Results</title>
   <p>To validate the conceptual framework of FOCUS-Net, we conducted a series of preliminary experiments on a publicly available dataset. The primary goal was to empirically demonstrate the advantage of our hybrid confidence- and consistency-weighted fusion strategy over common baseline methods.</p>
   <sec id="s4_1">
    <title>
      4.1. Implementation Details</title>
    <p>The proposed FOCUS-Net framework was implemented in PyTorch, integrating a hybrid denoising module with an optimized classification ensemble. The implementation details are as follows:</p>
    <p>Data Curation and Preprocessing</p>
    <p>The framework was trained and evaluated on a curated subset of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. The dataset comprised 6420 T1-weighted brain MRI images across three classes: Non-Demented, Mild Demented, and Moderate Demented. All images underwent a standardized preprocessing pipeline, including skull-stripping, linear registration to the MNI152 standard space, intensity normalization, and cropping to a final size of 112 × 112 × 80 voxels to focus on the brain region and reduce computational load. To address class imbalance and improve generalization, class-balanced data augmentation techniques (including rotations, flips, and intensity variations) were employed during training.</p>
    <p>Hybrid Denoising Module</p>
    <p>The denoising pipeline synergistically combines traditional filters with a deep convolutional neural network (CNN). Traditional filters—including Wavelet, Gaussian, Anisotropic Diffusion, and Non-Local Means (NLM)—were first applied to suppress noise while preserving anatomical edges. The output from these filters was then refined by a custom 3D U-Net CNN architecture, trained to remove residual artifacts and noise in a data-driven manner. This hybrid approach leverages the structural preservation of traditional filters and the high-fidelity denoising capability of CNNs. Performance was quantified using Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Mean Squared Error (MSE).</p>
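     <p>For reference, the MSE and PSNR metrics named above can be computed as in the following sketch. It assumes intensities normalized to the range [0, 1] (hence the default <monospace>data_range</monospace> of 1.0) and volumes flattened to lists. The function names are illustrative, and SSIM is omitted because it requires windowed local statistics.</p>
     <preformat>
```python
import math


def mse(reference, estimate):
    """Mean squared error between two equally sized flat intensity lists."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of voxels")
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def psnr(reference, estimate, data_range=1.0):
    """PSNR in dB; data_range is the maximum possible intensity span (assumed 1.0)."""
    err = mse(reference, estimate)
    if err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / err)
```
     </preformat>
     <p>Higher PSNR corresponds to lower MSE relative to the intensity range, so the two metrics rank denoisers identically. SSIM adds complementary information about preserved structure.</p>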
    <p>Optimized Classification Ensemble</p>
     <p>The cleaned images were processed by an ensemble of three 3D CNN architectures: EfficientNetB0, ResNet-50, and a custom lightweight CNN. This selection was made to provide diverse feature representations and decision boundaries. The ensemble was trained with dropout regularization, early stopping, and adaptive learning rates to improve generalization and mitigate overfitting. Each model incorporated a feature attention mechanism (Grad-CAM++) to focus on the most discriminative regions in the MRI scans, such as the hippocampus and medial temporal lobe.</p>
    <p>Fusion and Training Strategy</p>
     <p>Predictions from the ensemble models were aggregated using a novel confidence- and consistency-weighted fusion algorithm, governed by a tunable weighting parameter 
      <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
         λ 
       </mi> 
      </math>, rather than simple averaging. The ensemble models were first pre-trained independently until convergence. Subsequently, with their weights frozen, the fusion parameter 
      <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
         λ 
       </mi> 
      </math> was selected by grid search on a separate validation set to balance the influence of predictive confidence (entropy) and spatial consistency (Dice similarity of attention masks).</p>
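     <p>The grid search over λ described above can be sketched as follows. The sketch assumes that the per-model confidence and consistency weights have already been computed for every validation sample with the base models frozen. The function names, the step size of 0.1, and the use of validation accuracy as the selection criterion are illustrative assumptions.</p>
     <preformat>
```python
def fused_prediction(probs, w_conf, w_consist, lam):
    """Predicted class for one sample after lam-weighted fusion."""
    alpha = [lam * c + (1.0 - lam) * s for c, s in zip(w_conf, w_consist)]
    total = sum(alpha)
    # Fall back to uniform weights if every raw weight is zero.
    alpha = [a / total for a in alpha] if total > 0 else [1.0 / len(alpha)] * len(alpha)
    n_classes = len(probs[0])
    fused = [sum(a * p[c] for a, p in zip(alpha, probs)) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: fused[c])


def grid_search_lambda(samples, labels, steps=10):
    """Sweep lam over [0, 1] and keep the value with the best validation accuracy.

    samples: list of (probs, w_conf, w_consist) tuples, one per validation image.
    Ties are resolved in favour of the smallest lam.
    """
    best_lam, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        lam = i / steps
        correct = sum(
            fused_prediction(p, c, s, lam) == y
            for (p, c, s), y in zip(samples, labels)
        )
        acc = correct / len(labels)
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    return best_lam, best_acc
```
     </preformat>
     <p>Because only a single scalar is tuned and the base models stay frozen, each candidate value requires no retraining, only a re-weighting of stored predictions.</p>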
     <p>The entire framework was trained for 100 epochs using the Adam optimizer with a learning rate of 1 × 10<sup>−4</sup> and cross-entropy loss. The implementation demonstrates that the integration of advanced denoising with an optimized, explainable ensemble creates a robust pipeline for accurate multi-class dementia diagnosis.</p>
    <p>Evaluation Metrics: Performance was evaluated on the held-out test set using Accuracy, Precision, Recall, F1-Score, and the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) for the multi-class task.</p>
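     <p>As an illustration of how the multi-class metrics can be derived from a confusion matrix, consider the sketch below. It assumes macro averaging over classes (an unweighted mean of the per-class scores), which is one common convention for a multi-class task; the function name is hypothetical. AUC-ROC is omitted because it requires the full probability scores rather than hard labels.</p>
     <preformat>
```python
def macro_metrics(y_true, y_pred, n_classes):
    """Accuracy plus macro-averaged precision, recall, and F1 from label lists."""
    # confusion[t][p] counts samples with true class t predicted as class p.
    confusion = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    precisions, recalls, f1s = [], [], []
    for c in range(n_classes):
        tp = confusion[c][c]
        predicted = sum(confusion[t][c] for t in range(n_classes))
        actual = sum(confusion[c])
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2.0 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    correct = sum(confusion[c][c] for c in range(n_classes))
    return {
        "accuracy": correct / len(y_true),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }
```
     </preformat>
     <p>Macro averaging gives each class equal weight regardless of its frequency, which matters when the classes are imbalanced.</p>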
   </sec>
   <sec id="s4_2">
    <title>
      4.2. Results and Analysis</title>
    <table-wrap id="table2">
     <label>
      <xref ref-type="table" rid="table2">
       Table 2
      </xref></label>
     <caption>
      <title>
        Table 2. Comparative performance analysis of FOCUS-Net and its components on the ADNI test set (Best results are highlighted in bold).</title>
     </caption>
     <table class="MsoTableGrid custom-table" border="0" cellspacing="0" cellpadding="0"> 
      <tr> 
       <td class="custom-bottom-td custom-top-td acenter" width="44.12%"><p style="text-align:center">Method</p></td> 
       <td class="custom-bottom-td custom-top-td acenter" width="10.29%"><p style="text-align:center">Accuracy</p></td> 
       <td class="custom-bottom-td custom-top-td acenter" width="10.29%"><p style="text-align:center">Precision</p></td> 
       <td class="custom-bottom-td custom-top-td acenter" width="7.36%"><p style="text-align:center">Recall</p></td> 
       <td class="custom-bottom-td custom-top-td acenter" width="8.83%"><p style="text-align:center">F1-Score</p></td> 
       <td class="custom-bottom-td custom-top-td acenter" width="10.29%"><p style="text-align:center">AUC-ROC</p></td> 
       <td class="custom-bottom-td custom-top-td acenter" width="8.82%"><p style="text-align:center">PSNR (dB)</p></td> 
      </tr> 
      <tr> 
       <td class="custom-top-td acenter" width="100.00%" colspan="7"><p style="text-align:center">Baselines</p></td> 
      </tr> 
      <tr> 
       <td class="aleft" width="44.12%"><p style="text-align:left">Raw Images + Single Model (ResNet-18)</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.800</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.811</p></td> 
       <td class="acenter" width="7.36%"><p style="text-align:center">0.803</p></td> 
       <td class="acenter" width="8.83%"><p style="text-align:center">0.805</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.933</p></td> 
       <td class="acenter" width="8.82%"><p style="text-align:center">18.2</p></td> 
      </tr> 
      <tr> 
       <td class="aleft" width="44.12%"><p style="text-align:left">Raw Images + Averaging Ensemble</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.844</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.847</p></td> 
       <td class="acenter" width="7.36%"><p style="text-align:center">0.849</p></td> 
       <td class="acenter" width="8.83%"><p style="text-align:center">0.846</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.951</p></td> 
       <td class="acenter" width="8.82%"><p style="text-align:center">18.2</p></td> 
      </tr> 
      <tr> 
       <td class="aleft" width="44.12%"><p style="text-align:left">Wavelet Filter Only + Ensemble</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.851</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.853</p></td> 
       <td class="acenter" width="7.36%"><p style="text-align:center">0.855</p></td> 
       <td class="acenter" width="8.83%"><p style="text-align:center">0.852</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.954</p></td> 
       <td class="acenter" width="8.82%"><p style="text-align:center">26.5</p></td> 
      </tr> 
      <tr> 
       <td class="aleft" width="44.12%"><p style="text-align:left">NLM Filter Only + Ensemble</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.849</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.851</p></td> 
       <td class="acenter" width="7.36%"><p style="text-align:center">0.848</p></td> 
       <td class="acenter" width="8.83%"><p style="text-align:center">0.849</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.953</p></td> 
       <td class="acenter" width="8.82%"><p style="text-align:center">25.8</p></td> 
      </tr> 
      <tr> 
       <td class="custom-bottom-td aleft" width="44.12%"><p style="text-align:left">CNN Denoiser Only + Ensemble</p></td> 
       <td class="custom-bottom-td acenter" width="10.29%"><p style="text-align:center">0.863</p></td> 
       <td class="custom-bottom-td acenter" width="10.29%"><p style="text-align:center">0.865</p></td> 
       <td class="custom-bottom-td acenter" width="7.36%"><p style="text-align:center">0.861</p></td> 
       <td class="custom-bottom-td acenter" width="8.83%"><p style="text-align:center">0.862</p></td> 
       <td class="custom-bottom-td acenter" width="10.29%"><p style="text-align:center">0.960</p></td> 
       <td class="custom-bottom-td acenter" width="8.82%"><p style="text-align:center">28.1</p></td> 
      </tr> 
      <tr> 
       <td class="custom-top-td acenter" width="100.00%" colspan="7"><p style="text-align:center">Ablation Study</p></td> 
      </tr> 
      <tr> 
       <td class="aleft" width="44.12%"><p style="text-align:left">FOCUS-Denoise + Averaging Ensemble</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.872</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.874</p></td> 
       <td class="acenter" width="7.36%"><p style="text-align:center">0.870</p></td> 
       <td class="acenter" width="8.83%"><p style="text-align:center">0.871</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.965</p></td> 
       <td class="acenter" width="8.82%"><p style="text-align:center">30.4</p></td> 
      </tr> 
      <tr> 
       <td class="aleft" width="44.12%"><p style="text-align:left">Raw Images + Confidence-Only Fusion ( 
         <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
           <mi>
             λ 
           </mi> 
           <mo>
             = 
           </mo> 
           <mn>
             1 
           </mn> 
          </mrow> 
         </math>)</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.867</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.869</p></td> 
       <td class="acenter" width="7.36%"><p style="text-align:center">0.866</p></td> 
       <td class="acenter" width="8.83%"><p style="text-align:center">0.867</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.962</p></td> 
       <td class="acenter" width="8.82%"><p style="text-align:center">18.2</p></td> 
      </tr> 
      <tr> 
       <td class="aleft" width="44.12%"><p style="text-align:left">Raw Images + Consistency-Only Fusion ( 
         <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
           <mi>
             λ 
           </mi> 
           <mo>
             = 
           </mo> 
           <mn>
             0 
           </mn> 
          </mrow> 
         </math>)</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.844</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.858</p></td> 
       <td class="acenter" width="7.36%"><p style="text-align:center">0.847</p></td> 
       <td class="acenter" width="8.83%"><p style="text-align:center">0.850</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.953</p></td> 
       <td class="acenter" width="8.82%"><p style="text-align:center">18.2</p></td> 
      </tr> 
      <tr> 
       <td class="aleft" width="44.12%"><p style="text-align:left">FOCUS-Denoise + Confidence-Only Fusion ( 
         <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
           <mi>
             λ 
           </mi> 
           <mo>
             = 
           </mo> 
           <mn>
             1 
           </mn> 
          </mrow> 
         </math>)</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.882</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.884</p></td> 
       <td class="acenter" width="7.36%"><p style="text-align:center">0.880</p></td> 
       <td class="acenter" width="8.83%"><p style="text-align:center">0.881</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.970</p></td> 
       <td class="acenter" width="8.82%"><p style="text-align:center">30.4</p></td> 
      </tr> 
      <tr> 
       <td class="aleft" width="44.12%"><p style="text-align:left">FOCUS-Denoise + Consistency-Only Fusion ( 
         <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
           <mi>
             λ 
           </mi> 
           <mo>
             = 
           </mo> 
           <mn>
             0 
           </mn> 
          </mrow> 
         </math>)</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.875</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.877</p></td> 
       <td class="acenter" width="7.36%"><p style="text-align:center">0.873</p></td> 
       <td class="acenter" width="8.83%"><p style="text-align:center">0.874</p></td> 
       <td class="acenter" width="10.29%"><p style="text-align:center">0.967</p></td> 
       <td class="acenter" width="8.82%"><p style="text-align:center">30.4</p></td> 
      </tr> 
      <tr> 
       <td class="acenter" width="100.00%" colspan="7"><p style="text-align:center">Proposed Method</p></td> 
      </tr> 
      <tr> 
       <td class="custom-bottom-td aleft" width="44.12%"><p style="text-align:left">FOCUS-Net (Ours, 
         <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
           <mi>
             λ 
           </mi> 
           <mo>
             = 
           </mo> 
           <mn>
             0.7 
           </mn> 
          </mrow> 
         </math>)</p></td> 
       <td class="custom-bottom-td acenter" width="10.29%"><p style="text-align:center">0.889</p></td> 
       <td class="custom-bottom-td acenter" width="10.29%"><p style="text-align:center">0.891</p></td> 
       <td class="custom-bottom-td acenter" width="7.36%"><p style="text-align:center">0.888</p></td> 
       <td class="custom-bottom-td acenter" width="8.83%"><p style="text-align:center">0.889</p></td> 
       <td class="custom-bottom-td acenter" width="10.29%"><p style="text-align:center">0.975</p></td> 
       <td class="custom-bottom-td acenter" width="8.82%"><p style="text-align:center">30.4</p></td> 
      </tr> 
     </table>
    </table-wrap>
    <fig id="fig1" position="float">
     <label>Figure 1</label>
     <caption>
      <title>
        Figure 1. Performance comparison bar plot.</title>
     </caption>
     <graphic mimetype="image" position="float" xlink:type="simple" xlink:href="https://html.scirp.org/file/2440287-rId175.jpeg?20250930102415" />
    </fig>
    <fig id="fig2" position="float">
     <label>Figure 2</label>
     <caption>
      <title>
        Figure 2. Relationship between denoising quality and diagnostic performance.</title>
     </caption>
     <graphic mimetype="image" position="float" xlink:type="simple" xlink:href="https://html.scirp.org/file/2440287-rId176.jpeg?20250930102415" />
    </fig>
    <fig id="fig3" position="float">
     <label>Figure 3</label>
     <caption>
      <title>
        Figure 3. ROC curves for key models (multi-class).</title>
     </caption>
     <graphic mimetype="image" position="float" xlink:type="simple" xlink:href="https://html.scirp.org/file/2440287-rId177.jpeg?20250930102415" />
    </fig>
     <p>The results presented in <xref ref-type="table" rid="table2">
       Table 2
      </xref> and <xref ref-type="fig" rid="fig1 fig2 fig3 fig4">
       Figures 1-4
      </xref> show a progressive improvement as each component of the FOCUS-Net framework is added. Starting from a single ResNet-18 model on raw images (Accuracy: 0.800), an averaging ensemble raises accuracy to 0.844, consistent with the value of model diversity. Denoising improves performance further: the standalone CNN denoiser (Accuracy: 0.863) outperforms the traditional Wavelet (Accuracy: 0.851) and NLM (Accuracy: 0.849) filters, suggesting an advantage for learned denoising approaches. The proposed hybrid FOCUS-Denoise module, combining traditional filters with a CNN, achieved the highest PSNR (30.4 dB) and, when paired with a simple averaging ensemble, yielded an accuracy of 0.872. This underscores the role of high-quality input data. The novel fusion strategy provides additional gains: applied to FOCUS-Denoise outputs, the Confidence-Only weighting ( 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <mi>
         λ 
       </mi> 
       <mo>
         = 
       </mo> 
       <mn>
         1 
       </mn> 
      </mrow> 
     </math>) leverages predictive uncertainty to reach an accuracy of 0.882. Finally, the complete FOCUS-Net framework, which balances confidence and consistency ( 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <mi>
         λ 
       </mi> 
       <mo>
         = 
       </mo> 
       <mn>
         0.7 
       </mn> 
      </mrow> 
     </math>), achieves the highest performance across all metrics (Accuracy: 0.889, AUC: 0.975). The positive correlation between denoising quality (PSNR) and diagnostic accuracy, visualized in <xref ref-type="fig" rid="fig2">
      Figure 2
     </xref>, further supports our approach. The ROC curves (<xref ref-type="fig" rid="fig3">
      Figure 3
     </xref>) show that the FOCUS-Net curve lies closest to the top-left corner across classification thresholds, indicating the strong discriminative power required for clinical application. The lambda optimization plot (<xref ref-type="fig" rid="fig4">
      Figure 4
     </xref>) justifies our parameter selection, showing a clear peak in performance at 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <mi>
         λ 
       </mi> 
       <mo>
         = 
       </mo> 
       <mn>
         0.7 
       </mn> 
      </mrow> 
     </math>. These results collectively affirm that the synergistic integration of hybrid denoising with intelligent, explainable model fusion creates a robust pipeline for accurate dementia diagnosis.</p>
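    <p>PSNR is used above as the measure of denoising quality. The following is a minimal sketch, not the authors' implementation, of how PSNR can be computed and how it changes after a single classical filtering stage. It assumes a synthetic 2D test image with intensities in [0, 1] and uses a plain NumPy version of Perona-Malik anisotropic diffusion as a stand-in denoiser. The function names and the parameters n_iter, kappa and step are illustrative assumptions.</p>
    <preformat preformat-type="code" xml:space="preserve">
```python
import numpy as np


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


def anisotropic_diffusion(image, n_iter=20, kappa=0.2, step=0.2):
    """Perona-Malik diffusion on a 2D array (illustrative stand-in denoiser)."""
    u = np.asarray(image, dtype=np.float64).copy()
    for _ in range(n_iter):
        # Differences to the four nearest neighbours; edge padding gives
        # zero flux across the image border.
        p = np.pad(u, 1, mode="edge")
        diffs = (
            p[:-2, 1:-1] - u,  # north
            p[2:, 1:-1] - u,   # south
            p[1:-1, :-2] - u,  # west
            p[1:-1, 2:] - u,   # east
        )
        # Exponential conductance: near 1 in flat regions, near 0 across
        # strong edges, so edges are smoothed far less than noise.
        u += step * sum(np.exp(-(d / kappa) ** 2) * d for d in diffs)
    return u


# Synthetic example (assumed data): a vertical step edge plus Gaussian noise.
rng = np.random.default_rng(0)
clean = np.zeros((64, 64))
clean[:, 32:] = 1.0
noisy = clean + rng.normal(0.0, 0.05, clean.shape)
smoothed = anisotropic_diffusion(noisy)

print(f"PSNR before denoising: {psnr(clean, noisy):.1f} dB")
print(f"PSNR after denoising:  {psnr(clean, smoothed):.1f} dB")
```
    </preformat>
    <p>The step size is kept at or below 0.25 so that the explicit four-neighbour update remains stable. A real pipeline would operate on 3D volumes and would need a reference image, or a simulated noise model, to report PSNR.</p>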
    <fig id="fig4" position="float">
     <label>Figure 4</label>
     <caption>
      <title>
       Figure 4. Effect of the lambda parameter on FOCUS-Net performance.</title>
     </caption>
     <graphic mimetype="image" position="float" xlink:type="simple" xlink:href="https://html.scirp.org/file/2440287-rId184.jpeg?20250930102415" />
    </fig>
   </sec>
  </sec><sec id="s5">
   <title>
    5. Discussion and Conclusion</title>
   <p>This paper introduced FOCUS-Net, a novel end-to-end framework for robust Alzheimer’s Disease classification from MRI data. The proposed architecture synergistically integrates a hybrid denoising module with a confidence- and consistency-weighted ensemble fusion strategy, addressing two critical challenges in medical image analysis: data quality and model reliability.</p>
   <sec id="s5_1">
    <title>
     5.1. Summary of Contributions</title>
    <p>The key contributions of this work are threefold:</p>
    <p>1) Hybrid Denoising for Enhanced Data Quality: We proposed a sequential denoising pipeline (Algorithm 3.1) that synergistically combines the strengths of multiple classical filters—Wavelet, Gaussian, Anisotropic Diffusion, and Non-Local Means (NLM)—with a learned component (a 3D U-Net CNN) for removing residual artifacts. Each traditional filter targets specific noise characteristics: Wavelet filtering excels in frequency-based noise separation, Gaussian smoothing reduces high-frequency noise, Anisotropic Diffusion preserves edges while suppressing noise, and NLM leverages non-local self-similarity. The subsequent CNN component is specifically trained to address the structured residual artifacts and subtle noise patterns that persist after this initial filtering stage. This preprocessing step is crucial for real-world clinical applicability, where MRI scans are often corrupted by complex, mixed-type noise and artifacts, yet comprehensive denoising is frequently overlooked in deep learning pipelines.</p>
    <p>2) Novel Confidence-Consistency Fusion Strategy: The core intellectual contribution lies in our fusion algorithm (Algorithms 3.4 and 3.2). It advances beyond simple averaging by dynamically weighting predictions from an ensemble of models based on two complementary signals of reliability:</p>
    <p>- Predictive Confidence: Quantified by the Shannon entropy of the prediction distribution, this metric prioritizes models that are certain in their classifications ( 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <msub> 
        <mi>
          E 
        </mi> 
        <mi>
          k 
        </mi> 
       </msub> 
       <mo>
         ≈ 
       </mo> 
       <mn>
         0 
       </mn> 
      </mrow> 
     </math>), assigning them a higher weight 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <msubsup> 
        <mstyle mathvariant="bold" mathsize="normal"> 
         <mi>
           w 
         </mi> 
        </mstyle> 
        <mi>
          k 
        </mi> 
        <mrow> 
         <mtext>
           conf 
         </mtext> 
        </mrow> 
       </msubsup> 
      </mrow> 
     </math>.</p>
    <p>- Spatial Consistency: Measured by the average Dice similarity between a model’s Grad-CAM attention mask and those of the other ensemble members, this metric promotes models that focus on anatomically plausible regions agreed upon by the collective, thereby mitigating the influence of outliers that are confidently wrong. Models with high agreement receive a higher weight 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mrow> 
       <msubsup> 
        <mstyle mathvariant="bold" mathsize="normal"> 
         <mi>
           w 
         </mi> 
        </mstyle> 
        <mi>
          k 
        </mi> 
        <mrow> 
         <mtext>
           consist 
         </mtext> 
        </mrow> 
       </msubsup> 
      </mrow> 
     </math>.</p>
    <p>The mixing parameter 
     <math xmlns="http://www.w3.org/1998/Math/MathML"> <mi>
        λ 
      </mi> 
     </math> balances these two signals; it is treated as a hyperparameter whose value is selected empirically on a validation set.</p>
    <p>3) Improved Interpretability: The aggregated attention map A provides a visual explanation for the model’s decision, highlighting the brain regions deemed most salient by the consensus of the ensemble. This capability is paramount for building trust with clinicians and facilitating the integration of AI tools into diagnostic workflows.</p>
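    <p>The confidence-consistency weighting summarized in the contributions above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of the paper's algorithms: confidence is taken as one minus the normalized Shannon entropy, consistency as the mean Dice overlap with the other models' binary attention masks, both scores are normalized to sum to one, and the final weight is λ times the confidence weight plus (1 − λ) times the consistency weight (so λ = 1 corresponds to Confidence-Only weighting). The aggregated attention map is assumed to be the weighted sum of the individual maps. All function names and the synthetic inputs are hypothetical.</p>
    <preformat preformat-type="code" xml:space="preserve">
```python
import numpy as np


def confidence_weights(probs):
    """Normalized weights from prediction entropy (low entropy, high weight)."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), 1e-12, 1.0)  # (K, C)
    entropy = -np.sum(probs * np.log(probs), axis=1)                  # E_k
    score = np.clip(1.0 - entropy / np.log(probs.shape[1]), 0.0, None)
    total = score.sum()
    return score / total if total > 0 else np.full(len(score), 1.0 / len(score))


def dice(a, b):
    """Dice similarity of two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom


def consistency_weights(masks):
    """Normalized weights from each mask's mean Dice overlap with the others."""
    k = len(masks)  # requires at least two models
    score = np.array([
        np.mean([dice(masks[i], masks[j]) for j in range(k) if j != i])
        for i in range(k)
    ])
    total = score.sum()
    return score / total if total > 0 else np.full(k, 1.0 / k)


def fuse(probs, masks, lam=0.7):
    """Return per-model weights and the weighted class distribution."""
    weights = lam * confidence_weights(probs) + (1.0 - lam) * consistency_weights(masks)
    return weights, weights @ np.asarray(probs, dtype=np.float64)


def select_lambda(val_probs, val_masks, val_labels, grid=None):
    """Grid-search lambda by validation accuracy (ties keep the first candidate)."""
    grid = np.linspace(0.0, 1.0, 11) if grid is None else grid
    labels = np.asarray(val_labels)

    def accuracy(lam):
        preds = [int(np.argmax(fuse(p, m, lam)[1])) for p, m in zip(val_probs, val_masks)]
        return float(np.mean(np.array(preds) == labels))

    return float(max(grid, key=accuracy))


# Synthetic example (assumed data): models 1 and 2 agree on class 0 and attend
# to overlapping regions; model 3 is confident in class 2 and attends elsewhere.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.1, 0.1, 0.8]])
masks = np.zeros((3, 8, 8), dtype=bool)
masks[0, 2:6, 2:6] = True
masks[1, 2:6, 3:7] = True
masks[2, 0:2, 0:2] = True

for lam in (1.0, 0.7):
    weights, fused = fuse(probs, masks, lam)
    print(lam, np.round(weights, 3), "predicted class:", int(np.argmax(fused)))

# Aggregated attention map: weighted sum of the individual maps.
weights, _ = fuse(probs, masks, lam=0.7)
attention = np.tensordot(weights, masks.astype(np.float64), axes=1)
```
    </preformat>
    <p>With these synthetic inputs, λ = 1 lets the confident third model dominate and the fused prediction is class 2, whereas λ = 0.7 shifts weight toward the two models whose masks overlap and the fused prediction becomes class 0. The same weights place most of the aggregated attention on the region shared by the first two models.</p>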
   </sec>
   <sec id="s5_2">
    <title>
     5.2. Limitations and Future Work</title>
    <p>Despite the promising results demonstrated by FOCUS-Net, several limitations should be acknowledged alongside potential directions for future research.</p>
    <p>Computational Overhead: The generation of high-quality attention maps (e.g., via Grad-CAM) for each input sample across the entire ensemble introduces significant computational overhead during inference. This increased latency may present a practical constraint for real-time clinical applications where rapid analysis is paramount. Future work will investigate more efficient attention mechanisms and model distillation techniques to mitigate this computational burden.</p>
    <p>Assumption of Meaningful Consensus: Our fusion strategy relies on the fundamental assumption that spatial consensus among ensemble members corresponds to medically relevant features. However, there exists a potential risk that models could converge on consistent but erroneous salient regions, such as imaging artifacts or dataset-specific biases. The framework’s effectiveness is therefore contingent upon the validity and robustness of the individual models’ learned features. Future validation should include rigorous qualitative assessment by clinical experts to verify the anatomical and pathological relevance of the highlighted regions.</p>
    <p>Data Dependency: The performance of both the hybrid denoising module and the classification ensemble is inherently dependent on the quality and characteristics of the training data. The generalizability of our approach to MRI data acquired with different scanners, protocols, or from diverse patient populations requires further extensive validation. Subsequent research will explore advanced data augmentation, domain adaptation, and federated learning techniques to enhance the framework’s robustness across heterogeneous clinical settings.</p>
    <p>Addressing these limitations will be crucial for advancing the practical deployment and reliability of FOCUS-Net in real-world clinical environments.</p>
    <p>Our immediate future work will focus on the comprehensive empirical validation of FOCUS-Net. This will involve:</p>
    <p>• Large-Scale Validation: Rigorous testing on large, multi-center datasets like ADNI and AIBL to quantitatively benchmark performance against state-of-the-art baselines.</p>
    <p>• Ablation Studies: Systematically dissecting the framework to quantify the individual contribution of each component (denoising, confidence weighting, consistency weighting) to the overall performance gain.</p>
    <p>• Architectural Exploration: Investigating alternative attention mechanisms (e.g., self-attention, transformers) and different ensemble architectures to further boost performance and efficiency.</p>
    <p>• Multi-Modal Extension: A highly promising direction is to extend the fusion logic to incorporate multi-modal data, such as PET scans and CSF biomarkers. The confidence-consistency weighting principle could be adapted to balance the contributions of different data types within a unified diagnostic framework.</p>
   </sec>
   <sec id="s5_3">
    <title>
     5.3. Conclusion</title>
    <p>In conclusion, FOCUS-Net presents a holistic and principled approach to automated AD diagnosis. By addressing data integrity through hybrid denoising and enhancing decision robustness through a novel fusion of confidence and spatial consensus, the framework moves beyond mere prediction accuracy towards developing a more reliable, interpretable, and ultimately clinically valuable tool for combating neurodegenerative disease.</p>
   </sec>
  </sec>
 </body><back>
  <ref-list>
   <title>References</title>
   <ref id="scirp.146013-ref1">
    <label>1</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Gan, C., You, J., Zhang, Y., Li, X., Huang, J. and Wang, K. (2024) The Prevalence and Incidence of Dementia: A Systematic Review and Meta-Analysis. Neurology Asia, 29, 15-30.
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref2">
    <label>2</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Nichols, E., Steinmetz, J.D., Vollset, S.E., Fukutaki, K., Chalek, J., Abd-Allah, F., Abdoli, A., Abualhasan, A., Abu-Gharbieh, E., et al. (2024) Estimation of the Global Prevalence of Dementia in 2019 and Forecasted Prevalence in 2050: An Analysis for the Global Burden of Disease Study 2019. The Lancet Public Health, 9, e105-e125.
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref3">
    <label>3</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Ou, Z., Liu, Y., Wang, Y., Tang, J., Lv, J., Zhang, H., et al. (2024) Global, Regional, and National Burden of Alzheimer’s Disease and Other Dementias, 1990-2021. Age and Ageing, 53, afae023.
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref4">
    <label>4</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Jack, C.R., Bernstein, M.A., Fox, N.C., Thompson, P., Alexander, G., Harvey, D., et al. (2010) The Alzheimer’s Disease Neuroimaging Initiative (ADNI): MRI Methods. Journal of Magnetic Resonance Imaging, 31, 685-691.
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref5">
    <label>5</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Litjens, G., Kooi, T., Bejnordi, B.E., Setio, A.A.A., Ciompi, F., Ghafoorian, M., et al. (2017) A Survey on Deep Learning in Medical Image Analysis. Medical Image Analysis, 42, 60-88. https://doi.org/10.1016/j.media.2017.07.005
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref6">
    <label>6</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Islam, J. and Zhang, Y. (2020) Brain MRI Analysis for Alzheimer’s Disease Diagnosis Using CNN-Based Deep Learning Methods. Knowledge-Based Systems, 5, 2. https://doi.org/10.1186/s40708-018-0080-3
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref7">
    <label>7</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Korolev, S., Safiullin, A., Belyaev, M. and Dodonova, Y. (2017) Residual and Plain Convolutional Neural Networks for 3D Brain MRI Classification. 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, 18-21 April 2017, 835-838. https://doi.org/10.1109/isbi.2017.7950647
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref8">
    <label>8</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Ocen, S., Muchemi, L. and Yohannis, M.A. (2025) Optimized CNN Ensemble with Class-Balanced MRI Data Augmentation for Accurate Multi-Class Dementia Diagnosis. Advances in Alzheimer’s Disease, 14, 53-76. https://doi.org/10.4236/aad.2025.143004
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref9">
    <label>9</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Ocen, S., Yohannis, M.A. and Muchemi, L. (2024) Deep Learning for Neuroimaging-Based Brain Disorder Detection: Advancements and Future Perspectives. Advances in Alzheimer’s Disease, 13, 95-116. https://doi.org/10.4236/aad.2024.134007
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref10">
    <label>10</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Buades, A., Coll, B. and Morel, J.-M. (2005) A Non-Local Algorithm for Image Denoising. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, 20-25 June 2005, 60-65. 
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref11">
    <label>11</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Perona, P. and Malik, J. (1990) Scale-Space and Edge Detection Using Anisotropic Diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 629-639. https://doi.org/10.1109/34.56205
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref12">
    <label>12</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Ronneberger, O., Fischer, P. and Brox, T. (2015) U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab, N., Hornegger, J., Wells, W., Frangi, A., Eds., Lecture Notes in Computer Science, Springer International Publishing, 234-241. https://doi.org/10.1007/978-3-319-24574-4_28
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref13">
    <label>13</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D. and Batra, D. (2017) Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, 22-29 October 2017, 618-626. https://doi.org/10.1109/iccv.2017.74
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref14">
    <label>14</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Breiman, L. (1996) Bagging Predictors. Machine Learning, 24, 123-140. https://doi.org/10.1023/a:1018054314350
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref15">
    <label>15</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Shafer, G. and Vovk, V. (2008) A Tutorial on Conformal Prediction. Journal of Machine Learning Research, 9, 371-421.
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref16">
    <label>16</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Kendall, A. and Gal, Y. (2017) What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS'17), Long Beach, 4-9 December 2017, 5580-5590. https://proceedings.neurips.cc/paper_files/paper/2017/file/2650d6089a6d640c5e85b2b88265dc2b-Paper.pdf
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref17">
    <label>17</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Ocen, S., Muchemi, L. and Yohannis, M.A. (2025) Enhancing MRI Image Quality through Deep CNN-Augmented Denoising: A Comparative Study of Standard and Hybrid Filters. Neuroscience and Medicine, 16, 114-141. https://doi.org/10.4236/nm.2025.163013
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref18">
    <label>18</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Ebrahimi, A., Luo, S. and Chiong, R. (2020) Introducing Transfer Learning to 3D Resnet-18 for Alzheimer’s Disease Detection on MRI Images. 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ), Wellington, 25-27 November 2020, 1-6. https://doi.org/10.1109/ivcnz51579.2020.9290616
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref19">
    <label>19</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Solano-Rojas, B., Villalón-Fonseca, R. and Marín-Raventós, G. (2020) Alzheimer’s Disease Early Detection Using a Low Cost Three-Dimensional Densenet-121 Architecture. In: Jmaiel, M., Mokhtari, M., Abdulrazak, B., Aloulou, H., Kallel, S., Eds., Lecture Notes in Computer Science, Springer International Publishing, 3-15. https://doi.org/10.1007/978-3-030-51517-1_1
    </mixed-citation>
   </ref>
   <ref id="scirp.146013-ref20">
    <label>20</label>
    <mixed-citation publication-type="other" xlink:type="simple">
     Lyu, L. (2025) Interpretability in Neural Information Retrieval. https://doi.org/10.4233/uuid:fbce75ab-4dca-432e-9388-475993c60105
    </mixed-citation>
   </ref>
  </ref-list>
 </back>
</article>