Evaluating Motion Artifact Removal in fNIRS: A Comprehensive Guide to Metrics and Methodologies

Jackson Simmons Dec 02, 2025

Abstract

This article provides a systematic framework for evaluating motion artifact (MA) correction techniques in functional near-infrared spectroscopy (fNIRS). Aimed at researchers and professionals, it synthesizes current knowledge on MA characteristics, categorizes hardware and algorithmic removal strategies, and details key performance metrics for quantitative comparison. The content guides the selection of appropriate evaluation protocols, from foundational concepts to advanced validation, emphasizing robust and physiologically plausible assessment to enhance data quality in neuroscientific and clinical fNIRS applications.

Understanding Motion Artifacts: Origins, Types, and Impact on fNIRS Signal Quality

In functional near-infrared spectroscopy (fNIRS), the accurate interpretation of neurovascular data is fundamentally challenged by motion artifacts—extraneous signals that corrupt the true hemodynamic response. These artifacts originate from two primary sources: physical optode decoupling, which causes direct measurement disruption, and motion-induced systemic physiological noise, which introduces biologically-based confounding signals [1]. Understanding this distinction is critical for selecting appropriate correction strategies, as methods effective for one artifact type may perform poorly for the other. This guide systematically compares motion artifact correction techniques, providing experimental data and protocols to inform method selection for research and clinical applications.

Classification and Mechanisms of Motion Artifacts

Motion artifacts in fNIRS signals manifest in distinct morphological patterns, each with characteristic origins and properties. The table below categorizes the primary artifact types and their underlying mechanisms.

Table 1: Classification and Characteristics of Motion Artifacts in fNIRS

| Artifact Type | Primary Cause | Key Characteristics | Impact on Signal |
| --- | --- | --- | --- |
| Spikes (Type A) | Sudden optode-skin decoupling from head jerks, speech | High amplitude, high frequency, short duration (≤1 s) [2] | Sharp, transient signal deviation >50 SD from mean [2] |
| Peaks (Type B) | Sustained moderate movement | Moderate amplitude, medium duration (1-5 s) [2] | Protracted deviation ~100 SD from mean [2] |
| Baseline Shifts | Slow optode displacement | Low frequency, long duration (5-30 s) [2] | Signal drift ~300 SD from mean; slow recovery [2] |
| Low-Frequency Variations | Motion-induced systemic physiology (BP, HR changes) | Very slow oscillations (<0.1 Hz) correlated with hemodynamic response [3] [4] | Mimics true hemodynamic response; task-synchronized [3] |
| Slow Baseline Shifts (Type D) | Major postural changes or prolonged decoupling | Very long duration (>30 s), extreme amplitude [2] | Severe baseline disruption ~500 SD from mean [2] |
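
These amplitude- and duration-based categories lend themselves to simple threshold detection. The sketch below (Python/NumPy) flags candidate artifact windows using a robust estimate of sample-to-sample variability plus a peak-to-peak amplitude check; the window length and both thresholds are illustrative assumptions rather than values taken from the cited studies, and established pipelines use their own detection parameterizations.

```python
import numpy as np

def detect_motion_artifacts(signal, fs, win_s=1.0, std_thresh=5.0, amp_thresh=0.4):
    """Flag windows whose local variability or peak-to-peak amplitude exceeds
    multiples of a robust baseline estimate (illustrative thresholds only)."""
    win = max(int(win_s * fs), 2)
    diffs = np.diff(signal)
    baseline = 1.4826 * np.median(np.abs(diffs - np.median(diffs)))  # MAD-based scale
    flags = np.zeros(len(signal), dtype=bool)
    for start in range(0, len(signal) - win, win // 2):              # 50% overlap
        seg = signal[start:start + win]
        if np.std(np.diff(seg)) > std_thresh * baseline or np.ptp(seg) > amp_thresh:
            flags[start:start + win] = True
    return flags

# Example: 10 s of synthetic data at 10 Hz with an injected spike at t = 5 s
fs = 10.0
t = np.arange(0, 10, 1 / fs)
sig = 0.01 * np.sin(2 * np.pi * 0.1 * t) + 0.001 * np.random.randn(t.size)
sig[50:53] += 0.5                                                    # spike artifact
print(np.where(detect_motion_artifacts(sig, fs))[0])
```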

Anatomical and Regional Vulnerability

The susceptibility to motion artifacts varies significantly across scalp regions. Research combining computer vision with fNIRS has demonstrated that:

  • The occipital and pre-occipital regions are particularly vulnerable to upward and downward head movements [5]
  • Temporal regions show heightened sensitivity to lateral movements (bend left, bend right, left, and right movements) [5]
  • Frontal regions are susceptible to artifacts from facial movements, including eyebrow raising and jaw motion during speech [3] [6]

This regional variability underscores the importance of considering both movement type and scalp location when designing experiments and implementing correction protocols.

Comparative Analysis of Motion Correction Techniques

Software-Based Correction Algorithms

Multiple algorithmic approaches have been developed to address motion artifacts in fNIRS data. The table below summarizes the performance characteristics of predominant methods based on comparative studies.

Table 2: Performance Comparison of Software-Based Motion Correction Techniques

| Correction Method | Underlying Principle | Best For Artifact Type | Efficacy (Real Cognitive Data) | Efficacy (Pediatric Data) | Key Limitations |
| --- | --- | --- | --- | --- | --- |
| Wavelet Filtering [3] | Multi-scale decomposition & thresholding | Spikes, baseline shifts [6] | 93% reduction in artifact area [3] | Superior outcomes [2] | Computationally intensive [7] |
| Moving Average [2] | Local smoothing | Spikes, gentle slopes | Not specifically reported | Superior outcomes [2] | May oversmooth valid signal |
| Spline Interpolation [3] [6] | Piecewise polynomial fitting | Spikes, baseline shifts [7] | Effective but less than wavelet [3] | Moderate outcomes [2] | Requires accurate artifact identification [7] |
| PCA-Based Methods [3] | Component separation & rejection | Global physiological noise [8] | Less effective for task-correlated artifacts [3] | Moderate outcomes [2] | Risk of cerebral signal removal |
| CBSI [3] | HbO/HbR anti-correlation | Low-frequency artifacts | Effective for specific artifact types [3] | Less effective [2] | Assumes specific HbO/HbR relationship |
| Kalman Filtering [3] | Recursive estimation | Slowly varying artifacts | Less effective than wavelet [3] | Not top performer [2] | Complex parameter tuning |
| tCCA-GLM [1] | Multimodal correlation | Systemic physiological noise | +45% correlation, -55% RMSE [1] | Not assessed | Requires multiple auxiliary signals |
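
As a rough illustration of the wavelet approach summarized above, the following Python sketch (using PyWavelets) shrinks outlying detail coefficients before reconstructing the signal. It is not the Homer2/Homer3 or any published wavelet-correction implementation; the wavelet basis, decomposition level, and IQR-based threshold are assumptions chosen for clarity.

```python
import numpy as np
import pywt

def wavelet_despike(signal, wavelet="db4", level=4, iqr_factor=1.5):
    """Suppress outlying wavelet detail coefficients (a rough stand-in for
    wavelet-based motion correction; parameters are illustrative)."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    cleaned = [coeffs[0]]                          # keep approximation coefficients
    for d in coeffs[1:]:
        q1, q3 = np.percentile(d, [25, 75])
        lo, hi = q1 - iqr_factor * (q3 - q1), q3 + iqr_factor * (q3 - q1)
        cleaned.append(np.clip(d, lo, hi))         # shrink outliers instead of zeroing
    return pywt.waverec(cleaned, wavelet)[: len(signal)]

# Usage: despike a contaminated channel sampled at 10 Hz
fs = 10.0
t = np.arange(0, 30, 1 / fs)
raw = 0.02 * np.sin(2 * np.pi * 0.05 * t) + 0.002 * np.random.randn(t.size)
raw[120:125] += 0.6                                # simulated spike artifact
corrected = wavelet_despike(raw)
```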

Hardware-Based and Hybrid Approaches

Hardware-based solutions incorporate additional sensors to directly measure motion for subsequent regression:

  • Accelerometer-Based Methods: Includes Active Noise Cancellation (ANC) and Accelerometer-Based Motion Artifact Removal (ABAMAR) [6]
  • Short-Separation Channels: Placed ~8mm from sources to preferentially capture superficial signals for regression from standard channels [8] [1]
  • Computer Vision Approaches: Using deep neural networks like SynergyNet to compute head orientation from video recordings [5]

Recent hybrid approaches combine multiple modalities. The BLISSA2RD framework integrates fNIRS with accelerometers and short-separation measurements using blind source separation, effectively addressing both direct optode decoupling and motion-induced physiological artifacts [1].
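
The short-separation idea can be illustrated with a stand-alone least-squares regression: the short channel, dominated by scalp hemodynamics, is scaled and subtracted from the standard channel. In practice the short channel is usually included as a nuisance regressor within a GLM (as in tCCA-GLM) rather than removed in this two-step way; the sketch below is a simplified Python/NumPy illustration on synthetic data.

```python
import numpy as np

def short_channel_regression(long_ch, short_ch):
    """Remove the superficial (scalp) component by regressing the
    short-separation channel out of the standard channel (least squares)."""
    X = np.column_stack([short_ch, np.ones_like(short_ch)])   # scale + offset
    beta, *_ = np.linalg.lstsq(X, long_ch, rcond=None)
    return long_ch - X @ beta

# Usage with synthetic data: a shared scalp signal plus a cerebral component
rng = np.random.default_rng(0)
scalp = 0.05 * rng.standard_normal(3000)
cerebral = np.sin(2 * np.pi * 0.02 * np.arange(3000) / 10)
long_ch = cerebral + 0.8 * scalp
short_ch = scalp + 0.01 * rng.standard_normal(3000)
cleaned = short_channel_regression(long_ch, short_ch)
```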

Emerging Learning-Based Techniques

Machine and deep learning methods represent the frontier of motion artifact correction:

  • Convolutional Neural Networks (CNNs): U-net architectures demonstrate superior HRF reconstruction compared to wavelet and autoregressive methods [9]
  • Denoising Autoencoders (DAE): Trained on synthetic data generated with autoregressive models, showing effective artifact removal on experimental data [9]
  • Fully Connected Neural Networks: Simplified residual architectures (sResFCNN) combined with traditional filtering [9]
  • Traditional Classifiers: SVM, KNN, and Linear Discriminant Analysis used to identify artifact-contaminated segments [9]

Experimental Protocols for Method Validation

Ground-Truth Movement Characterization

A rigorous protocol for validating motion correction techniques involves controlled head movements with simultaneous video recording:

  • Participant Preparation: 15 participants (age 22.27±2.62 years) with whole-head fNIRS montage [5]
  • Movement Protocol: Controlled head movements along three rotational axes (vertical, frontal, sagittal) at varying speeds (fast, slow) and types (half, full, repeated rotation) [5]
  • Video Recording & Computer Vision: Frame-by-frame analysis using SynergyNet deep neural network to compute head orientation angles [5]
  • Data Correlation: Maximal movement amplitude and speed extracted from orientation data correlated with spikes and baseline shifts in fNIRS signals [5]
  • Data Availability: Dataset and analytical scripts available at: https://gitlab.com/a.bizzego/computer-vision-fnirs [5]

Semi-Simulated Data Validation

For evaluating correction performance with known ground truth:

  • Resting-State Basis: Use real resting-state fNIRS data containing natural physiological noise [3]
  • HRF Addition: Add simulated hemodynamic response functions to resting data at known time points [3]
  • Performance Metrics: Calculate Mean-Squared Error (MSE) and Pearson's Correlation Coefficient between simulated and recovered HRF [3]
  • Motion Contamination: Introduce real motion artifacts from purposefully moved participants or artificial artifact models [3]
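
A minimal Python sketch of this semi-simulation step is shown below: a conventional double-gamma HRF is added to a resting-state time series at known onsets, producing a ground truth against which corrected signals can be scored. The HRF parameters, amplitude, sampling rate, and the random placeholder standing in for real resting data are all assumptions.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(fs, duration=30.0):
    """Double-gamma HRF with conventional default parameters (peak ~5 s)."""
    t = np.arange(0, duration, 1 / fs)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0    # peak minus undershoot
    return hrf / np.max(hrf)

def add_synthetic_hrfs(resting, fs, onsets_s, amplitude=0.02):
    """Insert known HRFs into resting data at known onsets, yielding a ground truth."""
    hrf = amplitude * canonical_hrf(fs)
    out, truth = resting.copy(), np.zeros_like(resting)
    for onset in onsets_s:
        i = int(onset * fs)
        seg = min(len(hrf), len(out) - i)
        truth[i:i + seg] += hrf[:seg]
    return out + truth, truth

# Usage: 5 min of (placeholder) resting data at 10 Hz, events every 40 s
fs = 10.0
resting = 0.003 * np.random.randn(int(300 * fs))      # replace with real resting fNIRS
semi_sim, true_hrf = add_synthetic_hrfs(resting, fs, onsets_s=range(20, 280, 40))
```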

Performance Evaluation Metrics

Standardized metrics enable objective comparison of correction efficacy:

  • Signal Quality Metrics: ΔSignal-to-Noise Ratio (ΔSNR), Contrast-to-Noise Ratio (CNR) [9] [8]
  • Time-Domain Accuracy: Correlation (Corr), Root Mean Squared Error (RMSE) with known HRF [1]
  • Statistical Significance: F-Score, p-value of recovered HRF [1]
  • Classification Impact: Vigilance detection accuracy with/without correction [9]
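
The sketch below computes one reasonable set of these metrics for a recovered response against the known ground truth. Definitions of SNR and CNR vary across the cited studies, so the formulas here should be read as illustrative conventions, not the exact computations used in any particular paper.

```python
import numpy as np

def evaluation_metrics(true_hrf, recovered, raw):
    """Common correction-efficacy metrics (one illustrative set of conventions)."""
    corr = np.corrcoef(true_hrf, recovered)[0, 1]
    rmse = np.sqrt(np.mean((true_hrf - recovered) ** 2))
    # SNR relative to the residual after removing the known response
    snr_before = 10 * np.log10(np.var(true_hrf) / np.var(raw - true_hrf))
    snr_after = 10 * np.log10(np.var(true_hrf) / np.var(recovered - true_hrf))
    cnr = (recovered.max() - recovered.mean()) / np.std(recovered - true_hrf)
    return {"Corr": corr, "RMSE": rmse, "dSNR": snr_after - snr_before, "CNR": cnr}

# Usage with illustrative arrays (in practice: outputs of a correction pipeline)
rng = np.random.default_rng(1)
truth = 0.02 * np.sin(np.linspace(0, np.pi, 200))
noisy = truth + 0.01 * rng.standard_normal(200)
denoised = truth + 0.003 * rng.standard_normal(200)
print(evaluation_metrics(truth, denoised, noisy))
```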

Signaling Pathways and Artifact Origins

The diagram below illustrates the pathways through which motion generates artifacts in fNIRS signals, highlighting the distinction between direct optode decoupling and systemic physiological noise.

[Diagram: Motion artifact pathways. Head movement leads either to direct optode-skin decoupling, which produces spike artifacts (high amplitude, fast), baseline shifts (slow recovery), and low-frequency variations, or to systemic physiological effects: blood pressure changes (Mayer waves, ~0.1 Hz), heart rate variability (cardiac oscillations, ~1 Hz), and respiration effects (respiratory bands, ~0.3 Hz).]

Table 3: Essential Research Tools for Motion Artifact Investigation

| Tool/Category | Specific Examples | Research Application |
| --- | --- | --- |
| Software Packages | Homer2/Homer3 [2] [7] | Comprehensive fNIRS processing with multiple MA correction algorithms |
| Analysis Toolboxes | fNIRSDAT (MATLAB-based) [2] | General Linear Model regression for individual and group-level analysis |
| Computer Vision Tools | SynergyNet deep neural network [5] | Frame-by-frame head orientation computation from video recordings |
| Auxiliary Sensors | Accelerometers, 3D motion capture, IMU [6] | Direct measurement of head motion for regression-based correction |
| Specialized Optodes | Short-separation channels (8-10 mm) [8] [1] | Superficial signal regression to remove scalp contributions |
| Data Resources | Computer-vision fNIRS dataset [5] | Ground-truth movement data for algorithm validation |
| Performance Metrics | ΔSNR, CNR, F-Score, tCCA-GLM [9] [1] | Objective quantification of correction efficacy |

The comparative analysis presented in this guide demonstrates that motion artifact correction in fNIRS requires careful method selection based on artifact type, experimental population, and research objectives. Key recommendations emerge:

  • For pediatric populations with high motion, moving average and wavelet methods yield superior outcomes [2]
  • For task-correlated low-frequency artifacts, wavelet filtering demonstrates particular efficacy [3]
  • When multiple artifact types are present, hybrid approaches combining spline interpolation with wavelet filtering may be optimal [7]
  • For systemic physiological noise contamination, advanced regression techniques like tCCA-GLM with short-separation channels provide significant improvement [1]
  • Emerging learning-based methods show promise for automated, data-driven correction but require further validation [9]

The field continues to evolve toward integrated solutions that address both direct optode decoupling and motion-induced physiological noise, with multimodal approaches leveraging auxiliary sensors and advanced computational methods showing particular promise for robust artifact correction in real-world research scenarios.

Functional near-infrared spectroscopy (fNIRS) has emerged as a preferred neuroimaging technique for studies requiring high ecological validity, allowing participants greater freedom of movement compared to traditional neuroimaging methods [5]. Despite its relative robustness against motion artifacts (MAs), fNIRS remains challenged by signal contamination from movement-induced disturbances that can compromise data integrity and interpretation [5] [10]. Effective management of these artifacts requires a fundamental understanding of their characteristic morphologies—categorized primarily as spikes, baseline shifts, and low-frequency oscillations [3]. Accurate characterization of these morphological subtypes provides the essential foundation for selecting appropriate correction algorithms and evaluating their efficacy, which is particularly crucial for advancing fNIRS applications in real-time neurofeedback and brain-computer interfaces [11].

The significance of motion artifact management extends across diverse fNIRS applications, from cognitive neuroscience and clinical neurology to motor rehabilitation and studies involving subject movements [9]. Artifacts induced by head movements, facial muscle activity, or jaw movements introduce noise that can obscure true hemodynamic responses, ultimately reducing the statistical power of studies and potentially leading to erroneous conclusions [3] [9]. This comparison guide systematically characterizes the primary motion artifact morphologies in fNIRS signals, provides experimental methodologies for their investigation, and evaluates the performance of leading correction approaches, with the broader aim of establishing a standardized framework for motion artifact removal evaluation metrics in fNIRS research.

Motion Artifact Morphologies: Characteristics and Origins

Motion artifacts in fNIRS signals manifest in distinct morphological patterns, each with unique characteristics and underlying physiological mechanisms. The classification into three primary categories—spikes, baseline shifts, and low-frequency variations—provides a framework for understanding artifact impact and selecting appropriate correction strategies [3].

[Diagram: Motion artifacts in fNIRS divide into spikes (high-frequency, high-amplitude, short duration; caused by sudden head movement or quick optode displacement), baseline shifts (persistent signal-level change, moderate frequency; caused by sustained movement or pressure change at the optode-skin interface), and low-frequency variations (slow, task-correlated drifts that are hard to detect; caused by jaw movements, facial expressions, talking, or eating).]

Table 1: Comparative Characteristics of Motion Artifact Morphologies

| Morphology Type | Frequency Content | Amplitude Profile | Primary Causes | Detection Difficulty |
| --- | --- | --- | --- | --- |
| Spikes | High-frequency | High-amplitude, transient | Sudden head movements, quick optode displacement [3] | Low (easily detectable) |
| Baseline Shifts | Moderate-frequency | Sustained signal level change | Sustained head positioning, pressure changes at optode-skin interface [12] | Moderate |
| Low-Frequency Variations | Low-frequency | Slow drifts | Jaw movements, facial expressions, talking/eating [3] [12] | High (resemble hemodynamic response) |

The regional susceptibility of fNIRS signals to motion artifacts varies significantly across the head. Recent research utilizing computer vision to characterize ground-truth movement information has demonstrated that repeated as well as upward and downward head movements particularly compromise fNIRS signal quality in the occipital and pre-occipital regions [5]. In contrast, temporal regions show greatest susceptibility to bend left, bend right, left, and right movements [5]. These findings underscore the importance of considering both movement type and scalp location when evaluating motion artifact morphologies in fNIRS studies.

Experimental Protocols for Motion Artifact Investigation

Controlled Movement Paradigms

Systematic investigation of motion artifact morphologies requires carefully designed experimental protocols that induce specific, controlled head movements. Bizzego et al. (2025) implemented a comprehensive approach where participants performed controlled head movements along three main rotational axes: vertical, frontal, and sagittal [5]. Movements were further categorized by speed (fast vs. slow) and type (half, full, or repeated rotation) to comprehensively characterize the association between specific movement parameters and resulting artifact morphologies [5].

Table 2: Experimental Movement Categorization for Motion Artifact Characterization

| Movement Axis | Movement Types | Speed Variations | Data Collection Methods |
| --- | --- | --- | --- |
| Vertical | Nodding (upward/downward) | Fast vs. slow | Computer vision (SynergyNet DNN) [5] |
| Frontal | Bend left, bend right | Fast vs. slow | Video recording with frame-by-frame analysis [5] |
| Sagittal | Left, right rotations | Fast vs. slow | Head orientation angle computation [5] |
| Combined | Half, full, repeated rotations | Varied | fNIRS signal correlation with movement metrics [5] |

Computer Vision Integration

A groundbreaking methodological advancement in motion artifact research involves the integration of computer vision techniques with fNIRS data collection. Experimental sessions are video recorded and analyzed frame-by-frame using deep neural networks such as SynergyNet to compute precise head orientation angles [5]. This approach enables researchers to extract maximal movement amplitude and speed from head orientation data while simultaneously identifying spikes and baseline shifts in the fNIRS signals [5]. The correlation of ground-truth movement data with artifact characteristics provides unprecedented insights into the specific movement parameters that generate different artifact morphologies.
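
A simplified version of this correlation step is sketched below in Python: head-orientation angles (assumed to have already been extracted from video, for example by a pose-estimation network) are converted to angular speed, resampled to the fNIRS time base, and correlated with the magnitude of signal change. The sampling rates and synthetic inputs are placeholders, not values from the cited dataset.

```python
import numpy as np
from scipy.stats import pearsonr

def correlate_motion_with_signal(angles_deg, video_fs, fnirs, fnirs_fs):
    """Correlate frame-wise head angular speed (from video pose estimation)
    with the magnitude of fNIRS signal change."""
    ang_speed = np.abs(np.diff(angles_deg)) * video_fs         # deg/s per frame
    t_video = np.arange(len(ang_speed)) / video_fs
    t_fnirs = np.arange(len(fnirs) - 1) / fnirs_fs
    speed_resampled = np.interp(t_fnirs, t_video, ang_speed)   # align time bases
    signal_change = np.abs(np.diff(fnirs)) * fnirs_fs
    return pearsonr(speed_resampled, signal_change)

# Hypothetical inputs: 30 Hz pitch angles and one 10 Hz fNIRS channel (30 s each)
r, p = correlate_motion_with_signal(np.cumsum(np.random.randn(900)) * 0.1, 30.0,
                                    np.cumsum(np.random.randn(300)) * 0.01, 10.0)
```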

Cognitive Task-Induced Artifacts

Beyond controlled movements, realistic cognitive tasks can induce motion artifacts that present particular challenges for identification and correction. Di Lorenzo et al. (2014) investigated artifacts caused by participants' jaw movements during vocal responses in a cognitive linguistic paradigm [3]. This approach revealed a particularly problematic artifact morphology characterized by low-frequency, low-amplitude disturbances that are temporally correlated with the evoked cerebral response [3]. Unlike easily identifiable spike artifacts, these task-correlated low-frequency variations closely resemble normal hemodynamic signals, making them exceptionally difficult to distinguish from true neural activity without sophisticated correction approaches.

Motion Artifact Correction Approaches: Performance Comparison

Multiple algorithmic approaches have been developed to address the challenge of motion artifacts in fNIRS data, with varying efficacy across different artifact morphologies.

Traditional Correction Methods

Traditional motion correction techniques include both hardware-based and algorithmic solutions. Hardware-based approaches often utilize accelerometers, with methods such as accelerometer-based motion artifact removal (ABAMAR) and active noise cancellation (ANC) showing promise for real-time applications [6]. Algorithm-based solutions include spline interpolation, wavelet filtering, principal component analysis (PCA), Kalman filtering, and correlation-based signal improvement (CBSI) [3].

Table 3: Performance Comparison of Motion Artifact Correction Techniques

| Correction Method | Best For Artifact Type | Advantages | Limitations | Recovery Efficacy |
| --- | --- | --- | --- | --- |
| Wavelet Filtering | Spikes, baseline shifts [3] | No pre-identification needed, powerful for high-frequency noise [12] | Computationally expensive, modifies entire time series [12] | 93% artifact reduction in cognitive tasks [3] |
| Spline Interpolation | Spikes, identifiable artifacts [12] | Corrects only contaminated segments, simple and fast [12] | Requires reliable artifact identification, leaves high-frequency noise [12] | Dependent on accurate motion detection [12] |
| Spline + Wavelet Combined | Mixed artifact types [13] | Comprehensive approach for complex artifact profiles | Computational intensity | Best overall performance in infant data, saves nearly all corrupted trials [13] |
| tPCA | Spikes with clear identification [12] | Effective for targeted removal | Performance relies on optimal identification [12] | Varies with motion contamination degree [12] |
| CBSI | Low-frequency variations [3] | Correlation-based approach | May not address spike artifacts | Moderate performance on task-correlated artifacts [3] |

Emerging Learning-Based Approaches

Recent advances in motion artifact correction have incorporated machine learning and deep learning methodologies. Convolutional Neural Networks (CNNs) based on U-net architectures have demonstrated promising results in reconstructing hemodynamic responses while reducing motion artifacts, producing lower mean squared error (MSE) and variance in HRF estimates compared to traditional methods [9]. Denoising auto-encoder (DAE) models, trained on synthetic fNIRS datasets generated through auto-regressive models, have also shown effectiveness in eliminating motion artifacts while preserving signal integrity [9]. These learning-based approaches represent the next frontier in motion artifact management, potentially offering more adaptive and comprehensive correction across diverse artifact morphologies.
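
To make the autoencoder idea concrete, the following is a minimal 1D convolutional denoising autoencoder in PyTorch, trained on synthetic clean/contaminated pairs. It is deliberately small and is not the DAE, U-net, or sResFCNN architecture from the cited work; the layer sizes, kernel widths, and synthetic training data are assumptions.

```python
import torch
import torch.nn as nn

class Denoiser1D(nn.Module):
    """Minimal 1D convolutional denoising autoencoder (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=9, padding=4), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Conv1d(32, 16, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(16, 1, kernel_size=9, padding=4))
    def forward(self, x):
        return self.decoder(self.encoder(x))

# Train on synthetic clean/contaminated pairs (placeholders for AR-model data)
model = Denoiser1D()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = 0.02 * torch.sin(torch.linspace(0, 12.6, 256)).repeat(64, 1, 1)
noisy = clean + 0.01 * torch.randn_like(clean)
noisy[:, :, 100:105] += 0.5                       # injected spike artifacts
for _ in range(200):                              # brief illustrative training loop
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(noisy), clean)
    loss.backward()
    opt.step()
```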

Research Reagent Solutions: Essential Tools for fNIRS Motion Artifact Research

Table 4: Essential Research Tools for Motion Artifact Investigation

| Tool Category | Specific Solutions | Function in Motion Artifact Research |
| --- | --- | --- |
| Data Acquisition & Analysis Platforms | Homer2/Homer3 [12] | Standardized fNIRS data processing with multiple built-in motion correction algorithms |
| Computer Vision Tools | SynergyNet Deep Neural Network [5] | Frame-by-frame video analysis for ground-truth head movement quantification |
| Motion Detection Sensors | Accelerometers [6], 3D motion capture systems [10] | Supplementary movement data collection for artifact identification and correction |
| Algorithmic Toolboxes | Wavelet filtering toolboxes [3], spline interpolation tools [12] | Implementation of specific correction techniques for different artifact morphologies |
| Performance Evaluation Metrics | ΔSignal-to-Noise Ratio (ΔSNR) [9], Mean Squared Error (MSE) [9] | Quantitative assessment of motion correction efficacy |
| Experimental Paradigms | Controlled head movement protocols [5], cognitive tasks with vocalization [3] | Systematic artifact induction for methodology validation |

[Diagram: Investigation workflow — data acquisition (fNIRS recording, supplementary sensors) → movement tracking (computer vision, accelerometers) → artifact identification (morphology classification, affected-channel detection) → correction selection (algorithm selection, parameter optimization) → performance evaluation (ΔSNR calculation, HRF recovery assessment).]

The systematic characterization of motion artifact morphologies—spikes, baseline shifts, and low-frequency oscillations—provides an essential foundation for advancing fNIRS signal processing and analysis. Through controlled experimental protocols and emerging computer vision techniques, researchers can now precisely correlate specific movement parameters with resultant artifact profiles, enabling more targeted correction approaches. Performance comparisons of correction algorithms reveal that while traditional methods like wavelet filtering and spline interpolation remain effective for many artifact types, combined approaches and emerging learning-based methods show particular promise for complex artifact profiles. As fNIRS continues to expand into real-time applications and challenging populations, comprehensive understanding of motion artifact morphologies and their correction will remain crucial for ensuring data integrity and advancing neuroimaging research.

The Physiological and Non-Physiological Origins of Motion Artifacts in fNIRS

Functional near-infrared spectroscopy (fNIRS) has emerged as a pivotal neuroimaging technique due to its non-invasive nature, portability, and relatively high tolerance to participant movement. However, this tolerance is paradoxically paired with a significant vulnerability: motion artifacts (MAs) that can severely compromise data quality. These artifacts represent a complex interplay between physiological processes and non-physiological physical disturbances. For researchers and drug development professionals, understanding these origins is not merely an academic exercise but a fundamental prerequisite for selecting appropriate correction algorithms and ensuring the validity of experimental outcomes. This guide systematically compares the performance of prevalent MA correction techniques, providing a structured framework for their evaluation within the broader context of fNIRS methodology.

The Dual Nature of Motion Artifacts

Motion artifacts in fNIRS signals originate from two primary domains: non-physiological physical displacements and physiological processes that are unrelated to neural activity. Disentangling these origins is critical for developing and applying effective correction strategies.

Non-Physiological Origins

The predominant source of MAs is the physical decoupling of optodes from the scalp. Any movement that changes the orientation or distance between the optical fibers and the scalp can degrade the optode-scalp coupling, generating noise in the measured signal [6] [14]. The manifestations of these physical disturbances are diverse and can be categorized as follows:

  • Spikes: Rapid, high-amplitude deflections occur when optodes move quickly and return to their original position [14].
  • Baseline Shifts: Persistent changes in baseline signal intensity occur when the optode settles into a new, stable position on the scalp after movement [3] [15].
  • Slow Drifts: Gradual signal changes result from slow, continuous optode movement [14].

The specific head movements leading to these artifacts have been characterized using computer vision techniques, which identify movements along rotational axes—vertical, frontal, and sagittal—as primary culprits. Notably, repeated movements, as well as upward and downward motions, particularly compromise signal quality [5].

Physiological Origins

Beyond physical displacement, fNIRS signals are contaminated by physiological noise originating from systemic physiology in the scalp. These non-neural cerebral and extracerebral signals constitute a significant challenge, particularly in resting-state functional connectivity (RSFC) analyses [16]. The key physiological confounds include:

  • Cardiac Activity: Pulsatile blood flow creates oscillations typically around 1 to 1.2 Hz [16].
  • Respiratory Cycles: Breathing patterns introduce lower frequency noise in the 0.3 to 0.6 Hz range [16].
  • Blood Pressure Oscillations: Mayer waves, occurring at approximately 0.1 Hz, represent another source of low-frequency physiological noise [16].

This physiological noise induces temporal autocorrelation and increases spatial covariance between channels across the brain, violating the statistical assumptions of many connectivity models and potentially leading to spurious correlations [16].
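
Because these confounds occupy characteristic frequency bands, a zero-phase band-pass filter is a common first line of defense, although Mayer waves sit close to the task-related band and therefore usually also require regression-based approaches. The sketch below uses SciPy with conventional (but not universal) cutoff frequencies.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_fnirs(signal, fs, low=0.01, high=0.09, order=3):
    """Zero-phase Butterworth band-pass: retains the slow hemodynamic band while
    attenuating cardiac (~1 Hz), respiratory (~0.3 Hz), and, partially,
    Mayer-wave (~0.1 Hz) oscillations. Cutoffs are typical, not universal, choices."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, signal)

# Usage: 10 Hz channel contaminated with cardiac and Mayer-wave components
fs = 10.0
t = np.arange(0, 120, 1 / fs)
chan = (0.02 * np.sin(2 * np.pi * 0.03 * t)      # slow "hemodynamic" component
        + 0.01 * np.sin(2 * np.pi * 1.1 * t)     # cardiac oscillation
        + 0.008 * np.sin(2 * np.pi * 0.1 * t))   # Mayer waves
filtered = bandpass_fnirs(chan, fs)
```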

The following diagram illustrates the pathways through which various sources lead to motion artifacts in the fNIRS signal.

[Diagram: Artifact origin pathways — head movement and jaw/facial movement cause optode displacement and changed scattering, which produce motion artifacts; systemic physiology drives scalp blood flow and a systemic HRF, which produce physiological noise.]

Motion Artifact Correction Techniques: A Comparative Analysis

Multiple algorithmic approaches have been developed to correct for motion artifacts, each with distinct underlying principles, advantages, and limitations. The following table provides a structured comparison of the most prevalent techniques.

Table 1: Comparison of Primary Motion Artifact Correction Algorithms

| Algorithm | Core Principle | Ideal Artifact Type | Key Advantages | Major Limitations |
| --- | --- | --- | --- | --- |
| Wavelet Filtering [17] [14] | Decomposes signal using a wavelet basis, zeros artifact-related coefficients, then reconstructs | Spikes, slow drifts [14] | No MA detection needed; fully automatable; preserves signal integrity [14] | Performance depends on wavelet basis choice |
| Spline Interpolation (MARA) [6] [15] | Identifies artifact segments, fits cubic splines to these intervals, and subtracts them | High-amplitude spikes, baseline shifts [15] | Significant MSE reduction [15] | Requires accurate MA detection; multiple user-defined parameters [14] |
| Correlation-Based Signal Improvement (CBSI) [3] [14] | Assumes HbO and HbR are negatively correlated during neural activity but positively during MAs | Large spikes, baseline shifts [14] | Fully automatable; no MA detection needed [14] | Relies on strong negative correlation assumption; may not hold in pathologies [14] |
| Targeted PCA (tPCA) [14] [18] | Applies PCA only to pre-detected motion artifact segments to avoid over-correction | Artifacts identifiable via amplitude/SD thresholds | Reduces over-correction risk vs. standard PCA [14] | Complex to use; performance depends on many parameters [14] |
| Temporal Derivative Distribution Repair (TDDR) [18] | Uses the statistical properties of the signal's temporal derivative to identify and correct outliers | Not specified in reviewed literature | Superior denoising for brain network analysis [18] | Not as widely validated as other methods |
| WCBSI (Combined Method) [14] | Integrates wavelet filtering and CBSI into a sequential correction pipeline | Mixed and severe artifacts [14] | Superior performance across multiple metrics; handles diverse artifacts [14] | Increased computational complexity |
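
The CBSI entry above rests on a simple algebraic construction, sketched below in Python using the widely cited formulation: the HbO and HbR traces are combined so that components sharing the same sign (assumed to be artifact) cancel, while anti-correlated components (assumed to be neural) are retained. The synthetic example data are illustrative only, and toolbox implementations may differ in detail.

```python
import numpy as np

def cbsi(hbo, hbr):
    """Correlation-Based Signal Improvement: exploits the assumption that true
    HbO and HbR are anti-correlated, while artifacts affect both with the same sign."""
    alpha = np.std(hbo) / np.std(hbr)
    hbo_c = 0.5 * (hbo - alpha * hbr)
    hbr_c = -hbo_c / alpha
    return hbo_c, hbr_c

# Usage: anti-correlated "true" signals plus a shared, same-sign artifact
t = np.linspace(0, 60, 600)
truth = 0.02 * np.sin(2 * np.pi * 0.05 * t)
artifact = np.zeros_like(t)
artifact[300:320] = 0.3
hbo_corr, hbr_corr = cbsi(truth + artifact, -0.4 * truth + artifact)
```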

Performance Evaluation in Experimental Studies

The theoretical strengths and limitations of these algorithms are validated through rigorous experimental testing. The following table summarizes key quantitative findings from comparative studies, providing a basis for objective performance assessment.

Table 2: Quantitative Performance of Correction Algorithms in Experimental Studies

| Study & Population | Task | Top Performing Algorithms | Key Performance Metrics |
| --- | --- | --- | --- |
| Brigadoi et al. (2014) [17], adults (cognitive) | Color-naming task with vocalization | Wavelet filtering | Reduced artifact area under the curve in 93% of cases |
| Cooper et al. (2012) [15], adults (resting-state) | Resting-state | Spline interpolation, wavelet analysis | 55% avg. MSE reduction (spline); 39% avg. CNR increase (wavelet) |
| Ayaz et al. (2021) [2], children (language task) | Grammatical judgment task | Moving average, wavelet | Best outcomes across five predefined metrics |
| Guan & Li (2024) [18], simulated & real FC data | Brain network analysis | TDDR, wavelet filtering | Superior ROC results; best recovery of original FC and topological patterns |
| Ernst et al. (2023) [14], adults (motor task with induced MAs) | Hand-tapping with head movements | WCBSI (combined method) | Exceeded average performance (p < 0.001); 78.8% probability of being best-ranked |

Experimental Protocols for Algorithm Validation

The evaluation of motion correction techniques relies on sophisticated experimental designs that enable comparison against a "ground truth" hemodynamic response. The following methodologies represent best practices in the field.

The Induced-Motion Paradigm

Ernst et al. (2023) established a robust protocol for directly comparing MA correction accuracy [14]:

  • Participants: 20 healthy adults.
  • Task Design: Participants performed a simple hand-tapping task under three conditions:
    • Tapping Only: Provided the "ground truth" hemodynamic response without motion artifacts.
    • Tapping with Mild Head Movement: Induced subtle motion artifacts.
    • Tapping with Severe Head Movement: Induced pronounced motion artifacts.
  • Motion Tracking: Head movements were quantitatively monitored using accelerometers.
  • Data Analysis: Corrected signals from conditions 2 and 3 were compared against the "ground truth" from condition 1 using four quantitative metrics: Pearson's correlation coefficient (R), root mean square error (RMSE), mean absolute percentage error (MAPE), and change in area under the curve (ΔAUC).
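
A compact implementation of these four ground-truth comparisons is sketched below; the AUC here is approximated with a simple rectangular integral, and the MAPE guard against division by zero is an added assumption, so exact values may differ from the study's own computation.

```python
import numpy as np

def compare_to_ground_truth(ground_truth, corrected, fs):
    """Compare a corrected response against the no-movement ('ground truth')
    response using R, RMSE, MAPE, and change in area under the curve."""
    r = np.corrcoef(ground_truth, corrected)[0, 1]
    rmse = np.sqrt(np.mean((ground_truth - corrected) ** 2))
    eps = 1e-12                                                  # avoid division by zero
    mape = 100 * np.mean(np.abs((ground_truth - corrected) /
                                (np.abs(ground_truth) + eps)))
    delta_auc = (np.sum(corrected) - np.sum(ground_truth)) / fs  # rectangular integral
    return {"R": r, "RMSE": rmse, "MAPE_%": mape, "dAUC": delta_auc}

# Usage with illustrative block-averaged responses (10 Hz, 20 s epochs)
fs = 10.0
gt = 0.02 * np.sin(np.linspace(0, np.pi, int(20 * fs)))
rec = gt + 0.002 * np.random.randn(gt.size)
print(compare_to_ground_truth(gt, rec, fs))
```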

Real Cognitive Data with Task-Correlated Artifacts

Brigadoi et al. (2014) utilized a cognitive paradigm that naturally produced motion artifacts correlated with the hemodynamic response [17] [3]:

  • Participants: 18 adult students.
  • Task: A color-naming Stroop task requiring vocal responses, which produced jaw movements that generated low-frequency, low-amplitude motion artifacts temporally correlated with the expected hemodynamic response.
  • Evaluation: Since the true HRF was unknown, performance was assessed using physiological plausibility metrics of the recovered HRF, including its shape, timing, and the negative correlation between HbO and HbR.
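
The parameter-extraction step can be illustrated with a short Python function that reads off peak amplitude, time-to-peak, and full-width at half-maximum from a block-averaged response; these values can then be checked against ranges reported in the literature (e.g., time-to-peak on the order of several seconds). The half-maximum crossing logic below is a crude estimate, not a published procedure.

```python
import numpy as np

def hrf_plausibility_parameters(hrf, fs):
    """Extract simple shape descriptors used for physiological-plausibility checks:
    peak amplitude, time-to-peak, and full-width at half-maximum (FWHM)."""
    peak_idx = int(np.argmax(hrf))
    amplitude = hrf[peak_idx]
    time_to_peak = peak_idx / fs
    above = np.where(hrf >= amplitude / 2)[0]        # crude FWHM estimate
    fwhm = (above[-1] - above[0]) / fs if above.size else np.nan
    return {"amplitude": amplitude, "time_to_peak_s": time_to_peak, "fwhm_s": fwhm}

# Usage on an illustrative block-averaged HbO response (10 Hz, peak near 6 s)
fs = 10.0
t = np.arange(0, 25, 1 / fs)
hbo = 0.03 * np.exp(-((t - 6.0) ** 2) / 8.0)
print(hrf_plausibility_parameters(hbo, fs))
```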

Pediatric Language Task Protocol

A study by Ayaz et al. (2021) focused on the critical challenge of motion correction in pediatric populations [2]:

  • Participants: 12 children (ages 6.8-12.6 years).
  • Task: An auditory grammatical judgment language task presented using a rapid event-related design.
  • Artifact Classification: MAs were categorized into four types (A-D) based on duration and amplitude, from brief spikes (Type A) to slow baseline shifts (Type D).
  • Analysis: Efficacy of six correction methods was evaluated using five predefined metrics tailored to pediatric data characteristics.

The workflow for developing and validating motion artifact correction methods typically follows a systematic process, as illustrated below.

[Diagram: Validation workflow — data acquisition branches into artifact introduction (induced movement) and ground-truth recording (no movement); MA correction algorithms produce corrected signals, which are compared against the ground truth via performance metrics, yielding an algorithm ranking.]

Successful fNIRS research requiring motion artifact correction depends on both hardware and software components. The following table details key solutions and their functions in the experimental pipeline.

Table 3: Essential Research Tools for fNIRS Motion Artifact Studies

| Tool Category | Specific Examples | Function in Research |
| --- | --- | --- |
| Software Toolboxes | HOMER2, HOMER3 [2] [14] | Provides standardized implementations of major MA correction algorithms (PCA, spline, wavelet, CBSI, etc.) for reproducible analysis |
| Auxiliary Motion Sensors | Accelerometers, IMUs, gyroscopes [6] [14] | Offers objective, continuous measurement of head movement dynamics for MA identification and validation of correction methods |
| Computer Vision Systems | SynergyNet deep neural network [5] | Enables markerless tracking of head orientation and movement through video analysis, providing ground-truth movement data |
| Short-Separation Channels | fNIRS detectors at <1 cm distance [16] | Measures systemic physiological noise from superficial layers, used as a regressor in advanced correction pipelines |
| Standardized Test Paradigms | Hand-tapping, grammatical judgment, resting-state [17] [2] [14] | Provides reproducible experimental contexts for generating comparable hemodynamic responses and motion artifacts across studies |

The journey to mitigate motion artifacts in fNIRS is fundamentally about understanding their dual origins—both physiological and non-physiological. The evidence from comparative studies consistently indicates that while multiple correction algorithms exist, wavelet-based methods and their hybrids (like WCBSI) demonstrate superior and reliable performance across diverse experimental conditions and populations [17] [14] [18]. For brain functional connectivity analyses, TDDR also emerges as a particularly powerful option [18]. The selection of an appropriate algorithm must be guided by the specific artifact characteristics, the participant population, and the analytical goals of the study. As fNIRS continues to expand into more real-world applications, the development and rigorous validation of motion correction techniques will remain essential for ensuring the reliability and interpretability of fNIRS-derived biomarkers in both basic research and clinical drug development.

Functional near-infrared spectroscopy (fNIRS) has emerged as a preferred neuroimaging technique for studies requiring high ecological validity, allowing participants greater freedom of movement compared to traditional neuroimaging methods [5]. Despite this advantage, fNIRS signals are notoriously susceptible to motion artifacts (MAs)—unexpected changes in recorded signals caused by subject movement that severely degrade signal fidelity [19]. These artifacts represent a fundamental challenge for researchers and drug development professionals who require precise hemodynamic measurements for interpreting neural activity, assessing cognitive states, or evaluating pharmaceutical effects on brain function. Motion artifacts can introduce spurious components that mimic neural activity (creating false positives) or obscure actual neural activations (leading to false negatives), both of which compromise the reliability of neuroscientific findings and drug efficacy evaluations [19]. The significant deterioration in measurement quality caused by motion artifacts has become an essential research topic for fNIRS applications, particularly as the technology moves toward more portable and wearable devices used in real-world settings [10] [6].

Motion Artifact Origins and Characteristics

Motion artifacts in fNIRS signals originate from diverse physiological movements that disrupt the optimal coupling between optical sensors (optodes) and the scalp. The primary mechanism involves imperfect contact between optodes and the scalp, manifesting as displacement, non-orthogonal contact, and oscillation of the optodes [10] [6]. Research has systematically categorized movement sources based on their physiological origins:

  • Head movements: Including nodding, shaking, tilting, and rotational movements along three main axes (vertical, frontal, sagittal) introduce distinct artifact patterns [5] [10]. Recent research using computer vision to characterize motion artifacts has revealed that repeated movements as well as upward and downward movements particularly compromise fNIRS signal quality [5].

  • Facial muscle movements: Actions including raising eyebrows, frowning, and other facial expressions create localized artifacts, especially in frontal lobe measurements [10] [6].

  • Jaw movements: Talking, eating, and drinking produce two different types of motion artifacts that correlate with temporalis muscle activity and can be particularly challenging as they often coincide with cognitive tasks [10] [3].

  • Body movements: Movements of upper and lower limbs degrade fNIRS signals either by causing secondary head movements or through the inertia of the fNIRS device itself [10] [6]. This is especially problematic in mobile paradigms such as gait studies or rehabilitation exercises.

Types and Temporal Properties of Motion Artifacts

Motion artifacts manifest in fNIRS signals with distinct temporal characteristics that determine their impact on signal quality and the appropriate correction strategies:

  • High-frequency spikes: Sudden, brief disruptions appearing as sharp peaks in the fNIRS signal, typically resulting from rapid head movements or impacts [9] [3]. These are often easily detectable but can saturate signal processing systems.

  • Baseline shifts: Sustained deviations from the baseline signal caused by slow head rotations or changes in optode positioning that alter the coupling between optodes and scalp [3] [20]. These are particularly problematic as they can mimic low-frequency hemodynamic responses.

  • Low-frequency variations: Slower oscillations that blend with physiological signals, making them particularly challenging to distinguish from genuine hemodynamic responses [3]. These often occur during sustained movements or postural adjustments.

Table 1: Classification of Motion Artifact Types in fNIRS Signals

| Artifact Type | Temporal Characteristics | Common Causes | Detection Difficulty |
| --- | --- | --- | --- |
| High-Frequency Spikes | Short duration (0.5-2 s), high amplitude | Rapid head shaking, sudden movements | Low (easily distinguishable) |
| Baseline Shifts | Sustained deviation, slow return to baseline | Head repositioning, slow rotation | Moderate |
| Low-Frequency Variations | Slow oscillations (>5 s duration) | Sustained movements, postural changes | High (mimics hemodynamic response) |

Quantitative Impact of Motion Artifacts on SNR

Direct Effects on Signal Quality Metrics

The degradation of Signal-to-Noise Ratio (SNR) due to motion artifacts has been quantitatively established through multiple controlled studies. Motion artifacts reduce the SNR of fNIRS signals by introducing high-amplitude noise components that overwhelm the true hemodynamic signal of interest [10] [6]. Empirical evidence demonstrates that the amplitude of motion artifacts can exceed the true hemodynamic response by an order of magnitude, drastically reducing the detectability of neural activation patterns [3] [21]. In cognitive experiments, the presence of motion artifacts has been shown to degrade classification accuracy, directly impacting the reliability of brain-computer interface applications and cognitive state classification [10]. Research on vigilance level detection during walking versus seated conditions revealed that motion artifacts significantly reduced detection accuracy, underscoring the critical importance of effective artifact management for mobile paradigms [9].

Regional Vulnerability to Motion Artifacts

The impact of motion artifacts on SNR is not uniform across the cortex. Different brain regions show variable susceptibility to motion artifacts based on their anatomical location and the types of movements most likely to affect them. Computer vision studies combining ground-truth movement data with fNIRS signals have revealed that:

  • The occipital and pre-occipital regions are particularly susceptible to motion artifacts following upward or downward head movements [5].
  • Temporal regions are most affected by lateral movements (bend left, bend right, left, and right movements) [5].
  • Frontal regions show vulnerability to facial movements and jaw motions, creating particular challenges for studies of higher cognition and emotional processing [3].

This regional variability necessitates customized artifact correction approaches based on the brain region being studied and the experimental paradigm.

Table 2: Quantitative Impact of Motion Artifacts on fNIRS Signal Quality

| Impact Metric | Without MAs | With MAs | Degradation | Measurement Context |
| --- | --- | --- | --- | --- |
| Classification Accuracy | 70-85% | 45-60% | 25-40% reduction | Vigilance detection during walking [9] |
| Contrast-to-Noise Ratio | 100% (baseline) | 40-60% | 40-60% reduction | Cognitive task with speech [3] |
| HRF Amplitude Estimation | Accurate | Overestimated by 2-3x | 200-300% error | Simulated data with added MAs [21] |
| Functional Connectivity | Stable patterns | Altered correlation | False positive/negative connections | Resting-state networks [19] |

Experimental Protocols for Quantifying Motion Artifact Impact

Controlled Movement Paradigms

To systematically quantify the impact of motion artifacts on SNR, researchers have developed controlled experimental protocols that induce specific, reproducible movements:

  • Standardized head movements: Participants perform controlled head movements along three rotational axes (pitch, yaw, roll) at varying speeds (slow, fast) and movement types (half, full, repeated rotations) while fNIRS data is collected [5]. These movements are typically guided by visual cues to ensure consistency across participants.

  • Task-embedded movements: Incorporating movements naturally occurring during cognitive tasks, such as jaw movements during speech in color-naming tasks [3]. This approach captures artifacts that are temporally correlated with the hemodynamic response, representing a particularly challenging scenario for correction algorithms.

  • Whole-body movements: Having participants perform walking, reaching, or other gross motor activities while wearing fNIRS systems, especially relevant for rehabilitation research and mobile brain imaging [9].

Signal Quality Assessment Methodologies

The quantitative evaluation of motion artifact impact employs several well-established methodological approaches:

  • Semi-simulated data: Adding simulated hemodynamic responses to real resting-state fNIRS data containing actual motion artifacts, creating a ground truth for evaluating artifact impact and correction efficacy [3] [21]. This approach allows precise calculation of metrics like Mean Squared Error (MSE) and Pearson's Correlation Coefficient between known and recovered signals.

  • Computer vision integration: Using video recordings analyzed frame-by-frame with deep neural networks (e.g., SynergyNet) to compute head orientation angles, providing objective ground-truth movement data synchronized with fNIRS acquisition [5]. This enables precise correlation between specific movement parameters (amplitude, speed) and artifact characteristics.

  • Artefact induction and recovery: Purposely asking participants to perform specific movements during designated periods to create motion artifacts, then evaluating how these artifacts impact the recovery of known functional responses [3].

[Diagram: Experimental protocol for quantifying motion artifact impact on SNR. Data acquisition: fNIRS recording plus movement tracking (computer vision/accelerometer). Signal processing and analysis: pre-processing (band-pass filtering), threshold-based motion artifact identification, SNR quantification (pre- vs. post-artifact), regional vulnerability mapping. Validation and metrics: comparison with ground truth (simulated HRF), statistical analysis (ANOVA, correlation tests), and impact quantification (SNR degradation metrics).]

Experimental Setup and Hardware Solutions

  • Computer Vision Systems: Video recording equipment with deep neural network analysis (e.g., SynergyNet) for extracting ground-truth head movement parameters including orientation angles, movement amplitude, and velocity [5]. These systems provide objective movement quantification without physical contact with participants.

  • Inertial Measurement Units (IMUs): Wearable accelerometers, gyroscopes, and magnetometers that provide complementary motion data for adaptive filtering approaches such as Active Noise Cancellation (ANC) and Accelerometer-Based Motion Artifact Removal (ABAMAR) [10] [6]. These are particularly valuable for capturing high-frequency movement data.

  • Collodion-Fixed Optical Fibers: Specialized optode-scalp coupling methods using prism-based optical fibers fixed with collodion to improve adhesion and reduce motion-induced decoupling [10] [6]. This hardware solution addresses the root cause of motion artifacts but requires more expertise to implement.

  • Polarization-Based Systems: fNIRS systems using linearly polarized light sources with orthogonally polarized analyzers to distinguish between motion artifacts and true hemodynamic signals based on their polarization properties [10].

Software and Analytical Tools

  • Wavelet Analysis Toolboxes: Software implementations of wavelet filtering algorithms that effectively isolate motion artifacts in the wavelet domain by identifying and thresholding outlier coefficients [3] [19]. These are particularly effective for spike artifacts and low-frequency oscillations.

  • Spline Interpolation Algorithms: Tools for motion artifact reduction (e.g., MARA) that identify corrupted segments and reconstruct them using spline interpolation, especially effective for baseline shifts [19] [20].

  • Hybrid Correction Frameworks: Combined approaches that integrate multiple correction strategies (e.g., spline interpolation for severe artifacts, wavelet methods for slight oscillations) to address different artifact types within a unified processing pipeline [20].

  • Deep Learning Architectures: Denoising Autoencoder (DAE) models and convolutional neural networks (CNNs) specifically designed for motion artifact removal, offering assumption-free correction without extensive parameter tuning [9] [21].

Table 3: Research Reagent Solutions for Motion Artifact Management

| Tool Category | Specific Examples | Primary Function | Implementation Complexity |
| --- | --- | --- | --- |
| Hardware Solutions | Inertial Measurement Units (IMUs) | Capture independent movement data for adaptive filtering | Medium |
| | Computer vision systems | Provide ground-truth movement metrics without physical contact | High |
| | Collodion-fixed fibers | Improve optode-scalp coupling to prevent artifacts | High |
| Algorithmic Solutions | Wavelet filtering | Remove spike artifacts and oscillations in the time-frequency domain | Low-Medium |
| | Spline interpolation | Correct baseline shifts and severe artifacts | Medium |
| | Temporal Derivative Distribution Repair (TDDR) | Online artifact removal using robust statistical estimation | Low |
| Evaluation Metrics | ΔSNR (change in SNR) | Quantify noise suppression after correction | Low |
| | Contrast-to-Noise Ratio (CNR) | Evaluate functional contrast preservation | Low |
| | Mean Squared Error (MSE) | Assess fidelity of recovered hemodynamic response | Low |

Implications for Motion Artifact Correction Algorithm Development

The systematic quantification of motion artifact impact on SNR provides critical guidance for developing and validating correction algorithms. Research demonstrates that correction is always preferable to rejection; even simple artifact correction methods outperform the practice of discarding contaminated trials, which reduces statistical power and introduces selection bias [3]. However, the efficacy of correction algorithms varies significantly based on artifact characteristics:

  • Wavelet-based methods have shown particular effectiveness, reducing the area under the curve where artifacts are present in 93% of cases for certain artifact types [3]. More recent evaluations identify Temporal Derivative Distribution Repair (TDDR) and wavelet filtering as the most effective methods for functional connectivity analysis [19].

  • Hybrid approaches that combine multiple correction strategies (e.g., spline interpolation for baseline shifts with wavelet methods for oscillations) demonstrate superior performance compared to individual methods alone, addressing the diverse nature of motion artifacts [20].

  • Deep learning methods represent a promising emerging approach, with Denoising Autoencoder (DAE) architectures demonstrating competitive performance while minimizing the need for expert parameter tuning [9] [21].

The development of standardized evaluation metrics incorporating both noise suppression (ΔSNR, artifact power attenuation) and signal distortion (percent root difference, correlation coefficients) is essential for objective comparison of correction methods across different research contexts [10] [9]. This quantitative framework enables researchers to select the most appropriate artifact management strategy based on their specific experimental paradigm, participant population, and research objectives.

In functional near-infrared spectroscopy (fNIRS) research, motion artifacts (MAs) represent a significant source of signal contamination that can severely compromise data integrity and lead to spurious scientific conclusions [6] [3]. These artifacts arise from imperfect contact between optodes and the scalp during participant movement, resulting in signal components that can mimic or obscure genuine hemodynamic responses [6] [5]. The evaluation of motion artifact removal techniques consequently hinges on two competing objectives: effectively suppressing noise while faithfully preserving the underlying physiological signal of interest [6] [3]. This fundamental trade-off between noise suppression and signal preservation forms the critical framework for assessing methodological performance in fNIRS research, particularly in drug development studies where accurate hemodynamic measurement is paramount.

Motion artifacts manifest in diverse forms, including high-frequency spikes, baseline shifts, and low-frequency variations, each presenting distinct challenges for correction algorithms [3] [22]. These artifacts can be temporally correlated with the hemodynamic response function (HRF), making simple filtering approaches insufficient [3]. The pursuit of optimal motion correction therefore requires sophisticated evaluation metrics that quantitatively assess both noise reduction and signal integrity across varied experimental conditions.

Core Evaluation Metrics Framework

The assessment of motion artifact correction techniques employs distinct metric categories targeting noise suppression and signal preservation objectives. The following table summarizes the key evaluation metrics employed in fNIRS research:

Table 1: Core Evaluation Metrics for Motion Artifact Correction Techniques

| Evaluation Goal | Metric | Definition | Interpretation |
| --- | --- | --- | --- |
| Noise Suppression | Signal-to-Noise Ratio (SNR) | Ratio of signal power to noise power | Higher values indicate better noise suppression |
| | Pearson's Correlation Coefficient (R) | Linear correlation between corrected signals and reference | Values closer to 1 indicate better noise removal |
| | Contrast-to-Noise Ratio (CNR) | Ratio of hemodynamic response amplitude to background noise | Higher values indicate improved functional sensitivity |
| | Within-Subject Standard Deviation | Variability of repeated measurements in the same subject | Lower values indicate better reliability |
| | Area Under Curve (AUC) of ROC | Ability to distinguish true activations from false positives | Higher values indicate better detection specificity |
| Signal Preservation | Mean-Squared Error (MSE) | Average squared difference between estimated and true HRF | Lower values indicate better preservation of signal shape |
| | Pearson's Correlation with True HRF | Linear relationship between recovered and simulated HRF | Values closer to 1 indicate faithful signal reconstruction |

These metrics enable researchers to quantitatively compare the performance of different correction techniques and select the most appropriate method for their specific research context [6] [3]. The noise suppression metrics primarily evaluate the effectiveness of artifact removal, while the signal preservation metrics assess how faithfully the underlying hemodynamic response is maintained after processing [6].

Experimental Protocols for Metric Validation

Semisynthetic Simulation with Experimental Data

The receiver operating characteristic (ROC) simulation approach provides a robust framework for evaluating metric performance under controlled conditions [23]:

  • Background Signal Acquisition: Collect real fNIRS data during resting state or breath-hold tasks to capture authentic physiological noise characteristics [23]

  • Synthetic HRF Addition: Add known, simulated "brain" responses at varying amplitudes to the background signals, creating a ground truth for validation [23] [3]

  • Algorithm Application: Apply multiple motion correction techniques to the semisynthetic data

  • Performance Quantification: Calculate sensitivity and specificity by comparing detected activations with known added responses [23]

  • ROC Curve Generation: Plot true positive rates against false positive rates across varying detection thresholds

  • AUC Calculation: Compute the area under the ROC curve as a comprehensive performance metric [23]

This methodology enables direct comparison of correction techniques with perfect knowledge of the true hemodynamic response, allowing precise quantification of both noise suppression and signal preservation capabilities [23] [3].
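
The sketch below shows the skeleton of such a semisynthetic ROC analysis in Python: a known response is added to half of the trials, a simple detection score is computed per trial, and scikit-learn's roc_auc_score summarizes detection performance. The resting-data stand-in, the detection statistic (mean amplitude in a fixed window), and the trial count are placeholders, and the motion-correction step is left as a comment where a real algorithm would be applied.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
fs, n_trials = 10, 40                       # 10 Hz sampling, 40 synthetic "trials"
t = np.arange(0, 15, 1 / fs)
hrf = 0.5 * t * np.exp(-t / 3)              # known, added response (arbitrary units)

def detection_score(resting_segment, add_hrf):
    """Return a simple detection statistic (mean amplitude in a response window)."""
    seg = resting_segment + (hrf if add_hrf else 0.0)
    # ... apply a motion-correction algorithm to `seg` here ...
    window = slice(int(3 * fs), int(8 * fs))  # expected response window
    return seg[window].mean()

labels, scores = [], []
for i in range(n_trials):
    resting = 0.3 * rng.standard_normal(t.size)   # stand-in for real resting data
    has_response = bool(i % 2)                    # half the trials contain the response
    labels.append(int(has_response))
    scores.append(detection_score(resting, has_response))

print("AUC:", roc_auc_score(labels, scores))
```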

Real Functional Data with Physiological Plausibility Assessment

When the true HRF is unknown, as with real task data, researchers employ physiologically plausible HRF parameters for validation [3]:

  • Data Collection: Acquire fNIRS data during cognitive or motor tasks known to produce specific artifacts (e.g., speaking tasks that generate jaw movement artifacts) [3]

  • Motion Correction: Apply multiple artifact removal algorithms to the contaminated data

  • HRF Parameter Extraction: Derive key parameters from the recovered hemodynamic response, including time-to-peak, response amplitude, and full-width at half-maximum [3]

  • Plausibility Assessment: Evaluate whether the extracted parameters fall within physiologically reasonable ranges established by prior literature

  • Spatial Specificity Evaluation: Assess whether activation patterns conform to neuroanatomical expectations [24]

This approach provides practical validation of correction techniques under real-world conditions where motion artifacts may be correlated with the task paradigm itself [3].
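
A minimal sketch of the HRF parameter extraction step is given below; it derives time-to-peak, peak amplitude, and full-width at half-maximum from a recovered response. The half-maximum crossing logic is deliberately simple, and the plausibility range mentioned in the final comment is an example rather than a value taken from the cited studies.

```python
import numpy as np

def hrf_parameters(hrf, fs):
    """Extract time-to-peak, amplitude, and FWHM from a recovered HRF (1-D array)."""
    peak_idx = int(np.argmax(hrf))
    amplitude = hrf[peak_idx]
    time_to_peak = peak_idx / fs
    above = np.where(hrf >= amplitude / 2.0)[0]
    fwhm = (above[-1] - above[0]) / fs if above.size else np.nan
    return time_to_peak, amplitude, fwhm

fs = 10.0
t = np.arange(0, 20, 1 / fs)
hrf = t * np.exp(-t / 3)                      # toy recovered response
ttp, amp, fwhm = hrf_parameters(hrf, fs)
print(f"time-to-peak {ttp:.1f} s, amplitude {amp:.2f}, FWHM {fwhm:.1f} s")
# A plausibility check against literature ranges (e.g., time-to-peak within ~3-10 s) would follow.
```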

Performance Comparison of Correction Techniques

Empirical comparisons of motion artifact correction methods reveal performance variations across different evaluation metrics. The following table synthesizes findings from multiple experimental studies:

Table 2: Comparative Performance of Motion Artifact Correction Techniques

| Correction Method | Noise Suppression Performance | Signal Preservation Performance | Best Application Context |
|---|---|---|---|
| Wavelet Filtering | Highest performance for spike and low-frequency artifacts [3] | Preserves HRF shape effectively (93% artifact reduction) [3] | General purpose, various artifact types |
| Spline Interpolation | Effective for baseline shifts [22] | Best improvement in Mean-Squared Error [3] | Slow head movements causing baseline shifts |
| Moving Average | Good overall noise reduction [2] | Moderate signal preservation | Pediatric populations [2] |
| tPCA | Effective for specific artifact segments [25] | Good HRF recovery for motion spikes | Isolated motion artifacts in children [2] |
| CBSI | Removes large spikes effectively [22] | Assumes perfect negative HbO/HbR correlation | Scenarios with strong anti-correlation |
| Short-Separation Channels + GLM | Superior noise suppression (best AUC in ROC) [23] | Maintains physiological accuracy | When hardware supports short-separation measurements |
| Hybrid Methods | Combined strengths of multiple approaches [22] | Balanced performance across metrics | Complex artifacts with different characteristics |

The comparative evidence indicates that wavelet-based methods generally provide the most effective balance between noise suppression and signal preservation for typical artifact types [3]. However, method performance is context-dependent, with certain techniques excelling in specific scenarios, such as spline interpolation for baseline shifts or moving average approaches for pediatric data [22] [2].

Research Reagent Solutions: Experimental Toolkit

Table 3: Essential Research Materials for fNIRS Motion Artifact Investigation

| Research Tool | Function/Purpose | Implementation Considerations |
|---|---|---|
| Short-Separation Channels | Measures superficial layer contamination | 0.5-1.0 cm source-detector distance; requires specialized hardware [23] [24] |
| Accelerometers/IMU | Provides independent motion measurement | Synchronization with fNIRS data crucial; placement on head optimal [6] |
| Computer Vision Systems | Quantifies head movement without physical contact | Deep neural networks (e.g., SynergyNet) for head orientation [5] |
| Auxiliary Physiological Monitors | Records cardiac, respiratory, blood pressure signals | Helps distinguish motion artifacts from physiological noise [8] |
| Semisynthetic Data Algorithms | Generates validation datasets with known ground truth | Combines experimental noise with simulated hemodynamic responses [23] [3] |
| Specialized Optical Fibers | Improves optode-scalp coupling | Collodion-fixed fibers minimize motion-induced decoupling [6] |

Methodological Workflow and Decision Pathway

The following diagram illustrates the logical relationship between evaluation goals, metrics, and correction approaches in fNIRS motion artifact research:

[Workflow diagram: evaluation goals (noise suppression, signal preservation) map to their respective metrics (SNR, CNR, R, within-subject SD, AUC; MSE, correlation with true HRF), which inform the choice among correction methods (wavelet filtering, spline interpolation, tPCA, hybrid approaches) and, ultimately, optimal method selection for the research context.]

Decision Framework for fNIRS Motion Correction

The systematic evaluation of motion artifact correction techniques in fNIRS research requires careful consideration of both noise suppression and signal preservation metrics. Evidence from comparative studies indicates that wavelet-based filtering generally provides superior performance for common artifact types, while spline interpolation excels specifically for baseline shifts [3] [22]. The emerging approach of incorporating short-separation channels within a general linear model framework demonstrates particularly promising results for comprehensive noise suppression [23] [8].

Researchers should select evaluation metrics that align with their specific research objectives, giving consideration to the nature of expected artifacts, participant population characteristics, and the critical balance between false positives and false negatives in their experimental context. The implementation of standardized evaluation protocols, particularly semisynthetic simulations with ground truth validation, enables direct comparison between methodological approaches and facilitates the selection of optimal correction strategies for specific research scenarios in both basic neuroscience and applied drug development studies.

A Taxonomy of Motion Artifact Removal Techniques: From Hardware to Algorithms

Motion artifacts (MAs) represent a significant challenge in functional near-infrared spectroscopy (fNIRS) research, often compromising data quality and interpretation. These artifacts arise from imperfect contact between optodes and the scalp due to movement-induced displacement, non-orthogonal contact, or oscillation of the optodes [6]. As fNIRS expands into studies involving naturalistic behaviors, pediatric populations, and clinical cohorts with involuntary movements, effective MA management becomes increasingly critical for data integrity. Hardware-based solutions offer a proactive approach to this problem by providing direct measurement of motion dynamics, enabling more targeted and physiologically informed artifact correction compared to purely algorithmic methods [6] [25].

The fundamental advantage of hardware approaches lies in their ability to capture independent, time-synchronized information about the source of artifacts—whether from head movements, facial muscle activity, jaw movements, or body displacements [6] [5]. This review systematically compares three principal hardware-based solutions: accelerometer-based systems, inertial measurement units (IMUs), and short-separation channels (SSCs). We evaluate their operational principles, implementation requirements, correction efficacy, and suitability for different research scenarios, providing experimental data and performance metrics to guide researchers in selecting appropriate solutions for their specific fNIRS applications.

Comparative Analysis of Hardware Solutions

Table 1: Overview of Hardware-Based Motion Artifact Correction Methods

| Method | Primary Components | Measured Parameters | Implementation Complexity | Key Advantages |
|---|---|---|---|---|
| Accelerometer | Single- or multi-axis accelerometer | Linear acceleration | Low to moderate | Cost-effective; well-established signal processing pipelines [6] |
| IMU (Inertial Measurement Unit) | Accelerometer, gyroscope, (magnetometer) | Linear acceleration, angular velocity, orientation | Moderate to high | Comprehensive movement capture; rich kinematic data [6] |
| Short-Separation Channels | Additional fNIRS optodes at short distances (~8-15 mm) | Superficial hemodynamic fluctuations | Moderate | Direct measurement of systemic artifacts; no additional hardware synchronization [25] |

Table 2: Performance Comparison of Hardware-Based Correction Methods

| Method | Artifact Types Addressed | Compatibility with Real-Time Processing | Evidence of Efficacy | Key Limitations |
|---|---|---|---|---|
| Accelerometer | Head movements, gross body movements [6] | Yes (multiple methods support real-time application) [6] | Improved signal-to-noise ratio; validated in multiple studies [6] | Limited to detecting acceleration forces only [6] |
| IMU | Head rotations, displacements, complex movement patterns [6] [5] | Yes (with sufficient processing capacity) [6] | Superior for characterizing movement along multiple axes [5] | Higher cost; more complex data integration [6] |
| Short-Separation Channels | Systemic physiological noise, superficial scalp blood flow changes [25] | Limited (primarily used in offline analysis) | Effective for separating cerebral from extracerebral signals [25] | Limited effectiveness for abrupt, high-amplitude motion artifacts [25] |

Detailed Methodologies and Experimental Protocols

Accelerometer-Based Approaches

Accelerometer-based methods employ miniature sensors attached to the fNIRS headgear to record head movement dynamics simultaneously with hemodynamic measurements. The fundamental principle involves using acceleration signals as reference inputs for adaptive filtering techniques that distinguish motion-induced artifacts from neural activity-related hemodynamic changes [6].

Active Noise Cancellation (ANC) implements a recursive least-squares adaptive filter that continuously adjusts its parameters to minimize the difference between the measured fNIRS signal and a reference signal derived from the accelerometer [6]. The algorithm models the measured fNIRS signal (z(n)) as a combination of the true hemodynamic signal and motion-induced noise correlated with accelerometer readings.

Accelerometer-Based Motion Artifact Removal (ABAMAR) employs a two-stage process where motion-contaminated segments are first identified via threshold-based detection on accelerometer data, followed by correction using interpolation or model-based approaches [6]. The correction phase typically involves piecewise cubic spline interpolation or autoregressive modeling to reconstruct the signal within artifact periods.

Experimental Protocol Validation: In validation studies, participants perform controlled head movements (rotations, nods, tilts) at varying speeds and amplitudes while simultaneous fNIRS and accelerometer data are collected [5]. Performance metrics include signal-to-noise ratio improvement, correlation with ground-truth hemodynamic responses, and reduction in false activation rates [6] [5].
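
The following sketch illustrates the general idea of accelerometer-referenced adaptive filtering. It uses a normalized LMS update rather than the recursive least-squares filter described for ANC, so it should be read as a simplified stand-in; the filter order, step size, and toy signals are arbitrary assumptions.

```python
import numpy as np

def nlms_motion_regression(fnirs, accel, order=5, mu=0.1, eps=1e-6):
    """Adaptively regress an accelerometer reference out of one fNIRS channel.

    fnirs : 1-D array, motion-contaminated optical density signal
    accel : 1-D array, time-synchronized acceleration magnitude
    Returns the residual (motion-reduced) signal.
    """
    w = np.zeros(order)
    cleaned = np.asarray(fnirs, dtype=float).copy()
    for n in range(order, len(fnirs)):
        x = accel[n - order:n][::-1]           # most recent reference samples
        y_hat = w @ x                          # predicted motion component
        e = fnirs[n] - y_hat                   # residual = cleaned sample
        w += mu * e * x / (x @ x + eps)        # normalized LMS weight update
        cleaned[n] = e
    return cleaned

# Toy demonstration: movement-coupled noise appears in the second half of the record
t = np.linspace(0, 60, 600)
accel = np.random.randn(600) * (t > 30)
fnirs = np.sin(2 * np.pi * 0.05 * t) + 0.8 * np.convolve(accel, [0.5, 0.3, 0.2], "same")
print(nlms_motion_regression(fnirs, accel)[:5])
```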

IMU-Based Solutions

Inertial Measurement Units integrate multiple sensors—typically a triaxial accelerometer, triaxial gyroscope, and sometimes a magnetometer—providing comprehensive kinematic data including linear acceleration, angular velocity, and orientation relative to the Earth's magnetic field [6]. This multi-modal capture enables more sophisticated movement characterization compared to accelerometer-only systems.

Implementation Framework: IMUs are typically secured to the fNIRS headgear at strategic locations, often on the forehead or temporal regions. The gyroscope component is particularly valuable for detecting rotational movements that may produce minimal linear acceleration but significant optode displacement [6]. Data from all sensors are time-synchronized with fNIRS measurements and often fused using Kalman filtering to create a unified movement reference signal [6].

Blind Source Separation with IMU Reference (BLISSA2RD) represents an advanced approach combining hardware and algorithmic methods. This technique uses IMU data to inform blind source separation algorithms, particularly independent component analysis (ICA), facilitating more accurate identification and removal of motion-related components from fNIRS signals [6].

Experimental Validation: Controlled studies have participants perform specific head movements categorized by axis (vertical, frontal, sagittal), speed (fast, slow), and type (half, full, repeated rotations) while head orientation is simultaneously tracked using computer vision systems for ground-truth comparison [5]. Research demonstrates that occipital and pre-occipital regions are particularly susceptible to upwards or downwards movements, while temporal regions are most affected by lateral bending movements [5].

Short-Separation Channels

Short-separation channels employ additional source-detector pairs placed at minimal distances (typically 8-15mm) compared to standard channels (25-35mm). The fundamental principle is that these short-distance channels primarily detect hemodynamic changes in superficial layers (scalp, skull) rather than cerebral cortex, providing a reference for systemic physiological noise and motion artifacts affecting the scalp circulation [25].

Implementation Configuration: SSCs are integrated directly into the fNIRS cap design, interspersed with conventional channels. Optimal placement varies by brain region studied, with typical configurations including 1-2 SSCs per region of interest. The shallow photon path of SSCs makes them particularly sensitive to motion-induced hemodynamic changes in extracerebral tissues [25].

Signal Processing Approaches: SSC signals are used as regressors in general linear models (GLM) to remove shared variance with standard channels, or in adaptive filtering configurations. More advanced implementations employ SSC data in component-based methods (e.g., principal component analysis) to identify and remove motion-related signal components [25].
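
A minimal sketch of short-separation regression is shown below: the long-separation channel is regressed on an intercept plus the SSC signal by ordinary least squares, and the fitted portion is subtracted. In a full GLM the SSC regressor would sit alongside the task (HRF) regressors rather than being removed in a separate step; the toy signals are illustrative.

```python
import numpy as np

def regress_short_separation(long_chan, short_chan):
    """Remove the variance a long channel shares with a short-separation channel.

    long_chan, short_chan : 1-D arrays of equal length (e.g., HbO time courses).
    Returns the residual long-channel signal after ordinary least-squares regression.
    """
    X = np.column_stack([np.ones_like(short_chan), short_chan])  # intercept + SSC regressor
    beta, *_ = np.linalg.lstsq(X, long_chan, rcond=None)
    return long_chan - X @ beta

# Toy example: the long channel contains a scaled copy of the superficial signal
superficial = np.random.randn(1000)
neural = np.sin(np.linspace(0, 20, 1000))
long_chan = neural + 0.7 * superficial
print(np.corrcoef(regress_short_separation(long_chan, superficial), neural)[0, 1])
```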

Validation Methodology: Efficacy is typically demonstrated by comparing activation maps with and without SSC regression, measuring reductions in false positive activations, and assessing the specificity of retained neural signals using tasks with well-established hemodynamic response profiles [25].

Visualizing Experimental Workflows and Signaling Pathways

[Workflow diagram (Hardware-Based Motion Artifact Correction): IMU/accelerometer data, short-separation channel recordings, and standard fNIRS signals are time-aligned and synchronized; motion artifacts are then detected and characterized, corrected via adaptive filtering (ANC/ABAMAR), blind source separation (ICA), or short-separation signal regression, and the corrected signal is validated.]

Hardware Correction Workflow

[Signal-pathway diagram (Motion Artifact Signal Pathways in fNIRS): neural activity, motion, and systemic physiology all contribute to the standard channel (25-35 mm source-detector distance); accelerometer/IMU data provide a motion reference and short-separation channels (8-15 mm) provide a superficial-layer reference, which hardware-informed algorithms (adaptive filtering, regression) use to recover the corrected cerebral signal.]

Signal Pathways Diagram

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 3: Essential Materials for Hardware-Based Motion Artifact Research

| Item | Specification | Research Function | Example Applications |
|---|---|---|---|
| Triaxial Accelerometer | Range: ±8g, Sensitivity: 8-16g, Sampling: ≥50Hz [26] | Measures linear acceleration in three dimensions | Head movement detection, artifact reference signal [6] |
| IMU (Inertial Measurement Unit) | 6-axis (accelerometer + gyroscope) or 9-axis (plus magnetometer), Sampling: ≥52Hz [26] [6] | Comprehensive movement capture (acceleration, rotation, orientation) | Complex motion characterization, multi-parameter artifact correction [6] [5] |
| fNIRS System with Auxiliary Inputs | Analog/digital ports for external sensor synchronization, customizable sampling rates | Integration of motion sensor data with hemodynamic measurements | Hardware-based artifact correction implementations [6] |
| Short-Separation Optodes | Source-detector distance: 8-15mm, compatible with standard fNIRS systems | Isolation of superficial hemodynamic fluctuations | Systemic noise regression, scalp blood flow monitoring [25] |
| Motion Capture System | Video-based tracking with computer vision algorithms (e.g., SynergyNet DNN) [5] | Ground-truth movement validation | Method validation, movement parameter quantification [5] |
| Custom Headgear | Secure mounting solutions for sensors and optodes | Stabilization of equipment during movement studies | Motion artifact research in naturalistic settings [5] |

Hardware-based solutions for motion artifact management in fNIRS offer distinct advantages for researchers requiring high data quality in movement-rich environments. Accelerometers provide a cost-effective solution for general motion detection, while IMUs deliver comprehensive kinematic data for complex movement patterns. Short-separation channels address the specific challenge of superficial physiological noise often confounded with motion artifacts.

The selection of appropriate hardware solutions depends on multiple factors including research population, experimental paradigm, and analysis requirements. For pediatric studies or clinical populations with frequent movement, IMU-based systems provide the most robust movement characterization. For studies focusing on hemodynamic specificity, short-separation channels offer unique advantages in disentangling cerebral and extracerebral signals. Combining multiple hardware approaches often yields superior results compared to any single method.

Future research directions should include standardized validation protocols for hardware solutions, improved real-time processing capabilities, and development of integrated systems that seamlessly combine multiple hardware approaches. As fNIRS continues to expand into naturalistic research paradigms, hardware-based motion artifact management will play an increasingly vital role in ensuring data quality and physiological validity.

Functional near-infrared spectroscopy (fNIRS) has emerged as a vital neuroimaging tool, particularly for populations such as infants and children, due to its portability and relative tolerance to movement [2] [27]. However, the signals it acquires are highly susceptible to motion artifacts (MAs), which are among the most significant sources of noise and can severely compromise data quality [2] [6]. These artifacts arise from relative movement between optical sensors (optodes) and the scalp, leading to signal contaminants that can obscure the underlying hemodynamic responses associated with neural activity [15]. The challenge is especially pronounced in pediatric and developmental studies, where participants are naturally more active and data collection times are often limited [2] [27].

To address this problem, numerous software-based algorithmic correction methods have been developed, allowing researchers to salvage otherwise unusable data segments. This guide provides a comparative analysis of three fundamental approaches: Spline Interpolation, Moving Average (MA), and Principal Component Analysis (PCA). The objective is to equip researchers, scientists, and drug development professionals with a clear understanding of these techniques' performance, supported by experimental data and detailed protocols, to inform their analytical choices in motion artifact correction.

Spline Interpolation

The spline interpolation method identifies segments of data contaminated by motion artifacts and models these artifactual periods using a cubic spline. This modeled artifact is then subtracted from the original signal to recover the true physiological data [15] [22]. The process relies on accurate artifact detection, often based on analyzing the moving standard deviation of the signal and setting thresholds for peak identification [22]. Its primary strength lies in effectively correcting baseline shifts and slower, sustained artifacts [22].
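
The sketch below conveys the core of this approach for a single detected segment: a smoothing cubic spline models the slow artifact course, which is subtracted before the segment level is re-anchored to the preceding data. It is not the published spline implementation—the smoothing heuristic, re-anchoring rule, and parameter values are assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def spline_correct_segment(signal, start, stop, fs, smooth=0.99):
    """Model and subtract the slow artifact course inside signal[start:stop]."""
    seg = np.asarray(signal[start:stop], dtype=float)
    t = np.arange(seg.size) / fs
    # Smoothing factor heuristic (assumption); the published method uses a fixed smoothing parameter
    spline = UnivariateSpline(t, seg, k=3, s=smooth * seg.size * seg.var())
    corrected = seg - spline(t)
    # Re-anchor the segment to the level of the data just before the artifact
    anchor = signal[max(0, start - int(fs)):start].mean() if start > 0 else seg.mean()
    out = np.asarray(signal, dtype=float).copy()
    out[start:stop] = corrected + anchor
    return out

fs = 10.0
t = np.arange(0, 60, 1 / fs)
sig = np.sin(2 * np.pi * 0.05 * t)
sig[300:400] += 2.0                                  # simulated baseline-shift artifact
corrected = spline_correct_segment(sig, 300, 400, fs)
print(corrected[295:305].round(2))
```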

Moving Average (MA)

The Moving Average method functions as a high-pass filter, primarily aimed at removing slow drifts from the fNIRS signal [2] [22]. It operates by calculating the average of data points within a sliding window and subtracting this trend from the signal. While effective for slow drifts, it is not typically classified as a dedicated motion correction method like wavelet filtering but is often used in combination with other techniques to improve overall performance [2].
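
A minimal sketch of this detrending step is shown below; the 4-second window is an arbitrary choice for illustration, and in practice the window length is tuned to the drift frequencies of interest.

```python
import numpy as np

def moving_average_detrend(signal, fs, window_s=4.0):
    """High-pass a channel by subtracting a centered moving average (slow-drift removal)."""
    win = int(window_s * fs)
    kernel = np.ones(win) / win
    trend = np.convolve(signal, kernel, mode="same")
    return signal - trend

fs = 10.0
t = np.arange(0, 120, 1 / fs)
drifting = np.sin(2 * np.pi * 0.1 * t) + 0.01 * t     # oscillation plus slow drift
print(np.ptp(moving_average_detrend(drifting, fs)))    # drift largely removed
```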

Principal Component Analysis (PCA)

PCA is a multivariate technique that decomposes multi-channel fNIRS data into a set of orthogonal components ordered by the amount of variance they explain [14]. Since motion artifacts often have large amplitudes, they are likely to be captured in the first few principal components. Correction is achieved by removing these components before reconstructing the signal [15] [14]. A significant advancement is Targeted PCA (tPCA), which applies the PCA filter exclusively to segments pre-identified as containing motion artifacts, thereby reducing the risk of over-correction and preserving more of the physiological signal [27] [14].
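
The following sketch illustrates the targeted-PCA idea on a multi-channel array: PCA (via SVD) is applied only to samples flagged as artifactual, and the leading components are zeroed before reconstruction. The artifact mask, number of removed components, and toy data are illustrative assumptions.

```python
import numpy as np

def tpca_correct(data, artifact_mask, n_remove=1):
    """Targeted PCA: remove the highest-variance components only inside artifact segments.

    data          : (samples, channels) multi-channel fNIRS array
    artifact_mask : boolean array (samples,) flagging motion-contaminated samples
    n_remove      : number of leading principal components to discard
    """
    corrected = np.asarray(data, dtype=float).copy()
    seg = corrected[artifact_mask]
    seg_mean = seg.mean(axis=0)
    # SVD of the centered artifact segment; leading components capture the shared artifact
    U, S, Vt = np.linalg.svd(seg - seg_mean, full_matrices=False)
    S[:n_remove] = 0.0                                 # zero the targeted components
    corrected[artifact_mask] = U @ np.diag(S) @ Vt + seg_mean
    return corrected

data = np.random.randn(500, 36)
data[200:250] += 5.0 * np.random.randn(50, 1)          # shared spike across channels
mask = np.zeros(500, dtype=bool)
mask[200:250] = True
print(tpca_correct(data, mask).std())
```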

The following diagram illustrates the core workflow for a motion artifact correction process that incorporates these methods.

[Workflow diagram: raw fNIRS signal → motion artifact (MA) detection → correction via spline interpolation (models and subtracts the artifact segment), PCA (removes high-variance components), or moving average (high-pass filters slow drifts) → corrected signal and evaluation.]

Performance Comparison in Experimental Settings

Direct comparisons of these techniques across various populations and task paradigms reveal context-dependent performance.

Comparative Performance Table

Table 1: Comparative performance of Spline, Moving Average, and PCA correction methods across key studies.

| Correction Method | Study Population | Task Paradigm | Key Performance Findings | Cited Limitations |
|---|---|---|---|---|
| Spline Interpolation | Adult stroke patients [15] | Resting-state | Produced the largest average reduction in Mean-Squared Error (MSE) (55%). | Requires accurate MA detection; many user-defined parameters [22] [14]. |
| Spline Interpolation | Young children (3-4 years) [27] | Visual working memory | Retained a high number of trials; performed robustly across metrics. | Consistently outperformed by tPCA in head-to-head comparison [27]. |
| Moving Average (MA) | Children (6-12 years) [2] | Language acquisition | Yielded one of the best outcomes according to five predefined metrics. | Serves more as a filter for slow drifts than a dedicated MA corrector [22]. |
| Principal Component Analysis (PCA) | Adult stroke patients [15] | Resting-state | Significantly reduced MSE and increased Contrast-to-Noise Ratio (CNR). | Can over-correct the signal, removing physiological data [14]. |
| Targeted PCA (tPCA) | Young children (3-4 years) [27] | Visual working memory | An effective technique; consistently outperformed spline interpolation. | Performance depends on multiple user-set parameters for MA detection [14]. |

  • Study with Children (Ages 6-12): A direct comparison of six techniques applied to data from a language acquisition task found that Moving Average (MA) and Wavelet methods yielded the best outcomes. In this specific pediatric context, these methods outperformed Spline Interpolation, PCA, and other approaches [2].
  • Study with Young Children (Ages 3-4): Research on a visual working memory paradigm concluded that tPCA was the most effective technique for correcting motion artifacts, consistently outperforming Spline interpolation and other methods. Both tPCA and Spline were noted for retaining a high number of trials, which is critical in studies with limited data [27].
  • Study with Adult Patients: A systematic comparison using data from stroke patients found that Spline interpolation produced the largest reduction in mean-squared error (MSE), while Wavelet analysis produced the greatest increase in contrast-to-noise ratio (CNR). Both methods, along with PCA, performed significantly better than no correction or trial rejection [15].

Detailed Experimental Protocols from Key Studies

To ensure reproducibility and provide context for the data in the comparison table, this section outlines the methodologies of two pivotal studies.

Protocol 1: fNIRS during Language Task in Children

  • Objective: To compare the efficacy of six motion artifact correction techniques (including Spline, MA, and PCA) on fNIRS data from children [2].
  • Participants: Twelve children (eight females, age range: 6.8 to 12.6 years) [2].
  • Task: An auditory grammatical judgment language task based on the Test of Early Grammatical Impairment. Children listened to sentences and judged whether they were correct or contained mistakes, pressing a button to respond. The task was a rapid event-related design with 60 trials and a total duration of approximately 7.6 minutes [2].
  • fNIRS Recording: A TechEN-CW6 system was used with one emitter and three detectors spaced 2.7 cm apart, placed over the left inferior frontal gyrus (a language-related area). Data was sampled at 10 Hz [2].
  • Data Processing: Data was processed using the Homer2 fNIRS package. Motion artifacts were categorized into four distinct types (spikes, peaks, gentle slopes, and slow baseline shifts) to better evaluate correction efficacy. The performance of each algorithm was evaluated based on five predefined metrics [2].

Protocol 2: fNIRS during Working Memory Task in Young Children

  • Objective: To evaluate motion correction algorithms, including tPCA and Spline, with fNIRS data from young children performing a cognitive task [27].
  • Participants: Twenty-five young children (11 aged 3.5 years and 14 aged 4.5 years) [27].
  • Task: A visual working memory change detection task. Children were shown an array of 1-3 colored squares, followed by a delay and a test array. They had to indicate if all objects matched the memory array or if one had changed color. The task involved multiple trials with variable inter-trial intervals [27].
  • fNIRS Recording: Data was collected at 50 Hz using a TechEn CW6 system. A cap with 12 sources and 20 detectors was used, creating 36 channels covering frontal, temporal, and parietal cortices [27].
  • Data Processing: Motion correction algorithms were compared using metrics related to the physiological properties of the recovered hemodynamic response function (HRF). The study quantitatively compared how well each technique preserved the expected HRF shape and retained usable trials [27].

The Scientist's Toolkit: Essential Research Reagents & Materials

Table 2: Key materials and software tools used in fNIRS motion artifact research.

| Item Name | Function/Application | Example in Cited Research |
|---|---|---|
| TechEN CW6 fNIRS System | A continuous-wave fNIRS instrument for measuring changes in oxy- and deoxy-hemoglobin concentrations. | Used in multiple studies for data acquisition [2] [15] [27]. |
| Homer2 / Homer3 Software Package | An open-source MATLAB toolbox for fNIRS data visualization and processing, including implementation of major MA correction algorithms. | Used as the primary data processing platform [2] [27] [14]. |
| fNIRS Optode Caps | Headgear to hold sources and detectors in place on the scalp. Custom caps are often made for different head sizes to improve stability. | Used with foam and optode holders; a wrapping band was added for security [2]. |
| E-Prime Software | A tool for designing and running experimental paradigms, presenting stimuli, and recording behavioral responses. | Used to present the grammatical judgment and visual working memory tasks [2] [27]. |
| MATLAB | A high-level programming and numerical computing platform used as the base for running analysis toolboxes like Homer2/3. | The core environment for data analysis and algorithm execution [2] [14]. |

The comparative analysis of Spline Interpolation, Moving Average, and Principal Component Analysis reveals that there is no single "best" motion artifact correction technique universally applicable to all fNIRS studies. The optimal choice is highly dependent on the research context, including the participant population, the nature of the experimental task, and the specific types of motion artifacts present.

  • Spline Interpolation excels at correcting baseline shifts and is a robust, well-understood method, though its performance depends on accurate motion detection.
  • Moving Average is effective as a filter for slow drifts and has shown superior performance in certain pediatric language studies.
  • PCA, particularly its targeted variant (tPCA), offers powerful correction by leveraging multi-channel data and has demonstrated leading performance in studies with young children, minimizing the data loss that is a critical concern in developmental research.

Researchers must consider their specific constraints and goals—whether prioritizing the retention of trials, the accuracy of the recovered hemodynamic response, or the minimization of specific artifact types—when selecting an algorithmic approach for motion artifact correction.

Motion artifacts represent a significant source of noise in functional near-infrared spectroscopy (fNIRS) data, particularly in experiments involving pediatric populations, clinical patients, or any paradigm where subject movement is likely [2] [28]. While hardware-based solutions exist, algorithmic corrections offer a versatile and widely applicable approach for mitigating these artifacts without modifying experimental setups. This guide provides an objective comparison of three prominent software-based motion artifact correction techniques: Wavelet Transform, Kalman Filtering, and Correlation-Based Signal Improvement (CBSI). The performance of these methods is evaluated within the critical context of fNIRS research, with a focus on empirical findings and practical implementation for researchers and scientists.

Wavelet Transform

Method Overview: The Wavelet Transform method decomposes a signal into wavelets—localized waveforms of limited duration. The core principle involves performing a Discrete Wavelet Transform (DWT) to generate wavelet coefficients for different frequency bands [29]. Artifacts are identified as coefficients that are statistical outliers within their respective distributions. These outlier coefficients are then "zeroed" or thresholded, and the signal is reconstructed via an inverse wavelet transform, effectively removing the artifact [28] [29]. An advanced variant, kurtosis-based Wavelet Filtering (kbWF), uses the fourth moment (kurtosis) of the wavelet coefficient distribution to more diagnostically identify outliers, offering improved performance, especially with high-SNR signals [29].
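
A minimal sketch of coefficient-domain despiking with PyWavelets is shown below. It replaces the probability- and kurtosis-based outlier criteria of the published methods with a simple interquartile-range rule, so the wavelet choice, decomposition level, and threshold factor are illustrative only.

```python
import numpy as np
import pywt

def wavelet_despike(signal, wavelet="db2", level=4, iqr_factor=1.5):
    """Zero outlying detail coefficients and reconstruct the signal."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    cleaned = [coeffs[0]]                              # keep the approximation untouched
    for detail in coeffs[1:]:
        q1, q3 = np.percentile(detail, [25, 75])
        lo, hi = q1 - iqr_factor * (q3 - q1), q3 + iqr_factor * (q3 - q1)
        cleaned.append(np.where((detail < lo) | (detail > hi), 0.0, detail))
    return pywt.waverec(cleaned, wavelet)[: len(signal)]

fs = 10.0
t = np.arange(0, 60, 1 / fs)
sig = np.sin(2 * np.pi * 0.05 * t)
sig[300] += 5.0                                        # spike artifact
print(np.abs(wavelet_despike(sig))[295:305].round(2))
```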

Typical Experimental Protocol:

  • Data Acquisition: fNIRS signals are collected during a task or resting state. For validation, a simulated hemodynamic response function (HRF) is often added to real resting-state data containing known motion artifacts [29].
  • Signal Decomposition: The signal is decomposed into multiple frequency scales using the DWT [29].
  • Artifact Identification: Wavelet coefficients are analyzed for each frequency scale. The kbWF method calculates the kurtosis of the coefficient distribution; a super-Gaussian distribution (kurtosis > 3) indicates contamination. The most extreme coefficients are iteratively identified and removed [29].
  • Signal Reconstruction: The corrected signal is reconstructed using the inverse DWT.

Kalman Filtering

Method Overview: Kalman Filtering is a recursive algorithm that estimates the state of a dynamic system from a series of incomplete and noisy measurements. In fNIRS, it is used to predict the "true" hemodynamic state by modeling the underlying physiological processes and noise characteristics [30]. Recent implementations have been enhanced by integrating multimodal regressors, such as signals from accelerometers and short-separation channels, optimized using time-embedded Canonical Correlation Analysis (tCCA) to account for non-instantaneous coupling between signals [30]. This makes it particularly suited for real-time applications like brain-computer interfaces (BCIs).
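
The sketch below conveys the recursive structure of such a filter in a simplified scalar form: the hidden hemodynamic state and the weights of the auxiliary regressors are tracked jointly as a random walk, and each new sample updates both. It does not reproduce the tCCA-optimized regressors or the tuning procedure of the cited work; the noise variances and toy signals are assumptions.

```python
import numpy as np

def kalman_regression(y, aux, q_state=1e-4, q_weight=1e-6, r_obs=1e-2):
    """Recursively estimate a hemodynamic state while regressing out auxiliary signals.

    y   : 1-D fNIRS observation
    aux : (samples, n_regressors) auxiliary signals (e.g., short-separation, accelerometer)
    Returns the estimated 'clean' state time course.
    """
    n, k = aux.shape
    x = np.zeros(k + 1)                               # [brain_state, regressor_weights]
    P = np.eye(k + 1)
    Q = np.diag([q_state] + [q_weight] * k)           # process noise (random-walk model)
    est = np.zeros(n)
    for i in range(n):
        P = P + Q                                     # predict (identity dynamics)
        H = np.concatenate(([1.0], aux[i]))           # observation model: state + regressors
        S = H @ P @ H + r_obs
        K = P @ H / S                                 # Kalman gain
        x = x + K * (y[i] - H @ x)                    # update with the innovation
        P = P - np.outer(K, H) @ P
        est[i] = x[0]
    return est

# Toy usage: slow "brain" sinusoid plus a cardiac-like nuisance carried by the auxiliary channel
t = np.linspace(0, 60, 600)
aux = np.column_stack([np.sin(2 * np.pi * 1.0 * t)])
y = 0.5 * np.sin(2 * np.pi * 0.05 * t) + 0.8 * aux[:, 0] + 0.05 * np.random.randn(600)
print(kalman_regression(y, aux)[:5])
```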

Typical Experimental Protocol:

  • Baseline Recording: A short resting-state fNIRS dataset is acquired alongside auxiliary signals (e.g., short-separation channels, accelerometer) to tune the filter parameters [30].
  • Model Definition: A state-space model is established, defining the relationship between the observed fNIRS signals, the hidden hemodynamic state, and the nuisance regressors [30].
  • Real-Time Processing: The Kalman filter processes data sample-by-sample. It first predicts the current state based on the previous state and then updates this prediction using the new measurement, recursively producing an optimal estimate of the brain activity while regressing out confounds [30].

Correlation-Based Signal Improvement (CBSI)

Method Overview: CBSI is a lightweight method based on a physiological assumption: functionally evoked changes in oxy-hemoglobin (HbO) and deoxy-hemoglobin (HbR) are negatively correlated, while motion artifacts typically induce positively correlated changes in both chromophores [28]. The algorithm leverages this anti-correlation to suppress artifacts. The corrected HbO and HbR signals are calculated as a linear combination of the original signals, enhancing the negative correlation between them [28].
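
A minimal sketch of the standard CBSI computation is given below; the toy chromophore signals and the shared artifact bump are illustrative, and the method's core assumption (negative HbO/HbR correlation for true activity) is built directly into the formula.

```python
import numpy as np

def cbsi(hbo, hbr):
    """Correlation-based signal improvement.

    Assumes true HbO/HbR are negatively correlated while artifacts are positively
    correlated in both chromophores. Returns corrected (HbO, HbR).
    """
    alpha = np.std(hbo) / np.std(hbr)
    hbo_c = (hbo - alpha * hbr) / 2.0
    hbr_c = -hbo_c / alpha
    return hbo_c, hbr_c

# Toy example: a shared positive artifact added to anti-correlated chromophores
t = np.linspace(0, 30, 300)
neural = np.sin(2 * np.pi * 0.1 * t)
artifact = np.exp(-((t - 15) ** 2))                   # positively correlated bump
hbo, hbr = neural + artifact, -0.4 * neural + artifact
hbo_c, hbr_c = cbsi(hbo, hbr)
print(np.corrcoef(hbo_c, hbr_c)[0, 1])                 # -1 by construction
```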

Typical Experimental Protocol:

  • Data Input: Simultaneously measured HbO and HbR time-series data are used as input [28].
  • Calculation of Corrected Signals: The corrected signals are computed directly using the CBSI algorithm. The corrected HbO signal is derived from the standard deviations and correlation of the original HbO and HbR signals, and the corrected HbR is set to be proportional to the negative of the corrected HbO [28].
  • Output: The method outputs the artifact-corrected HbO and HbR time-series.

Performance Comparison

The following table summarizes key performance metrics and characteristics of the three methods, synthesized from comparative studies.

Table 1: Performance Comparison of Motion Artifact Correction Methods

| Metric | Wavelet Transform | Kalman Filtering | CBSI |
|---|---|---|---|
| Corrected Signal Characteristics | Effectively removes transient spikes; preserves signal shape [28]. | Provides high contrast-to-noise ratio; effective in real-time regression [30]. | Enforces negative correlation between HbO and HbR; may distort true HbR dynamics [28]. |
| Best Suited Artifact Type | Broad efficacy; particularly powerful for task-correlated, low-frequency artifacts [28]. | Physiological confounds and motion artifacts, especially with auxiliary signals [30]. | Artifacts causing positively correlated HbO/HbR changes [28]. |
| Computational Load | Moderately computationally intensive due to multi-scale decomposition [29]. | Recursive and efficient for real-time use; requires initial tuning [30]. | Very low computational load; simple calculation [28]. |
| Key Advantages | Does not assume spatial artifact propagation; handles a broad frequency range [29]. | Adapts dynamically; integrates multimodal data; suitable for online processing [30]. | Simple, parameter-light; requires no auxiliary hardware [28]. |
| Key Limitations | Less effective for very slow baseline shifts; kbWF is iterative [29]. | Performance depends on accurate model and tuning [30]. | Relies on a strong physiological assumption which may not always hold [28]. |

Experimental Data and Validation

To quantitatively compare the methods, studies often use real fNIRS data with simulated motion artifacts and a known, added hemodynamic response. This allows for the calculation of objective metrics like Signal-to-Noise Ratio (SNR) improvement.

Table 2: Quantitative Performance Data from Experimental Validations

| Study & Method | Key Performance Findings | Validation Methodology |
|---|---|---|
| Kurtosis-based Wavelet (kbWF) [29] | Yielded results with higher SNR than other existing methods (PCA, Spline, standard WF) across a wide range of signal and noise amplitudes. | Simulated functional HRFs added to real resting-state fNIRS recordings corrupted by movement artifacts. |
| Kalman Filter with tCCA [30] | Achieved a two-order-of-magnitude decrease in cardiac signal power and a sixfold increase in contrast-to-noise ratio vs. non-regressed signals. | Testing on a finger-tapping dataset for left/right classification; also used resting data augmented with synthetic HRFs. |
| CBSI [28] | Improved metrics compared to no correction; however, its performance was less consistent than wavelet filtering in recovering physiological hemodynamic responses. | Applied to real cognitive data (linguistic task) containing task-related, low-frequency artifacts. Metrics based on physiologically plausible HRF properties. |
| Wavelet Filtering [28] | Corrected artifacts in 93% of cases where they were present; deemed the most effective technique for the cognitive data tested. | Comparison of multiple techniques on real functional data using metrics like AUC and within-subject standard deviation of the HRF. |

The Scientist's Toolkit

Table 3: Essential Research Reagents and Tools for fNIRS Motion Correction Research

| Tool / Reagent | Function in Research |
|---|---|
| Homer2 Software Package [2] | A standard fNIRS processing toolbox used for implementing and testing various motion correction algorithms, including conversion of raw intensity to optical density. |
| MATLAB [2] [31] | The primary programming environment used for developing custom motion artifact correction algorithms, signal processing, and data analysis. |
| Accelerometer [10] [30] | Auxiliary hardware integrated into the fNIRS cap to provide a reference signal correlated with head motion, used for regression in methods like Kalman filtering. |
| Short-Separation Channels [30] | fNIRS detectors placed very close (~8 mm) to the source, sensitive primarily to extracerebral physiology and scalp hemodynamics, used as nuisance regressors. |
| Infrared Thermography (IRT) Camera [31] [32] | A contactless method for tracking optode movement via video, providing a reference signal for artifact correction without adding physical hardware to the subject. |
| Synthetic Hemodynamic Response (HRF) [29] [30] | A simulated brain activation signal added to resting-state data, enabling quantitative validation of correction algorithms by comparing the recovered signal to the known ground truth. |

Workflow and Signaling Pathways

The following diagram illustrates the logical workflow and decision-making process for selecting and applying these motion artifact correction methods, based on experimental objectives and constraints.

[Decision diagram: starting from fNIRS data with motion artifacts, if real-time processing is required and auxiliary signals are available → Kalman filtering with tCCA regressors; if computational simplicity is critical → CBSI; for task-correlated or low-frequency artifacts, and as the recommended primary offline choice → wavelet transform (e.g., kbWF); each path yields the corrected fNIRS signal.]

Figure 1: Decision workflow for selecting motion artifact correction methods

The selection of an optimal motion artifact correction method depends heavily on the specific research context. Wavelet Transform, particularly the kurtosis-based variant, has been consistently validated as a powerful and robust choice for offline analysis, demonstrating superior performance in handling challenging, task-correlated artifacts [28] [29]. Kalman Filtering is the premier option for real-time applications such as BCIs, especially when enhanced with multimodal regressors to account for physiological and motion confounds [30]. CBSI serves as a useful, parameter-light tool for quick preliminary analysis or in situations where computational resources are severely limited, though researchers should be cautious of its underlying physiological assumptions [28]. Ultimately, these algorithmic corrections are an essential component of the fNIRS processing pipeline, ensuring that the interpreted brain signals reflect true cortical activity rather than movement-induced noise.

Motion artifacts (MAs) remain a significant challenge in functional near-infrared spectroscopy (fNIRS), often compromising data quality and interpretation. While numerous individual correction algorithms exist, each possesses inherent strengths and weaknesses; no single method addresses the full spectrum of artifact types effectively. This limitation has catalyzed the development of advanced hybrid approaches that strategically combine multiple algorithms to achieve superior correction performance. By integrating complementary techniques, these methods target diverse artifact characteristics—from high-frequency spikes to slow baseline shifts—offering a more robust solution for cleaning fNIRS signals in real-world research scenarios. This guide compares the performance, experimental protocols, and practical implementation of these emerging hybrid methodologies.

The Rationale for Hybrid Correction Methods

Motion artifacts in fNIRS are heterogeneous, manifesting as sudden spikes, sustained oscillations, and baseline shifts of varying durations and amplitudes [6] [20]. This diversity stems from different types of head movements, such as nodding, shaking, or tilting, which cause optode displacement and disrupt scalp coupling [6] [5]. The frequency and amplitude characteristics of these artifacts often overlap with the hemodynamic response function (HRF), making simple filtering ineffective [14].

Single correction algorithms excel in specific niches but struggle with others. For instance, wavelet-based methods effectively handle high-frequency spikes and slight oscillations but perform poorly against baseline shifts [20] [33]. Conversely, spline interpolation effectively corrects baseline shifts and severe oscillations but cannot address high-frequency spikes [20]. This complementary efficacy provides the foundational rationale for hybrid approaches, which sequentially apply specialized algorithms to target different artifact categories within the same signal [20] [14].

Prominent Hybrid Approaches and Methodologies

WCBSI: Wavelet and Correlation-Based Signal Improvement

The WCBSI algorithm integrates wavelet filtering with correlation-based signal improvement (CBSI) in a sequential pipeline [14]. Wavelet filtering first decomposes the signal using a discrete wavelet transform, identifies and thresholds coefficients contaminated by motion artifacts, then reconstructs a partially cleaned signal. The CBSI component subsequently exploits the physiological principle that HbO and HbR concentrations are typically negatively correlated during genuine brain activity, whereas motion artifacts often induce positive correlations [14] [33]. The combined approach leverages wavelet's strength in removing spike-like artifacts while using CBSI to address residual artifacts and enhance the anti-correlation between HbO and HbR signals.

Hybrid Spline-Wavelet and CBSI Approach

Another sophisticated framework employs a categorization-driven strategy [20]. Artifacts are first detected and classified into three distinct categories:

  • Severe oscillations corrected using cubic spline interpolation
  • Baseline shifts removed via spline interpolation
  • Slight oscillations reduced through a dual-threshold wavelet-based method

This method intelligently applies the most suitable algorithm to each artifact type, preventing the limitations of one method from compromising overall correction efficacy. The workflow ensures that spline interpolation handles slow drifts and baseline shifts, while wavelet filtering targets high-frequency components, followed by correlation-based methods to refine the final output.

Table 1: Comparison of Key Hybrid Motion Artifact Correction Approaches

| Method Name | Component Algorithms | Targeted Artifacts | Key Advantages |
|---|---|---|---|
| WCBSI [14] | Wavelet Filtering + Correlation-Based Signal Improvement | Spikes, slight oscillations, correlated artifacts in HbO/HbR | Fully automated; enhances negative HbO/HbR correlation; handles multiple artifact types simultaneously |
| Hybrid Spline-Wavelet & CBSI [20] | Spline Interpolation + Wavelet Filtering + CBSI | Severe oscillations, baseline shifts, slight oscillations | Category-specific correction; robust against diverse artifact types; improves signal stability |
| Spline & Wavelet Combination [2] | Spline Interpolation + Wavelet Filtering | Baseline shifts, spikes | Leverages spline for slow drifts and wavelet for fast spikes; proven effective in pediatric data |

Experimental Performance and Comparative Data

Rigorous validation studies demonstrate that hybrid methods consistently outperform individual correction algorithms across multiple metrics. In a comprehensive comparison evaluating eight different techniques, the WCBSI approach was the only one to exceed average performance across all quality measures, with a 78.8% probability of being ranked as the best-performing algorithm [14].

Table 2: Quantitative Performance Comparison of Motion Artifact Correction Methods

| Correction Method | Signal-to-Noise Ratio (SNR) Improvement | Pearson Correlation Coefficient (R) | Root Mean Square Error (RMSE) | Mean Absolute Percentage Error (MAPE) | ΔAUC |
|---|---|---|---|---|---|
| WCBSI [14] | Significant | High (superior) | Low (superior) | Low (superior) | Minimal |
| Spline Interpolation [20] [2] | Moderate | Moderate | Moderate | Moderate | Moderate |
| Wavelet Filtering [2] [33] | Moderate-High | Moderate-High | Moderate | Moderate | Low |
| TDDR [33] | High | High | Low | Low | Minimal |
| CBSI Alone [14] | Moderate | Moderate | Moderate | Moderate | Moderate |
| PCA/tPCA [14] | Low-Moderate | Low-Moderate | High | High | Significant |

When applied to fNIRS data acquired during whole-night sleep monitoring, the hybrid spline-wavelet-CBSI approach showed significant improvements in both SNR and Pearson's correlation coefficient (R) with strong stability compared to individual methods [20]. Similarly, in functional connectivity analysis, hybrid methods incorporating wavelet filtering demonstrated superior denoising capability and enhanced recovery of original connectivity patterns [33].

Experimental Protocols and Implementation

Workflow for Hybrid Correction Methods

The generalized workflow for implementing hybrid correction approaches involves sequential processing stages that leverage the strengths of each component algorithm. The following diagram illustrates this pipeline:

[Workflow diagram (Hybrid Motion Artifact Correction): raw fNIRS signal → artifact detection and categorization → severe-oscillation correction, baseline-shift correction, and slight-oscillation correction applied in sequence → corrected fNIRS signal.]

Implementation Protocols

WCBSI Protocol [14]:

  • Data Preparation: Convert raw light intensity to optical density using the modified Beer-Lambert law
  • Wavelet Processing:
    • Apply discrete wavelet transform (e.g., using Daubechies wavelet)
    • Identify artifact-contaminated coefficients using probability distribution analysis
    • Apply thresholding to contaminated coefficients
    • Reconstruct signal using inverse wavelet transform
  • CBSI Processing:
    • Calculate the standard deviation ratio: α = std(HbO)/std(HbR)
    • Compute the corrected signal: HbO_corrected = (HbO − α·HbR)/2
    • Derive HbR_corrected as −HbO_corrected/α, enforcing the negative correlation principle
  • Output: Final corrected HbO and HbR time series

Hybrid Spline-Wavelet-CBSI Protocol [20]:

  • Artifact Detection:
    • Compute moving standard deviation (MSD) with a sliding window (e.g., W = 2k+1, where k = 3×Fs)
    • Identify artifact segments using adaptive thresholding based on sorted MSD (a minimal sketch of this detection step follows this protocol)
  • Artifact Categorization:
    • Classify artifacts as baseline shifts, severe oscillations, or slight oscillations based on amplitude and duration characteristics
  • Category-Specific Correction:
    • Apply cubic spline interpolation to severe oscillations and baseline shifts
    • Use dual-threshold wavelet method for slight oscillations
  • Final Enhancement: Apply CBSI to further improve HbO-HbR correlation
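
The sketch below implements the MSD detection step referenced above. The window follows the W = 2k+1 convention with k = 3×Fs, but the thresholding—a fixed high quantile of the sorted MSD—is a simplification of the adaptive rule used in the cited study, and the subsequent categorization and category-specific correction are not shown.

```python
import numpy as np

def detect_artifacts_msd(signal, fs, k_factor=3, quantile=0.95):
    """Flag motion-contaminated samples from the moving standard deviation (MSD).

    Window size W = 2k+1 with k = k_factor*fs, per the protocol above; the quantile
    threshold is an illustrative assumption rather than the study's adaptive rule.
    """
    k = int(k_factor * fs)
    w = 2 * k + 1
    padded = np.pad(np.asarray(signal, dtype=float), k, mode="edge")
    msd = np.array([padded[i:i + w].std() for i in range(len(signal))])
    threshold = np.quantile(np.sort(msd), quantile)
    return msd > threshold, msd

fs = 10.0
t = np.arange(0, 120, 1 / fs)
sig = np.sin(2 * np.pi * 0.05 * t)
sig[600:650] += np.linspace(0, 3, 50)                  # simulated baseline-shift artifact
mask, msd = detect_artifacts_msd(sig, fs)
print("flagged samples:", mask.sum())
```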

Table 3: Essential Research Tools for Implementing Hybrid Motion Artifact Correction

| Tool/Resource | Type | Function | Implementation Platform |
|---|---|---|---|
| HOMER2/HOMER3 [2] [14] | Software Toolbox | Provides implemented algorithms for spline, wavelet, CBSI, PCA, and tPCA | MATLAB |
| Moving Standard Deviation (MSD) [20] | Detection Algorithm | Identifies motion-contaminated segments based on signal variability | Custom implementation in MATLAB or Python |
| Discrete Wavelet Transform [20] [14] | Processing Algorithm | Decomposes signals for artifact identification and removal | MATLAB Wavelet Toolbox, PyWavelets |
| Cubic Spline Interpolation [20] | Correction Algorithm | Models and subtracts low-frequency baseline shifts and severe oscillations | Various programming languages |
| Accelerometer/IMU Data [6] [5] | Hardware Supplement | Provides ground-truth movement information for validation | Integrated with fNIRS systems |
| Computer Vision Systems [5] | Validation Tool | Tracks head movements and orientation changes for artifact characterization | External camera systems with deep learning (e.g., SynergyNet) |

Hybrid motion artifact correction approaches represent a significant advancement in fNIRS signal processing, offering researchers powerful tools to enhance data quality in movement-prone experimental paradigms. By strategically combining complementary algorithms like wavelet filtering, spline interpolation, and correlation-based methods, these techniques address the fundamental challenge of artifact diversity more effectively than any single method. The experimental evidence consistently demonstrates superior performance across multiple metrics, including SNR improvement, correlation with ground truth, and error reduction. While implementation complexity increases with hybrid methods, the substantial gains in signal fidelity justify their adoption, particularly in challenging research contexts involving pediatric populations, clinical patients, or naturalistic study designs. As the field progresses, further refinement of these hybrid frameworks—potentially incorporating deep learning elements and real-time processing capabilities—will continue to enhance their utility and accessibility for the research community.

In functional near-infrared spectroscopy (fNIRS) research, the robust estimation of evoked brain activity is critically dependent on the effective reduction of nuisance signals originating from systemic physiology and motion. The current best practice for addressing this challenge incorporates short-separation (SS) fNIRS measurements as regressors in a General Linear Model (GLM). However, this approach fails to fully address several challenging signal characteristics, including non-instantaneous and non-constant coupling, and does not optimally exploit additional auxiliary signals [34]. The integration of auxiliary data represents a methodological frontier in fNIRS analysis, particularly for applications requiring single-trial analysis such as brain-computer interfaces (BCI) and neuroergonomics.

Building upon recent advancements in unsupervised multivariate analysis of fNIRS signals using Blind Source Separation (BSS) methods, researchers have developed an extension of the GLM that incorporates regularized temporally embedded Canonical Correlation Analysis (tCCA). This innovative approach allows flexible integration of any number of auxiliary modalities and signals, providing a sophisticated framework for physiological noise regression that significantly outperforms conventional methods [34]. The development of this methodology addresses a critical gap in fNIRS analysis, where confounder correction has historically remained limited to basic filtering or motion removal, especially when compared to the more robust artifact handling commonly implemented in electroencephalography (EEG) studies [35].

This article examines the role of GLM with tCCA in the broader context of motion artifact removal evaluation metrics for fNIRS research, providing a comprehensive comparison of its performance against established alternative techniques. By synthesizing evidence from multiple experimental studies, we aim to establish a reference framework for researchers seeking to optimize their fNIRS preprocessing pipelines, particularly for applications requiring high contrast-to-noise ratio in real-world environments.

Methodological Framework

Theoretical Foundations of GLM with tCCA

The GLM with temporally embedded Canonical Correlation Analysis represents a significant evolution in fNIRS noise regression methodology. At its core, this approach combines the well-established theoretical framework of the General Linear Model—widely used in neuroimaging for its ability to statistically model hemodynamic responses—with the multivariate correlation analysis capabilities of CCA. The temporal embedding aspect addresses a key limitation of previous methods by accounting for non-instantaneous and non-constant coupling between physiological nuisance signals and brain activity [34].

The mathematical foundation of this method involves creating optimal nuisance regressors through canonical correlation analysis between the fNIRS signals and available auxiliary measurements. Unlike conventional GLM with short-separation regression, which assumes a fixed relationship between SS channels and long-separation measurements, tCCA adaptively determines the optimal combination of auxiliary signals to remove physiological noise while preserving neuronal activity. This is particularly valuable given the complex nature of physiological noise in fNIRS, which includes cardiac, respiratory, blood pressure oscillations, and motion artifacts that manifest with distinct temporal, spatial, and amplitude characteristics [35] [36].

The procedure involves several key steps: temporal embedding of both fNIRS and auxiliary signals to capture delayed relationships, computation of canonical components that maximize correlation between signal sets, regularized selection of relevant components, and finally construction of nuisance regressors that are incorporated into the GLM framework. This integrated approach simultaneously estimates evoked hemodynamic responses while filtering confounding signals, resulting in significantly improved contrast-to-noise ratio for single-trial analysis [34] [36].
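
To make the embedding-plus-CCA step concrete, the following minimal Python sketch builds time-lagged copies of auxiliary signals and extracts canonical components that can serve as GLM nuisance regressors. It assumes NumPy and scikit-learn, omits the regularization and component-selection logic of the published method, and uses illustrative helper names (temporal_embedding, tcca_nuisance_regressors) and simulated data rather than the authors' implementation.

```python
# Minimal sketch of temporally embedded CCA for nuisance-regressor construction.
# Illustrative only: the lag grid, component count, and simulated data are
# assumptions; the published method additionally regularizes and selects components.
import numpy as np
from sklearn.cross_decomposition import CCA

def temporal_embedding(aux, lags):
    """Stack time-lagged copies of auxiliary signals (n_samples x n_channels)."""
    n, c = aux.shape
    cols = []
    for lag in lags:
        shifted = np.zeros((n, c))
        if lag > 0:
            shifted[lag:] = aux[:-lag]
        else:
            shifted[:] = aux
        cols.append(shifted)
    return np.hstack(cols)

def tcca_nuisance_regressors(fnirs, aux, lags, n_components=2):
    """Canonical components of the embedded auxiliaries that correlate maximally
    with the fNIRS channels; these become nuisance regressors in the GLM."""
    emb = temporal_embedding(aux, lags)
    cca = CCA(n_components=n_components, max_iter=1000)
    cca.fit(emb, fnirs)
    return cca.transform(emb)          # shape: (n_samples, n_components)

# Example with simulated data: 10 long channels, 4 auxiliary signals, ~7.8 Hz sampling
rng = np.random.default_rng(0)
fs = 7.8
fnirs = rng.standard_normal((2000, 10))
aux = rng.standard_normal((2000, 4))   # e.g., short-separation + accelerometer axes
lags = range(0, int(3 * fs), 2)        # embed roughly 3 s of delays in 2-sample steps
print(tcca_nuisance_regressors(fnirs, aux, lags).shape)   # (2000, 2)
```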

Experimental Protocols for Method Evaluation

The evaluation of GLM with tCCA against alternative methods follows rigorous experimental protocols designed to quantify performance improvements under controlled conditions. Most studies employ a combination of simulated ground truth data and real experimental measurements to comprehensively assess method performance [34] [37].

In typical simulation protocols, resting-state fNIRS data is augmented with synthetic hemodynamic response functions (HRFs) at known intervals, creating a ground truth benchmark. The added HRFs are typically spaced by random intervals with a mean of 21 seconds and standard deviation of 3 seconds, providing multiple repetitions for statistical analysis [37]. Performance is then quantified using metrics such as Pearson's correlation coefficient between recovered and ground truth HRFs, root mean square error (RMSE), F-score, and p-value significance testing [34].
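
As a hedged illustration of this augmentation step, the Python sketch below adds a simple double-gamma HRF to a single-channel resting-state trace at randomized onsets (mean inter-stimulus interval of about 21 s, SD about 3 s) and scores recovery with Pearson's R and RMSE. The HRF shape, amplitude, and function names are assumptions for demonstration, not the exact protocol of the cited studies.

```python
# Sketch of semisynthetic benchmarking: synthetic HRFs added to resting-state data
# at randomized onsets, with recovery scored by Pearson's R and RMSE.
# The double-gamma parameters and amplitude are illustrative assumptions.
import numpy as np
from scipy.stats import gamma, pearsonr

def simple_hrf(fs, duration=20.0):
    t = np.arange(0, duration, 1.0 / fs)
    return gamma.pdf(t, 6) - 0.35 * gamma.pdf(t, 16)     # basic double-gamma shape

def add_synthetic_hrfs(resting, fs, amplitude=1.0, mean_isi=21.0, sd_isi=3.0, seed=0):
    """Return the augmented trace, the onset indices, and the ground-truth HRF."""
    rng = np.random.default_rng(seed)
    hrf = amplitude * simple_hrf(fs)
    augmented, onsets = resting.copy(), []
    t = rng.normal(mean_isi, sd_isi)                      # first onset (seconds)
    while t * fs + len(hrf) < len(resting):
        idx = int(t * fs)
        augmented[idx:idx + len(hrf)] += hrf
        onsets.append(idx)
        t += rng.normal(mean_isi, sd_isi)
    return augmented, onsets, hrf

def score_recovery(recovered_hrf, true_hrf):
    """Pearson's R and RMSE between recovered and ground-truth HRFs."""
    r, _ = pearsonr(recovered_hrf, true_hrf)
    rmse = np.sqrt(np.mean((recovered_hrf - true_hrf) ** 2))
    return r, rmse
```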

For real data validation, researchers often employ paradigms that elicit known physiological artifacts, such as language tasks involving vocalization that produce jaw movement artifacts, or motor tasks with controlled head movements. These artifacts are particularly challenging because they often correlate temporally with the expected hemodynamic response [38] [3]. Complementary hardware such as accelerometers, inertial measurement units (IMUs), and short-separation channels is frequently used to provide auxiliary signals for methods like tCCA that can incorporate multiple data sources [6] [14].

Table 1: Standard Performance Metrics for fNIRS Noise Correction Evaluation

Metric Calculation Interpretation Optimal Value
Correlation (R) Pearson's R between recovered and ground truth HRF Similarity in shape and timing Closer to +1
Root Mean Square Error (RMSE) √[Σ(estimated - actual)²/n] Magnitude of estimation error Closer to 0
F-Score Harmonic mean of precision and recall Balance of true positive rate and false discovery rate Higher values
Contrast-to-Noise Ratio (CNR) Signal amplitude relative to noise floor Detectability of evoked responses Higher values
Power Spectral Density Frequency distribution of residual noise Effectiveness of physiological noise removal Reduced in cardiac/respiratory bands

Comparative Performance Analysis

Quantitative Comparison Against Conventional Methods

The performance advantages of GLM with tCCA become evident when examining quantitative metrics from controlled studies. When compared to conventional GLM with short-separation regression, the tCCA extension demonstrates statistically significant improvements across all standard metrics for both oxygenated hemoglobin (HbO) and deoxygenated hemoglobin (HbR) recovery [34].

For HbO signals, the method achieves a maximum improvement of +45% in correlation with the ground truth HRF while simultaneously reducing root mean square error by up to 55%. The F-score, which balances precision and recall in activation detection, shows particularly dramatic improvement with increases up to 3.25-fold compared to conventional approaches. These improvements are especially pronounced in challenging low-contrast scenarios and when few stimuli or trials are available, making the method particularly valuable for pediatric studies or clinical populations where data collection opportunities are limited [34] [2].

In time-domain fNIRS (TD-fNIRS), which offers improved sensitivity to brain hemodynamics through time-gating of photon time-of-flight, the GLM framework adapted for temporal moment data shows similar advantages. Properly covariance-scaled TD moment techniques incorporating GLM demonstrate 98% and 48% improvement in HRF recovery correlation for HbO and HbR respectively compared to continuous wave (CW) GLM, with corresponding decreases of 56% and 52% in RMSE [37].

Table 2: Performance Comparison of fNIRS Noise Correction Methods

Method HbO Correlation Improvement HbO RMSE Reduction Key Advantages Limitations
GLM with tCCA +45% (max) -55% (max) Flexible auxiliary signal integration, optimal nuisance regressors Computational complexity, parameter sensitivity
Wavelet Filtering Not quantified Not quantified Effective for spike artifacts, automated operation Potential signal distortion, limited for slow drifts
TDDR Not quantified Not quantified Effective for functional connectivity analysis Limited validation across paradigms
CBSI Not quantified Not quantified Simple calculation, preserves HbO-HbR anticorrelation Assumes perfect negative correlation
GLM with SS Baseline Baseline Established method, intuitive implementation Limited efficacy for non-instantaneous coupling

Performance in Motion Artifact Correction

Motion artifacts represent one of the most significant challenges in fNIRS signal processing, particularly in real-world applications and with challenging populations such as children or clinical patients. The efficacy of GLM with tCCA must be understood within the broader context of motion correction techniques, which range from hardware-based approaches to purely algorithmic solutions [6].

Comparative studies have evaluated numerous motion artifact correction techniques using both simulated and real fNIRS data. The wavelet filtering method has consistently demonstrated strong performance, particularly for spike-type artifacts, with one study showing that it reduced the area under the curve in artifact-contaminated segments in 93% of cases [38] [3]. Similarly, temporal derivative distribution repair (TDDR) has emerged as a leading method, particularly for functional connectivity analysis, where it demonstrates superior denoising ability and enhanced recovery of original FC patterns [33].

While direct comparisons between GLM with tCCA and these motion-specific techniques are limited in the literature, the fundamental advantage of the tCCA approach lies in its ability to integrate multiple auxiliary signals that may contain information about motion-related artifacts. For instance, when accelerometer data is available, it can be incorporated alongside short-separation channels and other physiological measurements to create comprehensive nuisance regressors that address both motion and physiological noise simultaneously [34] [6].

Recent innovations in motion correction include hybrid approaches such as WCBSI (wavelet and correlation-based signal improvement), which combines the spike detection capabilities of wavelet filtering with the hemodynamic fidelity of CBSI. In one comprehensive comparison, WCBSI was the only algorithm exceeding average performance across all metrics (p < 0.001) and had the highest probability (78.8%) of being the best-ranked algorithm [14]. This suggests potential for future methodologies that might integrate the adaptive multivariate capabilities of tCCA with the motion-specific strengths of these specialized techniques.

Implementation Considerations

Experimental Workflow and Signaling Pathways

The implementation of GLM with tCCA follows a structured workflow that systematically transforms raw fNIRS measurements into cleaned signals with optimized contrast-to-noise ratio. The process begins with the acquisition of multiple data streams, including conventional long-separation fNIRS channels, short-separation channels, and any available auxiliary signals such as accelerometer, EEG, or physiological monitoring data [34] [35].

The following diagram illustrates the complete experimental workflow for implementing GLM with tCCA in fNIRS studies:

Raw fNIRS data (long- and short-separation channels) and auxiliary signals (accelerometer, EEG, physiological monitoring) enter signal preprocessing (filtering, motion detection); the preprocessed signals undergo temporal embedding (multivariate time-delay embedding) and regularized CCA analysis (correlation optimization), from which nuisance regressors are constructed. These regressors, together with the raw data, feed the GLM estimation (HRF extraction plus nuisance regression), which outputs the cleaned fNIRS signal with improved CNR.

The signaling pathways involved in fNIRS noise correction reveal the complex physiological interactions that methods like GLM with tCCA must address. The measured fNIRS signal represents a composite of numerous underlying components, including actual neurovascular coupling responses, systemic physiological interference (cardiac, respiratory, Mayer waves), motion artifacts from optode-tissue decoupling, and instrumental noise [35] [36].

The following diagram illustrates the key signaling pathways and components in a typical fNIRS measurement:

Neural activity drives neurovascular coupling, which produces the hemodynamic response (HRF); the measured fNIRS signal is a composite of this HRF together with systemic physiology (cardiac, respiratory, blood pressure), motion artifacts from optode-scalp decoupling, and instrument noise.

Parameter Selection and Optimization

The effective implementation of GLM with tCCA requires careful attention to parameter selection, as inappropriate choices can diminish performance benefits. Key parameters include the temporal embedding dimension, regularization strength for the CCA, selection of auxiliary signals to incorporate, and the specific form of the hemodynamic response function model [34].

Studies provide guidance for optimal parameter selection based on systematic evaluation. For temporal embedding, windows capturing the characteristic time scales of physiological noise (typically 1-2 seconds for cardiac, 3-6 seconds for respiratory, and 10-30 seconds for Mayer waves) have proven effective. Regularization parameters should be tuned to balance overfitting and underfitting, often through cross-validation procedures. The selection of auxiliary signals should prioritize those with known physiological relevance to the experimental context, with short-separation channels consistently demonstrating high value across studies [34] [37].

When implementing the method in cross-validation schemes for single-trial analysis, it is crucial to apply the GLM with tCCA independently within each fold rather than as a preprocessing step applied to the entire dataset before classification. Failure to maintain this separation can lead to overoptimistic performance estimates and overfitting, as information from the test set would leak into the training procedure [36].
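
The sketch below illustrates this fold-wise discipline under simplifying assumptions: the CCA projection is fit on the training fold only and then applied to the held-out fold before regression. It reuses the illustrative temporal_embedding helper from the earlier sketch and is a generic illustration, not the published pipeline.

```python
# Sketch of leakage-free cross-validation: the tCCA-style projection is fit on
# training folds only and applied to the held-out fold. Assumes the illustrative
# temporal_embedding() helper defined in the earlier sketch.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.model_selection import KFold

def regress_out(y, regressors):
    """Remove the least-squares projection of `regressors` from each channel of y."""
    beta, *_ = np.linalg.lstsq(regressors, y, rcond=None)
    return y - regressors @ beta

def fold_wise_clean(fnirs, aux, lags, n_splits=5, n_components=2):
    cleaned = np.zeros_like(fnirs)
    for train_idx, test_idx in KFold(n_splits=n_splits).split(fnirs):
        emb_train = temporal_embedding(aux[train_idx], lags)
        emb_test = temporal_embedding(aux[test_idx], lags)
        cca = CCA(n_components=n_components, max_iter=1000)
        cca.fit(emb_train, fnirs[train_idx])     # fit on training fold only
        test_reg = cca.transform(emb_test)       # project held-out auxiliaries
        cleaned[test_idx] = regress_out(fnirs[test_idx], test_reg)
    return cleaned
```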

The Scientist's Toolkit

Essential Research Reagents and Materials

Successful implementation of GLM with tCCA for fNIRS analysis requires access to specific hardware, software, and methodological resources. The following table details key solutions and their functions in the experimental workflow:

Table 3: Essential Research Reagents and Solutions for fNIRS with GLM+tCCA

Resource Category Specific Examples Function/Role Implementation Notes
fNIRS Hardware Kernel Flow TD-fNIRS, TechEN-CW6, ISS Imagent Signal acquisition with multiple source-detector separations TD-fNIRS offers enhanced depth sensitivity [37]
Auxiliary Sensors Accelerometers, IMU, EEG systems, physiological monitors Provide supplementary signals for noise regression Critical for motion artifact correction [6] [14]
Software Platforms HOMER2, HOMER3, fNIRSDAT, custom MATLAB toolboxes Implement preprocessing and GLM analysis HOMER3 supports multiple MA correction algorithms [14]
Analysis Methods tCCA, Wavelet filtering, TDDR, CBSI Noise regression and signal enhancement Method selection depends on artifact type [33]
Validation Tools Synthetic HRF augmentation, experimental ground truth paradigms Performance quantification and method validation Essential for objective comparison [34] [37]

The integration of auxiliary data through GLM with temporally embedded Canonical Correlation Analysis represents a significant advancement in fNIRS signal processing, offering statistically superior performance compared to conventional approaches like GLM with short-separation regression alone. The method's flexibility in incorporating diverse auxiliary signals, adaptive optimization of nuisance regressors, and ability to address challenging signal characteristics make it particularly valuable for real-world applications where high contrast-to-noise ratio is essential.

When evaluated against the broader landscape of motion artifact correction techniques, the tCCA approach complements rather than replaces specialized methods like wavelet filtering or TDDR. Each technique demonstrates particular strengths depending on the artifact characteristics, signal quality, and experimental objectives. For spike-type motion artifacts, wavelet methods remain highly effective, while for functional connectivity analysis, TDDR shows particular promise. The GLM with tCCA framework excels in comprehensive physiological noise regression, especially when multiple auxiliary signals are available.

Future methodological development will likely focus on hybrid approaches that combine the multivariate adaptive capabilities of tCCA with the specific strengths of motion-focused correction algorithms. Additionally, as fNIRS continues to expand into real-world applications including neuroergonomics, clinical monitoring, and brain-computer interfaces, the efficient implementation of these advanced methods will become increasingly important. The GLM with tCCA represents a powerful tool in this evolving landscape, providing researchers with a mathematically robust framework for enhancing signal quality and reliability in fNIRS studies.

Selecting and Applying Evaluation Metrics for Optimal fNIRS Preprocessing

In functional near-infrared spectroscopy (fNIRS) research, motion artifacts present a significant challenge that can severely compromise data quality and interpretation. These artifacts, induced by subjects' movements including head motion, facial muscle activity, or even jaw movements during speech, introduce substantial noise into hemodynamic measurements [6] [3]. The significant deterioration in measurement caused by motion artifacts has become an essential research topic for fNIRS applications, particularly as the technology expands into naturalistic settings and challenging populations where movement is inevitable [6] [39].

The evaluation of motion artifact removal techniques demands robust, quantitative metrics that can objectively assess performance. Among these metrics, Signal-to-Noise Ratio (SNR) and Contrast-to-Noise Ratio (CNR) have emerged as fundamental tools for quantifying the effectiveness of noise suppression methods [40] [41]. These metrics provide critical insights into different aspects of signal quality: SNR measures the overall strength of a signal relative to background noise, while CNR specifically quantifies how well a signal of interest can be distinguished from its surrounding background [41] [42]. Understanding the proper application, calculation, and interpretation of these metrics is essential for researchers developing and comparing artifact correction methods in fNIRS studies.

Theoretical Foundations: SNR and CNR Defined

Signal-to-Noise Ratio (SNR)

SNR is a fundamental metric in measurement systems that quantifies how much a signal of interest stands above the ever-present background noise. In its most basic form for a Poisson-distributed signal, SNR is defined as the ratio of the signal strength to the standard deviation of the noise [41]:

SNR = S/σ = N/√N = √N

Where S represents the signal, σ represents the noise in terms of standard deviation, and N is the number of detected photons or measurements in quantum-limited systems. This relationship highlights that SNR improves with increasing signal strength, following a square root relationship for photon-limited measurements [41].

In fNIRS applications, SNR calculations typically involve defining specific regions of interest (ROIs) in both signal and background areas. The practical implementation involves measuring the mean signal intensity in a target region and dividing it by the standard deviation of a background region [40] [42]. This approach allows researchers to quantify the overall quality of fNIRS measurements and compare the performance of different systems and processing techniques.
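
A minimal sketch of this ROI-based calculation follows, assuming a single-channel time series; the window indices and sampling rate are illustrative placeholders.

```python
# Minimal sketch of ROI-based SNR: mean intensity in a signal region divided by
# the standard deviation of a background region. Window choices are illustrative.
import numpy as np

def roi_snr(signal, signal_idx, background_idx):
    """SNR = mean(signal ROI) / std(background ROI)."""
    return np.mean(signal[signal_idx]) / np.std(signal[background_idx])

# Example: task window vs. pre-stimulus baseline of a single HbO channel
fs = 7.8                                                  # assumed sampling rate (Hz)
hbo = np.random.default_rng(1).standard_normal(int(60 * fs)) + 1.0
snr = roi_snr(hbo,
              signal_idx=slice(int(20 * fs), int(40 * fs)),
              background_idx=slice(0, int(10 * fs)))
print(f"SNR = {snr:.2f}")
```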

Contrast-to-Noise Ratio (CNR)

While SNR measures overall signal quality, CNR specifically addresses the ability to distinguish between different regions or components within a signal. This distinction is particularly relevant in fNIRS research, where the primary goal is often to detect task-evoked hemodynamic changes against background physiological activity [41] [23].

CNR is mathematically defined as the difference between signals from two regions divided by the overall noise level [41]:

CNR = (S_A - S_B)/σ

Where S_A and S_B represent signals from two different components or regions, and σ represents the noise voltage. In medical imaging applications including fNIRS, this is often modified to:

CNR = (S_A - S_B)/(S_ref × N)

Where S_ref is a fully recovered reference signal (often from water or another suitable reference), and N is the noise voltage [41]. For detecting lesions or functional activations, the CNR can be expressed as [41]:

CNR_ℓ = |C_ℓ| × d_ℓ × √(R_o × t)

Where |C_ℓ| is the absolute contrast of the lesion or activation, d_ℓ is its diameter, R_o is the background counting rate, and t is the imaging time.

According to the Rose criterion, a CNR of 3-5 is typically required for reliable detection of features in noisy data, with the exact threshold depending on factors such as object size, edge sharpness, and viewing conditions [41].
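
The short sketch below, using purely illustrative values, computes a region-based CNR and checks it against the Rose-criterion range quoted above.

```python
# Sketch of the region-based CNR calculation with a Rose-criterion check
# (CNR of roughly 3-5 needed for reliable detection). Values are illustrative.
import numpy as np

def cnr(signal_a, signal_b, noise):
    """CNR = (mean(A) - mean(B)) / std(noise)."""
    return (np.mean(signal_a) - np.mean(signal_b)) / np.std(noise)

rng = np.random.default_rng(2)
task = rng.normal(1.5, 1.0, 500)       # task-evoked window
rest = rng.normal(0.0, 1.0, 500)       # baseline window
noise = rng.normal(0.0, 1.0, 500)      # background noise estimate
value = cnr(task, rest, noise)
print(f"CNR = {value:.2f}  (Rose criterion: reliably detectable if ~3-5 or above)")
```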

Key Distinctions and Complementary Roles

SNR and CNR provide complementary information about signal quality, with each metric emphasizing different aspects important for fNIRS research:

  • SNR reflects the overall quality and reliability of the measurement, indicating how much confidence researchers can have in the detected signal regardless of its functional relevance.
  • CNR specifically quantifies the detectability of functionally relevant hemodynamic changes by comparing task-related signals to background physiological activity.

This distinction is crucial in fNIRS studies where strong background physiological noises (e.g., cardiac, respiratory, and blood pressure fluctuations) are always present and can obscure the much smaller task-evoked brain signals [23]. A processing technique might yield high SNR values while simultaneously reducing CNR if it suppresses both noise and the functional signal of interest, highlighting why both metrics must be considered together when evaluating artifact removal methods.

Quantitative Comparison of Motion Artifact Correction Techniques

Various motion artifact correction techniques have been developed for fNIRS, each with distinct impacts on SNR and CNR metrics. The table below summarizes the performance characteristics of major correction methods based on empirical comparisons:

Table 1: Performance Comparison of fNIRS Motion Artifact Correction Techniques

Method Category Specific Techniques Impact on SNR Impact on CNR Key Limitations
Hardware-Based Solutions Accelerometer (ABAMAR, ABMARA), Headpost fixation, 3D motion capture [6] High improvement with proper implementation Moderate improvement Requires additional equipment; may limit experimental paradigms
Wavelet-Based Methods Wavelet filtering, Kurtosis-based wavelet [3] [43] Significant improvement (up to 93% artifact reduction) [3] High improvement for task-evoked responses Complex implementation; parameter selection critical
Adaptive Filtering RLS with exponential forgetting, Kalman filtering [43] [23] 77% improvement in HbO, 99% in HbR for channels with higher CNR [43] Significant improvement Computationally intensive; model-dependent
Component Analysis PCA, tPCA, ICA [23] Moderate improvement Variable; may reduce signal of interest Risk of removing physiological signals along with artifacts
Regression Methods CBSI, SS channel regression [23] Moderate improvement Highest performance when using multiple SS channels [23] Collinearity issues with task-related physiology
Hybrid Methods Sequential layered pipelines [44] Potentially optimal through staged approach Potentially optimal through staged approach Complex implementation and validation

The performance of these techniques varies significantly depending on the artifact type (spikes, baseline shifts, or low-frequency variations), amplitude, and temporal correlation with the hemodynamic response [3]. Techniques that effectively improve CNR are particularly valuable as they enhance the detectability of true functional activation amidst physiological noise.

Experimental Protocols for Method Evaluation

Semisynthetic Simulation Approach

To quantitatively evaluate motion correction techniques, researchers have developed robust experimental protocols that combine real physiological data with simulated brain activity:

  • Background Signal Acquisition: Record resting-state fNIRS data or data during a breath-hold task to capture realistic physiological noise (cardiac, respiratory, Mayer waves) without functional brain activation [23].

  • Synthetic HRF Addition: Add a known simulated hemodynamic response function (HRF) to the background data at precise timings, creating a ground truth for validation [3] [23].

  • Motion Artifact Introduction: Incorporate real motion artifacts from experimental data or simulate artifacts with characteristics matching common movement types (head motion, jaw movement, etc.) [3].

  • Correction Application: Apply various motion artifact correction techniques to the contaminated data.

  • Performance Quantification: Calculate performance metrics including SNR, CNR, mean-squared error (MSE), and Pearson's correlation coefficient (R²) between the recovered HRF and the original simulated HRF [3].

This approach enables objective comparison of different correction methods with known ground truth, overcoming the challenge of not knowing the true brain signal in experimental data.
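
As a hedged illustration of the artifact-introduction step (step 3 of the protocol above), the sketch below inserts pre-extracted artifact epochs into a clean single-channel trace at known onsets; the spike and baseline-shift shapes are synthetic placeholders rather than real recorded artifacts.

```python
# Sketch of contaminating clean data with known artifacts at known onsets, so the
# clean/contaminated pair can serve as a benchmark. All shapes are illustrative.
import numpy as np

def inject_artifacts(clean, artifact_segments, onsets):
    """Add pre-extracted artifact epochs to a clean 1-D signal at known onsets."""
    contaminated = clean.copy()
    for segment, onset in zip(artifact_segments, onsets):
        end = min(onset + len(segment), len(contaminated))
        contaminated[onset:end] += segment[: end - onset]
    return contaminated

rng = np.random.default_rng(3)
clean = rng.standard_normal(3000) * 0.1
spike = np.concatenate([np.linspace(0, 5, 5), np.linspace(5, 0, 15)])   # spike-like artifact
shift = np.ones(400) * 2.0                                              # baseline shift
contaminated = inject_artifacts(clean, [spike, shift], onsets=[500, 1800])
```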

Real Cognitive Task Validation

While simulations provide controlled comparisons, validation with real experimental data is essential:

  • Task Design: Implement cognitive tasks known to produce specific artifacts, such as language tasks requiring vocal responses that induce jaw movement artifacts temporally correlated with the hemodynamic response [3].

  • Data Collection: Acquire fNIRS data during task performance with simultaneous recording of potential artifact sources (accelerometers, short-separation channels) [6] [23].

  • Physiological Plausibility Assessment: Evaluate corrected signals based on physiological expectations including appropriate hemodynamic response shape, spatial localization, and contrast between hemoglobin species [3].

This combined approach of simulation-based quantification and real-data validation provides the most comprehensive assessment of motion correction techniques and their impact on SNR and CNR metrics.

Research Reagent Solutions: Essential Tools for fNIRS Motion Artifact Research

Table 2: Essential Materials and Tools for fNIRS Motion Artifact Research

Tool Category Specific Examples Function in Motion Artifact Research
Auxiliary Hardware Accelerometers, IMUs, 3D motion capture systems, gyroscopes [6] Provide independent measurement of motion for artifact detection and correction
Optical Configurations Short-separation channels (0.5-1.0 cm), multi-distance arrangements [43] [23] Enable separation of superficial (scalp) and deep (cerebral) components
Software Toolboxes HOMER2, NIRS-KIT, fNIRS Processing Modules [6] Provide standardized implementations of artifact correction algorithms for comparison
Physical Phantoms Dynamic flow phantoms, 3D-printed anthropomorphic phantoms [40] Enable controlled testing of artifact correction methods with known ground truth
Physiological Monitoring Pulse oximeters, respiratory belts, blood pressure monitors [23] Characterize physiological noise sources for improved modeling and removal
Data Standards SNIRF file format, BIDS extension for fNIRS [45] Facilitate reproducible research and method comparison across laboratories

These essential research tools enable comprehensive evaluation of motion artifact correction techniques and their impacts on SNR and CNR metrics. The selection of appropriate tools depends on the specific research goals, with hardware solutions providing direct motion measurement but potentially limiting experimental paradigms, while algorithmic approaches offer broader applicability but may require careful parameter optimization.

Methodological Workflows and Signaling Pathways

The relationship between motion artifact types, correction approaches, and evaluation metrics follows a logical pathway that can be visualized as follows:

Motion artifact sources give rise to characteristic artifact types (spikes, baseline shifts, low-frequency variations), which are addressed by hardware-based solutions (accelerometers, IMUs, short-separation channels) and algorithmic solutions (wavelet filtering, adaptive filtering, component analysis). Both classes of solution are assessed with evaluation metrics (CNR, SNR, MSE versus ground truth) in service of the research goals of optimized detection, improved specificity, and enhanced reproducibility.

Diagram 1: Motion Artifact Management Workflow

The process of calculating and applying CNR and SNR metrics to evaluate fNIRS signal quality follows this computational pathway:

Raw fNIRS signals are first divided into signal and background regions of interest (ROI selection); the mean signal (S) and standard deviation (σ) are measured for each, SNR is computed as S/σ and CNR as (S_A - S_B)/σ, and both metrics feed the method evaluation step (pre/post-correction comparison and technique optimization).

Diagram 2: SNR and CNR Calculation Pipeline

The evaluation of motion artifact removal techniques in fNIRS research requires careful consideration of both SNR and CNR metrics to fully characterize method performance. While SNR provides information about overall signal quality, CNR specifically addresses the detectability of functionally relevant hemodynamic changes against background physiological activity—often the primary concern in fNIRS studies.

Current research indicates that correction techniques based on wavelet filtering, adaptive filtering with recursive least squares, and short-separation channel regression generally provide the most significant improvements in both SNR and CNR [3] [43] [23]. However, method performance is highly dependent on artifact characteristics, with different techniques excelling for different artifact types (spikes vs. slow drifts) and different experimental contexts.

The field would benefit from increased standardization in how SNR and CNR metrics are calculated and reported, as current variability in definitions and implementations complicates cross-study comparisons [40] [45]. Future work should focus on establishing consensus definitions for these metrics specific to fNIRS applications, developing comprehensive validation frameworks combining simulated and experimental data, and creating optimized processing pipelines that sequentially address different artifact types to maximize both SNR and CNR for improved functional brain monitoring.

In functional near-infrared spectroscopy (fNIRS) research, motion artifacts (MAs) represent a fundamental challenge, significantly limiting the reliability and interpretability of hemodynamic data. These artifacts, induced by subject movement causing optode displacement, manifest as signal spikes, baseline shifts, and slow drifts that often overlap with the frequency range of genuine hemodynamic responses [46] [6]. The development of numerous MA correction algorithms has created an urgent need for standardized, quantitative evaluation metrics to objectively compare their performance. Within this landscape, Pearson's Correlation Coefficient (R) and Root Mean Squared Error (RMSE) have emerged as two cornerstone validation metrics, providing complementary and critical insights into algorithm efficacy [14] [47]. This guide examines the application of these metrics in contemporary fNIRS research, providing a structured comparison of how they are used to validate artifact correction methods across diverse experimental paradigms.

Metric Definitions and Theoretical Foundations

Pearson's Correlation Coefficient (R)

Pearson's Correlation Coefficient (R) quantifies the strength and direction of a linear relationship between two signals. In the context of fNIRS algorithm validation, it measures how closely a processed or corrected signal aligns with a known reference or "ground truth" signal [47]. The mathematical definition is:

$$R = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

where $x_i$ represents the ground truth signal, $y_i$ is the corrected signal, and $n$ is the number of data points.

  • Interpretation: Values range from -1 to +1. An R value of +1 indicates perfect positive linear agreement, 0 indicates no linear relationship, and -1 indicates perfect negative linear agreement. In fNIRS validation, higher positive R values signify that the corrected signal's morphology and temporal dynamics more accurately reflect the expected hemodynamic response [47].

Root Mean Squared Error (RMSE)

Root Mean Squared Error (RMSE) measures the magnitude of the average squared difference between the estimated values and the actual value. It is a standard metric for assessing estimation error, giving a higher weight to large errors due to the squaring operation. The formula is:

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(x_i - y_i)^2}{n}}$$

where $x_i$ is the ground truth value, $y_i$ is the corrected signal value, and $n$ is the number of observations.

  • Interpretation: RMSE is always non-negative, with values closer to 0 indicating superior performance and lower residual error after artifact correction [47]. It is particularly sensitive to large, episodic errors, making it excellent for identifying algorithms that fail to correct major motion artifacts or that introduce significant new distortions.
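
A compact sketch computing both metrics against a ground-truth reference follows, using SciPy's pearsonr; the simulated signals are illustrative only.

```python
# Sketch of the two validation metrics defined above, computed between a corrected
# signal and a ground-truth reference.
import numpy as np
from scipy.stats import pearsonr

def validation_metrics(ground_truth, corrected):
    """Return (Pearson's R, RMSE) between a reference and a corrected signal."""
    r, _ = pearsonr(ground_truth, corrected)
    rmse = np.sqrt(np.mean((ground_truth - corrected) ** 2))
    return r, rmse

# Example: a corrected signal that tracks the reference with residual noise
t = np.linspace(0, 10, 500)
truth = np.sin(t)
corrected = truth + 0.1 * np.random.default_rng(4).standard_normal(t.size)
r, rmse = validation_metrics(truth, corrected)
print(f"R = {r:.3f}, RMSE = {rmse:.3f}")
```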

Experimental Applications and Performance Benchmarking

The utility of R and RMSE is best demonstrated through their application in empirical studies comparing motion artifact correction algorithms. The table below synthesizes quantitative findings from key research that has employed both metrics for algorithm validation.

Table 1: Performance Comparison of fNIRS Motion Artifact Correction Algorithms Using R and RMSE

Study Context Correction Algorithm Pearson's R (Performance) RMSE (Performance) Key Findings
Neonatal Resting-State Data [47] Proposed Adaptive Method 0.732 ± 0.155 (Best) 0.536 ± 0.339 (Best) Significantly outperformed traditional methods (paired t-test, p<0.01) in correcting baseline shifts and spikes.
Spline Interpolation Lower Higher Effective for baseline shifts but left spike noise uncorrected.
Wavelet Filtering (WAVE) Lower Higher Effective for spikes but weak on baseline shifts.
Correlation-Based Signal Improvement (CBSI) Lower Higher Performance limited when HbO/HbR correlation assumption was violated.
Adult Head Movement Data [14] Wavelet + CBSI (WCBSI) Highest Lowest Ranked best overall; consistently favorable across all metrics including R and RMSE.
Targeted PCA (tPCA) Intermediate Intermediate Complex parameter tuning required.
Spline-Savitzky–Golay Intermediate Intermediate Moderate performance across different artifact types.

Experimental Protocols for Metric Calculation

The reliable computation of R and RMSE depends on rigorous experimental designs that establish a trustworthy ground truth for comparison. The following protocols are representative of high-quality research in the field:

  • Controlled Task with Induced Artifacts: Von Lühmann et al. (2023) measured brain activity in 20 participants performing a hand-tapping task to evoke a consistent hemodynamic response [14]. To evaluate correction algorithms, this task was performed under two conditions: once with minimal movement to establish a "ground truth" activation, and again while participants performed deliberate head movements at different levels of severity. This design allows the clean tapping response to serve as the reference signal ($x_i$) against which the artifact-corrupted tapping response ($y_i$), after processing by each algorithm, is compared using both R and RMSE [14].

  • Semi-Simulated Data with Visual Correction as Benchmark: In neonatal studies where a true ground truth is unavailable, Chen et al. (2024) used expert visual identification and manual correction of artifacts as their benchmark [47]. The performance of various automated algorithms was then assessed by calculating R and RMSE between their outputs and this expert-corrected signal. This approach is common in clinical populations where inducing artifacts is unethical or impractical.

The following diagram visualizes this multi-stage experimental workflow for validating motion artifact correction algorithms.

Study design proceeds along two arms: a ground-truth arm, in which a pure task condition (e.g., tapping) yields a clean reference signal, and an artifact arm, in which the same task is performed with induced motion to yield a raw corrupted signal. Multiple correction algorithms are applied to the corrupted signal, the corrected outputs are compared against the ground truth by calculating R and RMSE, and the resulting scores rank algorithm performance.

Successfully implementing R and RMSE validation requires a suite of methodological tools and software resources. The table below details key components of this research toolkit.

Table 2: Essential Research Tools for fNIRS Metric Validation

Tool Category Specific Example Function in Validation
Software & Toolboxes HOMER3 [14] A widely used MATLAB software environment that provides standardized implementations of major MA correction algorithms (e.g., PCA, Spline, CBSI, Wavelet), enabling fair comparison.
Data Acquisition Accelerometers / Inertial Measurement Units (IMUs) [6] Auxiliary hardware synchronized with fNIRS to provide objective, ground-truth movement data for precise artifact timing and characterization.
Computer Vision Deep Neural Networks (e.g., SynergyNet) [5] Provides markerless, ground-truth head orientation data from video recordings, useful for correlating specific movements (e.g., pitch, roll) with artifact morphology.
Experimental Paradigm Controlled Motor Tasks (e.g., hand-tapping) [14] Generates a robust and reproducible hemodynamic response that serves as the physiological "ground truth" signal for calculating R and RMSE.
Data Simulation Synthetic Hemodynamic Response Function (HRF) [36] Allows for the precise addition of known artifact types to a clean signal, creating an ideal benchmark for testing algorithm performance where real ground truth is unavailable.

Pearson's R and RMSE are not merely mathematical abstractions but are fundamental to advancing fNIRS methodology. Their complementary nature provides a more complete picture of algorithm performance than either metric alone: R ensures the corrected signal's temporal trajectory is physiologically plausible, while RMSE penalizes large residual errors that could lead to false positives or negatives [14] [47]. The consistent finding across studies is that hybrid correction approaches (e.g., WCBSI, Spline-Wavelet) tend to outperform single-method solutions, likely because they can address the diverse spectrum of motion artifact types [14] [22]. For researchers and drug development professionals, selecting an algorithm based on its validated performance across both metrics is critical for generating reliable, interpretable cortical data, especially in real-world applications where motion is unavoidable. Future work should continue to standardize the use of these metrics and explore their integration into a unified scoring system for fNIRS signal fidelity.

In functional near-infrared spectroscopy (fNIRS) research, the hemodynamic response function (HRF) serves as a fundamental physiological benchmark for evaluating the performance of motion artifact (MA) correction algorithms. The HRF describes the characteristic temporal pattern of cerebral blood flow changes in response to neural activity, typically featuring an initial increase in oxygenated hemoglobin (HbO) and a subsequent decrease in deoxygenated hemoglobin (HbR) [48] [49]. Unlike simple signal quality metrics that assess noise reduction, the HRF provides a biologically-grounded reference for determining whether motion correction methods preserve the underlying neurovascular signals of interest. This guide objectively compares prominent MA removal techniques by examining their impact on HRF morphology, using both quantitative performance data and standardized experimental protocols to inform method selection for research and clinical applications.

The physiological plausibility of an estimated HRF is paramount because an effective motion correction technique must do more than merely suppress noise; it must retain the functional signatures of brain activity. As neural activation is inherently convolved with and temporally blurred by the HRF, accurately modeling HRF variability during deconvolution significantly improves neural activity recovery [48]. Techniques that distort the HRF shape, timing, or amplitude can lead to false positives or negatives in brain activation mapping, ultimately compromising the validity of neuroscientific findings and drug development research.

Comparative Analysis of Motion Artifact Correction Methods

Motion artifacts in fNIRS signals arise from imperfect contact between optodes and the scalp due to head movements, facial muscle activity, or body movements [6] [5]. These artifacts significantly deteriorate measurement quality by reducing the signal-to-noise ratio (SNR) and can manifest as rapid spikes or slow baseline shifts in the data [6]. Over decades, numerous correction approaches have been developed, broadly categorized into hardware-based and algorithmic solutions.

Table 1: Classification of Motion Artifact Removal Techniques in fNIRS

Method Category Specific Techniques Key Mechanism Compatible Signal Types Online Application Potential
Hardware-Based Accelerometer-based methods (ANC, ABAMAR), Collodion-fixed fibers, 3D motion capture Direct motion detection via auxiliary sensors Any fNIRS signal type Yes (for most methods)
Signal Processing-Based GLM with tCCA [34], Wiener filtering, Kalman filtering, Correlation-based signal improvement (CBSI) Statistical decomposition and noise regression HbO and HbR signals Limited (most are offline)
Hybrid Methods BLISSA2RD [6], Multi-stage cascaded adaptive filtering Combines hardware input with advanced signal processing Any fNIRS signal type Yes

Table 2: Quantitative Performance Comparison of Motion Correction Methods on HRF Metrics

Correction Method HRF Correlation Improvement (HbO) RMSE Reduction (HbO) F-Score Enhancement Key Strengths Significant Limitations
GLM with tCCA [34] Up to +45% Up to -55% Up to 3.25-fold Optimal nuisance regressors; flexible auxiliary signal integration Requires parameter tuning; computationally intensive
Accelerometer-Based Methods [6] Moderate (15-25%)* Moderate (20-30%)* Moderate (1.5-2x)* Real-time capability; direct motion measurement Additional hardware required; limited spatial specificity
Conventional GLM with Short-Separation Regression [34] +15-25% -15-25% 1.8-2.2x Standardized implementation; superficial noise removal Suboptimal auxiliary signal exploitation; struggles with non-instantaneous coupling
Wiener Filtering [49] +20-30%* -20-30%* 2.0-2.5x* Effective for known noise characteristics Requires noise profile estimation; can oversmooth signal
BLISSA2RD [6] +25-35%* -25-35%* 2.2-2.8x* Combines blind source separation with accelerometer data Complex implementation; hardware dependency

Note: Values marked with an asterisk (*) are estimates based on literature reviews and comparative studies [6]; exact values are from specific validation studies [34].

Key Insights from Comparative Analysis

The data reveal that advanced regression techniques like General Linear Model (GLM) with temporally embedded Canonical Correlation Analysis (tCCA) demonstrate superior performance in HRF recovery, particularly under challenging conditions with low contrast-to-noise ratios and limited numbers of stimuli [34]. This method flexibly integrates any available auxiliary signals into optimal nuisance regressors, effectively addressing limitations of conventional approaches in handling non-instantaneous and non-constant coupling between physiological noises and the signal of interest.

The hardware-based approaches provide reliable motion artifact detection and are particularly valuable for real-time applications, though they require additional equipment and may not fully capture the complex relationship between movement and signal artifacts [6] [5]. Recent research combining computer vision with ground-truth movement data has advanced our understanding of how specific head movements (e.g., upward and downward rotations) compromise fNIRS signal quality in particular brain regions [5].

Experimental Protocols for HRF-Based Method Validation

Ground-Truth Simulation Paradigm

Objective: To quantitatively evaluate motion correction performance using simulated fNIRS data with known HRF parameters and controlled motion artifacts.

Procedure:

  • Generate canonical HRF: Create the baseline hemodynamic response using a double-gamma function with parameters for response delay, undershoot delay, dispersion of response, and dispersion of undershoot [49].
  • Create experimental paradigm: Convolve the HRF with a block-based or event-related stimulus pattern to generate the ideal, noise-free fNIRS signal.
  • Introduce realistic artifacts: Add characterized motion artifacts (spikes, baseline shifts) derived from real motion data [5] at known timepoints.
  • Incorporate physiological noise: Include physiological confounds (cardiac pulsation ~1 Hz, respiratory rate ~0.2-0.3 Hz, Mayer waves ~0.1 Hz) with varying amplitudes and frequencies [49].
  • Apply correction algorithms: Process the contaminated signal with each target motion correction method.
  • Quantify HRF recovery: Calculate correlation coefficients, RMSE, and F-scores between the corrected HRF and the original simulated HRF [34].
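
The sketch below illustrates steps 1-2 of this protocol: a double-gamma HRF is generated and convolved with a block paradigm to produce the noise-free reference signal. The parameter values are common defaults used for illustration, not prescriptions from the cited work.

```python
# Sketch of steps 1-2: double-gamma HRF convolved with a block stimulus pattern to
# form the ideal, noise-free fNIRS signal. Parameter values are illustrative defaults.
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(fs, duration=30.0, peak_delay=6.0, undershoot_delay=16.0,
                     dispersion=1.0, u_dispersion=1.0, ratio=1/6):
    t = np.arange(0, duration, 1.0 / fs)
    peak = gamma.pdf(t, peak_delay / dispersion, scale=dispersion)
    undershoot = gamma.pdf(t, undershoot_delay / u_dispersion, scale=u_dispersion)
    hrf = peak - ratio * undershoot
    return hrf / np.max(hrf)                              # unit peak amplitude

fs = 7.8
stim = np.zeros(int(300 * fs))
for onset in range(20, 300, 40):                          # 20 s blocks every 40 s
    stim[int(onset * fs):int((onset + 20) * fs)] = 1.0
ideal_signal = np.convolve(stim, double_gamma_hrf(fs))[: len(stim)]
```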

Experimental Data Validation with Computer Vision

Objective: To validate motion correction methods using real fNIRS data with precisely quantified movement parameters.

Procedure:

  • Participant preparation: Fit participants with a whole-head fNIRS cap and position them for video recording during task performance.
  • Structured movement tasks: Instruct participants to perform controlled head movements along three rotational axes (pitch, roll, yaw) at varying speeds (slow, fast) and ranges (half, full, repeated rotations) [5].
  • Computer vision analysis: Process video recordings frame-by-frame using the SynergyNet deep neural network to compute precise head orientation angles [5].
  • Movement parameter extraction: Calculate maximal movement amplitude and speed from head orientation data.
  • Artifact identification: Detect spikes and baseline shifts in fNIRS signals synchronized with movement data.
  • Region-specific analysis: Assess motion artifact susceptibility across different brain regions (e.g., occipital regions are particularly vulnerable to upward/downward movements) [5].

Toeplitz Deconvolution for HRF Estimation

Objective: To estimate latent HRF and neural activity from motion-corrected fNIRS signals.

Procedure:

  • Signal preprocessing: Apply motion correction and bandpass filtering to isolate the frequency band of interest (typically 0.01-0.5 Hz).
  • Formulate Toeplitz matrix: Construct the design matrix H based on the stimulus paradigm.
  • Perform regularized deconvolution: Solve for the latent HRF or neural activity using Moore-Penrose pseudoinversion with Tikhonov regularization, $x = (H^T H + \lambda L^T L)^{-1} H^T y$, where $\lambda$ is the regularization hyperparameter [48].
  • Address edge artifacts: Implement edge expansion before deconvolution followed by post-estimation trimming to remove Toeplitz edge artifacts [48].
  • Validate HRF morphology: Assess estimated HRFs for canonical shape characteristics (peak latency, undershoot, duration).
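
A minimal sketch of this regularized deconvolution follows, assuming the identity matrix for L (standard Tikhonov) and an illustrative λ; the design-matrix construction follows the Toeplitz structure named above but is not the HRfunc implementation, and the edge-expansion/trimming step is omitted.

```python
# Sketch of regularized Toeplitz deconvolution: x = (H'H + λL'L)^(-1) H'y,
# with L taken as the identity (standard Tikhonov). Edge handling is omitted.
import numpy as np
from scipy.linalg import toeplitz

def deconvolve_hrf(y, stimulus, hrf_len, lam=1e-2):
    """Estimate an HRF of length `hrf_len` from signal y and a known stimulus train."""
    first_col = stimulus
    first_row = np.zeros(hrf_len)
    first_row[0] = stimulus[0]
    H = toeplitz(first_col, first_row)        # design matrix (n_samples x hrf_len)
    L = np.eye(hrf_len)                       # identity regularizer
    return np.linalg.solve(H.T @ H + lam * (L.T @ L), H.T @ y)

# Usage with the simulated block design from the previous sketch might look like:
# hrf_est = deconvolve_hrf(ideal_signal, stim, hrf_len=int(30 * 7.8))
```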

HRF-based motion correction validation workflow: a simulation arm (design of a canonical HRF and paradigm, followed by introduction of realistic motion artifacts) and an experimental arm (collection of fNIRS data with computer-vision movement tracking) both feed the motion correction methods under test; corrected data then undergo Toeplitz deconvolution for HRF estimation, HRF quality metrics are calculated, method performance is compared, and a method recommendation is made based on HRF preservation.

Table 3: Essential Tools and Resources for HRF-Focused fNIRS Research

Tool/Resource Function/Purpose Implementation Notes
HRfunc Tool [48] Python-based tool for estimating local HRF distributions and neural activity from fNIRS via deconvolution Stores HRFs in tree-hash table hybrid structure; enables collaborative HRF sharing
HRtree Database [48] Collaborative HRF database capturing variability across brain regions, ages, and experimental contexts Facilitates sharing of HRF estimates; enables meta-analysis of HRF characteristics
Computer Vision Systems [5] Provides ground-truth movement data for motion artifact characterization using deep neural networks (e.g., SynergyNet) Enables precise correlation between specific movements and artifact profiles
Accelerometer/IMU Sensors [6] Hardware components for direct motion detection in wearable fNIRS systems Critical for real-time motion correction approaches; provides complementary movement data
GLM with tCCA Implementation [34] Advanced regression combining general linear model with temporally embedded canonical correlation analysis Optimally combines auxiliary signals; significantly improves HRF recovery versus standard GLM
Short-Separation Detectors [34] Specialized fNIRS channels with minimal source-detector distance (~8mm) to capture superficial signals Enables separation of cerebral hemodynamics from extracortical physiological noises
Toeplitz Deconvolution Algorithm [48] Mathematical approach for estimating underlying HRF and neural activity from convolved fNIRS signals Employed with Moore-Penrose pseudoinversion and Tikhonov regularization for stability

HRF quality assessment metrics: the motion-corrected fNIRS signal is evaluated along three axes: HRF morphology (shape canonicality of the peak and undershoot; amplitude realism of roughly 0.5-5 μM for HbO), temporal characteristics (latency accuracy of 4-6 s to peak; appropriate duration with return to baseline in 10-12 s), and statistical significance (model fit via correlation and RMSE; statistical power via t- and p-values).

The hemodynamic response function provides an indispensable physiological benchmark for evaluating motion artifact correction techniques in fNIRS research. Through systematic comparison, advanced multivariate methods like GLM with tCCA demonstrate superior performance in preserving HRF characteristics compared to conventional approaches [34]. The growing availability of collaborative resources such as the HRtree database and computer vision validation frameworks promises to further standardize evaluation protocols across the field [48] [5].

For researchers and drug development professionals, selecting appropriate motion correction methods requires careful consideration of experimental context, subject population, and analysis goals. Methods that optimize HRF recovery significantly enhance the detection of evoked brain activity, particularly in challenging scenarios with low contrast-to-noise ratios or limited trials [34]. As the field moves toward more naturalistic study designs and wearable fNIRS technology, maintaining physiological plausibility through HRF benchmarking will remain essential for generating valid, reproducible findings in cognitive neuroscience and clinical research.

Motion artifacts (MAs) represent a significant challenge in functional near-infrared spectroscopy (fNIRS) research, often compromising data quality and interpretation. Effectively evaluating motion artifact correction techniques requires careful selection of metrics aligned with specific experimental paradigms and research objectives. As fNIRS continues to gain prominence in neuroscience and clinical applications—from studying infant brain development to monitoring neurological patients—the need for standardized evaluation frameworks has become increasingly important [6] [3]. This guide provides a comprehensive overview of available metrics, their mathematical foundations, and practical considerations for selecting appropriate validation approaches based on your experimental needs.

The evaluation landscape for MA correction methods has evolved significantly, moving beyond simple qualitative assessment to sophisticated quantitative frameworks that capture different aspects of algorithm performance [44]. Researchers now have access to diverse metrics ranging from basic signal quality indicators to complex similarity measures that require ground-truth comparisons. Understanding the strengths, limitations, and appropriate applications of each metric is essential for robust method validation and meaningful comparison across studies.

Core Metrics for Evaluating Motion Artifact Correction

Quantitative Performance Metrics

Table 1: Core Metrics for Evaluating Motion Artifact Correction Performance

Metric Category Specific Metric Mathematical Definition Interpretation Best For
Signal Quality Metrics Signal-to-Noise Ratio (SNR) $SNR = \frac{\sigma_{signal}^2}{\sigma_{noise}^2}$ Higher values indicate better noise suppression Overall signal quality assessment [22] [50]
Area Under the Curve (AUC) Difference (ΔAUC) $\Delta AUC = |AUC_{corrected} - AUC_{true}|$ Smaller values indicate better preservation of hemodynamic response [3] [14] Evaluating shape preservation in ground-truth paradigms
Similarity Metrics Pearson Correlation Coefficient (R) $R = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^n (y_i - \bar{y})^2}}$ Values closer to 1 indicate higher similarity to ground truth [3] [14] Template-matching experiments with known hemodynamic responses
Root Mean Square Error (RMSE) $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2}$ Smaller values indicate better accuracy Ground-truth comparisons where true signal is known [14] [50]
Mean Absolute Percentage Error (MAPE) $MAPE = \frac{100\%}{n} \sum_{i=1}^n \left| \frac{y_i - \hat{y}_i}{y_i} \right|$ Lower values indicate better performance Quantifying percentage error in amplitude estimation [14]

Practical Implementation of Metrics

Each metric captures distinct aspects of motion correction performance. SNR is particularly valuable for assessing the overall effectiveness of noise suppression, with studies reporting SNR improvements as a key indicator of algorithm performance [22] [50]. For example, a novel Hammerstein-Wiener approach demonstrated significant SNR increases compared to traditional methods, making this metric crucial for evaluating pure noise reduction capabilities [50].

Similarity metrics like Pearson correlation and RMSE are especially valuable in experimental paradigms incorporating ground-truth comparisons. These metrics were effectively employed in studies that designed experiments with known hemodynamic responses, allowing direct comparison between corrected signals and true activation patterns [14]. The area under the curve difference (ΔAUC) specifically quantifies how well the shape and amplitude of the hemodynamic response are preserved, which is critical for experiments aiming to accurately characterize brain activation timing and magnitude [3] [14].
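
A one-function sketch of ΔAUC using trapezoidal integration (SciPy's integrate.trapezoid) follows; the sampling rate argument and the toy signals are illustrative.

```python
# Sketch of the ΔAUC metric: absolute difference between the area under the
# corrected response and the area under the reference response.
import numpy as np
from scipy.integrate import trapezoid

def delta_auc(corrected, reference, fs):
    """|AUC_corrected - AUC_reference| over the response window (trapezoidal rule)."""
    t = np.arange(len(reference)) / fs
    return abs(trapezoid(corrected, t) - trapezoid(reference, t))

# Example: compare a slightly attenuated recovered response to its reference
fs = 7.8
t = np.arange(0, 20, 1 / fs)
reference = np.exp(-((t - 6) ** 2) / 4)        # toy response peaking near 6 s
recovered = 0.9 * reference
print(f"ΔAUC = {delta_auc(recovered, reference, fs):.3f}")
```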

Experimental Paradigms for Metric Validation

Ground-Truth Experimental Designs

Table 2: Experimental Paradigms for Validating Motion Correction Metrics

Paradigm Type Experimental Design Motion Induction Method Advantages Limitations
Known Hemodynamic Response Participants perform tasks with and without head movements Controlled head movements along rotational axes [5] Provides direct ground-truth comparison [14] [50] May not capture all real-world movement types
Semi-Simulated Data Adding artificial artifacts to clean resting-state data [3] Inserting simulated spikes, baseline shifts, and low-frequency variations Full control over artifact type and timing [12] Artificial artifacts may not fully represent real motion
Task-Embedded Artifacts Cognitive or motor tasks with intentional movements Speaking aloud, head turning, jaw movements [3] Represents realistic artifact scenarios Ground truth is estimated rather than known
Resting-State with Controlled Contamination Adding real artifacts from highly contaminated datasets Extracting MAs from patient data and adding to healthy controls [12] Uses real artifact morphology May introduce unknown physiological confounds

Implementation Considerations

The choice of experimental paradigm significantly influences which metrics are most appropriate. For ground-truth designs with known hemodynamic responses, similarity metrics like RMSE and Pearson correlation provide direct quantification of algorithm accuracy [14] [50]. These paradigms often involve participants performing simple motor or cognitive tasks (e.g., hand-tapping) both with and without head movements, creating ideal conditions for comparing corrected signals to uncontaminated responses.

For realistic scenarios where ground truth is unavailable, such as studies using task-embedded artifacts or semi-simulated data, signal quality metrics like SNR become more valuable [22]. These approaches allow researchers to quantify improvement even when the true underlying neural signal remains unknown. Recent research has demonstrated the effectiveness of combining multiple paradigms—using semi-simulated data for initial validation followed by real artifact experiments to confirm practical utility [12].

Metric Selection Framework

Metric selection begins with the experimental paradigm. When ground truth is available, similarity metrics (RMSE, Pearson's R, ΔAUC) are used; when it is not, quality metrics (SNR, visual inspection) are used. In either case, the artifact types present (spikes, baseline shifts, low-frequency variations) are considered, multiple validation approaches are combined, and the final recommendations are to report several complementary metrics and to contextualize results with experimental constraints.

Figure 1: Decision framework for selecting appropriate evaluation metrics based on experimental paradigm and available data.

Application-Specific Guidance

Different research scenarios demand tailored metric selection strategies. For clinical populations where motion artifacts are frequent and ground truth is rarely available, SNR combined with visual inspection provides practical assessment of correction quality [2]. Studies with pediatric populations, for instance, have successfully employed SNR to validate methods when other metrics were infeasible due to the inherent challenges of testing children [2].

In resting-state functional connectivity studies, where the correlation structure between brain regions is of primary interest, researchers should prioritize metrics that preserve inter-channel relationships. Studies have shown that different correction approaches can significantly impact computed connectivity, making careful metric selection essential [12].

For method development and comparison, a comprehensive approach using multiple metrics is recommended. Recent studies evaluating novel correction techniques typically report 3-4 complementary metrics (e.g., SNR, RMSE, R, and ΔAUC) to provide a complete picture of algorithm performance across different dimensions [14] [50]. This multi-faceted assessment strategy helps researchers identify methods that excel in specific aspects of correction while potentially compromising others.

Software and Analytical Tools

Table 3: Essential Tools for fNIRS Motion Artifact Research

| Tool Category | Specific Tool/Resource | Primary Function | Application in Metric Evaluation |
|---|---|---|---|
| Processing Software | Homer2/Homer3 [2] [12] | Comprehensive fNIRS processing | Implementation of various correction algorithms and metric calculation |
| | fNIRSDAT [2] | General linear model analysis | Statistical evaluation of corrected signals |
| | MATLAB System Identification Toolbox [50] | Nonlinear system identification | Advanced modeling for artifact correction |
| Data Resources | Openly shared datasets [14] [50] | Benchmark datasets with ground truth | Standardized evaluation across research groups |
| | Computer vision-analyzed movement data [5] | Ground-truth movement information | Validation of motion artifact characterization |
| Algorithmic Approaches | Wavelet-based methods [3] [22] [14] | Multi-scale artifact correction | Benchmark for noise suppression performance |
| | Hybrid methods (WCBSI) [14] | Combined correction approaches | Performance comparison for complex artifacts |
| | Hardware-assisted methods (IMU) [50] | Direct motion measurement | Reference for motion artifact detection |

Implementation Guidelines

Successful implementation of these tools requires careful consideration of experimental parameters. When using software toolboxes like Homer2/3, researchers should document all parameter settings as these significantly impact metric values and complicate cross-study comparisons [2] [12]. For algorithm evaluation, establishing standardized processing pipelines ensures consistent metric calculation across different correction methods.

Openly available datasets with ground-truth information have become invaluable resources for metric validation [5] [14] [50]. These datasets enable researchers to benchmark new methods against established techniques using identical evaluation frameworks, promoting reproducibility and transparent comparison. When selecting algorithmic approaches for comparison studies, researchers should include both classical methods (e.g., spline interpolation, wavelet filtering) and recent innovations (e.g., WCBSI, Hammerstein-Wiener) to contextualize performance advances [14] [50].

Selecting appropriate metrics for evaluating motion artifact correction in fNIRS requires careful alignment with experimental paradigms, available ground truth, and research objectives. No single metric captures all aspects of correction quality, making multi-metric approaches essential for comprehensive method assessment. As the field advances toward standardized evaluation frameworks, researchers should prioritize transparency in metric selection, justification of chosen approaches based on experimental constraints, and validation across multiple datasets when possible. By applying the principles outlined in this guide, researchers can make informed decisions about metric selection that strengthen methodological rigor and facilitate meaningful comparisons across the growing landscape of motion artifact correction techniques.

Common Pitfalls in Metric Application and Strategies to Avoid Them

In functional near-infrared spectroscopy (fNIRS) research, accurate motion artifact (MA) removal is paramount for valid interpretation of neurovascular data. However, the evaluation of MA correction techniques itself is fraught with methodological challenges. The selection of inappropriate metrics can lead to misleading conclusions about algorithm performance, potentially undermining the integrity of neuroscientific findings. This guide examines the prevalent pitfalls in metric application for evaluating MA removal and provides evidence-based strategies for robust assessment, equipping researchers with the framework needed for critical evaluation of correction methodologies.

The Metric Selection Landscape: A Comparative Analysis

The evaluation of motion artifact correction methods relies on a diverse set of metrics, each with specific strengths, limitations, and appropriate contexts for application. The table below summarizes the key metrics, their primary uses, and inherent limitations.

Table 1: Key Metrics for Evaluating Motion Artifact Correction Performance

| Metric | Primary Application | Key Limitations and Pitfalls |
|---|---|---|
| QC-FC Correlation | Measures residual motion influence in functional connectivity | Can produce misleading results; limited robustness as a standalone metric [51] |
| Network Modularity | Assesses quality of network organization after correction | Limited utility as a primary metric for artifact removal evaluation [51] |
| Test-Retest Reliability | Measures consistency across repeated scans | Identified as a more robust metric for evaluating artifact removal effectiveness [51] |
| Signal-to-Noise Ratio (SNR) | Quantifies noise suppression after correction | Performance varies based on artifact type; may not capture signal distortion [20] [52] |
| Mean Squared Error (MSE) | Measures deviation from ground truth | Requires known hemodynamic response; may not reflect physiological plausibility [3] [21] |
| Pearson's Correlation (R) | Assesses waveform similarity to ground truth | Sensitive to amplitude differences; assumes linear relationship [20] [52] |
| Contrast-to-Noise Ratio (CNR) | Evaluates task-related signal detectability | Dependent on experimental paradigm; may not generalize across studies [9] |

Critical Pitfalls in Metric Application

Overreliance on Single-Metric Evaluation

A fundamental pitfall in MA correction evaluation is the dependence on a single metric, which provides an incomplete picture of algorithm performance. Research demonstrates that metrics popular in the literature have significant limitations when used alone. For instance, the QC-FC correlation, which measures the relationship between head motion and functional connectivity, shows limited robustness as a standalone metric for evaluating motion correction pipelines [51]. Similarly, network modularity quality has been identified as having limitations for evaluating artifact removal effectiveness [51].

Strategy for Mitigation: Implement a multi-metric framework that assesses different aspects of performance. A balanced approach should include:

  • Noise suppression metrics: SNR, MSE
  • Physiological plausibility metrics: Test-retest reliability, resting-state network recovery
  • Task-related metrics: CNR for block designs, HRF reconstruction accuracy for event-related designs

Studies that have successfully evaluated MA correction typically employ multiple metrics simultaneously. For example, robust evaluations combine metrics for noise suppression (SNR), waveform similarity (Pearson's correlation), and physiological plausibility [20] [52].

Ground Truth Assumptions in Physiological Data

Many common metrics assume the availability of a known ground truth signal, which is rarely available in real fNIRS experiments. This limitation particularly affects metrics like MSE and Pearson's correlation, which require a clean reference signal for comparison [3].

Strategy for Mitigation: Utilize semi-simulation approaches where a known hemodynamic response function (HRF) is added to real resting-state data containing motion artifacts. This method, successfully employed in multiple studies [3] [52], creates a controlled environment with known truth while maintaining realistic noise characteristics. The protocol involves:

  • Collecting resting-state fNIRS data with inherent motion artifacts
  • Adding a simulated HRF with known amplitude and timing parameters
  • Applying MA correction techniques
  • Comparing the recovered HRF to the known simulated response

This approach enables calculation of MSE, Pearson's correlation, and other ground truth-dependent metrics while maintaining ecological validity.
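
A minimal sketch of this semi-simulation protocol is given below; the gamma-function HRF parameters, onset spacing, amplitude, and the random placeholder standing in for real resting-state data are illustrative assumptions, and the identity assignment marks where the correction algorithm under test would be applied.

```python
import numpy as np
from scipy.stats import gamma, pearsonr

FS = 10.0  # assumed sampling rate (Hz)

def canonical_hrf(fs=FS, peak=6.0, duration=16.5):
    """Single-gamma HRF with its mode at `peak` seconds (illustrative shape)."""
    t = np.arange(0.0, duration, 1.0 / fs)
    h = gamma.pdf(t, a=peak + 1.0, scale=1.0)   # mode = (a - 1) * scale = peak
    return h / h.max()

def add_known_hrf(resting, onsets_s, amplitude=1.0, fs=FS):
    """Superimpose a known HRF on real resting-state data at known onsets."""
    hrf = amplitude * canonical_hrf(fs)
    truth = np.zeros_like(resting)
    for onset in onsets_s:
        i = int(onset * fs)
        seg = min(len(hrf), len(resting) - i)
        truth[i:i + seg] += hrf[:seg]
    return resting + truth, truth

# Semi-simulated channel: 5 min of "resting" data (random placeholder here)
# plus a known response every 30 s.
rng = np.random.default_rng(2)
resting = 0.1 * rng.standard_normal(int(300 * FS))
semi_sim, truth = add_known_hrf(resting, onsets_s=range(10, 290, 30))

corrected = semi_sim                     # placeholder for motion_correct(semi_sim)
mse = np.mean((corrected - truth) ** 2)  # ground-truth-dependent metrics
r, _ = pearsonr(corrected, truth)
print(f"MSE = {mse:.4f}, R = {r:.3f}")
```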

Ignoring Metric Interdependencies and Trade-offs

Different metrics often capture competing aspects of performance, and optimization of one metric may come at the expense of another. For instance, aggressive filtering might improve SNR while distorting the physiological signal of interest, leading to poor test-retest reliability [51].

Strategy for Mitigation: Conduct correlation analyses between metrics across multiple datasets to identify potential conflicts. Research indicates that test-retest reliability offers a more comprehensive assessment of correction quality compared to single-timepoint metrics [51]. When metric conflicts arise, prioritize metrics that align with your specific research goals:

  • For clinical applications: Emphasize test-retest reliability
  • For event-related designs: Prioritize HRF shape recovery
  • For functional connectivity: Focus on network integrity measures

Experimental Protocols for Robust Metric Validation

Test-Retest Reliability Assessment Protocol

Test-retest reliability has emerged as one of the most robust metrics for evaluating MA correction techniques, particularly because it doesn't require ground truth data and reflects real-world usage scenarios [51].

Table 2: Experimental Protocol for Test-Retest Reliability Assessment

| Step | Procedure | Parameters | Output Metrics |
|---|---|---|---|
| Data Collection | Acquire fNIRS data from the same participants during multiple sessions | Session interval: 1 day to 2 weeks; consistent experimental conditions | Raw intensity/optical density data |
| Motion Corruption | Identify and characterize motion artifacts in all sessions | Use moving standard deviation threshold; categorize artifact type and amplitude | Artifact frequency, amplitude, duration statistics |
| Correction Application | Apply multiple MA correction pipelines to all sessions | Identical parameter settings across sessions; multiple algorithm classes | Processed hemoglobin concentration data |
| Reliability Calculation | Compute consistency of derived measures across sessions | Intraclass correlation coefficient (ICC); Pearson correlation between sessions | ICC values for HbO/HbR amplitudes; connectivity strength |
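
The reliability calculation step can be implemented with a standard two-way random-effects, absolute-agreement ICC. The sketch below follows the Shrout-Fleiss ICC(2,1) formula on a subjects-by-sessions matrix of any derived measure (for example, HbO block-average amplitude); the toy data are purely illustrative.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measures.

    scores : (n_subjects, k_sessions) array of a derived fNIRS measure,
             e.g., per-subject HbO response amplitude in each session.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)          # per subject
    col_means = scores.mean(axis=0)          # per session

    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((scores - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Toy example: 8 subjects scanned in 2 sessions after identical correction.
amplitudes = np.array([[1.2, 1.1], [0.8, 0.9], [1.5, 1.4], [0.6, 0.7],
                       [1.0, 1.1], [1.3, 1.2], [0.7, 0.6], [0.9, 1.0]])
print(f"ICC(2,1) = {icc_2_1(amplitudes):.2f}")
```
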
Hybrid Metric Evaluation Framework

For comprehensive evaluation, we recommend a hybrid framework that combines multiple metric classes. This approach addresses the limitation of individual metrics and provides a more balanced assessment of correction techniques.

Table 3: Hybrid Evaluation Framework for MA Correction Methods

| Metric Category | Specific Metrics | Data Requirements | Interpretation Guidelines |
|---|---|---|---|
| Noise Suppression | ΔSNR, MSE, residual artifact count | Pre- and post-correction data | Higher ΔSNR, lower MSE indicates better performance |
| Signal Fidelity | Pearson's R, HRF parameter recovery | Ground truth (simulated) data | R > 0.7 indicates good waveform preservation |
| Physiological Plausibility | Test-retest reliability, QC-FC correlation | Repeated measures or multi-channel data | ICC > 0.6 indicates acceptable reliability |
| Spatial Specificity | Contrast-to-noise ratio, activation topography | Multi-channel layout | Higher CNR indicates better task-related detection |

Visualization of Experimental Workflows

The following diagram illustrates the comprehensive experimental workflow for validating motion artifact correction metrics, integrating both semi-simulation approaches and test-retest reliability assessments:

[Workflow diagram: a Semi-Simulation Pathway (collect resting-state data with real motion artifacts → add simulated HRF with known parameters → apply MA correction algorithms → compare recovered vs. true HRF → calculate ground-truth metrics such as MSE and R) and a Test-Retest Pathway (acquire repeated measurements → characterize motion artifacts in all sessions → apply MA correction with identical parameters → extract physiological features → calculate test-retest reliability via ICC) converge on comprehensive metric integration: noise suppression assessment → signal fidelity evaluation → physiological plausibility check → final performance ranking.]

The Scientist's Toolkit: Essential Research Reagents and Solutions

Table 4: Essential Research Tools for Motion Artifact Metric Validation

| Tool Category | Specific Solutions | Function in Metric Validation | Implementation Considerations |
|---|---|---|---|
| Data Simulation Platforms | HOMER2 [2], AR Model-based simulators [21] | Generate controlled datasets with known ground truth | Parameter selection critical for ecological validity |
| Motion Correction Algorithms | Wavelet, Spline, PCA, CBSI, Hybrid methods [20] [52] | Provide comparative baseline for new methods | Default parameters often need adjustment for specific data |
| Metric Calculation Packages | Custom MATLAB/Python scripts, fNIRSDAT [2] | Compute standardized metrics across studies | Ensure consistent implementation across research groups |
| Statistical Analysis Tools | ICC packages, ROC analysis utilities [52] | Quantify reliability and classification performance | Address multiple comparisons in multi-metric frameworks |

The rigorous evaluation of motion artifact correction methods requires careful metric selection that acknowledges the limitations and interdependencies of different assessment approaches. By moving beyond single-metric evaluation and adopting the multi-faceted frameworks presented here, researchers can make more informed decisions about MA correction techniques, ultimately leading to more robust and reproducible fNIRS research. The field would benefit from continued development of standardized evaluation protocols and benchmark datasets to enable more direct comparison across studies and methods.

Benchmarking Performance: Comparative Validation of Correction Techniques

Functional near-infrared spectroscopy (fNIRS) has emerged as a prominent neuroimaging technology that uses near-infrared light to measure cortical concentration changes of oxygenated and deoxygenated hemoglobin associated with brain metabolism [53]. Unlike functional magnetic resonance imaging (fMRI), fNIRS offers a cost-effective, portable, and more motion-tolerant alternative for functional brain imaging, making it particularly valuable for studying diverse populations including infants, clinical patients, and children in naturalistic settings [3] [54]. However, fNIRS signals are notoriously susceptible to contamination by multiple noise sources, with motion artifacts representing the most significant challenge to data quality and interpretation [3] [6].

Motion artifacts arise from imperfect contact between optodes and the scalp during participant movement, causing signal distortions that can manifest as high-frequency spikes, baseline shifts, or low-frequency variations [3] [6]. These artifacts can completely mask underlying neural signals, leading to false positives or negatives in brain activation studies. The problem is particularly acute in pediatric and clinical populations where motion is frequent and trial numbers are often limited [2]. In response, researchers have developed numerous motion artifact correction techniques, including wavelet filtering, spline interpolation, principal component analysis, Kalman filtering, correlation-based signal improvement, and accelerometer-based methods [3] [6].

The proliferation of correction algorithms has created an urgent need for robust validation frameworks to objectively assess their performance. Without standardized evaluation methodologies, comparing techniques and selecting appropriate methods for specific research contexts becomes challenging. This guide systematically compares three established validation paradigms—simulations, resting-state data with synthetic hemodynamic responses, and experimental ground truth—providing researchers with structured approaches for rigorous motion correction algorithm evaluation.

Comparative Analysis of Validation Frameworks

Table 1: Comparison of Primary Validation Frameworks for fNIRS Motion Artifact Correction

| Validation Framework | Key Characteristics | Primary Advantages | Inherent Limitations | Best Use Cases |
|---|---|---|---|---|
| Simulation-Based Approaches | Artificially generated signals with controlled noise profiles [3] | Complete ground truth knowledge; perfect control over artifact type, timing, and amplitude [3] | Limited realism compared to actual experimental data; difficulties replicating complex artifact characteristics [3] | Initial algorithm development; controlled performance benchmarking; parameter optimization |
| Resting-State with Synthetic Hemodynamic Responses | Real resting-state data with added synthetic hemodynamic response functions (HRFs) [53] | Realistic physiological noise and artifact content; known ground truth HRF [53] | Synthetic responses may not capture full complexity of true neural activation; potential interactions with underlying physiology [53] | Technique validation under realistic noise conditions; performance comparison across methods |
| Experimental Ground Truth | Data from specially designed experiments with expected activation patterns [55] | Real neural activation with true hemodynamic responses; high ecological validity [55] | Indirect ground truth based on expected activation; limited to paradigms with well-established responses [55] | Final validation stage; clinical application testing; protocol-specific evaluation |

Table 2: Quantitative Performance Metrics for Motion Correction Techniques Across Validation Contexts

| Motion Correction Technique | Simulation Performance (MSE Reduction) | Resting-State with Synthetic HRF (Detection Accuracy) | Experimental Ground Truth (Sensitivity/Specificity) | Computational Demand | Implementation Complexity |
|---|---|---|---|---|---|
| Wavelet Filtering | 93% artifact reduction in simulations [3] | Superior recovery of synthetic HRF in resting-state data [3] | High sensitivity to true activations in cognitive tasks [3] | Medium | Medium |
| Moving Average | Effective for spike artifacts [2] | Good performance with pediatric data [2] | Reliable for child studies [2] | Low | Low |
| Spline Interpolation | Good for isolated artifacts [3] | Moderate synthetic HRF recovery [3] | Variable performance across artifact types [3] | Medium | Medium |
| Principal Component Analysis | Effective for global artifacts [3] | Dependent on component selection [3] | Can remove neural signal if not carefully implemented [3] | Medium-High | High |
| Accelerometer-Based Methods | Highly dependent on motion measurement quality [6] [56] | Excellent when precise motion tracking available [56] | Limited by hardware compatibility (e.g., MRI environments) [56] | Low-Medium | Medium |

Simulation-Based Validation Frameworks

Methodological Approach

Simulation-based validation creates computationally generated fNIRS signals with precisely controlled motion artifacts, enabling exact knowledge of ground truth hemodynamic responses. The typical workflow involves generating a canonical hemodynamic response function (HRF) using gamma functions with standard time-to-peak values (e.g., 6 seconds) and total duration (e.g., 16.5 seconds) [53]. Researchers then superimpose these synthetic HRFs onto baseline signals and add simulated motion artifacts with specific characteristics.

Artifacts are typically categorized into four distinct types: Type A (spikes with standard deviation >50 from mean within 1 second), Type B (peaks with standard deviation >100 from mean lasting 1-5 seconds), Type C (gentle slopes with standard deviation >300 from mean over 5-30 seconds), and Type D (slow baseline shifts >30 seconds with standard deviation >500 from mean) [2]. This classification enables targeted testing of correction algorithms against specific artifact challenges. The simulated signals undergo motion correction processing, and algorithm performance is quantified by comparing the reconstructed HRF against the known ground truth using metrics like mean-squared error (MSE) and Pearson's correlation coefficient [3].
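
The four-type scheme can be expressed directly as a rule on an artifact segment's duration and peak amplitude (in standard deviations from the channel mean). The sketch below is a literal translation of the thresholds quoted above and assumes that artifact segments have already been identified by a separate detector.

```python
def classify_artifact(duration_s, amplitude_sd):
    """Assign an artifact segment to Type A-D from duration and amplitude.

    duration_s   : length of the flagged segment in seconds
    amplitude_sd : peak deviation of the segment, in standard deviations
                   from the channel mean
    Returns the type label, or None if no rule matches.
    """
    if duration_s <= 1 and amplitude_sd > 50:
        return "A"  # spike
    if 1 < duration_s <= 5 and amplitude_sd > 100:
        return "B"  # peak
    if 5 < duration_s <= 30 and amplitude_sd > 300:
        return "C"  # gentle slope
    if duration_s > 30 and amplitude_sd > 500:
        return "D"  # slow baseline shift
    return None

print(classify_artifact(0.5, 80))    # -> "A"
print(classify_artifact(12.0, 350))  # -> "C"
```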

Experimental Protocols

A robust simulation protocol begins with establishing baseline optical intensity measurements resembling real fNIRS data, typically incorporating physiological noise components such as cardiac oscillations (∼1 Hz), respiratory variations (0.2-0.3 Hz), and Mayer waves (∼0.1 Hz) [3]. The synthetic HRF is generated using a gamma function with parameters like time-to-peak = 6s and full duration = 16.5s, with amplitudes typically representing 100%, 50%, and 20% of a typical task-evoked HRF amplitude to simulate varying contrast-to-noise conditions [53].

Motion artifacts are introduced based on the four-type classification system, with Type A spikes implemented as rapid, high-amplitude deviations; Type B as moderate sustained shifts; Type C as gradual baseline changes; and Type D as very slow drifts [2]. For comprehensive evaluation, artifacts should be added at varying time points relative to the HRF, including pre-stimulus, during the rising phase, at peak activation, and during recovery. Performance metrics including MSE, Pearson's correlation, and temporal derivative root mean square should be calculated between the known ground truth and corrected signals across multiple noise realizations to ensure statistical reliability [3].
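
The protocol maps onto a short generative sketch: a gamma-function HRF, sinusoidal stand-ins for the cardiac, respiratory, and Mayer-wave components at the frequencies quoted above, and an injected Type A spike. All amplitudes, onsets, and the sampling rate are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gamma

fs, dur = 10.0, 120.0                      # assumed sampling rate and length (s)
t = np.arange(0.0, dur, 1.0 / fs)
rng = np.random.default_rng(3)

# Synthetic HRF (single gamma, mode near 6 s, 16.5 s support), one trial at t = 20 s
hrf_t = np.arange(0.0, 16.5, 1.0 / fs)
hrf = gamma.pdf(hrf_t, a=7.0, scale=1.0)
hrf /= hrf.max()
signal = np.zeros_like(t)
onset = int(20 * fs)
signal[onset:onset + hrf.size] += hrf

# Physiological noise: cardiac (~1 Hz), respiration (~0.25 Hz), Mayer wave (~0.1 Hz)
physio = (0.10 * np.sin(2 * np.pi * 1.0 * t)
          + 0.15 * np.sin(2 * np.pi * 0.25 * t)
          + 0.20 * np.sin(2 * np.pi * 0.10 * t))

# Type A motion artifact: brief, high-amplitude spike at t = 23 s
artifact = np.zeros_like(t)
artifact[int(23 * fs):int(23.5 * fs)] = 5.0

measured = signal + physio + artifact + 0.05 * rng.standard_normal(t.size)
# `measured` is what a correction algorithm sees; `signal` is the ground truth.
```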

Figure 1: Simulation-based validation workflow. Establish a baseline signal with physiological noise → generate a synthetic HRF using gamma functions → introduce motion artifacts (Types A-D) → apply motion correction algorithms → compare the corrected signal to ground truth → calculate performance metrics (MSE, R²) → performance benchmarking.

Resting-State Data with Synthetic Hemodynamic Responses

Methodological Approach

The resting-state validation framework addresses simulation limitations by incorporating real physiological noise and artifact content from experimentally collected resting-state fNIRS data [53]. This approach preserves the complex statistical properties of actual fNIRS signals while maintaining the advantage of known ground truth through carefully added synthetic hemodynamic responses. The methodology involves collecting extended resting-state recordings from participants (typically 5-10 minutes) under controlled conditions where they focus on a fixation cross while fNIRS data is acquired [53].

These authentic resting-state datasets contain the full spectrum of physiological confounds including cardiac pulsation, respiratory oscillations, blood pressure variations (Mayer waves), and real motion artifacts, providing a biologically realistic noise background [53] [57]. Synthetic HRFs are then added to this resting-state data in the intensity domain at predetermined intervals and specific channels, creating a hybrid dataset with known activation timing and amplitude against a realistic noise background. This enables precise quantification of how effectively different motion correction techniques can recover the known signal in the presence of realistic confounding factors [53].

Experimental Protocols

The experimental protocol begins with collecting resting-state data from healthy participants seated comfortably while viewing a fixation cross. Data should include multimodal physiological recordings such as photoplethysmography (PPG) for cardiac monitoring, respiratory belt transducers for breathing patterns, and accelerometers for head movement tracking [53]. For comprehensive validation, datasets should include both short-separation (∼1 cm source-detector distance) and long-separation (∼3 cm) channels, as short-separation measurements specifically help separate superficial physiological confounds from cerebral signals [53] [57].
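
One common use of the short-separation channel is as a regressor for superficial physiology. The sketch below implements the simplest version of that idea (a least-squares scaled subtraction of the short channel from the long channel); it is a simplification of the GLM- and Kalman-filter-based variants used in practice.

```python
import numpy as np

def short_separation_regression(long_ch, short_ch):
    """Remove the superficial component estimated from a short channel.

    long_ch, short_ch : 1-D arrays (same length) of, e.g., HbO time courses
    Returns the long channel with the best-fitting scaled short channel removed.
    """
    long_ch = np.asarray(long_ch, dtype=float)
    short_ch = np.asarray(short_ch, dtype=float)
    beta = np.dot(short_ch, long_ch) / np.dot(short_ch, short_ch)
    return long_ch - beta * short_ch

# Toy example: a shared superficial oscillation plus a brain signal in the long channel
rng = np.random.default_rng(4)
t = np.arange(0, 60, 0.1)
superficial = np.sin(2 * np.pi * 0.1 * t)
brain = np.exp(-0.5 * ((t - 30) / 3.0) ** 2)
short_ch = superficial + 0.02 * rng.standard_normal(t.size)
long_ch = brain + 0.8 * superficial + 0.02 * rng.standard_normal(t.size)
cleaned = short_separation_regression(long_ch, short_ch)
```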

Synthetic HRFs are generated using gamma functions with parameters consistent with typical hemodynamic responses (time-to-peak = 6s, total duration = 16.5s) and added at random onset times within repeated windows (e.g., 20s windows) for a randomly selected half of all available long-separation channels [53]. The HRF amplitudes should vary (e.g., 100%, 50%, 20% of typical evoked response amplitude) to simulate different contrast-to-noise ratio conditions [53]. Performance evaluation should focus on the accuracy of recovered HRF shape, amplitude estimation precision, temporal specificity, and the false positive rate in channels without added synthetic responses.

Table 3: Research Reagent Solutions for fNIRS Validation Experiments

| Research Reagent | Technical Specification | Primary Function in Validation | Implementation Considerations |
|---|---|---|---|
| CW-fNIRS Systems | Continuous wave technology with 690nm and 830nm wavelengths [53] | Measures hemodynamic changes via light absorption differences [53] | Limited to relative concentration changes; requires additional methods for absolute quantification |
| Short-Separation Channels | ∼1 cm source-detector distance [53] | Measures superficial physiological noise for signal correction [53] [57] | Optimal distance 0.8-1.5cm; requires integration into probe design |
| Accelerometers | 3-axis motion sensors (e.g., ADXL335) [53] | Directly measures head motion for artifact correction [53] [6] | MR-incompatibility issues in concurrent fMRI studies [56] |
| Auxiliary Physiological Monitors | PPG, respiratory transducers, blood pressure monitors [53] | Records systemic physiological fluctuations for noise modeling [53] [57] | Increases participant setup complexity but provides valuable noise regressors |
| Synthetic HRF Algorithms | Gamma functions with adjustable parameters [53] | Creates known ground truth signals for validation [53] | Enables controlled performance assessment with realistic noise |

Experimental Ground Truth Validation

Methodological Approach

Experimental ground truth validation employs carefully designed task paradigms with well-established neural activation patterns to provide indirect validation of motion correction techniques [55]. While this approach doesn't offer the precise ground truth knowledge of simulations or synthetic HRF methods, it provides the advantage of evaluating algorithm performance with true neural activation patterns in ecologically valid contexts. The most common paradigms include cognitive tasks like verbal fluency tests, n-back working memory tasks, and motor tasks like finger-tapping, all of which produce robust, well-characterized hemodynamic responses in specific brain regions [55] [54].

In this framework, motion correction techniques are evaluated based on their ability to enhance the detection of expected activation patterns, improve contrast-to-noise ratios between task conditions, increase statistical significance of activation, and produce more physiologically plausible hemodynamic response shapes [55]. For clinical validation studies, additional criteria include improved separation between patient and control groups and enhanced correlation with clinical symptom severity [55].

Experimental Protocols

A comprehensive experimental ground truth protocol begins with selecting well-validated task paradigms with robust and reproducible activation patterns. For prefrontal cortex studies, the verbal fluency task (generating words beginning with a specific letter) reliably activates frontal and temporal regions, while n-back tasks probe working memory networks [55] [54]. For motor cortex validation, finger-tapping paradigms produce highly consistent activation in contralateral motor areas [58].

The participant population should be appropriately sized with statistical power considerations and include both healthy controls and when relevant, clinical populations to test algorithm performance across different noise characteristics [55]. For rigorous validation, the protocol should incorporate intentional motion conditions, such as asking participants to make specific head movements at designated times, to create realistic artifact challenges while preserving knowledge of when artifacts occurred [3]. Performance evaluation should assess both neural signal preservation (through expected activation effect sizes, HRF shape plausibility, and network connectivity patterns) and artifact reduction effectiveness (through timecourse quality metrics, trial-to-trial consistency, and reproducibility across sessions) [55] [54].

Figure 2: Multi-stage validation framework. Simulation-based validation → resting-state data with synthetic HRFs → experimental ground-truth validation, assessed via signal recovery accuracy (MSE, Pearson's R), activation detection sensitivity/specificity, and clinical differentiation accuracy before real-world application.

Integrated Validation Recommendations

Framework Selection Guidelines

Each validation framework offers distinct advantages that make it appropriate for specific stages of algorithm development and evaluation. Simulation-based approaches are ideal for initial algorithm development and parameter optimization due to their complete ground truth knowledge and flexibility [3]. Resting-state with synthetic HRFs provides the most effective methodology for direct comparison of multiple correction techniques under realistic noise conditions, offering an optimal balance between experimental control and biological realism [53]. Experimental ground truth validation represents an essential final step for assessing ecological validity and readiness for real research applications [55].

For comprehensive validation, researchers should employ a sequential approach beginning with simulations, progressing to resting-state with synthetic HRFs, and culminating with experimental ground truth testing. This multi-stage process ensures both technical efficacy and practical utility. When working with specialized populations such as children or clinical groups, it's particularly important to include validation data from those specific populations, as motion artifact characteristics and physiological noise profiles may differ substantially from healthy adults [2].

The field of fNIRS validation is rapidly evolving with several promising emerging methodologies. Multimodal integration, particularly simultaneous fNIRS-fMRI, provides powerful validation opportunities through cross-modal comparison, though this approach requires addressing technical challenges like MR compatibility of fNIRS components and temporal resolution mismatches [56]. Advanced analytical approaches including machine learning techniques are being increasingly applied to motion correction, creating new validation demands for these data-driven methods [56].

There is growing emphasis on standardized evaluation metrics and reporting practices to enhance reproducibility and comparability across studies [59]. The Society for Functional Near-Infrared Spectroscopy has developed comprehensive reporting guidelines to enhance the reliability, repeatability, and traceability of fNIRS studies [59]. Future validation frameworks will need to address new fNIRS applications including functional connectivity analysis, resting-state networks, and naturalistic paradigms, all of which present unique motion correction challenges that require specialized validation approaches [57].

Comparative Analysis of Technique Efficacy on Real Cognitive and Pediatric Data

Functional near-infrared spectroscopy (fNIRS) has emerged as a vital neuroimaging tool for studying brain function in real-world settings and across diverse populations, from infants to clinical patients. Unlike other neuroimaging modalities, fNIRS offers portability, tolerance to movement, and relatively low cost, making it particularly suitable for studying natural cognitive processes and pediatric populations [60]. However, a significant challenge compromising data quality in fNIRS research is the presence of motion artifacts (MAs)—signal disturbances caused by the movement of participants during data acquisition [3] [6].

Motion artifacts can manifest as high-frequency spikes, baseline shifts, or low-frequency variations that often correlate with the hemodynamic response, making them particularly difficult to distinguish from genuine brain activity [3] [28]. While numerous MA correction techniques have been developed, their relative efficacy varies considerably when applied to different populations and experimental paradigms. This comparative analysis examines the performance of various motion correction techniques applied specifically to real cognitive and pediatric fNIRS data, providing evidence-based recommendations for researchers and clinicians.

Motion Artifact Correction Techniques

Motion artifact correction methods can be broadly categorized into hardware-based and algorithmic approaches. Hardware-based solutions often involve additional sensors such as accelerometers, gyroscopes, or short-separation channels to detect and correct motion artifacts [2] [6]. While effective, these approaches increase experimental complexity and may not be feasible in all settings, particularly with pediatric populations [2].

Algorithmic approaches, which operate on the captured fNIRS signal without requiring additional hardware, include:

  • Wavelet Filtering: Decomposes signals into wavelet coefficients to identify and remove outliers caused by motion artifacts [3] [61]
  • Spline Interpolation: Models motion artifacts using cubic spline interpolation and subtracts them from the original signal [13] [61]
  • Principal Component Analysis (PCA): Identifies and removes components representing motion artifacts through orthogonal transformation [3] [61]
  • Moving Average (MA): Applies a simple smoothing filter to reduce high-frequency motion artifacts [2]
  • Correlation-Based Signal Improvement (CBSI): Leverages the negative correlation between HbO and HbR concentrations to remove motion artifacts (a minimal sketch follows this list) [3] [61]
  • Kalman Filtering: Uses a recursive algorithm to estimate the true hemodynamic signal by filtering out motion artifacts [3] [28]
  • Learning-Based Approaches: Emerging techniques using artificial neural networks, convolutional neural networks, and denoising auto-encoders [9]
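
To illustrate how compact some of these algorithms are, the sketch below implements CBSI following the commonly cited formulation (alpha estimated as the ratio of HbO to HbR standard deviations); it is an illustration under that assumption, not a reference implementation.

```python
import numpy as np

def cbsi(hbo, hbr):
    """Correlation-based signal improvement (commonly cited formulation).

    Assumes the true HbO and HbR are negatively correlated and that motion
    adds a common component to both; alpha is estimated as std(HbO)/std(HbR).
    """
    hbo = np.asarray(hbo, dtype=float)
    hbr = np.asarray(hbr, dtype=float)
    alpha = hbo.std() / hbr.std()
    hbo_corr = 0.5 * (hbo - alpha * hbr)
    hbr_corr = -hbo_corr / alpha
    return hbo_corr, hbr_corr

# Toy example: anti-correlated HbO/HbR plus a shared motion component
rng = np.random.default_rng(5)
t = np.arange(0, 60, 0.1)
true_hbo = np.sin(2 * np.pi * 0.05 * t)
motion = np.zeros_like(t)
motion[300:320] = 2.0                                   # shared spike artifact
hbo = true_hbo + motion + 0.02 * rng.standard_normal(t.size)
hbr = -0.5 * true_hbo + motion + 0.02 * rng.standard_normal(t.size)
hbo_c, hbr_c = cbsi(hbo, hbr)
```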

The following diagram illustrates the classification of these major correction techniques:

[Classification diagram: motion artifact correction techniques split into hardware-based methods (accelerometer-based, short-separation channels, 3D motion capture) and algorithmic methods; algorithmic methods comprise traditional algorithms (wavelet filtering, spline interpolation, PCA, CBSI, Kalman filtering, moving average) and learning-based approaches (artificial neural networks, convolutional neural networks, denoising auto-encoders, fully connected residual networks).]

Comparative Analysis on Real Cognitive Data

Studies utilizing real cognitive data provide valuable insights into motion correction performance under ecologically valid conditions. Brigadoi et al. conducted a comprehensive comparison using fNIRS data from a color-naming task where participants spoke aloud, generating low-frequency, low-amplitude motion artifacts correlated with the hemodynamic response [3] [17] [28]. This paradigm is particularly challenging as the artifacts closely resemble genuine hemodynamic responses.

Table 1: Performance Comparison of Motion Correction Techniques on Real Cognitive Data

| Technique | AUC0-2 Improvement | AUC Ratio | Within-Subject SD | Key Findings |
|---|---|---|---|---|
| Wavelet Filtering | Significant improvement | Closest to ideal ratio (2.5-3.5) | Reduced variability | Most effective approach, corrected 93% of artifacts [3] [28] |
| Spline Interpolation | Moderate improvement | Variable | Moderate reduction | Performance depends on accurate artifact detection [3] |
| PCA | Moderate improvement | Variable | Moderate reduction | Effective when motion is principal variance source [3] [28] |
| Kalman Filtering | Limited improvement | Less optimal | Limited reduction | Less effective for low-frequency artifacts [3] |
| CBSI | Limited improvement | Less optimal | Limited reduction | Constrained by HbO-HbR correlation assumption [3] [61] |
| Trial Rejection | - | - | - | Not feasible with limited trials; reduces statistical power [3] |

The superior performance of wavelet filtering in handling motion artifacts in cognitive tasks stems from its ability to localize artifacts in both time and frequency domains, effectively distinguishing them from true hemodynamic signals [3]. The study conclusively demonstrated that correcting motion artifacts is always preferable to trial rejection, as the latter approach significantly reduces the number of available trials and statistical power, particularly problematic in studies with limited trials or challenging populations [3] [28].

Comparative Analysis on Pediatric Data

Pediatric fNIRS data presents unique challenges due to more frequent and pronounced motion artifacts compared to adult data [2] [13]. Children and infants have shorter attention spans, cannot follow instructions as effectively, and make more spontaneous movements, resulting in different artifact profiles requiring specialized correction approaches.

Table 2: Performance Comparison of Motion Correction Techniques on Pediatric Data

| Technique | Population | HRF Recovery | Trial Retention | Key Findings |
|---|---|---|---|---|
| Moving Average | Children (6-12 years) | Good | High | One of the best performers for pediatric data [2] |
| Wavelet Filtering | Children (6-12 years) | Good | High | Among most effective for pediatric data [2] |
| Spline + Wavelet | Infants (5, 7, 10 months) | Excellent | Highest (nearly all trials) | Best performance for infant data; optimal for low and high noise [13] |
| Spline Alone | Infants (5, 7, 10 months) | Moderate | Moderate | Less effective than combined approach [13] |
| Wavelet Alone | Infants (5, 7, 10 months) | Good | Good | Effective but enhanced by combination with spline [13] |
| Adaptive Spline + Gaussian | Neonates | Good (R=0.732) | High | Effective for baseline shifts, spikes, and serial disturbances [61] |

The combination of spline interpolation and wavelet filtering has demonstrated particular efficacy for infant data, successfully addressing both baseline shifts (via spline) and spike artifacts (via wavelet) while preserving a maximum number of trials [13]. This is crucial in infant research where data collection opportunities are limited, and trial loss significantly impacts study power.

Emerging Learning-Based Approaches

Recent advances in machine learning have introduced novel approaches for motion artifact correction in fNIRS data. These methods learn the characteristics of both clean signals and motion artifacts from training data, potentially offering more adaptive and powerful correction capabilities [9].

  • Wavelet Regression Neural Networks: Combine wavelet decomposition with artificial neural networks, using an unbalance index from entropy cross-correlation of neighboring channels to identify contaminated optodes [9]
  • Denoising Auto-Encoders (DAE): Employ specialized deep learning networks with custom loss functions to remove motion artifacts while preserving hemodynamic signals, trained using synthetic fNIRS data generated through auto-regression models [9]
  • U-Net Convolutional Neural Networks: Utilize CNN architectures based on U-net to reconstruct hemodynamic responses while reducing motion artifacts, showing superior performance to wavelet decomposition and auto-regressive models in terms of mean squared error and HRF estimate variance [9]
  • Simplified Residual Fully Connected Neural Networks (sResFCNN): Combine neural networks with low-pass finite impulse response filters to effectively remove motion artifacts [9]

While learning-based approaches show promise, their current limitations include dependence on large training datasets and limited generalizability across different experimental paradigms and populations [9]. As these methods evolve, they hold potential for more automated and effective motion artifact correction.
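
As an illustration of the denoising auto-encoder idea (not a reproduction of any specific published architecture), the sketch below defines a small 1-D convolutional encoder-decoder in PyTorch trained with an MSE loss between artifact-contaminated segments and clean targets; the segment length, channel counts, and random placeholder data are assumptions.

```python
import torch
import torch.nn as nn

class FNIRSDenoisingAE(nn.Module):
    """Minimal 1-D convolutional denoising auto-encoder for fNIRS segments."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=2, stride=2), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=2, stride=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = FNIRSDenoisingAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Placeholder training pairs: (batch, channel, samples) segments; in practice
# these would be artifact-contaminated inputs and their artifact-free targets.
noisy = torch.randn(8, 1, 256)
clean = torch.randn(8, 1, 256)
for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(noisy), clean)
    loss.backward()
    optimizer.step()
```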

Experimental Protocols and Methodologies

Cognitive Data Experimental Protocol

The foundational study by Brigadoi et al. utilized a color-naming task where participants verbally identified the color of displayed words [3] [28]. Key methodological elements included:

  • Participants: 22 university students (18 retained after quality screening)
  • fNIRS System: Multi-channel frequency-domain NIR spectrometer (ISS Imagent) with 32 laser diodes (16 at 690 nm, 16 at 830 nm) and 4 photo-multiplier tubes
  • Source-Detector Configuration: 3 cm separation, sampling at 7.8 Hz, positioned over frontal and premotor areas
  • Task Structure: 160 trials total (4 conditions × 40 trials), word presentation until vocal response (~850 ms), inter-stimulus interval of 10-12 seconds
  • Motion Artifact Characteristics: Low-frequency, low-amplitude artifacts caused by jaw movement during vocalization, temporally correlated with hemodynamic response

The following diagram illustrates the experimental workflow for comparative studies:

[Workflow diagram: data acquisition (real cognitive data from a color-naming task, pediatric data across tasks and ages, semi-simulated data combining real artifacts with simulated HRFs) → artifact identification (visual inspection, algorithmic detection such as moving standard deviation) → application of correction techniques (wavelet filtering, spline interpolation, spline + wavelet combination, and other methods such as PCA, CBSI, and Kalman filtering) → performance evaluation (physiological plausibility metrics such as AUC and HRF shape, trial retention rate, correlation and error metrics such as R and RMSE, functional connectivity recovery).]

Pediatric Data Experimental Protocol

Studies evaluating motion correction techniques in pediatric populations have employed various approaches:

  • Multiple Dataset Validation: Di Lorenzo et al. used three independent datasets from infants of different ages (5, 7, and 10 months) with varying tasks (auditory, visual, tactile) and different NIRS systems [13]
  • Semi-Simulation Approach: Combination of real resting-state fNIRS data contaminated with actual motion artifacts with simulated hemodynamic response functions to establish ground truth [13]
  • Performance Metrics: Hemodynamic response recovery error, within-subject standard deviation, between-subjects standard deviation, and number of trials retained after correction [13]
  • Adaptive Thresholding: Novel approaches that estimate physiological oscillation thresholds for each channel to improve artifact detection in neonatal data [61]

The Researcher's Toolkit

Table 3: Essential Research Tools for fNIRS Motion Artifact Correction

| Tool/Resource | Function | Application Context |
|---|---|---|
| Homer2 Software Package | Comprehensive fNIRS processing including MA correction algorithms | Standardized preprocessing and implementation of various correction techniques [2] [9] |
| Wavelet Filtering Implementation | Multi-resolution analysis for artifact identification and removal | Particularly effective for spike artifacts and low-frequency artifacts correlated with HRF [3] [13] |
| Spline Interpolation | Cubic spline modeling of baseline shifts and slow drifts | Ideal for correcting baseline shifts; often combined with wavelet approach [13] [61] |
| Accelerometer/IMU Sensors | Hardware-based motion detection and recording | Provides reference signal for adaptive filtering and artifact detection [6] |
| Semi-Simulation Validation | Combining real artifacts with simulated hemodynamic responses | Method validation with known ground truth [13] [61] |
| Moving Standard Deviation (MSD) | Statistical method for artifact detection | Identifies signal segments exceeding physiological oscillation thresholds [61] |
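
The moving-standard-deviation detector listed above reduces to a sliding-window standard deviation compared against a per-channel threshold; in the sketch below, the window length and the threshold (a multiple of the channel's median moving SD) are assumed parameters.

```python
import numpy as np

def detect_artifacts_msd(x, fs=10.0, win_s=1.0, thresh_factor=3.0):
    """Flag samples where the moving standard deviation exceeds a threshold.

    x             : 1-D fNIRS time course
    fs            : sampling rate (Hz)
    win_s         : sliding-window length in seconds (assumed)
    thresh_factor : multiple of the channel's median moving SD (assumed)
    Returns a boolean mask of flagged samples.
    """
    x = np.asarray(x, dtype=float)
    w = max(2, int(win_s * fs))
    # Moving SD via an explicit loop (clarity over speed)
    msd = np.array([x[i:i + w].std() for i in range(len(x) - w + 1)])
    threshold = thresh_factor * np.median(msd)
    mask = np.zeros(len(x), dtype=bool)
    for i in np.flatnonzero(msd > threshold):
        mask[i:i + w] = True
    return mask

rng = np.random.default_rng(6)
signal = 0.01 * rng.standard_normal(2000)
signal[800:815] += 0.5                      # injected spike artifact
print(detect_artifacts_msd(signal).sum(), "samples flagged")
```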

This comparative analysis demonstrates that optimal motion artifact correction in fNIRS research depends significantly on the population studied and the nature of the experimental paradigm. For real cognitive data involving adults, wavelet filtering emerges as the most effective technique, particularly for challenging low-frequency artifacts correlated with the hemodynamic response [3] [28]. For pediatric populations, especially infants, the combined spline-wavelet approach provides superior performance, effectively addressing diverse artifact types while maximizing trial retention [13].

The field continues to evolve with promising learning-based approaches, though these require further validation across diverse datasets and populations [9]. Future research should focus on developing standardized evaluation metrics and validation frameworks to facilitate direct comparison of existing and emerging correction techniques [9] [39]. As fNIRS continues to grow as a neuroimaging tool, particularly for naturalistic studies and challenging populations, robust motion artifact correction remains essential for ensuring data quality and validity of neuroscientific findings.

Functional near-infrared spectroscopy (fNIRS) has emerged as a portable, non-invasive neuroimaging technology that measures brain activity by detecting hemodynamic changes in cerebral blood flow. While its advantages over other neuroimaging modalities include relative tolerance to motion, portability, and lower operational costs, the field faces significant challenges in standardizing performance assessment methodologies [62] [63]. The lack of community standards for applying machine learning to fNIRS data and the absence of open-source benchmarks have created a situation where published works often claim high generalization capabilities but with poor practices or missing details, making it difficult to evaluate model performance for brain-computer interface (BCI) applications [62]. This comparison guide provides a comprehensive performance benchmarking analysis of key fNIRS signal processing and classification methods, focusing specifically on motion artifact removal techniques and machine learning algorithms, to establish evidence-based best practices for researchers and practitioners.

Performance Comparison of fNIRS Machine Learning Algorithms

Classification Accuracy Across Algorithm Types

Table 1: Performance benchmarking of fNIRS machine learning algorithms across multiple studies

| Algorithm Type | Specific Model | Reported Accuracy Range | Task Context | Notes/Limitations |
|---|---|---|---|---|
| Traditional ML | Linear Discriminant Analysis (LDA) | 59.81% - 98.7% [62] [64] | Motor tasks, mental classification | Performance varies significantly based on features and task type |
| | Support Vector Machine (SVM) | 59.81% - 77% [62] [64] | Mental arithmetic, motor tasks | Lower performance on complex tasks |
| | k-Nearest Neighbors (kNN) | ~52.08% [62] | Mental workload levels | Lower performance on n-back tasks |
| Deep Learning | Artificial Neural Network (ANN) | 63.0% - 89.35% [62] | Mental signing, motor tasks | Varies significantly by architecture |
| | Convolutional Neural Network (CNN) | 85.16% - 92.68% [62] [65] | Hand-gripping, finger tapping | Higher performance on motor tasks |
| | Long Short-Term Memory (LSTM) | 79.46% - 83.3% [62] [65] | Mental arithmetic, hand-gripping | Better for temporal patterns |
| | Bidirectional LSTM (Bi-LSTM) | 81.88% [65] | Hand-gripping tasks | Moderate improvement over LSTM |
| Hybrid/Advanced | Stack Model (with DL features) | 87.00% [65] | Hand-gripping motor activity | Combines multiple DL architectures |
| | FFT-Enhanced Stack Model | 90.11% [65] | Hand-gripping motor activity | Highest performance in comparison |
| | CSP-Enhanced LDA | 84.19% [64] | Hand motion and motor imagery | Significant improvement over base LDA |
| | CSP-Enhanced SVM | 81.63% [64] | Hand motion and motor imagery | 21.82% improvement over base SVM |

The benchmarking data reveal several critical patterns. First, reported performance in the literature is extremely variable, with some studies claiming accuracies above 98% while robust benchmarking frameworks report more modest results [62]. Second, without standardized benchmarking, claims of high classification accuracy (often taken as evidence that the technology is ready for transfer to industry) may be overstated, as performance on unseen data remains challenging [62]. The BenchNIRS framework established that, when models are evaluated with a robust methodology such as nested cross-validation, performance is typically lower than often reported and differences between models are not dramatic [62].

Feature Extraction and Dimensionality Reduction Methods

The common spatial pattern (CSP) algorithm has demonstrated significant improvements in classification performance when applied as a dimensionality reduction technique before classification. As shown in Table 1, CSP integration improved LDA classifier accuracy from 69% to 84.19% (15.19% increase) and SVM accuracy from 59.81% to 81.63% (21.82% increase) for hand motion and motor imagery tasks [64]. Additionally, statistical features including mean, variance, slope, skewness, and kurtosis have been successfully employed as input features, with mean and slope identified as the most discriminative features for motor tasks [64]. For deep learning approaches, stacking of frequency domain features extracted through Fast Fourier Transformation (FFT) has shown superior performance (90.11%) compared to conventional deep learning architectures [65].
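
The statistical features named above can be computed per channel and per task window with a few lines of NumPy/SciPy, as in the sketch below; the window segmentation, sampling rate, and feature ordering are illustrative choices.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def window_features(epoch, fs=10.0):
    """Mean, variance, slope, skewness, and kurtosis for one channel epoch.

    epoch : 1-D array covering a single task window (e.g., HbO over one trial)
    fs    : sampling rate in Hz (assumed)
    """
    epoch = np.asarray(epoch, dtype=float)
    t = np.arange(epoch.size) / fs
    slope = np.polyfit(t, epoch, 1)[0]        # linear trend over the window
    return np.array([epoch.mean(), epoch.var(), slope,
                     skew(epoch), kurtosis(epoch)])

def build_feature_matrix(epochs, fs=10.0):
    """Stack per-channel features into one row per trial (trials, 5 * channels)."""
    n_trials, n_channels, _ = epochs.shape
    rows = [np.concatenate([window_features(epochs[i, c], fs)
                            for c in range(n_channels)])
            for i in range(n_trials)]
    return np.vstack(rows)

# Placeholder epoch array: 20 trials x 8 channels x 100 samples
epochs = np.random.default_rng(7).standard_normal((20, 8, 100))
X = build_feature_matrix(epochs)
print(X.shape)   # (20, 40)
```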

Motion Artifact Removal: Performance Evaluation and Metrics

Comparative Performance of Motion Correction Algorithms

Table 2: Motion artifact correction algorithm performance comparison

| Algorithm Category | Specific Methods | Performance Characteristics | Limitations/Requirements |
|---|---|---|---|
| Hardware-Based Solutions | Accelerometer-based (ANC, ABAMAR, ABMARA) [6] | Enables real-time rejection; improves feasibility for real-world applications | Requires additional hardware; increases setup complexity |
| | Collodion-fixed fibers [2] | Improved stability for problematic artifacts | Specialized equipment needed |
| | Head immobilization [6] | Reduces motion occurrence | Limits ecological validity |
| Software-Based Solutions | Moving Average (MA) [2] | Among best outcomes for pediatric data | Offline processing |
| | Wavelet-based methods [2] | Top performance on pediatric data; effective for various artifact types | Parameter sensitivity |
| | Spline Interpolation [6] [2] | Recommended for minimizing impact; preserves signal quality | Requires accurate artifact detection |
| | Targeted PCA (tPCA) [25] | Effective for children's data; better HRF recovery than wavelet/spline | Complex implementation |
| | Correlation-Based Signal Improvement (CBSI) [2] | Moderate performance | Limited artifact type coverage |
| | Principal Component Analysis (PCA) [2] | Variable performance depending on data type | May remove neural signals |
| Hybrid Approaches | Wavelet + MA combination [2] | Good performance on diverse artifacts | Multiple processing stages |
| | Short-separation channel regression [6] | Effective for superficial artifact removal | Requires specific optode setup |

Motion artifacts remain the most significant noise source in fNIRS, particularly challenging for pediatric populations where data is typically noisier than adult data [2]. These artifacts originate from various movements including head motion (nodding, shaking, tilting), facial muscle movements (eyebrow raising), body movements, and even talking, eating, or drinking [6]. The direct cause is imperfect contact between optodes and the scalp, including displacement, non-orthogonal contact, and oscillation of the optodes [6].

Motion Artifact Evaluation Metrics and Methodologies

Comprehensive evaluation of motion artifact correction techniques requires multiple metrics to assess different aspects of performance. For noise suppression, researchers typically employ ΔSignal-to-Noise Ratio (SNR improvement), Mean Squared Error (MSE), and correlation coefficients with clean reference signals [6] [25]. For assessing signal distortion, common metrics include the ability to recover known hemodynamic response functions (HRF) and the degree of temporal distortion introduced [6]. Recent approaches have also incorporated computer vision techniques to establish ground-truth movement information, using frame-by-frame analysis of video recordings with deep neural networks like SynergyNet to compute head orientation angles, which are then correlated with artifact characteristics in fNIRS signals [5].

Experimental Protocols and Methodologies

Benchmarking Framework Design

The BenchNIRS framework employs a robust methodology with nested cross-validation to optimize models and evaluate them without bias [62]. This approach uses multiple open-access datasets for BCI applications to establish best practices for machine learning methodology. The framework investigates the influence of different factors on classification performance, including the number of training examples, size of the time window for each fNIRS sample, sliding window approaches versus simple epoch classification, and personalized (within-subject) versus generalized (unseen subject) approaches [62].
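
A nested cross-validation loop of the kind described here can be assembled from standard scikit-learn components, as sketched below; the classifier, hyperparameter grid, and fold counts are illustrative assumptions and not the BenchNIRS configuration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(8)
X = rng.standard_normal((120, 40))          # placeholder feature matrix
y = rng.integers(0, 2, size=120)            # placeholder binary labels

inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop tunes hyperparameters; outer loop estimates unbiased performance.
model = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]},
    cv=inner,
)
scores = cross_val_score(model, X, y, cv=outer)
print(f"Nested CV accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")
```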

For motion artifact characterization, controlled experimental protocols have been developed where participants perform specific head movements along three rotational axes (vertical, frontal, sagittal) at varying speeds (fast, slow) and movement types (half, full, repeated rotation) [5]. These sessions are video recorded and analyzed frame-by-frame using computer vision algorithms to compute head orientation angles, enabling precise correlation between movement parameters and artifact characteristics in fNIRS signals [5].

Signal Processing Workflows

[Diagram: motion artifact classification system.]

Essential Research Reagents and Materials

Table 3: Key research reagents and solutions for fNIRS benchmarking studies

| Category | Specific Tool/Solution | Function/Purpose | Example Implementation |
|---|---|---|---|
| Benchmarking Frameworks | BenchNIRS [62] | Open-source benchmarking with nested cross-validation | Established robust ML methodology on multiple datasets |
| | Homer2 Software Package [2] | fNIRS data processing and analysis | Motion artifact identification and correction |
| fNIRS Systems | Continuous Wave Systems (CW-NIRS) [59] | Standard fNIRS measurement | TechEN-CW6 system with 690/830nm wavelengths [2] |
| | Time-Domain Systems (TD-NIRS) [59] | Enhanced depth sensitivity | Advanced photon migration analysis |
| | Frequency-Domain Systems (FD-NIRS) [59] | Absolute quantification capabilities | Phase and intensity measurements |
| Motion Tracking | Accelerometer-based Systems [6] | Motion artifact detection and correction | Active Noise Cancellation (ANC), ABAMAR |
| | Computer Vision Systems [5] | Ground-truth movement quantification | SynergyNet deep neural network for head orientation |
| | Inertial Measurement Units (IMU) [6] | Multi-dimensional movement capture | Gyroscope, magnetometer integration |
| Signal Processing Tools | Wavelet Analysis [2] | Multi-scale artifact correction | Effective for pediatric data |
| | Spline Interpolation [6] [2] | Artifact removal without signal loss | Recommended for minimizing motion impact |
| | Targeted PCA (tPCA) [25] | Selective artifact component removal | Improved HRF recovery compared to alternatives |
| Validation Datasets | Open Access fNIRS Datasets [62] | Standardized performance comparison | Multiple BCI task datasets |
| | Controlled Movement Datasets [5] | Artifact algorithm validation | Head movements along rotational axes |

Performance benchmarking in fNIRS research reveals several critical insights. First, standardized evaluation frameworks like BenchNIRS demonstrate that actual model performance on unseen data is typically lower than commonly reported in the literature, highlighting the importance of robust validation methodologies [62]. Second, motion artifact correction remains a fundamental challenge, with hybrid approaches combining hardware and software solutions showing promise for real-world applications [6] [2]. Future research directions should address the balance between auxiliary hardware and algorithmic solutions, consider filtering delays for real-time applications, and improve robustness under extreme conditions [6]. Furthermore, the field would benefit from increased standardization in optode placement, harmonization of signal processing methods, and larger sample sizes to enhance validity and comparability across studies [63]. As fNIRS continues to evolve toward real-world applications, rigorous performance benchmarking will remain essential for translating research findings into reliable brain-computer interfaces and clinical applications.

Evaluating Robustness and Stability Under Extreme Application Conditions

Functional near-infrared spectroscopy (fNIRS) has emerged as a preferred neuroimaging technique for studies requiring ecological validity and patient mobility, enabling brain monitoring during natural movement and real-world tasks [6] [5]. However, the fundamental challenge confronting fNIRS research is the vulnerability of optical signals to motion artifacts (MAs)—signal distortions caused by relative movement between optical sensors (optodes) and the scalp [6] [66]. These artifacts can severely degrade signal quality, potentially obscuring genuine neurovascular responses and compromising data integrity in both research and clinical applications [9].

While numerous MA removal techniques have been developed, a significant deficiency persists in evaluating their performance under extreme application conditions [6]. Current validation approaches often fail to assess how these methods perform when subjected to intense, complex, or prolonged motion—precisely the scenarios where reliable fNIRS monitoring is most valuable but also most challenging. This gap is particularly critical for applications such as epilepsy monitoring during seizures, stroke rehabilitation involving motor exercises, infant studies where movement cannot be constrained, and real-world brain-computer interface applications [66] [67]. This review synthesizes current methodologies for evaluating MA removal robustness, presents experimental frameworks for stress-testing under extreme conditions, and provides a standardized approach for comparative assessment of motion correction techniques.

Motion Artifact Origins and Characteristics in Extreme Conditions

Motion artifacts in fNIRS signals originate from mechanical disruptions in the optode-scalp interface. When subject movement causes imperfect contact—through displacement, non-orthogonal contact, or optode oscillation—the resulting signal artifacts can exceed the amplitude of physiological hemodynamic responses by an order of magnitude [66]. These artifacts manifest as high-frequency spikes, slow drifts, and baseline shifts that vary in characteristics based on the movement type and intensity [9].

Recent research using computer vision and ground-truth movement data has systematically characterized how specific head movements correlate with distinct artifact patterns [5]. Movements along the rotational axes (vertical, frontal, sagittal) produce differentiable signal corruptions, with repeated, upward, and downward movements particularly compromising signal quality. Brain regions also exhibit differential vulnerability: occipital and pre-occipital regions are most susceptible to vertical movements, while temporal regions are most affected by lateral motions [5].

Under extreme conditions such as epileptic seizures, convulsive movements generate complex artifact signatures that combine multiple motion vectors and intensities, presenting the most challenging scenario for artifact removal algorithms [66]. Similarly, in motor rehabilitation studies, rhythmic whole-body movements during walking or physical therapy introduce compound artifacts that combine low-frequency oscillations with abrupt motion transients [9]. These extreme conditions test the limits of MA removal techniques: the resulting artifacts span roughly 0.01-10 Hz, overlapping both the slow hemodynamic response and the 0.5-5 Hz band relevant for PPG-based heart rate estimation, and they frequently exceed the dynamic range of conventional correction approaches [68].

Experimental Protocols for Extreme Condition Testing

Controlled Motion Artifact Induction Protocols

Robust evaluation of MA removal techniques requires standardized protocols for inducing motion artifacts under controlled conditions that simulate real-world extremes. Well-designed experimental methodologies include:

Head Movement Protocol: Subjects perform controlled head movements along three rotational axes (vertical, frontal, sagittal) at varying speeds (slow, fast) and types (half, full, repeated rotations) while fNIRS signals are simultaneously recorded with motion tracking [5]. This systematic approach enables correlation of specific movement parameters with artifact characteristics.

Simulated Seizure Protocol: Healthy subjects simulate motions observed during epileptic seizures, including nodding (up-down), shaking (side-to-side), tilting, twisting, and rapid head movements, while simultaneous recordings are taken from both tested and reference optode configurations [66]. This protocol typically involves 3-second motion trials repeated 5 times with randomized 5-10 second inter-trial intervals (a timing sketch follows these protocol descriptions).

Physical Activity Protocol: Subjects perform graded physical activities (sitting, slow walking, fast walking, running) while fNIRS and reference signals (e.g., ECG) are recorded [68]. This protocol tests MA removal under conditions of increasing motion intensity, with performance quantified through heart rate estimation accuracy.

Long-duration Monitoring: Extended recording sessions (up to 24 hours) in clinical populations (epilepsy, stroke patients) to assess method stability under realistic clinical conditions with spontaneous movement artifacts [69].
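As a concrete illustration of the simulated seizure protocol's timing structure, the sketch below generates a randomized schedule of 3-second motion trials, five repetitions per motion type, with 5-10 second inter-trial intervals. The motion labels and parameters simply mirror the description above and are adjustable assumptions.

```python
import numpy as np

def seizure_sim_schedule(motions=("nod", "shake", "tilt", "twist", "rapid"),
                         n_reps=5, trial_dur=3.0, iti_range=(5.0, 10.0), seed=0):
    """Build a randomized (onset, offset, motion) schedule in seconds."""
    rng = np.random.default_rng(seed)
    trials = rng.permutation([m for m in motions for _ in range(n_reps)])
    schedule, t = [], 0.0
    for motion in trials:
        schedule.append((round(t, 1), round(t + trial_dur, 1), motion))
        t += trial_dur + rng.uniform(*iti_range)   # randomized inter-trial interval
    return schedule

for onset, offset, motion in seizure_sim_schedule()[:5]:
    print(f"{onset:6.1f}-{offset:6.1f} s  {motion}")
```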

Performance Metrics for Extreme Condition Evaluation

Comprehensive evaluation requires multiple quantitative metrics that capture different aspects of MA removal performance under stress conditions:

Table 1: Performance Metrics for Extreme Condition Evaluation

| Metric Category | Specific Metrics | Application Context | Optimal Values |
| --- | --- | --- | --- |
| Noise Suppression | Signal-to-Noise Ratio (SNR) Gain [70] | General artifact removal | Higher values preferred |
| | Failed Detection Rate (FDR) [68] | Heart rate estimation | < 1% for intensive motion |
| | Sensitivity (Se), Positive Predictive Value (PPV) [68] | Component classification | > 95% for walking |
| Signal Integrity | Percent Signal Change Reduction [66] | Motion artifact amplitude | 90% reduction demonstrated |
| | Mean Squared Error (MSE) [9] | Hemodynamic response reconstruction | Lower values preferred |
| Physiological Plausibility | Contrast-to-Noise Ratio (CNR) [9] | Functional activation detection | Higher values preferred |
| | Correlation with Ground Truth [68] | Heart rate validation | > 0.75 for running |

These metrics collectively assess a method's ability to suppress noise while preserving signal integrity—the fundamental challenge of motion artifact correction. The failed detection rate (FDR) for heart rate estimation provides a particularly stringent test, with values under 1% representing excellent performance even during intensive motion like running [68].
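The FDR can be operationalized in more than one way; one simple reading, sketched below, treats it as the fraction of analysis windows whose estimated heart rate deviates from the ECG reference by more than a fixed tolerance. The tolerance and example values are assumptions for illustration, not the definition used in [68].

```python
import numpy as np

def failed_detection_rate(hr_estimated, hr_reference, tolerance_bpm=5.0):
    """Fraction of windows where the estimated heart rate misses the
    reference by more than `tolerance_bpm` (one plausible FDR definition)."""
    hr_estimated = np.asarray(hr_estimated, dtype=float)
    hr_reference = np.asarray(hr_reference, dtype=float)
    failed = np.abs(hr_estimated - hr_reference) > tolerance_bpm
    return failed.mean()

# Illustrative values (not from any dataset)
est = [72, 74, 120, 75, 76, 77, 140, 78]
ref = [71, 74, 74, 75, 77, 77, 78, 78]
print(f"FDR = {failed_detection_rate(est, ref):.1%}")
```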

Comparative Analysis of Motion Artifact Removal Techniques

Hardware-Based Stabilization Approaches

Hardware solutions focus on preventing motion artifacts through improved optode-scalp coupling and motion monitoring:

Table 2: Hardware-Based Motion Artifact Mitigation Approaches

| Technique | Implementation | Performance | Limitations |
| --- | --- | --- | --- |
| Collodion-Fixed Fibers [66] | Miniaturized optical fibers fixed with clinical adhesive collodion | 90% reduction in motion artifact percent signal change; 6x (690 nm) and 3x (830 nm) SNR improvement | Requires expertise for application; less convenient for quick setup |
| Accelerometer-Based Methods [6] | Adaptive filtering using accelerometer as motion reference | Enables real-time artifact rejection; improves feasibility for mobile applications | Additional hardware requirement; potential synchronization challenges |
| Spring-Loaded Optodes [69] | Mechanical pressure maintenance through spring mechanisms | Improved light coupling; reduced ambient light contamination | Increased design complexity; potential comfort issues during long monitoring |
| Multi-Channel Sensor Arrays [68] | Multiple wavelengths and detection points for signal redundancy | Enables signal reconstruction from least-corrupted channels; direction-based artifact characterization | Increased system complexity; higher computational requirements |

The collodion-fixed fiber approach represents the gold standard for extreme motion conditions, having demonstrated capability to maintain signal quality during epileptic seizures where conventional Velcro-based arrays fail completely [66]. This method is particularly valuable for long-term clinical monitoring where motion is unpredictable and often violent.

Algorithmic Motion Artifact Removal Techniques

Algorithmic approaches focus on signal processing techniques to identify and remove motion artifacts from corrupted signals:

Table 3: Algorithmic Motion Artifact Removal Techniques

| Technique | Methodology | Extreme Condition Performance | Computational Requirements |
| --- | --- | --- | --- |
| Kalman Filtering [70] | Recursive state estimation using autoregressive signal modeling | Superior to adaptive filtering; comparable to Wiener filtering but suitable for real-time application | Moderate; efficient recursive implementation |
| Independent Component Analysis (ICA) [68] | Blind source separation of signal components | Effective for multi-channel systems; successful in running conditions (FDR < 0.45% for walking) | High; requires multiple channels for effective separation |
| Wavelet-Based Methods [9] | Multi-resolution analysis and thresholding of wavelet coefficients | Effective for spike-like artifacts; can preserve physiological signal components | Moderate; depends on decomposition levels |
| Deep Learning Approaches [9] | Neural networks (CNN, U-Net, autoencoders) for artifact removal | Promising for complex artifact patterns; data-driven without manual parameter tuning | High training requirements; moderate implementation after training |
| Multi-stage Cascaded Filtering [6] | Sequential application of complementary filtering techniques | Addresses different artifact characteristics at each stage; improved robustness | High; multiple algorithmic stages |

The performance of algorithmic techniques depends critically on implementation details and parameter optimization. For example, ICA combined with truncated singular value decomposition has demonstrated excellent performance in wearable PPG applications during intensive motion, maintaining heart rate estimation accuracy with 99% sensitivity and 99.55% positive predictive value even during walking [68].
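The general shape of an ICA-based cleanup can be sketched as follows: decompose multi-channel data into independent components, flag components that correlate strongly with an auxiliary motion reference (for example, an accelerometer magnitude trace), zero them, and back-project. The snippet below is a generic illustration built on scikit-learn's FastICA, not the specific ICA plus truncated-SVD pipeline evaluated in [68]; the correlation threshold and synthetic data are assumptions.

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_motion_cleanup(data, motion_ref, corr_thresh=0.6, n_components=None, seed=0):
    """data: (n_samples, n_channels) multi-channel signal.
    motion_ref: (n_samples,) auxiliary motion signal (e.g., accelerometer norm).
    Components whose absolute correlation with motion_ref exceeds corr_thresh
    are zeroed before reconstruction."""
    ica = FastICA(n_components=n_components, random_state=seed)
    sources = ica.fit_transform(data)                     # (n_samples, n_components)
    corr = np.array([np.corrcoef(s, motion_ref)[0, 1] for s in sources.T])
    sources[:, np.abs(corr) > corr_thresh] = 0.0          # discard motion-like components
    return ica.inverse_transform(sources)

# Illustrative usage on synthetic 8-channel data with a shared motion component
rng = np.random.default_rng(1)
n, n_ch = 2000, 8
motion = np.cumsum(rng.normal(size=n)); motion /= np.abs(motion).max()
neural = np.sin(2 * np.pi * 0.05 * np.arange(n) / 10.0)
data = (np.outer(neural, rng.uniform(0.5, 1.0, n_ch))
        + np.outer(motion, rng.uniform(1.0, 2.0, n_ch))
        + 0.05 * rng.normal(size=(n, n_ch)))
cleaned = ica_motion_cleanup(data, motion)
```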

Experimental Framework for Robustness Assessment

Standardized Extreme Condition Testing Workflow

A comprehensive experimental framework for evaluating MA removal robustness under extreme conditions should incorporate multiple validation approaches:

[Workflow: defined extreme-condition scenarios are validated along three complementary routes: semi-simulated data (HRF plus experimental motion), computer vision validation, and clinical ground truth (seizure monitoring). Each route feeds a performance assessment stage covering signal quality metrics (SNR, MSE, percent reduction), physiological validity (HR accuracy, CNR), and algorithmic stability (parameter sensitivity), converging on a robustness classification and recommendations.]

Figure 1: Experimental Framework for Evaluating Motion Artifact Removal Robustness

This integrated workflow emphasizes multi-modal validation, combining semi-simulated data (mixing known hemodynamic responses with experimental motion artifacts) [9], computer vision-based motion tracking for ground-truth movement correlation [5], and clinical validation in extreme but real-world conditions such as epileptic seizures [66]. Each validation approach provides complementary evidence of robustness under different stress conditions.
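The semi-simulated branch of this workflow can be implemented by adding a known synthetic hemodynamic response to experimentally recorded resting-state (motion-contaminated) data, so the ground truth is known exactly while the artifacts remain real. Below is a minimal sketch assuming a canonical double-gamma HRF; the resting-state array here is synthetic and would be replaced by a recorded channel.

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(fs, duration=30.0, peak=6.0, under=16.0, ratio=1/6):
    """Double-gamma canonical HRF sampled at fs Hz (SPM-style parameters)."""
    t = np.arange(0, duration, 1 / fs)
    hrf = gamma.pdf(t, peak) - ratio * gamma.pdf(t, under)
    return hrf / hrf.max()

def semi_simulate(resting, fs, onsets_s, amplitude=1.0):
    """Add a known task response to real (motion-contaminated) resting data.
    resting: (n_samples,) recorded resting-state channel.
    onsets_s: stimulus onset times in seconds."""
    design = np.zeros_like(resting)
    design[(np.asarray(onsets_s) * fs).astype(int)] = 1.0
    true_response = amplitude * np.convolve(design, canonical_hrf(fs), mode="full")[:resting.size]
    return resting + true_response, true_response   # corrupted signal plus ground truth

# Illustrative usage with synthetic "resting" data standing in for a real recording
fs = 7.8
rng = np.random.default_rng(3)
resting = np.cumsum(rng.normal(0, 0.05, int(fs * 300)))   # placeholder for real data
semi, truth = semi_simulate(resting, fs, onsets_s=[20, 80, 140, 200, 260], amplitude=0.8)
```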

The Researcher's Toolkit for Extreme Condition Testing

Table 4: Essential Research Tools for Extreme Condition Motion Artifact Research

| Tool Category | Specific Solutions | Function in Robustness Testing |
| --- | --- | --- |
| Motion Tracking | 3D Motion Capture Systems [5] | Provide ground-truth movement data for artifact characterization |
| | Inertial Measurement Units (IMUs) [6] [68] | Real-time motion monitoring for adaptive filtering |
| | Computer Vision (SynergyNet DNN) [5] | Markerless motion tracking from video recordings |
| Signal Quality Validation | ECG-based Heart Rate Monitoring [68] | Objective physiological validation for motion-corrupted signals |
| | Multi-wavelength fNIRS Systems [68] | Signal redundancy and depth-dependent artifact characterization |
| | Collodion-Fixed Reference Optodes [66] | Gold-standard signal quality during extreme motion |
| Data Processing | HOMER2 Processing Package [66] | Standardized pipeline for comparison across methods |
| | SLOMOCO Motion Correction [71] | Specialized motion correction with slice-wise adjustment |
| | Custom MATLAB/Python Toolboxes | Flexible implementation of novel algorithms |

This toolkit enables researchers to implement comprehensive testing protocols that move beyond idealized conditions to assess true performance boundaries. The combination of collodion-fixed reference optodes (providing the best-possible signal during motion) with multi-modal motion tracking represents the most rigorous approach for extreme condition validation [5] [66].

Evaluation of motion artifact removal techniques must evolve to meet the demands of real-world fNIRS applications where movement cannot be constrained. Based on current evidence, hardware solutions like collodion-fixed fibers provide the most reliable performance under extreme conditions such as epileptic seizures, but with practical limitations for routine use [66]. Algorithmic approaches show promising robustness when properly validated, with Kalman filtering and ICA-based methods demonstrating strong performance across challenging conditions [68] [70].

Future progress requires standardized extreme condition testing protocols that combine semi-simulated data with rigorous clinical validation. The research community would benefit from shared datasets containing labeled motion artifacts from diverse extreme conditions, enabling direct comparison of method performance. Additionally, the emerging field of learning-based motion artifact processing offers promising directions for handling complex, non-linear artifact patterns that challenge conventional algorithms [9].

As fNIRS technology expands into more mobile and clinical applications, robustness and stability under extreme conditions will transition from a specialized concern to a central requirement. The evaluation frameworks and comparative data presented here provide a foundation for this necessary evolution in validation standards, ultimately enabling more reliable brain monitoring when it matters most.

Functional near-infrared spectroscopy (fNIRS) has emerged as a vital neuroimaging tool, offering a non-invasive method for monitoring cerebral hemodynamics with advantages in portability, cost, and ecological validity over other modalities like fMRI and EEG [6]. However, its significant vulnerability to motion artifacts (MAs) remains a primary constraint, particularly in studies involving pediatric populations, clinical patients, or naturalistic settings where movement is inherent [2] [3]. The pursuit of optimal motion artifact correction has yielded a diverse array of hardware-based and algorithmic solutions, yet the field lacks a standardized validation framework [44].

Traditionally, technique comparison has often relied on single or limited metrics, such as improvement in signal-to-noise ratio or mean-squared error. This unidimensional approach is insufficient for capturing the complex performance trade-offs—between noise suppression, signal fidelity, computational efficiency, and applicability to different data types—that characterize motion correction algorithms [6] [44]. This guide adopts a multi-dimensional validation perspective, synthesizing comparative evidence from real and simulated fNIRS data to provide researchers with a structured framework for evaluating and selecting motion artifact removal techniques based on their specific experimental needs.

Performance Comparison of Motion Artifact Correction Techniques

The following table summarizes the quantitative performance and key characteristics of prevalent motion correction techniques, synthesizing findings from multiple comparative studies.

Table 1: Comprehensive Comparison of fNIRS Motion Artifact Correction Techniques

| Technique | Reported Efficacy (Key Findings) | Compatible Signal Types | Online/Real-time Capability | Primary Limitations |
| --- | --- | --- | --- | --- |
| Wavelet Filtering | Superior performance in real cognitive data (93% artifact reduction) [3] [38]; best for pediatric data alongside Moving Average [2]; top performer for functional connectivity analysis [72] | Optical intensity, optical density, concentration changes [72] | Offline | Performance relies on appropriate threshold tuning [72] |
| Temporal Derivative Distribution Repair (TDDR) | Top performer for functional connectivity and network topology analysis; superior denoising and recovery of the original FC pattern [72] | Concentration changes (HbO, HbR) | Online [72] | Assumes non-motion fluctuations are normally distributed [72] |
| Moving Average (MA) | Yields best outcomes for pediatric fNIRS data, alongside wavelet filtering [2] | Optical density, concentration changes [2] | Offline | May smooth out rapid, genuine physiological signals [2] |
| Spline Interpolation (MARA) | Effective for spike artifacts; performance varies with artifact type [3] [72] | Optical density, concentration changes [72] | Offline | Requires precise artifact detection and level correction; complex for real-time use [72] |
| Correlation-Based Signal Improvement (CBSI) | Effective for co-varying HbO/HbR artifacts; can underperform other methods for functional connectivity [3] [72] | Concentration changes (HbO, HbR) [72] | Offline | Relies on a strict negative correlation between HbO and HbR, which may not always hold [72] |
| Principal Component Analysis (PCA) | Variable efficacy; can be outperformed by wavelet filtering and TDDR in functional connectivity analysis [3] [72] | Optical density, concentration changes [72] | Offline | Risk of removing physiological signals of interest with high-variance components [72] |
| Kalman Filtering | Lower agreement with literature-backed hypotheses in a multi-analysis study; can be outperformed by other methods [3] [39] | Optical density, concentration changes [72] | Online [72] | Requires historical data for the autoregressive model; covariance estimation is critical [72] |
| Accelerometer-Based Methods (ANC, ABAMAR) | Effective real-time rejection when auxiliary hardware is used [6] | Optical density, concentration changes [6] | Online [6] | Requires additional hardware, complicating participant setup [6] [2] |
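For readers who want to trial the online-capable TDDR method listed above, a minimal sketch of a typical preprocessing chain is shown below, assuming MNE-Python's NIRS utilities are available; the recording path is a placeholder, and wavelet or spline correction would instead be applied through packages such as Homer2/Homer3.

```python
import mne
from mne.preprocessing.nirs import (optical_density, beer_lambert_law,
                                    temporal_derivative_distribution_repair)

# Placeholder path to a NIRx recording; replace with an actual directory
raw_intensity = mne.io.read_raw_nirx("path/to/nirx_recording", preload=True)

raw_od = optical_density(raw_intensity)                          # raw intensity -> optical density
raw_od_clean = temporal_derivative_distribution_repair(raw_od)   # TDDR motion correction
raw_haemo = beer_lambert_law(raw_od_clean)                       # optical density -> HbO/HbR
```

The ordering shown (optical density, then motion correction, then the Beer-Lambert conversion) mirrors common fNIRS preprocessing chains.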

Experimental Protocols for Technique Validation

A multi-dimensional validation strategy requires assessing techniques under controlled conditions with known ground truths. Below are detailed methodologies from key comparative studies that serve as robust experimental prototypes.

Protocol 1: Validation with Real Cognitive Data and Physiological Plausibility

This protocol, as employed by Brigadoi et al., is crucial for testing algorithms on realistic, challenging artifacts that are correlated with the hemodynamic response [3] [38].

  • Objective: To compare the performance of motion correction techniques (PCA, spline, wavelet, Kalman, CBSI) on real fNIRS data containing task-related, low-frequency motion artifacts.
  • fNIRS Data Acquisition: Data was acquired from participants during a color-naming task requiring vocal responses. The jaw movement from speaking produced a low-frequency, low-amplitude motion artifact temporally correlated with the expected hemodynamic response. A multi-channel, frequency-domain system (ISS Imagent) with 32 laser diodes (690 nm and 830 nm) was used at a sampling frequency of 7.8 Hz [3].
  • Data Analysis: The performance of each correction technique was evaluated not against a simulated signal, but using objective metrics related to physiological plausibility of the recovered hemodynamic response. These metrics were derived to determine which method produced a response that most closely resembled a canonical, plausible brain activation [3] [38].
  • Key Outcome: The study concluded that wavelet filtering was the most effective technique, reducing the area under the curve of the artifact in 93% of cases. It also established that correcting motion artifacts is almost always preferable to outright trial rejection, which is often not feasible in studies with limited trials [3] [38].
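One way to quantify this kind of artifact reduction is to compare the area of the block-averaged response within an early post-stimulus window before and after correction. The sketch below illustrates that general idea with an AUC-style ratio; the window, sampling rate, and synthetic block averages are assumptions, not the exact metric set of [3] [38].

```python
import numpy as np

def auc_reduction(uncorrected_avg, corrected_avg, fs, window_s=(0.0, 2.0)):
    """Ratio of |signal| area in an early post-stimulus window after vs. before
    correction; values < 1 indicate the artifact's contribution was reduced.
    Inputs are block-averaged responses aligned to stimulus onset."""
    i0, i1 = (int(w * fs) for w in window_s)
    area = lambda x: np.sum(np.abs(x[i0:i1])) / fs
    return area(corrected_avg) / area(uncorrected_avg)

# Illustrative usage with synthetic block averages (fs = 7.8 Hz)
fs = 7.8
t = np.arange(0, 20, 1 / fs)
uncorrected = 2.0 * np.exp(-t) + 0.5 * np.sin(2 * np.pi * 0.05 * t)  # artifact plus response
corrected = 0.2 * np.exp(-t) + 0.5 * np.sin(2 * np.pi * 0.05 * t)
print(f"AUC ratio (corrected/uncorrected): {auc_reduction(uncorrected, corrected, fs):.2f}")
```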

Protocol 2: Evaluating Impact on Functional Connectivity and Network Analysis

With the growth of fNIRS in network neuroscience, this protocol evaluates how motion correction alters functional connectivity (FC) metrics, a critical consideration for developmental and clinical research [72].

  • Objective: To systematically evaluate how various MA correction methods (PCA, spline, CBSI, Kalman, wavelet, TDDR) affect subsequent brain functional connectivity and topological properties.
  • Data Preparation: The study used both simulated resting-state fNIRS data with real motion artifacts added and experimental datasets. The simulated data allowed for sensitivity-specificity analysis against a known ground-truth network structure [72].
  • Analysis Metrics: The efficacy of each algorithm was quantified using:
    • Receiver Operating Characteristic (ROC) curves to assess accuracy in recovering true functional connections (a minimal sketch follows this protocol).
    • Graph theory metrics to evaluate the preservation of network topology.
    • Mean functional connectivity within pre-defined brain networks [72].
  • Key Outcome: TDDR and wavelet filtering were identified as the most effective methods for FC analysis, demonstrating superior denoising, the best ROC performance, and an enhanced ability to recover the original FC pattern without introducing significant distortion [72].
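A minimal sketch of the sensitivity-specificity analysis referenced above: compute a channel-by-channel Pearson correlation (FC) matrix from the corrected time series and score its off-diagonal entries against the known ground-truth network with an ROC-AUC. The two-module ground truth and synthetic time series below are placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def fc_matrix(timeseries):
    """Pearson correlation FC matrix from (n_samples, n_channels) data."""
    return np.corrcoef(timeseries, rowvar=False)

def fc_recovery_auc(corrected_ts, true_adjacency):
    """ROC-AUC of recovering ground-truth connections from the corrected data."""
    fc = np.abs(fc_matrix(corrected_ts))
    iu = np.triu_indices_from(fc, k=1)          # upper triangle, excluding the diagonal
    return roc_auc_score(true_adjacency[iu], fc[iu])

# Synthetic example: 10 channels, two 5-channel modules sharing common signals
rng = np.random.default_rng(7)
n = 3000
base = rng.normal(size=(n, 2))
ts = np.hstack([base[:, [0]] + 0.7 * rng.normal(size=(n, 5)),
                base[:, [1]] + 0.7 * rng.normal(size=(n, 5))])
truth = np.zeros((10, 10), int)
truth[:5, :5] = 1
truth[5:, 5:] = 1
np.fill_diagonal(truth, 0)
print(f"FC recovery AUC: {fc_recovery_auc(ts, truth):.2f}")
```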

Protocol 3: Computer Vision-Augmented Ground Truth for Movement

This novel protocol uses computer vision to obtain precise, ground-truth movement data, enabling a direct characterization of the relationship between specific head movements and resultant artifacts [5].

  • Objective: To characterize the association between specific head movements and motion artifacts using ground-truth movement information.
  • Procedure: Participants performed controlled head movements (e.g., rotations along vertical, frontal, and sagittal axes) at different speeds while undergoing whole-head fNIRS imaging. Experimental sessions were video-recorded [5].
  • Movement Quantification: The video data was analyzed frame-by-frame using the SynergyNet deep neural network to compute precise head orientation angles. Movement metrics like maximal amplitude and speed were extracted from this data [5].
  • Artifact Correlation: Spikes and baseline shifts in the fNIRS signals were then directly correlated with the quantified head movements, identifying, for instance, that occipital regions are particularly susceptible to upwards or downwards movements [5]. This provides a validated benchmark for testing correction algorithms.
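The correlation step itself is straightforward: given per-trial movement metrics (for example, peak angular speed from the video analysis) and per-trial artifact magnitudes (for example, peak absolute change in optical density), one can quantify their association directly. The values below are illustrative placeholders, not outputs of SynergyNet or any dataset.

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder per-trial metrics: peak angular speed (deg/s) from video analysis
# and peak |delta OD| of an fNIRS channel during the same trial.
peak_angular_speed = np.array([12, 35, 50, 8, 90, 60, 22, 75, 40, 15], float)
peak_delta_od = np.array([0.02, 0.06, 0.09, 0.01, 0.18, 0.11, 0.04, 0.15, 0.07, 0.02])

r, p = pearsonr(peak_angular_speed, peak_delta_od)
print(f"movement-artifact correlation: r = {r:.2f}, p = {p:.3f}")
```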

Experimental Workflows and Decision Pathways

The following diagram illustrates the multi-dimensional decision pathway for selecting and validating a motion artifact correction technique, integrating the core findings from the comparative studies.

Success in fNIRS motion artifact management depends on both software tools and hardware components. The following table details key resources referenced in the comparative studies.

Table 2: Essential Reagents and Resources for fNIRS Motion Artifact Research

| Tool/Resource | Type | Primary Function in MA Management | Example Implementation |
| --- | --- | --- | --- |
| Homer2 Software Package | Software toolbox | Standard fNIRS processing package used for implementing and comparing motion artifact correction algorithms (e.g., spline interpolation, wavelet) [2] | Used in pediatric studies to process optical density data and identify motion artifacts [2] |
| Accelerometer/Inertial Measurement Unit (IMU) | Hardware | Provides an auxiliary measure of physical motion to inform artifact correction algorithms, enabling real-time rejection [6] | Core component in methods like Active Noise Cancellation (ANC) and ABAMAR [6] |
| Linearly Polarized Light & Analyzer | Hardware/optical setup | Optically eliminates hair-reflected and short-circuited light from detection, reducing one source of motion-sensitive noise [6] [25] | Improves signal quality at the acquisition stage by ensuring only light that has traveled through tissue is detected [25] |
| SynergyNet Deep Neural Network | Software (computer vision) | Provides ground-truth head movement data by computing head orientation angles from video recordings of experimental sessions [5] | Used to identify the specific head movements that cause artifacts, creating a benchmark for algorithm development [5] |
| Short-Separation Detector | Hardware/probe geometry | Measures physiological noise from the scalp and superficial layers, which can be regressed out of cerebral channels (e.g., via ICA) [2] [3] | A detector placed ~0.8 cm from a source to separate superficial from cerebral signals |
| Custom-Made Caps with Foam & Optode Holders | Hardware/probe support | Secures optodes in place and minimizes movement relative to the scalp, serving as a primary physical defense against motion artifacts [2] | Used in pediatric studies to improve signal quality and consistency across participants with different head sizes [2] |

The move beyond single metrics to a multi-dimensional validation framework is essential for advancing fNIRS research. Evidence consistently shows that technique performance is context-dependent: wavelet filtering excels in task-activation studies, TDDR and wavelet are superior for functional connectivity, and moving average methods remain highly effective for particularly challenging data like that from pediatric cohorts [2] [3] [72]. The choice of algorithm is a consequential decision that interacts with experimental design, population, and analytical goals.

Future work must address unresolved challenges, including the need for standardized, open-source benchmarking datasets and the development of transparent, automated reporting standards for preprocessing steps [44] [39]. By adopting a comprehensive, multi-faceted approach to technique validation—one that considers noise suppression, signal fidelity, computational load, and impact on high-level analysis—researchers can enhance the reliability, reproducibility, and clinical utility of fNIRS neuroimaging.

Conclusion

Effective evaluation of motion artifact removal is paramount for ensuring the validity of fNIRS findings. This review synthesizes that a multi-metric approach, balancing noise suppression with signal fidelity, is essential. While techniques like wavelet filtering and hybrid methods often show superior performance, the optimal choice is context-dependent. Future work must address critical gaps, including standardized benchmarking protocols, the balance between hardware and algorithmic solutions, and a deeper investigation into the robustness, stability, and real-time filtering delays of these methods. Advancing these areas will solidify fNIRS as a reliable tool for both fundamental neuroscience and clinical drug development.

References