Motion Artifact in fMRI: Distinguishing Overestimation from Underestimation in Trait-FC Effects

Allison Howard, Dec 02, 2025

Abstract

In-scanner head motion is a pervasive source of artifact in resting-state functional MRI that can systematically bias trait-functional connectivity (trait-FC) associations, leading to both spurious discoveries and masked true effects. This article synthesizes current research to explore a critical distinction: how residual motion can cause either overestimation or underestimation of true trait-FC relationships. We examine the mechanisms behind these directional biases, with a focus on populations where motion is correlated with the trait of interest, such as in ADHD and autism. The content covers methodological advances for detecting and quantifying motion impact, including the novel SHAMAN framework, and evaluates the efficacy of denoising and censoring strategies. For researchers and clinical trial professionals, we provide actionable guidance on optimizing preprocessing pipelines to mitigate these biases, thereby enhancing the validity of brain-wide association studies and accelerating the development of robust neuroimaging biomarkers.

The Dual Threat: How Motion Artifact Biases Trait-FC Findings

Defining Motion Overestimation and Underestimation in Trait-FC Contexts

In-scanner head motion represents the most substantial source of systematic bias in resting-state functional magnetic resonance imaging (fMRI) studies, introducing artifacts that persist even after extensive denoising [1]. This challenge is particularly acute for researchers investigating traits inherently associated with greater movement, such as psychiatric disorders, where failure to account for residual motion artifacts can lead to both false positive and false negative findings [1]. Distinguishing motion-induced overestimation from underestimation of trait-functional connectivity (trait-FC) relationships is therefore a central methodological problem in neuroimaging. Without precise tools to quantify these directional biases, investigators risk reporting spurious brain-behavior associations or missing genuine neurobiological relationships obscured by motion-related artifacts [1].

The development of the Split Half Analysis of Motion Associated Networks (SHAMAN) framework provides researchers with a standardized approach to assign motion impact scores to specific trait-FC relationships, distinguishing between effects where motion causes inflation (overestimation) versus suppression (underestimation) of true effect sizes [1]. This comparative guide evaluates SHAMAN against existing motion correction approaches using data from large-scale studies including the Adolescent Brain Cognitive Development (ABCD) Study, providing researchers and drug development professionals with evidence-based recommendations for optimizing trait-FC study designs and analytical pipelines [1].

Motion Impact Mechanisms: Defining Overestimation and Underestimation

Conceptual Framework

Within trait-FC research, motion overestimation occurs when head motion artifact systematically inflates or exaggerates the apparent relationship between a trait and functional connectivity, creating false positive findings [1]. Conversely, motion underestimation arises when motion artifact suppresses or obscures genuine trait-FC relationships, leading to false negative conclusions [1]. The directional nature of these motion impacts stems from the complex interaction between motion-correlated traits and the systematic biases motion introduces to FC metrics [1].
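The two directions of bias can be illustrated with a small simulation. The sketch below is not drawn from the cited study: the generative model, the coefficients (0.6, 0.5, 0.3), and the function name `observed_trait_fc` are all assumptions chosen for illustration. It shows how a trait that is correlated with motion can appear related to FC when no true effect exists, and how a genuine positive effect can be cancelled by a motion-induced decrease in FC.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def observed_trait_fc(true_effect, n=5000, seed=0):
    """Observed trait-FC correlation for one toy connection.

    Assumed model (illustrative only):
      trait = 0.6 * motion + noise          (trait correlates with motion)
      fc    = true_effect * trait - 0.5 * motion + noise
    """
    rng = random.Random(seed)
    motion = [rng.gauss(0, 1) for _ in range(n)]
    trait = [0.6 * m + 0.8 * rng.gauss(0, 1) for m in motion]
    fc = [true_effect * t - 0.5 * m + 0.5 * rng.gauss(0, 1)
          for t, m in zip(trait, motion)]
    return pearson(trait, fc)


# No true effect: motion alone produces an apparent (spurious) association.
spurious = observed_trait_fc(true_effect=0.0)
# True positive effect: the motion-induced decrease cancels it (masking).
masked = observed_trait_fc(true_effect=0.3)

print(f"true effect 0.0 -> observed r = {spurious:+.2f}")
print(f"true effect 0.3 -> observed r = {masked:+.2f}")
```

In this toy model the masking is exact only because the coefficients were picked so that the motion term offsets the true effect; real data would show partial inflation or suppression.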

Motion Artifact Effects on Functional Connectivity

Head motion systematically alters resting-state FC patterns in spatially consistent ways, primarily characterized by decreased long-distance connectivity and increased short-range connectivity, with particularly pronounced effects within the default mode network [1]. These motion-induced FC changes create a fundamental confound in trait studies because they correlate with many behavioral and clinical traits of interest [1]. The resulting motion-FC effect matrix demonstrates a strong negative correlation (Spearman ρ = -0.58) with the average FC matrix, meaning participants who move more consistently show weaker connection strengths across brain networks compared to those who move less [1].

Table 1: Characteristics of Motion Artifact Effects on Functional Connectivity

Effect Type	Primary Manifestation	Network Impact	Correlation with Average FC
Overall Motion Effect	Decreased connection strength in high-motion participants	Reduced long-distance connectivity	Strong negative correlation (ρ = -0.58)
Spatial Pattern	Distance-dependent correlations	Increased short-range connectivity	Persistent after censoring (ρ = -0.51)
Network Specificity	Default mode network vulnerability	Systematic spatial bias	Consistent across denoising approaches
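One way to obtain a summary statistic of the kind reported above is to estimate a motion-FC effect for each connection and rank-correlate it with the average FC across connections. The following is a minimal sketch under stated assumptions: the motion-FC effect is taken to be the across-participant regression slope of FC on mean FD, the four participants and three connections are invented toy values, and the helper names are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho computed as the Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


# Toy data: rows are participants, columns are connections (edges).
mean_fd = [0.1, 0.2, 0.3, 0.4]
fc = [
    [0.80, 0.40, 0.10],
    [0.70, 0.35, 0.10],
    [0.60, 0.30, 0.10],
    [0.50, 0.25, 0.10],
]

n_edges = len(fc[0])
motion_fc_effect = [slope(mean_fd, [row[e] for row in fc]) for e in range(n_edges)]
average_fc = [sum(row[e] for row in fc) / len(fc) for e in range(n_edges)]

rho = spearman(motion_fc_effect, average_fc)
print(f"Spearman rho between motion-FC effect and average FC: {rho:.2f}")
```

A negative rho here means that connections with stronger average FC lose the most strength as motion increases, which is the pattern the table describes.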

The SHAMAN Framework: Methodology and Experimental Protocol

Core Principles and Workflow

The Split Half Analysis of Motion Associated Networks (SHAMAN) capitalizes on a fundamental physiological observation: traits (e.g., cognitive abilities, clinical symptoms) remain stable over the timescale of an MRI scan, while head motion represents a state that varies second-to-second [1]. This temporal dissociation enables SHAMAN to quantify motion impact by measuring differences in correlation structure between split high-motion and low-motion halves of each participant's fMRI timeseries [1].

The SHAMAN workflow implements the following methodological sequence:

Timeseries Segmentation: For each participant, the resting-state fMRI timeseries is divided into high-motion and low-motion halves based on framewise displacement (FD) metrics.
Connectivity Calculation: Separate FC matrices are computed for the high-motion and low-motion segments.
Trait-FC Effect Comparison: The relationship between trait measures and FC is quantified separately for each motion segment.
Impact Score Computation: Motion impact scores are derived from systematic differences between trait-FC effects in high-motion versus low-motion segments.
Directional Classification: Significant impact scores aligned with the trait-FC effect direction indicate motion overestimation; scores opposite the trait-FC effect indicate motion underestimation.
Statistical Validation: Permutation testing and non-parametric combining across pairwise connections generate significance values for motion impact scores.
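The first two steps of this sequence (segmentation and connectivity calculation) can be sketched for a single participant and a single pair of regions. This is a simplified illustration, not the published implementation: it assumes a median split on FD, uses Pearson correlation as the FC measure, and the function name `split_half_fc` and the toy timeseries are hypothetical.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half_fc(roi_a, roi_b, fd):
    """Return (fc_low, fc_high) for the low- and high-motion halves.

    Frames are ranked by framewise displacement; the half with the
    smallest FD forms the low-motion segment, the rest the high-motion one.
    """
    order = sorted(range(len(fd)), key=lambda t: fd[t])
    half = len(order) // 2
    low = sorted(order[:half])    # restore temporal order within each half
    high = sorted(order[half:])
    fc_low = pearson([roi_a[t] for t in low], [roi_b[t] for t in low])
    fc_high = pearson([roi_a[t] for t in high], [roi_b[t] for t in high])
    return fc_low, fc_high


# Toy timeseries for two regions (8 frames) with per-frame FD in mm.
fd = [0.05, 0.30, 0.04, 0.25, 0.06, 0.40, 0.03, 0.35]
roi_a = [1.0, 2.0, 2.0, 1.0, 3.0, 3.0, 4.0, 0.0]
roi_b = [2.0, 0.0, 4.0, 3.0, 6.0, 1.0, 8.0, 2.0]

fc_low, fc_high = split_half_fc(roi_a, roi_b, fd)
print(f"FC (low-motion half):  {fc_low:+.2f}")
print(f"FC (high-motion half): {fc_high:+.2f}")
print(f"within-participant difference: {fc_high - fc_low:+.2f}")
```

Because the trait is constant within a participant, any systematic relation between the trait and this high-minus-low difference across participants would point to motion rather than to the trait itself.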

Diagram 1: SHAMAN Motion Impact Assessment Workflow

Key Methodological Advantages

SHAMAN provides several critical advantages over previous motion quantification approaches. Unlike methods that require repeated resting-state scans on different days, SHAMAN operates effectively with single scanning sessions [1]. The framework incorporates adaptability for covariate modeling and generates directionally specific impact scores that distinguish between overestimation and underestimation effects [1]. Additionally, SHAMAN establishes statistical thresholds for acceptable versus unacceptable levels of trait-specific motion impact, moving beyond simple motion-FC correlation measures to provide actionable guidance for specific trait-FC relationships under investigation [1].

Comparative Performance: SHAMAN Versus Alternative Motion Correction Approaches

Experimental Design and Dataset

The comparative performance evaluation of motion correction methodologies utilized data from the Adolescent Brain Cognitive Development (ABCD) Study, comprising 11,874 children ages 9-10 years with extensive resting-state fMRI and behavioral data [1]. For primary analyses, n = 7,270 participants with sufficient data quality were included [1]. Researchers assessed 45 diverse traits spanning demographic, biophysical, and behavioral domains to evaluate the prevalence of motion overestimation and underestimation across different motion correction approaches [1].

The experimental protocol compared SHAMAN-derived motion impact scores across three processing conditions:

Standard Denoising Only: Application of ABCD-BIDS preprocessing (global signal regression, respiratory filtering, spectral filtering, despiking, motion parameter regression) without motion censoring.
Moderate Censoring: Standard denoising plus censoring (removal) of frames whose framewise displacement (FD) exceeds the threshold, retaining only frames with FD < 0.2 mm.
Stringent Censoring: Standard denoising plus more aggressive frame removal, retaining only frames with FD < 0.1 mm.
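To make the censoring conditions concrete, the sketch below computes framewise displacement from six rigid-body realignment parameters and builds a keep/discard mask at a given threshold. It assumes the widely used Power et al. convention (sum of absolute frame-to-frame differences, with rotations in radians converted to millimetres on a 50 mm sphere); the source does not specify which FD variant was used, and the function names and toy parameters are illustrative.

```python
def framewise_displacement(params, radius_mm=50.0):
    """FD per frame from realignment parameters.

    params: one [tx, ty, tz, rx, ry, rz] list per frame, with translations
    in mm and rotations in radians (assumed units). The first frame has no
    predecessor, so its FD is set to 0.
    """
    fd = [0.0]
    for prev, cur in zip(params, params[1:]):
        translation = sum(abs(c - p) for c, p in zip(cur[:3], prev[:3]))
        rotation = sum(abs(c - p) for c, p in zip(cur[3:], prev[3:]))
        fd.append(translation + radius_mm * rotation)
    return fd


def keep_mask(fd, threshold_mm=0.2):
    """True for frames retained after censoring (FD below the threshold)."""
    return [value < threshold_mm for value in fd]


# Toy realignment parameters for four frames.
params = [
    [0.00, 0.00, 0.0, 0.000, 0.0, 0.0],
    [0.15, 0.00, 0.0, 0.000, 0.0, 0.0],
    [0.15, 0.10, 0.0, 0.004, 0.0, 0.0],
    [0.15, 0.10, 0.0, 0.004, 0.0, 0.0],
]

fd = framewise_displacement(params)
moderate = keep_mask(fd, threshold_mm=0.2)
stringent = keep_mask(fd, threshold_mm=0.1)

print("FD (mm):", [round(v, 3) for v in fd])
print("kept at FD < 0.2 mm:", moderate)
print("kept at FD < 0.1 mm:", stringent)
```

The stricter threshold discards more frames, which is the trade-off between artifact removal and retained data that the three conditions are designed to probe.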

Table 2: Motion Impact Prevalence Across Processing Pipelines (n=45 Traits)

Processing Pipeline	Traits with Significant Overestimation	Traits with Significant Underestimation	Total Traits with Motion Impact
Standard Denoising Only	42% (19/45)	38% (17/45)	80% (36/45)
FD < 0.2 mm Censoring	2% (1/45)	38% (17/45)	40% (18/45)
FD < 0.1 mm Censoring	Supplementary analysis required	Supplementary analysis required	Supplementary analysis required

Performance Findings

The comparative analysis revealed crucial differential effects of motion correction strategies on overestimation versus underestimation. After standard denoising without motion censoring, motion significantly impacted the majority (80%) of trait-FC relationships, with nearly equal distribution between overestimation (42%) and underestimation (38%) effects [1].

Implementing motion censoring at FD < 0.2 mm reduced significant overestimation to just 2% of traits (1/45), indicating that censoring is highly effective against false positive inflation [1]. However, the same censoring threshold produced no reduction in significant underestimation, which remained at 38% of traits (17/45) [1]. This asymmetric effect highlights a critical limitation of conventional motion censoring: while it effectively controls false positives, it does not address motion-induced suppression of genuine trait-FC relationships, leaving the risk of false negatives in studies of motion-correlated traits unchanged.
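The percentages quoted in this section follow directly from the trait counts. As a quick check, the sketch below recomputes them from the counts reported above, assuming each trait is counted in at most one category so that the total is the sum of the two; the dictionary layout and function name are illustrative.

```python
N_TRAITS = 45

# Counts of traits with significant motion impact, as reported in the text.
pipelines = {
    "standard denoising only": {"over": 19, "under": 17},
    "FD < 0.2 mm censoring": {"over": 1, "under": 17},
}


def percent(count, total=N_TRAITS):
    """Share of traits as a whole-number percentage."""
    return round(100 * count / total)


summary = {}
for name, counts in pipelines.items():
    summary[name] = {
        "over_pct": percent(counts["over"]),
        "under_pct": percent(counts["under"]),
        "total_pct": percent(counts["over"] + counts["under"]),
    }
    print(name, summary[name])
```

Censoring halves the total share of affected traits, but the whole reduction comes from the overestimation column.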

Comparison with Alternative Methods

SHAMAN addresses several limitations present in previous motion impact assessment approaches. Unlike distance-dependent correlation analysis, SHAMAN provides trait-specific impact scores rather than global motion-FC similarity metrics [1]. Compared to matched-group designs, SHAMAN operates within participants, eliminating between-group matching challenges [1]. Relative to Siegel et al.'s original method, SHAMAN eliminates the requirement for repeated scanning sessions and incorporates directional impact classification [1].

Diagram 2: Motion Impact Pathways and Intervention Effects

Table 3: Essential Research Reagents and Computational Tools

Tool/Resource	Primary Function	Application Context	Key Features
SHAMAN Framework	Quantifies trait-specific motion impact	Post-processing analysis	Directional impact scores (over/underestimation), Single-session application
ABCD-BIDS Pipeline	Comprehensive denoising	fMRI preprocessing	Global signal regression, Respiratory filtering, Motion parameter regression
Framewise Displacement (FD)	Quantifies head motion between volumes	Motion quantification	Composite metric (rotation + translation), Censoring threshold specification
Motion Censoring	Removes high-motion frames	Data quality control	Reduces overestimation bias, Framewise exclusion
ABCD Dataset	Large-scale developmental neuroimaging	Data source	n=11,874 participants, 45+ traits, Multi-site standardization

The distinction between motion overestimation and underestimation represents a fundamental advancement in understanding how head motion artifacts influence trait-FC research. The SHAMAN framework provides researchers with a critical tool for quantifying these directional impacts, addressing a significant limitation of conventional motion correction approaches that treat all motion effects as uniform [1].

The empirical evidence demonstrates that standard denoising alone fails to eliminate motion impact for most traits (80%), with nearly equal distribution between overestimation and underestimation effects [1]. While motion censoring at conventional thresholds (FD < 0.2 mm) effectively controls false positives by reducing overestimation to minimal levels (2%), it leaves the proportion of traits with significant underestimation unchanged (38%) and so does not address underestimation bias [1]. This finding has profound implications for trait-FC study design, particularly for investigations of clinical populations with inherently higher motion, where underestimation may systematically suppress detection of genuine neurobiological relationships.

For researchers and drug development professionals, these findings recommend a tiered analytical approach: (1) implementation of rigorous denoising pipelines, (2) application of appropriate motion censoring to control false positives, and (3) routine application of SHAMAN or similar frameworks to quantify residual underestimation bias in traits of interest. This comprehensive strategy moves the field beyond binary motion "correction" toward precise characterization of how motion impacts specific trait-FC relationships, enabling more accurate interpretation of neurobehavioral associations and more targeted therapeutic development.

A fundamental challenge in neuroscience is the contamination of neural signals by motion artifacts. These artifacts systematically bias research findings by either mimicking genuine brain activity, leading to false positive results (overestimation), or obscuring true neural signals, leading to false negative results (underestimation). This guide examines the sources and impacts of these artifacts across major neuroimaging and neuromodulation techniques, comparing the performance of different correction strategies.

Motion artifacts introduce systematic errors that can compromise the validity of neuroimaging and neuromodulation studies. In functional magnetic resonance imaging (fMRI), even small, involuntary head movements on the order of millimeters can cause substantial signal changes, altering measured functional connectivity between brain regions. Similarly, in techniques like electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), motion can generate electrical signals or hemodynamic responses that closely resemble neurobiological activity. The central problem is twofold: motion can create spurious signals that mimic true neural effects, or it can mask and suppress genuine neural signals, with overestimation and underestimation biases presenting distinct methodological challenges.

The severity of this problem is particularly pronounced when studying clinical populations or traits naturally associated with increased movement, such as psychiatric disorders, neurodevelopmental conditions, or pediatric populations. Understanding the mechanisms, biases, and state-of-the-art correction methods is therefore essential for generating reliable neuroscientific findings and advancing drug development research.

Experimental Evidence of Mimicry and Masking

Evidence from Transcranial Focused Ultrasound

Research on low-intensity transcranial focused ultrasound stimulation (tFUS) has revealed that mechanical artifacts can precisely mimic biologically evoked responses. In experimental models, researchers observed a strong, stereotyped local field potential response time-locked to sonication onset and offset that closely resembled sensory-evoked potentials.

Table 1: Motion Artifact Characteristics in Transcranial Focused Ultrasound

Experimental Condition	Artifact Characteristics	Neural Mimicry Evidence	Key Findings
Anesthetized Rat Hippocampus	Stereotyped LFP response time-locked to sonication	Resembled sensory-evoked potentials	Same waveform persisted in euthanized animals, confirming non-biological origin
Silicon Microelectrodes	Artifact scaled with acoustic intensity	Most pronounced under continuous-wave sonication	Electrode movement generated artifactual LFP signals

Critically, the same waveform was observed in euthanized animals, confirming a non-biological, artifact-driven origin [2]. These findings underscore the risk of misinterpreting motion-related artifacts as genuine neural responses, particularly in techniques involving mechanical energy transmission.

Evidence from Functional MRI

In-scanner head motion introduces systematic bias to resting-state fMRI functional connectivity that isn't completely removed by standard denoising algorithms. Research using the Split Half Analysis of Motion Associated Networks (SHAMAN) method has quantified how motion impacts specific trait-FC relationships.

Table 2: Motion Impact on Brain-Behavior Associations in fMRI (n=7,270 participants)

Analysis Condition	Traits with Significant Overestimation	Traits with Significant Underestimation	Effect on Trait-FC Inference
After Standard Denoising (No Censoring)	42% (19/45 traits)	38% (17/45 traits)	Widespread inflation and suppression of effects
After Censoring (FD < 0.2 mm)	2% (1/45 traits)	38% (17/45 traits)	Reduced overestimation but persistent underestimation

After standard denoising without motion censoring, 42% of traits showed significant motion overestimation scores while 38% showed significant underestimation scores. Censoring at framewise displacement < 0.2 mm reduced significant overestimation to just 2% of traits but did not decrease the number of traits with significant motion underestimation scores [1] [3] [4]. This demonstrates that standard motion correction approaches may reduce one type of bias while perpetuating another.

Methodological Protocols for Motion Impact Assessment

The SHAMAN Framework for fMRI

The Split Half Analysis of Motion Associated Networks (SHAMAN) methodology was developed to assign a motion impact score to specific trait-functional connectivity relationships. This approach distinguishes between motion causing overestimation or underestimation of trait-FC effects through several key steps:

Data Acquisition: Utilizing large-scale datasets like the Adolescent Brain Cognitive Development (ABCD) Study with up to 20 minutes of rs-fMRI data on 11,874 children ages 9-10 years.
Denoising Application: Implementing standard denoising algorithms (e.g., ABCD-BIDS) including global signal regression, respiratory filtering, spectral filtering, despiking, and motion parameter timeseries regression.
Split-Half Analysis: Capitalizing on the observation that traits are stable over the timescale of an MRI scan while motion varies second-to-second. The method measures differences in correlation structure between split high- and low-motion halves of each participant's fMRI timeseries.
Impact Scoring: A direction (positive or negative) of the motion impact score aligned with the trait-FC effect indicates motion causing overestimation, while a score opposite the trait-FC effect indicates motion causing underestimation.
Statistical Validation: Permutation of the timeseries and non-parametric combining across pairwise connections yields a motion impact score with p-value distinguishing significant from non-significant motion impacts [1].
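The statistical validation step can be approximated with a simple permutation scheme. The sketch below is a simplified stand-in rather than the published procedure: instead of permuting the timeseries, it randomly swaps the high- and low-motion labels within each participant (a sign flip of the per-participant FC difference), handles one connection only, and omits non-parametric combining across connections. The score definition, function names, and toy data are assumptions.

```python
import random


def impact_score(trait, diff):
    """Covariance between the trait and the high-minus-low FC difference."""
    n = len(trait)
    mt, md = sum(trait) / n, sum(diff) / n
    return sum((t - mt) * (d - md) for t, d in zip(trait, diff)) / (n - 1)


def permutation_p_value(trait, diff, n_perm=2000, seed=0):
    """Two-sided p-value from within-participant label swaps."""
    rng = random.Random(seed)
    observed = impact_score(trait, diff)
    exceed = 0
    for _ in range(n_perm):
        # Swapping the high/low labels for a participant negates the difference.
        flipped = [d if rng.random() < 0.5 else -d for d in diff]
        if abs(impact_score(trait, flipped)) >= abs(observed):
            exceed += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy data: 12 participants whose FC difference grows with the trait.
trait = [float(i) for i in range(12)]
diff = [0.01 * (t - 5.5) for t in trait]

observed, p_value = permutation_p_value(trait, diff)
print(f"impact score = {observed:.4f}, permutation p = {p_value:.4f}")
```

The add-one correction is a common convention for permutation tests; whether the original method uses it is not stated in the source.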

Motion-Net for EEG Artifact Removal

For mobile EEG applications, the Motion-Net deep learning framework provides a subject-specific approach to motion artifact removal:

Experimental Design: Subjects perform controlled movements while EEG and accelerometer data are synchronously recorded.
Data Preprocessing: Cutting data according to experiment triggers, resampling, synchronization testing by comparing motion artifact amplitude peak locations in EEG and accelerometer signals, and baseline correction.
Feature Extraction: Incorporating visibility graph features that provide structural information about EEG signals, enhancing model performance with smaller datasets.
Model Architecture: Implementing a U-Net-based convolutional neural network trained separately for each subject using real EEG recordings with ground-truth references.
Performance Validation: Achieving an average motion artifact reduction of 86 ± 4.13%, an SNR improvement of 20 ± 4.47 dB, and a mean absolute error of 0.20 ± 0.16 across experimental setups [5].
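Two of the validation metrics named above have standard definitions that are easy to state in code. The sketch below assumes a ground-truth clean reference is available and treats the residual (signal minus reference) as noise; the artifact reduction percentage is left out because its exact definition is not given here. The signals are invented toy values and the function names are illustrative.

```python
import math


def snr_db(reference, signal):
    """SNR in dB, treating (signal - reference) as noise."""
    n = len(reference)
    signal_power = sum(r * r for r in reference) / n
    noise_power = sum((s - r) ** 2 for s, r in zip(signal, reference)) / n
    return 10 * math.log10(signal_power / noise_power)


def mean_absolute_error(reference, estimate):
    """Average absolute deviation of the estimate from the reference."""
    return sum(abs(e - r) for e, r in zip(estimate, reference)) / len(reference)


# Toy example: clean reference, motion-contaminated input, denoised output.
clean = [1.0, -1.0, 1.0, -1.0]
contaminated = [2.0, 0.0, 2.0, 0.0]
denoised = [1.5, -0.5, 1.5, -0.5]

improvement = snr_db(clean, denoised) - snr_db(clean, contaminated)
mae = mean_absolute_error(clean, denoised)

print(f"SNR improvement: {improvement:.2f} dB")
print(f"MAE after denoising: {mae:.2f}")
```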

Comparative Analysis of Motion Artifact Challenges Across Techniques

Different neuroimaging modalities face distinct motion artifact challenges and require specialized correction approaches.

Table 3: Motion Artifact Profiles Across Neuroimaging Modalities

Technique	Primary Motion Artifact Mechanisms	Mimicry Risks	Masking Risks	Optimal Correction Strategies
fMRI	Head movement alters magnetic field uniformity; causes spin history effects	Spurious functional connectivity; false brain-behavior correlations	Underestimation of long-distance connectivity	SHAMAN; motion censoring (FD < 0.2 mm); combination pipelines
EEG	Electrode movement, cable motion, muscle artifacts	Mimics epileptic spikes, evoked potentials	Obscures genuine brain oscillations	Motion-Net; visibility graph features; subject-specific approaches
fNIRS	Head movement disrupts optode-scalp coupling	Creates hemodynamic-like responses	Reduces signal-to-noise ratio	Computer vision tracking; movement categorization by axis/speed
Simultaneous EEG-fMRI	Conductive path movement in static magnetic field	Creates spurious EEG-fMRI correlations	Obscures true neurovascular coupling	Reference layer artifact subtraction; motion parameter regression

Each technique shows varying susceptibility to different motion types. For example, in fNIRS, upward and downward movements particularly compromise signal quality in occipital regions, while temporal regions are most affected by lateral movements [6]. In simultaneous EEG-fMRI, head shaking produces more challenging artifacts compared to head nodding due to non-rigid body movement of the skull and skin [7].

The Scientist's Toolkit: Research Reagent Solutions

Table 4: Essential Resources for Motion Artifact Management

Research Reagent	Primary Function	Application Context	Key Advantages
SHAMAN Framework	Quantifies motion overestimation/underestimation	Resting-state fMRI	Trait-specific impact scores; distinguishes direction of bias
Motion-Net	Deep learning artifact removal	Mobile EEG	Subject-specific approach; 86% artifact reduction
Visibility Graph Features	Extracts structural signal properties	EEG preprocessing	Enhances model accuracy with smaller datasets
Computer Vision Tracking	Quantifies head movement parameters	fNIRS studies	Provides ground-truth movement data for validation
Reference Layer Systems	Measures motion artifacts directly	EEG-fMRI environments	Electrically isolated reference signals
Framewise Displacement	Quantifies head movement between volumes	fMRI quality control	Standardized metric for censoring decisions
ICA-AROMA	Identifies motion-related components	fMRI preprocessing	Automatic removal of motion artifacts

Visualization of Motion Artifact Pathways and Methodologies

Motion Artifact Generation in Neuroimaging

SHAMAN Analytical Workflow

Motion artifacts present a complex challenge in neuroscience research, with the potential to both mimic and mask genuine neural signals. The evidence demonstrates that different correction strategies carry trade-offs between reducing overestimation versus underestimation biases. While stringent motion censoring effectively reduces false positive findings, it may perpetuate false negatives and limit generalizability by systematically excluding participants with higher motion. Emerging techniques like SHAMAN for fMRI and Motion-Net for EEG represent significant advances in quantifying and addressing these biases. For researchers in neuroscience and drug development, implementing multimodal motion assessment, applying trait-specific impact analyses, and transparently reporting motion management strategies are essential for producing valid, reproducible findings.

Functional connectivity (FC) research, particularly in resting-state functional MRI (rs-fMRI), has become a cornerstone for investigating brain organization in neurodevelopmental conditions such as attention-deficit/hyperactivity disorder (ADHD) and autism spectrum disorder (ASD). However, head motion during scanning introduces systematic bias that can create spurious brain-behavior associations, potentially compromising findings in these clinically relevant populations. This methodological challenge is particularly acute when studying traits inherently correlated with motion, such as symptoms of ADHD and autism [1].

The motion-trait conundrum represents a critical methodological challenge: individuals with ADHD and autism often exhibit greater in-scanner head motion [1], meaning that the very populations of interest are those most likely to produce data contaminated by motion artifacts. Even with standard denoising algorithms, residual motion artifact persists and can systematically alter observed trait-FC relationships. Research indicates that motion artifacts can cause both overestimation and underestimation of true trait-FC effects, potentially leading to false positive and false negative results in brain-wide association studies (BWAS) [1].

Understanding and addressing this confound is essential for advancing robust, reproducible research on neurodevelopmental conditions. This guide compares current approaches for detecting and mitigating motion-related artifacts, with particular emphasis on populations at risk for motion-correlated traits.

Quantitative Comparison of Motion Artifact Impact

Prevalence of Motion Impact on Behavioral Traits

Table 1: Impact of Residual Head Motion on Trait-FC Relationships in the ABCD Study (n=7,270)

Analysis Condition	Traits with Significant Motion Overestimation	Traits with Significant Motion Underestimation	Key Findings
After standard denoising (ABCD-BIDS)	42% (19/45 traits)	38% (17/45 traits)	Majority of traits showed significant motion impact despite denoising
After censoring (FD < 0.2 mm)	2% (1/45 traits)	38% (17/45 traits)	Censoring reduced overestimation but did not address underestimation

Data derived from SHAMAN methodology application to the ABCD Study dataset, assessing 45 behavioral traits [1].

Cognitive Profiles in ADHD and Autism

Table 2: Cognitive Profiles on Wechsler Intelligence Tests (WAIS-IV/WISC-V) in Autism and ADHD

Cognitive Domain	Autism Profile	ADHD Profile	Clinical Implications
Verbal Comprehension	Typical performance (~100)	Age-expected levels	Relative strength in autism
Perceptual Reasoning	Typical performance (~100)	Age-expected levels	Relative strength in autism
Working Memory	Slightly reduced (~90)	Slightly reduced (~95)	Area of difficulty for both conditions
Processing Speed	Significantly reduced (~85)	Age-expected levels	Distinctive weakness in autism
Full-Scale IQ	Within typical range	Within typical range	Profiles not sufficient for diagnosis

Meta-analysis of over 1,800 neurodivergent participants from 18 data sources; standardized scores shown where average population performance = 100 [8].
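For readers less familiar with standardized scores, the approximate values in the table can be translated into population percentiles. The sketch below assumes the conventional Wechsler index scaling of mean 100 and standard deviation 15 and a normal distribution; the standard deviation is not stated in the source, and the function name is illustrative.

```python
import math


def standard_score_to_percentile(score, mean=100.0, sd=15.0):
    """Percentile of a standardized score under a normal distribution."""
    z = (score - mean) / sd
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Approximate scores taken from the table above.
for label, score in [("typical", 100), ("ADHD working memory", 95),
                     ("autism working memory", 90), ("autism processing speed", 85)]:
    pct = standard_score_to_percentile(score)
    print(f"{label:>24}: score {score} -> percentile {pct:.0f}")
```

Under these assumptions a score of about 85 sits one standard deviation below the population mean.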

Experimental Protocols for Motion Impact Assessment

The SHAMAN Framework: Split Half Analysis of Motion Associated Networks

The Split Half Analysis of Motion Associated Networks (SHAMAN) methodology was developed specifically to quantify trait-specific motion artifact in functional connectivity studies [1]. This approach addresses key limitations of previous methods by operating on one or more rs-fMRI scans per participant and accommodating covariates in statistical models.

Experimental Workflow:

Data Acquisition and Preprocessing: Collect resting-state fMRI data with associated framewise displacement (FD) calculations. Apply standard denoising pipelines (e.g., ABCD-BIDS processing including global signal regression, respiratory filtering, spectral filtering, despiking, and motion parameter timeseries regression) [1].
Split-Half Partitioning: For each participant, separate the fMRI timeseries into high-motion and low-motion halves based on framewise displacement values.
Trait-FC Effect Calculation: Compute correlation structures between the trait of interest and functional connectivity measures separately for each motion half.
Motion Impact Score Calculation: Compare the difference in trait-FC correlation structure between high-motion and low-motion halves. The significance of this difference is assessed through permutation testing of the timeseries and non-parametric combining across pairwise connections.
Directional Interpretation:
- A motion overestimation score is assigned when the direction of the motion impact score aligns with the direction of the trait-FC effect.
- A motion underestimation score is assigned when the motion impact score direction opposes the trait-FC effect.
Validation: Apply censoring at various FD thresholds (e.g., FD < 0.2 mm) to evaluate reduction in motion impact scores.
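The directional interpretation rule reduces to a sign comparison gated by significance. The sketch below encodes that rule for a single trait; the 0.05 significance level, the function name, and the example values are assumptions for illustration and are not taken from the source.

```python
def classify_motion_impact(trait_fc_effect, impact_score, p_value, alpha=0.05):
    """Label the direction of motion impact for one trait.

    trait_fc_effect: signed trait-FC effect estimate
    impact_score:    signed motion impact score
    p_value:         significance of the motion impact score
    alpha:           assumed significance level
    """
    if p_value >= alpha:
        return "no significant motion impact"
    same_direction = (impact_score > 0) == (trait_fc_effect > 0)
    return "overestimation" if same_direction else "underestimation"


# Hypothetical examples.
labels = [
    classify_motion_impact(trait_fc_effect=0.12, impact_score=0.04, p_value=0.01),
    classify_motion_impact(trait_fc_effect=0.12, impact_score=-0.04, p_value=0.01),
    classify_motion_impact(trait_fc_effect=-0.08, impact_score=-0.03, p_value=0.30),
]
print(labels)
```

Rerunning the same classification after censoring at a chosen FD threshold shows whether a trait moves from a significant category to the non-significant one.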

Figure 1: SHAMAN Experimental Workflow for Quantifying Motion Impact

Comparative Diagnostic Assessment for ADHD and Autism

Accurate differential diagnosis is essential for disentangling the motion-trait conundrum, particularly given the high co-occurrence of ADHD and autism [9]. The following protocol outlines a comprehensive assessment approach:

Clinical Assessment Protocol:

Multi-Method Data Collection:
- Parental Interviews: Structured diagnostic interviews (e.g., Autism Diagnostic Interview-Revised) focusing on early developmental history and current symptoms.
- Rating Scales: Standardized behavior checklists for both autism (e.g., Social Communication Questionnaire) and ADHD (e.g., Conners' Rating Scales) symptoms.
- Behavioral Observations: Direct observation using standardized protocols (e.g., Autism Diagnostic Observation Schedule) across different contexts.
- Cognitive Testing: Comprehensive neuropsychological assessment including Wechsler Intelligence Scales to identify cognitive profiles [8].
Subtype Differentiation: Specifically assess for ADHD presentations (predominantly inattentive, predominantly hyperactive-impulsive, or combined) within autistic individuals [9].
Longitudinal Follow-up: Implement ongoing assessment across development, as symptoms may change presentation over time, particularly from childhood to adolescence [9].

Molecular Pathways and Research Models

Signaling Pathways in Neurodevelopmental Conditions

Research using animal models has identified several molecular pathways relevant to ADHD and autism pathophysiology. The diagram below illustrates key pathways implicated in these conditions, particularly focusing on synaptic function.

Figure 2: Molecular Pathways in ADHD and Autism Pathophysiology

Key Pathway Insights:

Dopamine Dysregulation in ADHD: DAT knockout (DAT-KO) mouse models demonstrate a five-fold increase in extracellular dopamine concentration, reduced dopamine release, and approximately 50% downregulation of postsynaptic D1 and D2 receptors in the striatum [10]. These alterations lead to hyperlocomotion, deficits in attention, and poor learning and memory, all of which are core features relevant to ADHD.
Synaptic Dysfunction in Autism: SH3RF3 deficiency disrupts the formation of a molecular complex between BRSK1/SAD-B kinase and the ASD-associated active zone protein RIM1, leading to reduced RIM1 phosphorylation. This perturbation substantially reduces synaptic vesicle density and readily releasable pool size, coupled with delayed synaptic vesicle replenishment kinetics. These deficits ultimately impair excitatory synaptic transmission in the prefrontal cortex and disturb excitatory-inhibitory (E/I) balance [11].

Table 3: Key Research Reagents and Resources for Motion-Trait Studies

Resource Category	Specific Examples	Research Application	Key Considerations
Denoising Algorithms	ABCD-BIDS pipeline, FIX, Global Signal Regression	Removes motion-related variance from fMRI data	Residual motion artifact persists; may not eliminate trait-specific motion effects
Motion Censoring Methods	Framewise displacement (FD) thresholding (e.g., FD < 0.2 mm)	Excludes high-motion fMRI frames from analysis	Reduces overestimation but may not address underestimation; can bias sample distribution
Motion Impact Quantification	SHAMAN framework	Assigns motion impact scores to specific trait-FC relationships	Distinguishes between overestimation and underestimation effects; adaptable to covariates
Animal Models	DAT-KO mice, SH3RF3-KO mice, Spontaneously Hypertensive Rats (SHR)	Studies molecular mechanisms and tests interventions	Recapitulate specific behavioral features but may not model full complexity of human disorders
Stem Cell Models	iPSC-derived neurons, 3D organoids, assembloids	Studies human-specific features of neurodevelopment	Captures patient-specific genetics; useful for personalized therapeutic screening
Behavioral Assessments	Open field test, Marble burying, Social interaction tests	Quantifies behavioral phenotypes in animal models	Provides face validity for neurodevelopmental condition features

The motion-trait conundrum presents a significant methodological challenge for researchers studying ADHD, autism, and other neurodevelopmental conditions. Current evidence suggests that standard denoising approaches substantially reduce but do not eliminate motion-related artifacts in functional connectivity analyses. The development of specialized methods like the SHAMAN framework represents important progress in quantifying and addressing trait-specific motion impacts.

For populations at risk of motion-correlated traits, particularly individuals with ADHD and autism, researchers must implement comprehensive strategies that include robust motion quantification, careful diagnostic characterization, and appropriate analytical corrections. Future research should continue to refine these methodological approaches to ensure that observed brain-behavior relationships reflect genuine neurobiological mechanisms rather than motion-related artifacts, ultimately advancing our understanding of neurodevelopmental conditions and supporting the development of more effective interventions.

The Spatiotemporal Signature of Motion on Functional Connectivity

Functional connectivity (FC), measured as the temporal synchronization of blood-oxygen-level-dependent (BOLD) signal fluctuations across different brain regions, has become a fundamental tool for exploring brain network organization [12]. However, the spatiotemporal signature of in-scanner head motion introduces systematic biases that can profoundly alter FC estimates and lead to spurious scientific conclusions [1] [13]. This confound is particularly problematic in studies investigating traits associated with motion, such as psychiatric disorders, developmental conditions, and aging, where motion systematically correlates with the variables of interest [1] [13]. Understanding the dual nature of motion artifacts—capable of both overestimating and underestimating trait-FC relationships—has become essential for proper interpretation of functional connectivity findings.

The spatial signature of motion artifacts exhibits consistent patterns: motion typically decreases long-distance connectivity while increasing short-range connectivity, most notably in default mode and frontoparietal control networks [1] [14]. Temporally, motion induces both immediate, large-amplitude signal changes and longer-duration artifacts that may persist for 8-10 seconds [13]. These spatiotemporal characteristics vary across individuals and populations, making motion correction particularly challenging in clinical and developmental populations where motion is often elevated [1]. This review systematically examines the spatiotemporal signature of motion on FC, compares methodological approaches for quantifying and mitigating these artifacts, and provides practical guidance for researchers investigating trait-FC relationships.

Motion Overestimation vs. Underestimation in Trait-FC Effects

The Dual Nature of Motion Artifacts

Head motion systematically biases trait-FC relationships in two distinct directions: overestimation and underestimation of true effects [1]. The direction of bias depends on the alignment between motion-FC effects and trait-FC effects. When the spatial pattern of motion-related FC changes aligns with the trait-FC effect direction, motion causes overestimation; when these patterns oppose each other, motion causes underestimation of the true relationship [1]. This distinction is critical because these two types of bias require different mitigation strategies, and standard denoising approaches may reduce one type while potentially exacerbating the other.

Recent large-scale studies reveal how prevalent both types of bias are in practice. In an analysis of 45 traits from n=7,270 participants in the Adolescent Brain Cognitive Development (ABCD) Study, 42% (19/45) of traits exhibited significant motion overestimation scores while 38% (17/45) showed significant underestimation scores after standard denoising [1]. This nearly equal distribution highlights that motion artifacts are not unidirectional and cannot be addressed with a one-size-fits-all approach.

The SHAMAN Method for Quantifying Motion Impact

The Split Half Analysis of Motion Associated Networks (SHAMAN) method was developed to quantitatively distinguish between motion overestimation and underestimation effects on specific trait-FC relationships [1]. SHAMAN capitalizes on the fundamental observation that traits (e.g., cognitive abilities, clinical diagnoses) are stable over the timescale of an MRI scan, whereas motion is a state that varies from second to second [1].

The method operates by:

Splitting each participant's fMRI timeseries into high-motion and low-motion halves
Measuring differences in correlation structure between these halves
Comparing the direction of motion impact scores with the direction of trait-FC effects
Using permutation testing and non-parametric combining to generate statistically significant motion impact scores [1]

A motion impact score aligned with the trait-FC effect direction indicates overestimation, while a score opposite to the trait-FC effect indicates underestimation [1]. This methodological innovation provides researchers with a specific tool to evaluate whether their trait-of-interest is vulnerable to motion-related bias.
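The permutation-testing step can be illustrated with a minimal sketch. It assumes each participant contributes one number, the high-minus-low-motion difference for a single connection, and it uses a generic sign-flipping permutation test. This is a stand-in to show the idea, not SHAMAN's exact permutation scheme, and the non-parametric combining across connections is omitted. The function name and the example values are hypothetical.

```python
import random
from statistics import mean

def sign_flip_p_value(diffs, n_perm=5000, seed=0):
    """Two-sided sign-flip permutation p-value for mean(diffs) != 0.

    diffs: one high-minus-low-motion difference per participant
    (hypothetical input; not the published SHAMAN statistic).
    """
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_perm):
        # Under the null, the high/low labels are exchangeable,
        # so each participant's difference may flip sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return (hits + 1) / (n_perm + 1)

# Made-up differences that are consistently positive across participants.
example = [0.20, 0.25, 0.30, 0.22, 0.28, 0.31, 0.19, 0.27, 0.24, 0.26, 0.21, 0.29]
p = sign_flip_p_value(example)
```

With differences that all share one sign, almost no random sign assignment matches the observed mean, so the p-value is small. Differences that cancel out give a p-value near 1.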

Table 1: SHAMAN Motion Impact Score Interpretation

Motion Impact Score	Relationship to Trait-FC Effect	Interpretation	Recommended Action
Positive and significant	Aligned with trait-FC effect	Motion causes overestimation	Apply stricter motion censoring (FD < 0.2 mm)
Negative and significant	Opposite to trait-FC effect	Motion causes underestimation	Avoid aggressive censoring that may exacerbate bias
Not significant	No systematic relationship	Minimal motion impact	Standard processing sufficient
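The decision logic in the table above can be written as a small function. This is only a sketch: the function name, the return labels, and the significance level of 0.05 are assumptions for illustration, and the sign convention (positive means aligned with the trait-FC effect) follows the table.

```python
def interpret_motion_impact(score, p_value, alpha=0.05):
    """Map a signed motion impact score onto the table's three outcomes.

    alpha is an assumed significance level, not a value from the source.
    """
    if p_value >= alpha:
        return "minimal motion impact"
    # Positive = aligned with the trait-FC effect; negative = opposed.
    return "overestimation" if score > 0 else "underestimation"

label = interpret_motion_impact(score=0.4, p_value=0.01)
```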

Differential Effectiveness of Mitigation Strategies

The SHAMAN method reveals that standard motion correction approaches differentially address overestimation versus underestimation biases. In the ABCD study, motion censoring at framewise displacement (FD) < 0.2 mm dramatically reduced significant overestimation from 42% to just 2% of traits [1]. However, this same censoring threshold did not decrease the number of traits with significant motion underestimation scores [1]. This finding has profound implications for analytical choices—while aggressive censoring may effectively control false positives from motion overestimation, it does not resolve underestimation biases and may even exacerbate them by selectively removing data from high-motion individuals who often represent important clinical populations.
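To make the cost of censoring concrete, the following sketch applies an FD threshold to one participant's frames and reports how much data survives. The FD values are invented, and the helper names are hypothetical. Some pipelines also drop frames adjacent to each high-motion frame, which is not shown here.

```python
def censor_frames(fd, threshold=0.2):
    """Return the indices of frames retained under an FD < threshold rule."""
    return [i for i, value in enumerate(fd) if value < threshold]

def retention(fd, threshold=0.2):
    """Fraction of frames that survive censoring at the given threshold."""
    return len(censor_frames(fd, threshold)) / len(fd)

# Made-up FD trace (mm) for a single participant.
fd = [0.05, 0.31, 0.12, 0.18, 0.45, 0.09, 0.22, 0.11]

kept_strict = censor_frames(fd, threshold=0.2)
strict_fraction = retention(fd, threshold=0.2)
lenient_fraction = retention(fd, threshold=0.5)
```

In this toy trace the strict threshold keeps five of eight frames, while the lenient threshold keeps all of them. Participants who move more lose proportionally more data under the strict rule, which is how censoring can shift the effective sample.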

Spatiotemporal Characteristics of Motion Artifacts

Spatial Signatures of Motion in FC

The spatial distribution of motion artifacts in functional connectivity follows systematic patterns that reflect both the biomechanics of head movement and the physics of MRI signal acquisition:

Distance-dependent effects: Motion produces characteristic decreases in long-distance connectivity and increases in short-range connectivity [1] [14]. This pattern emerges because motion artifacts introduce spatially correlated signal changes that diminish with distance from the movement source.
Network-specific vulnerability: The default mode network (DMN) and frontoparietal control networks show particularly pronounced motion-related decreases in functional coupling [14]. These networks, characterized by distributed regions of association cortex, appear most vulnerable to motion-induced signal loss.
Regional susceptibility: Motion is biomechanically constrained by the neck, resulting in minimal movement near the atlas vertebra (where the skull attaches) and increasing motion with distance from this anchor point [13]. Frontal cortex shows particularly high motion susceptibility, likely due to the prevalence of y-axis rotation (nodding movement) [13].
Apparent connectivity increases: While motion typically decreases most connections, it can spuriously increase certain network measures, including local functional coupling and connectivity between homotopic motor regions [14]. These artifactual "increases" can be particularly misleading when interpreting group differences.

Table 2: Spatial Patterns of Motion Artifacts in Functional Connectivity

Spatial Pattern	Affected Brain Networks/Regions	Direction of Effect	Potential Misinterpretation
Distance-dependent correlation	Long-distance connections	Decreased connectivity	Misinterpreting motion-related reduction as neuronal decoupling
Short-range overestimation	Local, neighboring regions	Increased connectivity	False positive local hyperconnectivity
Default network vulnerability	DMN, frontoparietal networks	Marked decrease	Mistaking motion artifact for DMN dysfunction in clinical populations
Motor network changes	Bilateral motor regions	Increased interhemispheric coupling	Using motor connectivity as reference despite motion susceptibility

Temporal Signatures of Motion in FC

The temporal characteristics of motion artifacts manifest across multiple timescales and present distinct challenges for FC analysis:

Immediate signal disruptions: Motion produces large-amplitude, temporally circumscribed signal changes that are maximal at the volume acquired immediately after an observed movement [13]. These abrupt signal changes scale with motion magnitude and represent the most easily identifiable motion artifacts.
Prolonged signal alterations: In addition to immediate disruptions, motion can induce longer-duration artifacts lasting 8-10 seconds [13]. The origin of these prolonged artifacts remains incompletely understood but may involve motion-related changes in CO2 from yawning or deep breathing, or slow equilibration of signal disruptions following large movements [13].
Spin-history effects: Nonlinear persistence of motion artifacts can occur due to spin excitation history effects that continue for some time after movement cessation [13]. These effects create complex temporal dependencies that simple regression approaches may not fully capture.
Frequency-specific contamination: While traditional band-pass filtering (0.01-0.1 Hz) targets the frequency range of neural BOLD fluctuations, motion artifacts contaminate this frequency band and cannot be effectively removed by standard filtering alone [13].

Methodological Comparisons for Motion Mitigation

FC Estimation Methods and Motion Sensitivity

The choice of functional connectivity metric significantly influences sensitivity to motion artifacts. Different FC measures exhibit varying resilience to motion contamination:

Table 3: Motion Sensitivity Across Functional Connectivity Measures

FC Method	Sensitivity to Motion	Test-Retest Reliability	Fingerprinting Accuracy	Recommended Use Cases
Full correlation	High residual distance-dependent motion relationship	High	High	Studies with low-motion participants where reliability is prioritized
Partial correlation	Low sensitivity to motion artifact	Intermediate	Low	Motion-prone populations when system identifiability is key
Coherence-based measures	Low sensitivity	Variable	Variable	Supplemental analysis to confirm correlation-based findings
Information theory measures	Low sensitivity	Variable	Variable	Exploring non-linear connectivity resistant to motion

A systematic evaluation of eight FC measures revealed that full correlation maintains high test-retest reliability and fingerprinting accuracy but shows relatively high residual distance-dependent relationships with motion, even after rigorous mitigation [15]. Partial correlation involves a trade-off: it combines low motion sensitivity with intermediate system identifiability, but at the cost of lower test-retest reliability and fingerprinting accuracy than full correlation [15]. Importantly, certain networks, particularly the default mode and retrosplenial temporal sub-networks, show high motion correlation across all FC methods, indicating their particular vulnerability [15].
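One intuition for why partial correlation is less sensitive to shared artifact is that it removes variance two regions have in common with a third signal. The sketch below uses the textbook first-order partial correlation formula with a single conditioning signal. This is a simplification: partial-correlation FC in practice conditions on all other regions, typically via the inverse covariance matrix. All timeseries are invented.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """Correlation of x and y after removing what each shares with z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# z is a shared fluctuation (e.g. a widespread artifact); x and y are two
# regions that each carry z plus their own unrelated component.
z = [1, 2, 3, 4, 5, 6, 7, 8]
e1 = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
e2 = [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5]
x = [a + b for a, b in zip(z, e1)]
y = [a + b for a, b in zip(z, e2)]

full = pearson(x, y)
partial = partial_corr(x, y, z)
```

Here the full correlation between the two regions is high purely because of the shared signal, whereas the partial correlation is close to zero once that signal is accounted for.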

Motion Mitigation Pipelines and Performance

Various preprocessing strategies have been developed to address motion artifacts, each with distinct strengths and limitations:

Global Signal Regression (GSR): This controversial approach effectively reduces the overall relationship between motion and FC, but it can leave distance-dependent artifacts in place and may introduce artificial negative correlations and remove neural signals of interest [13]. The decision to use GSR involves trade-offs between motion mitigation and signal preservation.
Motion Censoring ("Scrubbing"): Removing high-motion volumes (typically FD > 0.2-0.5 mm) reduces motion-related artifacts but can introduce biases by disproportionately excluding data from high-motion participants, who often represent clinically interesting populations [1] [13]. Censoring effectively addresses overestimation biases but shows limited efficacy for underestimation biases [1].
Physiological Noise Modeling: Incorporating cardiac and respiratory measures can address physiological sources of motion artifact [12] [1]. The ABCD-BIDS pipeline, which includes respiratory filtering, reduces motion-related variance by approximately 69% compared to minimal processing alone [1].
Multi-echo Sequences: Advanced acquisition techniques using multiple echo times can help distinguish motion-related from neural signals but require specialized sequences not universally available [1].

Diagram 1: Spatiotemporal pathways of motion artifacts leading to overestimation and underestimation biases in trait-FC research. Motion induces distinct temporal artifacts that manifest as specific spatial patterns in functional connectivity, ultimately producing systematic biases that can either exaggerate or mask true trait-FC relationships.

Experimental Protocols for Motion Impact Assessment

SHAMAN Implementation Protocol

The SHAMAN methodology provides a standardized approach for quantifying motion impact on specific trait-FC relationships:

Data Requirements: One or more resting-state fMRI scans per participant with framewise displacement (FD) calculations for each volume. Trait measures should be stable over the scanning timeframe.
Timeseries Splitting: For each participant, split the preprocessed BOLD timeseries into high-motion and low-motion halves using a within-participant median split on FD.
FC Calculation: Compute separate functional connectivity matrices for high-motion and low-motion halves using the preferred FC metric (e.g., full correlation, partial correlation).
Motion Impact Score Calculation:
- Compute the difference in FC between high-motion and low-motion halves for each connection
- Calculate the spatial correlation between this motion-FC difference and the trait-FC effect
- Use permutation testing (typically 1,000-10,000 permutations) to establish statistical significance
- Apply non-parametric combining across connections to generate overall motion impact scores
Bias Direction Determination:
- Positive significant scores = motion overestimation of trait-FC effect
- Negative significant scores = motion underestimation of trait-FC effect
Validation: Repeat analysis with different motion censoring thresholds to assess robustness of findings [1].
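The splitting and FC calculation steps of this protocol can be sketched for a single participant and a single connection. The example ranks frames by FD and cuts the ranking in half, which matches a median split when there are no ties. The function names and the toy timeseries are hypothetical, and the trait comparison, permutation testing, and combining steps are left out.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_by_motion(fd):
    """Rank frames by FD and return (low_motion, high_motion) frame indices."""
    order = sorted(range(len(fd)), key=lambda i: fd[i])
    half = len(fd) // 2
    return sorted(order[:half]), sorted(order[half:])

def fc_difference(ts_a, ts_b, fd):
    """High-motion FC minus low-motion FC for one pair of regions."""
    low, high = split_by_motion(fd)
    fc_low = pearson([ts_a[i] for i in low], [ts_b[i] for i in low])
    fc_high = pearson([ts_a[i] for i in high], [ts_b[i] for i in high])
    return fc_high - fc_low

# Toy data: the two regions co-fluctuate in calm frames and diverge in
# high-motion frames.
fd = [0.05, 0.40, 0.08, 0.35, 0.06, 0.50, 0.07, 0.45]
ts_a = [1, 1, 2, 2, 3, 3, 4, 4]
ts_b = [1, 4, 2, 3, 3, 2, 4, 1]

low, high = split_by_motion(fd)
diff = fc_difference(ts_a, ts_b, fd)
```

A negative difference, as in this toy case, means the connection looks weaker in the high-motion half. The protocol then relates such differences to the trait-FC effect across participants.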

Motion-Robust FC Analysis Protocol

For researchers studying traits potentially correlated with motion, the following comprehensive protocol is recommended:

Data Acquisition:
- Collect physiological recordings (cardiac, respiration) concurrently with fMRI
- Use multi-echo sequences if available
- Implement real-time motion monitoring and feedback
Preprocessing:
- Apply volume-based realignment, generating FD estimates
- Incorporate physiological noise models (e.g., RETROICOR, HRV/RRV)
- Implement aggressive motion censoring (FD < 0.2 mm) for initial analysis
- Compare with more lenient censoring (FD < 0.5 mm) to assess robustness
FC Estimation:
- Calculate multiple FC metrics (full correlation, partial correlation)
- Compare results across metrics to identify motion-resistant findings
- For clinical group comparisons, include matched-motion subgroups
Motion Impact Assessment:
- Apply SHAMAN to traits of interest
- Report motion impact scores and significance values
- Acknowledge limitations for traits with significant motion impact
Reporting:
- Document mean FD and exclusion rates for all groups
- Report correlation between motion and trait measures
- Present results with and without motion censoring
- Acknowledge motion impact when interpreting findings

Table 4: Essential Research Reagent Solutions for Motion-Robust FC Research

Tool/Category	Specific Examples	Function/Purpose	Implementation Considerations
Motion Quantification	Framewise Displacement (FD), DVARS	Quantifies volume-to-volume head movement	Use standardized FD calculation (Jenkinson et al.) for cross-study comparisons
Denoising Pipelines	ABCD-BIDS, fMRIPrep, HCP Pipelines	Standardized preprocessing for motion mitigation	ABCD-BIDS includes respiratory filtering and motion regression
Motion Impact Assessment	SHAMAN, Distance-Dependent Correlation	Quantifies trait-specific motion impact	SHAMAN distinguishes overestimation vs. underestimation
Physiological Monitoring	Cardiac pulse, Respiration belt, CO2 monitoring	Captures physiological sources of motion	Essential for modeling cardiopulmonary artifacts
Real-Time Motion Control	FIRMM, FIDUCIAL	Provides immediate motion feedback during scanning	Reduces data loss by alerting to excessive motion
FC Metric Suites	Full/partial correlation, Coherence, Mutual information	Multiple connectivity measures with varying motion sensitivity	Using multiple metrics strengthens motion-resistant findings
Multivariate Pattern Analysis	MVPA, Pattern classification	Detects subvoxel patterns resistant to motion	Can reveal information not apparent in univariate analyses

The spatiotemporal signature of motion on functional connectivity represents a fundamental methodological challenge that transcends conventional nuisance regression approaches. The systematic patterns of motion artifacts—with distinct spatial distributions across brain networks and temporal profiles across timescales—can produce both overestimation and underestimation biases in trait-FC relationships. The development of specialized methods like SHAMAN provides researchers with tools to quantify these specific biases, while comparative analyses of FC metrics offer guidance for selecting motion-resistant connectivity measures.

Future advances will require continued refinement of dynamic motion correction methods, improved integration of physiological monitoring, and development of study designs that minimize motion confounds through engaging paradigms. Most importantly, researchers must acknowledge and address motion artifacts as a core analytical consideration rather than a peripheral preprocessing step. By implementing comprehensive motion assessment protocols and transparently reporting motion impact, the field can enhance the validity and reproducibility of functional connectivity research across diverse populations and clinical conditions.

In-scanner head motion is now recognized as a major methodological challenge for studies of functional connectivity (FC), not merely as random noise but as a systematic confound that can produce both spurious positive (overestimation) and negative (underestimation) biases in trait-FC relationships [13] [1]. This confound is particularly problematic because in-scanner motion is frequently correlated with traits of interest such as age, clinical status, cognitive ability, and symptom severity [13]. For example, individuals with attention-deficit/hyperactivity disorder (ADHD) or autism often exhibit higher in-scanner head motion than neurotypical participants, creating a systematic bias that can generate false positive or negative findings if not adequately addressed [1]. Understanding and mitigating these motion-induced biases is especially crucial for researchers and drug development professionals employing brain-wide association studies (BWAS) to identify meaningful brain-behavior relationships, as motion artifacts can substantially alter inference in studies of lifespan development, individual differences, and clinical populations [13] [1].

The spatial and temporal characteristics of motion artifacts are well-documented. Motion typically decreases long-distance connectivity while increasing short-range connectivity, most notably in the default mode network [1]. These effects occur because motion produces signal decreases across the brain parenchyma together with large signal increases at brain boundaries due to partial volume effects [13]. The temporal signature includes both brief, large-amplitude signal changes immediately following movement and longer-duration artifacts potentially lasting 8-10 seconds, which may result from motion-related physiological changes or interactions between motion direction and image phase encoding [13].

Quantitative Comparison of Denoising Strategies and Their Efficacy

Performance Benchmarks for Confound Regression Strategies

Table 1: Comparative Performance of Major Denoising Approaches for Motion Artifact Reduction

Denoising Method	Residual Motion-FC Relationship	Distance-Dependence Artifacts	Network Identifiability	Data Retention	Key Limitations
Global Signal Regression (GSR)	Minimal	Introduces distance-dependent artifacts	Moderate	High	Can introduce negative correlations; controversial biological interpretation [16]
Motion Censoring ("Scrubbing")	Significantly reduced	Mitigates distance-dependence	High	Lower (data loss)	Reduces usable data; potential exclusion of high-motion participants [1] [17]
Motion Parameter Regression	Moderate reduction	Limited effect on distance-dependence	Moderate	High	Ineffective for nonlinear motion effects; incomplete artifact removal [13] [17]
Low-Pass Filtering	Partial reduction	Variable impact	Moderate	High	May remove neural signals along with artifacts [13]
Combined GSR + Censoring	Minimal	Mitigates distance-dependence	High	Moderate	Optimal balance for many applications [18]

Trait-Specific Motion Impact Across Methods

Table 2: Motion Impact on Trait-FC Effects Before and After Motion Censoring (ABCD Study, n=7,270)

Trait Category	Significant Motion Overestimation (Standard Denoising, No Censoring)	Significant Motion Underestimation (Standard Denoising, No Censoring)	Motion Overestimation (FD < 0.2 mm Censoring)	Motion Underestimation (FD < 0.2 mm Censoring)
All traits (overall)	42% (19/45 traits)	38% (17/45 traits)	2% (1/45 traits)	No reduction
Psychiatric Symptoms	High susceptibility	High susceptibility	Substantially reduced	Persistent
Physical Metrics	Moderate susceptibility	Moderate susceptibility	Minimal	Persistent
Demographic Factors	Variable	Variable	Minimal	Variable

Recent research using the Split Half Analysis of Motion Associated Networks (SHAMAN) method has quantified how motion specifically impacts trait-FC relationships. In the large-scale Adolescent Brain Cognitive Development (ABCD) Study, after standard denoising without motion censoring, 42% of examined traits showed significant motion overestimation scores, while 38% showed significant underestimation scores [1]. After stringent motion censoring at framewise displacement (FD) < 0.2 mm, significant overestimation was reduced to just 2% of traits, but the number of traits with significant motion underestimation scores was not decreased [1]. This demonstrates the complex and persistent nature of motion artifacts, where different denoising strategies may mitigate one type of bias while potentially exacerbating another.

Experimental Protocols for Motion Impact Assessment

The SHAMAN Methodology for Trait-Specific Motion Quantification

The Split Half Analysis of Motion Associated Networks (SHAMAN) represents a novel approach for computing a trait-specific motion impact score that operates on one or more resting-state fMRI scans per participant [1]. The methodology capitalizes on the observation that traits (e.g., weight, intelligence) are stable over the timescale of an MRI scan, whereas motion is a state that varies from second to second [1]. The experimental protocol proceeds as follows:

Data Acquisition and Preprocessing: Acquire resting-state fMRI data using standardized protocols (e.g., 6-20 minutes of resting-state data). Apply minimal preprocessing including motion correction through frame realignment. The ABCD-BIDS default denoising algorithm includes global signal regression, respiratory filtering, spectral filtering, despiking, and regressing out motion parameter timeseries [1].
Framewise Displacement Calculation: Compute framewise displacement (FD) following Power et al. [13]: sum the absolute frame-to-frame changes in the six realignment parameters (three translations, three rotations), with rotations converted to millimeters of displacement on the surface of a sphere (commonly 50 mm in radius).
Time-Series Splitting: For each participant, split the fMRI timeseries into high-motion and low-motion halves based on median FD values.
Trait-FC Effect Calculation: Compute the correlation between the trait and functional connectivity in both the high-motion and low-motion halves.
Motion Impact Score Determination: Calculate the difference in trait-FC effects between the high-motion and low-motion halves. A significant difference indicates that motion impacts the trait-FC relationship.
Directionality Assessment: A motion impact score aligned with the direction of the trait-FC effect indicates motion overestimation (inflated effect). A score opposite the trait-FC effect indicates motion underestimation (deflated effect) [1].
Statistical Significance Testing: Use permutation testing and non-parametric combining across pairwise connections to generate a p-value distinguishing significant from non-significant motion impacts.
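The framewise displacement step in this protocol can be sketched as follows. The code assumes translations in millimeters, rotations in radians, and a 50 mm sphere radius for converting rotations to arc length. These are common conventions for a Power-style FD, but the units emitted by a given realignment tool should be checked. The function name and the example parameters are hypothetical.

```python
def framewise_displacement(params, radius_mm=50.0):
    """Power-style FD from six realignment parameters per frame.

    params: list of [tx, ty, tz, rx, ry, rz] with translations in mm and
    rotations in radians (an assumption; some tools report degrees).
    """
    fd = [0.0]  # the first frame has no predecessor, so FD is set to 0
    for prev, cur in zip(params, params[1:]):
        translation = sum(abs(c - p) for c, p in zip(cur[:3], prev[:3]))
        # Arc length on a sphere: angle in radians times radius in mm.
        rotation = sum(abs(c - p) * radius_mm for c, p in zip(cur[3:], prev[3:]))
        fd.append(translation + rotation)
    return fd

# Three frames: a 0.1 mm shift in x, then a 0.002 rad rotation.
motion = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.002, 0.0, 0.0],
]
fd_trace = framewise_displacement(motion)
```

In this example a 0.002 rad rotation contributes about as much FD as a 0.1 mm translation, which shows why the choice of radius and rotation units affects where a fixed threshold such as 0.2 mm falls.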

Benchmarking Protocol for Denoising Pipelines

Ciric et al. [16] established a systematic protocol for evaluating participant-level denoising pipelines according to four key benchmarks:

Residual Motion-FC Relationship: Quantify the correlation between mean FD and connectivity after denoising, with better pipelines showing smaller correlations.
Distance-Dependence of Motion Effects: Evaluate whether motion artifacts exhibit the characteristic distance-dependent pattern (increased short-range connectivity, decreased long-range connectivity).
Network Identifiability: Assess the ability to recover known functional network modules from the denoised connectome data.
Degrees of Freedom Lost: Account for the statistical cost of each denoising method, as more aggressive approaches may remove meaningful neural variance along with artifacts [16].

This benchmarking approach enables objective comparison of denoising strategies and facilitates selection of appropriate methods for specific research goals.
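The first two benchmarks can be illustrated with a short sketch: a per-connection correlation between participants' mean FD and their connectivity, followed by the correlation of those values with inter-regional distance. The function names, the use of Euclidean distance between region centers, and all numbers are assumptions for illustration, not values or code from Ciric et al.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def residual_motion_fc(mean_fd, fc_by_edge):
    """Per edge, correlate mean FD with FC across participants."""
    return [pearson(mean_fd, edge_fc) for edge_fc in fc_by_edge]

def distance_dependence(motion_fc, edge_distance_mm):
    """Correlate each edge's motion-FC value with its length."""
    return pearson(motion_fc, edge_distance_mm)

# Four participants, three edges (short, medium, long); all values invented.
mean_fd = [0.1, 0.2, 0.3, 0.4]
fc_by_edge = [
    [0.50, 0.55, 0.60, 0.70],  # short edge: FC rises with motion
    [0.30, 0.31, 0.29, 0.30],  # medium edge: little relation to motion
    [0.40, 0.35, 0.30, 0.20],  # long edge: FC falls with motion
]
edge_distance_mm = [10.0, 60.0, 120.0]

motion_fc = residual_motion_fc(mean_fd, fc_by_edge)
slope = distance_dependence(motion_fc, edge_distance_mm)
```

A strongly negative value for the second benchmark, as in this toy case, reflects the characteristic pattern of inflated short-range and reduced long-range connectivity. A well-performing pipeline would push both quantities toward zero.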

Visualization of Motion Confounding Mechanisms and Analytical Approaches

The Mechanism of Motion Confounding in Trait-FC Studies

(Diagram 1: Motion as a confounder in trait-FC studies. Motion creates artifacts in FC measures that can either mask or inflate true trait-FC relationships, particularly when genetic factors influence both motion and the trait of interest.)

The SHAMAN Analytical Workflow for Motion Impact Assessment

(Diagram 2: The SHAMAN analytical workflow for quantifying trait-specific motion impact. This method capitalizes on the stability of traits versus the state-dependent nature of motion to identify spurious associations.)

The Scientist's Toolkit: Essential Methods and Reagents

Table 3: Essential Tools and Methods for Motion Confound Management in Trait-FC Research

Tool/Method	Primary Function	Key Implementation Considerations
Framewise Displacement (FD)	Quantifies volume-to-volume head motion	Multiple calculation methods exist (Power vs. Jenkinson); affects threshold selection [13]
fMRIPrep	Standardized fMRI preprocessing pipeline	Provides robust, reproducible preprocessing; minimizes manual intervention [19]
SHAMAN Analysis	Quantifies trait-specific motion impact	Requires sufficient scan duration for split-half analysis; adaptable to covariates [1]
Global Signal Regression (GSR)	Removes global motion-related variance	Controversial due to potential introduction of negative correlations; use judiciously [16]
Motion Censoring ("Scrubbing")	Removes high-motion volumes from analysis	Balance between artifact reduction and data retention; FD threshold typically 0.2-0.5mm [1] [17]
Low-Pass Filtering of Motion Parameters	Reduces high-frequency respiration contamination	Particularly beneficial in single-band fMRI datasets; improves FD accuracy [18]
ANTs/FSL	Image registration and normalization	Critical for accurate spatial alignment; choice affects motion correction efficacy [19]
Ciric Benchmarking Framework	Evaluates denoising pipeline performance	Assesses four key benchmarks: residual motion, distance-dependence, network identifiability, and degrees of freedom [16]

Motion artifact in trait-FC research represents a complex challenge that extends beyond simple noise to include systematic confounding that can either inflate or deflate observed brain-behavior relationships [1]. The most effective approaches combine multiple denoising strategies, typically including global signal regression with selective motion censoring, while acknowledging that no method completely eliminates motion-related bias [16] [18]. The development of trait-specific motion impact assessment tools like SHAMAN represents a significant advance, enabling researchers to quantify and account for motion artifacts in their specific research context [1]. As large-scale brain-wide association studies continue to grow, implementing robust, transparent methods for motion management will be essential for generating valid, reproducible findings in neuroimaging research.

Quantifying the Impact: From SHAMAN to Censoring Strategies

In-scanner head motion represents the largest source of artifact in functional magnetic resonance imaging (fMRI) data, introducing systematic bias to resting-state functional connectivity (FC) that is not completely removed by standard denoising algorithms [1]. This technical challenge is particularly acute for researchers studying clinical or behavioral traits inherently associated with greater motion, such as psychiatric disorders. Without knowing whether observed trait-FC relationships are impacted by residual motion, researchers risk reporting false positive results that do not reflect genuine neural associations [1]. The problem persists even in large-scale studies like the Adolescent Brain Cognitive Development (ABCD) Study, where standard denoising with ABCD-BIDS leaves 23% of signal variance explained by head motion—a substantial improvement over minimally processed data (73%), but still problematic for detecting true neurobiological relationships [1].
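As a quick check on these figures, and assuming both percentages measure the same quantity (the share of signal variance explained by head motion), the relative reduction achieved by ABCD-BIDS denoising is:

```latex
\[
\frac{73\% - 23\%}{73\%} = \frac{50}{73} \approx 0.685
\]
```

That is roughly a 69% relative reduction, even though the absolute share of motion-related variance remaining (23%) is still large.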

The SHAMAN (Split Half Analysis of Motion Associated Networks) framework addresses this fundamental methodological challenge by providing researchers with a quantitative tool to assign a motion impact score to specific trait-FC relationships [1]. This framework is particularly valuable because it distinguishes between motion causing overestimation or underestimation of trait-FC effects—a critical distinction for accurate interpretation of brain-behavior associations [1]. In an assessment of 45 traits from n=7,270 participants in the ABCD Study, SHAMAN revealed that after standard denoising without motion censoring, 42% (19/45) of traits had significant motion overestimation scores while 38% (17/45) had significant underestimation scores [1].

SHAMAN Methodology: Core Principles and Workflow

Theoretical Foundation

SHAMAN capitalizes on a fundamental distinction between traits and motion states: traits (e.g., weight, intelligence) remain stable over the timescale of an MRI scan, whereas motion is a state that varies from second to second [1]. This temporal dissociation enables the framework to detect when state-dependent motion artifacts systematically influence estimates of trait-dependent neural correlations.

The method improves upon earlier approaches by operating effectively with one or more resting-state fMRI scans per participant, accommodating covariates in statistical models, and providing directional information about whether motion artifact inflates or suppresses observed trait-FC relationships [1].

Experimental Protocol and Workflow

The SHAMAN analysis proceeds in three stages:

Data Preparation and Preprocessing

Acquire resting-state fMRI data with associated framewise displacement (FD) calculations
Apply standard denoising procedures (e.g., ABCD-BIDS pipeline including global signal regression, respiratory filtering, spectral filtering, despiking, and motion parameter timeseries regression)
Optional: Apply post-hoc motion censoring at various FD thresholds (e.g., FD < 0.2 mm)
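As a minimal sketch of the FD and censoring steps listed above, the following assumes a commonly used formulation of framewise displacement: the sum of absolute volume-to-volume changes in the six realignment parameters, with rotations converted to millimetres of arc on a 50 mm sphere. The function names, the column order of the motion parameters, and the sphere radius are assumptions made for illustration, not the ABCD-BIDS implementation.

```python
import numpy as np

def framewise_displacement(motion_params, radius_mm=50.0):
    """FD per volume from an (n_volumes, 6) array of realignment parameters.

    Assumed column order: three translations in mm, then three rotations
    in radians. Rotations are converted to mm of arc on a sphere of
    radius `radius_mm`.
    """
    params = np.asarray(motion_params, dtype=float).copy()
    params[:, 3:] *= radius_mm                  # radians -> mm of arc
    diffs = np.abs(np.diff(params, axis=0))     # volume-to-volume change
    fd = diffs.sum(axis=1)
    return np.concatenate([[0.0], fd])          # first volume has no predecessor

def censor_mask(fd, threshold_mm=0.2):
    """Boolean mask that is True for volumes to keep (FD < threshold)."""
    return np.asarray(fd) < threshold_mm

# Toy example: 5 volumes with one abrupt 0.5 mm translation at volume 3.
params = np.zeros((5, 6))
params[3:, 0] = 0.5
fd = framewise_displacement(params)
keep = censor_mask(fd)
print("FD per volume:", fd)
print("Fraction of volumes retained:", keep.mean())
```

The retained fraction is worth reporting alongside any censoring threshold, because it determines how much data remains for estimating connectivity.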

Split-Half Analysis Procedure

Divide each participant's fMRI timeseries into high-motion and low-motion halves based on framewise displacement metrics
Calculate functional connectivity matrices separately for high-motion and low-motion segments
Compute difference in correlation structure between high-motion and low-motion halves for each participant
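The split-half steps above can be sketched for a single participant as follows. This is an illustration under simplifying assumptions (a median split on FD and plain Pearson correlation between regional time series); the function name is hypothetical and the published SHAMAN code may differ in how halves are formed.

```python
import numpy as np

def split_half_fc_difference(timeseries, fd):
    """Return FC(high-motion half) minus FC(low-motion half).

    timeseries: (n_volumes, n_regions) array of denoised BOLD signals.
    fd: (n_volumes,) framewise displacement for the same volumes.
    """
    timeseries = np.asarray(timeseries, dtype=float)
    order = np.argsort(fd, kind="stable")       # volumes sorted by motion
    half = len(order) // 2
    low_idx, high_idx = order[:half], order[half:]
    fc_low = np.corrcoef(timeseries[low_idx], rowvar=False)
    fc_high = np.corrcoef(timeseries[high_idx], rowvar=False)
    return fc_high - fc_low

# Synthetic participant: 200 volumes, 4 regions.
rng = np.random.default_rng(0)
ts = rng.standard_normal((200, 4))
fd = rng.gamma(shape=2.0, scale=0.1, size=200)
delta_fc = split_half_fc_difference(ts, fd)
print(delta_fc.shape)
```

Because a trait does not change within a scan, any systematic relationship between this within-participant difference and the trait points to motion rather than to stable neural differences.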

Statistical Analysis and Score Calculation

Measure significance of differences in connectivity between motion states using permutation testing and non-parametric combining across pairwise connections
Calculate motion impact score with directionality indicating overestimation or underestimation
Relate motion impact score direction to trait-FC effect direction:
- Alignment indicates motion overestimation of trait-FC effect
- Opposition indicates motion underestimation of trait-FC effect
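The direction rule above can be illustrated for a single connection. The sketch below is a simplified stand-in, not the SHAMAN algorithm: it uses an ordinary least-squares slope as the score and shuffles trait values across participants to build a null distribution, and it omits non-parametric combining across connections. All function and variable names are hypothetical.

```python
import numpy as np

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))

def motion_impact(delta_fc, full_fc, trait, n_perm=1000, seed=0):
    """Toy motion impact score for one connection.

    delta_fc: per-participant FC difference (high minus low motion half).
    full_fc: per-participant FC estimated from the whole scan.
    trait: per-participant trait values.
    """
    rng = np.random.default_rng(seed)
    impact = slope(trait, delta_fc)             # trait association of the motion-related change
    trait_fc_effect = slope(trait, full_fc)     # ordinary trait-FC effect
    # Null distribution: break the trait/participant pairing.
    null = np.array([slope(rng.permutation(trait), delta_fc) for _ in range(n_perm)])
    p_value = (1 + np.sum(np.abs(null) >= abs(impact))) / (n_perm + 1)
    aligned = np.sign(impact) == np.sign(trait_fc_effect)
    direction = "overestimation" if aligned else "underestimation"
    return impact, p_value, direction

# Synthetic data: the motion-related change tracks the trait.
rng = np.random.default_rng(1)
trait = rng.standard_normal(200)
delta_fc = 0.05 * trait + 0.02 * rng.standard_normal(200)
full_fc_pos = 0.10 * trait + 0.05 * rng.standard_normal(200)
print(motion_impact(delta_fc, full_fc_pos, trait))    # same sign as trait-FC effect
print(motion_impact(delta_fc, -full_fc_pos, trait))   # opposite sign
```

The same score is labelled differently in the two calls only because the sign of the trait-FC effect changes, which is the core of the alignment versus opposition rule.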

Table 1: Key Analytical Outputs of the SHAMAN Framework

Output Metric	Interpretation	Statistical Foundation
Motion Overestimation Score	Motion artifact inflates observed trait-FC effect	Significant positive association between motion-FC and trait-FC effects
Motion Underestimation Score	Motion artifact suppresses observed trait-FC effect	Significant negative association between motion-FC and trait-FC effects
Motion Impact p-value	Statistical significance of motion's influence on trait-FC effect	Permutation-based non-parametric testing

Figure 1: SHAMAN Analytical Workflow - The computational pipeline for calculating motion impact scores from fMRI data, showing key decision points for classifying overestimation versus underestimation effects.

Performance Comparison: SHAMAN Versus Alternative Approaches

Comparative Framework Efficacy

SHAMAN addresses specific limitations of previous methods for quantifying motion artifact in trait-FC research. Traditional approaches include measuring distance-dependent correlations at different motion censoring levels, assessing spatial similarity between trait-FC and motion-FC effects, and comparing trait-FC effects between motion-matched groups [1]. While each method offers insights, they share a fundamental limitation: inability to establish clear thresholds for acceptable versus unacceptable levels of trait-specific motion artifact [1].

Table 2: Methodological Comparison of Motion Artifact Assessment Approaches

Method	Key Strengths	Key Limitations	Trait-Specific Thresholds
SHAMAN Framework	Distinguishes over/underestimation; Works with single scan; Accommodates covariates	Computational intensity; Requires sufficient data for split-half analysis	Yes - provides significance testing for trait-specific motion impact
Distance-Dependent Correlations	Identifies systematic spatial patterns of motion artifact	Does not establish trait-specific significance; Agnostic to hypothesis	No - general motion effects only
Spatial Similarity Analysis	Quantifies overlap between trait-FC and motion-FC maps	Cannot distinguish directionality of bias; No statistical thresholding	No - qualitative assessment only
Motion-Matched Group Comparison	Controls for motion differences between groups	Requires large samples; Lacks single-subject applicability; Resource intensive	Partial - group-level inferences only

Empirical Performance in Large-Sample Applications

Application of SHAMAN to the ABCD Study dataset (n=7,270) after standard denoising with ABCD-BIDS revealed the pervasive nature of motion artifacts in trait-FC research [1]. Without motion censoring, the framework identified significant motion impacts on the majority of traits assessed:

42% (19/45) of traits showed significant motion overestimation scores [1]
38% (17/45) of traits showed significant motion underestimation scores [1]
Only 20% (9/45) of traits showed no significant motion impact [1]

The introduction of motion censoring at framewise displacement (FD) < 0.2 mm dramatically reduced significant overestimation to just 2% (1/45) of traits, demonstrating the effectiveness of rigorous motion censoring for controlling false positive inflation [1]. However, this same censoring threshold did not decrease the number of traits with significant motion underestimation scores, highlighting how different types of motion artifacts require distinct mitigation strategies [1].

Motion Overestimation vs. Underestimation: Differential Impact and Mitigation

Distinct Mechanisms and Consequences

The SHAMAN framework's ability to distinguish between motion overestimation and underestimation addresses a critical gap in neuroimaging methodology. These directional effects represent fundamentally different forms of bias with distinct implications for interpretation:

Motion Overestimation Effects occur when motion artifact systematically inflates observed trait-FC relationships, creating false positive associations that can lead to erroneous conclusions about brain-behavior relationships. This form of bias is particularly problematic for studies of clinical populations who typically exhibit greater motion, potentially creating illusory neural correlates of psychiatric symptoms [1].

Motion Underestimation Effects occur when motion artifact systematically suppresses genuine trait-FC relationships, creating false negative findings that may cause researchers to overlook true neurobiological associations. This form of bias reduces statistical power and may lead to important neural correlates of behavior being missed in analysis [1].

Differential Response to Mitigation Strategies

The effectiveness of motion mitigation strategies differs substantially for overestimation versus underestimation effects:

Motion Censoring Efficacy

Overestimation: Censoring at FD < 0.2 mm reduced significant overestimation from 42% to 2% of traits [1]
Underestimation: Same censoring threshold produced no decrease in significant underestimation scores [1]

Denoising Algorithm Performance

Standard denoising with ABCD-BIDS achieved 69% relative reduction in motion-related signal variance compared to minimal processing alone [1]
Despite this improvement, residual motion-FC effects remained strongly negatively correlated with the average FC matrix (Spearman ρ = -0.58) [1]
This residual motion effect persisted even after stringent censoring (Spearman ρ = -0.51) [1]
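The 69% figure follows from the two variance-explained values reported earlier for the ABCD data (73% after minimal processing, 23% after ABCD-BIDS denoising). A worked check of the arithmetic:

```latex
\[
\text{relative reduction}
  = \frac{73\% - 23\%}{73\%}
  = \frac{50}{73}
  \approx 0.685
  \approx 69\%
\]
```

In absolute terms this is a drop of 50 percentage points, with 23% of signal variance still explained by head motion.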

Essential Research Reagents and Computational Tools

Table 3: Essential Research Resources for SHAMAN Framework Implementation

Resource Category	Specific Tools/Measures	Function in SHAMAN Workflow
Motion Quantification	Framewise Displacement (FD), DVARS	Quantifies head motion in fMRI timeseries; used for split-half classification
Denoising Pipelines	ABCD-BIDS, FSL FIX, Global Signal Regression	Removes motion artifacts from BOLD signal prior to SHAMAN analysis
Functional Connectivity Metrics	Pearson correlation, Partial correlation, Spectral coherence	Quantifies functional connectivity between brain regions for trait-FC and motion-FC effects
Statistical Frameworks	Permutation testing, Non-parametric combining, Linear mixed models	Provides statistical foundation for motion impact score calculation and significance testing
Large-Scale Datasets	ABCD Study, Human Connectome Project, UK Biobank	Validation cohorts for establishing generalizability of motion impact findings

The SHAMAN framework represents a significant methodological advancement for identifying and quantifying motion-related artifacts in trait-FC research. By providing directional motion impact scores with statistical thresholds, the method enables researchers to distinguish between false positive inflations (overestimation) and genuine relationships suppressed by noise (underestimation).

The empirical demonstration that motion censoring effectively addresses overestimation but not underestimation effects has profound implications for study design and analytical planning in developmental, clinical, and cognitive neuroscience [1]. Researchers investigating motion-correlated traits must implement comprehensive motion mitigation strategies that extend beyond standard denoising and censoring approaches.

Future applications of SHAMAN may include prospective study design optimization, data quality monitoring during acquisition, and refinement of individualized denoising strategies. As the field continues to recognize the nuanced ways in which motion artifacts can distort brain-behavior relationships, frameworks like SHAMAN provide essential tools for ensuring the validity and reproducibility of functional connectivity research.

Implementing Framewise Displacement (FD) and Censoring (Scrubbing)

Framewise Displacement (FD) and censoring (scrubbing) are established techniques for mitigating motion artifacts in functional magnetic resonance imaging (fMRI). However, emerging research reveals a critical trade-off: while aggressive motion censoring effectively reduces motion overestimation (false positive trait-FC relationships), it is less effective against motion underestimation (false negative trait-FC relationships) and can significantly impact data retention and reliability. This guide compares FD-based scrubbing with data-driven alternatives, providing experimental data to inform preprocessing decisions in trait-FC effects research.

Experimental Protocols and Methodologies

The SHAMAN Framework for Quantifying Motion Impact

Objective: To assign a trait-specific motion impact score that distinguishes between overestimation and underestimation of trait-functional connectivity (trait-FC) effects [20].

Procedure:
- Data Splitting: For each participant, the resting-state fMRI (rs-fMRI) timeseries is split into high-motion and low-motion halves based on Framewise Displacement (FD).
- Trait-FC Effect Calculation: The correlation between a trait (e.g., cognitive score) and functional connectivity (FC) is computed separately for each half.
- Impact Score Calculation: A significant difference in the trait-FC correlation between the two halves indicates a motion impact.
  - A motion impact score aligned with the trait-FC effect direction indicates motion overestimation (inflating the observed effect).
  - A motion impact score opposite the trait-FC effect direction indicates motion underestimation (masking the true effect).
- Statistical Testing: Permutation testing and non-parametric combining across connections yield a p-value for the motion impact score [20].

Benchmarking Denoising and Censoring Pipelines

Objective: To evaluate the efficacy of various motion correction strategies, including FD scrubbing, using multiple quality control benchmarks [21].

Procedure:
- Pipeline Application: Multiple retrospective denoising pipelines are applied to rs-fMRI datasets. These can include:
  - Motion Parameter Regression: Regressing out head motion parameters.
  - Tissue Signal Regression: Regressing out signals from white matter and cerebrospinal fluid.
  - Global Signal Regression (GSR): Regressing out the global mean signal.
  - Volume Censoring (Scrubbing): Removing volumes with FD exceeding a threshold (e.g., 0.2 mm).
  - ICA-AROMA: An ICA-based method for automatic removal of motion artifacts.
- Benchmark Evaluation: Each pipeline is evaluated on:
  - QC-FC Correlation: The residual correlation between head motion and FC after denoising.
  - Distance-Dependence: How the residual motion artifact varies with the distance between brain regions.
  - Data Loss: The temporal degrees of freedom lost.
  - Test-Retest Reliability: The intraclass correlation coefficient (ICC) of FC estimates [21].
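Two of the benchmarks above, QC-FC correlation and distance-dependence, can be sketched with NumPy alone. The sketch assumes QC-FC is the across-participant Pearson correlation between mean FD and each connection, and summarizes distance-dependence as a rank correlation between QC-FC and inter-regional distance. The function names are hypothetical, and the simple ranking helper does not average ties.

```python
import numpy as np

def pearson_columns(x, Y):
    """Pearson r between vector x (n,) and every column of Y (n, m)."""
    xc = x - x.mean()
    Yc = Y - Y.mean(axis=0)
    return (xc @ Yc) / (np.linalg.norm(xc) * np.linalg.norm(Yc, axis=0))

def ranks(v):
    """Ordinal ranks (ties are not averaged; adequate for continuous data)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def qc_fc_benchmarks(fc_edges, mean_fd, edge_distance_mm):
    """fc_edges: (n_participants, n_edges) connectivity values.
    mean_fd: (n_participants,) mean framewise displacement.
    edge_distance_mm: (n_edges,) distance between the regions of each edge.
    """
    qc_fc = pearson_columns(mean_fd, fc_edges)
    # Distance-dependence: rank correlation between QC-FC and distance.
    dist_dep = pearson_columns(ranks(edge_distance_mm), ranks(qc_fc)[:, None])[0]
    return qc_fc, dist_dep

# Toy data: a short edge that rises with motion, a long edge that falls
# with motion, and a mid-range edge unrelated to motion.
rng = np.random.default_rng(0)
mean_fd = rng.gamma(2.0, 0.1, size=50)
fc_edges = np.column_stack([mean_fd, -mean_fd, rng.standard_normal(50)])
distance = np.array([10.0, 100.0, 50.0])
qc_fc, dist_dep = qc_fc_benchmarks(fc_edges, mean_fd, distance)
print("QC-FC per edge:", np.round(qc_fc, 2))
print("Distance-dependence:", round(float(dist_dep), 2))
```

A pipeline that removes motion artifact well should pull both the QC-FC values and the distance-dependence summary toward zero.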

Performance Comparison Data

Table 1: Efficacy of FD Censoring on Motion Overestimation and Underestimation

Data derived from SHAMAN analysis of 45 traits in the ABCD Study (n=7,270 participants) after denoising with ABCD-BIDS pipeline [20].

Censoring Threshold (FD)	Traits with Significant Motion Overestimation	Traits with Significant Motion Underestimation
No Censoring	42% (19/45 traits)	38% (17/45 traits)
< 0.2 mm	2% (1/45 traits)	38% (17/45 traits)

Table 2: Comparative Performance of Motion Correction Strategies

Synthesized data from benchmarks evaluating validity, reliability, and data retention [22] [21].

Motion Correction Strategy	Residual Motion Artifact	Test-Retest Reliability	Data Retention (Volumes/Subjects)	Key Limitations
FD Scrubbing (Stringent)	Very Low [21]	Diminished reliability with increased censoring [23]	Low [22]	Arbitrary threshold; high data loss; fails to address underestimation [20]
ICA-AROMA	Low [21]	Good [21]	Moderate [21]	Requires high-quality ICA; may alter neural signals
Projection Scrubbing	Low [22]	Good, not generally worsened [22]	High [22]	Statistically complex; may require customization
Global Signal Regression (GSR)	Low (but increases distance-dependence) [21]	Good [21]	High (no data removal)	Controversial; may remove neural signal [21]

Workflow and Decision Pathways

Figure: FD Scrubbing Impact Pathways - The procedural workflow for implementing FD scrubbing and the key decision points that influence its impact on trait-FC research, based on the experimental data.

Table 3: Key Computational Tools and Metrics for Motion Correction

A curated list of essential materials and their functions for implementing and evaluating FD and censoring protocols.

Tool / Metric	Function	Application Context
Framewise Displacement (FD)	Quantifies volume-to-volume head movement from translational and rotational parameters [24].	Primary metric for identifying motion-contaminated volumes to censor.
SHAMAN	A method to compute a trait-specific motion impact score, distinguishing overestimation from underestimation [20].	Critical for post-hoc validation of whether trait-FC findings are confounded by motion.
ICA-AROMA	A data-driven, ICA-based method for automatic removal of motion artifacts from fMRI data [21].	An alternative or complement to FD scrubbing to mitigate motion with less data loss.
Projection Scrubbing	A data-driven scrubbing method using a statistical outlier detection framework and dimension reduction [22].	An alternative to motion-based scrubbing that flags volumes displaying abnormal patterns.
QC-FC Correlation	A group-level metric calculating the correlation between participant motion (mean FD) and functional connectivity [24] [21].	Benchmarks the overall effectiveness of a denoising pipeline in removing motion artifacts.
Intraclass Correlation Coefficient (ICC)	Measures the test-retest reliability of functional connectivity metrics [23].	Evaluates how motion correction strategies impact the stability and reproducibility of findings.
ABCD-BIDS Pipeline	A standardized preprocessing pipeline for the ABCD Study dataset, including motion parameter regression, despiking, and filtering [20].	An example of a comprehensive, publicly available denoising pipeline for large-scale datasets.
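The ICC entry in the table above can be made concrete for a single connection measured in repeated sessions. The source does not state which ICC form the cited benchmarks use, so this sketch assumes ICC(3,1), the two-way mixed, consistency, single-measurement form; the function name is hypothetical.

```python
import numpy as np

def icc_3_1(scores):
    """ICC(3,1) for an (n_participants, n_sessions) array of FC values."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)             # per-participant mean
    col_means = scores.mean(axis=0)             # per-session mean
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    resid = scores - row_means[:, None] - col_means[None, :] + grand
    ms_error = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)

# Three participants, two sessions, with a constant session offset.
fc_two_sessions = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
print(icc_3_1(fc_two_sessions))
```

Because this form measures consistency, a constant shift between sessions does not lower the ICC; only changes in the rank ordering or spacing of participants do.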

The implementation of FD and censoring is a double-edged sword. While stringent censoring (FD < 0.2 mm) is highly effective at mitigating motion overestimation, it does not reduce motion underestimation and incurs a high cost in data retention and reliability [20] [23]. Data-driven methods like projection scrubbing and ICA-AROMA offer a more balanced approach, improving data retention without generally worsening validity and reliability [22] [21]. For rigorous trait-FC research, a combination of stringent preprocessing, moderate censoring, and post-hoc validation with frameworks like SHAMAN is recommended to ensure results are not driven by motion-related artifacts.

Functional connectivity (FC) derived from functional magnetic resonance imaging (fMRI) has become a cornerstone technique for investigating the brain's intrinsic organization and its relationship to cognition, behavior, and clinical conditions [25] [26]. The blood oxygenation level-dependent (BOLD) signal measured by fMRI is notoriously susceptible to various noise sources, with in-scanner head motion representing one of the most significant confounders [27]. These motion-induced artifacts can introduce spurious correlations that obscure neuronally-driven functional connections, potentially leading to invalid conclusions about brain organization and its behavioral correlates [27] [26].

In trait-based functional connectivity (trait-FC) research, which seeks to relate individual differences in FC patterns to stable behavioral or clinical characteristics, the confounding effects of motion present a particularly challenging problem. Motion can disproportionately affect different populations (e.g., children, elderly, clinical groups) and experimental conditions (e.g., rest vs. cognitive tasks), potentially creating systematic biases that masquerade as meaningful neurobiological findings [27] [28]. Contemporary denoising research centers on a tension between two errors. In this part of the article, motion overestimation refers to over-correction, where aggressive denoising removes neural signal of interest, and motion underestimation refers to under-correction, where residual artifacts contaminate FC estimates. This usage differs from the SHAMAN motion impact scores, which label whether residual motion inflates or suppresses an observed trait-FC effect. Striking this balance is especially critical in trait-FC studies, where the goal is to preserve meaningful individual differences while eliminating non-neural confounds [26].

Regression-based approaches constitute some of the most widely used denoising methods in the fMRI preprocessing toolkit. These include techniques such as global signal regression (GSR), white matter and cerebrospinal fluid regression (WM-CSF), and anatomical CompCor (aCompCor), which operate by statistically removing variance associated with putative noise sources [29] [28]. Despite their popularity, evidence suggests these methods have fundamental limitations in adequately addressing motion-related artifacts while preserving neural signals relevant for trait-FC effects [27] [26]. This review comprehensively evaluates the performance of regression-based denoising approaches against alternative strategies, examining their efficacy through multiple benchmarking metrics and their impact on trait-FC research outcomes.

Experimental Benchmarking: Methodologies for Evaluating Denoising Efficacy

The evaluation of denoising pipelines employs diverse methodological frameworks and benchmarking metrics. Understanding these experimental approaches is essential for interpreting comparative performance data.

Benchmarking Metrics and Quality Assessment

Researchers employ multiple quantitative metrics to assess denoising efficacy, each capturing different aspects of pipeline performance:

QC-FC correlations: Quantifies residual motion-related artifact as the across-participant correlation between mean framewise displacement and FC estimates [27] [30]
Network identifiability: Measures the ability to recover known resting-state networks using templates from independent datasets [27] [29]
Distance-dependent effects: Evaluates spurious correlations between motion and connectivity as a function of inter-regional distance [27]
Structure-function coupling: Assesses alignment between functional connectivity and structural connectivity derived from diffusion MRI [25]
Behavioral prediction accuracy: Measures improvement in predicting individual differences in behavior from FC patterns after denoising [26]
Spectral characteristics: Examines retention of low-frequency neural signals versus removal of high-frequency physiological noise [28]
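The behavioral prediction benchmark in the list above can be sketched as cross-validated prediction of a behavioral score from FC features. The cited studies' exact models are not given here, so this example assumes ridge regression with a fixed penalty and scores accuracy as the Pearson correlation between predicted and observed values; the function name and defaults are hypothetical.

```python
import numpy as np

def cv_prediction_accuracy(fc_features, behavior, n_folds=5, alpha=1.0, seed=0):
    """Correlation between observed scores and out-of-fold ridge predictions.

    fc_features: (n_participants, n_features) FC values after denoising.
    behavior: (n_participants,) behavioral measure.
    """
    rng = np.random.default_rng(seed)
    n = len(behavior)
    folds = np.array_split(rng.permutation(n), n_folds)
    predicted = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        X_tr, y_tr = fc_features[train_idx], behavior[train_idx]
        mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
        Xc = X_tr - mu_x
        # Ridge solution (X'X + alpha I)^-1 X'y, fitted on training data only.
        w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]),
                            Xc.T @ (y_tr - mu_y))
        predicted[test_idx] = (fc_features[test_idx] - mu_x) @ w + mu_y
    return np.corrcoef(predicted, behavior)[0, 1]

# Synthetic check: behavior is a noisy linear function of the features.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
w_true = rng.standard_normal(10)
y = X @ w_true + 0.1 * rng.standard_normal(100)
accuracy = cv_prediction_accuracy(X, y)
print(round(float(accuracy), 3))
```

Running the same function on FC features from different denoising pipelines gives a like-for-like comparison, provided the folds and penalty are held fixed across pipelines.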

Recent methodological advances have introduced novel benchmarking approaches. The "summary performance index" combines multiple metrics into a unified measure that captures the trade-off between artifact removal and signal preservation [29]. Additionally, alternatives to QC-FC correlations have been proposed that are agnostic to problematic assumptions about null relationships between motion and FC [30].

Experimental Designs and Datasets

Comparative evaluations typically employ multiple datasets with different acquisition parameters and subject populations to ensure robust findings. Common datasets include:

Human Connectome Project (HCP): Features high-quality multiband fMRI data from healthy young adults [30] [26]
Consortium for Neuropsychiatric Phenomics (CNP): Includes data from healthy controls and patient populations [27] [26]
Genomics Superstruct Project (GSP): Comprises resting-state fMRI with extensive behavioral measures [26]
Centro Fermi (CF) dataset: Contains extended scans with alternating rest and task conditions [27]

Experimental protocols often involve processing the same dataset through multiple denoising pipelines followed by computation of benchmarking metrics. Some studies employ synthetic data with known ground truth to precisely quantify noise removal and signal preservation [29]. Task-based paradigms with different motion profiles (e.g., rest vs. cognitively demanding conditions) allow researchers to evaluate how well pipelines balance artifacts across functional states [27].

Table 1: Experimental Datasets Used in Denoising Pipeline Evaluations

Dataset	Sample Size	Key Features	Primary Use in Evaluation
Human Connectome Project (HCP)	1,200 subjects	High-quality multiband fMRI, extensive behavioral data	Motion artifact reduction, behavioral prediction accuracy [30] [26]
Consortium for Neuropsychiatric Phenomics (CNP)	121-130 subjects	Rest and task fMRI, patient and control groups	Condition-dependent denoising efficacy [27] [26]
Genomics Superstruct Project (GSP)	1,570 subjects	Resting-state fMRI, behavioral measures	Behavioral prediction, motion artifact reduction [26]
Centro Fermi (CF)	20 subjects	Extended scans with alternating rest/task blocks	Balancing motion artifacts across functional conditions [27]

Comparative Performance of Denoising Pipelines

Regression-Based Approaches: Limitations and Trade-offs

Regression-based denoising methods demonstrate particular limitations in addressing key challenges in trait-FC research:

Global Signal Regression (GSR) effectively reduces motion-related artifacts and improves the anatomical specificity of connectivity maps [28] [26]. However, it performs poorly in mitigating spurious distance-dependent associations between motion and connectivity [27]. A significant concern is its removal of low-frequency signals that may contain neurally relevant information, potentially associated with brain states such as arousal and vigilance [28]. While GSR can increase correlations between connectivity and behavior, it also results in substantially lower age-related FC differences, suggesting possible overcorrection in developmental and aging studies [28] [26].
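A minimal sketch of what GSR does computationally: the mean signal across all regions (or voxels) is treated as a nuisance regressor and removed from every time series by least squares. The function names are hypothetical, and real pipelines typically fit the global signal jointly with other confounds.

```python
import numpy as np

def regress_out(timeseries, confounds):
    """Residuals of each column of `timeseries` after least-squares
    regression on `confounds` plus an intercept."""
    design = np.column_stack([np.ones(len(timeseries)), confounds])
    beta, *_ = np.linalg.lstsq(design, timeseries, rcond=None)
    return timeseries - design @ beta

def global_signal_regression(timeseries):
    """timeseries: (n_volumes, n_regions) array of BOLD signals."""
    global_signal = timeseries.mean(axis=1)
    return regress_out(timeseries, global_signal)

# Synthetic data: six regions sharing one widespread fluctuation.
rng = np.random.default_rng(0)
shared = rng.standard_normal(150)
ts = rng.standard_normal((150, 6)) + shared[:, None]
cleaned = global_signal_regression(ts)

off_diag = ~np.eye(6, dtype=bool)
print("Mean correlation before:", np.corrcoef(ts, rowvar=False)[off_diag].mean().round(2))
print("Mean correlation after:", np.corrcoef(cleaned, rowvar=False)[off_diag].mean().round(2))
```

By construction the cleaned signals average to zero across regions at every volume, so any widespread fluctuation is removed whether it is artifactual or neural, which is the trade-off discussed above.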

Anatomical CompCor (aCompCor), which applies principal component analysis to signals from white matter and cerebrospinal fluid regions, shows better preservation of low-frequency signals compared to GSR [28]. When optimized to increase the noise prediction power of extracted confounding signals, it ranks among the most effective approaches for balancing artifacts between rest and task conditions [27]. However, like GSR, it demonstrates limited efficacy in reducing distance-dependent artifacts [27]. Its performance in enhancing brain-behavior associations is inconsistent across datasets [26].
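The aCompCor idea described above can be sketched as principal component analysis on signals from white matter and CSF voxels, followed by regression of the leading components from the signals of interest. The number of components, the centering-only preprocessing, and the function names are assumptions for illustration; production tools differ in masking and component selection.

```python
import numpy as np

def acompcor_components(noise_roi_timeseries, n_components=5):
    """Leading principal component time courses of WM/CSF signals.

    noise_roi_timeseries: (n_volumes, n_noise_voxels).
    """
    X = noise_roi_timeseries - noise_roi_timeseries.mean(axis=0)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components]

def regress_out(timeseries, confounds):
    """Residuals after least-squares regression on confounds plus intercept."""
    design = np.column_stack([np.ones(len(timeseries)), confounds])
    beta, *_ = np.linalg.lstsq(design, timeseries, rcond=None)
    return timeseries - design @ beta

# Synthetic data: one noise source present in both WM/CSF and gray matter.
rng = np.random.default_rng(0)
n_vol = 200
noise = rng.standard_normal(n_vol)
wm_csf = np.outer(noise, rng.uniform(0.5, 1.5, 30)) + 0.1 * rng.standard_normal((n_vol, 30))
gray = rng.standard_normal((n_vol, 4)) + 1.5 * noise[:, None]

cleaned = regress_out(gray, acompcor_components(wm_csf, n_components=1))
before = abs(np.corrcoef(gray[:, 0], noise)[0, 1])
after = abs(np.corrcoef(cleaned[:, 0], noise)[0, 1])
print("Correlation with noise source before/after:", round(before, 2), round(after, 2))
```

Unlike GSR, the regressors here come only from tissue assumed to carry little neural signal, which is consistent with the better preservation of low-frequency signal noted above.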

WM-CSF Regression shares similar limitations with aCompCor but may be less effective at accounting for region-specific temporal variations of physiological effects [28]. Evidence suggests that white matter signals may contain information about brain function, raising concerns about potential removal of neurally relevant signals [28].

Table 2: Performance Profile of Regression-Based Denoising Methods

Method	Motion Artifact Reduction	Network Identifiability	Distance-Dependent Artifacts	Behavioral Prediction	Key Limitations
Global Signal Regression (GSR)	High effectiveness [28]	Moderate to high [27]	Poor reduction [27]	Variable across datasets [26]	Removes potentially neural signals, affects age-related differences [28]
Anatomical CompCor (aCompCor)	Moderate effectiveness [27]	High [27] [29]	Limited reduction [27]	Inconsistent across datasets [26]	Optimized version requires careful parameter tuning [27]
WM-CSF Regression	Moderate effectiveness [28]	Moderate [28]	Limited reduction [27]	Lower than ICA-based methods [26]	May not account for regional physiological variations [28]

Comparison with Alternative Denoising Strategies

Non-regression-based approaches demonstrate distinct performance profiles that highlight the limitations of regression-based methods:

ICA-AROMA (Automatic Removal of Motion Artifacts), an independent component analysis-based approach, effectively removes physiological noise but also removes substantial low-frequency signal power [28]. It is associated with markedly lower age-related FC differences, similar to GSR [28]. However, it better preserves the correlational structure of functionally relevant networks compared to regression methods [28]. For multi-echo data at 7T, the aggressive option of ICA-AROMA provides highly effective denoising, though it may remove more signal of interest compared to ME-ICA combined with aCompCor [31].

Volume Censoring (also known as "scrubbing") is the only approach that substantially reduces distance-dependent artifacts [27]. However, this benefit comes at the great cost of reduced network identifiability and potential introduction of biases due to data removal [27]. Its efficacy is highly dependent on parameter optimization, and it is considered cost-ineffective compared to other methods [27] [30].

Multi-Echo ICA (ME-ICA) leverages the TE-dependence of BOLD signals in multi-echo acquisitions to differentiate neural from non-neural components [31]. This approach demonstrates superior performance in decoupling motion and neuronal effects, particularly at ultra-high field strengths (7T) [31]. After ME-ICA application, data is best post-processed with methods like aCompCor to remove spatially diffuse noise [31].

Combined Methods often outperform individual pipelines. For example, ME-ICA combined with aCompCor provides effective denoising for multi-echo 7T fMRI data, potentially preserving more signal-of-interest compared to aggressive ICA-AROMA [31]. Similarly, pipelines combining ICA-FIX and GSR demonstrate a reasonable trade-off between motion reduction and behavioral prediction performance [26].

Figure: Denoising Methods and Impact on fMRI Data

Implications for Trait-FC Research: Motion Overestimation vs. Underestimation

The choice of denoising pipeline profoundly influences trait-FC findings, with different methods creating distinct biases in the relationship between motion and functional connectivity estimates.

The Fundamental Tension in Trait-FC Denoising

In trait-FC research, the primary challenge lies in balancing two opposing errors:

Motion underestimation (under-correction, as the term is used in this section) occurs when denoising is insufficient, leaving residual artifacts that inflate the correlation between FC and motion. This creates spurious associations between motion and apparent individual differences in connectivity, which is particularly problematic when motion varies systematically with population characteristics (e.g., children vs. adults, patients vs. controls) [27]. Regression-based methods like WM-CSF and aCompCor show moderate effectiveness against motion artifacts but limited efficacy in reducing distance-dependent effects, leaving them vulnerable to under-correction [27].

Motion overestimation (over-correction, as the term is used in this section) arises from overly aggressive denoising that removes neural signals along with artifacts. GSR and ICA-AROMA exemplify this risk, as both remove substantial low-frequency signal power and are associated with markedly lower age-related FC differences [28]. This overcorrection can obscure genuine neurobiological relationships, particularly in studies examining developmental, aging, or clinical populations where motion may correlate with the effects of interest [27] [28].

Impact on Brain-Behavior Relationships

Denoising choices significantly influence the detection and magnitude of brain-behavior associations:

Pipeline-dependent effects: No single pipeline universally excels at maximizing brain-behavior associations across different cohorts and behavioral measures [26]
Modest inter-pipeline variations: Differences in predictive performance across pipelines are typically modest, suggesting that excessive optimization for specific behavioral measures may risk overfitting [26]
Dataset-specific performance: Pipeline efficacy varies with dataset acquisition parameters, with no approach consistently outperforming others across the HCP, CNP, and GSP datasets [26]
Reliability-validity tradeoffs: While denoising improves FC reliability by reducing noise, excessive denoising may undermine validity by removing meaningful neural signals [26]

Table 3: Impact of Denoising Approach on Trait-FC Study Outcomes

Research Context	Primary Motion-Related Challenge	Limitations of Regression Approaches	Recommended Alternatives
Developmental/Aging Studies	Motion correlates with age; risk of both over- and under-correction [27] [28]	GSR and ICA-AROMA associated with substantially lower age-related FC differences [28]	aCompCor shows relatively higher age-related FC differences [28]
Clinical Populations	Patient groups often move differently; motion can confound group differences [27]	Limited efficacy in balancing artifacts across conditions [27]	Optimized aCompCor better balances rest/task conditions [27]
Brain-Behavior Prediction	Motion can induce spurious correlations or mask true relationships [26]	Inconsistent behavioral prediction across datasets [26]	Combined approaches (e.g., ICA-FIX+GSR) provide reasonable trade-off [26]
Task-Based FC	Differential motion across conditions (typically less in engaging tasks) [27]	Poorly balanced efficacy between rest and task conditions [27]	Volume censoring reduces distance-dependent artifacts but reduces data [27]

Experimental Toolkit for Denoising Pipeline Evaluation

Software and Computational Tools

HALFpipe: Provides a standardized workflow for fMRI analysis from raw data to group-level statistics, incorporating multiple software tools in a containerized environment to aid reproducibility [29]
fMRIPrep: Widely adopted preprocessing software that consistently applies initial processing steps before denoising [29] [26]
ICA-AROMA: Automated ICA-based component classification for motion artifact removal without requiring training data [28]
PySPI: Python package implementing 239 pairwise interaction statistics for comprehensive FC matrix benchmarking [25]

Reference Datasets and Benchmarks

Human Connectome Project (HCP): Provides high-quality multiband fMRI data with extensive behavioral measures for method validation [30] [25]
Consortium for Neuropsychiatric Phenomics (CNP): Includes resting-state and task fMRI data for evaluating condition-dependent denoising performance [27] [26]
Genomics Superstruct Project (GSP): Large-sample dataset with diverse behavioral measures for assessing denoising impact on brain-behavior prediction [26]

Evaluation Metrics and Criteria

QC-FC correlations: Standard approach for quantifying residual motion-related artifacts, though with recognized limitations [27] [30]
Network identifiability: Template-matching approaches for assessing biological signal preservation [27] [29]
Distance-dependent effects: Critical metric for evaluating motion-related spatial biases in connectivity [27]
Behavioral prediction accuracy: Emerging benchmark using cross-validated prediction of behavioral measures from FC patterns [26]
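
To make two of these metrics concrete, the following minimal sketch computes QC-FC correlations and their distance dependence on synthetic data. The array layout, the function names (qc_fc, distance_dependence), and the simulated artifact are illustrative assumptions, not part of any cited pipeline, and Pearson correlation is used here purely for brevity.

```python
import numpy as np

def qc_fc(fc_edges, mean_fd):
    """Pearson correlation, across participants, between mean FD and each FC edge.

    fc_edges: (n_subjects, n_edges) FC values (hypothetical layout).
    mean_fd:  (n_subjects,) mean framewise displacement per participant.
    """
    fc_c = fc_edges - fc_edges.mean(axis=0)
    fd_c = mean_fd - mean_fd.mean()
    num = fd_c @ fc_c
    den = np.sqrt((fd_c ** 2).sum() * (fc_c ** 2).sum(axis=0))
    return num / den

def distance_dependence(qcfc, edge_distance):
    """Correlation between per-edge QC-FC values and inter-regional distance."""
    return float(np.corrcoef(qcfc, edge_distance)[0, 1])

# Synthetic demonstration (all values are made up for illustration).
rng = np.random.default_rng(0)
n_sub, n_edges = 200, 500
mean_fd = rng.gamma(2.0, 0.1, n_sub)            # mm
edge_distance = rng.uniform(10, 150, n_edges)   # mm
# Simulated artifact: motion raises short-range FC and lowers long-range FC.
slope = (80 - edge_distance) / 80
fc = rng.normal(0, 0.1, (n_sub, n_edges)) + np.outer(mean_fd, slope)

qcfc = qc_fc(fc, mean_fd)
dd = distance_dependence(qcfc, edge_distance)
print(f"median |QC-FC| = {np.median(np.abs(qcfc)):.2f}, distance dependence = {dd:.2f}")
```

In this toy setting a strongly negative distance dependence reflects the simulated artifact; on real data, a pipeline that removes motion effectively should push both quantities toward zero.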

Regression-based denoising approaches, while foundational to fMRI preprocessing, demonstrate significant limitations in the context of trait-FC research. Their inability to consistently eliminate distance-dependent artifacts, their variable impact on brain-behavior associations across datasets, and their tendency to either over- or under-correct motion artifacts highlight the need for more nuanced solutions [27] [26].

The emerging consensus suggests that combined approaches—such as ICA-FIX with GSR or ME-ICA with aCompCor—often provide superior trade-offs between artifact removal and signal preservation compared to individual regression methods [31] [26]. However, no universal solution exists, and optimal pipeline selection depends on specific research questions, sample characteristics, and acquisition parameters.

Future methodological development should focus on techniques that more precisely differentiate neural from non-neural signals, perhaps by incorporating physiological recordings, leveraging multi-echo acquisitions, or developing subject-specific noise models. Furthermore, the field would benefit from standardized evaluation frameworks that simultaneously assess multiple performance metrics, including motion artifact reduction, network identifiability, and behavioral prediction accuracy.

In trait-FC research, where the goal is to extract meaningful individual differences from noisy data, researchers must carefully consider the limitations of regression-based approaches and selectively implement denoising strategies that balance the risks of motion overestimation and underestimation based on their specific study designs and populations.

Denoising is a critical preprocessing step in data analysis pipelines across scientific domains, from medical imaging to neuroscience. Its primary objective is to extract the underlying true signal from noisy observations, a process that inherently involves a trade-off between reducing statistical variance and introducing statistical bias. In functional connectivity (FC) research using resting-state functional magnetic resonance imaging (rs-fMRI), this balance is particularly crucial. Inadequate denoising leaves residual motion that can systematically distort trait-FC relationships, either inflating them (motion overestimation, which risks false positive findings) or masking them (motion underestimation, which risks false negative findings) [1] [3].

This benchmark guide compares 19 methods across multiple imaging modalities (denoising algorithms and pipelines, plus the SHAMAN diagnostic framework), with a specific focus on their performance characteristics and implications for trait-FC research. We synthesize experimental data from recent challenges and peer-reviewed studies to provide researchers, scientists, and drug development professionals with objective comparisons to inform their analytical pipeline choices.

The Impact of Denoising on Trait-FC Research

The Problem of Residual Motion Artifact

In-scanner head motion introduces systematic bias into rs-fMRI functional connectivity that is not completely removed by standard denoising algorithms [1] [3]. This residual motion artifact disproportionately affects research on clinical populations, as individuals with conditions such as ADHD or autism tend to exhibit higher in-scanner head motion [1]. The challenge is particularly acute in large-scale brain-wide association studies (BWAS), where even small effect sizes can reach statistical significance, potentially leading to false positive associations [1].

Recent research has quantified this problem using the Split Half Analysis of Motion Associated Networks (SHAMAN) framework, which assigns a motion impact score to specific trait-FC relationships [1]. In an assessment of 45 traits from n=7,270 participants in the Adolescent Brain Cognitive Development (ABCD) Study, 42% (19/45) of traits showed significant motion overestimation scores after standard denoising, while 38% (17/45) showed significant underestimation scores [1] [3]. Together, these figures indicate that residual motion can bias a large share of trait-FC relationships, in one direction or the other, even after standard denoising.

Overestimation vs. Underestimation in Trait-FC Effects

The direction of motion impact reveals different methodological challenges:

Motion Overestimation: Occurs when the motion impact score aligns with the direction of the trait-FC effect, potentially creating false positive findings. This was the more prevalent issue in the ABCD dataset, affecting 42% of traits with standard denoising [1].
Motion Underestimation: Occurs when the motion impact score opposes the direction of the trait-FC effect, potentially obscuring true associations. This affected 38% of traits in the same dataset [1].

Notably, these two types of errors respond differently to mitigation strategies. Censoring at framewise displacement (FD) < 0.2 mm reduced significant overestimation to just 2% (1/45) of traits but did not decrease the number of traits with significant motion underestimation scores [1]. This asymmetric impact highlights the need for denoising methods that simultaneously address both types of bias.

Experimental Benchmark of Denoising Methods

Methodology for Comparative Evaluation

Our benchmark synthesis follows rigorous evaluation protocols established in recent challenges and peer-reviewed comparisons. For each domain, we standardized performance metrics to enable cross-method comparisons:

Medical and General Image Denoising: Algorithms were evaluated using Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index (SSIM) on datasets with known ground truth [32] [33]. Additional perceptual quality metrics included NIQE, BRISQUE, and PIQE [32].
fMRI Denoising: Methods were assessed using the SHAMAN framework to calculate motion impact scores for trait-FC relationships, measuring both overestimation and underestimation effects [1].
MRSI Denoising: Performance was quantified using concentration root-mean-square error (RMSE) against gold standard metabolite maps in simulated data with known ground truth [34].
Microscopy Denoising: Challenges employed supervised evaluation with noisy/clean image pairs and required preservation of temporal dynamics in calcium imaging data [35].
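
The ground-truth error metrics named above follow standard definitions. The sketch below implements MSE, RMSE, and PSNR with NumPy; SSIM and the no-reference metrics are omitted because they need considerably more machinery. The function names and the toy 4x4 image are illustrative assumptions.

```python
import numpy as np

def mse(reference, estimate):
    """Mean squared error between a ground-truth array and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.mean((reference - estimate) ** 2))

def rmse(reference, estimate):
    """Root-mean-square error (same units as the data)."""
    return float(np.sqrt(mse(reference, estimate)))

def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB; data_range is the maximum possible value span."""
    err = mse(reference, estimate)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / err))

# Toy example: a flat 4x4 image with one corrupted pixel.
clean = np.full((4, 4), 100.0)
noisy = clean.copy()
noisy[0, 0] += 16.0

print(mse(clean, noisy))    # 16.0  (16**2 spread over 16 pixels)
print(rmse(clean, noisy))   # 4.0
print(round(psnr(clean, noisy, data_range=255.0), 2))
```

Lower MSE and RMSE and higher PSNR indicate closer agreement with the ground truth, which is why these metrics only apply when a reference image or simulated truth is available.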

Benchmark Results Table

Table 1: Comprehensive Benchmark of Denoising Methods Across Modalities

Method	Domain	Key Metric	Performance	Strengths	Limitations
BM3D [32]	Medical Images (MRI/HRCT)	PSNR/SSIM	Consistently outperformed others at low/moderate noise	Preserves structural integrity and perceptual quality	Limited performance at high noise levels
ABCD-BIDS [1]	rs-fMRI	Motion Impact Score	69% relative reduction in motion-related variance vs. minimal processing	Comprehensive pipeline: global signal regression, respiratory filtering, motion regression	Residual motion still explains 23% of variance after denoising
SHAMAN [1]	rs-fMRI	Trait-FC Bias Detection	Identified 42% of traits with overestimation, 38% with underestimation	Specifically designed to detect trait-FC bias from motion	Diagnostic tool rather than denoising method
SPIN-SVD [34]	Dynamic ²H-MRSI	Concentration RMSE	33.4% reduction in RMSE at in vivo SNR	Best preservation of local metabolic alterations in simulations	Not evaluated on fMRI data
tMPPCA [34]	Dynamic ²H-MRSI	Concentration RMSE	21.9% reduction in RMSE at in vivo SNR	Computational efficiency, good for absolute quantification of low SNR metabolites	Slightly lower accuracy than SPIN-SVD
SUPPORT [36]	Voltage Imaging	Poisson-Gaussian Noise Reduction	Preserves spike shape in simultaneous electrophysiological recording	Self-supervised, handles fast dynamics better than consecutive-frame methods	Requires specialized implementation
Hybrid AMF-MDBMF [33]	General Images (Salt & Pepper Noise)	PSNR/SSIM/IEF	PSNR improvement up to 2.34 dB, IEF >20% improvement	Excellent edge preservation, handles high-density noise	Limited evaluation beyond impulse noise
Global PS [34]	Dynamic ²H-MRSI	Concentration RMSE	29.3% reduction in RMSE at in vivo SNR	Effective noise reduction	Fails to preserve signal variations in simulations
Local PS [34]	Dynamic ²H-MRSI	Concentration RMSE	24.4% reduction in RMSE at in vivo SNR	Local processing approach	Not top performer in accuracy
Stacked PS [34]	Dynamic ²H-MRSI	Concentration RMSE	20.2% reduction in RMSE at in vivo SNR	Novel approach	Failed to preserve signal variations
GL-HOSVD [34]	Dynamic ²H-MRSI	Concentration RMSE	23.5% reduction in RMSE at in vivo SNR	Tensor structure preservation	Computationally intensive
DnCNN [32]	Medical Images	PSNR/SSIM	Competitive performance	Deep learning approach, handles significant noise variations	Requires training, computational resources
EPLL [32]	Medical Images	PSNR/SSIM	Competitive at high noise levels in homogeneous areas	Preserves fine texture	Computationally complex
WNNM [32]	Medical Images	PSNR/SSIM	Competitive at high noise levels in homogeneous areas	Effective in specific conditions	Computationally complex
DeepCAD-RT [36]	Calcium Imaging	Temporal Dynamics Preservation	Effective for sufficiently fast imaging	Self-supervised, no clean data required	Assumption breaks when imaging not faster than dynamics
DeepInterpolation [36]	Calcium Imaging	Temporal Dynamics Preservation	Effective for sufficiently fast imaging	Self-supervised approach	Limited with fast dynamics
Neighbor2Global [32]	General Images (Poisson-Gaussian)	PSNR/SSIM	Outperforms existing techniques on real images	Self-supervised with noise-level adaptation	Limited domain-specific validation
Non-Local Means [32]	Medical Images	PSNR/SSIM	Moderate performance	Classic approach, well understood	Outperformed by newer methods
Bilateral Filter [32]	Medical Images	PSNR/SSIM	Moderate performance	Edge-preserving qualities	Outperformed by newer methods

Specialized Method Performance

Table 2: Low-Rank Denoising Methods for Dynamic ²H-MRSI (Ranked by Performance) [34]

Method	Type	RMSE Reduction	Key Advantage	Implementation Complexity
SPIN-SVD	PS Variation	33.4%	Superior preservation of local metabolic alterations	Simple
Global PS	Partial Separability	29.3%	Effective noise reduction	Simple
Local PS	Partial Separability	24.4%	Local processing approach	Moderate
GL-HOSVD	HOSVD-based	23.5%	Tensor structure preservation	Complex
tMPPCA	HOSVD-based	21.9%	Computational efficiency	Moderate
Stacked PS	PS Variation	20.2%	Novel stacking approach	Moderate

Experimental Protocols and Workflows

SHAMAN Methodology for Detecting Motion Impact

The Split Half Analysis of Motion Associated Networks (SHAMAN) framework provides a robust method for quantifying motion-related bias in trait-FC relationships [1]. The experimental protocol involves:

Data Preparation: Process rs-fMRI data using standard denoising pipelines (e.g., ABCD-BIDS including global signal regression, respiratory filtering, spectral filtering, despiking, and motion parameter regression) [1].
Split-Half Analysis: Divide each participant's fMRI timeseries into high-motion and low-motion halves based on framewise displacement (FD) values.
Trait-FC Effect Calculation: Estimate the trait-FC effect (the association between the trait of interest and each FC measure) separately for the high-motion and low-motion halves.
Impact Score Determination: Calculate the difference in trait-FC effects between high-motion and low-motion halves. A significant difference indicates motion impact.
Direction Classification:
- Overestimation: Motion impact score aligns with trait-FC effect direction
- Underestimation: Motion impact score opposes trait-FC effect direction
Statistical Testing: Use permutation testing and non-parametric combining across pairwise connections to assign significance values to motion impact scores [1].
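
The steps above can be sketched in a few lines. The code below is a deliberately simplified illustration of the split-half idea, not the SHAMAN implementation: the function names, the per-edge least-squares slopes, and the sign-weighted sum used to summarize direction are all assumptions made for this example, and no significance testing is performed.

```python
import numpy as np

def upper_fc(ts):
    """Upper-triangle FC vector from a (n_frames, n_rois) timeseries."""
    c = np.corrcoef(ts, rowvar=False)
    return c[np.triu_indices(c.shape[0], k=1)]

def split_half_fc(ts, fd):
    """FC computed separately on the lower-FD and higher-FD halves of the frames."""
    order = np.argsort(fd)
    half = len(fd) // 2
    return upper_fc(ts[order[:half]]), upper_fc(ts[order[half:]])

def edge_slopes(y, trait):
    """Least-squares slope of each column of y on the trait, across participants."""
    t = trait - trait.mean()
    return t @ (y - y.mean(axis=0)) / (t @ t)

def motion_impact(ts_list, fd_list, trait):
    halves = [split_half_fc(ts, fd) for ts, fd in zip(ts_list, fd_list)]
    low = np.array([h[0] for h in halves])
    high = np.array([h[1] for h in halves])
    trait_fc = edge_slopes((low + high) / 2, trait)   # trait-FC effect per edge
    impact = edge_slopes(high - low, trait)           # trait effect on the half difference
    alignment = float(np.sum(impact * np.sign(trait_fc)))
    if alignment > 0:
        label = "overestimation"    # impact points the same way as the trait-FC effect
    elif alignment < 0:
        label = "underestimation"   # impact opposes the trait-FC effect
    else:
        label = "none"
    return impact, trait_fc, alignment, label

# Synthetic data with no real effect, only to show the call pattern.
rng = np.random.default_rng(0)
n_sub, n_frames, n_rois = 30, 120, 5
ts_list = [rng.normal(size=(n_frames, n_rois)) for _ in range(n_sub)]
fd_list = [rng.gamma(2.0, 0.1, n_frames) for _ in range(n_sub)]
trait = rng.normal(size=n_sub)
impact, trait_fc, alignment, label = motion_impact(ts_list, fd_list, trait)
print(label, round(alignment, 3))
```

Because the label here is assigned without a null distribution, it only indicates direction; deciding whether the impact is significant requires the permutation step described in the protocol.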

Low-Rank Denoising Evaluation Protocol

For MRSI and related modalities, the evaluation of low-rank denoising methods follows this standardized protocol [34]:

Simulation Development: Create realistic simulations of dynamic ²H-MRSI brain data across a range of noise levels and B₀ inhomogeneity conditions, including specialized models with focal metabolic alterations (e.g., lesions).
Ground Truth Establishment: Use known input concentrations to create gold standard metabolite maps for quantitative comparison.
Method Application: Apply each denoising method to the simulated data:
- Matrix-based methods (Global PS, Local PS, Stacked PS, SPIN-SVD): Reshape data into matrices before low-rank approximation
- Tensor-based methods (GL-HOSVD, tMPPCA): Maintain natural tensor structure using HOSVD
Performance Quantification: Calculate concentration root-mean-square error (RMSE) between denoised results and gold standard maps.
In Vivo Validation: Apply top-performing methods to real dynamic ²H-MRSI human brain data to verify preservation of spatial and temporal metabolite variations.
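
As a rough illustration of the matrix-based branch of this protocol, the sketch below applies a generic truncated-SVD low-rank approximation to simulated space-by-time data and scores it by RMSE against the known ground truth. It is not an implementation of any specific method from [34]; the dimensions, noise level, and the assumption that the true rank is known are all choices made for this example.

```python
import numpy as np

def low_rank_denoise(data, rank):
    """Truncated-SVD approximation of a (n_voxels, n_timepoints) matrix."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Simulated ground truth: a few spatial maps, each with its own time course.
rng = np.random.default_rng(1)
n_vox, n_time, true_rank = 400, 60, 3
truth = rng.normal(size=(n_vox, true_rank)) @ rng.normal(size=(true_rank, n_time))
noisy = truth + rng.normal(scale=1.0, size=truth.shape)

denoised = low_rank_denoise(noisy, rank=true_rank)
reduction = 1 - rmse(denoised, truth) / rmse(noisy, truth)
print(f"RMSE reduction relative to noisy data: {reduction:.1%}")
```

In practice the rank is not known in advance, and choosing it too low is one way a low-rank method can erase focal signal variations of the kind the simulation protocol is designed to probe.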

The Scientist's Toolkit

Essential Research Reagent Solutions

Table 3: Key Reagents and Materials for Denoising Research and Application

Item	Function/Application	Relevance to Denoising Evaluation
Dynamic ²H-MRSI Phantom [34]	Simulation of brain metabolite concentrations	Provides ground truth for quantitative evaluation of denoising performance
[6,6'-²H₂]-glucose [34]	Deuterium-labelled metabolic substrate for ²H-MRSI	Enables dynamic metabolic flux measurements that require denoising
Framewise Displacement (FD) Metrics [1]	Quantification of head motion in fMRI	Essential for evaluating motion-related artifacts in trait-FC research
ABCD-BIDS Pipeline [1]	Standardized fMRI denoising pipeline	Reference implementation for comparing new denoising methods
SHAMAN Software [1]	Calculation of motion impact scores	Critical for detecting overestimation/underestimation in trait-FC effects
SUPPORT GUI [36]	User-friendly interface for self-supervised denoising	Makes advanced denoising accessible without programming expertise
BM3D Implementation [32]	Benchmark algorithm for medical image denoising	Reference method for evaluating new denoising approaches
Hybrid AMF-MDBMF Code [33]	Implementation of adaptive filtering for impulse noise	Reference for high-density salt-and-pepper noise removal

This comprehensive benchmark demonstrates that denoising method selection has profound implications for the validity of trait-FC research findings. Methods that effectively reduce both motion overestimation and underestimation biases are particularly valuable for brain-behavior association studies. The recent development of assessment frameworks like SHAMAN provides researchers with critical tools for quantifying these biases in their specific research contexts.

Across imaging modalities, we observe a trend toward specialized denoising approaches that leverage domain-specific characteristics of the signal and noise. In fMRI research, methods that specifically address the systematic biases introduced by head motion are essential for accurate trait-FC inference. The experimental data synthesized in this guide provides researchers with evidence-based criteria for selecting denoising methods appropriate to their specific research questions and data characteristics.

In-scanner head motion represents the most substantial source of artifact in functional magnetic resonance imaging (fMRI), introducing systematic bias into resting-state functional connectivity (FC) measurements that cannot be completely removed by standard denoising algorithms [1]. This challenge is particularly acute when studying populations prone to higher movement, such as children, adolescents, older adults, and individuals with neurological or psychiatric conditions [1]. Researchers investigating traits associated with motion susceptibility face a critical methodological dilemma: they must determine whether observed trait-FC relationships reflect genuine neural associations or spurious findings driven by residual motion artifact. The Adolescent Brain Cognitive Development (ABCD) Study, with its extensive neuroimaging and behavioral data from over 11,000 children ages 9-10, provides an ideal framework for examining these methodological challenges and evaluating motion correction approaches [1] [37].

The complexity of this issue is magnified by the non-linear characteristics of MRI physics, which make complete removal of motion artifact during post-processing exceptionally difficult [1]. Motion effects on FC are spatially systematic, causing decreased long-distance connectivity and increased short-range connectivity, most notably within the default mode network [1]. This creates a persistent tension in analytical decision-making: removing too few motion-contaminated volumes risks false positive inferences, while censoring too many volumes may systematically exclude individuals with high motion who may exhibit important variance in the trait of interest, thereby biasing sample distributions [1].

Theoretical Framework: Motion Overestimation Versus Underestimation in Trait-FC Effects

Understanding the dual nature of motion's impact on trait-FC associations is fundamental to rigorous neuroimaging research. Motion can bias trait-FC estimates in two distinct directions, each of which threatens the validity of brain-behavior relationships:

Motion Overestimation

Motion overestimation occurs when residual motion artifact causes inflation or false positive reporting of trait-FC relationships. This happens when motion-related variance in the fMRI signal aligns with the direction of the trait-FC effect, leading researchers to overestimate the strength of a genuine association or detect a relationship that does not actually exist [1]. In the ABCD dataset, analyses revealed that after standard denoising without motion censoring, 42% (19/45) of traits examined showed significant (p < 0.05) motion overestimation scores [1].

Motion Underestimation

Conversely, motion underestimation represents the scenario where motion artifact obscures or diminishes genuine trait-FC relationships. This occurs when motion-related variance opposes the direction of the true trait-FC effect, leading researchers to underestimate effect sizes or miss significant associations entirely [1]. In the same ABCD analysis, 38% (17/45) of traits demonstrated significant underestimation scores even after standard denoising procedures [1].

Table 1: Prevalence of Motion Impact on Trait-FC Effects in ABCD Data

Motion Impact Type	Prevalence Before Censoring	Prevalence After FD < 0.2 mm Censoring	Key Characteristics
Overestimation	42% (19/45 traits)	2% (1/45 traits)	False positive inflation of effects
Underestimation	38% (17/45 traits)	No significant decrease	Genuine effects obscured by motion

The particular vulnerability of certain traits to motion artifact stems from the fact that some participant characteristics are inherently correlated with movement tendency. For instance, participants with attention-deficit hyperactivity disorder or autism typically exhibit higher in-scanner head motion than neurotypical participants [1]. This creates a systematic confound wherein motion-prone populations may show apparent FC differences that actually reflect artifact rather than neural reality.

The SHAMAN Framework: Quantifying Motion Impact in Trait-FC Research

Methodological Innovation

To address the critical need for trait-specific motion artifact quantification, researchers developed the Split Half Analysis of Motion Associated Networks (SHAMAN) framework [1]. This novel method assigns a motion impact score to specific trait-FC relationships and distinguishes between motion causing overestimation versus underestimation of effects. SHAMAN capitalizes on a key observation: traits (e.g., cognitive abilities, demographic characteristics) remain stable over the timescale of an MRI scan, while motion is a state that varies from second to second [1].

The methodological workflow of SHAMAN involves measuring differences in correlation structure between split high- and low-motion halves of each participant's fMRI timeseries. When trait-FC effects are independent of motion, the difference between halves is non-significant because traits remain stable over time. A significant difference emerges only when state-dependent motion variations impact the trait's connectivity pattern. The directionality of this effect indicates whether motion causes overestimation (alignment with trait-FC effect direction) or underestimation (opposition to trait-FC effect direction) [1].

Analytical Workflow

The SHAMAN framework implements a rigorous analytical pipeline that extends beyond standard motion correction approaches:

Timeseries Division: Each participant's resting-state fMRI data is divided into high-motion and low-motion halves based on framewise displacement (FD) metrics.
Trait-FC Effect Calculation: The relationship between functional connectivity and the trait of interest is computed separately for each half.
Permutation Testing: Non-parametric combining across pairwise connections yields a motion impact score with associated p-value [1].
Directionality Assessment: Significant motion impact scores are classified as overestimation or underestimation based on alignment with trait-FC effect direction.

This method operates effectively with one or more rs-fMRI scans per participant and can be adapted to incorporate covariates, providing flexibility for different study designs and analytical needs [1].
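
The permutation step in the workflow above can be approximated with a simple label-swapping test. In the sketch below, randomly swapping each participant's high- and low-motion halves (equivalent to flipping the sign of their FC difference) generates a null distribution for a combined score. This is a simplified stand-in rather than SHAMAN's procedure: the combined statistic, the sign-flip scheme, and all names are assumptions for illustration.

```python
import numpy as np

def slopes(y, trait):
    """Least-squares slope of each column of y on the trait, across participants."""
    t = trait - trait.mean()
    return t @ (y - y.mean(axis=0)) / (t @ t)

def permutation_p(diff, trait, direction, n_perm=1000, seed=0):
    """Two-sided permutation p-value for a combined motion impact score.

    diff:      (n_subjects, n_edges) high-motion minus low-motion FC.
    direction: (n_edges,) sign of the trait-FC effect on each edge.
    """
    rng = np.random.default_rng(seed)
    observed = float(np.sum(slopes(diff, trait) * direction))
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Swapping the two halves for a participant negates their difference.
        flips = rng.choice([-1.0, 1.0], size=diff.shape[0])
        null[i] = np.sum(slopes(diff * flips[:, None], trait) * direction)
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    return observed, float(p)

# Synthetic example in which motion impact grows with the trait.
rng = np.random.default_rng(42)
n_sub, n_edges = 100, 20
trait = rng.normal(size=n_sub)
diff = 0.05 * trait[:, None] + rng.normal(scale=0.1, size=(n_sub, n_edges))
direction = np.ones(n_edges)
observed, p_value = permutation_p(diff, trait, direction)
print(f"score = {observed:.2f}, p = {p_value:.4f}")
```

A positive, significant score in this convention would correspond to overestimation and a negative one to underestimation, since the score is weighted by the sign of the trait-FC effect.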

Experimental Protocols and Comparative Performance Metrics

ABCD Denoising Pipeline

The standard ABCD-BIDS preprocessing pipeline represents a comprehensive denoising approach incorporating multiple correction strategies [1]:

Global signal regression to remove widespread noise
Respiratory filtering to mitigate breathing artifacts
Spectral (low-pass) filtering to eliminate high-frequency noise
Despiking and interpolation of high-motion frames
Motion parameter timeseries regression to account for head movement

Performance validation demonstrated that minimal processing (motion-correction by frame realignment only) left 73% of signal variance explained by head motion. After application of the full ABCD-BIDS denoising protocol, the proportion of variance explained by motion was reduced to 23%, representing a 69% relative reduction in motion-related signal variance compared to minimal processing alone [1].
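
The relative reduction can be checked directly from the two percentages. With the rounded inputs quoted here the result is about 68.5%, which is consistent with the reported 69% if the published figure was computed from unrounded values (an assumption, since the unrounded values are not given in this text).

```python
def relative_reduction(before, after):
    """Fractional drop from `before` to `after`, relative to `before`."""
    return (before - after) / before

reduction = relative_reduction(73.0, 23.0)
print(f"{reduction:.1%}")  # prints 68.5%
```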

Motion Censoring Threshold Optimization

A critical analytical decision in motion correction involves selecting appropriate censoring thresholds—the framewise displacement (FD) value above which individual volumes are excluded from analysis. The ABCD dataset enables systematic evaluation of threshold efficacy:

Table 2: Efficacy of Motion Censoring Thresholds in Reducing Spurious Trait-FC Associations

Censoring Threshold	Motion Overestimation Prevalence	Motion Underestimation Prevalence	Data Retention Implications
No censoring	42% (19/45 traits)	38% (17/45 traits)	Maximum data retention
FD < 0.3 mm	11% (5/45 traits)	33% (15/45 traits)	Moderate data retention
FD < 0.2 mm	2% (1/45 traits)	38% (17/45 traits)	Substantial data exclusion
FD < 0.1 mm	Not reported	Not reported	Severe data exclusion

The data reveal an important dissociation: while stricter censoring (FD < 0.2 mm) effectively controlled overestimation bias, reducing significant overestimation from 42% to just 2% of traits, it did not decrease the number of traits with significant motion underestimation scores [1]. This indicates that residual motion continues to obscure genuine trait-FC relationships even after aggressive censoring.
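
For readers unfamiliar with how a censoring threshold is applied, the sketch below computes framewise displacement using a commonly used definition (the sum of absolute frame-to-frame changes in the six realignment parameters, with rotations converted to millimeters on a 50 mm sphere) and builds a keep/censor mask at FD < 0.2 mm. The parameter ordering, the radius, and the toy motion trace are assumptions for illustration; individual pipelines may filter or compute FD differently.

```python
import numpy as np

def framewise_displacement(motion, radius_mm=50.0):
    """FD per frame from a (n_frames, 6) array of realignment parameters.

    Columns are assumed to be 3 translations (mm) followed by 3 rotations (radians).
    The first frame has no predecessor and is assigned FD = 0.
    """
    motion = np.asarray(motion, dtype=float).copy()
    motion[:, 3:] *= radius_mm                   # rotation -> arc length in mm
    diffs = np.abs(np.diff(motion, axis=0))
    return np.concatenate([[0.0], diffs.sum(axis=1)])

def censor_mask(fd, threshold_mm=0.2):
    """True for frames that are kept (FD below the threshold)."""
    return fd < threshold_mm

# Toy motion trace: a 0.1 mm shift, then a 0.006 rad rotation, then stillness.
motion = np.zeros((4, 6))
motion[1:, 0] = 0.1      # x translation appears at frame 1 and persists
motion[2:, 3] = 0.006    # rotation appears at frame 2 (0.3 mm at 50 mm radius)

fd = framewise_displacement(motion)
keep = censor_mask(fd, threshold_mm=0.2)
print(np.round(fd, 3), keep, f"retained: {keep.mean():.0%}")
```

Tightening the threshold removes more frames, which is the data-retention cost listed in the table; participants who move more lose more frames, so the cost is not evenly distributed across the sample.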

Impact on Functional Connectivity Patterns

Motion systematically alters the apparent architecture of functional networks in characteristic patterns. Analyses of ABCD data revealed that the motion-FC effect matrix showed a strong negative correlation (Spearman ρ = -0.58) with the average FC matrix [1]. This indicates that participants who moved more showed systematically weaker connection strength across functional networks compared to those who moved less. This motion-driven connectivity reduction was notably larger than the increase or decrease in FC related to traits of interest, highlighting the potential for motion to overwhelm true biological signals [1].
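
A comparison of this kind can be computed by rank-correlating the unique edges of two matrices. The sketch below uses synthetic matrices and a small NumPy-only Spearman helper that ignores ties (acceptable for continuous data); the matrices, effect size, and function names are assumptions for illustration and are not intended to reproduce the reported value.

```python
import numpy as np

def ranks(x):
    """Ranks 0..n-1 of a 1-D array (no tie handling; fine for continuous data)."""
    order = np.argsort(x)
    r = np.empty(len(x))
    r[order] = np.arange(len(x))
    return r

def spearman(a, b):
    """Spearman correlation as the Pearson correlation of ranks."""
    return float(np.corrcoef(ranks(a), ranks(b))[0, 1])

def symmetric_noise(rng, n, scale=1.0):
    m = rng.normal(scale=scale, size=(n, n))
    return (m + m.T) / 2

# Synthetic average FC matrix and a motion-FC effect that opposes it.
rng = np.random.default_rng(3)
n_rois = 30
mean_fc = symmetric_noise(rng, n_rois)
motion_effect = -0.5 * mean_fc + symmetric_noise(rng, n_rois, scale=0.3)

iu = np.triu_indices(n_rois, k=1)          # each edge counted once, diagonal excluded
rho = spearman(mean_fc[iu], motion_effect[iu])
print(f"Spearman rho = {rho:.2f}")
```

Restricting the comparison to the upper triangle avoids double-counting edges of a symmetric matrix, which would otherwise overstate the effective number of observations.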

The Researcher's Toolkit: Essential Methods for Motion Compensation

Table 3: Research Reagent Solutions for Motion Compensation in fMRI Studies

Method Category	Specific Techniques	Primary Function	Key Considerations
Acquisition Strategies	Multiband sequences, Multi-echo pulse sequences	Enhance temporal resolution and motion robustness	Increased computational complexity
Behavioral Interventions	Participant training, Head stabilization, Compliance reinforcement	Reduce in-scanner motion at source	Particularly crucial for pediatric/special populations
Denoising Algorithms	ABCD-BIDS pipeline, Global signal regression, Motion parameter regression	Remove motion artifact from signal	Risk of removing neural signal along with noise
Data-Driven Correction	SHAMAN framework, ICA-based cleanup, PCA denoising	Identify, quantify, or remove motion-specific artifacts (SHAMAN quantifies trait-specific bias rather than removing artifact)	Requires adequate statistical power
Censoring Approaches	Framewise displacement thresholds, DVARS-based exclusion	Remove high-motion volumes from analysis	Risk of biasing sample if excluding high-motion participants

Integration with Complementary Methodological Approaches

Data-Driven Motion Compensation in PET Imaging

Parallel advances in motion compensation have emerged in positron emission tomography (PET) neuroimaging, particularly relevant for studies of populations with movement disorders. Recent validation of data-driven motion-compensated PET brain image reconstruction algorithms demonstrates robust correction for even minimal movements (1-mm translations and 1° rotations) [38]. In clinical cohorts where standard images were deemed suboptimal or non-diagnostic, motion-compensated images were consistently classified as having acceptable diagnostic quality, often obviating the need for repeat scans [38]. These methods operate by subdividing imaging data into brief frames (1.0 second), identifying periods of quiescence, and estimating transforms between motion frames to integrate motion compensation directly into iterative image reconstruction [38].

Multivariate Machine Learning Approaches

Machine learning (ML) methods applied to large-scale datasets like ABCD offer complementary approaches for understanding brain-behavior relationships in the context of motion artifact. Multivariate ML algorithms can explain approximately 12% of variance in stop-signal reaction time (SSRT) using cross-validated fMRI activation patterns [39]. These approaches demonstrate that brain regions with greater activity during cognitive tasks and more interindividual variation in activation show stronger associations with behavioral measures [39]. The multivariate nature of these methods may provide some inherent robustness to motion effects by distributing signal across multiple features rather than relying on individual connections.
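
The phrase "cross-validated variance explained" refers to scoring predictions made on held-out participants. The sketch below illustrates the idea with a NumPy-only ridge regression and 5-fold cross-validation on synthetic features. It is not the modeling approach of [39]; the model, penalty, fold count, and data are assumptions for illustration.

```python
import numpy as np

def ridge_fit(x, y, alpha):
    """Ridge regression with an unpenalized intercept; returns (weights, intercept)."""
    xm, ym = x.mean(axis=0), y.mean()
    xc = x - xm
    w = np.linalg.solve(xc.T @ xc + alpha * np.eye(x.shape[1]), xc.T @ (y - ym))
    return w, ym - xm @ w

def cross_validated_r2(x, y, alpha=10.0, n_folds=5, seed=0):
    """Out-of-sample R^2: every prediction comes from a model that never saw that row."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pred = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        w, b = ridge_fit(x[train], y[train], alpha)
        pred[test] = x[test] @ w + b
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

# Synthetic "brain features" in which only the first feature carries signal.
rng = np.random.default_rng(7)
x = rng.normal(size=(300, 20))
y = x[:, 0] + rng.normal(size=300)
r2 = cross_validated_r2(x, y)
print(f"cross-validated R^2 = {r2:.2f}")
```

If motion correlates with both the features and the behavioral measure, a model like this can achieve out-of-sample accuracy partly through artifact, so cross-validation alone does not rule out motion-driven prediction.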

The methodological framework presented in this case study highlights both the substantial challenges posed by motion artifact in developmental neuroimaging and the promising solutions emerging from rigorous methodological research. Several key principles emerge for researchers working with the ABCD dataset and similar large-scale neuroimaging initiatives:

First, standard denoising alone is insufficient to eliminate motion-related bias in trait-FC associations, particularly for traits correlated with movement tendency. Second, motion censoring effectively controls false positive inflation but does not resolve the problem of effect obscuration and may introduce sampling bias. Third, trait-specific motion impact assessment using methods like SHAMAN provides crucial information about the validity of individual brain-behavior relationships.

Future methodological development should focus on integrating multiple compensation strategies, developing more sophisticated censoring approaches that preserve statistical power while controlling artifact, and creating standardized reporting guidelines for motion impact in neuroimaging studies. As the ABCD cohort progresses through adolescence, longitudinal applications of these methods will illuminate how motion artifacts and their correction influence the interpretation of developmental trajectories in brain-behavior relationships.

The systematic implementation of rigorous motion correction protocols represents an essential step toward realizing the full potential of large-scale neuroimaging datasets for understanding typical and atypical neurodevelopment.

Optimizing Pipelines: Balancing Data Integrity and Statistical Power

In data-driven research, particularly in fields like neuroimaging and survival analysis, censoring represents a critical methodological decision point with far-reaching consequences. Censoring involves the systematic exclusion of data points perceived as low-quality or unreliable, such as high-motion frames in functional magnetic resonance imaging (fMRI) or subjects with incomplete mortality information in real-world evidence studies. This practice aims to reduce false positive findings by removing noise from the dataset. However, this noise reduction comes at a potential cost: the introduction of systematic sample bias that can distort true effect estimates and undermine generalizability. In neuroimaging, this dilemma is particularly acute when studying traits correlated with motion, such as psychiatric disorders, where aggressive censoring may selectively exclude the very populations of interest [1]. This article examines this fundamental trade-off through the specific lens of motion overestimation versus underestimation in trait-functional connectivity (trait-FC) effects research, providing researchers with evidence-based guidance for navigating these methodological challenges.

Quantitative Comparison of Censoring Approaches

The tables below summarize key experimental findings from recent studies investigating the impact of different censoring methodologies on analytical outcomes.

Table 1: Impact of Motion Censoring on Trait-FC Effects in the ABCD Study (n=7,270)

Censoring Threshold (Framewise Displacement)	Traits with Significant Motion Overestimation	Traits with Significant Motion Underestimation	Key Findings
No censoring	42% (19/45 traits)	38% (17/45 traits)	Standard denoising alone leaves substantial motion-related bias
FD < 0.2 mm	2% (1/45 traits)	38% (17/45 traits)	Stringent censoring virtually eliminates overestimation but does not address underestimation

Table 2: Performance Comparison of Scrubbing Methods in fMRI Data Analysis

Scrubbing Method	Theoretical Basis	Impact on Sample Size	Effect on Functional Connectivity
Motion Scrubbing	Framewise displacement (FD) and derivative	High rates of volume and subject exclusion	Can worsen validity and reliability with stringent thresholds [22]
Data-Driven Scrubbing	Statistical outlier detection in processed fMRI	Minimal volume and subject exclusion	Improves fingerprinting without negatively impacting validity/reliability [22]
Projection Scrubbing	ICA-based outlier detection in strategic dimensions	Dramatically increases sample retention	Produces more valid, reliable, and identifiable FC on average [22]

Table 3: Bias in Median Survival Estimates Under Different Censoring Schemes with Incomplete Mortality Data

Censoring Scheme	Direction of Bias	Magnitude of Bias with Increasing Missing Data	Recommended Application Context
Last Activity Date Censoring	Underestimation	Bias decreases as missing data increases [40]	When linked external mortality information is substantially incomplete
Data Cutoff Censoring	Overestimation	Bias increases as missing data increases [40]	When mortality information is nearly complete and validated

Experimental Protocols and Methodologies

The SHAMAN Protocol for Quantifying Motion Impact

The Split Half Analysis of Motion Associated Networks (SHAMAN) methodology was developed to assign a motion impact score to specific trait-FC relationships, distinguishing between motion causing overestimation or underestimation of trait-FC effects [1]. The experimental protocol involves:

Data Acquisition and Preprocessing: Using resting-state fMRI data from large-scale studies like the Adolescent Brain Cognitive Development (ABCD) Study, which collected up to 20 minutes of rs-fMRI data from 11,874 children aged 9-10 years [1]. Default denoising algorithms (e.g., ABCD-BIDS) are applied, including global signal regression, respiratory filtering, spectral filtering, despiking, and motion parameter timeseries regression.
Split-Half Analysis: Capitalizing on the observation that traits (e.g., weight, intelligence) are stable over the timescale of an MRI scan while motion varies second-to-second, researchers measure differences in correlation structure between split high- and low-motion halves of each participant's fMRI timeseries [1].
Motion Impact Scoring: A significant difference between halves indicates motion impact on trait connectivity. The score direction indicates the bias type: motion impact score aligned with the trait-FC effect direction indicates overestimation, while a score opposite the trait-FC effect direction indicates underestimation [1].
Statistical Validation: Permutation of the timeseries and non-parametric combining across pairwise connections yields a motion impact score with associated p-value distinguishing significant from non-significant motion impacts [1].
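The split-half step can be illustrated for a single participant. The sketch below is a simplified toy under stated assumptions, not the published SHAMAN implementation: frames are split at the median FD, functional connectivity is taken as the Pearson correlation between regions within each half, and the difference between halves is returned. The simulated artifact (a shared signal whose amplitude scales with FD) and all names are hypothetical.

```python
import numpy as np

def split_half_fc_difference(timeseries, fd):
    """FC(high-motion half) minus FC(low-motion half).

    timeseries: (n_frames, n_regions); fd: (n_frames,).
    """
    order = np.argsort(fd)
    half = len(fd) // 2
    low_idx = np.sort(order[:half])    # calmer half, original time order
    high_idx = np.sort(order[half:])   # higher-motion half
    fc_low = np.corrcoef(timeseries[low_idx].T)
    fc_high = np.corrcoef(timeseries[high_idx].T)
    return fc_high - fc_low

rng = np.random.default_rng(0)
n_frames, n_regions = 400, 4
fd = rng.gamma(shape=2.0, scale=0.1, size=n_frames)       # toy FD trace
neural = rng.standard_normal((n_frames, n_regions))       # independent regions
# Hypothetical artifact: one shared signal whose size grows with FD.
artifact = (fd * rng.standard_normal(n_frames))[:, None]
timeseries = neural + 5.0 * artifact

diff = split_half_fc_difference(timeseries, fd)
mean_offdiag = diff[np.triu_indices(n_regions, k=1)].mean()
print(round(float(mean_offdiag), 3))
```

Because the toy artifact is shared across regions, connectivity is inflated in the high-motion half and the mean difference is positive. In the approach described above, such per-participant differences are then related to the trait across participants, which this single-participant sketch leaves out.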

Monte Carlo Simulation for Censoring Bias in Survival Analysis

To understand how different censoring methods affect estimates of median survival and log hazard ratios when external mortality information is only partially captured, researchers have used Monte Carlo simulation [40]:

Data Generation Mechanism: Simulations emulate non-randomized comparative effectiveness studies using real-world data from electronic health records and linked external mortality data. Time to death, time to last database activity, and time to data cutoff are generated using exponential and uniform distributions informed by past real-world evidence studies [40].
Missing Data Incorporation: Death events after the last database activity are attributed to linked external mortality data and randomly set to missing to reflect the sensitivity of contemporary real-world data sources, which ranges from 83.9% to 91.5% across cancer types when benchmarked against the National Death Index [40].
Censoring Scheme Evaluation: Two censoring schemes are evaluated: (1) censoring at the last activity date and (2) administrative censoring at the end date of database collection (censoring at data cutoff) without an observed death [40].
Performance Assessment: Researchers assess the performance of each method in estimating median survival and log hazard ratios using bias, coverage, variance, and rejection rate under varying amounts of incomplete mortality information and varying treatment effects, length of follow-up, and sample size [40].
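A toy version of this simulation design can make the two censoring schemes concrete. The sketch below is only an illustration under assumed parameters: exponential death times with a 12-month mean, data cutoff uniform between 6 and 24 months, exponential time to last activity with an 8-month mean, and 85% capture of deaths that occur after last activity. None of these values or names come from the cited study, and the Kaplan-Meier routine is a minimal hand-written one that assumes no tied times.

```python
import numpy as np

def km_median(time, event):
    """Kaplan-Meier median survival (assumes continuous, untied times)."""
    order = np.argsort(time)
    ev = event[order].astype(float)
    at_risk = len(time) - np.arange(len(time))
    surv = np.cumprod(1.0 - ev / at_risk)
    crossed = np.nonzero(surv <= 0.5)[0]
    return float(time[order][crossed[0]]) if crossed.size else float("nan")

def simulate(n=50_000, mean_survival=12.0, sensitivity=0.85, seed=0):
    rng = np.random.default_rng(seed)
    death = rng.exponential(mean_survival, n)
    cutoff = rng.uniform(6.0, 24.0, n)
    last_activity = np.minimum(rng.exponential(8.0, n), cutoff)
    # Deaths up to last activity are seen in the database itself.
    in_database = death <= last_activity
    # Later deaths (before cutoff) depend on imperfect external linkage.
    external = (death > last_activity) & (death <= cutoff)
    captured = external & (rng.random(n) < sensitivity)
    observed_death = in_database | captured
    # Scheme 1: patients without an observed death end at last activity.
    t_last = np.where(observed_death, death, last_activity)
    # Scheme 2: patients without an observed death end at data cutoff.
    t_cut = np.where(observed_death, death, cutoff)
    return {
        "true_median": float(mean_survival * np.log(2)),
        "last_activity": km_median(t_last, observed_death),
        "data_cutoff": km_median(t_cut, observed_death),
    }

result = simulate()
print(result)
```

Under these assumptions the last-activity estimate falls below the true median and the data-cutoff estimate falls above it, matching the directions listed in Table 3. The size of each bias depends heavily on the assumed distributions and capture rate, so the numbers themselves should not be read as a reproduction of the published results.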

Visualizing the Censoring Decision Pathway

The following diagram illustrates the analytical decision pathway for censoring and its consequences for false positives and sample bias:

Censoring Decision Pathway: This workflow illustrates how censoring decisions lead to different bias profiles in research outcomes, highlighting the critical trade-offs at each decision point.

Table 4: Essential Research Reagent Solutions for Censoring Methodology Research

Tool/Method	Primary Function	Application Context	Key Considerations
Framewise Displacement (FD)	Quantifies head motion between consecutive fMRI volumes	Motion censoring threshold determination	Requires arbitrary threshold selection; lacks generalizability to multiband acquisitions [22]
SHAMAN	Computes trait-specific motion impact scores	Quantifying overestimation vs. underestimation	Distinguishes direction of motion bias; requires no repeated scans [1]
Projection Scrubbing	Data-driven outlier detection using ICA or other projections	Noise reduction with maximal data retention	Statistically principled approach; avoids unnecessary censoring [22]
Monte Carlo Simulation	Evaluates censoring scheme performance under known conditions	Survival analysis with incomplete mortality data	Allows systematic assessment of bias under varying missing data conditions [40]
ABCD-BIDS Pipeline	Standardized denoising for resting-state fMRI data	Motion artifact reduction in large datasets	Includes global signal regression, respiratory filtering, spectral filtering, despiking [1]

Discussion and Research Implications

The empirical evidence demonstrates that censoring strategies inevitably involve trade-offs between statistical accuracy and sample representativeness. In neuroimaging, stringent motion censoring (FD < 0.2 mm) effectively addresses false positives due to motion overestimation but fails to resolve underestimation biases and systematically excludes high-motion participants who may represent clinically important populations [1]. Similarly, in survival analysis, censoring scheme selection dramatically influences bias direction, with last activity date censoring producing underestimation and data cutoff censoring producing overestimation of median survival [40].

The emerging consensus favors data-driven, context-sensitive approaches over one-size-fits-all censoring thresholds. Methods like projection scrubbing in fMRI [22] and carefully validated censoring schemes in survival analysis [40] offer promising directions. Researchers must carefully consider their specific research questions, the nature of their data, and the potential consequences of both false positives and sample bias when establishing censoring protocols. Transparent reporting of censoring methods and their potential impacts on results is essential for interpreting and replicating findings across scientific studies.

Addressing High-Frequency Contamination from Physiological Noise

In the pursuit of mapping brain-behavior relationships, researchers face a formidable obstacle: the contamination of neuroimaging signals by physiological noise. This noise, originating from cardiac pulsation, respiration, and other systemic physiological processes, introduces high-frequency components that can systematically bias functional connectivity estimates. The challenge is particularly acute when studying traits inherently correlated with motion, such as psychiatric disorders, where failure to adequately address noise contamination can lead to both false positive and false negative findings [1].

The broader context of motion-related artifacts encompasses two distinct but related phenomena: motion overestimation and motion underestimation of trait-FC (trait-functional connectivity) effects. Motion overestimation occurs when residual motion artifact inflates apparent brain-behavior relationships, creating spurious associations. Conversely, motion underestimation arises when motion artifact obscures genuine trait-FC relationships, rendering true effects non-significant [1] [3]. This duality presents a complex challenge for researchers, as mitigation strategies that address one type of bias may inadvertently exacerbate the other.

Advances in noise correction methodologies span multiple neuroimaging modalities, including functional magnetic resonance imaging (fMRI) and functional near-infrared spectroscopy (fNIRS), each with distinct noise profiles and correction approaches. This guide provides a comprehensive comparison of contemporary physiological noise filtering techniques, their experimental validation, and practical implementation for researchers and drug development professionals working to establish robust brain-behavior associations.

Fundamental Noise Components Across Imaging Modalities

Physiological noise in neuroimaging signals arises from multiple sources, each with characteristic frequency signatures and mechanistic origins. In fNIRS, physiological noise typically exceeds the magnitude of neural activation signals and originates from both global systemic and local regulatory processes [41]. These components operate at different temporal scales, requiring specialized filtering approaches for effective mitigation.

Table 1: Physiological Noise Components in fNIRS Signals

Component	Frequency Range	Primary Origin	Impact on Signal
Cardiac Pulsation	~1 Hz (0.8-1.5 Hz)	Heartbeat	High-frequency oscillations well-separated from hemodynamic response
Respiration (R-band)	0.2-0.5 Hz	Breathing cycles	Medium-frequency oscillations requiring specialized filtering
Mayer Waves (M-band)	0.08-0.15 Hz	Autonomic blood pressure regulation	Low-frequency oscillations often overlapping with task responses
Very Low Frequency Oscillations (VLFOs/A-band)	0.02-0.08 Hz	Sympathetic nervous system activity	Slow drifts that can be synchronized with tasks
Motion Artifacts	Variable (impulsive or slow shifts)	Head movement, optode displacement	Large-amplitude spikes or baseline shifts [42]

In fMRI, particularly resting-state functional connectivity (FC) studies, head motion introduces systematic bias that is not completely removed by standard denoising algorithms. The effect of motion on FC has been shown to be spatially systematic, causing decreased long-distance connectivity and increased short-range connectivity, most notably in the default mode network [1]. This spatial specificity means that motion artifacts do not represent random noise but rather introduce structured bias that can mimic or obscure genuine neurobiological phenomena.

Signaling Pathways of Noise Contamination

The diagram below illustrates the pathways through which physiological processes contaminate neuroimaging signals and the corresponding filtering approaches.

Figure 1: Physiological noise pathways and filtering approaches in neuroimaging

The contamination pathway begins with physiological processes (cardiac, respiratory, Mayer waves, VLFOs, and motion) which introduce structured noise into both fNIRS and fMRI signals. This noise manifests as two distinct problems in trait-FC research: motion overestimation (false positives) and motion underestimation (false negatives). Contemporary solutions address these challenges through hardware-based methods, algorithmic approaches, and advanced statistical modeling [1] [41] [42].

Comparative Analysis of Noise Filtering Methodologies

Hardware-Based Noise Reduction Techniques

Hardware solutions focus on capturing physiological processes directly through auxiliary measurements, enabling more targeted noise removal.

Table 2: Hardware-Based Physiological Noise Correction Methods

Method	Auxiliary Hardware	Mechanism	Compatible Modalities	Limitations
Short-Separation Channels	Additional fNIRS detectors at 8-15mm distance	Measure superficial signals for regression	fNIRS	Increased setup complexity; limited depth resolution [43]
Accelerometer-Based Motion Tracking	Inertial measurement units (IMUs)	Direct motion capture for regression	fNIRS, fMRI	Does not capture physiological noise; additional hardware required [42]
Physiological Monitoring	ECG, respiration belt, blood pressure	Capture systemic physiological fluctuations	fNIRS, fMRI	Multiple attachment points; subject discomfort during long sessions [41]
Computer Vision Tracking	Camera systems with deep neural networks	Markerless motion tracking	fNIRS	Processing complexity; privacy concerns [6]

Accelerometer-based approaches include adaptive filtering, active noise cancellation (ANC), accelerometer-based motion artifact removal (ABAMAR), and blind source separation with accelerometer-based artifact rejection and detection (BLISSA2RD) [42]. The introduction of accelerometers improves the feasibility of real-time rejection of motion artifacts, which is particularly valuable in studies with populations prone to movement.

Algorithmic and Statistical Noise Correction Methods

Algorithmic approaches leverage signal processing and statistical techniques to separate physiological noise from neural signals without requiring additional hardware.

Table 3: Algorithmic Physiological Noise Filtering Methods

Method	Mechanism	Noise Targets	Performance Metrics	Implementation Complexity
Wavelet Transform	Multi-resolution time-frequency decomposition	Cardiac, respiration, Mayer waves, VLFOs	Reconstruction error, SNR improvement	Moderate [44]
Moving Average (MA)	Local smoothing to reduce high-frequency noise	Motion artifacts, high-frequency physiological noise	Artifact reduction, signal distortion	Low [45]
Spline Interpolation	Piecewise polynomial fitting to corrupted segments	Motion artifact spikes	Interpolation error, shape preservation	Low to Moderate [45]
GLM with tCCA	Temporally embedded Canonical Correlation Analysis	Systemic physiological fluctuations	Correlation: +45% (HbO), RMSE: -55% (HbO) [43]	High
MODWT-LSTM	Wavelet decomposition with deep learning prediction	Very low-frequency oscillations (0.01 Hz)	MAE, RMSE on synthetic data [44]	High
SHAMAN	Split-half analysis of motion-associated networks	Residual motion artifacts in trait-FC relationships	Motion impact scores for over/under-estimation [1]	High

The General Linear Model with temporally embedded Canonical Correlation Analysis (GLM with tCCA) represents a significant advancement over standard GLM with short-separation regression. This approach allows flexible integration of any number of auxiliary modalities and signals while ensuring that the resulting regressors are orthogonal [43]. Performance evaluations show that GLM with tCCA improves significantly on the current best practice across multiple metrics: correlation (HbO max. +45%), root mean squared error (HbO max. -55%), F-score (HbO up to 3.25-fold), and p-value, as well as the power spectral density of the noise floor [43].

The emerging MODWT-LSTM (Maximal Overlap Discrete Wavelet Transform - Long Short-Term Memory) framework combines wavelet decomposition with deep learning prediction. This method extracts fluctuating signals during the resting state using MODWT, identifies low-frequency wavelets corresponding to physiological noise, trains them using LSTM networks, and predicts/subtracts them during the task session [44]. This approach is particularly valuable when the activation period overlaps with physiological noise frequencies, where conventional GLM methods may fail.

Experimental Protocols and Validation Frameworks

SHAMAN Protocol for Motion Impact Assessment

The Split Half Analysis of Motion Associated Networks (SHAMAN) framework was developed specifically to assign a motion impact score to specific trait-FC relationships, distinguishing between motion causing overestimation or underestimation of trait-FC effects [1]. The methodology capitalizes on the observation that traits (e.g., weight, intelligence) are stable over the timescale of an MRI scan whereas motion is a state that varies from second to second.

Experimental Workflow:

Data Acquisition: Resting-state fMRI data from large cohorts (e.g., n=7,270 in ABCD Study)
Denoising Application: Standard pipeline processing (e.g., ABCD-BIDS with global signal regression, respiratory filtering, spectral filtering, despiking, motion parameter regression)
Motion Censoring: Optional exclusion of high-motion frames based on framewise displacement (FD) thresholds
Split-Half Analysis: Division of each participant's timeseries into high-motion and low-motion halves
Trait-FC Correlation: Separate correlation of traits with FC in high-motion and low-motion halves
Impact Score Calculation: Quantification of differences between correlation structures
Statistical Testing: Permutation testing to establish significance of motion impact scores

In validation studies using the ABCD dataset, after standard denoising without motion censoring, 42% (19/45) of traits had significant (p < 0.05) motion overestimation scores and 38% (17/45) had significant underestimation scores. Censoring at framewise displacement (FD) < 0.2 mm reduced significant overestimation to 2% (1/45) of traits but did not decrease the number of traits with significant motion underestimation scores [1] [3]. This demonstrates the complex balance between mitigating different types of motion-related bias.
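The permutation step in this workflow can be sketched in a reduced form. The example below is a single-participant simplification with hypothetical names, not the published procedure: the statistic is the mean difference in connectivity between high- and low-motion frames, and the null distribution comes from randomly reassigning frames to the two halves. The full method works across participants and connections and combines them non-parametrically, which is not shown here.

```python
import numpy as np

def mean_fc_difference(ts, high_mask):
    """Mean over region pairs of FC(high frames) minus FC(low frames)."""
    iu = np.triu_indices(ts.shape[1], k=1)
    fc_high = np.corrcoef(ts[high_mask].T)[iu]
    fc_low = np.corrcoef(ts[~high_mask].T)[iu]
    return float(np.mean(fc_high - fc_low))

def permutation_p_value(ts, fd, n_perm=500, seed=0):
    """Two-sided p-value for the motion split against random splits."""
    rng = np.random.default_rng(seed)
    high_mask = fd > np.median(fd)
    observed = mean_fc_difference(ts, high_mask)
    null = np.array([
        mean_fc_difference(ts, rng.permutation(high_mask))
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the p-value strictly above zero.
    p = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + n_perm)
    return observed, float(p)

rng = np.random.default_rng(1)
fd = rng.gamma(shape=2.0, scale=0.1, size=400)            # toy FD trace
clean = rng.standard_normal((400, 4))
# Hypothetical motion artifact shared by all regions, scaled by FD.
contaminated = clean + 5.0 * (fd * rng.standard_normal(400))[:, None]

score, p_value = permutation_p_value(contaminated, fd)
print(round(score, 3), round(p_value, 4))
```

With a strong simulated artifact the motion-based split produces a much larger connectivity difference than random splits do, so the p-value is small. Real data will rarely separate this cleanly.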

Synthetic Data Validation for fNIRS Filtering Methods

For fNIRS noise correction methods, validation often employs synthetic data with known ground truth to quantify performance metrics.

Figure 2: Synthetic data validation workflow for fNIRS filtering methods

The synthetic data approach typically involves:

Signal Generation: Creating synthetic fNIRS signals with known hemodynamic responses and added physiological noise with specific frequency characteristics (cardiac: 1±0.1 Hz, respiration: 0.25±0.01 Hz, Mayer waves: 0.1±0.01 Hz, VLFO: 0.01±0.001 Hz) [44]
Algorithm Application: Processing synthetic data through the proposed filtering pipeline
Performance Quantification: Calculating error metrics between reconstructed signals and known ground truth
Comparative Analysis: Benchmarking against established methods
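The validation loop above can be sketched with a deliberately simple filter. The example uses the central noise frequencies listed in the first step, while the sampling rate, component amplitudes, Gaussian-shaped response (a stand-in for a hemodynamic response), and the choice of a moving-average filter are all assumptions made for illustration. It is not the MODWT-LSTM pipeline.

```python
import numpy as np

fs = 10.0                                   # assumed sampling rate (Hz)
t = np.arange(0, 60, 1 / fs)                # 60 s of data
# Known ground truth: a smooth bump standing in for a hemodynamic response.
truth = np.exp(-0.5 * ((t - 20.0) / 4.0) ** 2)

# (frequency in Hz, amplitude); amplitudes are arbitrary assumptions.
components = {
    "cardiac": (1.0, 0.5),
    "respiration": (0.25, 0.2),
    "mayer": (0.1, 0.2),
    "vlfo": (0.01, 0.2),
}
noise = sum(amp * np.sin(2 * np.pi * f * t) for f, amp in components.values())
measured = truth + noise

def moving_average(signal, window):
    """Boxcar smoothing; edges are zero-padded by np.convolve."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# A 1 s window spans one full cycle of the 1 Hz cardiac component.
filtered = moving_average(measured, window=int(fs))
rmse_raw = rmse(measured, truth)
rmse_filtered = rmse(filtered, truth)
print(rmse_raw, rmse_filtered)
```

The moving average suppresses the cardiac component but leaves the slower Mayer-wave and VLFO components largely intact, which is the gap that the more elaborate methods in Table 3 aim to close.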

For MODWT-LSTM approaches, validation typically uses 600 seconds of resting-state data and 40 seconds of task data. Signals are decomposed into nine wavelet levels, and the fifth to ninth wavelets (which contain the low-frequency components) are used for LSTM training and prediction [44]. This approach addresses the critical challenge of preserving the phase of the physiological noise at task onset.

Performance Benchmarks Across Methodologies

Table 4: Quantitative Performance Comparison of Noise Filtering Methods

Method	Signal Quality Improvement	Computational Demand	Optimal Use Case	Validation Evidence
Moving Average + Wavelet	Good for spike removal	Low	Pediatric populations with high motion [45]	Real fNIRS data from children in language task
GLM with tCCA	Correlation: +45% (HbO), RMSE: -55% (HbO)	High	Studies with multiple auxiliary signals available [43]	Simulated ground truth and visual stimulation data
SHAMAN	Flags significant motion overestimation in 42% of traits (no censoring)	High	Large-scale trait-FC studies with motion-correlated traits [1]	ABCD Study (n=7,270) with 45 behavioral traits
MODWT-LSTM	Low prediction error in wavelet 8	Very High	Resting-state studies with unknown trial periods [44]	1,000 synthetic data samples with known ground truth
Computer Vision + fNIRS	Precise movement-artifact correlation	Moderate	Studies requiring movement characterization [6]	Controlled head movements with video validation

Table 5: Research Reagent Solutions for Physiological Noise Management

Resource Category	Specific Tools	Function	Implementation Considerations
Software Packages	Homer2, fNIRSDAT (MATLAB)	fNIRS signal processing and GLM analysis	Requires MATLAB license; learning curve for advanced features [45]
Motion Tracking	SynergyNet Deep Neural Network	Computer vision-based head movement tracking	Video recording setup; processing demands [6]
Data Quality Assessment	SHAMAN Motion Impact Scores	Quantifying motion bias in trait-FC relationships	Requires large samples; implementation complexity [1]
Hybrid Methods	MODWT-LSTM Framework	Wavelet decomposition with deep learning prediction	Python implementation; substantial training data needed [44]
Auxiliary Hardware	Short-separation detectors, IMU sensors	Direct measurement of superficial signals and motion	Increased setup time; subject comfort considerations [43] [42]

Addressing high-frequency contamination from physiological noise requires careful consideration of the specific research context, participant population, and imaging modality. The comparative analysis presented here shows that no single solution optimally addresses all noise challenges; rather, researchers must select methods strategically based on their specific experimental needs.

For trait-FC studies where motion may correlate with traits of interest, the SHAMAN framework provides critical quantification of both overestimation and underestimation biases, revealing that conventional censoring approaches may reduce false positives while having minimal effect on false negatives [1] [3]. For fNIRS applications where unknown trial periods or frequency overlap complicate standard GLM approaches, MODWT-LSTM methods offer a promising alternative by leveraging resting-state data to predict task-period noise [44].

Physiological noise filtering increasingly combines multiple modalities, integrating hardware-based auxiliary measurements with advanced algorithmic approaches to improve signal fidelity. As neuroimaging moves toward more naturalistic paradigms and clinically vulnerable populations, these noise handling methodologies will become increasingly important for generating valid, reproducible brain-behavior associations in basic neuroscience and drug development research.

When Global Signal Regression Helps and Hurts

Global signal regression (GSR) remains one of the most contentious preprocessing techniques in resting-state functional magnetic resonance imaging (rs-fMRI) analysis. This guide objectively compares the performance of GSR against alternative preprocessing pipelines by synthesizing current experimental data, with particular emphasis on its dual role in both introducing and mitigating bias within trait-functional connectivity (trait-FC) effect research. Evidence indicates that GSR strengthens brain-behavior associations in healthy populations and effectively removes global motion and respiratory artifacts. However, it also introduces spurious negative correlations, distorts group differences in clinical populations, and can attenuate true trait-FC effects. The decision to apply GSR must therefore be context-dependent, weighing its benefits in artifact removal against its perils for specific research questions and populations.

Global signal regression (GSR) is a mathematical preprocessing step for fMRI data that involves calculating the global signal (GS)—the average time course of BOLD signals across all voxels in the brain—and removing it from each voxel's time series via linear regression [46] [47]. The GS itself is a composite signal, reflecting a mixture of neuronal information related to vigilance and arousal, and non-neuronal noise originating from head motion, respiration, and cardiac rhythms [46] [48] [49].

The technique is predicated on the hypothesis that global noise is additive and dominates the global-averaged signal, while neural-related signals cancel each other out in the global average [47]. By removing this common variance, GSR aims to enhance the spatial specificity of functional connectivity analyses. However, this operation mathematically forces the distribution of correlation values to center around zero, inevitably introducing negative correlations and sparking ongoing debate about the neurobiological validity of anticorrelations observed between networks like the default mode network and task-positive networks [47] [48].
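The regression itself is short enough to show directly. The sketch below is a minimal NumPy illustration on simulated data, with hypothetical function names and toy dimensions. It regresses an intercept and the global signal out of every voxel and compares the mean pairwise correlation before and after.

```python
import numpy as np

def global_signal_regression(data):
    """Remove intercept + global signal from each column of (time, voxels)."""
    gs = data.mean(axis=1)                            # global signal
    design = np.column_stack([np.ones_like(gs), gs])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ beta

def mean_offdiag_corr(x):
    c = np.corrcoef(x.T)
    return float(c[np.triu_indices_from(c, k=1)].mean())

rng = np.random.default_rng(0)
shared = rng.standard_normal((300, 1))                # toy global fluctuation
data = shared + rng.standard_normal((300, 20))        # 20 toy voxels

cleaned = global_signal_regression(data)
before = mean_offdiag_corr(data)
after = mean_offdiag_corr(cleaned)
print(round(before, 3), round(after, 3))
```

Because the residuals sum to zero across voxels at every time point, the mean pairwise correlation is pushed slightly below zero after regression. This is the mathematical source of the negative correlations discussed above, and it appears even in this toy data, which contains no true anticorrelation.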

How GSR Helps: Evidence of Benefits

Strengthening Brain-Behavior Associations

Substantial evidence demonstrates that GSR can enhance the association between resting-state functional connectivity (RSFC) and behavioral measures.

Table 1: GSR-Enhanced Behavioral Variance Explained in Major Studies

Dataset	Behavioral Measures	Variance Explained Increase with GSR	Prediction Accuracy Improvement	Citation
Brain Genomics Superstruct Project (GSP)	23 measures across cognition, personality, emotion	Average 47% increase	Average 64% improvement	[46]
Human Connectome Project (HCP)	58 behavioral measures	Average 40% increase	Average 12% improvement	[46]

This strengthening effect is not uniform across all behavioral measures. GSR appears to benefit task performance measures more than self-reported measures [46]. The underlying mechanism may be that GSR improves the neuronal-hemodynamic correspondence by removing non-neural global variance, thereby allowing cleaner neural signals to correlate with behavior [46] [50].

Effective Removal of Motion and Physiological Artifacts

GSR is highly effective at reducing global artifacts driven by motion and respiration, which are major confounds in fMRI studies.

Motion Reduction: In the Adolescent Brain Cognitive Development (ABCD) Study, the standard denoising pipeline (ABCD-BIDS) that includes GSR achieved a 69% relative reduction in the proportion of signal variance related to head motion compared to minimal processing alone [1].
Respiratory Artifact Removal: The global signal is strongly correlated with respiratory patterns [49]. GSR helps mitigate the inflating effect of respiratory volume per time (RVT) on functional connectivity measures, thus reducing spurious positive correlations [48] [12].

Improved Specificity in Clinical Research

One caveat to these benefits is that, in certain contexts, the global signal itself (and by extension its connectivity) holds clinical relevance, so removing it may discard valuable information.

Schizophrenia Differentiation: The functional connectivity of the global signal (GSFC) differentiates between schizophrenia patients and healthy controls during rest, indicating its potential as a clinical research tool [51].
Global Signal Topography: The spatial distribution of global signal correlation (GSCORR) is altered in various psychiatric disorders and correlates with behavioral variables, linking higher GSCORR in sensory regions and lower GSCORR in higher-order regions to psychiatric problems and cognitive performance [49] [52].

How GSR Hurts: Evidence of Drawbacks and Biases

The most widely documented drawback of GSR is its mathematical imposition of negative correlations.

Artificial Anticorrelations: GSR forces the mean of voxel-wise correlation distributions to be less than or equal to zero, generating negative correlations that may not have a neurobiological basis [47] [48]. This fundamentally alters the interpretation of network anticorrelations, such as those between the default mode network and task-positive networks.
Altered Network Topology: In studies of anesthesia-induced unconsciousness, GSR was found to differentially affect functional connectivity and graph theory measures, with sevoflurane-induced changes being particularly sensitive to global signal removal [53].

Distortion of Group Differences in Clinical Populations

GSR can artificially create or obscure true group differences, posing a significant threat to the validity of clinical neuroimaging research.

Simulation Evidence: When the global signal is removed, artificial FC group differences emerge in regions designed to have the same connectivity between groups, while true connectivity differences are attenuated [51].
Autism Research: Studies have suggested that some reported reductions in long-distance FC in autism may be attributable to increased head motion in autistic participants rather than neural pathology, a confound that GSR may not fully resolve [1].

Dynamic Functional Connectivity (dFC) Biases

The impact of GSR extends beyond static FC to time-varying connectivity dynamics, potentially misrepresenting brain states.

Temporal Modulation: The effect of GSR on sliding-window correlations is temporally modulated by the mean global signal magnitude across windows. GSR produces a greater impact on correlation maps at windows with high GS magnitude [50].
State Alteration: GSR can substantially change the connectivity structures of FC states associated with high GS magnitude and even lead to the emergence of new FC states not present in the original data [50]. This is critical because fluctuations in GS magnitude have been linked to time-varying EEG power, suggesting a neurophysiological basis related to changes in mental states [50].
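The two per-window quantities involved here, the sliding-window correlation and the global signal magnitude, can be computed as follows. This is a toy sketch: the window length, step size, and the use of mean absolute global signal as the "magnitude" are assumptions made for illustration, and the simulated data simply switch from a weak to a strong shared fluctuation halfway through.

```python
import numpy as np

def sliding_windows(n_frames, window, step):
    """(start, stop) index pairs for overlapping windows."""
    return [(s, s + window) for s in range(0, n_frames - window + 1, step)]

def windowed_fc_and_gs(x, y, gs, window=30, step=10):
    """Per-window correlation of x and y, and mean |global signal|."""
    corr, gs_mag = [], []
    for start, stop in sliding_windows(len(x), window, step):
        corr.append(np.corrcoef(x[start:stop], y[start:stop])[0, 1])
        gs_mag.append(np.mean(np.abs(gs[start:stop])))
    return np.array(corr), np.array(gs_mag)

rng = np.random.default_rng(0)
n_frames, n_voxels = 200, 10
# Toy shared fluctuation: weak in the first half, strong in the second.
amplitude = np.where(np.arange(n_frames) < n_frames // 2, 0.2, 2.0)
shared = amplitude * rng.standard_normal(n_frames)
data = shared[:, None] + rng.standard_normal((n_frames, n_voxels))
gs = data.mean(axis=1)

corr, gs_mag = windowed_fc_and_gs(data[:, 0], data[:, 1], gs)
print(len(corr), round(float(np.corrcoef(corr, gs_mag)[0, 1]), 3))
```

In this toy case, windows with large global signal magnitude are also the windows with high raw correlation, so removing the global signal would change those windows the most.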

Motion Overestimation vs. Underestimation in Trait-FC Effects

The relationship between in-scanner head motion and trait measurements presents a critical challenge. Motion is a state that varies from second to second, whereas most traits of interest are stable over the course of a scan. When the amount of motion differs systematically with the trait, residual motion artifact becomes a confound that can bias trait-FC associations in either direction.

Table 2: Motion Impact on Trait-FC Effects (ABCD Study, n=7,270)

Motion Censoring Level	Traits with Significant Motion Overestimation	Traits with Significant Motion Underestimation	Citation
No censoring (after standard denoising)	42% (19/45 traits)	38% (17/45 traits)	[1]
Censoring at FD < 0.2 mm	2% (1/45 traits)	38% (17/45 traits)	[1]

Motion Overestimation: Occurs when the motion impact score aligns with the direction of the trait-FC effect, causing inflation of the observed effect size. This is more effectively mitigated by stringent motion censoring [1].
Motion Underestimation: Occurs when the motion impact score opposes the trait-FC effect, leading to attenuation of the true effect. This form of bias appears more resistant to standard motion censoring techniques [1].
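The direction rule in these two definitions reduces to a sign comparison. The helper below is a hypothetical illustration of that rule only. It assumes the trait-FC effect, the motion impact score, and a significance decision have already been computed elsewhere.

```python
def classify_motion_bias(trait_fc_effect, motion_impact_score, significant):
    """Label the direction of motion bias from two signed quantities.

    Same sign -> overestimation; opposite sign -> underestimation.
    """
    if not significant:
        return "no significant motion impact"
    if trait_fc_effect == 0 or motion_impact_score == 0:
        return "undetermined"
    same_sign = (trait_fc_effect > 0) == (motion_impact_score > 0)
    return "overestimation" if same_sign else "underestimation"

# Hypothetical values purely for illustration.
print(classify_motion_bias(0.12, 0.30, significant=True))
print(classify_motion_bias(-0.08, 0.25, significant=True))
print(classify_motion_bias(0.12, 0.30, significant=False))
```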

GSR's role in this context is complex. While it generally reduces motion-related artifacts, its effectiveness is distance-dependent and may not uniformly address both overestimation and underestimation biases [1]. Furthermore, the global signal itself contains motion-related variance, and its removal may inadvertently remove trait-relevant neural information in populations where motion is correlated with the trait of interest (e.g., attention-deficit disorders).

Experimental Protocols and Methodologies

Core Protocol: Assessing GSR Impact on Trait-FC Effects

The following methodology, derived from recent large-scale studies, provides a framework for evaluating when GSR helps or hurts specific research questions.

1. Data Acquisition and Preprocessing:

Acquire rs-fMRI data with associated behavioral/clinical measures.
Implement standard preprocessing: motion correction, slice timing, band-pass filtering, and regression of non-global nuisances (white matter, CSF, motion parameters).
Create two preprocessing streams: with GSR and without GSR.

2. Functional Connectivity Calculation:

For each stream, compute whole-brain functional connectivity matrices (e.g., using Pearson correlation between regional time series).

3. Trait-FC Association Analysis:

For each trait, model the relationship between FC and the trait using appropriate statistical models (e.g., variance component models, kernel ridge regression).
Quantify the strength of association for both preprocessing streams.

4. Motion Impact Assessment (SHAMAN Method):

Apply the Split Half Analysis of Motion Associated Networks (SHAMAN) to compute trait-specific motion impact scores [1].
Split each participant's fMRI timeseries into high-motion and low-motion halves.
Compare trait-FC effect sizes between halves; a significant difference indicates motion impact.
Determine direction: alignment with trait-FC effect indicates overestimation; opposition indicates underestimation.

5. Comparative Evaluation:

Compare variance explained and prediction accuracy between GSR and non-GSR pipelines.
Assess whether GSR increases or decreases motion-related biases for specific traits.
Evaluate the neurobiological plausibility of resulting connectivity patterns, particularly negative correlations.
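Steps 2 and 3 of this protocol can be sketched for one preprocessing stream. The example below uses a plain edge-wise Pearson correlation with the trait as a simple stand-in for the statistical models named above (variance component models, kernel ridge regression). The function names and simulated cohort are hypothetical. The same functions would be run once per stream (with and without GSR) and the results compared.

```python
import numpy as np

def vectorized_fc(ts):
    """Upper-triangle Pearson FC from a (n_frames, n_regions) array."""
    c = np.corrcoef(ts.T)
    return c[np.triu_indices_from(c, k=1)]

def edgewise_trait_correlation(fc_edges, trait):
    """Correlation of each edge with the trait across subjects.

    fc_edges: (n_subjects, n_edges); trait: (n_subjects,).
    """
    fz = (fc_edges - fc_edges.mean(axis=0)) / fc_edges.std(axis=0)
    tz = (trait - trait.mean()) / trait.std()
    return fz.T @ tz / len(trait)

rng = np.random.default_rng(0)
n_subjects, n_frames = 60, 200
trait = rng.uniform(0.0, 1.0, n_subjects)

edges = []
for w in trait:
    ts = rng.standard_normal((n_frames, 3))
    ts[:, 1] += w * ts[:, 0]      # toy rule: trait strengthens edge (0, 1)
    edges.append(vectorized_fc(ts))
edges = np.array(edges)

r = edgewise_trait_correlation(edges, trait)
print(np.round(r, 2))             # edge order: (0,1), (0,2), (1,2)
```

Only the first edge was built to depend on the trait, so it should show by far the largest association. Comparing such edge-wise maps between streams gives a first view of where GSR changes the apparent trait-FC effects.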

Advanced Protocol: Dynamic Global Signal Regression (dGSR)

A proposed modification to conventional GSR addresses the physiological reality that global systemic oscillations propagate through the cerebral vasculature with voxel-specific time delays [48].

Workflow:

Calculate the global signal as the average of all brain voxel time courses.
Estimate the blood arrival time for each voxel relative to the global signal.
Apply a voxel-specific optimal time delay to the global signal.
Perform regression of this time-shifted global signal from each voxel's time series.
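
The workflow above can be illustrated with a simplified sketch. It assumes that the "optimal" delay is the whole-sample lag that maximizes the correlation between the shifted global signal and the voxel; the method in [48] may estimate delays differently (for example, at finer temporal resolution). The names `shift`, `dynamic_gsr`, and `max_lag` are hypothetical.

```python
import numpy as np

def shift(signal, lag):
    """Delay a 1-D signal by `lag` samples (negative lag advances it).

    Samples shifted in from outside the series are padded with edge values.
    """
    if lag == 0:
        return signal.copy()
    out = np.empty_like(signal)
    if lag > 0:
        out[lag:] = signal[:-lag]
        out[:lag] = signal[0]
    else:
        out[:lag] = signal[-lag:]
        out[lag:] = signal[-1]
    return out

def dynamic_gsr(data, max_lag=5):
    """Regress a voxel-specific time-shifted global signal from each voxel.

    data: (n_timepoints, n_voxels). Returns (residuals, chosen lag per voxel).
    Setting max_lag=0 reduces this to static GSR.
    """
    global_signal = data.mean(axis=1)
    lags = np.arange(-max_lag, max_lag + 1)
    cleaned = np.empty(data.shape, dtype=float)
    best_lags = np.zeros(data.shape[1], dtype=int)
    for v in range(data.shape[1]):
        voxel = data[:, v]
        # Assumption: the best delay maximizes correlation with the voxel.
        corrs = [np.corrcoef(shift(global_signal, int(l)), voxel)[0, 1] for l in lags]
        best = int(lags[int(np.argmax(corrs))])
        regressor = shift(global_signal, best)
        X = np.column_stack([np.ones_like(regressor), regressor])
        beta, *_ = np.linalg.lstsq(X, voxel, rcond=None)
        cleaned[:, v] = voxel - X @ beta
        best_lags[v] = best
    return cleaned, best_lags
```

Calling `dynamic_gsr(data, max_lag=0)` gives a static-GSR baseline from the same code path, which makes it easy to compare how much variance each variant removes.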

Experimental Findings:

dGSR increases the amount of BOLD signal variance being modeled and removed compared to static GSR [48].
It reduces spurious negative correlations introduced in reference regions by static GSR while attenuating inflated positive connectivity measures [48].

Figure 1: Experimental workflow for comparing Dynamic (dGSR) and Static (sGSR) Global Signal Regression protocols.

Decision Framework and Best Practices

The evidence does not support a universal prescription for or against GSR. Instead, the decision should be guided by the specific research context.

When GSR is Likely to Help:

Studies of brain-behavior relationships in healthy young adults, particularly for task performance measures [46].
When motion artifact is a primary concern and the trait of interest is not strongly correlated with motion [1].
When seeking to enhance the specificity of functional connectivity measures within known networks [50].

When GSR is Likely to Hurt:

Clinical group comparisons where the global signal may differ systematically between groups [53] [51].
Studies focusing on network anticorrelations where biological interpretation of negative correlations is essential [47] [48].
Dynamic FC analyses where global signal fluctuations may carry state-dependent neurophysiological information [50].
Traits highly correlated with motion (e.g., ADHD, autism) where GSR may introduce or obscure biases [1].

Recommended Reporting Standards:

Always report analyses both with and without GSR to demonstrate robustness of findings.
Quantify and report motion impact scores for trait-FC effects using methods like SHAMAN [1].
Clearly justify the choice of preprocessing pipeline based on the research question and population.
Consider intermediate approaches like dynamic GSR that account for physiological timing [48].

The Scientist's Toolkit

Table 3: Essential Research Reagents and Resources

Resource	Function/Application	Example Implementation
Variance Component Model	Quantifies behavioral variance explained by whole-brain RSFC with/without GSR	Yang et al. (2011) method applied in [46]
Kernel Ridge Regression	Provides fast, effective behavioral prediction using rs-fMRI; validates variance component findings	He et al. (2018) approach used in [46]
SHAMAN (Split Half Analysis of Motion Associated Networks)	Computes trait-specific motion impact scores; distinguishes overestimation vs. underestimation	Kay et al. (2025) method described in [1]
Dynamic GSR (dGSR)	Advanced GSR accounting for voxel-specific blood arrival time of global systemic oscillations	Tong et al. method detailed in [48]
ICA-FIX Denoising	Independent component analysis-based denoising; tests GSR utility beyond alternative methods	Griffanti et al. (2014) approach used in HCP data [46]
Canonical Correlation Analysis (CCA)	Multivariate method identifying co-variation between brain topography and behavioral measures	Smith et al. method applied in [49]

GSR presents a classic trade-off in fMRI preprocessing: enhanced specificity at the risk of introduced bias. The experimental evidence indicates that GSR can strengthen brain-behavior associations and reduce global artifacts in healthy populations, supporting its utility for certain research questions. However, it can also induce spurious anticorrelations, distort group comparisons in clinical studies, and interact in complex ways with motion-related biases in trait-FC research. Because the global signal contains both noise and neurobiologically meaningful information, a nuanced approach is necessary. Rather than seeking a universal standard, researchers should select preprocessing strategies based on their specific hypotheses, populations, and the particular vulnerabilities of their analytical frameworks, and should verify the robustness of their findings across multiple preprocessing approaches.

In resting-state functional magnetic resonance imaging (rs-fMRI) research, head motion represents the most substantial source of artifact, systematically biasing functional connectivity (FC) measurements and potentially leading to spurious brain-behavior associations [1]. Researchers studying traits frequently associated with motion, such as psychiatric disorders, face a fundamental methodological divergence: the choice between analysis strategies that primarily mitigate the overestimation of trait-FC effects versus those that address their underestimation [1]. This strategic divergence is not merely technical but conceptual, influencing how we interpret the neural basis of behavior and psychopathology. The inherent tension lies in removing sufficient motion-contaminated data to reduce false positives without systematically excluding high-motion participants who may represent crucial variance in the trait of interest, thereby biasing sample distributions and causing underestimation [1]. This guide objectively compares the experimental performance of methodological approaches centered on this core trade-off, providing researchers with the data and protocols necessary for informed, strategic decision-making.

Quantitative Comparison of Motion Impact and Mitigation Strategies

Prevalence of Overestimation and Underestimation Across Traits

The following data, derived from a large-scale analysis of the Adolescent Brain Cognitive Development (ABCD) Study, quantifies the scope of the problem across 45 behavioral and demographic traits [1].

Table 1: Trait-Specific Motion Impact Before and After Censoring (n=7,270 Participants)

Condition	Traits with Significant Motion Overestimation	Traits with Significant Motion Underestimation
After standard denoising (ABCD-BIDS), no censoring	42% (19/45 traits) [1]	38% (17/45 traits) [1]
After censoring at FD < 0.2 mm	2% (1/45 traits) [1]	38% (17/45 traits) [1]

Key Interpretation: Standard denoising alone leaves many traits vulnerable to motion-related bias (42% with significant overestimation and 38% with significant underestimation). While aggressive frame censoring (FD < 0.2 mm) is highly effective at mitigating overestimation, reducing it to 2% of traits, it does not reduce the prevalence of underestimation, which remains at 38% [1].
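
Frame censoring at FD < 0.2 mm amounts to masking high-motion frames before computing connectivity. The sketch below is illustrative only; the `min_frames` retention rule is a hypothetical parameter added here to show how censoring can end up excluding high-motion participants entirely.

```python
import numpy as np

def censor_and_correlate(ts, fd, threshold=0.2, min_frames=50):
    """Drop frames with FD >= threshold (mm), then compute Pearson FC.

    ts: (n_timepoints, n_regions); fd: (n_timepoints,) in mm.
    Returns (fc, n_kept). fc is None when fewer than `min_frames` frames
    survive (hypothetical exclusion rule, not taken from the source).
    """
    keep = fd < threshold
    n_kept = int(keep.sum())
    if n_kept < min_frames:
        return None, n_kept
    return np.corrcoef(ts[keep], rowvar=False), n_kept

# Two synthetic participants with identical data but different motion.
ts = np.random.default_rng(0).standard_normal((100, 4))
fd_still = np.full(100, 0.1)                    # every frame below threshold
fd_mover = np.tile([0.1, 0.5, 0.6, 0.4], 25)    # only 1 in 4 frames below threshold

fc_still, kept_still = censor_and_correlate(ts, fd_still)
fc_mover, kept_mover = censor_and_correlate(ts, fd_mover)
print(kept_still, fc_still is not None)
print(kept_mover, fc_mover is not None)
```

The second participant is dropped from the analysis, which is the mechanism by which stringent censoring can bias the sample when motion correlates with the trait of interest.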

Performance Comparison of Motion Mitigation Protocols

Different processing strategies directly influence the balance between overestimation and underestimation. The table below compares the effectiveness of common experimental protocols.

Table 2: Performance Comparison of Motion Mitigation Protocols

Experimental Protocol	Impact on Overestimation	Impact on Underestimation	Key Trade-offs and Considerations
Standard Denoising (ABCD-BIDS) [1]	Ineffective (42% of traits affected) [1]	Ineffective (38% of traits affected) [1]	• Includes global signal regression, respiratory filtering, motion parameter regression, and despiking. • Reduces motion-related signal variance by ~69% compared to minimal processing, but significant bias remains [1].
Standard Denoising + Censoring (FD < 0.2 mm) [1]	Highly Effective (Reduces to 2% of traits) [1]	Ineffective (38% of traits remain affected) [1]	• Dramatically reduces false positives from overestimation. • Risks excluding high-motion participants, potentially biasing sample distribution and failing to address underestimation [1].
SHAMAN Framework (Proposed Method) [1]	Quantifies and distinguishes overestimation	Quantifies and distinguishes underestimation	• Provides a trait-specific motion impact score. • Does not remove data but diagnoses bias, informing strategic decisions on censoring thresholds for specific research questions [1].

Experimental Protocols for Detecting and Quantifying Bias

The SHAMAN Methodology for Trait-Specific Motion Impact Scoring

The Split Half Analysis of Motion Associated Networks (SHAMAN) is a novel methodological framework designed to assign a motion impact score to specific trait-FC relationships, directly addressing the core divergence of this guide [1].

Detailed Workflow:

Data Input and Splitting: For each participant, the preprocessed rs-fMRI timeseries is split into two halves: a high-motion half (timepoints with framewise displacement, FD, above the participant's median) and a low-motion half (timepoints with FD below the median) [1].
Functional Connectivity Calculation: Separate FC matrices are computed for each half of the data for every participant.
Trait-FC Effect Estimation: The relationship between the trait and FC strength is calculated separately for the high-motion and low-motion halves. This can be done using a regression model that may include covariates of no interest.
Motion Impact Score Calculation: For each functional connection (edge), the trait-FC effect from the high-motion half is compared to the effect from the low-motion half.
- A motion impact score that is aligned with the direction of the overall trait-FC effect is consistent with motion overestimation.
- A motion impact score opposite the direction of the overall trait-FC effect is consistent with motion underestimation [1].
Statistical Inference: Permutation testing (e.g., 10,000 iterations) is used to non-parametrically assess the significance of the motion impact score across all connections, generating a p-value that distinguishes significant from non-significant motion impacts [1].
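
The workflow above can be sketched as follows. This is a simplified illustration, not the published SHAMAN implementation: it assumes an ordinary least-squares slope as the trait-FC effect, summarizes edges by a signed mean rather than non-parametric combining, and permutes trait labels across participants. All function names are hypothetical.

```python
import numpy as np

def edge_vector(ts):
    """Upper-triangle Pearson correlations for a (timepoints, regions) array."""
    fc = np.corrcoef(ts, rowvar=False)
    return fc[np.triu_indices_from(fc, k=1)]

def split_half_edges(ts, fd):
    """FC edges from frames above vs. at-or-below the participant's median FD."""
    high = fd > np.median(fd)
    return edge_vector(ts[high]), edge_vector(ts[~high])

def slopes(trait, edges):
    """Per-edge OLS slope of FC on the trait across participants.

    trait: (n_participants,); edges: (n_participants, n_edges).
    """
    t = trait - trait.mean()
    return t @ (edges - edges.mean(axis=0)) / (t @ t)

def impact_score(trait, fc_high, fc_low):
    """Mean high-minus-low difference in trait-FC effect, signed so that
    positive values are aligned with the overall trait-FC effect."""
    # Assumption: the overall effect is estimated from the mean of both halves.
    trait_effect = slopes(trait, (fc_high + fc_low) / 2.0)
    impact = slopes(trait, fc_high) - slopes(trait, fc_low)
    return float(np.mean(impact * np.sign(trait_effect)))

def motion_impact_test(trait, fc_high, fc_low, n_perm=1000, seed=0):
    """Return (score, direction, p_value) using a trait-label permutation null."""
    rng = np.random.default_rng(seed)
    observed = impact_score(trait, fc_high, fc_low)
    null = np.array([impact_score(rng.permutation(trait), fc_high, fc_low)
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    direction = "overestimation" if observed > 0 else "underestimation"
    return observed, direction, float(p_value)
```

In use, `split_half_edges` would be applied to each participant and the results stacked into `fc_high` and `fc_low` arrays of shape (participants, edges) before calling `motion_impact_test`. A positive score corresponds to the "aligned" case in the workflow and a negative score to the "opposite" case.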

Divergence Point Analysis (DPA) for Temporal Onset Estimation

While SHAMAN assesses the magnitude of bias, the Divergence Point Analysis (DPA) procedure estimates the earliest discernible impact of a variable (like motion) on response latencies, providing fine-grained time-course information [54].

Detailed Workflow:

Survival Curve Generation: Survival curves are generated for two conditions (e.g., high-motion vs. low-motion groups, or a clinical trait group vs. controls). A survival curve plots the percentage of fixations or reaction times "surviving" beyond a given time point t [54].
Bootstrap Resampling: A large number of bootstrap iterations (e.g., 10,000) are performed. In each iteration, data is randomly resampled with replacement for each participant and condition, and group-level survival curves are computed [54].
Confidence Interval Estimation: For each 1-ms time bin, the differences between the two survival curves across all bootstrap iterations are sorted. The confidence interval for the difference at each bin is defined (e.g., the range between the 2.5th and 97.5th percentiles for a 95% CI) [54].
Divergence Point Estimation: The divergence point is operationally defined as the earliest time bin where a significant difference between the survival curves is observed and sustained. The original procedure defined it as the first significant bin in a run of five consecutive significant bins, using a highly conservative per-bin alpha (α < 0.001) to protect against Type I errors [54].
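
A compact sketch of these four steps is given below. It makes two simplifying assumptions that differ from the procedure described in [54]: latencies are pooled into one array per condition instead of being resampled per participant, and significance in a bin is defined as a two-sided bootstrap interval that excludes zero. Function and parameter names are hypothetical.

```python
import numpy as np

def survival_curve(latencies, time_bins):
    """Percentage of latencies that exceed each time bin t."""
    return 100.0 * (latencies[None, :] > time_bins[:, None]).mean(axis=1)

def divergence_point(cond_a, cond_b, time_bins, n_boot=1000,
                     alpha=0.001, run_length=5, seed=0):
    """Earliest bin that starts `run_length` consecutive significant bins.

    Returns the time of that bin, or None if no sustained divergence is found.
    """
    rng = np.random.default_rng(seed)
    diffs = np.empty((n_boot, len(time_bins)))
    for i in range(n_boot):
        # Resample each condition with replacement (pooled, not per participant).
        a = rng.choice(cond_a, size=len(cond_a), replace=True)
        b = rng.choice(cond_b, size=len(cond_b), replace=True)
        diffs[i] = survival_curve(b, time_bins) - survival_curve(a, time_bins)
    # Per-bin percentile interval at the chosen alpha.
    lower = np.percentile(diffs, 100 * alpha / 2, axis=0)
    upper = np.percentile(diffs, 100 * (1 - alpha / 2), axis=0)
    significant = (lower > 0) | (upper < 0)
    run = 0
    for idx, sig in enumerate(significant):
        run = run + 1 if sig else 0
        if run == run_length:
            return time_bins[idx - run_length + 1]
    return None
```

Requiring a run of consecutive significant bins, rather than a single bin, guards against isolated chance crossings when thousands of 1-ms bins are tested.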

Diagram 1: Divergence Point Analysis (DPA) Workflow

The Scientist's Toolkit: Essential Reagents and Materials

Successful implementation of the aforementioned protocols requires a suite of key resources, from specific datasets to software tools.

Table 3: Research Reagent Solutions for Motion Impact Analysis

Item Name	Function / Application	Specific Example / Source
Large-Scale Neuroimaging Dataset	Provides the statistical power necessary for robust BWAS and method validation. Essential for estimating true effect sizes and quantifying reproducibility.	Adolescent Brain Cognitive Development (ABCD) Study [1]; Human Connectome Project (HCP) [1]; UK Biobank [1]
Preprocessing & Denoising Pipeline	Implements standard or customized algorithms for initial motion artifact reduction.	ABCD-BIDS Pipeline (includes global signal regression, respiratory filtering, motion parameter regression, despiking) [1]
Motion Censoring Tool	Identifies and flags high-motion fMRI frames for exclusion from analysis based on a user-defined threshold (e.g., Framewise Displacement).	In-house scripts based on Framewise Displacement (FD) calculation; FIRMM motion-monitoring software [3]
SHAMAN Implementation	Computes trait-specific motion impact scores to diagnose overestimation and underestimation.	Custom code implementing the Split Half Analysis of Motion Associated Networks (SHAMAN) as described in Kay et al., 2025 [1]
Divergence Point Analysis (DPA) Code	Estimates the earliest discernible impact of motion on latency distributions.	Modified DPA procedures allowing for confidence interval estimation and individual participant analysis [54]

Integrated Workflow for Strategic Decision-Making

The following diagram synthesizes the core concepts and methodologies discussed in this guide into a logical pathway for strategic decision-making in trait-FC research. It illustrates how the fundamental problem of in-scanner head motion leads to distinct biases and how modern tools can guide the selection of an appropriate mitigation strategy.

Diagram 2: Strategic Decision Pathway for Mitigating Bias

The divergence between mitigating overestimation and underestimation in trait-FC research represents a critical strategic choice, one that cannot be resolved by a one-size-fits-all processing pipeline. The data demonstrate that while aggressive motion censoring is a powerful tool against spurious overestimation, it is ineffective against underestimation and may even perpetuate it [1]. The strategic path forward, therefore, relies on diagnostic tools like the SHAMAN framework [1] to quantitatively evaluate the specific motion impact for a given trait, informing a more tailored and justified analytical approach. By moving from heuristic-based processing to a diagnostic-driven strategy, researchers can more confidently navigate this divergence, leading to more robust and reproducible discoveries in brain-behavior research.

Data-Driven Recommendations for Different Study Designs and Populations

In-scanner head motion introduces systematic bias to resting-state fMRI functional connectivity (FC) that is not completely removed by standard denoising algorithms [1]. For researchers studying traits associated with motion, particularly in psychiatric disorders, understanding whether trait-FC relationships are impacted by residual motion is essential to avoid reporting false positive results [1] [3]. This challenge represents a fundamental methodological concern in contemporary neuroscience research, as motion artifact can systematically distort brain-behavior associations in ways that threaten the validity of findings across diverse study designs and populations [1].

The tension between data quality and inclusive sampling emerges as a central consideration. There exists a natural conflict between the need to remove motion-contaminated data to reduce spurious findings while avoiding systematic exclusion of individuals with high motion who may exhibit important variance in the trait of interest (e.g., low scores on attention measures associated with greater motion) [1]. This challenge is particularly acute when studying populations prone to greater head movement, including children, older adults, and patients with neurological or psychiatric disorders [1]. Recent methodological innovations now enable precise quantification of how motion impacts specific trait-FC relationships, allowing for more nuanced data-driven recommendations across different research contexts [1] [3].

Quantitative Assessment of Motion Impact Across Research Contexts

Motion Impact Scores by Trait Category

Table 1 summarizes the prevalence and direction of motion impact scores across different trait categories and denoising approaches, based on large-scale analyses from the Adolescent Brain Cognitive Development (ABCD) Study [1] [3].

Table 1: Motion Impact Scores Across Different Analytical Conditions

Analytical Condition	Traits with Significant Overestimation Scores	Traits with Significant Underestimation Scores	Key Implications for Study Design
Standard denoising (no censoring)	42% (19/45 traits) [1]	38% (17/45 traits) [1]	High risk of spurious findings without stringent motion control
Censoring at FD < 0.2 mm	2% (1/45 traits) [1]	38% (17/45 traits) [1]	Effective for overestimation but not underestimation artifacts
SHAMAN framework application	Quantifies direction-specific motion impact [1]	Distinguishes over/underestimation patterns [1]	Enables trait-specific motion artifact correction

SHAMAN Methodology: Split Half Analysis of Motion Associated Networks

The SHAMAN framework represents a significant methodological advancement for quantifying trait-specific motion artifact in functional connectivity research [1]. This approach capitalizes on the observation that traits (e.g., weight, intelligence) remain stable over the timescale of an MRI scan, while motion varies from second to second [1]. Below is the complete experimental protocol for implementing SHAMAN analysis:

Experimental Protocol: SHAMAN Motion Impact Assessment

Data Requirements: One or more resting-state fMRI scans per participant; framewise displacement (FD) timeseries; trait measures of interest [1].
Timeseries Segmentation: Split each participant's fMRI timeseries into high-motion and low-motion halves based on framewise displacement values [1].
Connectivity Calculation: Compute separate functional connectivity matrices for high-motion and low-motion segments for each participant [1].
Trait-FC Effect Estimation: Calculate correlation between trait measures and FC for both high-motion and low-motion segments across the sample [1].
Motion Impact Score Computation: Quantify differences in trait-FC correlations between high-motion and low-motion segments:
- A motion impact score aligned with the direction of the trait-FC effect indicates motion overestimation
- A motion impact score opposite the direction of the trait-FC effect indicates motion underestimation [1]
Statistical Significance Testing: Use permutation testing and non-parametric combining across pairwise connections to compute p-values for motion impact scores [1].
Threshold Determination: Establish significance thresholds for acceptable versus unacceptable levels of trait-specific motion impact [1].
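
The protocol above takes a framewise displacement (FD) timeseries as input. A minimal sketch of a Power-style FD calculation is shown below. The column order of the motion parameters (three translations in mm followed by three rotations in radians) and the 50 mm head radius are assumptions; realignment tools differ in ordering and units, so check the output format of the software in use.

```python
import numpy as np

def framewise_displacement(motion_params, radius_mm=50.0):
    """Power-style FD from six rigid-body realignment parameters.

    motion_params: (n_timepoints, 6), assumed to be 3 translations (mm)
    followed by 3 rotations (radians). Rotations are converted to arc
    length on a sphere of `radius_mm`. The first frame is assigned FD = 0.
    """
    deltas = np.abs(np.diff(motion_params, axis=0))
    deltas[:, 3:] *= radius_mm
    return np.concatenate([[0.0], deltas.sum(axis=1)])

# Tiny worked example: a 0.1 mm translation, then a return to origin
# combined with a 0.002 rad rotation (0.1 mm of arc at 50 mm).
params = np.zeros((3, 6))
params[1, 0] = 0.1
params[2, 3] = 0.002
fd = framewise_displacement(params)
print(fd)
```

The resulting `fd` array can then be used for the high-motion versus low-motion split in the segmentation step.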

Population-Specific Recommendations and Methodological Considerations

Pediatric and Developmental Populations

Research with pediatric populations, particularly the ABCD Study involving 11,874 children ages 9-10 years, demonstrates both the vulnerability to motion artifact and the importance of specialized analytical approaches [1]. The implementation of stringent censoring thresholds (framewise displacement < 0.2 mm) proves particularly effective in this population, reducing significant motion overestimation from 42% to just 2% of traits [1]. However, this approach does not address motion underestimation artifacts, which persist regardless of censoring stringency [1]. For developmental researchers, combining stringent censoring with motion impact quantification frameworks like SHAMAN provides the most comprehensive protection against spurious findings.

Clinical and Psychiatric Populations

Individuals with psychiatric disorders often exhibit higher in-scanner head motion, creating systematic biases in functional connectivity findings [1]. Historical examples include spuriously attributed decreases in long-distance FC in autism that actually reflected motion artifacts rather than neural characteristics [1]. For clinical researchers, motion impact scores provide essential safeguards against these confounds. The SHAMAN framework's ability to distinguish between overestimation and underestimation is particularly valuable when studying conditions like ADHD or autism where motion may correlate with trait measures of interest [1].

Inclusive and Diverse Research Populations

Advancing inclusive research requires careful consideration of how methodological decisions might differentially impact diverse populations [55]. The Five Principles for Advancing Inclusive Research framework provides guidance for ensuring representative sampling while maintaining methodological rigor:

Population Science Considerations: Understanding biological, genetic, and demographic factors in disease burden [55]
Data-Informed Site Placement: Ensuring geographical proportionality in trial enrollment [55]
Inclusive Trial Design: Implementing data-informed and user-informed approaches [55]
Patient-Reported Data Standards: Supporting complete and consistent data collection [55]
Trial Access Enablement: Building trustworthiness and improving patient navigation [55]

Table 2 outlines essential research reagents and computational tools for implementing motion-resistant analytical frameworks across diverse populations.

Table 2: Essential Research Reagents and Computational Tools

Research Reagent/Tool	Function/Purpose	Application Context
ABCD-BIDS Pipeline	Default denoising for pre-processed ABCD data; includes global signal regression, respiratory filtering, spectral filtering, despiking, and motion parameter regression [1]	Large-scale pediatric neuroimaging studies
FIRMM Motion-Monitoring	Real-time motion analytics during brain MRI to improve data quality and reduce costs [3]	All fMRI studies, particularly clinical populations with higher motion
SHAMAN Framework	Quantifies trait-specific motion impact scores distinguishing overestimation from underestimation [1]	Studies of motion-correlated traits (psychiatric disorders, developmental conditions)
Prospect Certainty Method	Quantifies output certainty of data-driven models dealing with changing input distributions during deployment [56]	Machine learning applications in neuroimaging
Connectome-Based Predictive Modeling	Advanced predictive analytics linking brain connectivity to behavioral measures [57]	Multimodal studies connecting neural circuitry to trauma or behavioral outcomes

Integrated Recommendations for Different Study Designs

Large-Scale Epidemiological Studies

For large-scale studies like the ABCD Study with 11,874 participants, implementation of standardized denoising pipelines like ABCD-BIDS provides essential consistency [1]. However, even with sophisticated processing, residual motion artifacts persist, explaining 23% of signal variance after denoising compared to 73% with minimal processing [1]. Large-scale studies should implement trait-specific motion impact assessments as routine quality control measures, particularly for traits potentially correlated with motion (e.g., attention measures, psychiatric symptoms) [1].

Clinical Trial Applications

In clinical trial contexts, particularly pharmaceutical development, regulatory expectations increasingly emphasize inclusive recruitment and real-world relevance [58]. The growing use of external control arms derived from observational data represents an important innovation, with regulators showing increasing openness to these approaches when rigorously validated [58]. Clinical trial designs should incorporate motion impact assessments when using neuroimaging biomarkers as endpoints, particularly in conditions where patient movement may systematically differ between treatment groups.

Longitudinal and Developmental Studies

Longitudinal designs face unique challenges as motion artifacts may systematically vary with age and developmental stage. The strong negative correlation (Spearman ρ = -0.58) between motion-FC effects and average FC matrices persists even after stringent motion censoring, indicating persistent systematic biases [1]. Developmental researchers should implement age-specific motion impact assessments and consider including motion impact scores as covariates in longitudinal analyses of brain-behavior relationships.
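
The kind of check behind the reported Spearman correlation can be sketched as follows. This is an illustration under stated assumptions, not the analysis in [1]: the motion-FC effect is taken to be the per-edge regression slope of FC on each participant's mean FD, and the rank correlation is computed with a simple NumPy helper that ignores ties.

```python
import numpy as np

def rank(x):
    """Ranks 0..n-1 (ties broken by order; adequate for continuous data)."""
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(len(x))
    return r

def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of ranks."""
    return float(np.corrcoef(rank(np.asarray(x)), rank(np.asarray(y)))[0, 1])

def motion_fc_effect(edges, mean_fd):
    """Per-edge OLS slope of FC on mean FD across participants.

    edges: (n_participants, n_edges); mean_fd: (n_participants,).
    """
    m = mean_fd - mean_fd.mean()
    return m @ (edges - edges.mean(axis=0)) / (m @ m)

def motion_vs_average_fc(edges, mean_fd):
    """Rank correlation between the motion-FC effect and the average FC."""
    return spearman(motion_fc_effect(edges, mean_fd), edges.mean(axis=0))

# Synthetic demo: motion scales every edge toward zero, so edges that are
# strong on average lose the most connectivity in high-motion participants.
rng = np.random.default_rng(0)
base = np.linspace(-0.2, 0.8, 200)              # average edge strengths
mean_fd = rng.uniform(0.05, 0.5, 100)           # one value per participant
edges = base * (1 - 0.5 * mean_fd[:, None]) + 0.02 * rng.standard_normal((100, 200))
rho = motion_vs_average_fc(edges, mean_fd)
print("Spearman rho:", rho)
```

Running the same check separately within age bands would be one way to implement the age-specific assessment suggested above.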

The evolving methodology for assessing and correcting motion artifacts in functional connectivity research enables more confident inferences about brain-behavior relationships across diverse study designs and populations. The distinction between motion overestimation and underestimation effects represents a critical advancement, moving beyond simple motion exclusion toward nuanced understanding of how motion systematically biases specific trait-FC relationships [1]. As the field progresses toward more inclusive research frameworks [55], integrating these methodological safeguards ensures that efforts to diversify participant samples do not come at the cost of methodological rigor. The data-driven recommendations presented here provide a pathway for maintaining scientific precision while advancing more representative and generalizable neuroscience research.

Benchmarking Efficacy: How Validation Studies Inform Best Practices

In-scanner head motion represents one of the most significant confounding factors in functional connectivity (FC) studies, introducing systematic biases that persist even after the application of sophisticated denoising algorithms [20] [13]. The residual motion artifact remaining after denoising poses a particular threat to inference in brain-wide association studies (BWAS), especially when investigating traits that are inherently correlated with motion propensity, such as those associated with psychiatric disorders, developmental conditions, or aging [20] [27]. This persistent artifact can manifest in two distinct directional biases: motion overestimation, where trait-FC relationships appear stronger than they truly are, and motion underestimation, where genuine biological relationships are obscured and attenuated [20]. Understanding the evidence for these effects, particularly from large-scale datasets, and evaluating the efficacy of mitigation strategies forms a critical foundation for rigorous neuroimaging research.

The Problem of Residual Motion

Fundamental Characteristics of Motion Artifacts

Head motion systematically alters fMRI data through multiple mechanisms. Even after volume realignment, residual artifacts persist due to non-linear effects including spin excitation history, interpolation artifacts during image reconstruction, and interactions between head position and the magnetic field [13]. These artifacts exhibit distinct spatial and temporal properties:

Spatial Properties: Motion causes a global signal drop in brain parenchyma accompanied by signal increases at tissue boundaries due to partial volume effects [13]. The artifact follows a characteristic spatial pattern, with maximal displacement in frontal regions and minimal movement near the craniocervical junction [13].
Temporal Properties: Motion induces sharp, high-amplitude signal changes immediately following movement events, with some artifacts persisting for 8-10 seconds, potentially due to motion-related physiological changes like CO₂ fluctuations from yawning or deep breathing [13].

The Overestimation vs. Underestimation Framework

Residual motion does not merely add random noise to FC estimates but introduces systematic directional biases that can fundamentally alter inference:

Motion Overestimation: Occurs when the motion impact score aligns with the direction of the trait-FC effect, causing researchers to overestimate the strength of a true biological relationship [20].
Motion Underestimation: Occurs when the motion impact score opposes the direction of the trait-FC effect, obscuring genuine trait-FC relationships and reducing statistical power to detect true effects [20].

This framework is particularly critical for studies of populations with inherent motion differences, such as children, older adults, or individuals with neurological conditions, where motion can correlate systematically with the traits of interest [20] [13].

Evidence from Large-Scale Datasets

The ABCD Study Findings

The Adolescent Brain Cognitive Development (ABCD) Study, which involves 11,874 children, provides unprecedented power to quantify residual motion effects. Analyses of 7,270 of these participants (Table 1) reveal the substantial impact of residual motion even after comprehensive denoising:

Table 1: Prevalence of Motion Bias in Traits from the ABCD Study (n=7,270)

Type of Motion Bias	Percentage of Traits Affected	Number of Traits (out of 45)
Significant Overestimation	42%	19
Significant Underestimation	38%	17

Under minimal processing, head motion explained 73% of signal variance; after denoising with the ABCD-BIDS pipeline (which includes global signal regression, respiratory filtering, spectral filtering, despiking, and motion parameter regression), this fell to 23% [20]. This represents a 69% relative reduction, yet a substantial motion-FC relationship remained: the motion-FC effect matrix was strongly negatively correlated with the average FC matrix (Spearman ρ = -0.58), indicating that connections that are strong on average tended to be weaker in participants who moved more [20].
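
The 69% figure is a relative reduction, not a difference in percentage points. It can be checked directly from the two variance estimates quoted above:

```latex
\[
\text{relative reduction}
  = \frac{73\% - 23\%}{73\%}
  = \frac{50}{73}
  \approx 0.685
  \approx 69\%
\]
```

The absolute reduction is 50 percentage points, and 23% of signal variance remains motion-related after denoising.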

Benchmarking Denoising Pipelines in Task-Based fMRI

The challenge of residual motion extends beyond resting-state studies to task-based FC, where motion often differs between conditions (e.g., rest vs. cognitive tasks) [27]. Evaluations of common denoising pipelines reveal marked heterogeneity in their ability to balance motion artifacts across conditions [27]:

Table 2: Performance of Denoising Pipelines on Task-Based fMRI

Denoising Approach	Residual Motion Reduction	Network Identifiability	Balance Across Conditions
aCompCor (Optimized)	Effective	High	Good
Global Signal Regression	Effective	High	Moderate
Censoring (FD < 0.2 mm)	Substantially reduces distance-dependent artifacts	Reduced	Not applicable
Standard Volume Regression	Moderate	Moderate	Poor

Notably, censoring (removing high-motion volumes) was the only approach that substantially reduced distance-dependent artifacts, but this came at the cost of reduced network identifiability and potential introduction of other biases [27].

Methodological Approaches for Quantifying Residual Motion

The SHAMAN Framework

The Split Half Analysis of Motion Associated Networks (SHAMAN) provides a novel method for computing trait-specific motion impact scores [20]. This approach capitalizes on the temporal stability of traits versus the moment-to-moment variability of motion:

SHAMAN Workflow Analysis: The method measures differences in correlation structure between high-motion and low-motion halves of each participant's fMRI timeseries [20]. When trait-FC effects are independent of motion, the difference between halves is non-significant because traits are stable over time. A significant difference indicates that state-dependent motion impacts the trait's connectivity measurement [20].

Advanced Motion Estimation in Structural Imaging

Beyond functional MRI, rotational coronary angiography faces similar motion challenges in 3D reconstruction. The Projective Information Disentanglement for Motion Estimation (PID-ME) approach addresses motion state inconsistency by separating overlapping projection pixels through a projective average minimal distance model [59]. This method demonstrates that disentangling complex motion signals requires moving beyond traditional pixel-to-pixel mapping models to account for cardiac nonlinear deformation [59].

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Methodological Components for Residual Motion Research

Research Component	Function	Example Implementation
Framewise Displacement	Quantifies volume-to-volume head motion	Power FD or Jenkinson FD [13]
Denoising Pipelines	Removes motion-related variance from BOLD signal	ABCD-BIDS, aCompCor, ICA-AROMA [20] [27]
Motion Impact Scoring	Quantifies trait-specific motion bias	SHAMAN framework [20]
Censoring Methods	Removes high-motion volumes from analysis	Framewise displacement thresholding [20] [27]
Global Signal Regression	Reduces global motion-related variance	Regression of whole-brain signal average [27] [13]
Prospective Motion Correction	Real-time adjustment during acquisition	PACE, optical tracking [60]

Efficacy of Mitigation Strategies

Differential Impact on Overestimation vs. Underestimation

Mitigation strategies have divergent effects on directional motion biases. Censoring at FD < 0.2 mm dramatically reduces significant overestimation from 42% to 2% of traits but does not decrease the number of traits with significant underestimation scores [20]. This suggests that:

Motion overestimation may be driven by large, abrupt head movements that are effectively removed by censoring.
Motion underestimation may result from more subtle, persistent motion components that remain after standard denoising and are not adequately addressed by censoring alone [20].

Limitations of Current Denoising Approaches

Even advanced denoising pipelines show differential efficacy across functional conditions. In task-based fMRI, many pipelines fail to balance artifacts between high-motion and low-motion conditions, potentially introducing new biases while removing others [27]. The most effective approaches, including optimized aCompCor and global signal regression, still struggle to eliminate the spurious distance-dependent association between motion and connectivity [27].

Evidence from large-scale datasets confirms that residual motion after denoising remains a substantial threat to inference in functional connectivity research. The systematic nature of this artifact introduces directional biases that can either inflate or obscure trait-FC relationships, with particularly pronounced effects in studies of motion-prone populations. Future methodological development should focus on:

Trait-specific correction approaches that account for the unique motion profile of different clinical and demographic groups.
Integrated prospective and retrospective methods that combine real-time motion correction with advanced post-processing.
Standardized reporting of motion impact scores alongside traditional FC results to enhance interpretation and reproducibility.

The field must move beyond considering motion as a generic nuisance variable and develop more sophisticated frameworks that address its trait-specific, directional influences on inference.

In resting-state functional magnetic resonance imaging (rs-fMRI) research, head motion introduces systematic bias that can significantly distort measurements of functional connectivity (FC) between brain regions. These motion-induced artifacts present a particularly critical challenge in studies investigating trait-functional connectivity (trait-FC) relationships, where researchers seek to understand how individual differences in brain organization correlate with specific characteristics, behaviors, or clinical conditions. The confounding influence of motion is especially pronounced when studying populations that tend to move more during scans, such as children, older adults, or individuals with certain psychiatric or neurological disorders [1]. Without proper correction, motion artifacts can produce both false positive findings (overestimation) by creating spurious brain-behavior associations and false negative findings (underestimation) by obscuring genuine biological relationships [1] [3].

The scientific community has developed numerous methodological approaches to mitigate motion-related artifacts, with volume censoring (also called "scrubbing") emerging as a particularly prominent strategy. Volume censoring involves identifying and excluding motion-contaminated frames from statistical analysis, typically using metrics like framewise displacement (FD) to quantify head movement between consecutive image volumes [61]. However, emerging evidence indicates that censoring techniques do not uniformly affect all trait-FC relationships, with differential impacts on overestimation versus underestimation of effects [1]. This comparative guide synthesizes current experimental evidence to objectively evaluate the efficacy of different censoring approaches, with particular focus on their asymmetric effects on trait-FC research outcomes.
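As a concrete illustration of the metric described above, the following minimal sketch computes a Power-style FD from six rigid-body realignment parameters and derives a keep/drop mask at a chosen threshold. The parameter layout (translations in mm, rotations in radians), the 50 mm head radius, and the function names are assumptions made for this example, not details taken from any specific pipeline.

```python
import math

def framewise_displacement(params, radius_mm=50.0):
    """Power-style FD from rigid-body realignment parameters.

    params: per-volume rows [tx, ty, tz, rx, ry, rz], translations in mm
    and rotations in radians (assumed layout). Rotations are converted to
    arc length on a sphere of radius_mm.
    """
    fd = [0.0]  # the first volume has no predecessor
    for prev, curr in zip(params, params[1:]):
        trans = sum(abs(c - p) for p, c in zip(prev[:3], curr[:3]))
        rot = sum(abs(c - p) * radius_mm for p, c in zip(prev[3:], curr[3:]))
        fd.append(trans + rot)
    return fd

def censor_mask(fd, threshold_mm=0.2):
    """True for frames to keep (FD below the threshold)."""
    return [value < threshold_mm for value in fd]

# Toy motion trace: a small shift, then a larger shift with a slight rotation.
motion = [
    [0.00, 0.00, 0.00, 0.000, 0.000, 0.000],
    [0.05, 0.00, 0.00, 0.000, 0.000, 0.000],
    [0.05, 0.30, 0.00, 0.002, 0.000, 0.000],
    [0.05, 0.30, 0.00, 0.002, 0.000, 0.000],
]
fd = framewise_displacement(motion)
keep = censor_mask(fd)
```

In this toy trace only the third volume exceeds 0.2 mm, so it is the only frame flagged for removal.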

Comparative Performance of Censoring Methods

Quantitative Comparison of Censoring Efficacy

Table 1: Impact of Censoring Thresholds on Motion Overestimation and Underestimation in the ABCD Study

Censoring Threshold (FD in mm)	Traits with Significant Overestimation	Traits with Significant Underestimation	Key Findings
No censoring	42% (19/45 traits)	38% (17/45 traits)	Substantial motion impact in both directions
< 0.2 mm	2% (1/45 traits)	38% (17/45 traits)	Dramatic reduction in overestimation only

Source: Adapted from Kay et al. (2025) analysis of n=7,270 participants from the ABCD Study [1] [3].

Table 2: Multi-Dataset Evaluation of Frame Censoring for Task-Based fMRI

Evaluation Metric	Censoring with Moderate Data Loss (1-2%)	Standard Motion Regressors (RP6/RP24)	Other Methods (WDS, rWLS, uICA)
Maximum group t-statistic	Consistent improvements	Baseline performance	Variable, dataset-dependent
ROI mean activation	Moderate gains	Baseline performance	Comparable gains in some cases
Split-half reliability	Improved consistency	Baseline performance	Mixed results
Spatial overlap (Dice)	Slight improvement	Baseline performance	Method-dependent performance

Source: Adapted from multi-dataset evaluation across 11 tasks and 8 publicly available datasets [61].

Specialized Applications Across Populations and Modalities

Table 3: Censoring Efficacy Across Different Populations and Imaging Modalities

Population/Modality	Optimal Censoring Threshold	Impact on Overestimation	Impact on Underestimation	Key Considerations
Fetal rs-fMRI	1.5 mm FD	Significant reduction	Moderate reduction	Improved neurobiological prediction accuracy (55.2% vs. 44.6%) [62]
Adult task-based fMRI	FD 0.2-0.5 mm (1-2% data loss)	Effective reduction	Minimal impact	Gains comparable to other methods [61]
PET imaging	Data-driven deep learning	Substantial reduction	Substantial reduction	DL-HMC++ outperforms state-of-the-art methods [63]

Experimental Protocols and Methodologies

The SHAMAN Protocol for Quantifying Motion Impact

The Split Half Analysis of Motion Associated Networks (SHAMAN) represents a novel methodological approach specifically designed to quantify trait-specific motion impact scores. The protocol capitalizes on the observation that traits (e.g., cognitive measures, clinical symptoms) remain stable over the timescale of an MRI scan, while motion is a state that varies from second to second [1].

Experimental Workflow:

Data Acquisition: Acquire resting-state fMRI data using standardized protocols (e.g., 144 volumes, TR=3000ms for fetal imaging; 284-1146 frames depending on task for adult studies) [61] [62].
Preprocessing: Apply standardized preprocessing pipelines (e.g., ABCD-BIDS for the Adolescent Brain Cognitive Development Study, including global signal regression, respiratory filtering, spectral filtering, despiking, and motion parameter regression) [1].
Motion Quantification: Calculate framewise displacement (FD) for each volume.
Data Splitting: Divide each participant's timeseries into high-motion and low-motion halves based on median FD.
Trait-FC Correlation: Compute correlation between trait measures and functional connectivity separately for high-motion and low-motion halves.
Impact Score Calculation: Calculate motion impact score as the difference between trait-FC correlations in high-motion versus low-motion halves.
Directionality Assessment:
- Motion Overestimation: Impact score aligned with direction of trait-FC effect
- Motion Underestimation: Impact score opposite to direction of trait-FC effect
Statistical Testing: Use permutation testing and non-parametric combining across connections to determine significance [1].
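The data-splitting step of this workflow can be sketched for a single participant as follows. This is a simplified illustration on synthetic data, not the published SHAMAN implementation; the function name `split_half_fc`, the array shapes, and the handling of an odd frame count (the median frame is dropped) are assumptions.

```python
import numpy as np

def split_half_fc(timeseries, fd):
    """Split frames at the median FD and compute FC in each half.

    timeseries: array (n_frames, n_regions); fd: array (n_frames,).
    Returns (fc_low, fc_high), each an (n_regions, n_regions) correlation matrix.
    """
    timeseries = np.asarray(timeseries, dtype=float)
    fd = np.asarray(fd, dtype=float)
    order = np.argsort(fd, kind="stable")
    half = len(fd) // 2
    low_idx = np.sort(order[:half])    # lowest-motion frames, in time order
    high_idx = np.sort(order[-half:])  # highest-motion frames, in time order
    fc_low = np.corrcoef(timeseries[low_idx], rowvar=False)
    fc_high = np.corrcoef(timeseries[high_idx], rowvar=False)
    return fc_low, fc_high

# Synthetic participant: 200 frames, 4 regions, gamma-distributed FD.
rng = np.random.default_rng(0)
n_frames, n_regions = 200, 4
ts = rng.standard_normal((n_frames, n_regions))
fd = rng.gamma(shape=2.0, scale=0.1, size=n_frames)
fc_low, fc_high = split_half_fc(ts, fd)
fc_diff = fc_high - fc_low  # per-connection within-participant difference
```

The per-connection difference `fc_diff` is the within-participant quantity that would then be related to the trait across participants.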

Multi-Dataset Evaluation Protocol for Task-Based fMRI

A comprehensive evaluation of frame censoring was conducted across eight publicly available datasets representing 11 distinct tasks in child, adolescent, and adult participants [61]:

Experimental Parameters:

Comparison Methods: Frame censoring (using FD and DVARS thresholds) versus six canonical motion regressors (RP6), 24-term expansions (RP24), wavelet despiking (WDS), robust weighted least squares (rWLS), and untrained ICA (uICA).
Performance Metrics: Maximum group t-statistics (whole-brain and ROI), mean ROI activation values, split-half reliability, and spatial overlap of thresholded statistical maps.
Censoring Implementation: Frames exceeding specific FD thresholds (typically 0.2-0.5mm) were excluded using scan-nulling regressors in the general linear model.
Analysis Pipeline: Implemented using Automatic Analysis version 5.4.0 with reproducible workflows.
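Scan-nulling regressors of the kind mentioned above are indicator columns, one per censored frame, that are added to the design matrix so each flagged frame is fit exactly and no longer influences the other estimates. The sketch below builds them from an FD trace; the 0.5 mm threshold and the function name are illustrative assumptions.

```python
def scan_nulling_regressors(fd, threshold_mm=0.5):
    """One indicator column per censored frame (1 at that frame, 0 elsewhere)."""
    n_frames = len(fd)
    censored = [t for t, value in enumerate(fd) if value > threshold_mm]
    columns = []
    for t in censored:
        column = [0] * n_frames
        column[t] = 1
        columns.append(column)
    return censored, columns

fd = [0.0, 0.1, 0.7, 0.2, 0.9, 0.1]
censored, regressors = scan_nulling_regressors(fd)
data_loss = len(censored) / len(fd)  # fraction of frames removed
```

Tracking `data_loss` alongside the regressors makes it easy to check whether a threshold stays within a target range such as the 1-2% discussed in this evaluation.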

Visualizing Methodological Approaches and Workflows

Figure: SHAMAN analytical methodology (diagram not reproduced).

Figure: Differential efficacy of censoring thresholds (diagram not reproduced).

The Researcher's Toolkit: Essential Materials and Methods

Table 4: Essential Research Reagents and Computational Tools for Motion Censoring Studies

Tool/Resource	Type	Primary Function	Application Context
ABCD-BIDS Pipeline	Software	Default denoising for large-scale studies; includes global signal regression, respiratory filtering, motion parameter regression	Large cohort studies (e.g., ABCD with n=7,270) [1]
Framewise Displacement (FD)	Metric	Quantifies head motion between consecutive image volumes; used to identify frames for censoring	All fMRI modalities (rs-fMRI, task-fMRI)
SHAMAN Algorithm	Analytical Tool	Quantifies trait-specific motion impact scores; distinguishes overestimation vs underestimation	Trait-FC association studies [1]
Bioimage Suite	Software	Fetal motion correction with rigid body transformation; calculates motion regressors	Fetal rs-fMRI studies [62]
DL-HMC++	Deep Learning Tool	Data-driven PET head motion correction using cross-attention mechanisms; eliminates need for hardware tracking	PET imaging studies [63]
andl-datasets Python Package	Simulation Tool	Generates realistic simulated trajectory data for benchmarking motion analysis methods	Method development and validation [64]
Automatic Analysis v5.4.0	Pipeline Tool	Reproducible workflow implementation for multi-dataset method comparisons	Cross-study method evaluation [61]

The empirical evidence synthesized in this comparison guide demonstrates that censoring techniques exert differential effects on overestimation versus underestimation biases in trait-FC research. While censoring at appropriate thresholds (typically FD < 0.2mm for rs-fMRI) effectively mitigates false positive findings resulting from motion overestimation, it shows limited efficacy for addressing false negative findings caused by motion underestimation [1] [3]. This asymmetric efficacy has profound implications for study design and interpretation in neuroimaging research.

For researchers investigating traits associated with motion (e.g., psychiatric disorders, neurodevelopmental conditions), implementing the SHAMAN methodology or similar approaches to quantify motion impact scores is essential for validating findings. The current evidence supports a multi-pronged approach combining nuisance regression with moderate censoring (1-2% data loss) as optimal for most research contexts [61] [62]. However, investigators must recognize that censoring alone does not fully address underestimation biases and should interpret null findings with appropriate caution when studying motion-correlated traits. Future methodological developments should focus on techniques that simultaneously address both overestimation and underestimation artifacts to advance the reliability of trait-FC research.

Comparative Performance of ICA-AROMA, aCompCor, and Volume Censoring

Resting-state functional magnetic resonance imaging (rs-fMRI) has become a cornerstone of modern neuroscience for mapping large-scale brain networks. However, estimates of functional connectivity (FC) derived from rs-fMRI are exquisitely sensitive to confounding artifacts, particularly those arising from in-scanner head motion and physiological noise [65] [21]. The presence of these artifacts can systematically bias connectivity measures, potentially leading to spurious brain-behavior associations in research studies [1]. This challenge is especially acute in clinical and developmental populations where motion often correlates with the traits of interest, creating a persistent confound that can drive both false positive and false negative findings [1] [66].

In response to this challenge, numerous retrospective denoising methods have been developed. Among these, ICA-AROMA (Independent Component Analysis-based Automatic Removal Of Motion Artifacts), aCompCor (anatomical Component Based Noise Correction), and volume censoring (also known as "scrubbing") have emerged as prominent strategies with distinct theoretical approaches and practical implications [21] [67] [66]. ICA-AROMA employs a data-driven approach to identify and remove motion-related components from fMRI data [67]. aCompCor applies principal component analysis to signals from noise regions of interest (white matter and cerebrospinal fluid) to derive nuisance regressors [65] [66]. Volume censoring takes a more direct approach by identifying and removing motion-contaminated volumes from the analysis [21] [66].
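The aCompCor idea described above can be sketched in a few lines: take the signals from noise-region voxels, extract their leading principal components, and regress those components out of the data of interest. The example below uses synthetic signals; the choice of five components, the variance normalization, and the function names are assumptions rather than a reference implementation.

```python
import numpy as np

def acompcor_regressors(noise_ts, n_components=5):
    """Top principal-component timeseries from noise-ROI voxels.

    noise_ts: (n_frames, n_voxels) signals from WM/CSF masks.
    """
    x = np.asarray(noise_ts, dtype=float)
    x = x - x.mean(axis=0)
    std = x.std(axis=0)
    x = x / np.where(std > 0, std, 1.0)  # avoid dividing by zero for flat voxels
    u, _, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components]

def regress_out(data, regressors):
    """Residuals of data after a least-squares fit on regressors plus an intercept."""
    design = np.column_stack([np.ones(len(regressors)), regressors])
    beta, *_ = np.linalg.lstsq(design, data, rcond=None)
    return data - design @ beta

rng = np.random.default_rng(0)
n_frames = 150
nuisance = rng.standard_normal(n_frames)  # shared synthetic noise source
noise_roi = (np.outer(nuisance, rng.uniform(0.5, 1.5, 30))
             + 0.5 * rng.standard_normal((n_frames, 30)))
gray = 0.8 * nuisance[:, None] + rng.standard_normal((n_frames, 3))

components = acompcor_regressors(noise_roi, n_components=5)
cleaned = regress_out(gray, components)
before = abs(np.corrcoef(gray[:, 0], nuisance)[0, 1])
after = abs(np.corrcoef(cleaned[:, 0], nuisance)[0, 1])
```

Comparing `before` and `after` shows how much of the shared nuisance signal the regression removes in this toy setting; real data will not separate so cleanly.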

Understanding the relative performance of these methods is crucial for researchers investigating trait-FC relationships, particularly in the context of motion-related artifacts that can lead to either overestimation or underestimation of true effects [1]. This comparison guide synthesizes evidence from multiple empirical evaluations to provide an objective assessment of each method's efficacy, reliability, and suitability for different research contexts.

Methodological Approaches

Experimental Protocols for Method Evaluation

Studies evaluating denoising methods typically employ standardized benchmarks to assess performance across multiple dimensions. The most common evaluation framework involves comparing denoising pipelines according to several quality control metrics [21] [66]:

Residual motion-connectivity relationship: Measures the remaining correlation between head motion (e.g., framewise displacement) and functional connectivity after denoising, with lower values indicating better performance [21] [66].
Distance-dependent artifact assessment: Evaluates whether motion artifacts disproportionately affect short-range versus long-range connections, which can systematically bias network topology [66].
Network identifiability: Assesses how clearly known resting-state networks can be identified after denoising, often quantified by calculating the difference between within-network and between-network connectivity [68] [66].
Test-retest reliability: Measures the reproducibility of connectivity metrics across repeated scanning sessions [68] [21].
Temporal degrees of freedom (tDOF) loss: Quantifies the reduction in statistical power due to the removal of time points or the addition of nuisance regressors [68] [67].
Biological sensitivity: Evaluates the method's impact on detecting clinically or biologically relevant group differences in functional connectivity [21].

These benchmarks are typically applied across multiple datasets with varying motion characteristics and participant populations, including healthy controls, clinical samples, and different age groups to ensure generalizability [21] [67].
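Two of these benchmarks, the residual motion-connectivity relationship and its distance dependence, reduce to simple correlations. The sketch below computes them on synthetic data in which motion inflates short-range and deflates long-range connectivity; the function names, sample sizes, and artifact model are assumptions chosen for illustration.

```python
import numpy as np

def qc_fc(mean_fd, fc_edges):
    """Pearson correlation, across participants, between mean FD and each FC edge.

    mean_fd: (n_subjects,); fc_edges: (n_subjects, n_edges).
    """
    mean_fd = np.asarray(mean_fd, dtype=float)
    fc_edges = np.asarray(fc_edges, dtype=float)
    fd_z = (mean_fd - mean_fd.mean()) / mean_fd.std()
    fc_z = (fc_edges - fc_edges.mean(axis=0)) / fc_edges.std(axis=0)
    return fd_z @ fc_z / len(mean_fd)

def distance_dependence(qcfc, edge_distance_mm):
    """Correlation between QC-FC values and inter-regional distance."""
    return float(np.corrcoef(qcfc, edge_distance_mm)[0, 1])

rng = np.random.default_rng(0)
n_subjects, n_edges = 200, 50
mean_fd = rng.gamma(shape=2.0, scale=0.1, size=n_subjects)
distance = rng.uniform(10, 150, size=n_edges)
# Synthetic artifact: motion raises short-range FC and lowers long-range FC.
slope = (80.0 - distance) / 40.0
fc = (mean_fd[:, None] * slope[None, :]
      + 0.1 * rng.standard_normal((n_subjects, n_edges)))

qcfc = qc_fc(mean_fd, fc)
dd = distance_dependence(qcfc, distance)
```

A strongly negative `dd` reproduces the characteristic pattern in which QC-FC correlations fall as the distance between regions grows.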

Key Research Reagents and Tools

Table 1: Essential Research Tools for fMRI Denoising Method Implementation

Tool Name	Primary Function	Implementation Details
ICA-AROMA	Automatic identification and removal of motion-related ICA components	Uses four features: edge fraction, CSF fraction, high-frequency content, and correlation with motion parameters [67].
aCompCor	Noise correction via PCA signals from WM/CSF masks	Derives noise estimates from anatomical ROIs; requires high-quality tissue segmentation [65] [66].
Volume Censoring	Removal of high-motion volumes from analysis	Typically uses framewise displacement (FD) threshold (e.g., FD > 0.2-0.5mm); may involve interpolation [21] [66].
Framewise Displacement (FD)	Quantifies volume-to-volume head motion	Calculated from translational and rotational realignment parameters; critical for censoring decisions [1] [21].
Global Signal Regression (GSR)	Removal of whole-brain average signal	Often combined with other methods; controversial due to potential removal of neural signal [65] [21].

Performance Comparison

Quantitative Efficacy Metrics

Table 2: Comparative Performance Across Key Benchmarks

Performance Metric	ICA-AROMA	aCompCor	Volume Censoring
Motion Artifact Removal	High efficacy; minimizes motion-connectivity relationships [67] [21]	Moderate efficacy; may leave residual motion artifacts in high-motion data [21] [66]	High efficacy; particularly effective for sudden motion spikes [21] [66]
Network Identifiability	Improved network reproducibility and identifiability [68] [67]	Variable performance; may reduce network contrast in high-motion data [66]	Good network identifiability when sufficient data remains [21]
Distance-Dependence	Moderate distance-dependent effects [66]	Minimal distance-dependent effects when used without GSR [66]	Minimal distance-dependent effects [66]
tDOF Loss	Limited loss (preserves temporal structure) [67]	Limited loss (uses regressors rather than removing data) [66]	Substantial and variable loss (directly removes volumes) [21] [67]
Biological Sensitivity	Preserves age-related FC differences in aging studies [65]	Associated with higher age-related FC differences [65]	Can bias samples by excluding high-motion individuals [21]

Differential Impact on Trait-FC Effects

The choice of denoising method has profound implications for trait-FC research, particularly regarding motion-induced biases. Recent research introducing the Motion Impact Score and SHAMAN framework has highlighted how residual motion artifacts can lead to both overestimation and underestimation of trait-FC effects [1].

In studies of participants with traits correlated with motion (e.g., certain psychiatric disorders or developmental conditions), ICA-AROMA offers a balanced approach, reducing motion artifacts while preserving neuronal signals [67]. The aggressive variant of ICA-AROMA has shown particularly high network reproducibility in older adult populations, making it suitable for longitudinal studies of aging and neurodegeneration [68].

aCompCor shows variable performance depending on motion characteristics. While effective in low-motion data, it may leave residual artifacts in high-motion datasets [21] [66]. Notably, aCompCor is associated with relatively higher age-related FC differences, though whether this reflects better preservation of neuronal signals or residual artifact remains debated [65].

Volume censoring is highly effective at removing motion artifacts but introduces significant methodological challenges. It reduces the number of usable frames per participant, and participants left with too little data are often excluded altogether, which can systematically bias sample composition and reduce statistical power [21]. Censoring at FD < 0.2 mm has been shown to reduce motion overestimation effects significantly (from 42% to 2% of traits) but does not decrease motion underestimation effects [1].

The following diagram illustrates the decision framework for selecting an appropriate denoising method based on research goals and data characteristics:

Discussion

Contextual Recommendations for Method Selection

The comparative evidence indicates that no single denoising method achieves perfect performance across all benchmarks, necessitating careful selection based on research context [21].

ICA-AROMA represents a favorable balance for most general research applications, particularly when studying populations with moderate motion. Its strong motion removal capabilities combined with preserved tDOF make it statistically efficient [67]. The method's automatic classification without need for retraining across datasets enhances reproducibility and practical implementation [68] [67]. For aging studies and longitudinal designs, the aggressive variant of ICA-AROMA has demonstrated superior network reproducibility [68].

aCompCor may be preferable in studies where distance-dependent artifacts are a primary concern or when investigating biological traits strongly correlated with motion [66]. However, researchers should exercise caution when applying aCompCor in high-motion datasets, as its performance may be insufficient to fully mitigate motion artifacts [21]. The method's association with higher age-related FC differences suggests potential utility in aging research, though the neuronal versus artifactual nature of these differences requires careful interpretation [65].

Volume censoring remains a powerful approach for removing severe motion artifacts, particularly sudden motion spikes [21]. However, its substantial and variable reduction in tDOF, coupled with the potential for systematic exclusion of high-motion participants, limits its utility in many research contexts [1] [21]. Volume censoring is most appropriately deployed when data quantity is sufficient to withstand substantial frame removal without compromising statistical power, or when used in combination with other methods for severe motion cases.
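The variability in tDOF loss can be made concrete with a simplified accounting in which the remaining degrees of freedom equal the frames that survive censoring minus the number of nuisance regressors. This ignores other losses, such as temporal filtering, and the frame counts below are hypothetical.

```python
def remaining_tdof(n_frames, n_censored, n_nuisance_regressors):
    """Simplified accounting: frames left after censoring minus nuisance regressors."""
    return n_frames - n_censored - n_nuisance_regressors

# Two hypothetical participants scanned for 300 frames with 24 motion regressors.
low_mover = remaining_tdof(n_frames=300, n_censored=10, n_nuisance_regressors=24)
high_mover = remaining_tdof(n_frames=300, n_censored=120, n_nuisance_regressors=24)
```

Under these assumptions the low-motion participant retains 266 degrees of freedom and the high-motion participant 156, so connectivity estimates for the two are not equally precise.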

Implications for Trait-FC Research

The motion overestimation versus underestimation framework highlights critical considerations for trait-FC research [1]. Methods that more aggressively remove motion artifacts may simultaneously remove neural signals of interest, potentially leading to underestimation of true trait-FC relationships [65] [1]. Conversely, methods that preserve more neural signal may leave residual motion artifacts, potentially causing overestimation of effects in motion-correlated traits [1].

This trade-off necessitates careful matching of denoising approaches to specific research questions. For traits known to correlate with motion (e.g., ADHD, autism), more aggressive denoising with methods like ICA-AROMA or volume censoring may be necessary to avoid spurious findings [1] [67]. For traits unrelated to motion, less aggressive approaches may better preserve true biological signals.

Future methodological development should focus on optimizing the balance between artifact removal and signal preservation, potentially through combination approaches or context-specific parameter tuning. Additionally, standardized reporting of denoising methodologies and motion diagnostics will enhance comparability across studies and facilitate more accurate interpretation of trait-FC relationships.

The Sensitivity of Case-Control Differences to Preprocessing Choices

Case-control studies represent a fundamental methodological approach in biomedical research, particularly for investigating rare diseases or establishing initial associations between risk factors and outcomes. The core of this design involves comparing patients with a disease or outcome of interest (cases) to patients without the disease (controls), looking back retrospectively to compare exposure to risk factors [69]. While this design offers significant advantages for studying conditions that would be impractical to investigate through cohort studies, its validity hinges on appropriate methodological execution, particularly during preprocessing stages where cases and controls are selected and matched.

The sensitivity of case-control differences to preprocessing choices becomes especially critical when studying traits associated with motion, such as psychiatric disorders, where head motion can systematically bias functional connectivity measurements [1]. This article explores how preprocessing decisions—from denoising algorithms to motion censoring thresholds—can lead to either overestimation or underestimation of trait-functional connectivity effects, potentially generating spurious findings or masking genuine biological relationships. We frame this discussion within the broader thesis of motion impact assessment in functional connectivity research, providing researchers with practical guidance for optimizing their case-control preprocessing pipelines.

Theoretical Foundations of Case-Control Studies

Core Principles and Design Considerations

Case-control studies are observational investigations that identify subjects based on their outcome status rather than exposure status. Cases are individuals with the disease or condition of interest, while controls are individuals from the same source population who do not have the disease [70]. The critical design principle is that controls should represent the population that gave rise to the cases: had a control developed the disease, that person would have been included as a case [70]. This fundamental concept guides all preprocessing decisions in case-control matching.

The key advantage of case-control designs lies in their efficiency for studying rare conditions, as researchers need not follow large cohorts over extended periods to observe sufficient outcome events [69]. Additionally, this design allows for simultaneous investigation of multiple risk factors and can establish initial associations that inform subsequent research. However, the retrospective nature introduces vulnerabilities, particularly recall bias and confounding, which preprocessing strategies must address [69].

Common Pitfalls in Control Selection

Improper control selection represents one of the most significant threats to case-control validity. Common issues include:

Population mismatch: Controls that do not represent the source population of cases, such as comparing cases with a mean age of 73 years to controls with a mean age of 26 years [70]
Overmatching: Matching cases and controls on too many characteristics, which can obscure genuine associations
Inadequate description: Failure to thoroughly document how both cases and controls were selected [70]
Exposure-dependent sampling: Selecting controls based on characteristics correlated with exposure status

These pitfalls are particularly problematic in neuroimaging research, where motion-prone populations (e.g., children, older adults, psychiatric patients) may systematically differ from controls on the very variables that introduce artifact into imaging data [1].

The Motion Artifact Challenge in Functional Connectivity Research

Systematic Bias from In-Scanner Head Motion

In-scanner head motion represents the largest source of artifact in functional MRI signals, introducing systematic bias to resting-state functional connectivity (FC) measurements that persists despite denoising algorithms [1] [3]. This bias manifests spatially in characteristic patterns, notably decreasing long-distance connectivity while increasing short-range connectivity, most prominently in the default mode network [1].

The problem is particularly acute in case-control studies of populations with inherent motion differences, such as children with neurodevelopmental disorders versus typically developing controls. Early studies spuriously attributed decreased long-distance FC in autism to neural mechanisms when the effects were actually driven by increased head motion in autistic participants [1]. This cautionary example underscores how motion artifacts can generate false positive findings that misdirect scientific understanding.

Quantifying Motion Impact: The SHAMAN Framework

The Split Half Analysis of Motion Associated Networks (SHAMAN) framework was developed to assign a motion impact score to specific trait-FC relationships [1] [3]. SHAMAN capitalizes on the observation that traits (e.g., weight, intelligence) remain stable over the timescale of an MRI scan, while motion varies from second to second. The method measures differences in correlation structure between split high- and low-motion halves of each participant's fMRI timeseries [1].

SHAMAN distinguishes between two types of motion impact:

Motion overestimation score: Motion impact aligned with the direction of trait-FC effects, causing inflation of effect sizes
Motion underestimation score: Motion impact opposite the direction of trait-FC effects, causing masking of genuine relationships

In assessments of 45 traits from n=7,270 participants in the Adolescent Brain Cognitive Development (ABCD) Study, SHAMAN revealed that after standard denoising without motion censoring, 42% (19/45) of traits had significant (p<0.05) motion overestimation scores and 38% (17/45) had significant underestimation scores [1] [3].

Preprocessing Choices and Their Impact on Case-Control Differences

Motion Denoising Strategies

Multiple approaches have been developed to mitigate motion artifact in functional connectivity data, including global signal regression, motion parameter regression, spectral filtering, respiratory filtering, principal component analysis, independent component analysis, multi-echo pulse sequences, and despiking of high-motion frames [1]. The ABCD-BIDS pipeline incorporates many of these approaches, including global signal regression, respiratory filtering, spectral filtering, despiking, and motion parameter timeseries regression [1].

Table 1: Effectiveness of Denoising Strategies on Motion Artifact Reduction

Processing Approach	Variance Explained by Motion	Relative Reduction vs. Minimal Processing
Minimal processing (motion correction only)	73%	Baseline
ABCD-BIDS denoising	23%	69%
ABCD-BIDS + censoring (FD < 0.2mm)	Not reported	Additional reduction

Despite sophisticated denoising, residual motion artifact persists. After minimal processing (motion correction only), 73% of signal variance was explained by head motion. ABCD-BIDS denoising achieved a 69% relative reduction, leaving 23% of variance still explained by motion [1]. This residual artifact continues to systematically influence case-control differences, particularly for motion-correlated traits.
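The 69% figure follows directly from the two reported percentages:

```latex
\[
\text{relative reduction}
  = \frac{73\% - 23\%}{73\%}
  = \frac{50}{73}
  \approx 0.685
  \approx 69\%
\]
```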

Motion Censoring Thresholds

Censoring (removing high-motion frames from analysis) represents a complementary approach to denoising. The framewise displacement (FD) threshold chosen for censoring represents a critical preprocessing choice with profound implications for case-control differences.

Table 2: Impact of Censoring Threshold on Motion Overestimation and Underestimation

Censoring Threshold	Traits with Significant Overestimation	Traits with Significant Underestimation
No censoring	42% (19/45 traits)	38% (17/45 traits)
FD < 0.2mm	2% (1/45 traits)	38% (17/45 traits)

Censoring at FD < 0.2mm dramatically reduced significant overestimation from 42% to just 2% of traits. However, it did not decrease the number of traits with significant motion underestimation scores, which remained at 38% [1] [3]. This differential impact highlights the complex relationship between preprocessing choices and directional bias in case-control differences.

Case-Control Matching Approaches

The algorithm used for case-control matching represents another critical preprocessing choice. Two primary approaches exist:

Greedy matching: Randomly sorts cases and controls, matching the first case with the closest control using the smallest distance measure, repeating until all cases are matched
Optimal matching: Produces the optimal set of matches by minimizing the total distance across all case-control pairs [71]

Optimal matching guarantees that the total distance across all matched pairs is minimized. Greedy matching assigns each case its closest available control in turn, which produces good matches but does not guarantee a minimal total distance [71]. Additionally, the decision to sample controls with or without replacement affects statistical efficiency; sampling without replacement is preferred when a large control pool is available [71].
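The difference between the two algorithms is easiest to see on a small distance matrix. In the sketch below, the greedy routine processes cases in the order given (rather than in random order) and the optimal routine uses exhaustive search, which is only practical for tiny inputs; both simplifications, and the function names, are assumptions for illustration.

```python
from itertools import permutations

def greedy_match(dist):
    """Match each case, in order, to its closest unused control."""
    used, pairs = set(), []
    for i, row in enumerate(dist):
        j = min((j for j in range(len(row)) if j not in used), key=lambda j: row[j])
        used.add(j)
        pairs.append((i, j))
    return pairs

def optimal_match(dist):
    """Exhaustively find the assignment with minimal total distance (small inputs only)."""
    n_cases, n_controls = len(dist), len(dist[0])
    best = min(
        permutations(range(n_controls), n_cases),
        key=lambda perm: sum(dist[i][j] for i, j in enumerate(perm)),
    )
    return list(enumerate(best))

def total_distance(dist, pairs):
    return sum(dist[i][j] for i, j in pairs)

# Rows are cases, columns are candidate controls.
dist = [
    [1, 2, 9],
    [1, 8, 9],
]
greedy = greedy_match(dist)
optimal = optimal_match(dist)
greedy_total = total_distance(dist, greedy)
optimal_total = total_distance(dist, optimal)
```

Here the greedy pass gives the first case its nearest control and leaves the second case with a poor match (total distance 9), whereas the optimal assignment accepts a slightly worse match for the first case to reach a total of 3.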

Experimental Protocols for Assessing Preprocessing Impact

The SHAMAN Methodology

The SHAMAN framework provides a rigorous approach for quantifying motion's impact on specific trait-FC relationships:

Data Acquisition: Collect resting-state fMRI data with associated motion parameters (framewise displacement) and trait measurements
Data Splitting: For each participant, split the fMRI timeseries into high-motion and low-motion halves based on median framewise displacement
Connectivity Calculation: Compute functional connectivity matrices separately for high-motion and low-motion halves
Trait-FC Effect Estimation: Calculate trait-FC effects separately for high-motion and low-motion halves using regression models
Motion Impact Score Calculation: Compare trait-FC effects between high-motion and low-motion halves across participants
Statistical Testing: Use permutation testing and non-parametric combining across pairwise connections to derive significance values for motion impact scores [1]

This protocol enables researchers to determine whether specific trait-FC relationships in their case-control studies are vulnerable to motion-related bias and the direction of that bias.
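The splitting, connectivity, and effect-estimation steps above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the published SHAMAN code: each participant is assumed to have a frames-by-regions timeseries and a per-frame FD vector, the trait-FC effect is taken to be a simple OLS slope per connection without covariates, and all function names are hypothetical.

```python
import numpy as np

def split_half_fc(ts, fd):
    """Vectorized upper-triangle FC for the low- and high-motion halves of one participant."""
    low = fd <= np.median(fd)                  # frames at or below the median FD
    iu = np.triu_indices(ts.shape[1], k=1)
    fc_low = np.corrcoef(ts[low].T)[iu]
    fc_high = np.corrcoef(ts[~low].T)[iu]
    return fc_low, fc_high

def trait_fc_slopes(fc, trait):
    """OLS slope of each connection on the trait across participants; fc is (n_subj, n_edges)."""
    t = trait - trait.mean()
    return t @ (fc - fc.mean(axis=0)) / (t @ t)

def motion_impact(fc_low, fc_high, trait):
    """Per-connection difference between high-motion and low-motion trait-FC effects."""
    return trait_fc_slopes(fc_high, trait) - trait_fc_slopes(fc_low, trait)

# Synthetic data purely for illustration (no real signal or motion structure).
rng = np.random.default_rng(0)
n_subj, n_frames, n_regions = 20, 100, 4
trait = rng.normal(size=n_subj)
halves = [split_half_fc(rng.normal(size=(n_frames, n_regions)), rng.random(n_frames))
          for _ in range(n_subj)]
fc_low = np.array([h[0] for h in halves])
fc_high = np.array([h[1] for h in halves])
impact = motion_impact(fc_low, fc_high, trait)   # one value per connection
```

Because the slope is linear in the connectivity values, the difference of the two slopes equals the slope of the within-participant difference (high minus low), so the comparison is effectively a paired one.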

Optimal Case-Control Matching Protocol

For robust case-control matching in observational studies:

Define Matching Variables: Identify exact matching variables (e.g., gender, practice) and varying matching variables (e.g., age, comorbidity index)
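Calculate Distance Metric: Compute the distance between each case i and each potential control j as a weighted sum of absolute differences over the matching variables k: $D_{ij} = \sum_{k} W_k \,\lvert x^{1}_{ik} - x^{0}_{jk} \rvert$, where the superscripts 1 and 0 denote cases and controls and $W_k$ is the weight assigned to variable k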
Implement Matching Algorithm: Apply optimal matching algorithm to minimize total distance across all case-control pairs
Assess Balance: Evaluate distribution of matching variables between cases and controls post-matching
Sensitivity Analysis: Conduct multiple matching scenarios with different variable configurations to assess robustness of findings [71]

This protocol emphasizes the importance of selecting controls from the same source population as cases and using optimal rather than greedy matching to maximize statistical efficiency.
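A sketch of the distance and balance steps of this protocol, assuming numeric matching variables stored as NumPy arrays. Exact-matching variables are enforced here by setting the distance to infinity when they differ, and balance is summarized with a standardized mean difference. Both are common conventions chosen for illustration, not necessarily those of [71], and the variable values and weights are invented.

```python
import numpy as np

def weighted_distance(cases, controls, weights, exact_cases=None, exact_controls=None):
    """D[i, j] = sum_k W_k * |x_case[i, k] - x_control[j, k]|; inf where exact variables differ."""
    diff = np.abs(cases[:, None, :] - controls[None, :, :])
    dist = (diff * weights).sum(axis=2)
    if exact_cases is not None:
        mismatch = (exact_cases[:, None, :] != exact_controls[None, :, :]).any(axis=2)
        dist = np.where(mismatch, np.inf, dist)
    return dist

def standardized_mean_difference(x_cases, x_controls):
    """Difference in means divided by the pooled standard deviation."""
    pooled_sd = np.sqrt((x_cases.var(ddof=1) + x_controls.var(ddof=1)) / 2)
    return (x_cases.mean() - x_controls.mean()) / pooled_sd

# Hypothetical data: columns are age and a comorbidity index; gender is matched exactly.
cases = np.array([[40.0, 2.0], [55.0, 0.0]])
controls = np.array([[42.0, 2.0], [55.0, 1.0], [40.0, 5.0]])
weights = np.array([1.0, 2.0])
gender_cases = np.array([[0], [1]])
gender_controls = np.array([[0], [1], [1]])

dist = weighted_distance(cases, controls, weights, gender_cases, gender_controls)
# Balance on age if case 0 were paired with control 0 and case 1 with control 1.
smd_age = standardized_mean_difference(cases[:, 0], controls[[0, 1], 0])
```

Rerunning the same functions with different weights or a different set of exact variables is one simple way to carry out the sensitivity analysis in the final step.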

Visualizing Preprocessing Workflows and Motion Impact

Case-Control Matching with Motion Impact Assessment

Figure 1: Integrated workflow for case-control studies with motion impact assessment

Motion Impact on Trait-FC Effects

Figure 2: Pathways through which motion artifact influences trait-FC effect estimation

Research Reagent Solutions for Motion-Resilient Case-Control Studies

Table 3: Essential Tools for Motion-Impact-Aware Case-Control Studies

Research Tool	Function	Implementation Examples
SHAMAN Framework	Quantifies motion impact on specific trait-FC relationships	Split-half analysis of high-motion vs. low-motion frames; motion overestimation/underestimation scores
Optimal Matching Algorithms	Maximizes case-control similarity on confounding variables	ccoptimalmatch R package; distance minimization without replacement
Motion Denoising Pipelines	Reduces motion-related artifact in fMRI data	ABCD-BIDS pipeline; global signal regression; motion parameter regression
Motion Censoring Approaches	Removes high-motion frames from analysis	Framewise displacement (FD) thresholds; volume censoring
Quality Control Metrics	Identifies low-quality data requiring exclusion	Framewise displacement; DVARS

Preprocessing choices in case-control studies exert profound influence on resulting trait-FC differences, particularly through motion-related artifacts that can systematically bias findings. The tension between removing motion-contaminated data to reduce false positives and retaining sufficient data to avoid selection bias requires careful consideration of denoising strategies, censoring thresholds, and matching approaches [1].

The SHAMAN framework provides a methodological advance by quantifying motion impact directionality—distinguishing between overestimation and underestimation of effects—allowing researchers to determine whether their specific trait-FC relationships are vulnerable to motion artifacts [1] [3]. Combined with optimal case-control matching approaches that maximize similarity between groups while maintaining statistical efficiency [71], these methods enable more rigorous and reproducible case-control comparisons in functional connectivity research.

For researchers studying motion-correlated traits, particularly in psychiatric neuroscience, incorporating motion impact assessment directly into preprocessing workflows represents a critical step toward distinguishing genuine neurobiological relationships from methodological artifacts. Future methodological development should focus on integrated approaches that simultaneously optimize case-control matching and motion artifact mitigation while maintaining statistical power and biological interpretability.

Abstract

Functional connectivity (FC) has emerged as a potential "fingerprint" capable of identifying individuals from a population [72]. However, in-scanner head motion and physiological noise induce systematic biases in FC, posing a critical challenge for trait-based research [20] [12]. This guide compares methodologies for achieving subject identifiability, focusing on their efficacy in separating neural signals from motion artifacts. We frame this within the critical context of motion overestimation and underestimation in trait-FC effects, providing researchers with objective data and protocols to navigate this complex landscape.

1. The Promise and The Peril: Identifiability and Motion Artifacts

The foundational work by Finn et al. (2015) demonstrated that whole-brain functional connectivity profiles act as a reliable fingerprint, achieving up to 99% identification accuracy between resting-state scans [72] [73]. Notably, frontoparietal and default mode networks were found to be most distinctive for identification [72] [73]. This individual fingerprint is intrinsic, persisting across different brain states, including between rest and task conditions [72].

However, this promising individuating signal is confounded by non-neural physiological processes and head motion. These artifacts are also highly subject-specific and can themselves yield above-chance identifiability [12]. For instance, head motion systematically alters FC, typically reducing long-distance connections and increasing short-range ones [20]. When studying traits correlated with motion propensity (e.g., certain psychiatric disorders), this can bias trait-FC estimates in two directions:

Motion Overestimation: Residual motion artifact causes an overestimation of a trait-FC relationship.
Motion Underestimation: Artifact leads to an underestimation of the true trait-FC effect [20] [3].

A study of 45 traits in the ABCD dataset found that after standard denoising, 42% of traits had significant motion overestimation scores, and 38% had significant underestimation scores [20]. This underscores the pervasive risk of motion in confounding brain-behavior associations.

2. Quantitative Comparison of Identification Performance and Motion Impact

The table below summarizes key performance metrics for different identification approaches and the measured impact of motion artifact.

Table 1: Performance Metrics of Identification Methods & Motion Impact

Metric	Finn et al. (2015) - Original Framework [72]	CVAE with SDL - Refined Connectomes [73]	Motion Impact (SHAMAN Analysis) [20]
Rest1-Rest2 ID Accuracy	94.4% - 99% (depending on networks used)	99.6% - 99.7%	Not Applicable
Task-Task ID Accuracy	54% - 87.3%	94.2% - 98.8%	Not Applicable
Most Discriminative Networks	Medial Frontal & Frontoparietal	Frontoparietal & Default	Not Applicable
Traits with Motion Overestimation	Not Assessed	Not Assessed	42% (19/45 traits)
Traits with Motion Underestimation	Not Assessed	Not Assessed	38% (17/45 traits)
Overestimation after Censoring (FD < 0.2 mm)	Not Assessed	Not Assessed	Reduced to 2% (1/45 traits)

3. Core Methodologies and Experimental Protocols

3.1 The Foundational Identification Protocol

The standard protocol for establishing subject identifiability involves a cross-session matching process [72] [74].

Data Acquisition & Preprocessing: Acquire multiple fMRI sessions (e.g., rest, tasks) per subject. Preprocess data with motion correction, normalization, and nuisance regression (e.g., global signal, motion parameters).
Connectome Generation: Parcellate the brain into defined regions (e.g., 268-node atlas [72]). For each session, calculate a functional connectivity matrix using Pearson correlation between the time series of all node pairs.
Cross-Session Identification: Iteratively compare a "target" connectivity matrix from one session against a "database" of matrices from a different session. The similarity is typically measured using the Pearson correlation between the vectorized edges of the matrices. A match is declared if the highest similarity is with the matrix from the same subject [72].

The following diagram illustrates this core workflow and the underlying neural-vs-artifact question.

Diagram 1: Core identification workflow and signal composition.
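The cross-session identification step can be sketched in a few lines. This is an illustrative reading of the protocol above, not the original authors' code: connectivity matrices are assumed to be vectorized to their upper-triangle edges, row i of each session is assumed to belong to the same subject, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def vectorize(fc):
    """Upper-triangle edges of a symmetric connectivity matrix."""
    iu = np.triu_indices(fc.shape[0], k=1)
    return fc[iu]

def identify(targets, database):
    """For each target (row), return the index of the most similar database row (Pearson r)."""
    n = targets.shape[0]
    sim = np.corrcoef(targets, database)[:n, n:]   # target-by-database correlations
    return sim.argmax(axis=1)

def identification_accuracy(targets, database):
    """Fraction of targets whose best match is the same subject (same row index)."""
    pred = identify(targets, database)
    return float((pred == np.arange(len(pred))).mean())

# Synthetic example: a stable per-subject edge pattern plus session-specific noise.
rng = np.random.default_rng(0)
signature = rng.normal(size=(5, 15))               # 5 subjects, 15 edges
session1 = signature + 0.1 * rng.normal(size=signature.shape)
session2 = signature + 0.1 * rng.normal(size=signature.shape)
accuracy = identification_accuracy(session1, session2)
```

Note that nothing in this procedure distinguishes neural from artifactual similarity: if subject-specific motion leaves a stable imprint on the edges, it contributes to the match just as a neural signature would.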

3.2 The SHAMAN Protocol for Quantifying Motion Impact

The Split Half Analysis of Motion Associated Networks (SHAMAN) is a recently developed method to assign a motion impact score to specific trait-FC relationships [20].

Data Preparation: Start with preprocessed fMRI time series. Split each subject's time series into high-motion and low-motion halves based on framewise displacement (FD).
Trait-FC Effect Estimation: Calculate the trait-FC effect (e.g., the correlation between a behavioral trait and the strength of each functional connection) separately for the high-motion and low-motion halves of the data.
Motion Impact Score Calculation: Compute the difference between the trait-FC effects from the high-motion and low-motion halves. A score aligned with the trait-FC effect's direction indicates motion overestimation; a score in the opposite direction indicates motion underestimation.
Statistical Significance: Use permutation testing and non-parametric combining across connections to determine if the motion impact score is statistically significant [20].

Diagram 2: SHAMAN workflow for motion impact scoring.
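The direction and significance steps can be illustrated with a deliberately simplified sketch. Several choices here are assumptions made for illustration and are not the published SHAMAN procedure: the reference trait-FC effect is taken from the average of the two halves, connections are combined by a plain mean of signed differences rather than formal non-parametric combining, and the null distribution is built by randomly swapping the high- and low-motion labels within each participant.

```python
import numpy as np

def slopes(y, trait):
    """OLS slope of each column of y on the trait across participants."""
    t = trait - trait.mean()
    return t @ (y - y.mean(axis=0)) / (t @ t)

def directional_score(fc_low, fc_high, trait):
    """Mean high-minus-low effect difference, signed relative to each edge's trait-FC effect."""
    effect = slopes((fc_low + fc_high) / 2, trait)   # reference direction (assumption)
    impact = slopes(fc_high - fc_low, trait)
    return float(np.mean(np.sign(effect) * impact))

def classify(score):
    if score > 0:
        return "overestimation"    # motion pushes the estimate further in the effect's direction
    if score < 0:
        return "underestimation"   # motion pushes the estimate against the effect's direction
    return "no directional impact"

def permutation_p(fc_low, fc_high, trait, n_perm=1000, seed=0):
    """Two-sided p-value from random within-participant swaps of the half labels."""
    rng = np.random.default_rng(seed)
    observed = directional_score(fc_low, fc_high, trait)
    count = 0
    for _ in range(n_perm):
        swap = rng.random(len(trait)) < 0.5
        lo = np.where(swap[:, None], fc_high, fc_low)
        hi = np.where(swap[:, None], fc_low, fc_high)
        if abs(directional_score(lo, hi, trait)) >= abs(observed):
            count += 1
    return observed, (count + 1) / (n_perm + 1)

# Toy data: the trait-FC effect has the same sign in both halves but is larger under high motion.
trait = np.arange(10.0)
fc_low = np.outer(trait, [1.0, -1.0])
fc_high = np.outer(trait, [2.0, -2.0])
score, p = permutation_p(fc_low, fc_high, trait, n_perm=500)
label = classify(score)
```

Swapping the two inputs flips the sign of the score, so the same toy data with the stronger effect in the low-motion half would be labelled underestimation.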

4. The Scientist's Toolkit: Key Reagents and Solutions

This table outlines essential methodological "reagents" for robust identifiability and artifact mitigation research.

Table 2: Research Reagent Solutions for fMRI Identifiability Studies

Tool / Solution	Function / Purpose	Key Considerations
High-Quality Dataset (HCP, ABCD)	Provides test-retest data with high spatial/temporal resolution and behavioral metrics essential for validation [72] [20].	HCP focuses on healthy adults; ABCD is a large-scale pediatric cohort.
Framewise Displacement (FD)	Quantifies volume-to-volume head motion, used for censoring and evaluating data quality [20].	A common censoring threshold is FD < 0.2 mm [20].
ICA-AROMA	A robust, ICA-based strategy for automatic removal of motion artifacts from fMRI data without reducing temporal degrees of freedom [75].	Preserves data autocorrelation structure; is generalizable across datasets without re-training.
Structured Low-Rank Matrix Completion	An advanced computational method to recover missing entries from censored fMRI data, mitigating discontinuities and improving FC estimates [76].	More computationally intensive than simple censoring; helps recover usable data.
Conditional Variational Autoencoder (CVAE)	A deep learning approach to refine connectomes by enhancing inter-subject variability, boosting both ID accuracy and behavior prediction [73].	Requires significant computational resources and expertise; represents a state-of-the-art processing technique.

5. Discussion and Synthesis

Achieving true subject identifiability requires carefully disentangling neural signatures from motion artifacts. While methods like CVAE with SDL can push identification accuracy near perfection [73], this high accuracy is only meaningful for neurobehavioral inference if motion confounds are controlled. The SHAMAN framework provides a crucial tool for diagnosing motion's spurious effects on specific trait-FC relationships, revealing a high prevalence of both overestimation and underestimation [20].

Aggressive motion censoring (e.g., FD < 0.2 mm) effectively mitigates overestimation but does not resolve underestimation and can bias samples by excluding high-motion individuals [20]. Therefore, a combination of rigorous denoising (e.g., ICA-AROMA [75]), structured matrix completion [76], and post-hoc motion impact assessment (e.g., SHAMAN [20]) is recommended for robust, interpretable subject identifiability and brain-behavior association studies.

Conclusion

The distinction between motion-induced overestimation and underestimation is paramount for the integrity of trait-FC research. Overestimation poses a clear threat of false positives, while underestimation can obscure genuine neurobiological relationships, particularly in clinical populations. Methodological advances like the SHAMAN framework provide crucial tools for diagnosing these specific biases. However, no single denoising pipeline offers a perfect solution; the optimal strategy depends on the specific research context and requires a careful balance between artifact removal and signal preservation. For the future of clinical neuroscience and drug development, these findings underscore the necessity of robust, transparent motion-handling protocols. Moving forward, the field must prioritize methods that account for trait-motion dependencies to ensure that neuroimaging biomarkers are reliable and valid for informing therapeutic development.