Why Missing Data Threatens Digital Health
Imagine if your smartphone could detect signs of depression before you could. This isn't science fiction—it's the promise of digital phenotyping, a revolutionary approach that uses smartphone sensors to study human behavior and health.
By analyzing data from GPS, accelerometers, and other sensors, researchers can gain unprecedented insights into our mental health, mobility patterns, and daily behaviors.
But what happens when this data is incomplete? New research reveals a troubling answer: the data missing from our digital footprints isn't random. It follows patterns influenced by sociodemographic factors and technology choices, creating invisible biases that could undermine the very promise of equitable digital medicine.
Digital phenotyping represents a groundbreaking shift in how researchers study human behavior and health.
Unlike traditional studies that rely on periodic surveys or clinical visits, digital phenotyping collects data continuously and unobtrusively as people go about their daily lives 1 .
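Common smartphone data streams and what they capture:

| Sensor | What it captures |
|---|---|
| GPS | Mobility patterns |
| Accelerometer | Physical activity |
| Microphone | Speech patterns |
| Usage logs | Social behavior |

Missing data in these streams comes in two forms. The first is planned: it occurs when researchers intentionally configure sensors to sample intermittently to preserve battery life 1 . The second is non-collection: data that should be collected isn't, due to behavioral factors (users disabling sensors) or technological limitations 1 .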
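To make the distinction between planned gaps and non-collection concrete, here is a minimal sketch of how the two might be separated using only timestamps. It assumes a hypothetical fixed duty cycle (sensor scheduled on for `on_s` seconds, then off for `off_s` seconds, starting at time zero). The function name, the schedule, and the sample timestamps are illustrative assumptions, not the procedure used in the study.

```python
def non_collection_rate(timestamps, on_s, off_s, total_s):
    """Fraction of scheduled-on windows that contain no measurement.

    Gaps during the scheduled-off part of each cycle are planned and are
    ignored; only empty scheduled-on windows count as non-collection.
    All arguments are in seconds since the start of observation.
    """
    cycle = on_s + off_s
    n_windows = int(total_s // cycle)  # count only complete cycles
    if n_windows == 0:
        raise ValueError("observation period is shorter than one cycle")

    observed = set()
    for t in timestamps:
        idx, offset = divmod(t, cycle)
        # A measurement counts only if it falls in a scheduled-on window.
        if idx < n_windows and offset < on_s:
            observed.add(int(idx))
    return 1 - len(observed) / n_windows


# Hypothetical hour of data: 1 minute on, 9 minutes off (6 windows).
# Windows 2 and 4 have no measurements, so 2 of 6 are non-collection.
sample = [5, 30, 610, 1805, 1850, 3010]
rate = non_collection_rate(sample, on_s=60, off_s=540, total_s=3600)
print(round(rate, 3))  # 0.333
```

Counting empty windows rather than missing samples keeps the measure independent of each sensor's sampling frequency, which is one reasonable design choice among several.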
This missing data isn't merely a technical inconvenience—it represents a fundamental challenge to research validity. When data gaps follow systematic patterns rather than occurring randomly, they can skew research findings and potentially reinforce health disparities if certain groups are systematically underrepresented in digital studies 1 .
To understand the scope and patterns of missing data in digital phenotyping, researchers conducted a comprehensive meta-study analyzing individual-level data from six different studies 1 .
The research team analyzed timestamp metadata from 211 participants, representing over 29,500 person-days of observation and 8.3 billion individual measurements from accelerometer and GPS sensors 1 .
iOS users had significantly lower GPS non-collection rates compared to Android users, suggesting platform-level differences in how sensors are managed or accessed 1 .
For accelerometer data, Black participants had higher rates of non-collection, while no significant differences were found for GPS data based on race/ethnicity 1 .
| Factor | Impact on GPS Data | Impact on Accelerometer Data |
|---|---|---|
| Operating System | iOS users had lower non-collection | Not specified |
| Race/Ethnicity | No significant differences | Black participants had higher non-collection |
| Education | No significant differences | No significant differences |
| Age | No significant differences | No significant differences |
| Gender | No significant differences | No significant differences |
| Study Duration | 0.5-0.9% increase per week | 0.5-0.9% increase per week |
For both sensors, non-collection increased steadily over time—by approximately 0.5% to 0.9% per week—suggesting participant engagement gradually declines throughout studies 1 .
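As a rough illustration of what a weekly drift of this size means over a typical study, the sketch below compounds the increase over several weeks. It assumes the 0.5% to 0.9% figure is a relative weekly increase applied to a baseline non-collection rate, which the summary above does not state explicitly. The 20% baseline and the 12-week duration are hypothetical values chosen only for the example.

```python
def projected_rate(baseline, weekly_increase, weeks):
    """Non-collection rate after `weeks`, assuming relative compounding."""
    return baseline * (1 + weekly_increase) ** weeks


baseline = 0.20  # hypothetical starting non-collection rate
weeks = 12       # hypothetical study length

for weekly_increase in (0.005, 0.009):
    final = projected_rate(baseline, weekly_increase, weeks)
    print(f"{weekly_increase:.1%} per week -> {final:.1%} after {weeks} weeks")
```

Under these assumptions the rate grows by roughly 6% to 11% of its starting value over twelve weeks, a small weekly change that still accumulates in longer studies.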
The systematic patterns in missing data have profound implications for the future of digital phenotyping and its applications in healthcare.
When data isn't missing randomly but follows sociodemographic patterns, research findings can become biased and unrepresentative. If certain groups systematically provide less data, algorithms trained on that data may perform poorly for those populations, potentially exacerbating health disparities rather than alleviating them 1 .
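The bias mechanism can be shown with a small simulation. This is a sketch under invented assumptions: two equally sized groups with different true average daily activity (arbitrary units) and different probabilities that a day of data goes missing. None of the numbers come from the study.

```python
import random


def simulate(n_per_group=5000, seed=0):
    """Return (true_mean, observed_mean) when missingness differs by group."""
    rng = random.Random(seed)
    # group -> (true mean activity, probability a day is missing); hypothetical
    groups = {"A": (60.0, 0.10), "B": (40.0, 0.40)}

    all_values, observed = [], []
    for mean, p_missing in groups.values():
        for _ in range(n_per_group):
            value = rng.gauss(mean, 10.0)
            all_values.append(value)
            if rng.random() >= p_missing:  # this day was actually collected
                observed.append(value)

    true_mean = sum(all_values) / len(all_values)
    observed_mean = sum(observed) / len(observed)
    return true_mean, observed_mean


true_mean, observed_mean = simulate()
print(f"true mean: {true_mean:.1f}, mean of collected data: {observed_mean:.1f}")
```

Because group B contributes fewer collected days, the pooled average drifts toward group A's value (about 52 rather than 50 in expectation with these settings), even though every individual measurement is accurate.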
These missing data patterns create what might be called "digital blind spots"—gaps in our understanding that systematically exclude certain populations or behaviors. Just as urban planners need complete maps to design effective transportation systems, healthcare researchers need complete digital footprints to build accurate models of human behavior and health 1 .
The finding that Black participants had higher accelerometer non-collection rates highlights how technology can potentially perpetuate existing inequities if these patterns aren't identified and addressed.
Conducting rigorous digital phenotyping research requires specialized tools and approaches to manage the unique challenges of smartphone data collection.
| Tool Category | Purpose | Examples |
|---|---|---|
| Research Platforms | Data collection from smartphones | Beiwe Research Platform 1 |
| Statistical Methods | Analyzing missing data patterns | Bayesian hierarchical negative binomial regression 1 |
| Data Visualization | Identifying patterns in complex datasets | StatFaRmer for plant phenotyping (analogous tools for human data) 4 |
| Privacy Protection | Managing identifiable timestamp data | Timestamp masking and noise addition techniques |
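The table names timestamp masking and noise addition without describing them, so the following is only one plausible sketch of the idea. It shifts all of a participant's timestamps by a single random offset, which hides absolute times while preserving the gaps between measurements, and optionally adds a small per-measurement jitter. The function name, parameters, and default values are assumptions made for illustration.

```python
import random


def mask_timestamps(timestamps, max_shift_s=7 * 24 * 3600, jitter_s=1.0, seed=None):
    """Return shifted, jittered copies of `timestamps` (seconds).

    One shared random shift masks the absolute time of each measurement;
    independent jitter of up to +/- `jitter_s` adds noise to each timestamp.
    """
    rng = random.Random(seed)
    shift = rng.uniform(-max_shift_s, max_shift_s)
    return [t + shift + rng.uniform(-jitter_s, jitter_s) for t in timestamps]


# Hypothetical Unix timestamps, 10 seconds apart.
original = [1_600_000_000.0, 1_600_000_010.0, 1_600_000_020.0]
masked = mask_timestamps(original, seed=42)
gaps = [b - a for a, b in zip(masked, masked[1:])]
print(gaps)  # each gap stays within 2 * jitter_s of the original 10 seconds
```

Keeping the gaps intact matters for missing-data analysis, since non-collection is inferred from the spacing between measurements, not from their absolute clock times.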
The discovery that missing data in digital phenotyping follows sociodemographic patterns represents both a challenge and an opportunity. It complicates the dream of perfectly objective digital biomarkers, but also pushes the field toward more sophisticated and equitable approaches.
As digital phenotyping evolves, researchers must develop strategies to address these gaps—whether through improved engagement techniques, statistical methods to account for missing data, or study designs that proactively include diverse populations.
The ultimate goal remains unchanged: harnessing the power of digital technology to understand human health in all its complexity, for all people.
The path forward requires acknowledging that our digital footprints, like our healthcare systems, must be continuously refined to ensure they serve everyone equally. Only then can digital phenotyping deliver on its promise of a healthier, better-understood world.
This article is based on the study "Sociodemographic characteristics of missing data in digital phenotyping," published in Scientific Reports (2021), and other recent research in the field.