Google makes wearable health data easier to understand
- August 4, 2025
- Steve Rogerson

Google Research is working to make health data from wearables more understandable and actionable.
It has laid the groundwork for interpreting wearable sensor data through natural language, enabled by a hierarchical captioning pipeline and the largest sensor-language dataset to date.
The SensorLM family of models is said to represent a major advance in making personal health data understandable and actionable. By teaching AI to comprehend the language of the body, Google hopes to move beyond simple metrics towards truly personalised insights.
“Looking forward, we plan to scale pre-training data into new domains, including metabolic health and detailed sleep analysis, to address the messy reality of consumer health devices,” said Google researchers Yuzhe Yang and Kumar Ayush. “We envision SensorLM leading to a future generation of digital health coaches, clinical monitoring tools and personal wellness applications that can offer advice through natural language query, interaction and generation.”
SensorLM is a family of sensor-language foundation models trained on 60 million hours of data, connecting multimodal wearable sensor signals to natural language for a deeper understanding of health and activities.
Wearable devices, from smartwatches to fitness trackers, have become ubiquitous, continuously capturing a rich stream of data about people’s lives. They record heart rate, count steps, track fitness and sleep, and much more. This deluge of information holds immense potential for personalised health and wellness.
“However, while we can easily see what our body is doing, for example a heart rate of 150bpm, the crucial context of why (say, a brisk uphill run versus a stressful public speaking event) is often missing,” said the researchers. “This gap between raw sensor data and their real-world meaning has been a major barrier to unlocking the full potential of these devices.”
The primary challenge lies in the scarcity of large-scale datasets that pair sensor recordings with rich, descriptive text. Manually annotating millions of hours of data is prohibitively expensive and time-consuming. To solve this, and to let wearable data speak for itself, models are needed that can learn the intricate connections between sensor signals and human language directly from the data.
In “SensorLM: Learning the Language of Wearable Sensors” (arxiv.org/abs/2506.09108), the researchers present SensorLM, a family of sensor-language foundation models that bridges this gap. Pre-trained on 59.7 million hours of multimodal sensor data from over 103,000 individuals, SensorLM learns to interpret and generate nuanced, human-readable descriptions from high-dimensional wearable data.
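The article does not spell out the training objective, but a common way to connect sensor signals with natural language is to align a sensor encoder and a text encoder in a shared embedding space with a CLIP-style contrastive loss. The sketch below is illustrative only: `TinySensorEncoder`, the embedding dimension and the batch of random tensors are placeholders, not SensorLM's architecture or data.

```python
# A minimal sketch, assuming CLIP-style contrastive alignment between sensor
# windows and their text captions; encoders and dimensions are placeholders.
import torch
import torch.nn.functional as F

class TinySensorEncoder(torch.nn.Module):
    """Toy encoder mapping a multichannel sensor window to a unit-norm embedding."""
    def __init__(self, channels=2, dim=128):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Conv1d(channels, 64, kernel_size=5, padding=2),
            torch.nn.ReLU(),
            torch.nn.AdaptiveAvgPool1d(1),
            torch.nn.Flatten(),
            torch.nn.Linear(64, dim),
        )

    def forward(self, x):  # x: (batch, channels, time)
        return F.normalize(self.net(x), dim=-1)

def contrastive_loss(sensor_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired sensor/text embeddings."""
    logits = sensor_emb @ text_emb.T / temperature
    targets = torch.arange(len(logits))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Example: in practice text_emb would come from a text encoder over the captions.
sensor_emb = TinySensorEncoder()(torch.randn(8, 2, 1440))   # 8 windows, 2 channels, 1 day of minutes
text_emb = F.normalize(torch.randn(8, 128), dim=-1)
print(contrastive_loss(sensor_emb, text_emb).item())
```

Training on matched sensor-caption pairs this way pushes each window towards its own description and away from the others in the batch, which is what lets free-text queries retrieve or describe sensor data later on.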
To create the sensor dataset needed for SensorLM, the researchers sampled nearly 2.5 million person-days of de-identified data from 103,643 people across 127 countries. These data were collected between March and May 2024, from Fitbit or Pixel Watch devices, with participants consenting to the use of their de-identified data for research to contribute to general knowledge about health and science.
To overcome the annotation bottleneck, the researchers developed a hierarchical pipeline that automatically generates descriptive text captions by calculating statistics, identifying trends and describing events from the sensor data themselves. This process allowed them to curate the largest known sensor-language dataset to date, orders of magnitude larger than those used in previous studies; a simple sketch of the idea follows.
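To make the statistics-trends-events idea concrete, here is a minimal sketch of how such a hierarchical captioner could be built for a single day of minute-level heart-rate and step data. The thresholds, wording and the `caption_day` helper are illustrative assumptions, not Google's pipeline.

```python
# A minimal sketch of a hierarchical captioning pipeline: statistics, then a
# coarse trend, then notable events, assembled into one text caption.
import numpy as np

def caption_day(heart_rate_bpm: np.ndarray, steps_per_min: np.ndarray) -> str:
    # Layer 1: basic statistics over the day
    stats = (f"Mean heart rate {heart_rate_bpm.mean():.0f} bpm, "
             f"total steps {int(steps_per_min.sum())}.")

    # Layer 2: coarse trend, comparing the first and second half of the day
    half = len(heart_rate_bpm) // 2
    delta = heart_rate_bpm[half:].mean() - heart_rate_bpm[:half].mean()
    trend = ("Heart rate rose later in the day." if delta > 5 else
             "Heart rate fell later in the day." if delta < -5 else
             "Heart rate stayed roughly level.")

    # Layer 3: notable events, e.g. minutes of sustained brisk activity
    active = (heart_rate_bpm > 120) & (steps_per_min > 80)
    events = (f"About {int(active.sum())} minutes of brisk activity detected."
              if active.any() else
              "No sustained high-intensity activity detected.")

    return " ".join([stats, trend, events])

# Example with a synthetic day of minute-level data
rng = np.random.default_rng(0)
hr = rng.normal(75, 10, 1440).clip(50, 190)
steps = rng.poisson(20, 1440)
print(caption_day(hr, steps))
```

Applied across millions of person-days, rule-generated captions of this kind can pair every sensor window with text without any manual annotation, which is what makes pre-training at this scale feasible.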
“We evaluated SensorLM on a wide range of real-world tasks in human activity recognition and healthcare,” said the Google (research.google) researchers. “The results demonstrate significant advances over previous state-of-the-art models. Our experiments also revealed that SensorLM’s performance consistently improves with more data, larger model sizes and increased computation, aligning with established scaling laws. This sustained growth suggests we have only scratched the surface of what is possible with large-scale sensor-language pre-training, indicating that further investigation into this paradigm is highly valuable.”


