More Personalization Didn't Help: A 2026 Pilot of an Adaptive Stress App

"Adaptive" and "personalized" are the words health-tech pitch decks reach for when they want to sound like the next generation of behavior change. The premise is intuitive. A system that learns your patterns and delivers the right intervention at the right moment should beat a static app that pings everyone at 9 AM. That premise is mostly untested in the products people actually ship.

A pilot published in JMIR mHealth and uHealth on May 18, 2026 put a piece of it to the test, and the result is a useful splash of cold water. When the researchers cranked up personalization, nothing measurable improved.

The study

Kunas and colleagues built an app called RELAX, designed around a just-in-time adaptive intervention (JITAI) framework for managing occupational stress. A JITAI is the formal version of the "right intervention, right moment" idea: the system uses real-time signals to decide whether and what to deliver at each decision point, rather than running a fixed schedule. The canonical formulation comes from Nahum-Shani and colleagues, who laid out the design components that a JITAI has to specify, including tailoring variables, decision points, and intervention options.

RELAX combined three inputs: ecological momentary assessments (short in-the-moment questionnaires), continuous heart rate variability from a Polar Verity Sense wearable, and standard stress questionnaires. When it detected a stress moment, it could offer something like a guided breathing exercise.

The trial ran 46 employees through two three-week phases. The second phase is where it gets interesting. The researchers split the group: a high-personalization condition, where the app adapted its intervention probabilities to each individual's feedback, and a low-personalization condition, where it only used aggregate group ratings. Same app, same interventions, different amount of individual tailoring. That's a clean test of whether the personalization itself buys you anything.
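To make the contrast concrete, here is a minimal Python sketch of one way the two conditions could differ. The study's actual algorithm is not described in this post, so everything in the sketch is an assumption: the softmax weighting, the `temperature` parameter, the intervention names, and the ratings are all hypothetical. The only idea taken from the study is that the high-personalization arm draws on the individual's own feedback while the low-personalization arm draws on group-level ratings.

```python
import math
from statistics import mean

def intervention_probabilities(ratings, temperature=1.0):
    """Turn mean helpfulness ratings into selection probabilities (softmax).

    `ratings` maps an intervention name to a list of ratings.
    Hypothetical scheme for illustration, not the RELAX algorithm.
    """
    scores = {name: mean(vals) / temperature for name, vals in ratings.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Made-up ratings on a 1-5 scale.
group_ratings = {"breathing": [4, 3, 4, 5], "walk": [3, 4, 3, 4], "stretch": [3, 3, 4, 3]}
user_ratings = {"breathing": [2, 1, 2], "walk": [5, 5, 4], "stretch": [3, 3, 3]}

# Low personalization: everyone gets probabilities from the group's ratings.
low_personalization = intervention_probabilities(group_ratings)
# High personalization: this user's own feedback drives the probabilities.
high_personalization = intervention_probabilities(user_ratings)
```

In this toy data the group favors the breathing exercise while the individual favors the walk, so the two conditions would offer this user different things. Whether that difference changes outcomes is the question the trial was testing.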

Three findings worth your attention

Personalization made no measurable difference. There were no significant differences between the high- and low-personalization groups. The version that adapted to each individual didn't beat the version running on group averages. For a field that treats "more personalized" as self-evidently better, that's a finding to take seriously, even from a small pilot.

I want to be careful here. A 46-person pilot split into two groups is underpowered to detect a small effect, so this is not proof that personalization is worthless. It's evidence that the personalization, as implemented, wasn't strong enough to show up over the noise. That distinction is the whole game. The cost of building per-user adaptation is high. If the version on group averages performs about the same in a pilot, the burden shifts to proving the expensive version earns its keep before you commit engineering to it.
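A rough back-of-the-envelope calculation shows what "underpowered" means here. The sketch below uses a normal approximation for a two-sided comparison of two group means. It assumes an even split of 23 per arm, which this post does not confirm, and it is not the analysis the authors ran. The function name and the effect sizes are mine.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.

    Uses the normal approximation, which slightly overstates power
    relative to an exact t-test at small sample sizes.
    `effect_size` is a standardized mean difference (Cohen's d).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Assumed: 46 participants split evenly, 23 per arm.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: power ≈ {approx_power(d, 23):.2f}")
```

Under these assumptions the approximation gives roughly 10 percent power for a small effect (d = 0.2) and roughly 40 percent for a medium one (d = 0.5). A real but modest personalization benefit would be missed most of the time at this sample size.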

The sensor data fell apart. Subjective questionnaire completion ran 85 to 100 percent across phases. Sensor data completion was 45 to 60 percent. People were willing to answer questions. The wearable, the thing the whole adaptive engine depends on, produced usable data only about half the time, which the authors attribute to technical barriers. A JITAI that decides when to intervene based on physiological signals is only as good as the signal. If the heart rate stream is missing 40 to 55 percent of the time, the "adaptive" part is frequently flying blind, falling back to whatever default it uses when the sensor is dark.
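Here is a minimal sketch of what an explicit fallback could look like at a single decision point. It is not how RELAX works. The field names, thresholds, and action labels are hypothetical, chosen only to show a rule that degrades from sensor to self-report to a default and records which source it used.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Moment:
    """Signals available at one decision point (all fields hypothetical)."""
    hrv_stress_score: Optional[float]    # None when the wearable gave no usable data
    self_reported_stress: Optional[int]  # e.g. 1-5 from an EMA prompt, None if unanswered

def decide(moment, hrv_threshold=0.7, ema_threshold=4):
    """Return (action, source) so that fallback decisions are visible in logs."""
    if moment.hrv_stress_score is not None:
        stressed = moment.hrv_stress_score >= hrv_threshold
        return ("offer_breathing" if stressed else "no_action"), "sensor"
    if moment.self_reported_stress is not None:
        stressed = moment.self_reported_stress >= ema_threshold
        return ("offer_breathing" if stressed else "no_action"), "self_report"
    # Neither signal is available: do nothing rather than guess.
    return "no_action", "default"

print(decide(Moment(hrv_stress_score=None, self_reported_stress=5)))
```

Returning the source alongside the action is the useful part. It lets you count how often the system was actually adapting to physiology and how often it was running on the fallback.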

One physiological marker moved the wrong way. Self-reported daily chronic stress went down (F₅,₁₆₅ = 2.89; P = .048) and arousal tied to the most recent stress event dropped (F₅,₁₅₀ = 4.52; P = .02). Those are the wins. But the low-frequency to high-frequency HRV ratio went up (F₅,₁₀₉ = 3.91; P = .03), which the authors read as increased physiological stress. So the subjective measures say people felt less stressed while a physiological measure pointed the other way. That mismatch is exactly the kind of thing you'd miss if you only instrumented self-report, and exactly the kind of thing that should make you cautious about declaring that a stress app "works."

The engagement angle people will overlook

There's a quieter result that I think is the most encouraging part of the study. The dropout rate was 8.7 percent. Four people left, all in the first two weeks, and the rest finished. For an intensive protocol with daily in-the-moment questionnaires and a wearable to put on, that retention is genuinely good. The digital health "law of attrition," the well-documented pattern where most users abandon an intervention long before completing it, didn't bite hard here.

Usability scores also climbed significantly between the midpoint and the end of the study (F₁,₄₅ = 24.80; P < .001). People got more comfortable with the app over time. But satisfaction drifted down at a trend level (F₁,₄₅ = 7.12; P = .05). So the picture is mixed in an honest way: people stuck around, they found it more usable, and they were slightly less satisfied by the end. That's a real product on real people, not a demo.

How this maps to behavior change theory

In COM-B terms, a JITAI is mostly an Opportunity and Capability play. It tries to catch the user in the physical and emotional moment (Opportunity) when a brief, low-effort intervention (Capability) can land. The theory is sound. The RELAX results expose where the theory meets implementation reality: if your read on the user's moment depends on a sensor that's missing half the time, your Opportunity targeting degrades to guesswork, and the elaborate personalization machinery on top of it has nothing reliable to personalize against.

This is the order-of-operations problem I keep running into with adaptive health products. Teams want to build the smart adaptation layer first because it's the differentiated, fundable part. But adaptation is downstream of measurement. If the measurement is shaky, more sophisticated adaptation just amplifies the noise. The pilot is a small but concrete reminder that the boring infrastructure work of getting the sensor to actually report comes before the clever part.

What I'd take into a product

Prove the measurement before you build the adaptation. Before any JITAI logic ships, instrument the data pipeline and find out what your real-world signal completion looks like. If a research team with a study protocol and a paid wearable got 45 to 60 percent sensor coverage, your consumer app handing people a cheaper device with no researcher chasing them will probably do worse. Design the fallback behavior for when the signal is missing, because it will be missing often.
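Measuring signal completion does not need anything elaborate. A sketch like the one below is enough to start, assuming you can count how many measurement windows you expected per user and how many produced usable data. The function name, the user IDs, the counts, and the 80 percent target are all hypothetical.

```python
def signal_completion(expected_windows, usable_windows):
    """Share of expected measurement windows that produced usable data, per user.

    Both arguments map a user ID to a count. Users missing from
    `usable_windows` are treated as having produced no usable data.
    """
    return {
        user: (usable_windows.get(user, 0) / expected if expected else 0.0)
        for user, expected in expected_windows.items()
    }

# Made-up counts for three users over a pilot week.
expected = {"u1": 100, "u2": 100, "u3": 80}
usable = {"u1": 58, "u2": 45}  # u3 never synced

coverage = signal_completion(expected, usable)
below_target = sorted(user for user, share in coverage.items() if share < 0.8)
```

Looking at coverage per user rather than as one pooled number matters, because a few people with a dead sensor can hide behind a healthy-looking average.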

Personalization needs a measured payoff, not a vibe. This pilot didn't show a high-personalization advantage. That doesn't kill personalization, but it does mean you shouldn't assume it. If you're going to build per-user adaptation, ship it behind a comparison against a simpler group-average version and check whether the expensive one actually wins. Often the simple version is most of the value.
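One simple way to run that comparison is a permutation test on an outcome you care about. The sketch below is a generic technique, not the analysis from the paper, and the scores are invented. It assumes one outcome value per user, such as change in a stress score, and asks how often a random relabeling of users produces a gap between arms at least as large as the one observed.

```python
import random

def permutation_p_value(arm_a, arm_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in arm means."""
    def mean_diff(a, b):
        return abs(sum(a) / len(a) - sum(b) / len(b))

    rng = random.Random(seed)
    observed = mean_diff(arm_a, arm_b)
    pooled = list(arm_a) + list(arm_b)
    n_a = len(arm_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign users to arms
        if mean_diff(pooled[:n_a], pooled[n_a:]) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p of 0.
    return (extreme + 1) / (n_permutations + 1)

# Made-up change-in-stress scores (negative = improvement), one per user.
per_user_arm = [-1.2, -0.4, -0.9, 0.1, -0.6, -1.0, -0.3, -0.7]
group_average_arm = [-1.0, -0.5, -0.8, 0.0, -0.4, -0.9, -0.6, -0.2]
p_value = permutation_p_value(per_user_arm, group_average_arm)
```

A large p-value here would not prove the two versions are equivalent, for the same power reasons that apply to the pilot. It would tell you that the expensive version has not yet shown it is better.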

Instrument both subjective and objective outcomes, and expect them to disagree. RELAX users felt less stressed while an HRV marker said otherwise. If you measure only the thing that's easy and flattering (self-reported mood), you'll ship a product that feels effective and may not be. The disagreement is information, not an error to hide.

Don't read good retention as proof of efficacy. The 8.7 percent dropout is a real achievement, and it's tempting to treat low churn as evidence the product works. It isn't. People stayed and used it, and the efficacy signal was still mixed. Engagement is necessary for behavior change. It is not the same as behavior change, and conflating the two is how a well-retained app quietly fails to help anyone.

The honest summary

This is a 46-person pilot: underpowered, with mixed results and one wrong-direction physiological finding. I'm not going to pretend it settles anything. But pilots like this are where the gap between the adaptive-health pitch and the adaptive-health reality shows up cleanly. The personalization didn't measurably help. The sensor that the whole adaptive premise rests on was missing a lot of the time. The subjective and objective outcomes pointed in different directions.

The most useful thing a product team can take from it is an ordering. Get the measurement reliable first. Earn the personalization second. Treat engagement and efficacy as separate questions the whole way through.

Paper reference: Kunas, B., Jung, O., Schranz, C., Schmoigl-Tonis, M., Mehlis, J., & Laireiter, A.-R. (2026). Multimodal Personalized Mobile Health Just-in-Time Adaptive Intervention for Occupational Stress Management: Pilot Study. JMIR mHealth and uHealth. https://doi.org/10.2196/79642 | PMC: PMC13183344

Framework: Nahum-Shani, I., Smith, S. N., Spring, B. J., Collins, L. M., Witkiewitz, K., Tewari, A., & Murphy, S. A. (2018). Just-in-Time Adaptive Interventions (JITAIs) in Mobile Health: Key Components and Design Principles for Ongoing Health Behavior Support. Annals of Behavioral Medicine, 52(6), 446–462. https://doi.org/10.1007/s12160-016-9830-8

Framework: Michie, S., van Stralen, M. M., & West, R. (2011). The behaviour change wheel: A new method for characterising and designing behaviour change interventions. Implementation Science, 6, 42. https://doi.org/10.1186/1748-5908-6-42