
RESEARCH

What a 5,419-person trial just told us about which BCT actually moves behavior

Jameson Daines · April 18, 2026

A factorial trial I've been waiting on for a while finally published on April 2. It's called Coach, and the team ran 5,419 people through a digital multiple health behavior change intervention that turned six distinct components on and off in a proper factorial design. So you end up with something most digital health research can't give you: a clean read on which active ingredient is actually doing the work, separate from the others.

I've read a lot of digital health RCTs and most of them test a whole app against a waitlist. Which tells you nothing about why it worked, if it worked. This one's different. The authors (Crawford, Åsberg, Blomqvist and colleagues, DOI 10.2196/88881) isolated six components and measured their independent and interactive effects on four different behaviors. That's a factorial, and factorials are how you actually learn anything from a trial.

The answer is a little boring, which is part of why I want to talk about it.

The component that kept winning

The six components in Coach were: screening and feedback, goal-setting and planning, motivation enhancement, skills and know-how, mindfulness, and self-authored SMS reminders. In BCT taxonomy terms, they roughly map to self-monitoring (1.6, 2.2, 2.3), action planning (1.1, 1.4, 7.1), self-efficacy work (5.1, 9.1), behavioral instruction (4.1, 8.2), stress awareness, and personalized reminders.

The one that moved the needle most consistently was screening and feedback. Not goal-setting. Not motivation. Not mindfulness. The component where you ask the person about their current behavior in a structured way and then reflect what they told you back to them.

Specifics: at two months, screening and feedback increased daily fruit and vegetable portions by 0.17 (posterior probability over 99.9%), and at four months by 0.13 portions (again over 99.9%). It also reduced heavy episodic drinking at four months. Across the four behaviors and two time points, this was the component that kept showing up.

Motivation enhancement was the effective component for smoking cessation, with 2.4x the odds of quitting at four months. Goal-setting helped for diet, particularly for candy and snack reduction, and especially when paired with screening and feedback. The combination of screening and feedback plus motivation bumped fruit and vegetable intake by 0.20 portions at two months.

Mindfulness cut the odds of smoking cessation to roughly a third at two months (0.35x), though the effect had faded by four months. I don't want to over-read a single finding, but I will note that in a field where mindfulness-based components get slotted into apps by default, this is worth paying attention to.

Why the boring answer is the interesting answer

If you've read any of the umbrella review work on effective BCTs in digital health, self-monitoring and feedback come up near the top of pretty much every table. This has been true for years. Coach is one more data point in a very consistent pile.

So why do I keep looking at product briefs where the first-designed feature is a goal-setting flow, with self-monitoring added as a nice-to-have in a later release?

I think it's because self-monitoring and feedback don't feel like behavior change. They feel like UI. You're asking the user to tell you what they ate, what they drank, how they slept, and then you're showing it back to them in a nicer layout. There's no dopamine moment to design, no badge to reward, no coaching conversation to scope. It reads like a data collection chore, so it ends up in the retention-curve-fix bucket of the backlog.

Goal-setting, by contrast, feels like the real thing. It has narrative. You sit down, you decide you're going to walk 10,000 steps, and you've Done A Behavior Change Thing. That's the intuition. The evidence says it's the wrong intuition. Goal-setting works best as a supporting component, not a leading one, and its effect is mostly mediated through what happens next (self-monitoring against the goal, feedback on the gap).

The Coach paper won't settle this argument on its own. But combined with the existing literature it's a fairly specific prescription: if you can only build one thing for a health behavior app, build the screening and feedback loop. Make it habitual. Make it read accurately. And resist the urge to cover it up with a coaching persona.

What this means if you're designing the thing

A few concrete things I'd pull from the Coach results into a product build.

First, the screening you do at onboarding isn't onboarding. It's the intervention. Teams usually treat behavioral screening as a setup step, rushing through it so the user can get to the "main product." If the trial evidence is right, the main product for most health apps is a well-designed screening and feedback loop, and the rest of the features are scaffolding. That reframes how long you can take, how much structure you can use, and how often you can re-run it.

Second, feedback is not a dashboard. I keep seeing feedback components that are just a graph of the last 30 days. A graph is a display; it's not feedback. Feedback in the BCT sense is: here is what you did, here is what your goal or norm is, here is the delta, here is what that delta means for the thing you care about. The Coach component did this across alcohol, diet, activity, and smoking. It's specific, it's interpretable, and it's personalized. Most app dashboards aren't any of those things.
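To make that four-part structure concrete, here's a minimal sketch in Python. This is not the Coach implementation (the paper doesn't publish one) and every name here is hypothetical; it's just the observed/reference/delta/meaning skeleton from the paragraph above, expressed as a data structure instead of a chart.

```python
from dataclasses import dataclass

# Hypothetical sketch of feedback-as-a-BCT rather than feedback-as-a-chart.
# Four parts: what you did, what the reference is, the delta, and what the
# delta means for something the user cares about.
@dataclass
class FeedbackMessage:
    behavior: str      # e.g. "fruit and vegetables"
    observed: float    # what the user reported in screening
    reference: float   # their own goal, or a guideline norm
    unit: str          # e.g. "portions/day"
    meaning: str       # why the gap matters to this specific user

    @property
    def delta(self) -> float:
        return self.observed - self.reference

    def render(self) -> str:
        direction = "above" if self.delta >= 0 else "below"
        return (
            f"You reported {self.observed:g} {self.unit} of {self.behavior}, "
            f"{abs(self.delta):g} {direction} your target of {self.reference:g}. "
            f"{self.meaning}"
        )


msg = FeedbackMessage(
    behavior="fruit and vegetables",
    observed=3.2,
    reference=5.0,
    unit="portions/day",
    meaning="Closing that gap is the one diet change you said mattered most.",
)
print(msg.render())
```

Notice that a 30-day graph carries only the first field. The other three are where the behavior change technique actually lives.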

Third, don't assume your multi-component thing is additive. One of the quieter findings in Coach is that some component combinations made things worse. Goal-setting combined with motivation, mindfulness, or SMS messaging actually increased heavy episodic drinking frequency. The authors are careful not to over-interpret, but the mechanism isn't hard to imagine: if you ask someone to set a drinking goal, send them motivational content, and then ping them about it, you may be raising the salience of drinking at times when they weren't thinking about it. That's the opposite of what you want. More components isn't always better, and the "throw in every BCT we can find" product instinct needs to be checked against behavior-specific evidence.
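To see what "non-additive" means in the arithmetic, here's a 2x2 slice of a factorial with invented cell means (the paper reports Bayesian posterior estimates, not these numbers; this is purely illustrative):

```python
# Hypothetical mean heavy-drinking episodes per period, by whether
# goal-setting and motivation are on. Numbers are invented.
cell_mean = {
    (0, 0): 1.10,  # neither component on
    (1, 0): 1.05,  # goal-setting only
    (0, 1): 1.08,  # motivation only
    (1, 1): 1.30,  # both on -- worse than either alone
}

# Main effect of goal-setting, averaged over motivation:
goal_main = ((cell_mean[(1, 0)] - cell_mean[(0, 0)])
             + (cell_mean[(1, 1)] - cell_mean[(0, 1)])) / 2

# Interaction: does turning motivation on change what goal-setting does?
interaction = ((cell_mean[(1, 1)] - cell_mean[(0, 1)])
               - (cell_mean[(1, 0)] - cell_mean[(0, 0)]))

print(f"goal-setting main effect: {goal_main:+.2f} episodes")
print(f"goal x motivation interaction: {interaction:+.2f} episodes")
```

A positive interaction on a harm outcome is exactly the signal a components-averaged analysis washes out. If you only ever ship and measure the full bundle, this failure mode is invisible.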

Fourth, pay attention to the behavior. Coach tested four behaviors and the best-performing component varied by behavior. Motivation worked for smoking. Screening and feedback worked most broadly but was especially clean for diet. This tracks with the BCT literature generally: different behaviors have different active ingredients, and research syntheses in digital health have been saying this for a while. The product implication is that a general-purpose "behavior change engine" is probably a myth, and the team has to pick the behavior and then pick the component set to match.

The part most teams will miss

If I had to bet on what most product teams will take from a paper like Coach, it's the wrong thing. They'll see "screening and feedback works" and add another chart to their dashboard. They'll see "motivation helps smoking" and buy a motivational content library. They'll see "mindfulness hurt smoking at two months" and quietly remove the meditation feature from their Q2 roadmap.

The thing they'll miss is the factorial design itself. This is how you figure out what's doing the work in a multi-component product. You build it so you can turn components on and off. You measure per-component effects. You notice when two components interact badly. Without that, you're just shipping a pile of BCTs and hoping the stack helps on average.
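For concreteness, here's roughly what "build it so you can turn components on and off" looks like. This is a generic sketch, not anything from the Coach protocol: hypothetical component names, deterministic hash assignment into the 64 cells of a 2^6 factorial, and a deliberately naive main-effect estimate.

```python
import hashlib
import itertools

COMPONENTS = ["screening_feedback", "goal_setting", "motivation",
              "skills", "mindfulness", "sms_reminders"]

# All 2^6 = 64 on/off configurations of a full factorial.
CONDITIONS = list(itertools.product([0, 1], repeat=len(COMPONENTS)))

def assign(user_id: str) -> dict[str, bool]:
    """Hash the user into one of the 64 cells, deterministically,
    so the same user always sees the same configuration."""
    digest = int(hashlib.sha256(user_id.encode()).hexdigest(), 16)
    cell = CONDITIONS[digest % len(CONDITIONS)]
    return dict(zip(COMPONENTS, map(bool, cell)))

def main_effect(component: str,
                results: list[tuple[dict[str, bool], float]]) -> float:
    """Mean outcome with the component on, minus mean with it off,
    averaged over every configuration of the other five."""
    on = [y for cfg, y in results if cfg[component]]
    off = [y for cfg, y in results if not cfg[component]]
    return sum(on) / len(on) - sum(off) / len(off)
```

A real trial would randomize with proper allocation and analyze with pre-registered models rather than raw mean differences. But the product-side requirement is the same either way: every component independently switchable, and the configuration logged per user.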

I think this is the real lesson from Coach. Not the specific winners, though those are interesting. The method. Digital health teams don't run factorials. They run versus-control trials, they find a small effect, they ship, and they can't tell you which part of their product caused it. The field deserves better. The ENGAGE framework has been pointing at this since 2017: mechanism of action matters, and you can't improve what you can't attribute.

If you're building a digital health product in 2026 and you can't explain, with evidence, which two or three BCTs are doing the work, you don't have a behavior change product. You have a wrapper around a hunch. That used to be the state of the art. It isn't anymore.
