What the Coach factorial trial actually tells us about which BCTs work

In the two years I've been writing about behavior change for digital products, the most common question I get is some version of "OK but which behavior change techniques actually work, can you give me a list."

I always wince at the question. Not because it's a bad one. Because the honest answer is "it depends on the outcome you're trying to move, and the population, and what other components you bolt onto it, and what backfires when you combine two." Which is not a list. Which is annoying for the asker.

A trial published on April 2, 2026 in the Journal of Medical Internet Research finally gives me something better than a wince. The Coach trial is a 5,419-person factorial RCT. It tested six digital behavior-change components in every possible combination across five different health behaviors, and it published the component-by-component results. Read the trial here.

I want to walk through what it found, because I think a few of the findings are not what most product teams expect.

The setup, briefly

The Coach intervention is a Dutch digital tool for adults seeking help online with multiple health behaviors at once. The trial randomized 5,419 participants. Roughly 3,371 made it to the 2-month follow-up and 2,786 to the 4-month follow-up.
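Those follow-up counts are easier to judge as retention rates. A quick sketch of the arithmetic, using only the counts quoted above (which are themselves approximate):

```python
# Retention at each follow-up, from the counts quoted in this article.
randomized = 5419
followed_up = {"2-month": 3371, "4-month": 2786}

# Share of randomized participants still providing data at each wave.
retention = {wave: n / randomized for wave, n in followed_up.items()}

for wave, share in retention.items():
    print(f"{wave}: {share:.1%} retained, {1 - share:.1%} lost to follow-up")
```

That works out to roughly 62% retained at two months and 51% at four, so close to half the sample is missing by the final measurement. Worth keeping in mind when reading any of the effect estimates.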

The team tested six components. Each participant was randomized factorially to a subset of them, and the team measured outcomes across five behaviors: alcohol consumption, fruit and vegetable intake, physical activity, smoking, and heavy episodic drinking.

The six components, in their words:

Screening with personalized feedback. Self-monitoring of current behavior, then graphs showing where the user is relative to a guideline.
Goal-setting and action planning. Set a goal, plan when and where you'll do the thing, plan how you'll cope when something gets in the way.
Motivation to change. Information about health consequences, plus self-efficacy work.
Skills and know-how. Practical tips for the behavior.
Mindfulness. Guided meditations for stress reduction.
Self-authored SMS reminders. The user writes their own reminder texts.

This maps cleanly onto the Behavior Change Technique Taxonomy v1, the 93-BCT taxonomy from Susan Michie's group that most of digital health uses as its shared vocabulary.
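To make "every possible combination" concrete: six on/off components give 2^6 = 64 cells. Here is a minimal sketch of enumerating those cells and assigning participants to them. The component names are my own labels, and simple uniform randomization is an assumption on my part; the trial's actual procedure (blocking, stratification, and so on) is in the paper, not here.

```python
import itertools
import random

# Labels are mine, not the trial's variable names.
COMPONENTS = [
    "screening_feedback",
    "goal_setting",
    "motivation",
    "skills",
    "mindfulness",
    "sms_reminders",
]

def all_conditions(components):
    """Every on/off combination of the components: 2 ** len(components) cells."""
    return [
        dict(zip(components, flags))
        for flags in itertools.product([False, True], repeat=len(components))
    ]

def assign(participant_ids, conditions, seed=0):
    """Assumed simple randomization: each participant draws one cell uniformly."""
    rng = random.Random(seed)
    return {pid: rng.choice(conditions) for pid in participant_ids}

conditions = all_conditions(COMPONENTS)
assignment = assign(range(5419), conditions)

print(len(conditions))  # 64
```

With 5,419 people spread over 64 cells, each cell holds only about 85 participants. The design gets its power from comparing everyone who had a component switched on against everyone who had it off, not from comparing single cells.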

What worked

Screening with personalized feedback was the strongest single component. It was the most consistent driver of reduced alcohol consumption, and it improved fruit and vegetable intake. This is BCTs 2.2 (feedback on behaviour) and 2.3 (self-monitoring of behaviour) in Michie's taxonomy, and it's the workhorse of nearly every digital behavior change trial that ever shipped a positive result.

Motivation to change was the strongest driver of smoking abstinence. Information about consequences plus self-efficacy work. BCTs 5.1 and 15.3 if you care about the codes.

The biggest single finding, honestly, is that certain combinations amplified each other in ways that single components didn't. Screening-plus-motivation outperformed either component on its own for fruit and vegetable consumption. The whole was greater than the sum.

This is the part that product teams routinely miss. You can ship a great self-monitoring feature, see modest results, and conclude that the technique is weak. The Coach trial says the technique is fine; you just shipped it without the partner component that activates it.
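"Greater than the sum" has a precise meaning in a factorial design: the joint effect exceeds what you would get by adding the two solo effects. A minimal sketch of that contrast for one pair of components follows. The cell means are invented for illustration and are not the trial's estimates.

```python
def interaction_contrast(neither, a_only, b_only, both):
    """Joint effect minus the sum of the two solo effects.

    Positive: the pair amplifies each other (for a 'more is better' outcome).
    Zero: purely additive. Negative: the pair partly cancels out.
    """
    effect_a = a_only - neither
    effect_b = b_only - neither
    joint_effect = both - neither
    return joint_effect - (effect_a + effect_b)

# HYPOTHETICAL mean daily fruit-and-vegetable portions per cell.
cells = {
    "neither": 3.0,
    "screening_only": 3.2,
    "motivation_only": 3.1,
    "both": 3.6,
}

synergy = interaction_contrast(
    cells["neither"],
    cells["screening_only"],
    cells["motivation_only"],
    cells["both"],
)
print(round(synergy, 2))
```

In these made-up numbers the solo effects are 0.2 and 0.1 portions, the joint effect is 0.6, and the contrast is about 0.3. A single-feature test of either component would have seen only the small solo effect.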

What didn't work the way the demo suggested

Goal-setting on its own was not the standout. This is genuinely surprising if you've read the planning and implementation intentions literature, where if-then plans show medium effect sizes across a stack of meta-analyses. Meta-analytically it should have been a top performer. Empirically, in this factorial test, it wasn't.

I think the reason is the implementation. Coach's goal-setting was guided but largely user-defined. The literature suggests guided with cues beats guided without cues, and that user-defined plans without good prompts about salient situational cues underperform. So the BCT was there, but the most active ingredient (specific situational cues) was thin.

Mindfulness barely moved any primary outcome on its own. Self-authored SMS reminders were a mixed bag. Skills/know-how was a small contributor when paired with motivation.

What backfired

The trial reports that some component combinations produced harmful effects. Specifically, certain pairings increased heavy episodic drinking instead of reducing it. The paper is careful about which combinations, and I'm not going to single one out without the full paper open in front of me, but the headline matters on its own.

A behavior change intervention can make the behavior worse. Not just fail to improve it. Make it worse. The mechanism the field usually invokes is that increased awareness without sufficient self-regulation can drive consumption (the "I tracked it, so now I notice when I'm under" effect). It's a real phenomenon. It shows up in published trials. It is almost never accounted for in the product roadmap of a digital health tool that ships an "alcohol awareness" module.

If you're a product manager at a digital health company, this is the finding I want you to sit with for a minute. The thing you're shipping is an active intervention. It can do harm. The same way a drug can do harm. The same way a surgical device can do harm. The fact that yours is a screen instead of a scalpel doesn't change the underlying mechanism question.

What this means if you're shipping software

A few takeaways I think are durable, not just specific to the Coach trial.

Screening with feedback is the cheapest and strongest thing you can ship. If you have to pick one BCT to start, this is it. Get the user to log the behavior, show them where they sit relative to a benchmark, watch what happens. Honestly, most digital health products that "work" work because of this and not the cleverer stuff sitting on top of it.
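The core of that feature is small. Here is a minimal sketch, assuming a week of logged values and a single benchmark number. The function name, the field names, and the guideline value are all placeholders of mine, not anything from the Coach tool, and a real product should take its benchmarks from the relevant health authority.

```python
from statistics import mean

def feedback(daily_log, guideline, higher_is_better=True):
    """Compare a user's logged average against a benchmark."""
    average = mean(daily_log)
    gap = average - guideline
    meets = gap >= 0 if higher_is_better else gap <= 0
    return {
        "average": average,
        "guideline": guideline,
        "gap": gap,
        "meets_guideline": meets,
    }

# HYPOTHETICAL week of fruit-and-vegetable portions; 5 is a placeholder benchmark.
week = [2, 3, 1, 4, 2, 3, 2]
result = feedback(week, guideline=5)

print(f"You averaged {result['average']:.1f} a day; the benchmark is {result['guideline']}.")
```

For a behavior where less is better, such as drinks per day, pass higher_is_better=False so that being under the benchmark counts as meeting it.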

Combinations matter as much as components. The single-feature A/B test is the wrong unit of analysis for a behavior change product. You're not testing "does goal-setting work." You're testing "does goal-setting work when paired with the four other things in our app." The factorial design is the only honest answer, and almost no product team runs one.

Action planning needs cues to work. If your "goal-setting" feature lets the user type "I will exercise more" and then nudges them weekly, the literature says you've shipped a placebo. Add specific situational cues ("when X happens, I will do Y"). Then the BCT is doing what the BCT does in the lab.
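One cheap way to push users toward that format is to check whether a typed plan has a cue at all. A rough sketch follows, assuming English free-text input. The pattern and the function name are my own, and a regex like this only checks the shape of the sentence, not whether the cue is a good one.

```python
import re

# Matches "if/when/whenever <cue>, (then) I will <action>".
IF_THEN = re.compile(
    r"^\s*(?:if|when|whenever)\s+(?P<cue>.+?),?\s+(?:then\s+)?i\s+will\s+(?P<action>.+?)\s*\.?\s*$",
    re.IGNORECASE,
)

def parse_plan(text):
    """Return the cue and action of an if-then plan, or None if there is no cue."""
    match = IF_THEN.match(text)
    if match is None:
        return None
    return {
        "cue": match.group("cue").strip(),
        "action": match.group("action").strip(),
    }

vague = parse_plan("I will exercise more")
specific = parse_plan("When I finish lunch on weekdays, I will walk for 15 minutes.")

print(vague)     # None
print(specific)  # {'cue': 'I finish lunch on weekdays', 'action': 'walk for 15 minutes'}
```

When parse_plan returns None, the product can prompt for the missing piece ("When will you do this? What will be happening right before?") instead of accepting the vague goal.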

Track for harm, not just for benefit. If your dashboard only measures "did the behavior improve," you'll miss the people for whom it got worse. Build the comparison. Look for the tail.
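Here is what "look for the tail" can mean in practice: report the share of users who got worse alongside the average change. A minimal sketch, with invented numbers and a threshold I picked arbitrarily for illustration:

```python
def harm_summary(changes, harm_threshold):
    """Summarize per-user change in an unwanted behavior (follow-up minus baseline).

    A change at or above harm_threshold counts as 'worsened'.
    """
    n = len(changes)
    worsened = [c for c in changes if c >= harm_threshold]
    return {
        "n": n,
        "mean_change": sum(changes) / n,
        "share_worsened": len(worsened) / n,
    }

# HYPOTHETICAL change in heavy-drinking episodes per month, per user.
control = [0, -1, 0, 1, -1, 0, 0, -1]
module = [-3, -2, -3, 2, 3, -2, -3, 2]

control_summary = harm_summary(control, harm_threshold=1)
module_summary = harm_summary(module, harm_threshold=1)

print(control_summary)
print(module_summary)
```

In this made-up data the module looks better on the average (a mean change of -0.75 against -0.25 for control) while three times as many of its users got worse (37.5% against 12.5%). A dashboard that shows only the mean would call that a win.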

Mindfulness is a popular feature that doesn't pull a lot of weight on its own. I'm sorry. I didn't write the trial. If you're shipping mindfulness as the primary intervention for a non-mental-health outcome, the Coach data does not support you. It might still be a worthy adjunct. It is not a workhorse.

The deeper point

The Coach trial is the rare digital health study where the design lets you actually compare components head-to-head in the same population at the same time. Most of what we have is component-X-vs-control trials, run by different research groups, in different populations, with different operationalizations. You can meta-analyze them, and people do, but the noise is huge.

A factorial design is expensive. 5,419 participants. Five years from protocol to publication, probably. Most product teams will never run one. But you can read this one, and you can update on it, and you can stop assuming that all 93 BCTs are equally well-supported in your context.

The list of what works is short. Screening and feedback. Motivation in the right pairings. Maybe action planning if you do it well. The rest of the taxonomy is real, but the evidence for it varies a lot.

If a vendor walks into your office with a slide deck claiming they "use 14 evidence-based BCTs," you're allowed to ask which trial those 14 came from, what the effect sizes were, and what the harms looked like. If they don't know, you've learned something useful about the vendor.

I think this trial is the most important behavior change paper to land in 2026 so far. If you build digital health products, please go read it directly. Don't take my summary as a substitute. The supplement, in particular, has the component-by-component effect estimates I had to compress here, and you'll want them.

If you work on the Coach team, or if you've run a factorial trial and think I'm getting something wrong about how to read this one, please let me know. I will update the piece.