Social Norms Messaging in Health Apps: What 85,759 Participants Actually Tell Us

Social comparison is everywhere in health apps. Step counts ranked against friends. Workout streaks displayed on leaderboards. "Most users like you completed 7 days" prompts nudging someone back after a lapse. The assumption baked into all of it: showing people what their peers do changes what they do.

That assumption just took a serious hit.

A pre-registered systematic review and meta-analysis published in Nature Human Behaviour in late 2025 analyzed 89 randomized controlled trials covering 85,759 participants across developed countries. The research team, led by Papakonstantinou and Flecke, set out to evaluate social norms messaging as a health behavior change strategy, pooling studies across alcohol use, physical activity, dietary intake, sexual health, and more.

The raw headline number looked modest but defensible: Cohen's d = 0.10 (95% CI [0.09, 0.19], p < 0.001). Small, but statistically significant. Something there.

Then they corrected for publication bias using robust Bayesian meta-analysis. The effect disappeared.

What social norms messaging actually is

Before unpacking what zero means in practice, it's worth being precise about what's being tested. Social norms messaging works through one of two mechanisms. Descriptive norms tell people what most others do ("8 in 10 adults get less than the recommended amount of sleep"). Injunctive norms tell people what most others approve or disapprove of ("Most people in your area think exercise is important").

Health apps typically deploy these in a few forms: aggregate statistics ("Users like you average 6,200 steps a day"), social leaderboards, community activity feeds, or streak sharing. The 89 studies in this meta-analysis covered all of these delivery modalities, across one-on-one messaging, app-based delivery, and broader public health campaigns.

The moderator analyses found no significant differences across any of them. Message type didn't matter. Delivery channel didn't matter. Target health domain didn't matter. Clinical patients, healthcare professionals, and general population samples all showed the same non-significant pattern once publication bias was accounted for.

Ninety-three studies searched, 89 included. And after adjusting for the studies that didn't get published because they found nothing, the signal vanishes.

Why this matters more than any single RCT

Publication bias is a known, chronic problem in behavioral science. Studies showing that a nudge "worked" get submitted and accepted. Studies showing the nudge did nothing accumulate in file drawers. The 2025 meta-analysis explicitly incorporated grey literature sources, and even so, the correction had to do heavy lifting.
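
The file-drawer mechanism is easy to reproduce in a toy simulation. The sketch below is an illustration on simulated data, not the robust Bayesian method the paper used. It assumes a true effect of exactly zero, hypothetical trials of 50 people per arm, and a made-up publication rule: every significantly positive result gets published, while only 10% of the rest do. All function names and parameters are invented for the example.

```python
import math
import random
import statistics


def simulate_study(n_per_arm, true_d, rng):
    """Simulate one two-arm trial; return (Cohen's d, approximate variance of d)."""
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
    treated = [rng.gauss(true_d, 1.0) for _ in range(n_per_arm)]
    pooled_sd = math.sqrt(
        (statistics.variance(control) + statistics.variance(treated)) / 2
    )
    d = (statistics.mean(treated) - statistics.mean(control)) / pooled_sd
    # Standard large-sample approximation for equal arm sizes.
    var_d = 2 / n_per_arm + d**2 / (4 * n_per_arm)
    return d, var_d


def pooled_effect(studies):
    """Simple inverse-variance (fixed-effect) pooled estimate."""
    weights = [1 / var for _, var in studies]
    weighted_sum = sum(w * d for w, (d, _) in zip(weights, studies))
    return weighted_sum / sum(weights)


def run_simulation(n_studies=2000, n_per_arm=50, true_d=0.0,
                   publish_null_prob=0.10, seed=1):
    """Compare the pooled effect over all trials vs. only 'published' ones."""
    rng = random.Random(seed)
    all_studies = [simulate_study(n_per_arm, true_d, rng) for _ in range(n_studies)]
    published = []
    for d, var in all_studies:
        significant_positive = d / math.sqrt(var) > 1.96
        # Assumed rule: positive findings always appear, the rest rarely do.
        if significant_positive or rng.random() < publish_null_prob:
            published.append((d, var))
    return pooled_effect(all_studies), pooled_effect(published), len(published)


if __name__ == "__main__":
    all_d, published_d, n_published = run_simulation()
    print(f"pooled d, all trials:       {all_d:.3f}")
    print(f"pooled d, published trials: {published_d:.3f} ({n_published} trials)")
```

Under these assumptions the pooled estimate over all simulated trials sits near zero, while the pool restricted to "published" trials comes out positive. The size of that gap depends entirely on the assumed publication rule, which is why correction methods have to model the selection process rather than just subtract a constant.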

This isn't unique to social norms. The nudge literature broadly has faced similar reckonings. What makes this paper significant is scale. Eighty-nine RCTs, nearly 86,000 participants, pre-registered methodology. This isn't one inconvenient null result. It's the field taking honest stock of itself.

The authors are careful to note they're not saying social influence doesn't matter as a psychological mechanism. They're saying social norms messaging, as typically implemented, doesn't appear to move health behavior in a durable way. That's a distinction worth holding onto.

What this doesn't invalidate

A separate 2025 Nature Human Behaviour meta-analysis examined social comparison specifically as a behavior change technique (SC-BCT), covering 79 RCTs with N = 1,356,521. That study found a small but meaningful positive effect (Hedges' g = 0.17 vs. passive control; g = 0.23 vs. active control) that held up across health, performance, and climate behavior domains.
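
The two papers report different effect size metrics, Cohen's d and Hedges' g. At the level of an individual study, Hedges' g is Cohen's d multiplied by a small-sample correction factor, so for trials of any reasonable size the two are close to interchangeable. A minimal sketch of the commonly used approximation, with a hypothetical function name and illustrative inputs:

```python
def hedges_g(d: float, n1: int, n2: int) -> float:
    """Apply the usual approximate small-sample correction to Cohen's d."""
    df = n1 + n2 - 2
    correction = 1 - 3 / (4 * df - 1)
    return d * correction


if __name__ == "__main__":
    # The correction shrinks d slightly and fades as arm sizes grow.
    for n_per_arm in (10, 50, 500):
        print(n_per_arm, round(hedges_g(0.20, n_per_arm, n_per_arm), 4))
```

The correction only matters for very small trials, so the choice of metric is not what separates the two sets of findings.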

The difference in findings is instructive. Social comparison is not the same thing as social norms messaging. Showing someone that a peer ran three miles yesterday is different from telling someone that "most people in your area exercise regularly." The first is specific, personalized, and tied to a real relationship. The second is a statistical abstraction.

The social comparison meta-analysis also found that more intervention sessions produced larger effects, and that framing around desired behaviors (what to do) outperformed framing around undesired behaviors (what to avoid). Directionality and repetition matter in a way that broad norm statements don't.

The product design implication

Teams building social features into health products typically justify them on behavior change grounds. The research says that justification needs more precision.

A leaderboard that shows a user their friend's step count, tied to an ongoing relationship and visible repeatedly, sits closer to the social comparison evidence base that does hold up. A one-time prompt that says "most users log their meals daily" sits closer to the social norms messaging literature that doesn't.

A few specific questions worth asking at the product level:

Is the comparison anchored to a real relationship? Generic aggregate statistics strip out the motivational mechanism that makes social comparison work: feeling connected to a specific other person whose behavior feels relevant to your own situation. Peer features that show a friend's actual activity tend to be more motivationally potent than anonymized "users like you" framing.

Is the norm perception accurate? Only ten of the 89 studies in the Papakonstantinou meta-analysis actually measured whether the messaging changed participants' beliefs about what peers do. This is a significant gap. If users already believe most people exercise regularly, a message confirming that changes nothing. Norm-correction, the specific technique of correcting a mistaken belief that bad behavior is more common than it is, may be where social norms messaging has its narrowest but most defensible use case.

How often does the comparison appear? The social comparison research suggests multiple exposures matter. A feature that surfaces peer behavior once during onboarding and never again doesn't accumulate the repetitions needed to drive behavior.

What's the counterfactual? Social features have real costs: engineering time, data privacy considerations, potential for upward comparison triggering negative affect. If the only rationale is behavior change, this evidence base says to think carefully. If the rationale also includes retention, social accountability, and product differentiation, the calculus changes, but that's not the same argument.

What the research doesn't yet answer

The meta-analysis points to several genuine gaps. Most studies don't track whether the normative message actually updated the user's beliefs. Most don't distinguish between descriptive and injunctive norms rigorously enough to test which mechanism does more work. And the outcome measures vary widely enough across studies that pooling them at all involves some judgment calls.

The researchers are explicit that future work should test the full causal chain rather than just the endpoint: whether a message shifts what people believe their peers do, and whether that shift in turn mediates any change in behavior. Until that chain is tested systematically, the null result is real but incomplete.

For product teams, that uncertainty cuts both ways. It's not a reason to rip out social features. It is a reason to stop treating social norms messaging as a reliable behavior change mechanism and start testing it more rigorously against the specific designs being shipped.
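
One way to make that testing concrete is to report an in-product experiment the way the meta-analyses do: as a standardized effect size with a confidence interval, sized in advance for the effect you expect. The sketch below assumes a simple two-arm test with a continuous outcome (daily steps, say) and uses standard large-sample approximations. The function names and sample data are invented for illustration.

```python
import math
import statistics


def cohens_d_with_ci(control, treated, z=1.96):
    """Return Cohen's d and an approximate 95% confidence interval."""
    n1, n2 = len(control), len(treated)
    pooled_var = (
        (n1 - 1) * statistics.variance(control)
        + (n2 - 1) * statistics.variance(treated)
    ) / (n1 + n2 - 2)
    d = (statistics.mean(treated) - statistics.mean(control)) / math.sqrt(pooled_var)
    # Large-sample approximation to the standard error of d.
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d, (d - z * se, d + z * se)


def users_per_arm(d, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect effect size d (two-sided test)."""
    normal = statistics.NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    z_power = normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d**2)


if __name__ == "__main__":
    # Hypothetical daily step counts for a tiny pilot.
    control = [5200, 6100, 5800, 6400, 5900, 6000]
    treated = [5400, 6300, 5700, 6600, 6100, 6200]
    d, (low, high) = cohens_d_with_ci(control, treated)
    print(f"d = {d:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
    print("users per arm for d = 0.10:", users_per_arm(0.10))
```

With these approximations, detecting an effect as small as the uncorrected d = 0.10 at 80% power takes roughly 1,570 users per arm, which is why small pilots rarely settle the question either way.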

The standard "users like you" prompt may be doing less than anyone assumed. That's worth knowing.

Paper reference: Papakonstantinou, T., Flecke, S. L., et al. (2025). A systematic review and meta-analysis of the effectiveness of social norms messaging approaches for improving health behaviours in developed countries. Nature Human Behaviour, 9, 2632–2650. https://doi.org/10.1038/s41562-025-02275-6

Related: Meta-analysis of randomized controlled trials examining social comparison as a behaviour change technique across the behavioural sciences. Nature Human Behaviour (2025). https://www.nature.com/articles/s41562-025-02209-2