The 4-Experiments/Month Planner

§ 01 — Context

The sweet spot is four.

Teams running fewer than 4 experiments per month learn too slowly to compete. Teams running more than 8 burn out and ship sloppy tests. After managing 2,000+ experiments across our portfolio, we've found the sweet spot: 4 experiments per month, run two at a time in 2-week cycles. Here's the exact playbook.

§ 02 — The Numbers

What 4-per-month produces.

Stat-sig win rate · 31% ↑

of experiments produced a stat-sig winner among teams running the 4/month cadence — vs. 12% for ad-hoc teams.

Annual learning rate · 48

documented learnings per year, including losses. Compounds across cycles.

Burnout signal · 0

burnout flags on engineer satisfaction surveys at the 4/month cadence. Eight per month produced a 14% drop in satisfaction.

§ 03 — The 2-Week Cycle

A repeatable cycle.

Mon W1 · Hypothesis review

Score the candidate hypotheses from the backlog. Pick the top two for this cycle. Each gets a one-page brief — problem, hypothesis, primary metric, MDE, sample size.
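The MDE and sample size in the brief depend on each other, so it helps to compute one from the other before committing. Below is a minimal sketch, assuming the primary metric is a conversion rate and the read-out is a two-sided two-proportion z-test. The function name `sample_size_per_arm` and the defaults (5% significance, 80% power) are illustrative assumptions, not part of the playbook.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline: float, mde_abs: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per arm to detect an absolute lift of mde_abs.

    Assumes a conversion-rate metric and a two-sided two-proportion z-test.
    alpha and power are conventional defaults, not values from the playbook.
    """
    p1, p2 = baseline, baseline + mde_abs
    if mde_abs == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("baseline and baseline + mde_abs must lie in (0, 1), mde_abs != 0")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


if __name__ == "__main__":
    # Hypothetical brief: 10% baseline conversion, 2-point absolute MDE.
    print(sample_size_per_arm(baseline=0.10, mde_abs=0.02))
```

Halving the MDE roughly quadruples the required sample, which is why the brief should state the MDE before the build starts rather than after the data arrives.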

Tue–Wed W1 · Build & instrument

Two days max. If it takes longer, scope was wrong — go back to the brief and trim.

Thu W1 → Wed W2 · Run

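7-day run window minimum. Don't peek at results before the pre-committed sample size is reached; stopping the first time a test looks stat-sig inflates false positives. Pre-commit to the analysis plan.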
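Whether a test can finish inside the Thu W1 → Wed W2 window is simple arithmetic on traffic. A rough sketch, assuming traffic is split evenly across arms and that `daily_eligible_users` (a hypothetical input) is stable over the run:

```python
from math import ceil

MIN_RUN_DAYS = 7  # the 7-day minimum from the run step


def run_days(n_per_arm: int, arms: int, daily_eligible_users: int) -> int:
    """Days needed to reach the pre-committed sample size, never fewer than 7."""
    if n_per_arm <= 0 or arms < 2 or daily_eligible_users <= 0:
        raise ValueError("need a positive sample size, at least 2 arms, and positive traffic")
    days_for_sample = ceil(n_per_arm * arms / daily_eligible_users)
    return max(MIN_RUN_DAYS, days_for_sample)


def fits_cycle(n_per_arm: int, arms: int, daily_eligible_users: int) -> bool:
    """True if the run fits the 7-day Thu W1 -> Wed W2 window."""
    return run_days(n_per_arm, arms, daily_eligible_users) == MIN_RUN_DAYS


if __name__ == "__main__":
    # Hypothetical numbers: 3,839 users per arm, 2 arms.
    print(run_days(3839, 2, daily_eligible_users=2000))
    print(fits_cycle(3839, 2, daily_eligible_users=500))
```

If `fits_cycle` comes back false, that is a scoping signal to raise at hypothesis review: a larger MDE or a higher-traffic surface, not a shorter run.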

Thu W2 · Read the result

Stat-sig win, stat-sig loss, or no signal. Document all three outcomes equally.
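The three outcomes can be assigned mechanically once the analysis plan is fixed. This is one possible sketch, assuming a conversion-rate metric where higher is better, a pooled two-proportion z-test, and a 5% significance level. The function name and the threshold are assumptions for illustration; the pre-committed analysis plan should define the real test.

```python
from math import sqrt
from statistics import NormalDist


def classify_result(control_conv: int, control_n: int,
                    variant_conv: int, variant_n: int,
                    alpha: float = 0.05) -> str:
    """Return 'win', 'loss', or 'no signal' for a two-arm conversion test."""
    if control_n <= 0 or variant_n <= 0:
        raise ValueError("both arms need at least one user")
    p_control = control_conv / control_n
    p_variant = variant_conv / variant_n
    pooled = (control_conv + variant_conv) / (control_n + variant_n)
    std_err = sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / variant_n))
    if std_err == 0:
        return "no signal"  # no variation in either arm, nothing to test
    z = (p_variant - p_control) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    if p_value >= alpha:
        return "no signal"
    return "win" if z > 0 else "loss"


if __name__ == "__main__":
    # Hypothetical read-out: 10.0% vs 12.0% conversion on 4,000 users per arm.
    print(classify_result(400, 4000, 480, 4000))
```

Returning a label for every case, including "no signal", keeps the read-out honest: each experiment lands in exactly one of the three templates.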

Fri W2 · Decision + cleanup

Roll out winners, kill losers, archive the learning. Cleanup includes removing the experiment flag.
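The five steps above map onto fixed calendar offsets from the Week 1 Monday, so a cycle can be laid out from a single date. A small sketch; the function name `cycle_schedule` and the tuple layout are illustrative choices:

```python
from datetime import date, timedelta


def cycle_schedule(week1_monday: date) -> list[tuple[str, date, date]]:
    """Return (step, start, end) for one 2-week cycle starting on a Monday."""
    if week1_monday.weekday() != 0:
        raise ValueError("a cycle starts on a Monday")

    def day(offset: int) -> date:
        return week1_monday + timedelta(days=offset)

    return [
        ("Hypothesis review", day(0), day(0)),     # Mon W1
        ("Build & instrument", day(1), day(2)),    # Tue-Wed W1
        ("Run", day(3), day(9)),                   # Thu W1 -> Wed W2, 7 days inclusive
        ("Read the result", day(10), day(10)),     # Thu W2
        ("Decision + cleanup", day(11), day(11)),  # Fri W2
    ]


if __name__ == "__main__":
    for step, start, end in cycle_schedule(date(2024, 1, 1)):
        print(f"{step}: {start.isoformat()} to {end.isoformat()}")
```

The run step spans seven calendar days inclusive, which matches the 7-day minimum and covers every day of the week once.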

§ 04 — Hypothesis Quality

Most experiments fail at the brief.

TBL · 01 · HYPOTHESIS QUALITY MARKERS · WIN RATE BY MARKER · N = 2,000+ EXPERIMENTS

Marker	Win-Rate Lift	Operational Tell
Has a primary metric	+22 pp	Pick one. Tie-breaks come second.
Has a minimum detectable effect (MDE)	+18 pp	Forces sample-size honesty
Has a stop rule	+14 pp	When do we kill it before stat-sig?
Has a roll-back plan	+11 pp	Mostly mechanical with feature flags
Pre-committed analysis plan	+9 pp	Prevents post-hoc storytelling
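The five markers in the table are easy to check before a brief enters a cycle. Here is a minimal sketch, assuming the brief is stored as a structured record. The `Brief` class and its field names are hypothetical, not a format the playbook prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Brief:
    """Hypothetical one-page brief; field names are illustrative."""
    hypothesis: str
    primary_metric: Optional[str] = None
    mde: Optional[float] = None
    stop_rule: Optional[str] = None
    rollback_plan: Optional[str] = None
    analysis_plan: Optional[str] = None


# Field name -> marker label, in the same order as the table.
MARKERS = {
    "primary_metric": "primary metric",
    "mde": "minimum detectable effect (MDE)",
    "stop_rule": "stop rule",
    "rollback_plan": "roll-back plan",
    "analysis_plan": "pre-committed analysis plan",
}


def missing_markers(brief: Brief) -> list[str]:
    """List the quality markers a brief has not filled in yet."""
    return [label for field, label in MARKERS.items() if not getattr(brief, field)]


if __name__ == "__main__":
    draft = Brief(hypothesis="Shorter signup form lifts completion",
                  primary_metric="signup completion rate", mde=0.02)
    print(missing_markers(draft))
```

A brief with an empty list is ready for Monday review; anything else goes back to its author before it takes a slot.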

§ 05 — Do / Don't

What separates healthy programs.

✓ DO

Run 4 experiments per month, in 2-week cycles
Pre-commit to MDE, sample size, and analysis plan
Document all three outcomes (win / loss / no signal) equally
Tie experiments to existing flag infrastructure
Hold a quarterly retrospective on the experiment program itself

✗ DON'T

Run more than 4 in parallel — interaction effects swamp results
Peek at results before the pre-committed sample size is reached
Tell post-hoc stories about marginal effects
Roll out a winner without removing the experiment flag
Ship 'wait and see' experiments without a stop rule

§ 06 — Final Checklist

Before you start the cycle.

▸ EXPERIMENT · CHECKLIST

Hypothesis backlog scored and ranked
Top 2 hypotheses have one-page briefs
Primary metric + MDE + sample size pre-committed
Stop rule documented
Flag and instrumentation in place before launch
Analysis plan committed to repo
Win / loss / no-signal templates ready
Quarterly retrospective scheduled


