Decouple deploy from release.
Feature flags are the difference between shipping weekly and shipping daily. After implementing feature flags across 50+ production systems, we've refined a complete framework that reduces deployment risk by 80% while enabling rapid experimentation.
median deploy frequency among teams using flags as the default release unit.
fewer rollbacks because flags isolate releases from deploys.
elite teams release behind a flag in the same deploy. Median teams release 3-7 days later.
Four types, four lifecycles.
| Flag Type | Purpose | Typical Lifespan | Risk if Stuck |
|---|---|---|---|
| Release | Hide unfinished work in trunk | Days – weeks | Low — but tech debt accumulates |
| Experiment | A/B test or holdout group | Weeks – months | Medium — analytics drift on stale flags |
| Operational | Kill-switch / circuit-break | Permanent | Low — but mark explicitly |
| Permission | Plan tier / role gating | Permanent | Low — but document ownership |
A five-stage rollout.
Internal-only (employees)
Flag is on for staff. Validates production behavior in a safe blast radius.
Canary (1–5% of users)
Sample real traffic. Tag the flag in observability so error rates can be sliced by cohort.
Progressive (25% → 50%)
Increment by no more than 25% per cycle. Watch error rate, latency P95, business metrics.
Full release (100%)
Flag remains in code as a kill-switch for one full release cycle, then is removed.
Cleanup
Remove the flag from code, the dashboard, and the feature flag platform. Stale flags are silent debt.
Patterns that scale.
✓ DO
- Use flags as the default — every change behind a flag
- Tag every flag with owner + sunset date
- Tie flags to observability (error rates by cohort)
- Test the rollback path before the rollout
- Audit and remove stale flags weekly
✗ DON'T
- Add a flag without a sunset date
- Use one giant flag for an entire release
- Couple flag values to caches without invalidation
- Skip the kill-switch test 'because it'll never fire'
- Treat experiment flags like permanent permissions
The risks no one warns you about.
| Risk | Frequency | Mitigation |
|---|---|---|
| Stale flags causing dead code | 67% | Sunset-date enforcement + weekly audit |
| Flag dashboard drift | 44% | Source of truth = code; the dashboard reflects it |
| Cached flag values | 38% | TTL ≤ 30s on flag fetches |
| Flag value mid-request | 29% | Read-once at request start; constant for the request |
| Permission flag misuse | 21% | Move plan-gating to a separate system, not flags |
Before you adopt flags org-wide.
- Flag platform chosen + integrated
- Flag types documented (release / experiment / ops / permission)
- Owner + sunset date required for new flags
- CI lints prevent merging flag-less changes
- Observability slices by flag cohort
- Weekly stale-flag audit on the calendar
- Kill-switch test runs in canary deploys
- Permission gating moved off feature flags
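One way to automate the owner, sunset-date and stale-flag checks from the lists above is a small audit run in CI or on the weekly schedule. This is an added example, not code from the original page or from any flag platform; the `Flag` fields, the `audit` function and the sample registry are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

TEMPORARY = {"release", "experiment"}     # must carry a sunset date
PERMANENT = {"operational", "permission"}  # long-lived, must be marked explicitly

@dataclass
class Flag:
    name: str
    type: str
    owner: Optional[str] = None
    sunset: Optional[date] = None   # required for temporary flags
    permanent: bool = False         # explicit marker for long-lived flags

def audit(flags, today):
    """Return (flag name, finding) pairs for the weekly stale-flag audit."""
    findings = []
    for f in flags:
        if not f.owner:
            findings.append((f.name, "missing owner"))
        if f.type in TEMPORARY:
            if f.sunset is None:
                findings.append((f.name, "missing sunset date"))
            elif f.sunset < today:
                findings.append((f.name, f"stale: sunset {f.sunset.isoformat()} has passed"))
            if f.permanent:
                findings.append((f.name, f"{f.type} flag marked permanent"))
        elif f.type in PERMANENT:
            if not f.permanent:
                findings.append((f.name, "long-lived flag not marked permanent"))
            if f.type == "permission":
                findings.append((f.name, "plan/role gating belongs in a separate system"))
        else:
            findings.append((f.name, f"unknown flag type {f.type!r}"))
    return findings

# Constructed sample registry; names, owners and dates are illustrative only.
SAMPLE = [
    Flag("new-checkout", "release", "team-pay", date(2024, 3, 1)),
    Flag("search-ab", "experiment", "team-search"),
    Flag("payments-kill-switch", "operational", "team-sre", permanent=True),
    Flag("pro-export", "permission", None, permanent=True),
    Flag("dark-mode", "release", "team-web", date(2024, 9, 1)),
]

if __name__ == "__main__":
    for name, finding in audit(SAMPLE, date(2024, 6, 1)):
        print(f"{name}: {finding}")
```

Failing the build on any finding for a new flag enforces the owner and sunset-date requirement at merge time, while the same function run on a schedule serves as the weekly audit.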