Example 5: Simon's two-stage Phase II design¶
A Phase II oncology trial is testing a new agent. The uninteresting response rate is p₀ = 0.10; the response rate that would warrant further development is p₁ = 0.30. We want type-I error α ≤ 0.10 and type-II error β ≤ 0.10 (i.e. at least 90% power).
Simon's two-stage design minimises the expected sample size under H₀, allowing early stopping for futility after the first stage.
Compute the optimal design¶
from samplesize.tests.phase_ii import simon_optimal_two_stage
result = simon_optimal_two_stage(
p0=0.10,
p1=0.30,
alpha=0.10,
beta=0.10,
)
d = result["design"]
print(f"Stage 1: enrol n1={d['n1']}, stop if ≤ r1={d['r1']} responses")
print(f"Stage 2: enrol to n={d['n']}, reject if ≤ r={d['r']} responses")
print(f"Achieved alpha = {d['alpha_actual']:.4f}")
print(f"Achieved power = {result['achieved_power']:.4f}")
print(f"E[N | H0] = {d['EN_under_h0']:.1f}")
print(f"PET (H0) = {d['PET']:.4f}")
Expected output:
Stage 1: enrol n1=18, stop if ≤ r1=2 responses
Stage 2: enrol to n=26, reject if ≤ r=4 responses
Achieved alpha = 0.0995
Achieved power = 0.9037
E[N | H0] = 20.1
PET (H0) = 0.7338
Interpret the design¶
| Quantity | Value | Meaning |
|---|---|---|
| n1 | 18 | Patients in stage 1 |
| r1 | 2 | Stop for futility if ≤ 2 responses in stage 1 |
| n | 26 | Maximum total patients (both stages) |
| r | 4 | Declare inactive if ≤ 4 total responses |
| E[N | H₀] | 20.1 | Expected enrolment if drug is truly inactive |
| PET | 0.734 | Probability of stopping after stage 1 under H₀ |
A PET of 73% means that in nearly three-quarters of trials where the drug is truly inactive, the study will stop after only 18 patients — the key efficiency gain over a single-stage design.
Inspect the envelope¶
{
"method_id": "simon_optimal_two_stage",
"solve_for": "n",
"n": 26,
"design": {
"r1": 2, "n1": 18,
"r": 4, "n": 26,
"alpha_actual": 0.0995,
"beta_actual": 0.0963,
"EN_under_h0": 20.1,
"PET": 0.7338,
},
"achieved_power": 0.9037,
"inputs_echo": {"p0": 0.1, "p1": 0.3, "alpha": 0.1, "beta": 0.1, ...},
"citations": ["Simon, R. (1989)..."],
}
Sensitivity table¶
How the design changes as the target response rate p₁ varies:
for p1 in (0.20, 0.25, 0.30, 0.35, 0.40):
r = simon_optimal_two_stage(p0=0.10, p1=p1, alpha=0.10, beta=0.10)
d = r["design"]
print(
f"p1={p1:.2f} → n1={d['n1']:3d}, n={d['n']:3d}, "
f"r1={d['r1']}, r={d['r']}"
)
p1=0.20 → n1= 49, n= 90, r1=5, r=12
p1=0.25 → n1= 21, n= 50, r1=2, r= 7
p1=0.30 → n1= 18, n= 26, r1=2, r= 4
p1=0.35 → n1= 11, n= 19, r1=1, r= 3
p1=0.40 → n1= 5, n= 18, r1=0, r= 3
Detecting only a doubling of response rate (p₁ = 0.20) requires 90 patients versus 26 for a tripling (p₁ = 0.30) — a common argument for selecting agents with larger expected signals for Phase II evaluation.
Minimax alternative¶
simon_minimax_two_stage minimises the maximum sample size instead of
the expected sample size. For the same design parameters:
from samplesize.tests.phase_ii import simon_minimax_two_stage
r = simon_minimax_two_stage(p0=0.10, p1=0.30, alpha=0.10, beta=0.10)
d = r["design"]
print(f"Minimax: n1={d['n1']}, n={d['n']}, r1={d['r1']}, r={d['r']}")
print(f"E[N | H0] = {d['EN_under_h0']:.1f}, PET = {d['PET']:.4f}")
The minimax design saves one patient at maximum (25 vs 26) but has a lower PET (0.51 vs 0.73), so it is less efficient at early stopping. The choice between optimal and minimax depends on whether the trial prioritises expected or worst-case enrolment.
Audit record¶
Every call writes a JSON audit record to .samplesize/<timestamp>.json
containing inputs, outputs, library versions, and the method citation —
ready to attach to a study protocol or IRB submission.