Z-Test for Proportions Calculator & Masterclass

What Is a Z-Test for Proportions?

A proportion is the fraction of a group possessing a specific attribute — the recovery rate among patients, the pass rate of candidates, or the conversion rate of web visitors. The Z-test for proportions is a parametric hypothesis test that determines whether an observed proportion differs meaningfully from a theoretical value (one-sample), or whether two independently observed proportions differ from each other (two-sample).

The test is grounded in the Central Limit Theorem: for sufficiently large samples, the sampling distribution of a proportion is approximately normal with mean p and standard deviation √(p(1−p)/n). This justifies using the standard normal (Z) distribution as the reference distribution for inference.
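
The normal approximation described above can be checked empirically. The following is a minimal simulation sketch, not part of the original calculator; the values p = 0.30, n = 200 and the 5,000 repetitions are hypothetical choices made only for illustration.

```python
import math
import random
import statistics

def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of size n and return the sample proportion of each."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.30, 200  # hypothetical values, chosen only for illustration
p_hats = simulate_sample_proportions(p, n, reps=5000)

empirical_mean = statistics.fmean(p_hats)
empirical_sd = statistics.pstdev(p_hats)
theoretical_sd = math.sqrt(p * (1 - p) / n)  # sqrt(p(1-p)/n) from the text

print(f"mean of p-hat: {empirical_mean:.4f} (theory {p})")
print(f"sd of p-hat:   {empirical_sd:.4f} (theory {theoretical_sd:.4f})")
```

The simulated mean and standard deviation should land close to p and √(p(1−p)/n), which is what licenses the Z reference distribution.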

Practical Example: Quality Control. A pharmaceutical company claims its defect rate is no more than 3%. An auditor samples 400 units and finds 18 defective (4.5%). Is this significantly higher? A right-tailed one-sample Z-test at α = 0.05 answers this while capping the false-positive (Type I error) rate at 5%.

Two Variants of the Test

One-Sample Z-Test

Tests whether your observed sample proportion p̂ differs from a known or hypothesized benchmark value p₀. You have one group and one reference value.

Example: Does our school's 68% graduation rate differ significantly from the national rate of 62%?

Two-Sample Z-Test

Tests whether two independent groups differ in proportion. Uses a pooled proportion p̂c as the shared estimate of the common proportion under H₀, from which the standard error is computed. Groups must be fully independent.

Example: Does a treatment group (42% recovered) differ significantly from a control group (31% recovered)?

The Formulas — Fully Explained

One-Sample

Given a random sample of n observations where x have the characteristic of interest:

\( \hat{p} = \dfrac{x}{n} \qquad\qquad z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} \)

The denominator \(\sqrt{p_0(1-p_0)/n}\) is the standard error under the null hypothesis. It uses p₀, not p̂, because under H₀ we assume the true proportion equals p₀. Substituting the observed p̂ gives a different (Wald) statistic whose standard error no longer reflects H₀.
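
As a worked instance of the one-sample formula, here is a short sketch applied to the quality-control example from earlier in the article (18 defective units out of 400, p₀ = 0.03, right-tailed). The function name `one_sample_z` is an illustrative choice, not something defined by the calculator.

```python
import math
from statistics import NormalDist

def one_sample_z(x, n, p0):
    """Return (p_hat, z) for a one-sample Z-test of a proportion."""
    p_hat = x / n
    se_null = math.sqrt(p0 * (1 - p0) / n)  # standard error uses p0, not p_hat
    return p_hat, (p_hat - p0) / se_null

# Quality-control example from the text: 18 defective out of 400, claimed rate 3%
p_hat, z = one_sample_z(x=18, n=400, p0=0.03)
p_right = 1 - NormalDist().cdf(z)  # right-tailed p-value

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, right-tailed p = {p_right:.4f}")
```

With these numbers z is roughly 1.76 and the right-tailed p-value is about 0.039, so the audit result would be significant at α = 0.05.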

Two-Sample

\( \hat{p}_c = \dfrac{x_1 + x_2}{n_1 + n_2} \qquad\qquad z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_c(1-\hat{p}_c)\!\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}} \)

The pooled proportion p̂c is the combined estimate assuming H₀: p₁ = p₂ is true. Pooling uses all n₁ + n₂ observations to estimate the single proportion that H₀ says both groups share, so the standard error matches the null hypothesis. The confidence interval for the difference, however, uses unpooled variance, because estimation does not assume H₀.
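
The pooled two-sample statistic can be sketched as follows. The article's treatment/control example gives only percentages (42% vs 31%), so the group sizes of 100 each used here are an assumption made purely for illustration.

```python
import math
from statistics import NormalDist

def two_sample_z(x1, n1, x2, n2):
    """Return (pooled proportion, z) for a pooled two-sample Z-test."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return p_pool, (p1 - p2) / se

# Hypothetical counts: 42/100 recovered (treatment) vs 31/100 (control)
p_pool, z = two_sample_z(42, 100, 31, 100)
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"pooled = {p_pool:.3f}, z = {z:.3f}, two-sided p = {p_two_sided:.4f}")
```

Under the assumed sample sizes, an 11-point difference is not significant at α = 0.05 in a two-tailed test; larger groups with the same percentages would give a larger |z|.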

Confidence Interval

\( \hat{p} \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \qquad\quad \text{(one-sample)} \)

\( (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \qquad\quad \text{(two-sample)} \)
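
Both interval formulas translate directly into code. This sketch reuses the quality-control counts from the article for the one-sample case; the two-sample counts (42/100 and 31/100) are hypothetical, since the article gives only percentages.

```python
import math
from statistics import NormalDist

def ci_one_sample(x, n, conf=0.95):
    """Normal-approximation interval for one proportion (uses p_hat, not p0)."""
    p_hat = x / n
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def ci_difference(x1, n1, x2, n2, conf=0.95):
    """Interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z_crit * se, (p1 - p2) + z_crit * se

lo1, hi1 = ci_one_sample(18, 400)           # counts from the article's example
lo2, hi2 = ci_difference(42, 100, 31, 100)  # hypothetical counts

print(f"one-sample 95% CI: ({lo1:.4f}, {hi1:.4f})")
print(f"difference 95% CI: ({lo2:.4f}, {hi2:.4f})")
```

An interval for the difference that contains 0 agrees with a non-significant two-tailed test in most cases, although the two can disagree slightly because the test uses the pooled standard error and the interval does not.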

Effect Size — Cohen's h

\( h = 2\arcsin\!\bigl(\sqrt{\hat{p}}\bigr) - 2\arcsin\!\bigl(\sqrt{p_0}\bigr) \)

Cohen's h uses the arcsine transformation, which approximately stabilises the variance of a proportion across the [0,1] range. This makes the magnitude of h interpretable regardless of where on the scale the proportions fall, an advantage over raw differences, which can be misleading near 0 or 1. For a two-sample comparison, replace p̂ and p₀ with p̂₁ and p̂₂.
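
A brief sketch of Cohen's h follows. The first call uses the article's quality-control figures; the other two pairs of proportions are hypothetical and were picked to show that the same raw difference of 0.02 yields a different h near the middle of the scale than near its edge.

```python
import math

def cohens_h(p1, p2):
    """Cohen's h: difference of arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

h_qc = cohens_h(18 / 400, 0.03)  # observed 4.5% vs hypothesized 3%
h_mid = cohens_h(0.52, 0.50)     # raw difference 0.02 near the middle
h_edge = cohens_h(0.03, 0.01)    # raw difference 0.02 near the boundary

print(f"h (QC example)   = {h_qc:.3f}")
print(f"h (0.52 vs 0.50) = {h_mid:.3f}")
print(f"h (0.03 vs 0.01) = {h_edge:.3f}")
```

By Cohen's conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), the quality-control effect is very small even though the test is significant at α = 0.05.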

Hypothesis Types

Two-Tailed

H₁: p ≠ p₀ — Tests for any difference regardless of direction. Most conservative choice. Required when no directional expectation exists prior to data collection.

Right-Tailed

H₁: p > p₀ — Predicts the true proportion is higher than the null value. More statistically powerful than two-tailed when the correct direction is pre-specified.

Left-Tailed

H₁: p < p₀ — Predicts the true proportion is lower than the null value. Reject when Z < −z*. Must be specified before observing results.
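
The three alternatives differ only in which tail area of the standard normal distribution is reported. A minimal sketch, in which the labels "two-sided", "greater" and "less" are naming choices made here rather than terms from the article:

```python
from statistics import NormalDist

def p_value(z, alternative):
    """p-value of a Z statistic for the chosen alternative hypothesis."""
    nd = NormalDist()
    if alternative == "two-sided":  # H1: p != p0
        return 2 * (1 - nd.cdf(abs(z)))
    if alternative == "greater":    # H1: p > p0 (right-tailed)
        return 1 - nd.cdf(z)
    if alternative == "less":       # H1: p < p0 (left-tailed)
        return nd.cdf(z)
    raise ValueError(f"unknown alternative: {alternative!r}")

z = 1.7586  # approximately the statistic from the 18-of-400 example
for alt in ("two-sided", "greater", "less"):
    print(f"{alt:>9}: p = {p_value(z, alt):.4f}")
```

For this z the right-tailed p-value falls below 0.05 while the two-tailed one does not, which illustrates why the direction has to be fixed before the data are seen.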

Pre-specification requirement: The tail direction must be determined before data collection, based on theory or prior evidence, not after observing the results. Choosing the tail to match the observed direction is a form of p-hacking: either tail can then trigger rejection, so the true Type I error rate of a nominal α = 0.05 one-tailed test rises to approximately 10%.
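
The 10% figure can be verified both analytically and by simulation. The sketch below assumes H₀ is true, so Z is drawn directly from a standard normal distribution; the seed and the number of draws are arbitrary choices.

```python
import random
from statistics import NormalDist

alpha = 0.05
z_one_tailed = NormalDist().inv_cdf(1 - alpha)  # one-tailed cutoff, about 1.645

# Analytic: choosing the tail after the fact rejects whenever |Z| > cutoff
analytic_rate = 2 * (1 - NormalDist().cdf(z_one_tailed))

# Simulation under H0: Z ~ N(0, 1), tail picked to match the observed sign
rng = random.Random(0)
draws = 200_000
rejections = sum(abs(rng.gauss(0, 1)) > z_one_tailed for _ in range(draws))
simulated_rate = rejections / draws

print(f"analytic Type I error rate:  {analytic_rate:.3f}")
print(f"simulated Type I error rate: {simulated_rate:.3f}")
```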

Significance Level (α) and Error Types

α = 0.10 — Exploratory

90% confidence. Use for early-stage research, product ideation, or situations where missing a real effect (Type II error) is more costly than a false positive.

α = 0.05 — Standard

95% confidence. The conventional gold standard for most published research. Provides a reasonable balance between Type I and Type II error risks.

α = 0.01 — Confirmatory

99% confidence. A stricter threshold, often chosen for confirmatory studies, regulatory work, or any high-stakes decision where a false positive could cause harm or major financial consequences.
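
Each significance level corresponds to a critical value z* of the standard normal distribution. A short sketch for looking these up, where `critical_z` is an illustrative helper name:

```python
from statistics import NormalDist

def critical_z(alpha, two_tailed=True):
    """Critical value z* for a given significance level."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: two-tailed z* = {critical_z(alpha):.3f}, "
          f"one-tailed z* = {critical_z(alpha, two_tailed=False):.3f}")
```

The two-tailed critical values come out as roughly 1.645, 1.960 and 2.576 for α = 0.10, 0.05 and 0.01.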

Critical Assumptions

Four conditions that must be verified before running the test

Random sampling. Observations are drawn by a random mechanism. Convenience samples invalidate population inference.
Independence of observations. Each individual's outcome is unrelated to others'. For two-sample tests, the groups must also be independent of each other.
Large-sample normality. One-sample: np₀ ≥ 5 and n(1−p₀) ≥ 5. Two-sample: n₁p̂c ≥ 5, n₁(1−p̂c) ≥ 5, n₂p̂c ≥ 5, n₂(1−p̂c) ≥ 5.
10% condition. Each sample must be less than 10% of its population to ensure near-independence when sampling without replacement.
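
The two numeric conditions in this list lend themselves to a quick programmatic check. This is a sketch under the thresholds stated above (expected counts of at least 5, sample under 10% of the population); the function names and the population sizes in the usage lines are hypothetical. Random sampling and independence cannot be verified from counts alone.

```python
def large_sample_ok_one(n, p0, threshold=5):
    """One-sample condition: n*p0 and n*(1-p0) both reach the threshold."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold

def large_sample_ok_two(x1, n1, x2, n2, threshold=5):
    """Two-sample condition: all four expected counts under the pooled proportion."""
    p_pool = (x1 + x2) / (n1 + n2)
    expected = [n1 * p_pool, n1 * (1 - p_pool), n2 * p_pool, n2 * (1 - p_pool)]
    return all(count >= threshold for count in expected)

def ten_percent_ok(n, population_size):
    """10% condition: the sample is less than 10% of its population."""
    return n < 0.10 * population_size

print(large_sample_ok_one(400, 0.03))         # expected counts 12 and 388
print(large_sample_ok_one(20, 0.03))          # expected count 0.6 is too small
print(large_sample_ok_two(42, 100, 31, 100))  # hypothetical counts
print(ten_percent_ok(400, 10_000))            # hypothetical population size
```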

When assumptions are not met: If the large-sample conditions fail (expected counts below 5), Z-test p-values become unreliable. Use the exact binomial test (one-sample) or Fisher's Exact Test (two-sample) instead. These methods require no normal approximation and remain valid for small n.
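
For the one-sample case, the exact binomial alternative is a sum of binomial probabilities. The sketch below covers only the right-tailed version and uses a hypothetical small sample (3 defects in 20 units against p₀ = 0.03), where np₀ = 0.6 is well below 5. Fisher's Exact Test is not shown.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_binomial_p_right(x, n, p0):
    """Right-tailed exact p-value: P(X >= x) when the true proportion is p0."""
    return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))

# Hypothetical small sample where the normal approximation is not trustworthy
p_exact = exact_binomial_p_right(x=3, n=20, p0=0.03)
print(f"exact right-tailed p = {p_exact:.4f}")
```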

Interpreting the p-Value Correctly

The p-value is the probability of observing a test statistic at least as extreme as the one obtained, assuming H₀ is true. It is not the probability that H₀ is true, nor the probability that the finding occurred by chance. A p-value below α means the data is inconsistent with H₀ — it justifies rejecting H₀, but does not prove H₁, and does not quantify practical importance. Always report effect size and confidence intervals alongside the p-value.

Frequently Asked Questions

When should I use a Z-test instead of a chi-square test?

For testing one proportion against a known value, use the Z-test. For a 2×2 contingency table, Z² is mathematically identical to the chi-square statistic with 1 degree of freedom. The Z-test is preferred when direction matters, since it supports one-tailed tests while chi-square does not.
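
The Z² = χ² identity is easy to confirm numerically. The sketch below computes the pooled Z statistic and the Pearson chi-square statistic (without continuity correction) for the same 2×2 table; the counts are hypothetical.

```python
import math

def pooled_z(x1, n1, x2, n2):
    """Pooled two-sample Z statistic for proportions."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square statistic for the 2x2 table, no continuity correction."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

z = pooled_z(42, 100, 31, 100)  # hypothetical counts
chi2 = chi_square_2x2(42, 100, 31, 100)
print(f"z^2 = {z**2:.6f}, chi-square = {chi2:.6f}")
```

The identity holds for the uncorrected statistic; a chi-square routine that applies Yates' continuity correction will return a smaller value.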

What is the minimum sample size needed?

The large-sample condition implies n ≥ 5 / min(p₀, 1−p₀). For p₀ = 0.05, you need n ≥ 100. For two-sample tests, each group must satisfy the condition on its own, with the pooled proportion p̂c in place of p₀.
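
The rule n ≥ 5 / min(p₀, 1−p₀) can be wrapped in a small helper. This is a sketch; the name `min_sample_size` and the small tolerance used to guard against floating-point rounding are choices made here, not part of the article.

```python
import math

def min_sample_size(p0, threshold=5):
    """Smallest n with n*p0 >= threshold and n*(1-p0) >= threshold."""
    if not 0 < p0 < 1:
        raise ValueError("p0 must be strictly between 0 and 1")
    # Subtract a tiny tolerance so values like 100.00000000000001 do not round up
    return math.ceil(threshold / min(p0, 1 - p0) - 1e-9)

for p0 in (0.05, 0.03, 0.50, 0.95):
    print(f"p0 = {p0:.2f}: n >= {min_sample_size(p0)}")
```

Proportions close to 0 or 1 need the largest samples, because the rarer of the two outcomes drives the requirement.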

Can I compare two dependent (paired) proportions?

No. The two-sample Z-test requires independent groups. For paired proportions — such as before-and-after measures on the same individuals — use McNemar's test instead.
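
For reference, McNemar's test depends only on the discordant pairs, that is, the individuals whose outcome changed between the two measurements. The sketch below implements the large-sample version without continuity correction; the counts b = 15 and c = 5 are hypothetical.

```python
import math
from statistics import NormalDist

def mcnemar_test(b, c):
    """Uncorrected McNemar test.

    b: pairs that changed from 'yes' to 'no'
    c: pairs that changed from 'no' to 'yes'
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    chi2 = (b - c) ** 2 / (b + c)
    # A chi-square variable with 1 df is a squared standard normal
    p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p

chi2, p = mcnemar_test(b=15, c=5)  # hypothetical discordant counts
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

When b + c is small, an exact binomial version of McNemar's test is generally preferred over this approximation.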
