Why is 25 the minimum number of subgroups for SPC and CPK?

Mathematically, 25 subgroups provide enough degrees of freedom for the estimated standard deviation and control limits to become reasonably stable; beyond that point, each extra subgroup improves precision only marginally. Practically, the time it takes to collect 25 subgroups covers enough normal production variation (such as temperature changes or shift handovers) to give a true picture of process stability.

Why shouldn't I use 500 samples on a single SPC control chart?

Plotting 500 samples often spans too long a timeframe (e.g., several months). If a point from two months ago shows an out-of-control signal, the original 'crime scene' is gone: the operators, tooling, and materials will have changed, making accurate root cause analysis impossible.

Why Does SPC Sampling Insist on Exactly 25 Subgroups?

In Statistical Process Control (SPC), using 25 subgroups strikes the optimal balance between statistical accuracy and engineering practicality. Mathematically, 25 subgroups provide enough degrees of freedom to estimate the standard deviation and control limits reliably, minimizing false alarms. Practically, collecting them spans enough time to capture routine "common cause variations" without stretching so long that root cause analysis (tracing the 5M1E factors) becomes an impossible "cold case."

In day-to-day manufacturing quality management, whether you are conducting Process Capability Analysis (CPK) or creating SPC control charts, software templates typically default to 25 subgroups of samples. Even the basic templates you find online are rigidly set up for 25 subgroups.

Diligent professionals might naturally ask: Why does it have to be exactly 25? Is 24 not enough? Wouldn't collecting 500 samples be much more accurate? In reality, "25" is not a number pulled out of thin air; it rests on solid engineering and statistical logic. If you are deploying Web-based SPC software across your enterprise, understanding this logic is crucial. Today, we will unpack it as plainly as possible.

25 Subgroups: The Baseline for "Seeing the Truth"

Many people think calculating CPK is just plugging data into a formula. In fact, before calculating CPK, we must answer one critical question: over time, how stable is this production process? A capability index is only meaningful for a process that is in statistical control.

In real-world manufacturing, product dimensions and characteristics are never perfectly fixed. Each sampling subgroup yields a "mean" (indicating whether the overall level is off-center) and a "dispersion statistic" (such as the range or standard deviation, indicating how tightly the parts are clustered).

Once you collect 25 subgroups, you have 25 means and 25 dispersion statistics. Plotted in time order on a line chart, their trajectory forms the foundation for observing the "true behavior" of the process.
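For readers who want to see the arithmetic, here is a minimal Python sketch of the classic X-bar/R calculation that turns subgroup means and ranges into control limits. It assumes a subgroup size of 5 and the standard tabulated constants for that size (A2 = 0.577, D3 = 0, D4 = 2.114); the function name `xbar_r_limits` and the simulated data are illustrative only, not taken from any particular SPC package.

```python
import random
from statistics import mean

# Standard Shewhart constants for subgroup size n = 5 (from published X-bar/R tables)
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Return ((LCL, CL, UCL) for the X-bar chart, (LCL, CL, UCL) for the R chart)."""
    if any(len(g) != 5 for g in subgroups):
        raise ValueError("the constants above assume subgroups of exactly 5")
    means = [mean(g) for g in subgroups]            # one mean per subgroup
    ranges = [max(g) - min(g) for g in subgroups]   # one range (dispersion statistic) per subgroup
    grand_mean = mean(means)                        # center line of the X-bar chart
    r_bar = mean(ranges)                            # center line of the R chart
    xbar = (grand_mean - A2 * r_bar, grand_mean, grand_mean + A2 * r_bar)
    r = (D3 * r_bar, r_bar, D4 * r_bar)
    return xbar, r

# Hypothetical usage: 25 simulated subgroups of 5 measurements each
random.seed(1)
data = [[random.gauss(10.0, 0.05) for _ in range(5)] for _ in range(25)]
(lcl, cl, ucl), (r_lcl, r_cl, r_ucl) = xbar_r_limits(data)
print(f"X-bar chart: LCL {lcl:.3f}, CL {cl:.3f}, UCL {ucl:.3f}")
print(f"R chart:     LCL {r_lcl:.3f}, CL {r_cl:.3f}, UCL {r_ucl:.3f}")
```

Every limit depends on the grand mean and R-bar, so the more subgroups feed those two averages, the less the limits move around from one data collection to the next.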

In long-term statistical practice, 25 is widely used as the empirical minimum. With fewer subgroups, say only 5 or 10, the variation you see may be nothing more than a chance "illusion," and any judgment based on it will be unreliable. The number 25 represents the optimal balance between statistical precision and engineering practicality:

Mathematically: 25 subgroups (for example, 125 measurements at a subgroup size of 5) provide enough degrees of freedom for the estimated standard deviation and control limits to settle down; more subgroups still help, but with diminishing returns. Reasonably precise limits keep the rates of "false alarms" (Type I errors) and "missed alarms" (Type II errors) on your control charts close to what the chart is designed for.
On the Engineering Floor: The time span required to collect 25 subgroups is usually just long enough to capture real "common cause variations" on the shop floor—such as day/night temperature shifts, operator shift changes, or minor equipment wear. This avoids the "perfect illusion" caused by intensive sampling over a very short period, ensuring that the calculated CPK and control charts reflect the true dynamic picture of the production line.
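To see how the precision of the estimated standard deviation depends on the number of subgroups, the sketch below uses the common large-sample approximation that, for normally distributed data, the relative standard error of a standard deviation estimate is about 1/sqrt(2ν), where ν = k(n - 1) degrees of freedom for k subgroups of size n (a pooled within-subgroup estimate). The function name and the choice n = 5 are illustrative assumptions; a range-based R-bar/d2 estimate is slightly less efficient but follows the same trend.

```python
import math

def sigma_relative_se(k, n=5):
    """Approximate relative standard error of a pooled within-subgroup
    standard deviation estimated from k subgroups of size n.

    Uses the large-sample normal-theory approximation SE(s)/sigma ~ 1/sqrt(2*df),
    with df = k * (n - 1).
    """
    df = k * (n - 1)
    return 1.0 / math.sqrt(2 * df)

N = 5  # subgroup size assumed for this illustration
for k in (5, 10, 25, 50, 100):
    df = k * (N - 1)
    print(f"{k:>3} subgroups (df = {df:>3}): sigma uncertainty ~ {sigma_relative_se(k, N):.1%}")
# Prints roughly 15.8%, 11.2%, 7.1%, 5.0% and 3.5% respectively
```

Under this approximation, going from 5 to 25 subgroups cuts the uncertainty from about 16% to about 7%, while halving it again would take 100 subgroups, four times as much data collection.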

Therefore, the requirement of at least 25 subgroups is designed to give you enough data to truly understand how your production line performs over the long haul.
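Once the charts confirm the process is stable, the capability index itself is simple arithmetic. Here is a minimal sketch that computes Cpk = min(USL - mean, mean - LSL) / (3σ), with the within-subgroup σ estimated as R-bar/d2. It assumes subgroups of 5 (d2 = 2.326); the spec limits, the simulated data, and the function name `cpk_from_subgroups` are hypothetical, and SPC packages may estimate σ differently (for example, from pooled standard deviations or S-bar/c4), which can shift the result slightly.

```python
import random
from statistics import mean

D2 = 2.326  # d2 bias-correction constant for subgroups of size 5 (standard table value)

def cpk_from_subgroups(subgroups, lsl, usl):
    """Cpk using within-subgroup sigma estimated as R-bar / d2.

    Only meaningful once the X-bar/R charts show the process is in control.
    """
    if any(len(g) != 5 for g in subgroups):
        raise ValueError("D2 above assumes subgroups of exactly 5")
    grand_mean = mean(mean(g) for g in subgroups)
    r_bar = mean(max(g) - min(g) for g in subgroups)
    sigma_within = r_bar / D2
    # Distance from the mean to the nearer spec limit, in units of 3 sigma
    return min(usl - grand_mean, grand_mean - lsl) / (3 * sigma_within)

# Hypothetical usage: 25 simulated subgroups, spec limits 10.0 +/- 0.5
random.seed(7)
data = [[random.gauss(10.0, 0.1) for _ in range(5)] for _ in range(25)]
print(f"Cpk = {cpk_from_subgroups(data, lsl=9.5, usl=10.5):.2f}")
```

Because σ comes only from variation inside each subgroup, a Cpk computed from a short burst of samples can look excellent while hiding the shift-to-shift drift that 25 well-spaced subgroups would reveal.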

If More is Better, Can I Use 500 Samples for a Control Chart?

Since 25 is only the lower limit, perfectionists might ask: "If I grab 500 samples at once to create an SPC control chart, won't the massive data volume make the analysis more accurate?"

This touches on a critical aspect of quality management: Anomaly Traceability (Root Cause Analysis).

The ultimate goal of an SPC control chart is to let you rush to the shop floor and "catch the culprit red-handed" the moment abnormal variation is detected. If you pack 500 points onto a single control chart, the data can easily span several months of production.
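A quick back-of-the-envelope calculation shows how fast a chart stretches across the calendar. The sketch below assumes round-the-clock production with one subgroup taken at a fixed interval; the 4-hour and 8-hour intervals are illustrative choices, not recommendations, and `chart_span_days` is a hypothetical helper.

```python
def chart_span_days(points, hours_between_subgroups):
    """Calendar days between the first and last point of a control chart,
    assuming round-the-clock production sampled at a fixed interval."""
    # n points on the chart are separated by n - 1 sampling intervals
    return (points - 1) * hours_between_subgroups / 24

for hours in (4, 8):  # illustrative sampling intervals
    for points in (25, 500):
        days = chart_span_days(points, hours)
        print(f"{points:>3} points, one subgroup every {hours} h: about {days:.0f} days")
```

At one subgroup per 8-hour shift, 25 points cover about 8 days, while 500 points cover roughly 166 days, more than five months of production.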

When you stare at the chart and discover that a specific point went "out of control" two months ago, you might eagerly run to the workshop to investigate, only to face a frustrating reality:

Man (Personnel): The day-shift veteran from back then has long been replaced by a night-shift rookie.
Machine: The equipment's tooling and fixture parameters have already been adjusted multiple times since then.
Material: That specific batch of raw materials was used up ages ago.
Method/Environment: The temperature, humidity, and operating techniques from that day have all become untraceable cold cases.

The shop floor is like a crime scene that has been completely destroyed; finding the true root cause becomes virtually impossible.

The Perfect Balance Between Statistics and Engineering

The rule of 25 subgroups is a delicate compromise between mathematical statistics and practical quality management.

It ensures that our judgment of the process distribution isn't distorted by too little data, while also guaranteeing that when the system flags an anomaly, the "crime scene" hasn't gone cold. We can still follow the clues and promptly investigate the truth behind the Man, Machine, Material, Method, and Environment (5M1E).

SPC (Statistical Process Control) is never about doing statistics for the sake of statistics; it is about discovering and solving problems. Too little data and too much data can both lead your quality initiatives to a dead end.