Core judgment criteria, common misunderstandings and key points of process management for GRR measurement system analysis

  

GRR Measurement System Analysis: Judgment Criteria and Common Mistakes

  

I. Core judgment criteria for GRR

  The essence of GRR (Gage Repeatability and Reproducibility) analysis is to verify whether the measurement system can reliably reflect the true value. This must be judged by combining visual graphics with *quantitative indicators*; the core logic is to decompose the sources of variation and verify the stability and resolution of the system.

  

1. Sixpack Chart: Visually decompose the variation

  The six-panel chart reveals the characteristics of the measurement system from different dimensions through six sub-charts and is the "microscope" for GRR analysis.

  Column chart: Qualitative assessment of the proportion of variation

  The column chart intuitively shows the magnitudes of GRR% (the ratio of measurement-system variation to total process variation) and P/T% (the ratio of measurement-system variation to the specification tolerance). Both percentages express "how much the measurement system interferes with the results": the larger the ratio, the more seriously the measurement error masks the true value (for example, GRR% = 50% means that half of the variation in the results comes from the measurement system rather than the product itself). The core value of the column chart is rapid qualitative assessment: if either column (GRR% or P/T%) exceeds 30%, the measurement system warrants focused investigation.
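As a rough sketch of how the two ratios are formed (the 6σ spread factor for P/T and all numeric values below are illustrative assumptions, not figures from this article):

```python
# Sketch: computing GRR% and P/T% from known standard deviations.
# The sigma values and tolerance below are made-up illustrative numbers.

def grr_percent(sigma_grr: float, sigma_total: float) -> float:
    """GRR% = measurement-system std dev / total process std dev * 100."""
    return 100.0 * sigma_grr / sigma_total

def pt_percent(sigma_grr: float, tolerance: float, k: float = 6.0) -> float:
    """P/T% = k * sigma_grr / tolerance * 100.

    k = 6 follows the common six-sigma-spread convention; some references
    use 5.15 instead.
    """
    return 100.0 * k * sigma_grr / tolerance

sigma_grr, sigma_total = 0.05, 0.20  # hypothetical std devs, in mm
tolerance = 2.0                      # hypothetical USL - LSL, in mm

print(round(grr_percent(sigma_grr, sigma_total), 1))  # 25.0
print(round(pt_percent(sigma_grr, tolerance), 1))     # 15.0
```

With these assumed numbers, GRR% = 25% would fall in the "depends on the scenario" band discussed later, even though P/T% = 15% looks moderate against the tolerance.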

  R Chart (Range Chart): Verify repeatability and resolution

  The R chart reflects "the range (result difference) of multiple measurements of the same sample by the same operator". It is a core tool for evaluating repeatability (the consistency of the same person using the same measuring instrument to measure the same sample) and needs to meet three rigid criteria:

  Must be in control: An out-of-control R chart (e.g., points beyond the upper or lower control limit) indicates special causes in the measurement process (e.g., the operator temporarily changed the measurement method, or the measuring instrument suddenly malfunctioned). Repeatability is then unstable and the subsequent GRR results are completely invalid: an unstable process cannot produce reliable data.

  Number of stratifications ≥ 5: Stratification refers to the number of distinct levels the range takes on the chart (roughly, how many different range values appear). If there are fewer than 5 distinct levels, the measuring tool cannot resolve the repeat-to-repeat differences, and the repeatability estimate is unreliable.

  ≤1/4 of the points fall on the 0 line: If more than 1/4 of the points fall on the "0 range line", a large fraction of the repeated measurements have a range of 0 (i.e., measuring the same sample several times gives exactly the same result). This is not "good repeatability" but insufficient resolution (for example, a real difference of 0.1 mm that the measuring tool simply cannot detect). In that case, even if the number of stratifications is ≥ 5, the resolution is still insufficient to support reliable measurement.
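The three R-chart criteria can be checked mechanically from the list of ranges. A minimal sketch, assuming the ranges and the control limit below are illustrative values:

```python
# Sketch: the three R-chart checks from the text, applied to a list of
# ranges (one range per operator x sample combination). Data are illustrative.

def r_chart_checks(ranges, ucl):
    in_control = all(r <= ucl for r in ranges)            # no point above the UCL
    n_levels = len(set(ranges))                           # distinct range values
    zero_frac = sum(1 for r in ranges if r == 0) / len(ranges)
    return {
        "in_control": in_control,
        "enough_levels": n_levels >= 5,                   # >= 5 stratifications
        "resolution_ok": zero_frac <= 0.25,               # <= 1/4 on the 0 line
    }

ranges = [0.1, 0.2, 0.0, 0.3, 0.1, 0.4, 0.2, 0.5, 0.1, 0.3]  # mm, made up
print(r_chart_checks(ranges, ucl=0.6))
```

With these assumed ranges, all three checks pass; if most ranges had been 0, `resolution_ok` would flag the "false repeatability" problem described above.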

  X-BAR Chart (Mean Chart): Evaluate reproducibility and systematic deviation

  The X-BAR chart shows "the mean distribution of the same batch of samples measured by different operators". The core judgment logic is "whether the sample differences can be distinguished".

  ≥50% of the points fall outside the control limits: In a GRR X-BAR chart the control limits are computed from measurement error, so a point outside the limits is desirable: it means the system can identify real differences between samples. If fewer than 50% of the points fall outside, the measurement error is too large relative to the product variation (for example, the product varies by 10 mm while the measurement system has an error of 5 mm), so the differences between samples are masked by measurement error and good and bad products cannot be effectively distinguished.

  The trends of operators are consistent: If the average value of Operator A is generally 2 mm higher than that of Operator B, it indicates that the reproducibility error (the difference when different people measure the same sample with the same measuring tool) is too large. The root cause is usually a "human" issue - the operators have different understandings of the measurement procedure (for example, when measuring the height, A aligns with the top while B aligns with the bottom).
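The first X-BAR criterion reduces to counting out-of-limit points. A minimal sketch, where the sample means and control limits are illustrative assumptions:

```python
# Sketch: in a GRR X-bar chart, points OUTSIDE the control limits are
# desirable. The means and limits below are made-up illustrative numbers.

def xbar_discrimination(means, lcl, ucl):
    out = sum(1 for m in means if m < lcl or m > ucl)
    return out / len(means)  # fraction of points outside the limits

means = [4.8, 5.9, 5.1, 6.3, 4.5, 5.0, 6.1, 4.4]  # sample means, mm
frac = xbar_discrimination(means, lcl=4.9, ucl=5.8)
print(frac)           # 0.75
print(frac >= 0.5)    # True: the system can distinguish sample differences
```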

  BY PART Dot Plot: Locate abnormal samples

  Group and display all measurement values by "sample". If the distribution range of the measurement values of a certain sample is significantly larger than that of other samples (for example, the measurement values of sample 3 fluctuate between 4 - 8 mm, while those of other samples fluctuate between 5 - 6 mm), the sample itself needs to be prioritized for investigation: whether it is damaged (e.g., a fragile item is deformed by being squeezed), deteriorated (e.g., a liquid sample evaporates), or misnumbered (e.g., the wrong sample is taken).

  BY OPERATOR Dot Plot: Locate abnormal operators

  Display the measured values grouped by "operator". If the measured values of Operator 1 are generally 1 mm higher than those of other operators, it indicates that the reproducibility error mainly comes from this operator. An in-depth investigation is required: Is the measuring tool not calibrated? Is there an error in the selection of measurement points? Is the training inadequate?

  Interaction plot: Identify the special association between samples and operators

  Display the combination of measurement values of "sample × operator". If a certain sample (e.g., sample 5) is measured as 6 mm by operator A and 9 mm by operator B (while the difference between the two for other samples is ≤ 1 mm), it indicates that there is a special interaction between this sample and the operator. This could be due to the special shape of the sample (e.g., there are scratches on the surface, and operator B hit the scratches during measurement), the operator's misunderstanding of the measurement method for this sample, or the confusion of sample numbers.

  

2. Session Window: Scenario-based verification of quantitative indicators

  The output of the Session window is the "conclusive indicator" of GRR, which needs to be interpreted in combination with the business scenario rather than mechanically applying the standards.

  *GRR% and P/T%*

  - GRR% (proportion of measurement-system variation to total process variation): <10% is "acceptable" (the measurement system has minimal interference with the process); 10%-30% depends on whether the characteristic is critical to quality (CTQ). If it is a CTQ (such as the dimensions of medical equipment), an error of 10%-30% may let non-compliant product pass, so improvement is needed; if it is a non-CTQ (such as secondary packaging dimensions), the system can be used while being monitored.

  - P/T% (proportion of measurement-system variation to specification tolerance): <10% is "acceptable"; >30% is unacceptable (the measurement system cannot distinguish acceptable parts from unacceptable ones); values in between call for the same CTQ-based judgment as GRR%.

  *DI/NDC* (Discrimination Index / Number of Distinct Categories):

  NDC expresses how many distinct groups of parts the measurement system can reliably tell apart; it is commonly computed as 1.41 × (part variation / GRR variation), truncated to an integer. NDC ≥ 5 is required for the system to adequately distinguish samples; NDC < 2 means the system cannot distinguish parts at all.
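A sketch of one common way NDC is computed (the 1.41 factor follows common MSA practice; the standard deviations below are illustrative assumptions):

```python
# Sketch: ndc = 1.41 * (part variation / GRR variation), truncated to an
# integer, as commonly done in MSA practice. Sigma values are illustrative.

def ndc(sigma_part: float, sigma_grr: float) -> int:
    return int(1.41 * sigma_part / sigma_grr)

print(ndc(sigma_part=0.20, sigma_grr=0.05))  # 1.41 * 4 = 5.64 -> 5, acceptable
print(ndc(sigma_part=0.10, sigma_grr=0.10))  # 1.41 -> 1, cannot distinguish parts
```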

  

II. Nine common misunderstandings in the application of GRR

  The effectiveness of GRR depends on the strict control of the experimental process. The following misunderstandings will directly lead to invalid results or mislead decision-making:

  

1. Sampling was not random: Sample variation cannot represent the process

  If samples with large differences are deliberately selected (e.g., only samples of 1 mm and 10 mm are selected), the resolution of the measurement system will be overestimated; if samples with small differences are selected (e.g., only samples around 5 mm are selected), it will be underestimated. The correct approach is to use the *random number table* to draw samples from the process that cover the "minimum-maximum" range, ensuring that the sample variation can represent the total process variation - for example, draw 1 sample per hour from the production line for a total of 20 consecutive samples.
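In software, the same random draw can be done directly instead of from a printed random number table. A sketch, with made-up part IDs and counts:

```python
import random

# Sketch: drawing a random GRR sample from a batch of candidate parts,
# instead of hand-picking extremes. Part IDs and counts are illustrative.

random.seed(42)  # fixed seed only so the draw is repeatable for the study record
candidates = [f"part-{i:03d}" for i in range(1, 101)]  # 100 parts from the line
sample = random.sample(candidates, k=10)               # 10 parts, no repeats

print(len(sample), len(set(sample)))  # 10 10 -> ten distinct parts
```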

  

2. Overtrusting the numbers and ignoring the measurement process

  The measurement system is a combination of "people + machines + methods + environment + measurements", rather than just measuring tools. For example, a GRR% of 15% may seem "acceptable", but in reality, it's because the operator failed to calibrate the measuring tools according to the regulations. If one doesn't understand the process, they may mistakenly think that the measurement system is fine and overlook the need for operator training. One must conduct a preliminary investigation: Do the operators have the required certificates? Is the measurement method written into the SOP? Does the environment (temperature, humidity) affect the measurement?

  

3. Without blind testing: The operator's memory undermines the objectivity of the data

  If the operator knows the sample number or the true value, they will unconsciously adjust the measurement value (for example, if they know the sample is "qualified", they will adjust the marginal value towards the qualified range). Blind testing refers to "the operator does not know the information of the sample being tested" - which can be achieved by using "coded samples" (for example, covering the sample number with a sticker and removing it after the measurement) to ensure that the measurement behavior is not affected by subjective expectations.

  

4. The experimental plan is not randomized: The true error cannot be detected

  If measurements are taken in the order of "Sample 1 → Sample 2 → … → Sample 10", a "sequence effect" will occur (the operator measures faster and faster, and the error grows). If measurements are taken in the order of "Operator 1 measures all samples → Operator 2 measures all samples", a "sample aging effect" will occur (the samples deteriorate during the long exposure time). Randomization needs to cover the sample sequence, operator sequence, and measurement rounds: use software to generate a random sequence (such as the RANDBETWEEN function in Excel) to ensure that each combination is random.
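The same randomized run order can be generated in Python rather than Excel. A sketch, assuming an illustrative 3 operators × 5 samples × 2 trials design:

```python
import itertools
import random

# Sketch: a fully randomized run order for a 3-operator x 5-sample x 2-trial
# GRR study, so that no sequence or aging effect lines up with any factor.

random.seed(7)  # fixed seed only so the plan can be reproduced in the record
runs = list(itertools.product(
    ["Op1", "Op2", "Op3"],            # operators (illustrative labels)
    [f"S{i}" for i in range(1, 6)],   # samples
    [1, 2],                           # trials
))
random.shuffle(runs)                  # randomize the entire measurement order

print(len(runs))                      # 30 runs, each a (operator, sample, trial)
```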

  

5. Unnecessary GRR: Wasting resources

  Not all measurements require a GRR - a judgment is needed: *Will measurement errors affect decision-making?* For example, when time is measured in "days" and a stopwatch (with an accuracy of 0.1 seconds) is used to measure days, the GRR result will definitely be "excellent", but it is completely unnecessary (the measurement accuracy is far higher than the requirement). The correct approach is to first evaluate "whether the measurement resolution matches the requirement" - if the requirement is to "distinguish days", the accuracy of the measurement system only needs to reach the "hour" level.

  

6. The last digit was not estimated: insufficient resolution

  The minimum scale of the measuring tool is 1 mm, and it is necessary to estimate the reading to 0.1 mm during measurement (e.g., 1.3 mm). If you directly read 1 mm, the measurement results of 1.1 mm and 1.2 mm will both be 1 mm, with a range of 0. In this case, the R chart has few layers and insufficient resolution. Estimating the last digit is the *key to improving resolution*. It can capture tiny differences and avoid "false repeatability".
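A small illustration of the effect (the repeated readings below are made-up):

```python
# Sketch: reading only to the tool's 1 mm graduation collapses real
# differences into identical values. The readings are illustrative.

readings = [1.1, 1.2, 1.3, 1.1, 1.2]      # estimated to 0.1 mm
rounded = [round(r) for r in readings]    # read only to whole millimetres

range_est = max(readings) - min(readings)  # 0.2 mm: variation is visible
range_rnd = max(rounded) - min(rounded)    # 0: false "perfect" repeatability

print(round(range_est, 1), range_rnd)      # 0.2 0
```

The estimated readings keep a nonzero range, while the whole-millimetre readings collapse to a range of 0 - exactly the "many points on the 0 line" symptom the R chart flags.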

  

7. Conducting GRR before verifying the basic characteristics of MSA

  The core characteristics of MSA include: stability, bias, linearity, and resolution. GRR is the "last step". If the stability is poor (e.g., the measuring instrument drifts by 0.5 mm every day) or the resolution is insufficient (e.g., it cannot detect a difference of 0.1 mm), conducting GRR first will result in invalid results - if there are problems with the basic characteristics, GRR cannot reflect the true variation of the measurement system. The correct sequence is: verify stability → bias → linearity → resolution first, and then conduct GRR.

  

8. Data collection out of control: Incorrect order leads to wrong results

  If the data of Sample 1 measured by Operator 1 is recorded as that of Operator 2, or the data of Sample 3 is recorded as that of Sample 4, it will mislead the analysis (for example, the BY PART dot plot shows "large discrepancy in Sample 3", but actually it is due to a data recording error). Data collection needs to be strictly managed: use a spreadsheet to automatically link "sample number + operator + measured value", or have an immediate signature confirmation after measuring each sample to avoid manual misplacement.
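One way to keep "sample number + operator + measured value" permanently linked is to record each measurement as a single structured record rather than loose spreadsheet cells. A sketch with illustrative field values:

```python
from dataclasses import dataclass

# Sketch: one immutable record per measurement, so sample, operator, and
# value can never be swapped independently. Values are illustrative.

@dataclass(frozen=True)
class Measurement:
    sample_id: str
    operator: str
    trial: int
    value_mm: float

log = [
    Measurement("S3", "Op1", 1, 5.2),
    Measurement("S3", "Op2", 1, 5.4),
]

# A later filter or sort moves whole records, not individual columns,
# so a "Sample 3 recorded as Sample 4" style mix-up cannot happen silently.
by_sample = [m for m in log if m.sample_id == "S3"]
print(len(by_sample))  # 2
```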

  

9. Only look at the numbers and ignore the graphs: overlook the underlying reasons

  The GRR% from the Session window may be 15%, which seems "acceptable", yet the X-BAR chart shows that Operator A's averages are generally 1 mm higher. Looking only at the numbers, one would conclude "GRR is acceptable"; combining the graphs reveals a large difference among operators that needs improvement. Graphs are *key tools for revealing the root causes of problems*: few stratifications in the R chart → insufficient resolution; large differences in the BY OPERATOR dot plot → operator problems; crossing lines in the interaction plot → special associations between samples and operators. Looking only at the numbers leads to misleading conclusions; "numbers + graphs" must be analyzed together.

  

The essence of GRR: The management art of the experimental process

  GRR is not about "running a software and generating a report", but rather *a comprehensive verification of the measurement process* - from sampling to data collection, from graphical analysis to quantitative indicators, every step needs to be strictly controlled. Any oversight in any link (such as non-random sampling or incorrect data entry) can lead to invalid results or even mislead decision-making. The goal of GRR is not "to obtain acceptable results", but "to find the pain points of the measurement system and ensure that it can reliably support business decisions".