Analysis of the application conditions and suitable scenarios for various statistical analysis methods such as Cpk and Cpm

  

1. Conditions for using Cpk (Process Capability Index)

  The application of the Cpk index requires, first and foremost, that the manufacturing process be in a state of statistical control, meaning the process is stable and predictable: only common-cause variation is present and special-cause variation has been eliminated. Control charts (such as X̄-R and X̄-s charts) are usually used to determine whether the process is stable.

  Secondly, the quality characteristic of the process output must approximately follow a normal distribution, because the calculation and interpretation of Cpk rely heavily on the normality assumption. Normality allows the mean and standard deviation to accurately describe the distribution of the process, and hence to evaluate the process's ability to meet specification requirements. If the data are non-normal, calculating Cpk directly may lead to a misjudgment of process capability; in that case, an appropriate transformation (such as a Box-Cox or logarithmic transformation) should be applied to bring the data close to normality before Cpk is calculated, or a non-parametric process capability analysis method should be considered.

  In addition, the calculation and interpretation of Cpk implicitly assume that the process target value coincides with the specification center. When the target value does not match the specification center, Cpk may not accurately reflect how far the process deviates from the target; in that case, an index such as Cpm may be more applicable.
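As a concrete illustration, the following is a minimal sketch of the Cpk calculation in Python. The sample values and specification limits are hypothetical, and the sample standard deviation is used as the sigma estimate for simplicity; formal capability studies typically estimate sigma from control-chart subgroups.

```python
import statistics

def cpk(data, lsl, usl):
    """Cpk = min(USL - mean, mean - LSL) / (3 * sigma)."""
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)  # sample standard deviation as the sigma estimate
    return min(usl - mean, mean - lsl) / (3 * sigma)

# Hypothetical measurements centered at 10.0, with spec limits 9.0-11.0
samples = [9.8, 10.0, 10.2, 9.9, 10.1]
print(round(cpk(samples, lsl=9.0, usl=11.0), 3))  # 2.108
```

Because Cpk takes the minimum distance from the mean to either limit, shifting the process mean off-center lowers the index even when the spread is unchanged.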

  

2. Conditions for using Cpm (Process Capability Index, considering the target value)

  Cpm, often referred to as a "second-generation" process capability index, is mainly applied when the process characteristic has a clear target value, especially when that target is not exactly at the center of the specification tolerance. Unlike Cpk, which focuses on whether the process output falls within the specification limits, Cpm emphasizes how tightly the process output concentrates around the target value. By accounting for the deviation between the process mean and the target, it more comprehensively reflects the process's ability to meet customer expectations (i.e., to approach the target), and it is particularly suitable when deviation from the target causes quality loss, as described by the Taguchi loss function. The precondition for using Cpm is therefore a process with a specific target value, particularly one that does not lie at the specification center.
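The usual formula is Cpm = (USL − LSL) / (6·√(σ² + (μ − T)²)), where the (μ − T)² term penalizes deviation from the target T. A minimal sketch, with hypothetical data and limits:

```python
import math
import statistics

def cpm(data, lsl, usl, target):
    """Cpm = (USL - LSL) / (6 * sqrt(sigma^2 + (mean - target)^2))."""
    mean = statistics.mean(data)
    tau = math.sqrt(statistics.variance(data) + (mean - target) ** 2)
    return (usl - lsl) / (6 * tau)

samples = [9.8, 10.0, 10.2, 9.9, 10.1]                 # sample mean is 10.0
print(round(cpm(samples, 9.0, 11.0, target=10.0), 3))  # 2.108 (on target)
print(round(cpm(samples, 9.0, 11.0, target=9.9), 3))   # 1.782 (off-target penalty)
```

With the mean on target, Cpm reduces to Cp for a centered specification; moving the target away from the mean inflates τ and lowers the index, which is exactly the Taguchi-loss behavior described above.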

  

3. Prerequisites for continuous measurement system analysis (MSA)

  The effectiveness of measurement system analysis for continuous data (such as bias, linearity, stability, repeatability, and reproducibility studies) rests on two key prerequisites. First, the instruments or gauges used for measurement must undergo regular, qualified calibration. This ensures the accuracy of the measurement equipment itself, i.e., that the difference between its indicated value and the true value of the measured quantity is within an acceptable range, which is the basis for obtaining reliable measurement data. Second, the resolution of the measurement system must satisfy the "1/10 rule": the smallest distinguishable unit of the measurement system (usually the instrument's smallest graduation, or half of it) should be no larger than one tenth of the smaller of the process variation range (6σ) and the specification tolerance (USL − LSL). Only when this condition is met can the measurement system effectively distinguish real differences in product characteristics, avoiding information loss or misjudgment caused by insufficient instrument resolution.
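The 1/10 rule is simple enough to express directly. A sketch of the check, with hypothetical gauge and process figures:

```python
def resolution_ok(resolution, process_sigma, lsl, usl):
    """1/10 rule: the smallest distinguishable unit of the measurement
    system should be no larger than one tenth of the smaller of the
    process spread (6 * sigma) and the tolerance (USL - LSL)."""
    reference = min(6 * process_sigma, usl - lsl)
    return resolution <= reference / 10

# Hypothetical gauge reading to 0.01 mm, process sigma 0.05 mm, tolerance 9.5-10.5 mm
print(resolution_ok(0.01, process_sigma=0.05, lsl=9.5, usl=10.5))  # True
print(resolution_ok(0.05, process_sigma=0.05, lsl=9.5, usl=10.5))  # False
```

Note that the binding constraint here is the process spread (6σ = 0.3 mm), not the tolerance (1.0 mm): a gauge adequate for the specification can still be too coarse to see process variation.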

  

4. Prerequisites for Analysis of Variance (ANOVA)

  Analysis of variance (ANOVA) is a statistical method for testing whether the means of multiple populations differ significantly. Its application depends on several basic assumptions. First, each sample should come from a normally distributed population: for the observations in each treatment group or at each level, the underlying population should approximately follow a normal distribution. This assumption can be checked through normality tests of the residuals (such as the Shapiro-Wilk test, the Kolmogorov-Smirnov test, or a Q-Q plot). Second, the population variances of the groups being compared should be homogeneous (equal variances), i.e., approximately equal across treatment groups or levels; commonly used tests include Levene's test and Bartlett's test. If the variances are heterogeneous, corrections such as Welch's ANOVA or non-parametric alternatives may be needed. In addition, for certain experimental designs, such as non-replicated experiments (for example, a two-way ANOVA with only one observation per cell), the model assumes that certain effects (usually the interaction) are zero or cannot be estimated. ANOVA can still be applied under such conditions, but its interpretation requires caution, and it is usually reserved for preliminary screening or simplified analysis.
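To make the mechanics concrete, here is a minimal one-way ANOVA F statistic computed from scratch (the three groups are invented). This sketch stops at the F value; in practice a library routine such as `scipy.stats.f_oneway` would also supply the p-value, and `scipy.stats.shapiro` and `scipy.stats.levene` cover the assumption checks named above.

```python
import statistics

def one_way_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    pooled = [x for g in groups for x in g]
    grand = statistics.mean(pooled)
    k, n = len(groups), len(pooled)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

print(one_way_f([1, 2, 3], [2, 3, 4], [3, 4, 5]))  # 3.0
```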

  

5. Applicable conditions and interpretation of correlation analysis (Correlation)

  Correlation analysis, such as the Pearson correlation coefficient, is mainly applicable to evaluating the strength and direction of the linear relationship between two continuous variables; its calculation is based on the linear covariation between them. It must be emphasized, however, that a correlation coefficient near 0 only indicates the absence of a significant linear correlation; it does not establish that the variables are "uncorrelated" in every sense. A strong non-linear relationship (quadratic, exponential, logarithmic, etc.) may exist between the variables that the linear correlation coefficient cannot capture. For data showing a non-linear relationship, an appropriate mathematical transformation (such as introducing higher-order or logarithmic terms as new variables) may convert it into a linear form, after which correlation analysis (or linear regression based on it) can still be used to analyze and model the transformed relationship. Note, however, that what is then interpreted is the linear relationship between the transformed variables, not a direct linear relationship between the original variables.

  

6. Conditions for using the Mann-Whitney U test (Wilcoxon rank-sum test)

  The Mann-Whitney U test is a non-parametric test used to compare whether two independent samples are drawn from populations that differ in distribution location. One of its key preconditions is that the two population distributions should have approximately the same shape. "Same shape" does not mean the distributions are identical, but that one can be obtained from the other by a shift in location, with similar dispersion, skewness, and other shape characteristics. Simply put, the two distributions are consistent in form but may differ in central position (e.g., the median). If the shapes differ markedly (for example, one is strongly skewed and the other approximately normal), the Mann-Whitney U test may yield misleading conclusions when interpreted as a comparison of medians. Before applying the test, it is therefore advisable to roughly assess the similarity of the two samples' distribution shapes with graphical methods such as box plots and histograms.
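The U statistic itself is just a count over all cross-sample pairs, which the following teaching sketch makes explicit (real analyses would use `scipy.stats.mannwhitneyu`, which also returns a p-value; the sample values here are invented):

```python
def mann_whitney_u(sample_a, sample_b):
    """U for sample_a: the number of pairs (a, b), with a from sample_a
    and b from sample_b, in which a > b; ties count one half."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

print(mann_whitney_u([3, 4, 5], [1, 2, 3]))  # 8.5 out of a possible 9
```

Because U is built purely from order comparisons, it is insensitive to the actual spacing of the values; that is why the shape assumption above is needed before reading a significant U as a location (median) difference.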

  

7. Conditions for using the Friedman test

  The Friedman test is a non-parametric rank-based test, mainly used to compare whether three or more related samples (or treatments in a randomized block design) differ in distribution location. Like the Mann-Whitney U test, the Friedman test requires that the population distributions of the treatment groups have roughly the same shape. Its core idea is to rank the data within each block and to judge whether a treatment effect exists by comparing the treatment rank sums. Here too, "same shape" means that the population distributions of different treatments can be obtained from one another by translation: their shape characteristics (such as variance and skewness) are similar and they differ only in location. This assumption ensures that the comparison of ranks reflects genuine location differences rather than artifacts of differing distribution shapes. If the shapes differ greatly, the validity of the Friedman test is compromised.
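The within-block ranking can be sketched directly. The following computes the Friedman chi-square statistic χ² = 12/(nk(k+1))·ΣR_j² − 3n(k+1) from rank sums R_j; the data are invented, the tie correction is omitted, and in practice `scipy.stats.friedmanchisquare` would be used to obtain a p-value as well.

```python
def friedman_statistic(blocks):
    """Friedman chi-square: rank the k treatments within each of the n
    blocks, sum the ranks per treatment, then apply
    chi2 = 12 / (n*k*(k+1)) * sum(R_j^2) - 3*n*(k+1).
    Assumes no ties within a block (tie correction omitted)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for rank, j in enumerate(sorted(range(k), key=row.__getitem__), start=1):
            rank_sums[j] += rank
    return 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)

# Three blocks in which treatment 3 always beats treatment 2, which beats treatment 1
print(friedman_statistic([[1, 2, 3], [2, 3, 5], [1, 4, 6]]))  # 6.0
```

With a perfectly consistent ordering across all three blocks, the statistic reaches its maximum for n = 3, k = 3; inconsistent orderings would pull the rank sums together and shrink it toward 0.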